Exam Notes
Google Cloud and Assignment 3
- Trends putting pressure on conventional computing centers:
- Explosive growth in applications.
- Extreme scale content generation.
- Extraordinary rate of digital content consumption.
- Exponential growth in compute capabilities.
- Very short cycle of obsolescence in technologies.
- Newer architectures.
- Cloud Computing Definition:
- Using a network of remote servers to store, manage, and process data.
- Internet-based computing providing shared resources on-demand, like the electricity grid.
- Other names for cloud computing: On-demand computing, utility computing, ubiquitous computing, autonomic computing, platform computing, edge computing, elastic computing, grid computing
- Platform: Integrated and networked hardware, software, and Internet infrastructure.
- Cloud Computing platform functions:
- Provides hardware, software, and networking services via the Internet.
- Hides complexity of underlying infrastructure with simple interface/API.
- Offers on-demand, 24/7 services.
- Pay-as-you-go, elastic services.
- Hardware and software services available to everyone.
- Benefits of cloud computing:
- Saves capital and operational investment with pay-as-you-go model.
- Enables companies to be infrastructure-less.
- Virtualization: Creation of a virtual version of something (OS, server, etc.).
- Advantages of virtual machines:
- Run OS where hardware is unavailable.
- Easier to create and back up machines.
- Software testing with clean OS installs.
- Emulate more machines than physically available.
- Timeshare lightly loaded systems.
- Debug problems (suspend and resume).
- Easy migration.
- Run legacy systems.
- Hypervisor/VMM: Software, firmware, or hardware that creates and runs virtual machines.
- Host vs. Guest Machine:
- Host: Computer running the hypervisor.
- Guest: Each virtual machine.
- Hypervisor Vendors: VMware ESX Server, Microsoft Windows Hyper-V, Oracle VM VirtualBox, Xen Project
- Virtual Workspace: Abstraction of an execution environment dynamically available to clients.
- Full-virtualization vs. Para-virtualization:
- Full: Runs an unmodified guest OS.
- Para: Modifies the guest OS for performance gains, requires modified kernels.
- Top cloud-computing vendors: Microsoft, Amazon, IBM, Oracle, Google Cloud, Alibaba
Map/Reduce
- MapReduce: Methodology for exploiting parallelism in computing clouds.
- Developed by Google to process large amounts of data quickly.
- Motivation: Need to manage immense data quickly, exploit parallelism in regular data.
- Examples beyond search: Dish Network (click data), Tesla (car usage).
- Search engine uses: index building, article clustering, spam detection
- Apache Hadoop: Open-source implementation of MapReduce in Java for distributed storage and processing on commodity hardware.
- Hadoop parts: Hadoop Distributed File System (HDFS) and MapReduce.
- GFS, HDFS, and CloudStore:
- Common: Distributed files, rarely updated, often read/appended, divided into chunks and replicated.
- GFS: Google File System.
- HDFS: written in Java.
- CloudStore: C++ implementation of GFS.
- Parallelization issues:
- Assigning work units to worker threads.
- Handling more work units than threads.
- Aggregating/combining results.
- Knowing when all workers have finished.
- Dividing work into separate tasks.
- How MapReduce solves parallelization issues:
- Automatic parallelization & distribution.
- Fault tolerance.
- I/O scheduling.
- Monitoring & status updates.
- Typical MapReduce Architecture:
- Racks of CPUs with 16-64 nodes.
- Nodes connected by gigabit Ethernet within a rack.
- Racks connected by a switch (2-10 Gbps).
- Cluster Computing: Collection of compute nodes on racks connected by switches.
- Inter-rack bandwidth is generally faster than intra-rack bandwidth.
- Dealing with failure in DFS:
- Redundant file storage.
- Computations divided into restartable tasks.
- Constant machine pinging.
- MapReduce is a programming model borrowed from Lisp.
- Paradigm of Map/Reduce:
- Records are broken into segments
- Map: extract something of interest from each segment
- Group and sort intermediate results from each segment
- Reduce: aggregate intermediate results
- Generate final output
- To use MapReduce, you must write Map and Reduce functions.
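The paradigm above can be illustrated with a word count, the usual introductory example. This is a minimal single-process sketch, not Hadoop code: the names `map_fn`, `group_by_key`, `reduce_fn`, and `map_reduce` are hypothetical, and a real framework would run the map and reduce calls on different workers.

```python
from collections import defaultdict

def map_fn(segment):
    """Map: emit a (word, 1) pair for every word in one segment."""
    return [(word.lower(), 1) for word in segment.split()]

def group_by_key(pairs):
    """Group and sort: collect all values for each key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_fn(key, values):
    """Reduce: combine the values for one key (addition is associative and commutative)."""
    return key, sum(values)

def map_reduce(segments):
    """Run map on every segment, group the intermediate pairs, then reduce each group."""
    intermediate = [pair for seg in segments for pair in map_fn(seg)]
    return [reduce_fn(key, values) for key, values in group_by_key(intermediate)]

segments = ["the cat sat", "the dog sat", "the end"]
result = map_reduce(segments)
print(result)
# [('cat', 1), ('dog', 1), ('end', 1), ('sat', 2), ('the', 3)]
```

Only `map_fn` and `reduce_fn` are problem-specific; the grouping step is the part the framework normally provides.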
- MapReduce computation process:
- Map tasks process chunks into key-value pairs.
- Master controller collects pairs, sorts by key, divides among Reduce tasks
- Reduce tasks combine values for each key.
- Master controller in MapReduce:
- Knows the number of Reduce tasks (r).
- Picks a hash function, assigns keys to buckets (0 to r-1).
- Merges files from Map tasks for each Reduce task, feeds to process.
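The bucket assignment above can be sketched as follows. The notes do not say which hash function the Master picks, so using `zlib.crc32` is an assumption made here only because it is stable across runs (Python's built-in `hash` for strings is randomized per process). The function names are hypothetical.

```python
import zlib

def bucket_for(key, r):
    """Assign a string key to one of r buckets, numbered 0 to r-1."""
    return zlib.crc32(key.encode("utf-8")) % r

def partition(pairs, r):
    """Split key-value pairs into r lists, one per Reduce task."""
    buckets = [[] for _ in range(r)]
    for key, value in pairs:
        buckets[bucket_for(key, r)].append((key, value))
    return buckets

pairs = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1), ("the", 1)]
buckets = partition(pairs, r=3)
for i, bucket in enumerate(buckets):
    print(i, bucket)
```

Because the bucket depends only on the key, every pair with the same key reaches the same Reduce task, which is what lets that task combine all values for the key.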
- Reduce function:
- Takes key and list of values, combines them, generally associative and commutative.
- The Reduce function output is a sequence of key-value pairs consisting of each input key k paired with the combined value.
- Outputs from all Reduce tasks are merged into a single file.
- MapReduce Process:
- User program forks a Master controller and Worker processes.
- Worker handles Map/Reduce tasks.
- States of a map or reduce task: idle, executing, completed.
- Map task puts files for reduce task on local disk of Worker; Master knows locations.
- Fault tolerance strategy of MapReduce:
- Task crashes: retry on another node.
- Repeated failures: fail job or ignore input block.
- Node crashes: relaunch tasks on other nodes.
- Slow task (straggler): launch second copy.
- Coping with node failure in MapReduce:
- Master fails: restart entire job.
- Map worker fails: Master sets status to idle, re-schedules, informs Reduce tasks.
- Reduce worker fails: Master sets status to idle, re-schedules.
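A rough sketch of how a Master might track task states and react to a failed worker. The `Task` class and `handle_worker_failure` function are hypothetical. The sketch assumes, as in the original MapReduce design, that completed Map output is lost with the worker's local disk while completed Reduce output is already stored safely elsewhere.

```python
from dataclasses import dataclass
from typing import Optional

IDLE, EXECUTING, COMPLETED = "idle", "executing", "completed"

@dataclass
class Task:
    task_id: str
    kind: str                      # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None   # worker currently or last assigned

def handle_worker_failure(tasks, failed_worker):
    """Reset tasks that must be redone after a worker dies and return them."""
    rescheduled = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        # Map output sits on the failed worker's local disk, so even
        # completed Map tasks are redone. Reduce tasks are redone only
        # if they were still executing (assumption stated above).
        if task.kind == "map" or task.state == EXECUTING:
            task.state, task.worker = IDLE, None
            rescheduled.append(task)
    return rescheduled

tasks = [
    Task("m1", "map", COMPLETED, "w1"),
    Task("m2", "map", EXECUTING, "w2"),
    Task("r1", "reduce", COMPLETED, "w1"),
    Task("r2", "reduce", EXECUTING, "w1"),
]
redo = handle_worker_failure(tasks, "w1")
print([t.task_id for t in redo])
# ['m1', 'r2']
```

After rescheduling a Map task, the Master would also tell the Reduce tasks where the new Map output lives; that notification step is omitted here.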
- Example uses of MapReduce: distributed grep, distributed sort, web link-graph reversal, web access log stats, inverted index construction, document clustering, statistical machine translation
Search Engine Advertising
- Earliest form of online ads: Banner Advertising
- Types of online advertising: Banner Advertising, Pay-per-click Advertising, Website Advertising, Affiliate Marketing, Social Media Marketing
- Search Engine Advertising Revenue:
- Google earned $100 billion in 2017
- Yahoo earned $4.6 billion in 2017
- Bing earned $1.8 billion in 2017
- Organic search results produced by algorithm, no guarantees of high ranking.
- Search engine optimizers (SEO) focus on developing and refining a company’s online presence.
- Search engine optimization involves:
- Making pages show up higher in search engine’s organic results
- Optimizing content to target certain keyword phrases
- Developing web page content that responds to each seeker’s interests
- Many search engines use pay-per-click (PPC) model for advertising.
- AdWords: Google's program for pay-per-click ads with keyword auction.
- Mapping keyword phrases on website:
- Use Key Phrases in the content on your page
- Develop meta data with Key Phrases
- Name directories, files and images with the same key words or phrases
- Keyword matching options in AdWords: Broad Match, Exact Match, Phrase Match, Negative Keyword Match
- Default keyword matching in AdWords: Broad matching.
- Broad match features:
- Results will show for expanded matches including synonyms and plurals, related terms and variations even if you don’t include these terms in your keyword list.
- Disadvantage of broad match:
- Often less targeted than exact or phrase matches.
- Exact match rule in AdWords:
- The search query must exactly match your keyword, but Google allows rewording and reordering, ignoring function words, conjunctions, articles and other words that don’t impact the intent of the query
- Phrase Match rule in AdWords:
- Your ad appears when users search on the exact phrase, and when their search contains additional terms, as long as the keyword phrase is in exactly the same order.
- Negative keyword rule in AdWords:
- Negative keywords allow you to eliminate searches that you know are not related to your message.
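The phrase, exact, and negative rules above can be approximated with simple token comparisons. This is a simplified sketch and not Google's matching logic: the `FUNCTION_WORDS` set is an illustrative assumption, negative keywords are treated as single words, and broad match is left out because it depends on synonym and variation data.

```python
# Illustrative list only; the real set of ignored words is not given in these notes.
FUNCTION_WORDS = {"a", "an", "the", "in", "on", "for", "to", "of", "and", "or"}

def tokens(text):
    return text.lower().split()

def phrase_match(keyword, query):
    """True if the keyword's words appear in the query contiguously and in order."""
    kw, q = tokens(keyword), tokens(query)
    n = len(kw)
    return any(q[i:i + n] == kw for i in range(len(q) - n + 1))

def exact_match(keyword, query):
    """True if both have the same words once function words and word order are ignored."""
    def strip(words):
        return sorted(w for w in words if w not in FUNCTION_WORDS)
    return strip(tokens(keyword)) == strip(tokens(query))

def blocked_by_negative(negatives, query):
    """True if the query contains any negative keyword."""
    q = set(tokens(query))
    return any(neg.lower() in q for neg in negatives)

print(phrase_match("tennis shoes", "red tennis shoes sale"))  # True
print(phrase_match("tennis shoes", "shoes for tennis"))       # False
print(exact_match("tennis shoes", "shoes for tennis"))        # True
print(blocked_by_negative(["free"], "free tennis shoes"))     # True
```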
- Ad Server Functionality:
- Basic: Uploading creatives, maintaining business rules, targeting ads, optimizing appearance.
- Advanced: Frequency capping, sequencing, excluding competitors, roadblocks, behavioral targeting.
- Advertisers can define display criteria in Google AdWords, such as a display window (9:00 AM-5:00 PM EST) and frequency (once per day).
- Bidders specify:
- Search terms that trigger bid
- Bid amount for each term
- Overall ad budget, bid limits
- How Google ranks bidders in AdWords:
- Estimates click-through rate.
- Ads with low click-through rates are not displayed.
- Ranks by multiplying click-through-rate and bid amount.
- Google is paid only when an ad gets clicked. The price it receives is the smallest price the bidder could have bid to get its ranking.
- Position of advertising in AdWords determined by Ad Rank.
- Ad rank = Bid * Click Probability
- Determination of actual cost per click (CPC) of ads in AdWords:
- Determined by ad rank of the next highest ad below you
- Exception: if you are the only bidder or have the lowest bid, you pay your maximum bid
- CPC = (Ad Rank of the advertiser below you / your Quality Score) + $0.01
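A small worked example of the ranking and pricing rules above. The advertisers, bids, and Quality Scores are made-up numbers, and the function names are hypothetical. The sketch uses the Quality Score as the click-probability factor in Ad Rank, which is an assumption made to connect the two formulas in these notes.

```python
def rank_ads(ads):
    """ads: list of (name, max_bid, quality_score). Sort by Ad Rank, highest first."""
    return sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)

def price_auction(ads):
    """Return (name, ad_rank, cpc) for each ad, in display order."""
    ranked = rank_ads(ads)
    results = []
    for i, (name, bid, quality) in enumerate(ranked):
        if i + 1 < len(ranked):
            _, next_bid, next_quality = ranked[i + 1]
            # CPC = Ad Rank of the advertiser below / own Quality Score + $0.01
            cpc = (next_bid * next_quality) / quality + 0.01
        else:
            # Exception: the only or lowest-ranked bidder pays the maximum bid.
            cpc = bid
        results.append((name, bid * quality, round(cpc, 2)))
    return results

ads = [("A", 2.00, 10), ("B", 4.00, 4), ("C", 6.00, 2), ("D", 8.00, 1)]
results = price_auction(ads)
for row in results:
    print(row)
# ('A', 20.0, 1.61)
# ('B', 16.0, 3.01)
# ('C', 12.0, 4.01)
# ('D', 8.0, 8.0)
```

Advertiser A has the lowest bid but the highest Quality Score, so it takes the top position and pays the least per click.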
- Click-through rates influence bidders in AdWords: advertisers with low Quality Scores are penalized, while those with high Quality Scores get higher ad ranks and lower CPC.
- Average cost per click on AdWords: roughly $2.32 on the search network and $0.58 on the display network
- Probability of clicking through may depend on: historical click performance of the ad, landing page quality, relevance to the user, user click-through rates
- Landing pages are the pages that appear when a user clicks on your ad
- AdSense from Google is a service for placing Google ads on web pages.
- Kinds of ads Google offers in AdSense:
- Cost per thousand displays (CPM), cost per engagement
- Google will assign