
Exam Notes

Google Cloud and Assignment 3

  • Trends putting pressure on conventional computing centers:
    • Explosive growth in applications.
    • Extreme scale content generation.
    • Extraordinary rate of digital content consumption.
    • Exponential growth in compute capabilities.
    • Very short cycle of obsolescence in technologies.
    • Newer architectures.
  • Cloud Computing Definition:
    • Using a network of remote servers to store, manage, and process data.
    • Internet-based computing providing shared resources on-demand, like the electricity grid.
  • Other names for cloud computing: On-demand computing, utility computing, ubiquitous computing, autonomic computing, platform computing, edge computing, elastic computing, grid computing
  • Platform: Integrated and networked hardware, software, and Internet infrastructure.
  • Cloud Computing platform functions:
    • Provides hardware, software, and networking services via the Internet.
    • Hides complexity of underlying infrastructure with simple interface/API.
    • Offers on-demand, 24/7 services.
    • Pay-as-you-go, elastic services.
    • Hardware and software services available to everyone.
  • Benefits of cloud computing:
    • Saves capital and operational investment with pay-as-you-go model.
    • Enables companies to be infrastructure-less.
  • Virtualization: Creation of a virtual version of something (OS, server, etc.).
  • Advantages of virtual machines:
    • Run OS where hardware is unavailable.
    • Easier to create and backup machines.
    • Software testing with clean OS installs.
    • Emulate more machines than physically available.
    • Timeshare lightly loaded systems.
    • Debug problems (suspend and resume).
    • Easy migration.
    • Run legacy systems.
  • Hypervisor/VMM: Software, firmware, or hardware that creates and runs virtual machines.
  • Host vs. Guest Machine:
    • Host: Computer running the hypervisor.
    • Guest: Each virtual machine.
  • Hypervisor vendors: VMware ESX Server, Microsoft Windows Hyper-V, Oracle VM VirtualBox, Xen Project
  • Virtual Workspace: Abstraction of an execution environment dynamically available to clients.
  • Full-virtualization vs. Para-virtualization:
    • Full: Runs an unmodified guest OS.
    • Para: Modifies the guest OS for performance gains, requires modified kernels.
  • Top cloud-computing vendors: Microsoft, Amazon, IBM, Oracle, Google Cloud, Alibaba

Map/Reduce

  • MapReduce: Methodology for exploiting parallelism in computing clouds.
  • Developed by Google to process large amounts of data quickly.
  • Motivation: Need to manage immense data quickly, exploit parallelism in regular data.
  • Examples beyond search: Dish Network (click data), Tesla (car usage).
  • Search engine uses: index building, article clustering, spam detection
  • Apache Hadoop: Open-source implementation of MapReduce in Java for distributed storage and processing on commodity hardware.
  • Hadoop parts: Hadoop Distributed File System (HDFS) and MapReduce.
  • GFS, HDFS, and CloudStore:
    • Common: Distributed files, rarely updated, often read/appended, divided into chunks and replicated.
    • GFS: Google File System.
    • HDFS: written in Java.
    • CloudStore: C++ implementation of GFS.
  • Parallelization issues:
    • Assigning work units to worker threads.
    • Handling more work units than threads.
    • Aggregating/combining results.
    • Knowing when all workers have finished.
    • Dividing work into separate tasks.
  • How MapReduce solves parallelization issues:
    • Automatic parallelization & distribution.
    • Fault tolerance.
    • I/O scheduling.
    • Monitoring & status updates.
  • Typical MapReduce Architecture:
    • Racks of CPUs with 16-64 nodes.
    • Nodes connected by gigabit Ethernet within a rack.
    • Racks connected by a switch (2-10 Gbps).
  • Cluster Computing: Collection of compute nodes on racks connected by switches.
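  • Inter-rack bandwidth (the switch link between racks) is generally somewhat higher than intra-rack bandwidth, because that link carries traffic for many pairs of nodes.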
  • Dealing with failure in DFS:
    • Redundant file storage.
    • Computations divided into restartable tasks.
    • Constant machine pinging.
  • MapReduce is a programming model borrowed from Lisp.
  • Paradigm of Map/Reduce:
    • Records are broken into segments
    • Map: extract something of interest from each segment
    • Group and sort intermediate results from each segment
    • Reduce: aggregate intermediate results
    • Generate final output
  • To use MapReduce, you must write Map and Reduce functions.
  • MapReduce computation process:
    • Map tasks process chunks into key-value pairs.
    • Master controller collects pairs, sorts them by key, and divides them among Reduce tasks.
    • Reduce tasks combine values for each key.
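The computation process above can be sketched as a single-process word count. This is only an illustration, not how Hadoop or Google's system is implemented: the function names (map_fn, group_by_key, reduce_fn, word_count) and the sample chunks are made up for the example, and the grouping step stands in for the Master controller's collect-and-sort work.

```python
from collections import defaultdict

def map_fn(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def group_by_key(pairs):
    """Group step: collect all values per key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_fn(key, values):
    """Reduce: combine the values for one key (sum is associative and commutative)."""
    return key, sum(values)

def word_count(chunks):
    # Run every Map task, then group, then run one Reduce call per key.
    intermediate = [pair for chunk in chunks for pair in map_fn(chunk)]
    grouped = group_by_key(intermediate)
    return [reduce_fn(key, values) for key, values in grouped.items()]

# Hypothetical input: three small chunks of text.
chunks = ["the cat sat", "the dog sat", "the end"]
result = word_count(chunks)
print(result)
# [('cat', 1), ('dog', 1), ('end', 1), ('sat', 2), ('the', 3)]
```

In a real cluster each map_fn call would run on a different Worker; here they run one after another so the data flow is easy to follow.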
  • Master controller in MapReduce:
    • Knows the number of Reduce tasks (r).
    • Picks a hash function, assigns keys to buckets (0 to r-1).
    • Merges files from Map tasks for each Reduce task, feeds to process.
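A minimal sketch of the bucket assignment described above, assuming CRC32 as the hash function (the notes do not say which hash the Master picks). The names bucket_for and partition are hypothetical. The point is that every occurrence of a key lands in the same bucket, so a single Reduce task sees all values for that key.

```python
import zlib

def bucket_for(key, r):
    """Assign a key to one of r Reduce tasks, numbered 0 to r-1.

    CRC32 is an assumption made for this sketch; it is used because it is
    deterministic across runs, unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % r

def partition(pairs, r):
    """Split one Map task's output into r lists, one per Reduce task."""
    buckets = [[] for _ in range(r)]
    for key, value in pairs:
        buckets[bucket_for(key, r)].append((key, value))
    return buckets

# Hypothetical Map output and r = 3 Reduce tasks.
pairs = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1)]
buckets = partition(pairs, r=3)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```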
  • Reduce function:
    • Takes key and list of values, combines them, generally associative and commutative.
    • The Reduce function output is a sequence of key-value pairs consisting of each input key k paired with the combined value.
    • Outputs from all Reduce tasks are merged into a single file.
  • MapReduce Process:
    • User program forks a Master controller and Worker processes.
    • Worker handles Map/Reduce tasks.
  • States of a map or reduce task: idle, executing, completed.
  • Map task puts files for reduce task on local disk of Worker; Master knows locations.
  • Fault tolerance strategy of MapReduce:
    • Task crashes: retry on another node.
    • Repeated failures: fail job or ignore input block.
    • Node crashes: relaunch tasks on other nodes.
    • Slow task (straggler): launch second copy.
  • Coping with node failure in MapReduce:
    • Master fails: restart entire job.
    • Map worker fails: Master sets status to idle, re-schedules, informs Reduce tasks.
    • Reduce worker fails: Master sets status to idle, re-schedules.
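The worker-failure rules above can be sketched as a small bookkeeping routine on the Master. The Task class and handle_worker_failure function are hypothetical names. The sketch also assumes, following the original MapReduce design, that a completed Map task is reset as well (its output sits on the failed Worker's local disk), while a completed Reduce task is left alone.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # "idle", "executing", or "completed"
    worker: Optional[str] = None  # Worker currently or last assigned

def handle_worker_failure(tasks, failed_worker):
    """Reset tasks affected by a failed Worker so they can be re-scheduled.

    Returns the Map tasks that were reset; Reduce tasks would need to be
    informed that the output location of these tasks will change.
    """
    reset_maps = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        if task.kind == "map":
            # Assumption: Map output on the failed Worker's disk is lost,
            # so even a completed Map task goes back to idle.
            task.state, task.worker = "idle", None
            reset_maps.append(task)
        elif task.state == "executing":
            # Only an unfinished Reduce task is reset.
            task.state, task.worker = "idle", None
    return reset_maps

# Hypothetical cluster state before Worker "w1" fails.
tasks = [
    Task("map", "completed", "w1"),
    Task("map", "executing", "w2"),
    Task("reduce", "executing", "w1"),
    Task("reduce", "completed", "w1"),
]
reset_maps = handle_worker_failure(tasks, "w1")
print([(t.kind, t.state) for t in tasks])
```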
  • Example uses of MapReduce: distributed grep, distributed sort, web link-graph reversal, web access log stats, inverted index construction, document clustering, statistical machine translation

Search Engine Advertising

  • Earliest form of online ads: Banner Advertising
  • Types of online advertising: Banner Advertising, Pay-per-click Advertising, Website Advertising, Affiliate Marketing, Social Media Marketing
  • Search Engine Advertising Revenue:
    • Google earned $100 billion in 2017.
    • Yahoo earned $4.6 billion in 2017.
    • Bing earned $1.8 billion in 2017.
  • Organic search results produced by algorithm, no guarantees of high ranking.
  • Search engine optimizers (SEOs) focus on developing and refining a company’s online presence.
  • Search engine optimization involves:
    • Making pages show up higher in search engine’s organic results
    • Optimizing content to target certain keyword phrases
    • Developing web page content that responds to each seeker’s interests
  • Many search engines use pay-per-click (PPC) model for advertising.
  • AdWords: Google's program for pay-per-click ads, sold through a keyword auction.
  • Mapping keyword phrases on website:
    • Use Key Phrases in the content on your page
    • Develop meta data with Key Phrases
    • Name directories, files and images with the same key words or phrases
  • Keyword matching options in AdWords: Broad Match, Exact Match, Phrase Match, Negative Keyword Match
  • Default keyword matching in AdWords: Broad Match.
  • Broad match features:
    • Results will show for expanded matches including synonyms and plurals, related terms and variations even if you don’t include these terms in your keyword list.
  • Disadvantage of broad match:
    • Often less targeted than exact or phrase matches.
  • Exact match rule in AdWords:
    • The search query must exactly match your keyword, but Google allows rewording and reordering, ignoring function words, conjunctions, articles and other words that don’t impact the intent of the query.
  • Phrase match rule in AdWords:
    • Your ad appears when users search on the exact phrase, and when their search contains additional terms, as long as the keyword phrase is in exactly the same order.
  • Negative keyword rule in AdWords:
    • Negative keywords allow you to eliminate searches that you know are not related to your message.
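A rough sketch of the phrase-match and negative-keyword rules above, using plain word comparison. This is not Google's matching logic: the function names are hypothetical, exact match is modeled only in its strict form (the rewording and reordering allowances are left out), negative keywords are assumed to be single words, and broad match is not modeled because it depends on synonyms and related terms.

```python
def tokens(text):
    """Lowercase a string and split it into words."""
    return text.lower().split()

def phrase_match(keyword, query):
    """True if the keyword's words appear in the query contiguously, in order."""
    k, q = tokens(keyword), tokens(query)
    return any(q[i:i + len(k)] == k for i in range(len(q) - len(k) + 1))

def exact_match_strict(keyword, query):
    """Strict exact match only: same words in the same order, ignoring case."""
    return tokens(keyword) == tokens(query)

def blocked_by_negative(negatives, query):
    """True if the query contains any single-word negative keyword."""
    words = set(tokens(query))
    return any(negative.lower() in words for negative in negatives)

# Hypothetical keyword and queries.
print(phrase_match("tennis shoes", "red tennis shoes sale"))   # True
print(phrase_match("tennis shoes", "shoes for tennis"))        # False
print(exact_match_strict("tennis shoes", "Tennis Shoes"))      # True
print(blocked_by_negative(["free"], "free tennis shoes"))      # True
```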
  • Ad Server Functionality:
    • Basic: Uploading creatives, maintaining business rules, targeting ads, optimizing appearance.
    • Advanced: Frequency capping, sequencing, excluding competitors, roadblocks, behavioral targeting.
  • Advertisers can define display criteria in Google AdWords, such as time of display (e.g., 9:00 AM-5:00 PM EST) and frequency (e.g., once per day).
  • Bidders specify:
    • Search terms that trigger bid
    • Bid amount for each term
    • Overall ad budget, bid limits
  • How Google ranks bidders in AdWords:
    • Estimates click-through rate.
    • Ads with low click-through rates are not displayed.
    • Ranks by multiplying click-through-rate and bid amount.
  • Google is paid only when an ad gets clicked. The price it receives is the smallest price the bidder could have bid to get its ranking.
  • Position of advertising in AdWords determined by Ad Rank.
    • Ad rank = Bid * Click Probability
  • Determination of actual cost per click (CPC) of ads in AdWords:
    • Determined by ad rank of the next highest ad below you
    • Exception: if you're the only bidder or have the lowest bid, you pay your maximum bid.
    • CPC = (Ad Rank of the advertiser below you / your Quality Score) + $0.01
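A worked sketch of the Ad Rank and CPC rules above. It assumes the "Click Probability" in the Ad Rank formula and the "Quality Score" in the CPC formula are the same number, which the notes do not state outright. The bidder names, bids, and scores are invented for the arithmetic.

```python
def ad_rank(ad):
    """Ad Rank = bid * quality score (used here as the click probability)."""
    return ad["bid"] * ad["quality"]

def rank_ads(ads):
    """Order ads by Ad Rank, highest first."""
    return sorted(ads, key=ad_rank, reverse=True)

def actual_cpc(ranked):
    """CPC = Ad Rank of the ad below / own quality score + $0.01.

    The last ad (only or lowest bidder) pays its maximum bid.
    """
    prices = {}
    for position, ad in enumerate(ranked):
        if position + 1 < len(ranked):
            below = ranked[position + 1]
            prices[ad["name"]] = round(ad_rank(below) / ad["quality"] + 0.01, 2)
        else:
            prices[ad["name"]] = ad["bid"]
    return prices

# Hypothetical bidders: maximum bid in dollars and a quality score.
ads = [
    {"name": "A", "bid": 2.00, "quality": 10},  # Ad Rank 20
    {"name": "B", "bid": 4.00, "quality": 4},   # Ad Rank 16
    {"name": "C", "bid": 3.00, "quality": 2},   # Ad Rank 6
]
ranked = rank_ads(ads)
prices = actual_cpc(ranked)
print([ad["name"] for ad in ranked])  # ['A', 'B', 'C']
print(prices)                         # {'A': 1.61, 'B': 1.51, 'C': 3.0}
```

Bidder A has the lowest bid but the highest quality score, so it takes the top position and pays 16 / 10 + 0.01 = $1.61 per click, less than B's $1.51 plus the difference in rank would suggest from bids alone.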
  • Click-through rates influence bidders in AdWords: AdWords penalizes advertisers with low Quality Scores, while those with high Quality Scores get higher ad ranks and lower CPC.
  • Average cost per click on AdWords: roughly $2.32 on the search network and $0.58 on the display network.
  • Probability of clicking through may depend on: historical click performance of the ad, landing page quality, relevance to the user, user click-through rates.
  • Landing pages are the pages that appear when a user clicks on your ad.
  • AdSense from Google is a service for placing Google ads on web pages.
  • Kinds of ads Google offers in AdSense:
    • Cost per thousand displays (CPM), cost per engagement
  • Google will assign