Platform: Integrated and networked hardware, software, and Internet infrastructure.
Cloud Computing platform functions:
Provides hardware, software, and networking services via the Internet.
Hides complexity of underlying infrastructure with simple interface/API.
Offers on-demand, 24/7 services.
Pay-as-you-go, elastic services.
Hardware and software services available to everyone.
Benefits of cloud computing:
Saves capital and operational investment with pay-as-you-go model.
Enables companies to be infrastructure-less.
Virtualization: Creation of a virtual version of something (OS, server, etc.).
Advantages of virtual machines:
Run OS where hardware is unavailable.
Easier to create and back up machines.
Software testing with clean OS installs.
Emulate more machines than physically available.
Timeshare lightly loaded systems.
Debug problems (suspend and resume).
Easy migration.
Run legacy systems.
Hypervisor/VMM: Software, firmware, or hardware that creates and runs virtual machines.
Host vs. Guest Machine:
Host: Computer running the hypervisor.
Guest: Each virtual machine.
Hypervisor Vendors: VMware ESX Server, Microsoft Hyper-V, Oracle VM VirtualBox, Xen Project
Virtual Workspace: Abstraction of an execution environment dynamically available to clients.
Full-virtualization vs. Para-virtualization:
Full: Runs an unmodified guest OS.
Para: Modifies the guest OS for performance gains, requires modified kernels.
Top cloud-computing vendors: Microsoft, Amazon, IBM, Oracle, Google Cloud, Alibaba
Map/Reduce
MapReduce: Methodology for exploiting parallelism in computing clouds.
Developed by Google to process large amounts of data quickly.
Motivation: Need to manage immense data quickly, exploit parallelism in regular data.
Examples beyond search: Dish network (click data), Tesla (car usage).
Search engine uses: index building, article clustering, spam detection
Apache Hadoop: Open-source implementation of MapReduce in Java for distributed storage and processing on commodity hardware.
Hadoop parts: Hadoop Distributed File System (HDFS) and MapReduce.
GFS, HDFS, and CloudStore:
Common: Distributed files, rarely updated, often read/appended, divided into chunks and replicated.
GFS: Google File System.
HDFS: written in Java.
CloudStore: C++ implementation of GFS.
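The "divided into chunks and replicated" idea can be illustrated with a small sketch. The chunk size, the replication factor of 3, and the round-robin placement are assumptions made for illustration only; real GFS/HDFS placement policies are more involved (for example, they take racks into account).

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each chunk to `replication` distinct nodes, round-robin (hypothetical policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for chunk_id in range(num_chunks):
        # Consecutive offsets modulo len(nodes) are distinct, so no node repeats.
        placement[chunk_id] = [nodes[(chunk_id + k) % len(nodes)] for k in range(replication)]
    return placement


chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
placement = place_replicas(len(chunks), ["n1", "n2", "n3", "n4"])
```

With this placement, losing any single node still leaves two copies of every chunk.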
Parallelization issues:
Assigning work units to worker threads.
Handling more work units than threads.
Aggregating/combining results.
Knowing when all workers have finished.
Dividing work into separate tasks.
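The issues listed above show up even on a single machine. The following is a minimal sketch using Python's standard thread pool; the word-counting work unit and the function names are illustrative assumptions, not part of the notes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: str) -> int:
    """One work unit: count the words in one chunk of text."""
    return len(chunk.split())


def parallel_word_total(chunks: list[str], num_workers: int = 2) -> int:
    # The pool queues pending work units, so there can be more units than threads.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() hands each chunk to a free worker and yields results in input order.
        partial_counts = list(pool.map(count_words, chunks))
    # Leaving the with-block waits until every worker has finished.
    return sum(partial_counts)  # aggregate the partial results


chunks = ["the quick brown fox", "jumps over", "the lazy dog", "again and again"]
total = parallel_word_total(chunks)
```

Here four work units are spread over two threads; the pool handles assignment, queuing, and completion tracking.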
How MapReduce solves parallelization issues:
Automatic parallelization & distribution.
Fault tolerance.
I/O scheduling.
Monitoring & status updates.
Typical MapReduce Architecture:
Racks of CPUs with 16-64 nodes.
Nodes connected by gigabit Ethernet within a rack.
Racks connected by a switch (2-10 Gbps).
Cluster Computing: Collection of compute nodes on racks connected by switches.
Inter-rack links generally have higher raw bandwidth than the intra-rack Ethernet, but that bandwidth is shared by many communicating node pairs.
Dealing with failure in DFS:
Redundant file storage.
Computations divided into restartable tasks.
Constant machine pinging.
MapReduce is a programming model borrowed from functional languages such as Lisp, which provide map and reduce primitives.
Paradigm of Map/Reduce:
Records are broken into segments
Map: extract something of interest from each segment
Group and sort intermediate results from each segment
Reduce: aggregate intermediate results
Generate final output
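The five steps above can be sketched on one machine with word count, the usual introductory example. This is a single-process illustration with hypothetical function names, not a distributed implementation.

```python
from collections import defaultdict


def map_words(segment: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one segment."""
    return [(word.lower(), 1) for word in segment.split()]


def group_by_key(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group and sort: collect all values for each key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate the values of one key."""
    return key, sum(values)


def word_count(segments: list[str]) -> dict[str, int]:
    intermediate = [pair for segment in segments for pair in map_words(segment)]
    grouped = group_by_key(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


result = word_count(["a b a", "b c"])
```

Only `map_words` and `reduce_counts` are problem-specific; the grouping step is the part a framework would supply.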
To use MapReduce, you must write Map and Reduce functions.
MapReduce computation process:
Map tasks process chunks into key-value pairs.
Master controller collects pairs, sorts by key, divides among Reduce tasks
Reduce tasks combine values for each key.
Master controller in MapReduce:
Knows the number of Reduce tasks (r).
Picks a hash function, assigns keys to buckets (0 to r-1).
Merges files from Map tasks for each Reduce task, feeds to process.
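The bucket assignment can be sketched as follows. The choice of CRC32 as the hash function is an assumption for illustration (the notes do not name one); it is used here because it gives the same result on every run, unlike Python's built-in hash() for strings.

```python
import zlib


def bucket_for(key: str, r: int) -> int:
    """Map a key to one of r buckets, numbered 0 to r-1."""
    return zlib.crc32(key.encode("utf-8")) % r


def partition(pairs: list[tuple[str, int]], r: int) -> dict[int, list[tuple[str, int]]]:
    """Split Map output into r lists, one per Reduce task."""
    buckets = {i: [] for i in range(r)}
    for key, value in pairs:
        buckets[bucket_for(key, r)].append((key, value))
    return buckets


pairs = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
buckets = partition(pairs, r=3)
```

Because the bucket depends only on the key, every pair with the same key reaches the same Reduce task.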
Reduce function:
Takes key and list of values, combines them, generally associative and commutative.
The Reduce function output is a sequence of key-value pairs consisting of each input key k paired with the combined value.
Outputs from all Reduce tasks are merged into a single file.
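Why associativity and commutativity matter: the values for a key can then be combined in any grouping and order without changing the result. A small sketch using addition as the combining operation (the function names are hypothetical):

```python
from functools import reduce
from operator import add


def reduce_sum(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: pair the input key with the combined value."""
    return key, reduce(add, values, 0)


def combine_in_two_parts(key: str, values: list[int], split_at: int) -> tuple[str, int]:
    """Combine two partial results, deliberately in swapped order."""
    left = reduce(add, values[:split_at], 0)
    right = reduce(add, values[split_at:], 0)
    return key, add(right, left)


whole = reduce_sum("k", [1, 2, 3, 4])
parts = combine_in_two_parts("k", [1, 2, 3, 4], split_at=1)
```

Both calls give the same pair, which is what lets partial results be combined wherever and whenever they become available.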
MapReduce Process:
User program forks a Master controller and Worker processes.
Worker handles Map/Reduce tasks.
States of a map or reduce task: idle, executing, completed.
Map task puts files for reduce task on local disk of Worker; Master knows locations.
Fault tolerance strategy of MapReduce:
Task crashes: retry on another node.
Repeated failures: fail job or ignore input block.
Node crashes: relaunch tasks on other nodes.
Slow task (straggler): launch second copy.
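The first two rules (retry elsewhere, then fail the job) can be sketched like this. The attempt limit of 3 and the use of RuntimeError to signal a crash are assumptions for illustration.

```python
def run_with_retries(task, nodes: list[str], max_attempts: int = 3):
    """Try the task on successive nodes; fail the job after repeated crashes."""
    last_error = None
    for node in nodes[:max_attempts]:
        try:
            return task(node)
        except RuntimeError as err:
            last_error = err  # task crashed: retry on another node
    raise RuntimeError("job failed: task crashed on every attempt") from last_error


def flaky_task(node: str) -> str:
    """Hypothetical task that crashes on node n1 only."""
    if node == "n1":
        raise RuntimeError("crash on n1")
    return f"done on {node}"


outcome = run_with_retries(flaky_task, ["n1", "n2", "n3"])
```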
Coping with node failure in MapReduce:
Master fails: restart entire job.
Map worker fails: Master sets status to idle, re-schedules, informs Reduce tasks.
Reduce worker fails: Master sets status to idle, re-schedules.
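A minimal sketch of the Master's bookkeeping for worker failures. One detail goes beyond the notes and is labeled as an assumption: completed Map tasks on the failed worker are also reset, since their output sits on that worker's local disk, while completed Reduce tasks are left alone. Class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # idle, executing, or completed
    worker: Optional[str] = None


class Master:
    def __init__(self):
        self.tasks = {}

    def add_task(self, task_id: str, kind: str) -> None:
        self.tasks[task_id] = Task(kind)

    def assign(self, task_id: str, worker: str) -> None:
        task = self.tasks[task_id]
        task.state, task.worker = "executing", worker

    def complete(self, task_id: str) -> None:
        self.tasks[task_id].state = "completed"

    def worker_failed(self, worker: str) -> list[str]:
        """Set affected tasks back to idle; return the ids to re-schedule."""
        rescheduled = []
        for task_id, task in self.tasks.items():
            if task.worker != worker:
                continue
            # Assumption: map output on the failed worker's local disk is lost.
            lost_output = task.kind == "map" and task.state == "completed"
            if task.state == "executing" or lost_output:
                task.state, task.worker = "idle", None
                rescheduled.append(task_id)
        return rescheduled


master = Master()
for task_id, kind in [("m1", "map"), ("m2", "map"), ("r1", "reduce"), ("r2", "reduce")]:
    master.add_task(task_id, kind)
master.assign("m1", "w1"); master.complete("m1")
master.assign("m2", "w2")
master.assign("r1", "w1"); master.complete("r1")
master.assign("r2", "w1")
to_rerun = master.worker_failed("w1")
```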
Example uses of MapReduce: distributed grep, distributed sort, web link-graph reversal, web access log stats, inverted index construction, document clustering, statistical machine translation
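As one of these uses, inverted index construction can be sketched in Map/Reduce style. This runs in a single process and uses hypothetical function names; the document ids and texts are made up for the example.

```python
from collections import defaultdict


def map_doc(doc_id: str, text: str) -> list[tuple[str, str]]:
    """Map: emit (word, doc_id) once for each distinct word in a document."""
    return [(word, doc_id) for word in set(text.lower().split())]


def reduce_postings(word: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Reduce: turn the document ids for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, emitted_id in map_doc(doc_id, text):
            grouped[word].append(emitted_id)
    return dict(reduce_postings(word, ids) for word, ids in sorted(grouped.items()))


index = inverted_index({"d1": "cloud computing", "d2": "cloud storage"})
```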
Search Engine Advertising
Earliest form of online ads: Banner Advertising
Types of online advertising: Banner Advertising, Pay-per-click Advertising, Website Advertising, Affiliate Marketing, Social Media Marketing
Search Engine Advertising Revenue:
Google earned $100 billion in 2017
Yahoo earned $4.6 billion in 2017
Bing earned $1.8 billion in 2017
Organic search results produced by algorithm, no guarantees of high ranking.
Search engine optimizers (SEOs) focus on developing and refining a company’s online presence.
Search engine optimization involves:
Making pages show up higher in search engine’s organic results
Optimizing content to target certain keyword phrases
Developing web page content that responds to each seeker’s interests
Many search engines use pay-per-click (PPC) model for advertising.
AdWords: Google's program for pay-per-click ads with keyword auction.
Mapping keyword phrases on website:
Use Key Phrases in the content on your page
Develop meta data with Key Phrases
Name directories, files and images with the same key words or phrases
Keyword matching options in AdWords: Broad Match, Exact Match, Phrase Match, Negative Keyword Match
Default keyword matching in AdWords: Broad Match.
Broad match features:
Ads show for expanded matches, including synonyms, plurals, related terms, and variations, even if you don’t include these terms in your keyword list.
Disadvantage of broad match:
Often less targeted than exact or phrase matches.
Exact match rule in AdWords:
The search query must exactly match your keyword, but Google allows rewording and reordering, and ignores function words, conjunctions, articles, and other words that don’t impact the intent of the query.
Phrase Match rule in AdWords:
Your ad appears when users search on the exact phrase, and when their search contains additional terms, as long as the keyword phrase is in exactly the same order.
Negative keyword rule in AdWords:
Negative keywords allow you to eliminate searches that you know are not related to your message.
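The exact, phrase, and negative rules can be approximated with simple token comparisons. This is a deliberately simplified sketch, not Google's matching logic: exact match here means strict token equality (it does not model the rewording and reordering Google allows), and broad match is left out because it needs synonym data. All function names are hypothetical.

```python
def tokens(text: str) -> list[str]:
    return text.lower().split()


def exact_match(keyword: str, query: str) -> bool:
    """Strict version: the query has the same words in the same order."""
    return tokens(keyword) == tokens(query)


def phrase_match(keyword: str, query: str) -> bool:
    """The keyword's words appear in the query, contiguous and in order."""
    kw, q = tokens(keyword), tokens(query)
    n = len(kw)
    return n > 0 and any(q[i:i + n] == kw for i in range(len(q) - n + 1))


def blocked_by_negative(negatives: list[str], query: str) -> bool:
    """A query containing any negative keyword never triggers the ad."""
    query_words = set(tokens(query))
    return any(negative.lower() in query_words for negative in negatives)


def ad_is_eligible(keyword: str, query: str, match_type: str, negatives=()) -> bool:
    if blocked_by_negative(list(negatives), query):
        return False
    if match_type == "exact":
        return exact_match(keyword, query)
    if match_type == "phrase":
        return phrase_match(keyword, query)
    raise ValueError(f"unsupported match type: {match_type}")
```

For example, the keyword "tennis shoes" phrase-matches the query "red tennis shoes sale" but not "shoes tennis", and a negative keyword "free" blocks "free tennis shoes".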
Ad Server Functionality:
Basic: Uploading creatives, maintaining business rules, targeting ads, optimizing appearance.
Advanced: Frequency capping, sequencing, excluding competitors, roadblocks, behavioral targeting.
Advertisers can define display criteria in Google AdWords, such as a time window (9:00 AM-5:00 PM EST) and a frequency cap (once per day).
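A display rule of this kind (a time window plus a frequency cap) can be sketched as below. The class and its methods are hypothetical, and time zones are ignored: timestamps are assumed to be already expressed in the advertiser's zone.

```python
from collections import Counter
from datetime import datetime, time


class DisplayRule:
    def __init__(self, start: time = time(9, 0), end: time = time(17, 0), max_per_day: int = 1):
        self.start, self.end, self.max_per_day = start, end, max_per_day
        self.shown = Counter()  # (user_id, date) -> impressions so far

    def may_show(self, user_id: str, now: datetime) -> bool:
        in_window = self.start <= now.time() < self.end
        under_cap = self.shown[(user_id, now.date())] < self.max_per_day
        return in_window and under_cap

    def record(self, user_id: str, now: datetime) -> None:
        """Call after the ad has actually been displayed."""
        self.shown[(user_id, now.date())] += 1


rule = DisplayRule()
morning = datetime(2024, 1, 1, 10, 0)
first_view_allowed = rule.may_show("u1", morning)
rule.record("u1", morning)
second_view_allowed = rule.may_show("u1", datetime(2024, 1, 1, 11, 0))
```

The first request is allowed and the second, on the same day, is refused by the cap.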
Bidders specify:
Search terms that trigger bid
Bid amount for each term
Overall ad budget, bid limits
How Google ranks bidders in AdWords:
Estimates click-through rate.
Ads with low click-through rates are not displayed.
Ranks by multiplying click-through-rate and bid amount.
Google is paid only when an ad gets clicked. The price it receives is the smallest price the bidder could have bid to keep its ranking.
Position of advertising in AdWords determined by Ad Rank.
Ad rank = Bid * Click Probability
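Ranking by Ad Rank can be sketched as a filter followed by a sort. The minimum click-through threshold of 0.01 and the sample ads are made-up values for illustration; the notes say only that low click-through ads are not displayed.

```python
def rank_ads(ads: list[dict], min_ctr: float = 0.01) -> list[dict]:
    """Drop ads with a low estimated click probability, then sort by bid * ctr."""
    eligible = [ad for ad in ads if ad["ctr"] >= min_ctr]
    return sorted(eligible, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)


ads = [
    {"name": "A", "bid": 2.00, "ctr": 0.05},   # ad rank 0.10
    {"name": "B", "bid": 4.00, "ctr": 0.02},   # ad rank 0.08
    {"name": "C", "bid": 10.00, "ctr": 0.001}, # below the threshold, not shown
]
ranking = [ad["name"] for ad in rank_ads(ads)]
```

Ad A outranks B despite bidding half as much, because its estimated click probability is higher.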
Determination of actual cost per click (CPC) of ads in AdWords:
Determined by ad rank of the next highest ad below you
Exception: if you're the only bidder or have the lowest bid, you pay your maximum bid.
CPC = (Ad Rank of the advertiser below you / your Quality Score) + $0.01
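A worked example of the CPC rule, as a sketch. Two assumptions: Ad Rank is computed as maximum bid times Quality Score (the CPC formula uses Quality Score where the Ad Rank formula uses click probability), and "lowest bid" is read as the lowest-ranked advertiser. The bidders and their numbers are made up.

```python
def price_auction(bidders: list[tuple[str, float, float]]) -> dict[str, float]:
    """bidders: (name, max_bid, quality_score). Returns the CPC each one pays."""
    ranked = sorted(bidders, key=lambda b: b[1] * b[2], reverse=True)
    prices = {}
    for position, (name, max_bid, quality) in enumerate(ranked):
        if position == len(ranked) - 1:
            # Only bidder or lowest-ranked bidder: pays the maximum bid.
            prices[name] = max_bid
        else:
            _, next_bid, next_quality = ranked[position + 1]
            next_ad_rank = next_bid * next_quality
            prices[name] = round(next_ad_rank / quality + 0.01, 2)
    return prices


# Ad ranks: A = 2.00 * 10 = 20, B = 4.00 * 4 = 16, C = 6.00 * 2 = 12
prices = price_auction([("A", 2.00, 10), ("B", 4.00, 4), ("C", 6.00, 2)])
```

Here A pays 16 / 10 + 0.01 = 1.61, B pays 12 / 4 + 0.01 = 3.01, and C pays its maximum bid of 6.00. A has the lowest bid but the highest Quality Score, so it gets the top position at the lowest price.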
Click-through rates influence bidders in AdWords: advertisers with low Quality Scores are penalized, while those with high Quality Scores get higher ad ranks and lower CPC.
Average cost per click on AdWords: roughly $2.32 on the search network and $0.58 on the display network
Probability of clicking through may depend on: historical click performance of the ad, landing page quality, relevance to the user, user click-through rates
Landing pages are the pages that appear when a user clicks on your ad
AdSense from Google is a service for placing Google ads on web pages.
Kinds of ads Google offers in AdSense:
Cost per Thousand displays (CPM), Cost per Engagement,