Unit V Notes: Cloud Computing

Chapter 8: Using Google Web Services

  • Google is a prototypical cloud computing services company, supporting some of the largest Web sites and services globally.
  • This chapter explores Google's applications and services for users and the developer tools it offers.

Exploring Google Applications

  • Google's core business revolves around search technology, employing automated technology to index the Web.
  • Google offers its search service as a standard search engine for users and a collection of specialized search tools for developers, limited to specific content areas.
  • The application of Google’s searches to content aggregation has led to societal changes and a trend of disintermediation.
  • AdWords and AdSense are commercially important services aimed at advertisers, complemented by tools such as Google Analytics.
  • Google applications are cloud-based, spanning productivity, mobile, media delivery, and social interactions.
  • Google is commercializing some applications as cloud-based enterprise application suites.
  • Google has a large developer program across its applications and services, featuring Google's AJAX APIs, the Google Web Toolkit, and the Google App Engine hosting service.
  • The Google App Engine (GAE) allows developers to create Web applications in Java and Python, deploy them on Google's infrastructure, and scale them.

Google's Impact

  • Google has significantly impacted the computer industry and the Internet.
  • Despite competitors having more Internet users or higher stock valuation, Google remains a technology and thought leader.
  • The "Google Effect" refers to the impact of consumer tracking, targeted advertising, and the pursuit of knowledge domains.
  • The bulk of Google's income comes from targeted advertising sales, based on information gathered from Google accounts or cookies through its AdWords system.
  • In 2009, Google's revenue was $23.6 billion, and it controlled roughly 65 percent of the search market.
  • The company's profitability has enabled a large infrastructure and free cloud-based applications and services, representing Google's Software as a Service portfolio.

Google's Cloud Computing Services

  • Google offers an extensive set of applications to the general public, including Google Docs, Google Health, Picasa, and Google Mail.
  • These applications can be accessed via the "More" and "Even More" links on Google's home page, directing to the More Google Products page.
  • Google's cloud-based applications put pressure on vendors of traditional software like office suites and image-management programs.
  • In April 2008, Google introduced the Google App Engine (GAE), a development platform for hosted Web applications using Google's infrastructure.
  • GAE allows developers to create and deploy Web applications without the need to manage the infrastructure.
  • GAE applications can be written in Java and Python using the Google App Engine Framework, reducing development effort.
  • Google offers a free level of service and assesses charges only when an application exceeds set levels of processor load, storage usage, and network bandwidth.
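
GAE's Python runtime hosts applications written against the standard WSGI interface, so the application code only has to handle requests. A minimal sketch, using only the Python standard library rather than the App Engine SDK; the routes, response text, and the serve_locally helper are hypothetical, and deployment configuration is omitted:

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI application; the hosting runtime calls it once per HTTP request."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", b"Hello from a hosted Web application"
    else:
        status, body = "404 Not Found", b"Not found"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve_locally(port=8080):
    """Local testing only; on a hosted platform the runtime invokes `app` itself."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

The application contains no server or scaling logic; the platform supplies that, which is also what ties such applications to the platform's conventions.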

Google App Engine Limitations

  • Google App Engine applications must conform to Google's infrastructure, which limits the range of application types and makes porting difficult.

Google Application Portfolio

  • Nearly all products in Google's application and service portfolio are cloud computing services, running on Google's worldwide infrastructure of more than a million servers in nearly 30 datacenters.
  • Roughly 17 of the 48 services listed leverage Google’s search engine.
  • Some sites search through selected content like Books, Images, and Scholar, while others like Finance and News format search results into an aggregation page.

Indexed Search

  • Google's search technology indexes pages and retrieves information using Web crawlers, also called spiders or robots.
  • Content is scanned up to a certain number of words and placed into an index. Google caches copies of Web pages and stores documents like DOC or PDF files.
  • Google uses a patented algorithm called PageRank, which rates a page's importance by the number and quality of the links pointing to it; the overall ranking also weighs factors such as keywords, site availability, and traffic.
  • Google tweaks its algorithm to prevent Search Engine Optimization (SEO) strategies from gaming the system.
  • Google returns a Search Engine Results Page (SERP) for a query parsed for its keywords.
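
The link-analysis idea behind PageRank can be sketched with the textbook power-iteration formulation: each page shares its score equally among the pages it links to, and a damping factor d (0.85 in the original PageRank paper) models a surfer who occasionally jumps to a random page. This is the published academic form, not Google's production ranking, and the four-page link graph is hypothetical:

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its outgoing links.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                for p in pages:
                    new_rank[p] += d * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked from every other page, D from none.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

With this graph, C ranks highest because every other page links to it, while D, which has no inbound links, keeps only the random-jump share of (1 - d)/N.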

Google's Search Limitations

  • Google does not search all sites. Sites that don't register or aren't linked prominently may remain undiscovered.
  • Sites can use a robots.txt file to indicate whether the site may be crawled and which pages may be crawled.
  • Google developed the Sitemaps protocol, which lets a Web site list information about how the Google robot can work with the site.
  • Sitemaps can be useful for crawling content that isn't browsable and for finding media information that isn't normally considered.
  • Dynamic content presented in AJAX isn't normally indexed, but Google has a procedure to help the engine crawl this information.
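
Python's standard urllib.robotparser module implements the robots.txt checks a well-behaved crawler makes before fetching a page, and (in Python 3.8 and later) reports any Sitemap lines the file declares. The robots.txt content, host name, and URLs below are hypothetical; the build_sitemap helper, also hypothetical, writes a minimal file in the Sitemaps protocol's XML format:

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical robots.txt: keep crawlers out of /private/ and point to a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

robots = urllib.robotparser.RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for page_url in ("https://www.example.com/index.html",
                 "https://www.example.com/private/report.html"):
    verdict = "crawl" if robots.can_fetch("Googlebot", page_url) else "skip"
    print(page_url, "->", verdict)
print("Declared sitemaps:", robots.site_maps())


def build_sitemap(pages):
    """Return a minimal Sitemaps-protocol XML document for (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date format, e.g. 2010-01-15
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([("https://www.example.com/", "2010-01-15")]))
```

Python's parser applies the first rule that matches in file order, so the Disallow line is placed before the broader Allow line; Google's own crawler documents a most-specific-match rule instead, so rule order matters less there.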

The Deep Web

  • Online content not indexed by search engines is called the “Deep Web.”
  • Any site that prevents Web crawlers from indexing its content is part of the Deep Web.
  • Examples include Facebook and peer-to-peer networks like Ian Clarke's Freenet.

Components of the Deep Web

  • Database-generated Web pages or dynamic content
  • Pages without links
  • Private or limited access Web pages and sites
  • Information contained in sources available through executable code (e.g., JavaScript)
  • Documents and files not in a searchable form
  • The Deep Web holds many times more information than search engines can reach; some estimates put it at an order of magnitude larger than the indexed content.

Aggregation and Disintermediation

  • Aggregation pages are controversial because Google's display of information drawn from other sites may violate copyright law and harm content providers.
  • Google has defended its right to display capsule information under the Digital Millennium Copyright Act.
  • Google reached a negotiated agreement with the Authors Guild regarding unauthorized scanning and copying of books for Google Books, specifying Google's obligations under the fair use exemption.
  • Google argues that the publicity associated with searchable content adds value.
  • Google has been a major factor in disintermediation, which is the removal of intermediaries from a supply chain.
  • Disintermediation connects producers directly with consumers but impacts organizations like news collection agencies, publishers, and retail outlets.

Productivity Applications and Services

  • Google introduced productivity applications starting in 2004 with Gmail, expanding these services through homegrown products and acquisitions.

Privacy Implications

  • These products store information online, which Google uses to build a profile of your activities.
  • Google states this information is never viewed individually by humans and lists its policies in the Privacy Center.
  • Google has been vigilant in protecting its reputation on privacy, but users should still weigh the implications of this collection of personal data.

Google Products (Table 8.1)

  • Alerts: Sends periodic email alerts based on search terms.
  • Blog Search: Displays an aggregation page from blogs.
  • Blogger: A blogging site for personal blogs.
  • Books: A vast library of book content and previews of copyrighted material.
  • Calendar: Calendar service for managing schedules and sharing them.
  • Chrome: Google's browser and operating system.
  • Checkout: A payment processing system.
  • Code: Developer tools and resources.
  • Custom Search: Creates a custom search utility for a particular Web site.
  • Desktop: Indexes content on the local drive for fast searches.
  • Directory: Search the Web by topics.
  • Docs: Online productivity applications.
  • Earth: An online atlas and mapping service with mashups.
  • Finance: A financial news aggregation service and site.
  • GOOG-411: Mobile phone search.
  • Google Health: Health information management system.
  • Groups: Discussion groups on specific topics.
  • iGoogle: AJAX customized home page.
  • Images: Web image search.
  • Knol: Short articles submitted by users.
  • Labs: A collection of applications and utilities under development and testing.
  • Orkut: Social media service with instant messaging.
  • Maps: Mapping and direction service.
  • Maps for Mobile: Mapping and direction service for mobile devices with GPS.
  • Mobile: Mobile search using voice and location.
  • News: News aggregation service and Web site.
  • Pack: Free Windows-based software selected by Google.
  • Patent Search: Patent and trademark search.
  • Picasa: Photo-editing and management software.
  • Product Search: Shopping search function.
  • Reader: An RSS reader.
  • Scholar: Search site for research and scholarly work.
  • Search for Mobile: Google's search application optimized for mobile devices.
  • Sites: Web site and wiki creation and staging tool.
  • SketchUp: Allows users to create 3D models and share them with others.
  • Talk: Instant messaging and chat utility.
  • Toolbar: Provides search features inside different browsers.
  • Translate: Language translation utility.
  • Trends: Statistical information on different search terms.
  • Videos: Searches for videos on the Web.
  • Voice: Free phone service.
  • Web Search: Google's core Web search engine.
  • Web Search Features: A help page for special Web searches in Google.
  • YouTube: Flash video sharing site.

Enterprise Offerings

  • Google has released special versions of its products for the enterprise market.

Enterprise Products

  • Google Commerce Search: Search service for online retailers to market products.
  • Google Site Search: Google's search engine customized for enterprises.
  • Google Search Appliance: Server for local and Internet searching with document management features.
  • Google Mini: Smaller version of the Google Search Appliance (GSA) that indexes up to 300,000 documents.

Google Apps for Business

  • Google markets its productivity applications as office suites to organizations, offering packages for governments, schools, non-profits, and ISPs.

Google Apps Premier Edition

  • A paid service for businesses and governmental agencies, offering Gmail, Docs, and Calendar as core applications.
  • Adds 25GB of Gmail storage, e-mail server synchronization, Groups, Sites, Talk, Video, enhanced security, directory services, authentication and authorization services, and support for the customer's own domain.
  • Premier Edition also adds access to Google APIs and 24/7 support, backed by a Service Level Agreement guaranteeing 99.9 percent uptime.
  • The cost is $50 per user account per year.

Google Postini Services

  • Provides security services, e-mail message encryption, message archiving, and message discovery services.
  • These are paid services that add $12 to $45 per user per year, depending on the options chosen.
  • Postini allows e-mail to be retained for up to 10 years and can be used to demonstrate regulatory compliance.
  • Google's online offerings give users essential features for a fraction of the price of Microsoft Office, leading to the expectation that its collaborative tools and features will put pressure on shrink-wrapped competitors.
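
The per-user prices above make annual cost estimates simple arithmetic. A small sketch, assuming a hypothetical 100-user organization and the Premier Edition and Postini prices quoted in these notes (list prices of that period, which may have changed since):

```python
PREMIER_PER_USER = 50      # US$ per user account per year (from the notes)
POSTINI_RANGE = (12, 45)   # US$ per user per year, depending on options chosen


def annual_cost(users, postini_per_user=0):
    """Total yearly cost in US$ for Premier Edition plus an optional Postini add-on."""
    return users * (PREMIER_PER_USER + postini_per_user)


users = 100  # hypothetical organization size
low = annual_cost(users, POSTINI_RANGE[0])
high = annual_cost(users, POSTINI_RANGE[1])
print(f"Premier Edition only: ${annual_cost(users):,}")
print(f"With Postini: ${low:,} to ${high:,}")
```

For 100 users this gives $5,000 a year for Premier Edition alone and $6,200 to $9,500 with Postini services added.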

AdWords

  • AdWords is a targeted ad service matching advertisers and keywords to user search profiles, transforming Google into an industry giant.
  • AdWords' competitors include Microsoft adCenter and Yahoo! Search Marketing.
  • Ads are displayed as text, banners, or media and can be tailored based on various factors.

How AdWords Works

  • Advertisers bid on keywords to match users to products or services.
  • Up to 12 ads per search can be returned.
  • Google gets paid when a user clicks the ad (pay-per-click advertising), and success is measured by the click-through rate (CTR).
  • Google calculates a quality score for ads based on CTR, connection between ad and keywords, and advertiser's history.
  • This quality score is a trade secret and is used to set the minimum bid for a keyword.
  • In 2007, Google purchased DoubleClick, an Internet advertising services company that helps clients create ads, provides hosting, and tracks results.
  • DoubleClick ads set browser cookies that collect user information.
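
Because the real quality score is a trade secret, the pay-per-click mechanics can only be illustrated with a simplified model. The sketch below assumes the commonly described simplification: ad rank = maximum cost-per-click bid × quality score, and each winner pays just enough to stay above the next ad (the next ad's rank divided by its own quality score, plus one cent). The advertisers, bids, quality scores, traffic figures, and the $0.05 reserve price are all hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_UP

CENT = Decimal("0.01")


@dataclass
class Ad:
    advertiser: str
    max_cpc: Decimal   # maximum cost-per-click bid in US$
    quality: Decimal   # hypothetical quality score; Google's real formula is secret
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks divided by impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def rank(self) -> Decimal:
        """Assumed ad rank: bid multiplied by quality score."""
        return self.max_cpc * self.quality


def run_auction(ads, slots, reserve=Decimal("0.05")):
    """Rank ads and return (advertiser, actual cost per click) for each filled slot.

    `reserve` is a hypothetical minimum price per click.
    """
    ordered = sorted(ads, key=lambda ad: ad.rank, reverse=True)
    results = []
    for i, ad in enumerate(ordered[:slots]):
        if i + 1 < len(ordered):
            # Lowest price that keeps this ad ranked above the next one, plus a cent.
            price = ordered[i + 1].rank / ad.quality + CENT
        else:
            price = reserve
        # Never charge below the reserve or above the advertiser's own bid.
        price = min(max(price, reserve), ad.max_cpc)
        results.append((ad.advertiser, price.quantize(CENT, rounding=ROUND_UP)))
    return results


ads = [
    Ad("alpha", Decimal("2.00"), Decimal("6"), impressions=1000, clicks=25),
    Ad("beta", Decimal("3.00"), Decimal("3"), impressions=800, clicks=8),
    Ad("gamma", Decimal("1.00"), Decimal("5"), impressions=500, clicks=10),
]
for name, cpc in run_auction(ads, slots=2):
    print(f"{name} pays ${cpc} per click")
for ad in ads:
    print(f"{ad.advertiser}: CTR {ad.ctr:.1%}")
```

Here alpha takes the top slot despite bidding less than beta, because its higher quality score raises its rank, and it pays $1.51 per click rather than its $2.00 maximum; beta pays $1.68.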

Google Analytics

  • Google Analytics (GA) is a statistical tool measuring the number and types of visitors to a Web site and how the Web site is used.
  • It is offered as a free service and has been widely adopted.

Google Analytics Usage

  • Builtwith.com indicates that GA was in use on 54 percent of both the top 10,000 and the top 100,000 Web sites, and on 35 percent of the top one million.
  • BackendBattles.com sets GA's market share at 57 percent for the top 10,000 sites.

Google Analytics Tracking Code (GATC)

  • Works by using a JavaScript snippet on individual pages.
  • When the page loads, the JavaScript runs and creates a first-party browser cookie for managing return visitors, tracking, and testing browser characteristics.
  • GATC requests and stores information from the user's account and sends visitor data back to GA servers for processing.
  • Tracks visitors from search engines, referral links, display ads, PPC networks, and other sources.
  • GA aggregates data and presents it in visual form and is connected to the AdWords system to track ad performance.
  • Saves and stores up to 50 individual site profiles, restricted to sites with fewer than 5 million pageviews per month, unless an AdWords subscription is active.
  • GA cookies can be blocked manually or with Firefox add-ons such as Adblock and NoScript.
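
The return-visitor logic that a first-party tracking cookie supports can be sketched on the server side with Python's http.cookies module. A first visit receives a random visitor ID in a cookie, and later requests that present the cookie count as returning. This is a server-side analogue of what the GATC does in JavaScript; the cookie name `_visitor_id`, the two-year lifetime, the search-host list, and the in-memory counters are hypothetical, and GA's real cookies and collection protocol differ:

```python
import uuid
from collections import Counter
from http.cookies import SimpleCookie
from urllib.parse import urlparse

COOKIE_NAME = "_visitor_id"                # hypothetical, not GA's real cookie name
COOKIE_LIFETIME = 60 * 60 * 24 * 365 * 2   # two years in seconds (assumed)
SEARCH_HOSTS = {"www.google.com", "www.bing.com"}  # illustrative list only

stats = Counter()


def classify_source(referrer):
    """Bucket a pageview by where the visitor came from."""
    if not referrer:
        return "direct"
    if urlparse(referrer).netloc in SEARCH_HOSTS:
        return "search"
    return "referral"


def track_pageview(cookie_header, referrer):
    """Record one pageview; return a Set-Cookie value for new visitors, else None."""
    cookies = SimpleCookie(cookie_header or "")
    stats["pageviews"] += 1
    stats["source:" + classify_source(referrer)] += 1
    if COOKIE_NAME in cookies:
        stats["returning_pageviews"] += 1
        return None
    stats["new_visitors"] += 1
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = uuid.uuid4().hex  # random, anonymous visitor ID
    cookie[COOKIE_NAME]["max-age"] = COOKIE_LIFETIME
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()


# First visit arrives from a search engine and receives a cookie.
set_cookie = track_pageview(None, "https://www.google.com/search?q=cloud")
# The browser sends the cookie back (name=value part only) on the next page.
returning = track_pageview(set_cookie.split(";")[0], "")
print(dict(stats))
```

Blocking or deleting the cookie makes every pageview look like a new visitor, which is why cookie blockers reduce the accuracy of this kind of tracking.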

Google Translate

  • Google Translate performs machine translation as a cloud service between any two of 35 supported languages.
  • Introduced in 2007, it replaced the SYSTRAN system.
  • The translation method uses a statistical approach developed by Franz Josef Och in 2003.

Corpus Linguistics Approach

  • Building a translation system involves collecting a database of words and phrases and matching them across two bilingual text corpora.
  • A text corpus, or parallel collection, is a database of word and phrase usage drawn from everyday language, obtained by subjecting documents translated by professionals to software analysis.
  • Documents analyzed include translations from the United Nations and the European Parliament.
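
To show how word correspondences can be learned from a parallel corpus, here is a sketch of IBM Model 1, the classic expectation-maximization word-alignment model from early statistical machine translation research. It is a simpler, swapped-in technique, not Och's phrase-based method or Google's production system, and the three-sentence German-English corpus is a toy example:

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(e|f): the probability that source word f translates as target word e."""
    src_vocab = {f for src, _ in corpus for f in src}
    tgt_vocab = {e for _, tgt in corpus for e in tgt}
    # Start with uniform translation probabilities.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: spread each target word over the source words it could align to.
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize expected counts into probabilities for each f.
        for (e, f) in t:
            t[(e, f)] = count[(e, f)] / total[f] if total[f] else 0.0
    return t


def best_translation(t, f):
    """Most probable target word for source word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(t, word))
```

After ten iterations the model pairs das with the, haus with house, buch with book, and ein with a, even though no sentence pair states these correspondences directly; they emerge from co-occurrence across the corpus, which is why larger parallel collections give better estimates.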

Accessing Google Translate

  • Can be accessed directly or through the Google Translator Toolkit.
  • Functions include direct text input, URL input, phonetic equivalent for script languages, and document uploads.

Google Translator Toolkit

  • Provides a means for using Translate to perform translations that can be edited.
  • Translation services have been in development for many years, but Google's efforts may be unique due to its work in language transcription.
  • Combining language transcription with cloud service could create a translation device with great utility.

Exploring the Google Toolkit

  • Google has an extensive program supporting developers who want to leverage Google's cloud-based applications and services.

Google's Development Services

  • AJAX APIs: Used to build widgets and applets commonly found in places like iGoogle.
  • Android: A mobile phone operating system and development platform.
  • Google App Engine: Google's Platform as a Service (PaaS) development and deployment system for cloud computing applications.
  • Google Apps Marketplace: Offers application development tools and a distribution channel for cloud-based applications.
  • Google Gears: Provides offline access to online data, including a database engine installed on the client that caches data and synchronizes it.
  • Google Web Toolkit (GWT): A set of development tools for browser-based applications, used to create Google Wave and Google AdWords.
  • Project Hosting: A project management tool for managing source code.
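
The offline pattern Gears supported, a client-side database that caches data and later synchronizes it, can be sketched with Python's built-in sqlite3 module. Gears itself exposed a JavaScript API in the browser; here the local store is an in-memory SQLite database, the "server" is a plain dictionary standing in for an upload, and the table and function names are hypothetical:

```python
import sqlite3


def open_local_store():
    """Create the client-side cache plus a queue of edits made while offline."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE pending (id TEXT PRIMARY KEY)")
    return db


def save_offline(db, note_id, body):
    """Write locally and remember that this record still needs syncing."""
    with db:  # commits both statements together
        db.execute("INSERT OR REPLACE INTO notes VALUES (?, ?)", (note_id, body))
        db.execute("INSERT OR IGNORE INTO pending VALUES (?)", (note_id,))


def synchronize(db, server):
    """When back online, push queued edits to the server and clear the queue."""
    with db:
        rows = db.execute(
            "SELECT n.id, n.body FROM notes n JOIN pending p ON n.id = p.id"
        ).fetchall()
        for note_id, body in rows:
            server[note_id] = body  # stand-in for an HTTP upload
        db.execute("DELETE FROM pending")
    return len(rows)


server = {}
db = open_local_store()
save_offline(db, "n1", "draft written offline")
save_offline(db, "n1", "draft, revised offline")
save_offline(db, "n2", "second note")
print(synchronize(db, server), "records synced")
print(server)
```

This sketch uses a last-write-wins policy with no conflict detection; a real offline framework also has to reconcile edits made on the server while the client was disconnected.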

The Google APIs

  • Most Google services are exposed by an API, allowing for integration into other Web sites.

API Categories

  • Ads and AdSense: Integrate Google's advertising services into Web applications.
  • AJAX: Adds content such as RSS feeds, maps, search boxes, and other information sources using JavaScript snippets.
  • Browser: APIs related to browser-based applications, including Chrome browser APIs and Google Cloud Print API.
  • Data: APIs that exchange data with various Google services.
  • Geo: APIs providing location-specific information, maps, and geo-specific databases.
  • Search: APIs leveraging Google's core competency, including Google AJAX Search, Book Search, Code Search, Custom Search, and Webmaster Tools Data APIs.
  • Social: APIs used for information exchange and communication tools, supporting applications such as Gmail, Calendar, and others.
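
The Data APIs are built on the Google Data Protocol, which exchanges data as Atom and RSS feeds. A minimal sketch of the client side, parsing an Atom feed with Python's xml.etree.ElementTree; the feed content is hypothetical and uses only standard Atom elements, without service-specific extensions or authentication:

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed of the kind a calendar Data API might return.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Team calendar</title>
  <entry>
    <title>Design review</title>
    <updated>2010-03-01T10:00:00Z</updated>
    <link rel="alternate" href="https://www.example.com/events/1"/>
  </entry>
  <entry>
    <title>Release planning</title>
    <updated>2010-03-02T15:30:00Z</updated>
    <link rel="alternate" href="https://www.example.com/events/2"/>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return (title, updated, link) tuples for each Atom entry in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        entries.append((title, updated, link.get("href") if link is not None else None))
    return entries


for item in read_entries(FEED):
    print(item)
```

Because every Data API returns the same feed structure, one parser like this can read entries from different services; service-specific fields appear as additional namespaced elements.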

Google APIs Summary (Table 8.2)

  • Google Accounts Authentication: Get access into desktop or mobile applications.
  • Google AdWords API: Automate and streamline campaign management activities.
  • AdSense for AJAX: Target ads to dynamic page content.
  • AdSense for Search Ads Only: Target ads to search results.
  • Google AJAX APIs: Implement rich, dynamic Web sites entirely in JavaScript and HTML.
  • Google AJAX Feed API: Easily mash up public feeds using JavaScript.
  • Google AJAX Language API: Easily translate and detect multiple languages using JavaScript.
  • Google AJAX Search API: Put a Google Search box and results on your own site.
  • Google Analytics: Track site traffic and use Analytics data in Google Data API feeds.
  • Android: Build mobile apps for Android, a software stack for mobile devices.
  • Google App Engine: Run Web applications on Google's infrastructure.
  • Google Apps Script: Automate tasks across Google products.
  • BigQuery (Labs): Interactively analyze large datasets.
  • Google Apps: Extend Google Apps, integrate with other systems, or build new apps.
  • Google Apps Marketplace: Sell integrated applications to millions of Google Apps users.
  • Gmail APIs and Tools: Create gadgets for Gmail and interact with the inbox.
  • Google Base Data API (Labs): Manage Google Base content programmatically.
  • Blogger Data API (Labs): Enable apps to view and update Blogger content.
  • Google Books Search APIs (Labs): Search the complete index of Book Search and integrate with social features.
  • Google Buzz (Labs): Share updates, photos, videos, and more, and start conversations.
  • Google Calendar APIs and Tools: Create and manage events, calendars, and gadgets for Google Calendar.
  • Chart Tools: Add charts and graphs to Web pages.
  • Google Checkout: Start selling on Web sites.
  • Chromium: Contribute to the open-source project behind Google Chrome.
  • Google Chrome Frame: Enable open Web technologies and Google Chrome's JavaScript implementation within Internet Explorer.
  • Google Chrome Extensions (Labs): Modify and enhance the functionality of Google Chrome.
  • Installable Web Apps (Labs): Package Web apps for installation in Google Chrome.
  • Closure Tools: Create powerful and efficient JavaScript.
  • Google Cloud Print (Labs): Enable any app on any device to print to any printer.
  • Google Code Search Data API (Labs): Enable apps to view data from Code Search.
  • Google Contacts API: Allow apps to view and update user contacts.
  • Google Coupon Feeds (Labs): Provide coupon listings that are included in Google search results.
  • Google Custom Search API: Create a custom search engine for Web sites.
  • Google DoubleClick for Publishers (Labs): Build applications that interact directly with Google's next-generation display advertising platform.
  • Google Data Protocol: A simple protocol for reading and writing data on the Web.
  • Google Desktop APIs (Labs): Create gadgets and indexing plugins for Google Desktop.
  • Google Documents List Data API: Enable apps to view and update lists of Google Documents.
  • Google Interactive Media Ads (Labs): Enable publishers to request and display ads in video, audio, and game content.
  • Google Earth API: Embed Google Earth into Web pages.
  • Google Plugin for Eclipse: Simplify development of GWT and App Engine projects in the Eclipse IDE.
  • Feedburner API (Labs): Interact with FeedBurner's feed management and awareness-generating capabilities.
  • Google Finance Data API (Labs): View and update Finance content in Google Data API feeds.
  • Google Friend Connect APIs (Labs): JS and REST/RPC APIs for Google Friend Connect.
  • Google Fusion Tables API (Labs): Manage Google Fusion Tables content programmatically.
  • Gadgets API: Build mini-apps that run on multiple sites, including iGoogle, Google Desktop, or any Web page.
  • Gears (Labs): Enable Web applications to work offline from desktop PCs or mobile devices.
  • Google Health API: Manage personal health information with Google.
  • iGoogle Developer Home (Labs): Build and test gadgets for iGoogle.
  • iGoogle Themes API (Labs): Design a dynamic theme for the iGoogle home page.
  • KML API: Create and share content with Google Earth, Maps, and Maps for mobile.
  • Google Latitude API (Labs): Build applications that read and update user locations and location histories.
  • Google Libraries API: Load open-source JavaScript libraries.
  • Google Moderator API (Labs): Collect ideas, questions, and recommendations from audiences.
  • Google Geocoding API: Convert addresses into geographic coordinates.
  • Google Directions API: Plot directions with transportation options.
  • Google JavaScript Maps API: Integrate Google's interactive maps.
  • Google Maps API for Flash: Integrate Google Maps into Flash applications.
  • OpenSocial: Build social applications that work across many Web sites.
  • Orkut Developer Home: Create social applications for Orkut users.
  • Google Project Hosting: Host open-source projects on Google Code.
  • Picasa APIs (Labs): Create custom buttons and upload files to third-party services.
  • Picasa Web Albums Data API: Include Picasa Web Albums in applications or Web sites.
  • Google PowerMeter API (Labs): Integrate with