Unit V Notes: Cloud Computing
Chapter 8: Using Google Web Services
- Google is a prototypical cloud computing services company, supporting some of the largest Web sites and services globally.
- This chapter explores Google's applications and services for users and the developer tools it offers.
Exploring Google Applications
- Google's core business revolves around search technology, employing automated technology to index the Web.
- Google offers its search service as a standard search engine for users and a collection of specialized search tools for developers, limited to specific content areas.
- The application of Google’s searches to content aggregation has led to societal changes and a trend of disintermediation.
- AdWords and AdSense are the most commercially important services; they underpin Google's targeted advertising business, which is supported by services like Google Analytics.
- Google applications are cloud-based, spanning productivity, mobile, media delivery, and social interactions.
- Google is commercializing some applications as cloud-based enterprise application suites.
- Google has a large developer program across its applications and services, featuring Google's AJAX APIs, the Google Web Toolkit, and the Google App Engine hosting service.
- The Google App Engine (GAE) allows developers to create Web applications in Java and Python, deploy them on Google's infrastructure, and scale them.
Google's Impact
- Google has significantly impacted the computer industry and the Internet.
- Despite competitors having more Internet users or higher stock valuation, Google remains a technology and thought leader.
- The "Google Effect" refers to the impact of consumer tracking, targeted advertising, and the pursuit of knowledge domains.
- The bulk of Google's income comes from targeted advertising sold through its AdWords system, based on information gathered from Google accounts or cookies.
- In 2009, Google's revenue was $23.6 billion, and the company controlled roughly 65 percent of the search market.
- The company's profitability has enabled a large infrastructure and free cloud-based applications and services, representing Google's Software as a Service portfolio.
Google's Cloud Computing Services
- Google offers an extensive set of applications to the general public, including Google Docs, Google Health, Picasa, and Google Mail.
- These applications can be accessed via the "More" and "Even More" links on Google's home page, directing to the More Google Products page.
- Google's cloud-based applications put pressure on vendors of traditional software like office suites and image-management programs.
- In April 2008, Google introduced the Google App Engine (GAE), a development platform for hosted Web applications using Google's infrastructure.
- GAE allows developers to create and deploy Web applications without the need to manage the infrastructure.
- GAE applications can be written in Java and Python using the Google App Engine Framework, reducing development effort.
- Google offers a certain free level of service, assessing charges only when applications exceed a certain level of processor load, storage usage, and network bandwidth.
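As a rough illustration of the kind of Python Web application a hosted platform runs, here is a minimal sketch that assumes a WSGI-style entry point. The notes do not describe App Engine's own framework classes or configuration files, so none are shown; the names `application` and `call_locally` are illustrative only.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: returns a plain-text greeting for any request."""
    body = b"Hello from a hosted Web application\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_locally(app, path="/"):
    """Invoke the WSGI app in-process (no server) and capture its reply."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a valid dummy WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app(environ, start_response)
    captured["body"] = b"".join(chunks)
    return captured


if __name__ == "__main__":
    reply = call_locally(application)
    print(reply["status"], reply["body"].decode())
```

The application code contains no server, port, or scaling logic; in a platform model those concerns belong to the hosting infrastructure rather than to the developer.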
Google App Engine Limitations
- Google App Engine applications must comply with Google's infrastructure, limiting the range of application types and making porting difficult.
Google Application Portfolio
- Nearly all products in Google's application and service portfolio are cloud computing services, running on systems distributed worldwide across Google's million-plus servers in nearly 30 datacenters.
- Roughly 17 of the 48 services listed leverage Google’s search engine.
- Some sites search through selected content like Books, Images, and Scholar, while others like Finance and News format search results into an aggregation page.
Indexed Search
- Google's search technology indexes pages and retrieves information using Web crawlers, also called spiders or robots.
- Content is scanned up to a certain number of words and placed into an index. Google caches copies of Web pages and stores documents like DOC or PDF files.
- Google uses a patented algorithm called PageRank to determine page importance, considering the number of quality links, keywords, site availability, and traffic.
- Google tweaks its algorithm to prevent Search Engine Optimization (SEO) strategies from gaming the system.
- Google returns a Search Engine Results Page (SERP) for a query parsed for its keywords.
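The link-based part of PageRank can be sketched with the simplified, publicly described power-iteration formulation. This is a toy under stated assumptions: it looks only at link structure (not keywords, site availability, or traffic), the damping factor of 0.85 and the iteration count are conventional illustrative values, and it is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page splits its damped rank equally among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-page site graph.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this hypothetical graph, page "c" collects links from three pages and ends up ranked highest, while "d", which nothing links to, ranks lowest.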
Google's Search Limitations
- Google does not search all sites. Sites that don't register or aren't linked prominently may remain undiscovered.
- Sites can use a robots.txt file to indicate whether the site can be crawled and which pages can be searched.
- Google developed the Sitemaps protocol, which lets a Web site list information about how the Google robot can work with the site.
- Sitemaps can be useful for crawling content that isn't browsable and for finding media information that isn't normally considered.
- Dynamic content presented in AJAX isn't normally indexed, but Google has a procedure to help the engine crawl this information.
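The two mechanisms described above, a robots.txt file and a sitemap, can be sketched with Python's standard library. The robots.txt rules, the `ExampleBot` user agent, and the example.com URLs are made up for illustration; the sitemap uses only the basic `urlset`/`url`/`loc` elements of the public Sitemaps schema.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical robots.txt: block /private/ for every crawler, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def crawl_allowed(robots_text, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


def build_sitemap(urls):
    """Build a minimal sitemap document listing the given page URLs."""
    namespace = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=namespace)
    for location in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = location
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(crawl_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
    print(crawl_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/data.html"))
    print(build_sitemap(["http://example.com/", "http://example.com/about"]))
```

A robots.txt file tells crawlers what to stay away from, while a sitemap does the opposite job of advertising pages a crawler might not find by following links.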
The Deep Web
- Online content not indexed by search engines is called the “Deep Web.”
- Any site suppressing Web crawlers from indexing is part of the Deep Web.
- Examples include Facebook and peer-to-peer networks like Ian Clarke's Freenet.
Components of the Deep Web
- Database-generated Web pages or dynamic content
- Pages without links
- Private or limited access Web pages and sites
- Information contained in sources available through executable code (e.g., JavaScript)
- Documents and files not in a searchable form
- The amount of information stored in the Deep Web is many times larger than what can be accessed by search engines.
- Some estimates suggest the Deep Web could be an order of magnitude larger than the content in search engines.
- Aggregation pages are controversial as Google's display of information from various sites may violate copyright laws and damage content providers.
- Google has defended its right to display capsule information under the Digital Millennium Copyright Act.
- Google reached a negotiated agreement with the Authors Guild regarding unauthorized scanning and copying of books for Google Books, specifying Google's obligations under the fair use exemption.
- Google argues that the publicity associated with searchable content adds value.
- Google has been a major factor in disintermediation, which is the removal of intermediaries from a supply chain.
- Disintermediation connects producers directly with consumers but impacts organizations like news collection agencies, publishers, and retail outlets.
Productivity Applications and Services
- Google introduced productivity applications starting in 2004 with Gmail, expanding these services through homegrown products and acquisitions.
Privacy Implications
- These products store information online, which Google uses to build a profile of your activities.
- Google states this information is never viewed individually by humans and lists its policies in the Privacy Center.
- Google has been vigilant in protecting its privacy reputation, but the collection of personal data must be considered.
Google Products (Table 8.1)
- Alerts: Sends periodic email alerts based on search terms.
- Blog Search: Displays an aggregation page from blogs.
- Blogger: A blogging site for personal blogs.
- Books: A vast library of book content and previews of copyrighted material.
- Calendar: Calendar service for managing schedules and sharing them.
- Chrome: Google's browser and operating system.
- Checkout: A payment processing system.
- Code: Developer tools and resources.
- Custom Search: Creates a custom search utility for a particular Web site.
- Desktop: Indexes content on local drive for fast searches.
- Directory: Search the Web by topics.
- Docs: Online productivity applications.
- Earth: An online atlas and mapping service with mashups.
- Finance: A financial news aggregation service and site.
- GOOG-411: Mobile phone search.
- Google Health: Health information management system.
- Groups: Discussion groups on specific topics.
- iGoogle: AJAX customized home page.
- Images: Web image search.
- Knol: Short articles submitted by users.
- Labs: A collection of applications and utilities under development and testing.
- Orkut: Social media service with instant messaging.
- Maps: Mapping and direction service.
- Maps for Mobile: Mapping and direction service for mobile devices with GPS.
- Mobile: Mobile search using voice and location.
- News: News aggregation service and Web site.
- Pack: Free Windows-based software selected by Google.
- Patent Search: Patent and trademark search.
- Picasa: Photo-editing and management software.
- Product Search: Shopping search function.
- Reader: An RSS reader.
- Scholar: Search site for research and scholarly work.
- Search for Mobile: Google's search application optimized for mobile devices.
- Sites: Web site and wiki creation and staging tool.
- SketchUp: Allows users to create 3D models and share them with others.
- Talk: Instant messaging and chat utility.
- Toolbar: Provides search features inside different browsers.
- Translate: Language translation utility.
- Trends: Statistical information on different search terms.
- Videos: Searches for videos on the Web.
- Voice: Free phone service.
- Web Search: Google's core Web search engine.
- Web Search Features: A help page for special Web searches in Google.
- YouTube: Flash video sharing site.
Enterprise Offerings
- Google has released special versions of its products for the enterprise market.
Enterprise Products
- Google Commerce Search: Search service for online retailers to market products.
- Google Site Search: Google's search engine customized for enterprises.
- Google Search Appliance: Server for local and Internet searching with document management features.
- Google Mini: Smaller version of the GSA that stores 300,000 indexed documents.
Google Apps for Business
- Google markets its productivity applications as office suites to organizations, offering packages for governments, schools, non-profits, and ISPs.
Google Apps Premier Edition
- A paid service for businesses and governmental agencies, offering Gmail, Docs, and Calendar as core applications.
- Adds 25GB of Gmail storage, e-mail server synchronization, Groups, Sites, Talk, Video, enhanced security, directory services, authentication and authorization services, and support for the customer's own domain.
- Premier Edition also adds access to Google APIs and 24/7 support with a 99.9-percent uptime guarantee Service Level Agreement.
- The cost is $50 per user account per year.
Google Postini Services
- Provides security services, e-mail message encryption, message archiving, and message discovery services.
- These are paid services that add from $12 to $45 per user per year, based on the options chosen.
- Postini allows e-mail to be retained for up to 10 years and can be used to demonstrate regulatory compliance.
- Google's online offerings give users essential features for a fraction of the Microsoft Office price, leading to the expectation that its collaborative tools and features will put pressure on shrink-wrapped competitors.
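The per-user prices quoted above for Premier Edition (50 per user per year) and the Postini add-ons (12 to 45 per user per year) can be combined into a quick annual estimate. This is only an arithmetic sketch: the figures are taken as dollars, the function name is illustrative, and actual pricing depends on the options chosen.

```python
PREMIER_PER_USER = 50    # Premier Edition, per user account per year
POSTINI_MIN_PER_USER = 12  # cheapest Postini option, per user per year
POSTINI_MAX_PER_USER = 45  # most expensive Postini option, per user per year


def annual_cost_range(users):
    """Return (low, high) yearly cost for Premier Edition plus Postini services."""
    if users < 0:
        raise ValueError("users must be non-negative")
    low = users * (PREMIER_PER_USER + POSTINI_MIN_PER_USER)
    high = users * (PREMIER_PER_USER + POSTINI_MAX_PER_USER)
    return low, high


if __name__ == "__main__":
    print(annual_cost_range(100))  # (6200, 9500)
```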
AdWords
- AdWords is a targeted ad service matching advertisers and keywords to user search profiles, transforming Google into an industry giant.
- AdWords' competitors include Microsoft adCenter and Yahoo! Search Marketing.
- Ads are displayed as text, banners, or media and can be tailored based on various factors.
How AdWords Works
- Advertisers bid on keywords to match users to products or services.
- Up to 12 ads per search can be returned.
- Google gets paid when a user clicks the ad (pay-per-click advertising), and success is measured by the click-through rate (CTR).
- Google calculates a quality score for ads based on CTR, connection between ad and keywords, and advertiser's history.
- This quality score is a trade secret and used to price the minimum bid of a keyword.
- In 2007, Google purchased DoubleClick, an Internet advertising services company that helps clients create ads, provides hosting, and tracks results.
- DoubleClick ads leave browser cookies collecting user information.
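The two measurable quantities in the pay-per-click model above can be sketched as simple arithmetic. The click, impression, and price figures are hypothetical, and the quality score is deliberately not modeled because its formula is not public.

```python
def click_through_rate(clicks, impressions):
    """CTR: the fraction of ad impressions that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def pay_per_click_cost_cents(clicks, cost_per_click_cents):
    """Under pay-per-click, the advertiser pays only for clicks, not impressions."""
    return clicks * cost_per_click_cents


if __name__ == "__main__":
    # Hypothetical campaign: 1,000 impressions, 25 clicks, 40 cents per click.
    print(click_through_rate(25, 1000))       # 0.025
    print(pay_per_click_cost_cents(25, 40))   # 1000
```

Amounts are kept in whole cents so that the cost calculation stays exact.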
Google Analytics
- Google Analytics (GA) is a statistical tool measuring the number and types of visitors to a Web site and how the Web site is used.
- It is offered as a free service and has been widely adopted.
Google Analytics Usage
- Builtwith.com indicates that GA was in use on 54 percent of both the top 10,000 and the top 100,000 Web sites, and on 35 percent of the top one million Web sites worldwide.
- BackendBattles.com sets GA's market share at 57 percent for the top 10,000 sites.
Google Analytics Tracking Code (GATC)
- Works by using a JavaScript snippet on individual pages.
- When the page loads, the JavaScript runs and creates a first-party browser cookie for managing return visitors, tracking, and testing browser characteristics.
- GATC requests and stores information from the user's account and sends visitor data back to GA servers for processing.
- Tracks visitors from search engines, referral links, display ads, PPC networks, and other sources.
- GA aggregates data and presents it in visual form and is connected to the AdWords system to track ad performance.
- Stores up to 50 individual site profiles, restricted to sites with fewer than 5 million pageviews per month unless an AdWords subscription is active.
- GA cookies can be blocked manually or by technologies like Firefox Adblock and NoScript.
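The idea of grouping visits by where they came from and aggregating the totals can be sketched as follows. This is not GA's actual classification logic: the hit format, the `SEARCH_HOSTS` list, and the three category names are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of search-engine hosts; a real tool would use a far larger one.
SEARCH_HOSTS = {"www.google.com", "www.bing.com", "search.yahoo.com"}


def classify_source(referrer):
    """Label a visit as 'direct', 'search', or 'referral' from its referrer URL."""
    if not referrer:
        return "direct"
    host = urlparse(referrer).netloc.lower()
    if host in SEARCH_HOSTS:
        return "search"
    return "referral"


def summarize(hits):
    """Aggregate a list of hit records into visit counts per traffic source."""
    return Counter(classify_source(hit.get("referrer")) for hit in hits)


if __name__ == "__main__":
    sample_hits = [
        {"page": "/", "referrer": "http://www.google.com/search?q=cloud"},
        {"page": "/docs", "referrer": "http://www.bing.com/search?q=gae"},
        {"page": "/", "referrer": "http://blog.example.org/post/1"},
        {"page": "/pricing", "referrer": None},
    ]
    print(summarize(sample_hits))
```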
Google Translate
- Google Translate performs machine translation as a cloud service between two of 35 different languages.
- Introduced in 2007, it replaced the SYSTRAN system.
- The translation method uses a statistical approach developed by Franz Josef Och in 2003.
Corpus Linguistics Approach
- Building a translation system involves collecting a database of words and matching it against bilingual text corpora.
- A text corpus, or parallel collection, is a database of word and phrase usage taken from the language in everyday use; it is obtained by subjecting documents translated by professionals to software analysis.
- Documents analyzed include translated documents from the United Nations and the European Parliament.
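The statistical idea behind a parallel corpus can be illustrated with a toy word-association model. This is a deliberately simplified stand-in, not the method Google Translate uses: it scores word pairs with the Dice coefficient over sentence-level co-occurrence, and the four English/French sentence pairs are invented for the example.

```python
from collections import Counter


def train(sentence_pairs):
    """Count how often source and target words appear, alone and together."""
    source_counts = Counter()
    target_counts = Counter()
    pair_counts = Counter()
    for source, target in sentence_pairs:
        source_words = set(source.lower().split())
        target_words = set(target.lower().split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        pair_counts.update((s, t) for s in source_words for t in target_words)
    return source_counts, target_counts, pair_counts


def best_translation(word, model):
    """Pick the target word with the highest Dice score for the given word."""
    source_counts, target_counts, pair_counts = model
    word = word.lower()
    scores = {}
    for (s, t), together in pair_counts.items():
        if s == word:
            # Dice coefficient: 2 * joint count / (sum of individual counts).
            scores[t] = 2 * together / (source_counts[s] + target_counts[t])
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    corpus = [
        ("the house", "la maison"),
        ("the car", "la voiture"),
        ("a house", "une maison"),
        ("a red car", "une voiture rouge"),
    ]
    model = train(corpus)
    for english in ("house", "car", "red"):
        print(english, "->", best_translation(english, model))
```

No dictionary is supplied anywhere; the word pairings emerge purely from how often words appear together in aligned sentences, which is why large professionally translated collections are valuable as training material.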
Accessing Google Translate
- Can be accessed directly or through the Google Translator Toolkit.
- Functions include direct text input, URL input, phonetic equivalent for script languages, and document uploads.
- The Translator Toolkit provides a means of using Translate to produce translations that can then be edited.
- Translation services have been in development for many years, but Google's efforts may be unique due to its work in language transcription.
- Combining language transcription with cloud service could create a translation device with great utility.
- Google has an extensive program supporting developers who want to leverage Google's cloud-based applications and services.
Google's Development Services
- AJAX APIs: Used to build widgets and applets commonly found in places like iGoogle.
- Android: Development for Google's phone operating system.
- Google App Engine: Google's Platform as a Service (PaaS) development and deployment system for cloud computing applications.
- Google Apps Marketplace: Offers application development tools and a distribution channel for cloud-based applications.
- Google Gears: Provides offline access to online data, including a database engine installed on the client that caches data and synchronizes it.
- Google Web Toolkit (GWT): A set of development tools for browser-based applications, used to create Google Wave and Google AdWords.
- Project Hosting: A project management tool for managing source code.
The Google APIs
- Most Google services are exposed by an API, allowing for integration into other Web sites.
API Categories
- Ads and AdSense: Integrate Google's advertising services into Web applications.
- AJAX: Adds content such as RSS feeds, maps, search boxes, and other information sources using JavaScript snippets.
- Browser: APIs related to browser-based applications, including Chrome browser APIs and Google Cloud Print API.
- Data: APIs that exchange data with various Google services.
- Geo: APIs providing location-specific information, maps, and geo-specific databases.
- Search: APIs leveraging Google's core competency, including Google AJAX Search, Book Search, Code Search, Custom Search, and Webmaster Tools Data APIs.
- Social: APIs used for information exchange and communication tools, supporting applications such as Gmail, Calendar, and others.
Google APIs Summary (Table 8.2)
- Google Accounts Authentication: Get access into desktop or mobile applications.
- Google AdWords API: Automate and streamline campaign management activities.
- AdSense for AJAX: Target ads to dynamic page content.
- AdSense for Search Ads Only: Target ads to search results.
- Google AJAX APIs: Implement rich, dynamic Web sites entirely in JavaScript and HTML.
- Google AJAX Feed API: Easily mash up public feeds using JavaScript.
- Google AJAX Language API: Easily translate and detect multiple languages using JavaScript.
- Google AJAX Search API: Put a Google Search box and results on your own site.
- Google Analytics: Track site traffic and use Analytics data in Google Data API feeds.
- Android: Build mobile apps for Android, a software stack for mobile devices.
- Google App Engine: Run Web applications on Google's infrastructure.
- Google Apps Script: Automate tasks across Google products.
- BigQuery (Labs): Interactively analyze large datasets.
- Google Apps: Extend Google Apps, integrate with other systems, or build new apps.
- Google Apps Marketplace: Sell integrated applications to millions of Google Apps users.
- Gmail APIs and Tools: Create gadgets for Gmail and interact with the inbox.
- Google Base Data API (Labs): Manage Google Base content programmatically.
- Blogger Data API (Labs): Enable apps to view and update Blogger content.
- Google Books Search APIs (Labs): Search the complete index of Book Search and integrate with social features.
- Google Buzz (Labs): Share updates, photos, videos, and more, and start conversations.
- Google Calendar APIs and Tools: Create and manage events, calendars, and gadgets for Google Calendar.
- Chart Tools: Add charts and graphs to Web pages.
- Google Checkout: Start selling on Web sites.
- Chromium: Contribute to the open-source project behind Google Chrome.
- Google Chrome Frame: Enable open Web technologies and Google Chrome's JavaScript implementation within Internet Explorer.
- Google Chrome Extensions (Labs): Modify and enhance the functionality of Google Chrome.
- Installable Web Apps (Labs): Package Web apps for installation in Google Chrome.
- Closure Tools: Create powerful and efficient JavaScript.
- Google Cloud Print (Labs): Enable any app on any device to print to any printer.
- Google Code Search Data API (Labs): Enable apps to view data from Code Search.
- Google Contacts API: Allow apps to view and update user contacts.
- Google Coupon Feeds (Labs): Provide coupon listings that are included in Google search results.
- Google Custom Search API: Create a custom search engine for Web sites.
- Google DoubleClick for Publishers (Labs): Build applications that interact directly with Google's next-generation display advertising platform.
- Google Data Protocol: A simple protocol for reading and writing data on the Web.
- Google Desktop APIs (Labs): Create gadgets and indexing plugins for Google Desktop.
- Google Documents List Data API: Enable apps to view and update lists of Google Documents.
- Google Interactive Media Ads (Labs): Enable publishers to request and display ads in video, audio, and game content.
- Google Earth API: Embed Google Earth into Web pages.
- Google Plugin for Eclipse: Simplify development of GWT and App Engine projects in the Eclipse IDE.
- Feedburner API (Labs): Interact with FeedBurner's feed management and awareness-generating capabilities.
- Google Finance Data API (Labs): View and update Finance content in Google Data API feeds.
- Google Friend Connect APIs (Labs): JS and REST/RPC APIs for Google Friend Connect.
- Google Fusion Tables API (Labs): Manage Google Fusion Tables content programmatically.
- Gadgets API: Build mini-apps that run on multiple sites, including iGoogle, Google Desktop, or any Web page.
- Gears (Labs): Enable Web applications to work offline from desktop PCs or mobile devices.
- Google Health API: Manage personal health information with Google.
- iGoogle Developer Home (Labs): Build and test gadgets for iGoogle.
- iGoogle Themes API (Labs): Design a dynamic theme for the iGoogle home page.
- KML API: Create and share content with Google Earth, Maps, and Maps for mobile.
- Google Latitude API (Labs): Build applications that read and update user locations and location histories.
- Google Libraries API: Load open-source JavaScript libraries.
- Google Moderator API (Labs): Collect ideas, questions, and recommendations from audiences.
- Google Geocoding API: Convert addresses to geographic coordinates.
- Google Directions API: Plot directions with transportation options.
- Google JavaScript Maps API: Integrate Google's interactive maps.
- Google Maps API for Flash: Integrate Google Maps into Flash applications.
- OpenSocial: Build social applications that work across many Web sites.
- Orkut Developer Home: Create social applications for Orkut users.
- Google Project Hosting: Host open-source projects on Google Code.
- Picasa APIs (Labs): Create custom buttons and upload files to third-party services.
- Picasa Web Albums Data API: Include Picasa Web Albums in applications or Web sites.
- Google PowerMeter API (Labs): Integrate with