Internet reading
What is SEO -
The process of improving the position at which your website appears in the organic search results of sites like Google.
Spiders - software that crawls the web to identify the actual copy written on each page, along with things like the use of keywords and phrases.
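A rough sketch of the keyword part of a spider's job, written in Python for illustration. It assumes the page's HTML has already been downloaded. `CopyExtractor` and `keyword_counts` are hypothetical names, and real search-engine spiders are far more complex, with internals that are not public.

```python
import re
from html.parser import HTMLParser


class CopyExtractor(HTMLParser):
    """Collects the visible copy of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def keyword_counts(html, phrases):
    """Count whole-word, case-insensitive occurrences of each key phrase in the page copy."""
    parser = CopyExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so a phrase split by line breaks or extra spaces still matches.
    text = " ".join(" ".join(parser.parts).split()).lower()
    return {
        phrase: len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", text))
        for phrase in phrases
    }


page = (
    "<html><head><title>Cheap Flights</title><style>p {color: red}</style></head>"
    "<body><h1>Cheap flights to Paris</h1><p>Book cheap flights today.</p>"
    "<script>var cheap = 1;</script></body></html>"
)
print(keyword_counts(page, ["cheap flights", "paris", "hotel"]))
# {'cheap flights': 3, 'paris': 1, 'hotel': 0}
```

The word "cheap" inside the script is ignored because it is code, not copy a reader would see.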
PageRank - a score for how important and trustworthy your site is
Determines the usefulness of a page by how many pages link to it, giving more weight to links from pages that are themselves highly ranked
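A simplified PageRank by power iteration, as a Python sketch. The damping factor of 0.85 and the fixed 50 iterations are common textbook choices, not Google's production settings, and the four-page web is made up. Each page splits its rank evenly among the pages it links to, so a link from a highly ranked page is worth more than a link from an obscure one.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small base share (the "random jump" part of the model).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

C ranks highest because three pages link to it. A comes second with only one inbound link, because that link comes from C.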
Googlebot - Google's crawler; it checks how many pages link to a website
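A minimal Python sketch of counting inbound links, assuming the crawler has already recorded which pages each crawled page links to. The function name and the sample sites are hypothetical, and this is only an illustration of the idea, not how Googlebot is actually implemented.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count how many distinct other pages link to each page.

    links maps each crawled page to the list of pages it links to.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # a page linking twice still counts once
            if target != source:     # ignore a page linking to itself
                counts[target] += 1
    return counts


crawled = {
    "blog.example": ["shop.example", "news.example"],
    "news.example": ["shop.example"],
    "forum.example": ["shop.example", "shop.example"],
    "shop.example": ["shop.example"],
}
print(inbound_link_counts(crawled).most_common())
# [('shop.example', 3), ('news.example', 1)]
```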
MapReduce - uses clusters of ordinary computers working in parallel to process data faster than a single supercomputer could
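The core MapReduce idea is to split the data, let each worker process its own piece (map), then merge the partial results (reduce). The Python sketch below applies it to the classic word-count example. Threads in one process stand in for the separate machines of a real cluster, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map: one worker counts the words in its own chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Reduce: merge the partial counts from every worker into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def mapreduce_word_count(lines, workers=4):
    # Split the input into roughly equal chunks, one per worker.
    chunk_size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Threads stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


docs = ["The cat sat", "the dog sat", "a cat"]
print(mapreduce_word_count(docs, workers=2)["cat"])  # 2
```

Because each chunk is counted independently, adding more workers lets more chunks be processed at the same time, which is where the speed-up on a real cluster comes from.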
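Beware of rank farms (link farms) - groups of pages that exist only to link to a site and artificially inflate its PageRank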
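A small experiment, written as a Python sketch, shows why rank farms are a problem for a simplified, link-based PageRank. All page names are hypothetical, and real search engines use many more signals and actively detect link farms, so this only demonstrates the naive effect.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Compact power-iteration PageRank; links maps page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # Rank sitting on pages with no outgoing links is shared by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = dict.fromkeys(pages, (1 - damping) / n + damping * dangling / n)
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# A small honest web: "spam" links out, but nobody links to it.
honest = {
    "news": ["shop", "blog"],
    "blog": ["news"],
    "shop": ["news"],
    "spam": ["news"],
}

# The same web plus a hypothetical farm of 10 throwaway pages that exist only
# to link to "spam", which links back to them to keep the rank circulating.
farm = [f"farm{i}" for i in range(10)]
farmed = dict(honest)
farmed["spam"] = farm
for page in farm:
    farmed[page] = ["spam"]

before, after = pagerank(honest), pagerank(farmed)
print(max(before, key=before.get))  # news
print(max(after, key=after.get))    # spam
```

Without the farm, "spam" has the lowest possible score. With ten worthless pages pointing at it, it overtakes every honest page.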
Big data - large amounts of information that grow at increasing rates
Big data is managed through:
Data storage
Data mining
Data analytics
Data visualisation
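Process (crawling and MapReduce) -
Crawler goes to a website and collects data.
The data is sent to a cluster.
The cluster sends multiple copies of the data to nodes, so nothing is lost if one node fails.
Master node splits the work into smaller task lists for the worker nodes to process (map step).
Nodes shuffle the data, grouping results that share the same key.
Duplicates get removed (reduce step).
Data gets sent to a central disk to update the Google search index.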
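The steps above can be sketched in Python as a tiny pipeline that builds a search index (word to list of pages). This is a simplified illustration under several assumptions: the crawled pages are made-up sample data, the map tasks run one after another instead of on separate machines, replication of copies across nodes is not modeled, and an in-memory dict stands in for the central disk. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical crawled pages as (URL, page text). The repeated entry
# stands in for the same page being collected twice.
CRAWLED = [
    ("a.example", "cheap flights to paris"),
    ("b.example", "paris hotels and flights"),
    ("a.example", "cheap flights to paris"),  # duplicate crawl
    ("c.example", "hotels in rome"),
]


def split_into_tasks(records, num_workers):
    """Master node: deal records round-robin into one task list per worker."""
    tasks = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        tasks[i % num_workers].append(record)
    return tasks


def map_task(records):
    """Worker node (map): emit a (word, url) pair for every word on every page."""
    return [(word, url) for url, text in records for word in text.split()]


def shuffle(mapped_outputs):
    """Shuffle: group all emitted pairs by word, across every worker's output."""
    groups = defaultdict(list)
    for output in mapped_outputs:
        for word, url in output:
            groups[word].append(url)
    return groups


def reduce_group(urls):
    """Reduce: remove duplicate URLs and return them in a stable order."""
    return sorted(set(urls))


def build_index(records, num_workers=2):
    tasks = split_into_tasks(records, num_workers)
    mapped = [map_task(task) for task in tasks]  # would run in parallel on a real cluster
    groups = shuffle(mapped)
    # The returned dict stands in for the index written to central storage.
    return {word: reduce_group(urls) for word, urls in groups.items()}


index = build_index(CRAWLED)
print(index["paris"])   # ['a.example', 'b.example']
print(index["hotels"])  # ['b.example', 'c.example']
```

The result does not depend on how many workers the master node uses, which is what makes it safe to spread the work across a cluster.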