Big Data Economics Notes

Zettabyte Era

  • Generation: In 1993, only 3% of data was digital, but by 2007, it had increased to 84%. Now, almost all data is digital due to conversion of analog data and direct digital production since 1990.
  • Data Volume: In 2019, approximately 43 zettabytes of data were generated. Projections estimate an increase to 175 zettabytes by 2025, assuming a 23% annual growth rate. One zettabyte equals one trillion gigabytes, or 10^21 bytes.
  • Data Sources: Data generation stems from imaging, entertainment, manufacturing/administration, and voice communication. Not all generated data is stored.
  • Processing: The doubling time for processing capacity is estimated to be 1 year and 2 months for general-purpose computers and 10 months for application-specific computers. This is influenced by factors like memory, clock rate, programming language, computer architecture, and operating system.
  • Distribution: By 2020, telecommunications networks were fully digitized. The Internet has become the primary information carrier, with a doubling time of 1 year and 8 months for its capacity. In 2016, one zettabyte of data was transmitted over the Internet, largely due to video streaming.
  • Storage: The amount of stored data increases by about 30% annually, resulting in a doubling time of about 2 years and 4 months. Data storage surpassed one zettabyte in 2012 and was projected to reach nearly 20 zettabytes by 2020.
  • Zettabyte Era: This era marks the systematic analysis of large, unstructured datasets that are difficult to handle using traditional methods.
  • Big Data Economics: This involves studying how big data can be converted into economic value through systematic processing of digital data, detecting hidden information for business purposes.
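
The growth rates and doubling times quoted above are linked through compound growth. A minimal sketch in Python of that conversion, assuming steady exponential growth. The function names are illustrative and not from the source, and the results need not reproduce the rounded figures in the notes, which come from separate estimates.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate (0.30 means 30%)."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_from_doubling(months: float) -> float:
    """Annual growth rate implied by a doubling time given in months."""
    return 2 ** (12 / months) - 1


if __name__ == "__main__":
    # Doubling time implied by 30% annual growth in stored data.
    print(f"storage: {doubling_time_years(0.30):.2f} years to double")
    # Annual growth implied by the doubling times quoted for processing
    # capacity (14 and 10 months) and Internet capacity (20 months).
    for label, months in [
        ("general-purpose computers", 14),
        ("application-specific computers", 10),
        ("Internet capacity", 20),
    ]:
        print(f"{label}: {annual_growth_from_doubling(months):.0%} per year")
```

The two functions are inverses of each other, so a doubling time of 12 months corresponds to 100% annual growth and vice versa.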

Big Data Economics Definition

  • Definition 20.1 Big Data Economics: Study of how big data is turned into economic value by processing digital data to detect hidden information for business.
  • Big data analysis applies data analysis methods to determine user behavior and uncover patterns. The term does not primarily refer to the size of the data set; rather, the data set must be large enough to contain hidden information that big data analysis techniques are needed to extract.

AI, Machine Learning, Expert Systems, and Data Mining

  • Artificial Intelligence (AI): Defined as a system’s ability to interpret external data correctly, learn from it, and use these learnings to achieve specific goals through flexible adaptation.
  • Big data analysis uses tools developed for AI, such as advanced search algorithms, image analysis, learning algorithms, trading algorithms, and trend predictions.
  • Machine Learning: Uses computer algorithms that are automatically updated and modified as new information and experience are gathered. Useful for complex algorithms, like spam filters and navigators for trucks in automated warehouses.
  • Expert Systems: Consist of a knowledge base and inference algorithms. The knowledge base is updated by external input and machine learning algorithms. The inference algorithms use if-then rules for scenario analysis and decisions under uncertainty. Expert systems are used to manage business operations and customer relations.
  • Data Mining: Refers to methods for discovering patterns and dependencies in complex data sets, including previously unknown patterns. It uses AI, machine learning, statistics, mathematical inference, and database management.
  • Causality analysis is threefold:
    • Determine if two variables depend on each other and, if so, which is the cause and which is the effect.
    • Determine if two variables that do not directly depend on each other are correlated because of a common cause.
    • Determine if the variables are independent and the correlation is accidental.
  • Correlation measures the strength of a linear relationship. Two variables can be strongly related in a nonlinear way and still have a correlation of zero.
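
The point that a strong nonlinear relationship can coexist with zero correlation can be checked on a small example. The sketch below is an illustration, not part of the source: it computes the Pearson correlation coefficient by hand for a linear and a quadratic dependence on the same symmetric set of x values. The data are made up for the example.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative data: x values symmetric around zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2 * x + 1 for x in xs]    # y depends linearly on x
quadratic = [x ** 2 for x in xs]    # y is fully determined by x, but not linearly

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)
print(f"linear: r = {r_linear:.3f}, quadratic: r = {r_quadratic:.3f}")
```

In the quadratic case y is completely determined by x, yet the positive and negative halves cancel in the covariance, so the coefficient vanishes.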

Characteristics of Big Data

  • Big data is defined by the four Vs:
    • Volume: The amount of digital data available for analysis.
    • Variety: The richness of data categories, evolving from structured to unstructured data like text, location data, video, images, and social media activity.
    • Velocity: The speed at which data is generated and processed, often in real-time.
    • Veracity: The exactness of the data, considering biases, noise, inaccuracies, and irregularities.
  • Data is generated by both people and machines. People produce content such as photos, text, videos and music. Machines produce sensor data, medical images, surveillance videos, and system updates.
  • Big data analytics converts raw data into meaningful information used in marketing, business planning, behavioral control, trend analysis, and statistics.

Use of Big Data

  • Data abundance creates new business opportunities, leading to new products, markets, and revenue streams.
  • Data management and analytics have become new industries.
  • Some of the new opportunities offered by big data are described next. In Sect. 20.4, we will come back to questions concerning ethics, governmental control, and violation of personal integrity and privacy.
  • Marketing: Targeted and personalized marketing through social media, mobile app usage, web search monitoring, and bank card transaction analysis.
  • Health Care: Computer-aided diagnostics, matching symptoms with possible diseases, interpretation of medical images, and handling complex data sets in medical research.
  • Algorithmic Financial Trading: Fast computer algorithms react to market changes in microseconds to buy or sell stocks or currencies.
  • Government and Public Services: Statistics, monitoring, and improvement of public services. Storage of biometric data.
  • Insurance: Predict variations in life expectancy, health costs, and costs of natural disasters using public and private databases.
  • Retailers: Personalized marketing, logistics, and administrative purposes using customer data from bank and membership cards.
  • Data Brokers: Processing and selling data bought from social networks, retailers, and app owners to other organizations.
  • Electronic Media: Targeted advertisements, editorials, and articles based on customer data. Detection of changes in user behavior and prevention of customer churn.
  • Science: Used in large scientific experiments like the Large Hadron Collider, gravitational wave detectors, neutrino detectors, and astronomical radio telescopes. Also used in sports sciences to determine the effect of training, diet, and body functions measured by sensors.
  • Data Illiteracy: A concern is that management neglects the benefits of big data technologies because of data illiteracy.
    • Big data is a commodity for Netflix in its business operations.
    • Netflix gathers and stores data from its over 180 million users to discover user behavior and viewer patterns.

Abuse of Big Data

  • Big data uses personal data: Because personal data may be sensitive and contain private information that the subject does not want to share, there are several legal frameworks that big data systems need to adhere to.
  • Big data needs to adhere to legal frameworks like the European General Data Protection Regulation (GDPR) which limits personal data harvesting.
  • Article 12 of the United Nations Universal Declaration of Human Rights protects against arbitrary interference with privacy.
  • It's easy to violate human rights with big data, and difficult to prosecute violators.
  • Ownership of personal data is a political question.
  • Despite regulations and legislation, personal data is used for unethical purposes.
  • Clandestine Operations: Intelligence and security agencies use big data for clandestine information collection and analysis. Examples: the NSA-led PRISM program and the ECHELON project. The biggest problem with programs like PRISM and ECHELON is that they are not under democratic control and can be misused by the government to control and manipulate the population.
  • Metadata, Content, and Privacy: Data protection concepts include metadata, content, and privacy. Metadata includes identities of sender/origin and receiver/destination, URLs identifying the type of content, type of message (WWW message, email, file transfer, VoIP, streaming service, etc.), protocol details (IP, UDP, TCP, and tunneling headers, service initiation protocols, encryption method, etc.), length of the message, and the time the message was sent or intercepted.
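
The split between metadata and content can be made concrete as a data structure. The sketch below is a hypothetical model, not a real interception or logging format: the class names, field names, and example values are assumptions chosen to mirror the metadata items listed above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MessageMetadata:
    """Hypothetical record of the metadata items listed in the notes."""
    sender: str
    receiver: str
    message_type: str    # e.g. "email", "VoIP", "file transfer"
    protocol: str        # e.g. "TCP", "UDP"
    encryption: str      # encryption method, or "none"
    length_bytes: int
    timestamp: datetime  # time the message was sent or intercepted


@dataclass(frozen=True)
class Message:
    """A message is its metadata plus the content itself."""
    metadata: MessageMetadata
    content: bytes


def metadata_only(message: Message) -> dict:
    """Return the metadata as a plain dict, leaving the content out."""
    return asdict(message.metadata)


# Made-up example message.
msg = Message(
    metadata=MessageMetadata(
        sender="alice@example.org",
        receiver="bob@example.org",
        message_type="email",
        protocol="TCP",
        encryption="TLS",
        length_bytes=11,
        timestamp=datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
    ),
    content=b"hello world",
)
record = metadata_only(msg)
print(record)
```

Even with the content left out, such a record still shows who communicated with whom, when, and by what means, which is why metadata is treated as a privacy concern in its own right.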