Intro to Analytics 2.1.1

Discovery: In this initial phase, the data analytics team familiarizes itself with the business domain, examines relevant historical data, and assesses available resources such as people, technology, time, and data. The team frames the business problem as an analytics challenge and formulates initial hypotheses to test against the data.

Data preparation: This phase requires the establishment of an analytic sandbox, where the team can work with data and perform analytics throughout the project. Data extraction, transformation, and loading (ETL) processes are executed to prepare the data for analysis, and the team becomes thoroughly acquainted with the data.
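
As a rough illustration of the ETL step described above, the following minimal sketch extracts rows from a small CSV export, cleans them, and loads them into an in-memory SQLite database that stands in for the analytic sandbox. The data, the column names, and the function names (extract, transform, load) are all hypothetical; a real project would read from actual source systems and would likely use dedicated ETL tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real project would pull this from a source system.
RAW_CSV = """order_id,region,amount
1,north,120.50
2,South ,80
3,north,
4,EAST,42.25
"""

def extract(text):
    """Extract: read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        region = row["region"].strip().lower()
        cleaned.append((int(row["order_id"]), region, float(amount)))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a sandbox table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

# An in-memory database plays the role of the analytic sandbox here.
sandbox = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), sandbox)

totals = sandbox.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping the three steps as separate functions makes it easier to rerun or replace one step as the team learns more about the data.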

Model planning: In this phase, the team determines the methods, techniques, and workflows to be used during the subsequent model execution phase. The team explores relationships in the data, selects key variables, and identifies the most suitable models for the project.
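
One simple way to explore data relationships and shortlist key variables is to rank candidate variables by how strongly they correlate with the target. The sketch below computes the Pearson correlation by hand; the variable names and values are made up for illustration, and correlation is only one of many possible screening techniques, not a method prescribed by the lifecycle itself.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical target and candidate predictor variables.
sales = [10, 12, 15, 19, 24]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "staff_count": [3, 1, 4, 1, 5],
    "discount_pct": [0, 5, 0, 10, 5],
}

# Rank candidates by the absolute strength of their correlation with sales.
scores = {name: pearson(values, sales) for name, values in candidates.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

for name in ranked:
    print(f"{name}: r = {scores[name]:.3f}")
```

A high correlation only flags a variable as worth a closer look; it does not show that the variable causes changes in the target.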

Model execution: The team develops datasets for testing, training, and production purposes, builds and executes models based on the planning phase, and evaluates the need for more robust tools or environments for executing models and workflows.
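
The dataset development mentioned above can be sketched as a reproducible random split. This is a minimal example under stated assumptions: the 60/20/20 proportions, the function name split_dataset, and the use of a held-out third portion as a stand-in for production-like data are all illustrative choices, not requirements of the lifecycle.

```python
import random

def split_dataset(records, train_pct=60, test_pct=20, seed=42):
    """Shuffle records and split them into train, test, and hold-out parts.

    Percentages are integers; whatever remains after the train and test
    portions becomes the hold-out (production-like) set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return train, test, holdout

# Hypothetical dataset: 100 record identifiers.
records = list(range(100))
train, test, production_like = split_dataset(records)
print(len(train), len(test), len(production_like))
```

Fixing the random seed means every team member gets the same split, which makes model comparisons easier to reproduce.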

Communicate results: This phase involves determining the project's success or failure based on the criteria developed in the discovery phase. The team identifies key findings, quantifies the business value, and develops a narrative to summarize and communicate the results to stakeholders.

Operationalization: In the final phase, the team delivers reports, briefings, code, and technical documents. A pilot project may be implemented to test the models in a production environment, so that the team can confirm the results hold up in practice and demonstrate clear value to stakeholders.

Key Terms

  • data analytics lifecycle: a structured approach to tackling big data problems and data science projects, consisting of six phases that help teams derive actionable insights from data

  • big data: the vast amount of information collected, stored, and analyzed by businesses and organizations; descriptions vary between organizations and can include up to seven characteristics, but this course focuses on the main four: variety, velocity, veracity, and volume

  • variety: the diverse types of data, including structured (like spreadsheets), semi-structured (such as emails), and unstructured formats (like social media posts); big data comes from numerous sources, including text, images, videos, social media interactions, and sensor data

  • velocity: the speed at which data is produced, collected, and processed; in the context of big data, velocity refers to the need for quick analysis and decision-making based on the data gathered

  • veracity: the accuracy, reliability, and quality of the data collected and analyzed; ensuring data veracity is essential for gaining valuable insights and making informed decisions

  • volume: the sheer amount of data generated and handled by businesses; big data involves dealing with enormous quantities of data, ranging from terabytes to petabytes and beyond, which can be challenging in terms of storage and processing
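
To make the link between velocity and volume concrete, the back-of-the-envelope sketch below estimates how quickly a steady event stream accumulates data. Every figure in it (event rate, record size) is a hypothetical assumption chosen only to illustrate the arithmetic, and decimal (SI) units are used for terabytes and petabytes.

```python
# All figures below are hypothetical and only illustrate the arithmetic.
EVENTS_PER_SECOND = 50_000   # velocity: assumed event arrival rate
BYTES_PER_EVENT = 2_000      # assumed average size of one event record
SECONDS_PER_DAY = 24 * 60 * 60

TERABYTE = 10**12  # decimal (SI) terabyte
PETABYTE = 10**15  # decimal (SI) petabyte

# Volume generated per day at the assumed velocity.
bytes_per_day = EVENTS_PER_SECOND * BYTES_PER_EVENT * SECONDS_PER_DAY
terabytes_per_day = bytes_per_day / TERABYTE

# Time needed for the stream to accumulate one petabyte.
days_to_petabyte = PETABYTE / bytes_per_day

print(f"{terabytes_per_day:.2f} TB per day")
print(f"{days_to_petabyte:.1f} days to accumulate 1 PB")
```

Under these assumptions the stream produces several terabytes each day, which shows why storage and processing become challenges at big data scale.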