Introduction to Business Analytics: Chapters 1-2: Specify the Question and Obtain Data

Vocabulary

Business Value: All the items, events, and interactions that determine a company’s financial health. A common measure is long-term profitability.

Business Process: A coordinated, standardized set of activities conducted by both people and equipment to accomplish a specific business task.

Business Analyst: A data specialist who curates and uses data to help an organization make effective business decisions.

Data Overload: Access or exposure to too much data; prevents data from being properly synthesized and interpreted.

Analytics Mindset: The willingness and ability to specify which questions need to be addressed, find and extract pertinent data that might address those questions, analyze those data, and then report the results of business analytics to decision-makers.

Data: Raw numbers and facts that have little meaning on their own

Information: Data organized in a way that is meaningful to the user in context; data with context

Context: Setting, event, statement, or situation in which data can become more fully understood and evaluated

Information Value Chain: The events and processes from the collection of data to the compilation of information to an ultimate business decision.

Knowledge: Understanding or familiarity with information gained through learning

Decisions: Conclusions reached after consideration of knowledge gained

Data Scientist: A data specialist who knows how to work with, manipulate, and statistically test data

Business Analytics: Use of data to generate knowledge, draw conclusions, and address business questions

Marketing Analytics: Use of business analytics to measure and improve marketing performance

Financial Analytics: Use of business analytics to help a company measure, evaluate, and improve its financial performance

Operations Analytics: Use of business analytics to measure and improve the efficiency and effectiveness of the company’s operations

Accounting Analytics: Use of business analytics to evaluate financial performance and to address accounting questions

Relevant Data: Data directly or closely connected to the question being asked

Reliable Data: Data that reflect facts or truth with little or no bias

Data Integrity: Combined accuracy, validity, and consistency of data stored and used over time

Static Report: A report of results that is not continuously updated

Dynamic Report: A report that updates in real time and is delivered through a dashboard

Data Visualizations: Graphical representations of data which can reveal patterns in data and communicate findings

Exploratory Visualizations: Graphical representations that uncover patterns in data as part of descriptive or diagnostic analytics

Enterprise System: Business management software that integrates applications from throughout the business into one system

Relational databases: Databases that organize data into separate tables so that the data are complete and not redundant. They are composed of tables, fields, and records; primary and foreign keys relate the tables to each other

Big Data: Data sets that are too large and complex for businesses’ existing systems to capture, store, manage, and analyze.

Structured data: Organized data that fits neatly in a table or database

Unstructured data: Data without a predefined organization (e.g., free text or images)

Semi-structured data: Data that lack a tabular structure but may have tags or markers that explain what the data represent (e.g., XML or JSON)

Internet of Things: A network of physical objects, such as cars and watches, that have sensors connecting them to the internet and allowing for the exchange of data.

Data warehouse: A repository that allows a large amount of structured data to be integrated for reporting and data analysis.

Data lake: A repository for a large amount of both structured and unstructured data (internal and external) to be integrated for reporting and data analysis.

OLAP (Online Analytical Processing): Computing method that enables users to easily and selectively extract and query data for analysis from different points of view. All possible combinations are pre-calculated

Distributed Computing: Data stored and processed across multiple databases or machines rather than in a single location

Raw Data: Data that has not been processed, cleaned, or aggregated

Aggregated Data: Individual data points combined into subtotals (counts, sums, averages)

Categorical Data: Data categorized into groups that are represented by labels

Nominal Data: Categorical data that cannot be ranked but are instead summarized through counting, grouping, or proportion

Ordinal Data: Categorical data that can be ranked and sorted and can also be summarized through counting, grouping, or proportion

Numerical Data: Meaningful numbers that represent quantities

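Interval Data: Numerical data with no meaningful zero; differences between values are meaningful, but ratios are not (e.g., temperature in degrees Celsius)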

Ratio Data: The zero value is meaningful, so ratios can be calculated. Most numerical data is ratio data

Data Completeness: The data to be analyzed are fully extracted from the original source

Data Integrity (in the context of extraction): None of the data was manipulated or tampered with during extraction
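
The relational-database and aggregated-data entries above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The table names (customers, orders), the field names, and the sample rows are all hypothetical and chosen only for illustration; they do not come from the chapters.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Two tables (hypothetical): each customer is stored once, and orders point
# back to customers through a foreign key instead of repeating the name.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key: uniquely identifies a record
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    amount      INTEGER NOT NULL       -- whole dollars, to keep the sums exact
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Birch")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(101, 1, 250), (102, 1, 100), (103, 2, 75)])

# Raw data: one row per order. Aggregated data: counts and sums per customer.
totals = conn.execute("""
    SELECT c.name, COUNT(*), SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# An order that references a nonexistent customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (104, 99, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(totals)           # [('Acme', 2, 350), ('Birch', 1, 75)]
print(orphan_rejected)  # True
```

Storing the customer name in only one table is what keeps the data from being redundant, and the foreign key is what keeps the orders table from referring to customers that do not exist.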

Concepts

  • There is a lot of data available now.  

    • This can be good because it can offer valuable insights

    • This can be bad because it can result in data overload

  • Business Analysts are uniquely positioned to perform analysis because they

    • Understand the questions that a business is asking

    • Understand the nature and quality of the business’s data

    • Can act as an intermediary between management and data scientists, helping data scientists understand the business's needs and helping the business understand the data scientists' conclusions

  • The SOAR analytics model is useful for remembering the analytics mindset

    • Specify the question

      • More succinct and specific questions are better

    • Obtain the Data

      • Use relevant and reliable data

      • Obtain data ethically

    • Analyze the Data

      • Descriptive analytics

      • Diagnostic analytics

      • Predictive analytics

      • Prescriptive analytics

      • Adaptive/autonomous analytics

    • Report the Results

      • Report the results back to decision makers

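  • Relational Databases: Store data in separate tables that are related to each other through primary and foreign keys, so that the data are complete and not redundant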

  • Data lakes and data warehouses involve a trade-off: lakes can also store unstructured data, while warehouses are less likely to result in data overload

  • Data formats for analysis

    • Text Data is best for unstructured, text-based data

    • Tabular Data is best for structured data

  • Preparing Data for Analysis

    • Ensure Data Quality

    • Validate Data for Completeness and Integrity

    • Cleanse the Data

      • Trim functions remove white space (except single spaces between words)

      • Clean functions remove nonprintable characters (except white spaces)

    • Perform Preliminary Exploratory Analysis
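
The validation and cleansing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the function names (trim, clean, is_complete), the "amount" field, and the sample records are hypothetical. The clean function assumes that "nonprintable" means ASCII codes 0 through 31, which also covers tabs and line breaks; ordinary spaces are kept. The completeness check assumes that matching record counts and a matching control total are sufficient evidence of a full extraction.

```python
import re

def trim(text: str) -> str:
    """Remove leading/trailing spaces and collapse inner runs of spaces to one."""
    return re.sub(r" {2,}", " ", text).strip(" ")

def clean(text: str) -> str:
    """Drop nonprintable control characters (assumed: ASCII 0-31); keep spaces."""
    return "".join(ch for ch in text if ord(ch) >= 32)

def is_complete(source_rows, extracted_rows, amount_key="amount"):
    """Compare record counts and a control total between source and extract."""
    same_count = len(source_rows) == len(extracted_rows)
    same_total = (sum(row[amount_key] for row in source_rows)
                  == sum(row[amount_key] for row in extracted_rows))
    return same_count and same_total

# Cleansing: remove control characters first, then tidy the spacing.
raw_name = "  Acme \x07  Corp\x00 "
tidy_name = trim(clean(raw_name))

# Validation: a full extract passes, a truncated one does not.
source = [{"id": 1, "amount": 250}, {"id": 2, "amount": 100}]
full_extract = list(source)
short_extract = source[:1]

print(repr(tidy_name))                     # 'Acme Corp'
print(is_complete(source, full_extract))   # True
print(is_complete(source, short_extract))  # False
```

Running clean before trim matters here: removing a control character can leave two spaces side by side, which trim then collapses into one.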