Data Science for Business - A Multi-Disciplinary Vision
Why you need to understand Data Science
Data is coming into companies at remarkable speed and volume.
Information offers a range of opportunities to managers:
– make better predictions about the future
– identify the causes of certain events
– identify your customers’ wants or needs
– assess the chances that an initiative will succeed
– gain insight into factors affecting your industry or marketplace
– inform your decisions about anything from new product development to hiring choices
• How do you sort through it all and make sense of everything?
• You have some experts (data scientists and analysts) in house but …
• It is becoming a requirement that every decision maker have a basic understanding of data analytics. This means:
– Not becoming a data scientist
– Developing a clear understanding
– Asking the right questions
– Translating the results for your colleagues and other stakeholders in a way that convinces and persuades
Preliminary Definitions
• Data Science is a set of fundamental principles that guide the extraction of knowledge from data to support the decision-making process.
• The use of the term "science" in data science indicates that the methods are evidence-based and built on historical observations.
• Data science is the business application of machine learning, artificial intelligence, and other quantitative fields like statistics, visualization, and mathematics. It is an interdisciplinary field that extracts value from data. In the context of how data science is used today, it relies heavily on machine learning.
What is Data Science? - Associated Fields
The techniques used in the steps of a data science process and in conjunction with the term “data science” are:
• Descriptive statistics
• Exploratory visualization
• Dimensional slicing
• Hypothesis testing
• Data engineering
• Business intelligence
Associated Fields – Descriptive Statistics
Descriptive statistics: Computing the mean, standard deviation, correlation, and other descriptive statistics quantifies the aggregate structure of a dataset. These statistics are used in the exploration stage of the data science process.
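As a minimal sketch of the statistics named above, the following Python snippet computes a mean, a sample standard deviation, and a Pearson correlation using only the standard library. The `ad_spend` and `revenue` figures and the `pearson` helper are hypothetical, invented purely for illustration.

```python
import statistics

# Hypothetical monthly figures, invented for illustration only.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0]
revenue = [100.0, 118.0, 151.0, 178.0, 205.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = {
    "mean_revenue": statistics.mean(revenue),
    "stdev_revenue": statistics.stdev(revenue),  # sample standard deviation
    "corr_spend_revenue": pearson(ad_spend, revenue),
}
print(summary)
```

A correlation close to 1 here only says the two invented series move together; it does not by itself establish that spending causes revenue.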
Associated Fields – Exploratory Visualization
Exploratory visualization: Expressing data in visual coordinates enables users to find patterns and relationships in the data and to comprehend large datasets. Like descriptive statistics, visualizations are integral to the pre- and post-processing steps in data science.
Associated Fields – Dimensional Slicing
Dimensional slicing: Online analytical processing (OLAP) applications, which are prevalent in organizations, mainly provide information on the data through dimensional slicing, filtering, and pivoting. OLAP analysis is enabled by a database schema design in which the data are organized as dimensions (e.g., products, regions, dates) and quantitative facts or measures (e.g., revenue, quantity). With a well-defined database structure, it is easy to slice the yearly revenue by product, or by a combination of region and product. These techniques are extremely useful and may unveil patterns in data.
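The idea of slicing a measure along chosen dimensions can be sketched in a few lines of Python. This is not how an OLAP engine is implemented; it is only a small illustration, and the `facts` rows and the `slice_measure` helper are hypothetical names and data made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: dimensions (year, region, product) plus one measure (revenue).
facts = [
    {"year": 2023, "region": "EU", "product": "A", "revenue": 120},
    {"year": 2023, "region": "EU", "product": "B", "revenue": 80},
    {"year": 2023, "region": "US", "product": "A", "revenue": 200},
    {"year": 2024, "region": "EU", "product": "A", "revenue": 150},
    {"year": 2024, "region": "US", "product": "B", "revenue": 90},
]


def slice_measure(rows, dimensions, measure):
    """Sum `measure` for every distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Yearly revenue sliced by product, then by region and product together.
revenue_by_year_product = slice_measure(facts, ["year", "product"], "revenue")
revenue_by_year_region_product = slice_measure(
    facts, ["year", "region", "product"], "revenue"
)
print(revenue_by_year_product)
```

Changing the list of dimensions is all it takes to view the same measure from a different angle, which is the essence of slicing and pivoting.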
Associated Fields – Hypothesis Testing
Hypothesis testing: In confirmatory data analysis, experimental data are collected to evaluate whether there is enough evidence to support a hypothesis. There are many types of statistical tests, and they have a wide variety of business applications (e.g., A/B testing in marketing). In general, data science is a process in which many hypotheses are generated and tested based on observational data. Since data science algorithms are iterative, solutions can be refined at each step.
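To make the A/B testing example concrete, here is a minimal sketch of one common choice of test, a two-sided two-proportion z-test, written with the Python standard library. The conversion counts and the function name `two_proportion_z_test` are hypothetical; other tests may be more appropriate depending on the experiment design and sample size.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 2000 visitors, variant B 260 of 2000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the two variants truly converted at the same rate; the significance threshold itself remains a business and statistical judgment.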
Associated Fields – Data Engineering
Data engineering: Data engineering is the process of sourcing, organizing, assembling, storing, and distributing data for effective analysis and usage. Database engineering, distributed storage and computing frameworks (e.g., Apache Hadoop, Spark, Kafka), parallel computing, extract, transform, and load (ETL) processing, and data warehousing constitute data engineering techniques. Data engineering helps source and prepare data for data science learning algorithms.
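A toy version of an ETL step might look like the sketch below, which uses only Python's standard `csv` and `sqlite3` modules. The `RAW_CSV` content, the `orders` table, and the cleaning rule (drop rows whose amount cannot be parsed) are all assumptions made for illustration; production pipelines typically rely on dedicated frameworks such as those listed above.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or source system.
RAW_CSV = """order_id,region,amount
1, eu ,120.50
2,US,80
3,eu,not_available
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region codes and drop rows with unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumed cleaning rule: skip rows with a bad amount
        clean.append((int(row["order_id"]), row["region"].strip().upper(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT region, amount FROM orders ORDER BY order_id").fetchall()
print(loaded)
```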
Associated Fields – Business Intelligence
Business intelligence: Business intelligence helps organizations consume data effectively. It supports ad hoc querying of data without the need to write technical query commands, and it uses dashboards or visualizations to communicate facts and trends.
Data Processing and “Big Data”
• Data Engineering and data processing are critical to support data science
• But: data engineering is not data science
• Big Data: data sets that are too large for traditional data processing systems.
• Big data is often characterized by the 3Vs: Volume, Velocity, and Variety.
• Big Data Technologies are used to process and handle big data, and include pre-processing prior to implementing data mining techniques.
How are Big Data & Analytics actually used?
• Better understanding & targeting customers
• Understanding & optimizing business processes
• Improving security and law enforcement
• Improving health
• Improving and optimizing cities and countries
• Improving sport performances