Big Data
It is very large and complex, so it can be hard to process using standard techniques → this is why we use algorithms, data analysis, and data mining
Data is constantly being collected, but it is not always utilized (common misconception: if data is being collected, it’s being utilized)
Usable Data
Data can be used regardless of whether it’s informative or useful
You can get info out of it
Do the means to process or analyze the data exist? (you can collect data and just not have the proper technology to process it yet, making it unusable)
Not always accurate; falls within a range of possibility
Just because something is usable doesn’t make it useful
Useful data
Does someone want to use it (adds value)?
Allows the user to accomplish a task or directive
Dependent on who wants the data
Directly related to what one can do with the data
Oftentimes things that are useful are also usable
Usefulness is open to interpretation and context
Data Processing: Data sets can be difficult to process if...
Data is incomplete
Data is invalid
They need to combine data sources
They need to clean data (finding variants of the same value, such as abbreviations, spellings, and capitalization, and replacing them with the same word)
Bias in the source or program
Size - very big sets are hard to process and take a lot of time (here a parallel system may be a solution)
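A minimal sketch of the cleaning step described above, assuming a small hand-written lookup table (CANONICAL and clean_city are hypothetical names, and the city values are made up for illustration). It maps abbreviations, spelling variants, and capitalization differences onto one shared word:

```python
# Hypothetical lookup table: known variants (lowercased) -> one canonical word.
CANONICAL = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "la": "Los Angeles",
    "los angeles": "Los Angeles",
}


def clean_city(raw: str) -> str:
    """Return the canonical form of a value, or the trimmed input if unknown."""
    # Collapse extra whitespace and ignore capitalization before the lookup.
    key = " ".join(raw.split()).lower()
    return CANONICAL.get(key, raw.strip())


# Made-up records showing the same cities written in different ways.
records = ["NY", " new york", "N.Y.", "Los Angeles", "LA", "Boston"]
cleaned = [clean_city(r) for r in records]
print(cleaned)
```

Values that are not in the table (like "Boston" here) pass through unchanged, so real data would likely need a larger table or fuzzy matching.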
Spiders
A kind of program used to acquire information and process data
Descriptive Analysis
Summarizes and describes data
Doesn't provide any conclusions yet
Data visualization is often used to present information in the form of graphs, tables, and charts
Data may be used to help an organization with future plans
Answers the question “What happened?”
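A small sketch of what "summarizes and describes data" can look like in practice, using Python's built-in statistics module. The daily_sales numbers are made up for illustration. Note that the output only describes what happened; it draws no conclusions:

```python
import statistics

# Made-up daily sales figures, used only for illustration.
daily_sales = [120, 135, 128, 150, 142, 138, 160]

# Each entry describes the data set; none of them explains or predicts anything.
summary = {
    "count": len(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
    "stdev": round(statistics.stdev(daily_sales), 2),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```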
Descriptive Analysis Advantages
Reveals previously hidden patterns
Helps businesses communicate information among departments and people outside the company
Descriptive Analysis Disadvantages
Cannot tell you anything about relationships between variables or about cause and effect
Can do the math but doesn’t draw conclusions from it
Predictive Analysis
Looks at current and historical data patterns to see if they’re going to show up again
This type of data analysis makes predictions
Data may be used to allow an organization, business, or investor to adjust and rework their future plans
Answers the question “What might happen in the future?”
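One simple way to sketch this idea is to fit a straight-line trend to historical data and extend it one step ahead. This is only an illustration under assumptions: the months and units values are made up, fit_line is a hypothetical helper, and real predictive models are usually more involved than a single trend line:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: units sold in months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
units = [100, 110, 125, 130, 145, 150]

slope, intercept = fit_line(months, units)

# Extend the historical pattern one month into the future.
forecast_month_7 = slope * 7 + intercept
print(f"trend: {slope:.2f} units/month, forecast for month 7: {forecast_month_7:.1f}")
```

The forecast assumes the past pattern continues, which is exactly the weakness listed under the disadvantages: if behavior changes, the line has to be refit.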
Predictive Analysis Advantages
Key player in search advertising and recommendation engines
Can provide managers with tools to influence upselling, manufacturing optimization and even new product development
Predictive Analysis Disadvantages
Critics argue computers fail to consider all variables even when they have sufficient data
Customer behavior is bound to change with time, so the model would need to be repeatedly updated
Prescriptive Analysis
Using data to determine an optimal path or actions
Statistically validated (shown to hold up) against relevant factors
Gives recommendations for the future
Answers the question “What should we do next?”
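A toy sketch of "determining an optimal action": list the possible scenarios, score each action in each scenario, and recommend the action with the best expected result. Every scenario, probability, and payoff below is made up, and expected_payoff is a hypothetical helper. Real prescriptive systems typically use much richer simulations or optimization:

```python
# Hypothetical scenarios with made-up probabilities (they sum to 1).
scenarios = {"demand_drops": 0.2, "demand_steady": 0.5, "demand_spikes": 0.3}

# Hypothetical payoff (profit in arbitrary units) of each action in each scenario.
payoffs = {
    "cut_inventory": {"demand_drops": 50, "demand_steady": 40, "demand_spikes": 10},
    "hold_inventory": {"demand_drops": 20, "demand_steady": 60, "demand_spikes": 50},
    "add_inventory": {"demand_drops": -30, "demand_steady": 50, "demand_spikes": 90},
}


def expected_payoff(action: str) -> float:
    """Probability-weighted average payoff of one action across all scenarios."""
    return sum(prob * payoffs[action][name] for name, prob in scenarios.items())


# The recommendation is the action with the highest expected payoff.
best_action = max(payoffs, key=expected_payoff)

for action in payoffs:
    print(f"{action:>15}: {expected_payoff(action):.1f}")
print("recommended:", best_action)
```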
Prescriptive Analysis Advantages
By simulating and running different scenarios of sudden shifts, you can find the best way to respond to the shifts quickly
Reduces risk and minimizes fraud
Prescriptive Analysis Disadvantages
Requires large amounts of data
Results aren’t always accurate
High computing power is required
Not completely reliable for long term solutions
Data Mining
Process of sorting through large data sets to identify patterns
These patterns can help solve business problems
It’s a crucial part of business as it helps with strategizing and managing
It can even detect fraud and reduce risk
Data Mining Strategies Used
Association rules: searches for a relationship between variables → provides additional value within the data set as it links data
Classification: assigns objects to predefined classes. The classes describe the similarities between data points → allows for better summarization and categories
Clustering: similar to classification, but the groups are not predefined; it groups data points by their similarities and separates them by their differences. It can provide more general topics whereas classification is more specific
Predictive Analysis: uses historical information to build graphical or mathematical models to forecast future outcomes. This overlaps with regression analysis.
Anomaly/outlier detection: identifies rare or unusual events or items that differ significantly from standard patterns
Regression: used to predict a range of numeric values in a dataset. It is sometimes seen as a line of best fit, which the actual data can be compared against
Summarization: used to find a compact description of a dataset → provides a general categorization
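Two of the strategies above can be sketched in a few lines. The example below is only an illustration: the baskets and amounts are made up, and support, confidence, and outliers are hypothetical helper names. The first part measures an association rule ("people who buy bread also buy butter"); the second flags an outlier using distance from the mean in standard deviations, which is one common approach among many:

```python
import statistics

# Made-up shopping baskets for the association-rule part.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(items: set) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent: set, consequent: set) -> float:
    """Of the baskets containing `antecedent`, the fraction also containing `consequent`."""
    return support(antecedent | consequent) / support(antecedent)


def outliers(values, threshold=2.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]


rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
print(f"bread -> butter: support={rule_support:.2f}, confidence={rule_confidence:.2f}")

# Made-up transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 250]
print("outliers:", outliers(amounts))
```

The threshold of 2.0 is an arbitrary choice for this sketch; the right cutoff depends on the data and on how costly a missed anomaly is.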