Midterm preparation
Artificial Intelligence
Defined as the science and engineering of making intelligent machines. It is a branch of computer science that deals with the research and design of intelligent systems that take inputs from their environment and act on them as a human being would.
Logical Reasoning
Goals of AI.
AI aims at making computers capable of doing all the intelligent and sophisticated tasks that we humans can do.
Knowledge Representation
Goals of AI.
Make computers capable of describing objects. For example, describing a car that just violated the traffic norms.
Planning and Navigation
Goals of AI.
Making computers capable of traveling from Point X to Point Y. For example, a self-driving robot.
Natural Language Processing
Goals of AI.
Make computers capable of understanding and processing a language. For example, a web translator that translates one language to another.
Perception
Goals of AI.
Make computers capable of interacting with real-world objects through the senses of touch, hearing, smell, and sight.
Emergent Intelligence
Goals of AI.
Make computers capable of intelligence that is not explicitly programmed but is derived from other AI capabilities. The basic vision for this goal is to enable machines to exhibit emotional intelligence, moral reasoning, and more.
Data
Collection of facts which have not been processed and arranged in any specific manner.
Structured Data
Categories of Data.
Data that is organized in a fixed format, typically in rows and columns, making it easily searchable and analyzable.
Unstructured Data
Categories of Data.
Data that does not have a predefined format or structure, making it more complex to analyze.
Natural Language Data
Categories of Data.
Data consisting of human language. It is used to derive meaning and insights.
Machine-Generated Data
Categories of Data.
Data created by machines, sensors, or automated systems, often as a byproduct of operations or monitoring.
Graph-Based Data
Categories of Data.
Data represented in the form of graphs, where entities (nodes) and relationships (edges) are used to model connections and interactions.
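To make the node and edge idea concrete, here is a minimal sketch that stores a small graph as an adjacency list. The names and the friendship relationship are hypothetical and only serve as an illustration.

```python
# Hypothetical friendship graph: each pair is one edge between two nodes.
edges = [("Ana", "Ben"), ("Ben", "Cruz"), ("Ana", "Cruz"), ("Cruz", "Dee")]

# Build an adjacency list: each node maps to the set of nodes it connects to.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)  # undirected, so store both directions

# Look up every entity directly related to one node.
neighbors_of_cruz = sorted(adjacency["Cruz"])
print(neighbors_of_cruz)
```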
Audio, Video, and Images
Categories of Data.
Data in the form of multimedia content, encompassing sound, visual content, and video.
Streaming Data
Categories of Data.
Data that is continuously generated and delivered in real-time or near-real-time, often from various sources.
Statistics
Real World Application of Data.
Specification of a (mathematical) relationship between different variables
Evaluation
Real World Application of Data.
How well the model works.
Data Analytics
The process of examining data to uncover patterns, correlations, and insights.
Data Science
It is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge or insight from data in various forms.
Fraud Detection
Some Applications of Data Science
• Early detection is important
• Precision is important
• Real-time analytics
Recommender Systems
Some Applications of Data Science
The ability to offer unique personalized service
Text Analytics
can be defined as the process of collecting unstructured text from various sources and analyzing and extracting relevant information from it. It can also be used for transforming it into structured information that can then be used in various other ways.
Analytics on image data
Image recognition can be described as a process by which we can process images for identifying people, patterns, logos, objects, or places.
Data Collection
Data Science Process
Collect data from various sources
Data Cleaning and Preparation
Data Science Process
Clean and preprocess data to remove errors and inconsistencies
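A small sketch of what cleaning can look like in practice. The records, the field names, and the rules (trim whitespace, drop rows with a missing or non-numeric age, drop duplicates) are assumptions chosen for illustration; real cleaning rules depend on the dataset.

```python
# Hypothetical raw records with typical problems.
raw_records = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},        # missing age
    {"name": "Carol", "age": "abc"},   # invalid age
    {"name": " Alice ", "age": "34"},  # duplicate of the first record
]

def clean(records):
    """Trim names, keep only rows with a numeric age, and drop duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        name = rec["name"].strip()
        if not rec["age"].isdigit():   # rejects empty and non-numeric ages
            continue
        row = (name, int(rec["age"]))
        if row in seen:                # skip rows already kept
            continue
        seen.add(row)
        cleaned.append({"name": name, "age": row[1]})
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```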
Data Exploration and Preparation
Data Science Process
Analyze and understand data through summary statistics and visualization.
Model Building and Evaluation
Data Science Process
-Select the right machine learning algorithm.
-Train and evaluate the model using training and testing datasets.
-Assess model performance using metrics such as accuracy and precision.
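The assessment step can be sketched as follows for a binary classifier. The `actual` and `predicted` lists are made-up values standing in for test labels and model output; only the metric calculations are the point here.

```python
# Hypothetical test labels and model predictions (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# Accuracy: share of all predictions that were correct.
accuracy = (tp + tn) / len(pairs)
# Precision: share of positive predictions that were actually positive.
precision = tp / (tp + fp)

print(accuracy, precision)  # 0.625 0.6
```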
Deployment and Monitoring
Data Science Process
-Deploy the model into production systems.
-Continuously monitor the model's performance.
-Update and retrain the model as needed.
Data Set
is a collection of data with a defined structure.
Data point
(record, object, or example) is a single instance in the dataset. Each row in the dataset is a data point. Each instance has the same structure as the dataset.
Attribute
(feature, input, dimension, variable, or predictor) is a single property of the dataset. Each column in the dataset is an attribute. Attributes can be numeric, categorical, date-time, text, or Boolean data types.
Label
Is the special attribute to be predicted based on all the input attributes. In the example, the interest rate is the output variable.
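The terms data point, attribute, and label can be seen together in a tiny table. This is a sketch with invented values; `credit_score` is an assumed input attribute, and `interest_rate` is used as the label to match the card above.

```python
# Hypothetical dataset: each dict is one row, each key is one column.
dataset = [
    {"credit_score": 500, "interest_rate": 7.3},
    {"credit_score": 600, "interest_rate": 6.7},
    {"credit_score": 700, "interest_rate": 5.9},
]

data_point = dataset[0]               # one row = one data point
attributes = list(data_point.keys())  # column names = attributes

# Separate the label (output) from the input attributes.
label_name = "interest_rate"
inputs = [{k: v for k, v in row.items() if k != label_name} for row in dataset]
labels = [row[label_name] for row in dataset]

print(attributes)
print(inputs[0], labels[0])
```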
Identifiers
Are special attributes that are used for locating or providing context to individual records. Often used as lookup keys to join multiple datasets. They bear no information that is suitable for building data science models and should, thus, be excluded for the actual modeling step.
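A sketch of both roles of an identifier: first as a lookup key to join two tables, then as a column that is dropped before modeling. The tables, the `borrower_id` key, and the values are hypothetical.

```python
# Two hypothetical tables that share the identifier "borrower_id".
borrowers = [
    {"borrower_id": 1, "credit_score": 500},
    {"borrower_id": 2, "credit_score": 700},
]
loans = [
    {"borrower_id": 1, "interest_rate": 7.3},
    {"borrower_id": 2, "interest_rate": 5.9},
]

# Index one table by the identifier so it can be used as a lookup key.
loans_by_id = {loan["borrower_id"]: loan for loan in loans}

# Join: merge each borrower with the loan that has the same identifier.
joined = [
    {**b, **loans_by_id[b["borrower_id"]]}
    for b in borrowers
    if b["borrower_id"] in loans_by_id
]

# Drop the identifier before modeling, since it carries no predictive signal.
model_rows = [{k: v for k, v in row.items() if k != "borrower_id"} for row in joined]
print(model_rows)
```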
Sampling
Is a process of selecting a subset of records as a representation of the original dataset for use in data analysis or modeling.
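One common way to sample is simple random sampling without replacement, sketched below with the standard library. The record values, the sample size of 10, and the seed are arbitrary choices for illustration; other schemes such as stratified sampling are not shown.

```python
import random

# Hypothetical dataset of 100 record IDs.
records = list(range(1, 101))

# A seeded generator makes the sample reproducible between runs.
rng = random.Random(42)

# Draw 10 distinct records as a representative subset.
sample = rng.sample(records, k=10)
print(sample)
```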
Correlation
A statistical indicator of the relationship between variables
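The card does not name a specific indicator. A widely used one is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below assumes that choice, and the study-hours data is invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson(hours, scores)
print(round(r, 3))  # close to 1: strong positive relationship
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one. A high value alone does not show that one variable causes the other.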
Causation
Means that changes in one variable bring about changes in the other.
Descriptive statistics
Refers to the study of the aggregate quantities of a data set. These measures are some of the commonly used notations in everyday life.
Median
Is the value of the central point in the distribution. It is calculated by sorting all the observations from small to large and selecting the mid-point observation in the sorted list. If the number of data points is even, then the average of the middle two data points is used.
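The procedure described above can be sketched directly, covering both the odd and the even case. The function name and sample values are illustrative.

```python
def median(values):
    """Sort the observations and return the middle one (or the mean of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of middle two

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```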
Mode
The most frequently occurring observation. Data points in a dataset may repeat, and the mode is the value that repeats most often.
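A minimal sketch of finding the mode by counting occurrences, using made-up values. It returns a single mode and does not handle ties in any special way.

```python
from collections import Counter

# Hypothetical observations with repeated values.
values = [2, 3, 3, 5, 3, 2]

# Count how often each value occurs, then take the most common one.
counts = Counter(values)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 3 3
```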
Range
The difference between the maximum value and the minimum value of the attribute. This is simple to calculate and articulate but has shortcomings: it is severely affected by outliers and fails to consider the distribution of all other data points in the attribute.
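The sensitivity to outliers is easy to see with a small example. The values are invented; adding a single extreme observation changes the range drastically.

```python
def value_range(values):
    """Difference between the largest and smallest observation."""
    return max(values) - min(values)

values = [4, 5, 6, 5, 4]
with_outlier = values + [100]  # one extreme observation added

print(value_range(values))        # 2
print(value_range(with_outlier))  # 96
```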
Deviation
The variance and standard deviation measure the spread by considering all the values of the attribute.
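As a sketch, the variance is the average squared distance from the mean, and the standard deviation is its square root. This example assumes the population form (dividing by n); the sample form divides by n - 1 instead. The values are illustrative.

```python
import math

def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def std_dev(values):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

values = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(values))  # 4.0
print(std_dev(values))   # 2.0
```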