CS 103 - Data Science

Part 2

16 Terms

1

Descriptive statistics

As the name suggests, these assist in describing and understanding a dataset by providing a short summary of it. The most common types include measures of central tendency and measures of dispersion, among others.

2

Statistics

is a branch of mathematics that deals with collecting, organizing, and interpreting data. Using statistical concepts, we can understand the nature of the data, summarize the dataset, and identify the type of distribution that the data follows.

3

1. Measures of central tendency

2. Measures of variability (spread)

There are two types of descriptive statistics

4

measure of central tendency

describes the typical or central value of a dataset, giving a single number that summarizes the entire set of measurements. The common measures are the mean, the median, and the mode.

5

mean, or average

is a number around which the observed continuous values are distributed. It gives a single central value that summarizes the dataset. Mathematically, it is the sum of the values divided by the number of values in the dataset.
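As a minimal sketch of the "sum divided by count" definition, the snippet below computes the mean by hand and with Python's standard `statistics` module. The `values` list is a hypothetical dataset chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 6, 9]

# Sum of the values divided by the number of values.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean, library_mean)
```

Both results should agree, which is a quick way to check the definition against a trusted implementation.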

6

Median

Given a dataset sorted in ascending or descending order, the middle value, which divides the data into two equal halves. When the dataset has an even number of values, it is the average of the two middle values.
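The sort-then-take-the-middle idea can be sketched as follows. The helper name `manual_median` and both datasets are hypothetical, and the result is compared with `statistics.median` from the standard library.

```python
from statistics import median

def manual_median(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical datasets with an odd and an even number of values.
odd_values = [7, 1, 5, 3, 9]
even_values = [7, 1, 5, 3]

print(manual_median(odd_values), median(odd_values))
print(manual_median(even_values), median(even_values))
```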

7

Mode

The value that appears the greatest number of times in the dataset, that is, the value with the highest frequency. A dataset can have more than one such value when several values tie for the highest frequency. In the x dataset in the median example, it is 2 because 2 occurs twice in the set.
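One way to find the most frequent value is to count occurrences and keep every value that reaches the highest count. This is a sketch under assumptions: the helper name `manual_modes` and the two lists are hypothetical and are not the x dataset mentioned on the card. `statistics.multimode` (Python 3.8+) is used as a cross-check.

```python
from collections import Counter
from statistics import multimode

def manual_modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

# Hypothetical datasets: one clear winner, and one tie.
single_mode = [1, 2, 2, 3, 4]
two_modes = [1, 1, 2, 2, 3]

print(manual_modes(single_mode), multimode(single_mode))
print(manual_modes(two_modes), multimode(two_modes))
```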

8

Measures of dispersion

also known as a measure of variability. It is used to describe the variability in a dataset, which can be a sample or population. It is usually used in conjunction with a measure of central tendency, to provide an overall description of a set of data. A ——————- gives us an idea of how well the central tendency represents the data.

9

Standard deviation

In simple language, the square root of the average of the squared differences between each value in the dataset and the mean; that is, it measures how far the data is spread out from the mean.
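The calculation can be sketched step by step: take each value's difference from the mean, square it, average the squares, and take the square root. The `values` list is hypothetical. The card does not say whether a population or a sample is meant, so both conventions are shown (dividing by n and by n - 1) and compared with `statistics.pstdev` and `statistics.stdev`.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical values, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(values) / len(values)
squared_diffs = [(x - mean_value) ** 2 for x in values]

# Population standard deviation: divide by n.
population_sd = sqrt(sum(squared_diffs) / len(values))

# Sample standard deviation: divide by n - 1.
sample_sd = sqrt(sum(squared_diffs) / (len(values) - 1))

print(population_sd, pstdev(values))
print(sample_sd, stdev(values))
```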

10

Variance

The average of the squared differences between each value in the dataset and the mean; that is, it is the square of the standard deviation.
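A short sketch of this relationship, using a hypothetical `values` list: the average squared difference from the mean is computed by hand, checked against `statistics.pvariance`, and then compared with the square of `statistics.pstdev`. The population form (dividing by n) is assumed here; `statistics.variance` would give the sample form.

```python
from statistics import pstdev, pvariance

# Hypothetical values, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(values) / len(values)

# Average of the squared differences from the mean (population form).
manual_variance = sum((x - mean_value) ** 2 for x in values) / len(values)

# Squaring the standard deviation gives the variance.
squared_sd = pstdev(values) ** 2

print(manual_variance, pvariance(values), squared_sd)
```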

11

Skewness

In probability theory and statistics, a measure of the asymmetry of a variable's distribution about its mean. Its value can be positive, negative, zero, or undefined.

The sign tells us the direction of the skew: a positive value means a longer right tail, a negative value means a longer left tail, and a value near zero means the data is roughly symmetric.
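The card gives no formula, so the sketch below assumes one common definition: the third central moment divided by the second central moment raised to the power 1.5 (the population, or moment, coefficient of skewness). The function name and all three datasets are hypothetical. The function returns `None` when every value is identical, which is the case where the measure is undefined.

```python
def skewness(values):
    """Moment coefficient of skewness (population form, an assumed definition)."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n
    m3 = sum((x - mean_value) ** 3 for x in values) / n
    if m2 == 0:
        # All values are identical, so the ratio is undefined.
        return None
    return m3 / m2 ** 1.5

# Hypothetical datasets.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 10]
left_tailed = [-10, -3, -2, -2, -1]

print(skewness(symmetric))     # zero for perfectly symmetric data
print(skewness(right_tailed))  # positive: longer right tail
print(skewness(left_tailed))   # negative: longer left tail
```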

12

Kurtosis

A statistical measure of how heavily the tails of a distribution differ from those of a normal distribution. It can indicate whether a given distribution is prone to extreme values (outliers).
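As with skewness, the card gives no formula. The sketch below assumes the usual moment definition: the fourth central moment divided by the square of the second central moment. Under that definition a normal distribution scores 3, so subtracting 3 gives the "excess" value. The function name and both datasets are hypothetical.

```python
def kurtosis(values):
    """Fourth central moment over squared variance (population form, an assumed definition)."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n
    m4 = sum((x - mean_value) ** 4 for x in values) / n
    return m4 / m2 ** 2

# Hypothetical datasets: evenly spread values vs. one extreme value.
flat = [1, 2, 3, 4, 5]
with_outlier = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15]

for name, data in [("flat", flat), ("with_outlier", with_outlier)]:
    k = kurtosis(data)
    # Excess kurtosis compares the result with the normal distribution's 3.
    print(name, k, k - 3)
```

The dataset with a single extreme value should score well above 3, while the evenly spread one should score below 3.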

13

Mesokurtic, Leptokurtic, Platykurtic

Types of Kurtosis

14

Mesokurtic

The kind of distribution that any dataset following a normal distribution has. It has kurtosis around 3, which is the same as an excess kurtosis (kurtosis minus 3) around 0.

15

Leptokurtic

In this case, the distribution has kurtosis greater than 3 (positive excess kurtosis), and the fat tails indicate that the distribution produces more outliers than a normal distribution.

16

Platykurtic

In this case, the distribution has kurtosis less than 3 (negative excess kurtosis), and the tails are thinner than those of a normal distribution.
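The three types can be told apart by comparing a kurtosis value with the normal distribution's value of 3. This is only a sketch: the function name `classify_kurtosis` and the `tolerance` used to decide what counts as "around 3" are hypothetical choices, not a standard cutoff. The inputs are the well-known theoretical kurtosis values of three distributions.

```python
def classify_kurtosis(kurtosis, tolerance=0.1):
    """Label a distribution by its kurtosis (normal distribution = 3).

    The tolerance is a hypothetical cutoff for "around 3".
    """
    excess = kurtosis - 3
    if abs(excess) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"

# Theoretical kurtosis values of three familiar distributions.
print(classify_kurtosis(3.0))  # normal distribution
print(classify_kurtosis(6.0))  # Laplace distribution
print(classify_kurtosis(1.8))  # continuous uniform distribution
```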