Data Analysis Unit 1

75 Terms

1

What is the primary purpose of data visualizations?

To summarize data and communicate findings

2

When would you use a scatter plot in data analysis?

To show relationships between two numeric variables

3

What does a pie chart primarily illustrate?

Part of a whole

4

Which chart type is best used to show the distribution and outliers of a dataset?

Box Plot

5

In which scenario would you best use a histogram?

To show frequency distribution of continuous data

6

Comparing Test Scores

Evaluates performance across different classes

7

Analyzing salary ranges

Summarizes income distribution in a job sector

8

Summarizing survey responses

Provides insights into public opinion or preferences

9

Identifying outliers

Highlights values that fall far outside typical ranges

10

IQR

Interquartile range, the difference between Q3 and Q1

11

Skewness

A measure of the asymmetry of the data distribution

12

Outliers

Values that are significantly higher or lower than most of the data
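
Note: the IQR and outlier cards can be tied together in a short Python sketch. The 1.5 × IQR cutoff and the `iqr_outliers` helper are assumptions for illustration (the deck does not state a cutoff), and the quartiles are taken as medians of the lower and upper halves, which is only one of several conventions in use.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return (IQR, outliers) using median-of-halves quartiles and a k * IQR fence."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])          # median of the lower half
    q3 = median(data[(n + 1) // 2 :])    # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return iqr, outliers

# Made-up salaries (in thousands); one value sits far above the rest.
salaries = [38, 41, 43, 45, 47, 50, 52, 55, 120]
iqr, outliers = iqr_outliers(salaries)
print(iqr, outliers)
```

Changing `k` makes the rule stricter or looser; 1.5 is simply the value most box-plot tools default to.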

13

Box Plot

A graphical representation of the Five-Number Summary

14

Summarizes key aspects of data

Provides a quick overview of data distribution

15

Identifies center

Helps locate the median of the dataset

16

Useful for skewed distributions

Can describe data that isn’t symmetrically distributed

17

Visualizing with a Box Plot

Uses the Five-Number summary to illustrate data characteristics

18

Minimum

The smallest value in a dataset

19

First Quartile (Q1)

The median of the lower half of the dataset

20

Median (Q2)

The middle value separating the higher half from the lower half of the dataset

21

Maximum

The largest value in a dataset
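
Note: a minimal Python sketch of the Five-Number Summary described by the cards above. It follows the deck's wording for Q1 (the median of the lower half) and assumes, by symmetry, that Q3 is the median of the upper half; the function name `five_number_summary` and the sample scores are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the middle position
    upper = data[(n + 1) // 2 :]    # skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [52, 61, 64, 70, 73, 75, 80, 84, 95]  # hypothetical test scores
print(five_number_summary(scores))
```

These five values are what a box plot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.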

22

Mean

Average of data

23

Median

Middle value of data

24

Skewness

Asymmetry of data distribution

25

Mode

Most frequently occurring value

26

Left Skew

Mean < Median

27

Right Skew

Mean > Median

28

Symmetric Distribution

Mean = Median

29

Extreme Left Skew

Mean << Median (mean much lower than median)
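
Note: the mean-versus-median rules on the skew cards can be checked with a few lines of Python. This is a rough heuristic sketch, not a formal skewness statistic; the helper `skew_direction` and the datasets are made up for illustration.

```python
from statistics import mean, median

def skew_direction(values):
    """Guess the skew direction by comparing the mean with the median (rule of thumb only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right skew"
    if m < med:
        return "left skew"
    return "symmetric"

incomes = [30, 32, 35, 36, 40, 41, 200]   # one very high value pulls the mean up
exam_marks = [1, 60, 62, 65, 66, 68, 70]  # one very low value pulls the mean down
balanced = [1, 2, 3, 4, 5]

for name, data in [("incomes", incomes), ("exam_marks", exam_marks), ("balanced", balanced)]:
    print(name, mean(data), median(data), skew_direction(data))
```

The comparison works because the mean is pulled toward extreme values while the median is not.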

30

Which of the following measures is NOT considered a measure of center?

Range

31

What is the primary purpose of descriptive statistics?

To summarize and describe data sets

32

What does a confidence interval represent?

A range of values that likely contains the population parameter
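
Note: one way to see a confidence interval concretely is to compute one for a sample mean. The sketch below assumes a normal approximation (critical value from the standard normal distribution); for small samples a t-based interval is usually more appropriate. The function name and the height data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(n)             # critical value times standard error
    center = mean(sample)
    return center - margin, center + margin

heights_cm = [162, 165, 168, 170, 171, 173, 175, 178, 180, 183]  # made-up sample
low, high = mean_confidence_interval(heights_cm)
print(round(low, 1), round(high, 1))
```

Raising `confidence` widens the interval, and a larger sample narrows it.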

33

In hypothesis testing, a one-tailed test is used for what purpose?

To test a claim in one specific direction

34

Which sampling method ensures that every individual in the population has an equal chance of being selected?

Simple Random Sampling

35

In which scale can all three measures of central tendency (mean, median, mode) be used effectively?

Ratio Scale

36

Which scale of measurement is used when data can be categorized and ordered, but the intervals between categories are not equal?

Ordinal Scale

37

What type of data measurement is ‘gender’ classified as?

Nominal Scale

38

Which of the following is an example of a Ratio Scale measurement?

Height of a person

39

Why is it important to understand scales of measurement in statistics?

To select appropriate statistical methods and ensure accuracy in research

40

Which measure of dispersion is most sensitive to outliers?

Range

41

Why is standard deviation often preferred over variance in data analysis?

It is expressed in the same units as the data, making it easier to interpret

42

What is the primary purpose of measures of data dispersion?

To describe how spread out data values are

43

What does the range of a dataset indicate?

The difference between the maximum and minimum values
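
Note: the dispersion cards above (range, variance, standard deviation) can be illustrated with Python's standard library. The dataset is made up, and the population formulas (`pvariance`, `pstdev`) are used here as an assumption; sample formulas (`variance`, `stdev`) divide by n - 1 instead.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, e.g. in centimeters

data_range = max(data) - min(data)   # maximum minus minimum
variance = pvariance(data)           # in squared units (cm^2)
std_dev = pstdev(data)               # back in the original units (cm)

# A single extreme value changes the range dramatically.
range_with_outlier = max(data + [40]) - min(data + [40])

print(data_range, variance, std_dev, range_with_outlier)
```

Because the standard deviation is the square root of the variance, it is expressed in the same units as the data, which is why it is usually easier to interpret.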

44

What is a major advantage of using non-probability sampling methods?

It is convenient and cost-effective

45

Which of the following is a type of non-probability sampling?

Snowball sampling

46

Which situation is most appropriate for using convenience sampling?

Surveying students in a classroom setting

47

What is a key limitation of non-probability sampling?

Results may not generalize to the entire population

48

What is non-probability sampling?

Selecting participants without giving every member of the population a known, non-zero chance of being selected

49

What strategy is recommended to reduce measurement error in surveys?

Use clear, pretested questions and consistent data collection procedures (adding respondents reduces sampling error, not measurement error)

50

What is the primary cause of sampling error?

Natural variability between a sample and the population

51

Which of the following is NOT a type of non-sampling error?

Sampling error

52

How can sampling error be minimized?

By increasing the sample size

53

Which error occurs when respondents fail to participate in a survey?

Nonresponse Error

54

When using a z-score, a larger absolute z-value indicates that the sample proportion is farther from the population proportion, which affects the probability calculation

True

55

To reliably use a normal approximation for sample proportions, random sampling is not necessary as long as the sample size is large enough

False

56

The shape of the sampling distribution of sample proportions will always be normal regardless of sample size or population proportion

False

57

In the context of estimating probabilities using sampling distributions for proportions, what is the primary purpose of calculating the z-score?

To standardize the sample proportion for use with the standard normal distribution

58

Why is it important to ensure that the sampling distribution is approximately normal before finding probabilities?

Because the properties of the normal distribution are used to calculate probabilities
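
Note: the z-score cards for sample proportions can be worked through in Python. This is a sketch under stated assumptions: the formula z = (p̂ - p) / sqrt(p(1 - p) / n) is the standard one, while the "at least 10 expected successes and failures" check is a common rule of thumb that the deck does not specify. The function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_score(p_hat, p, n):
    """Standardize a sample proportion p_hat given population proportion p and sample size n."""
    standard_error = sqrt(p * (1 - p) / n)
    return (p_hat - p) / standard_error

def normal_approximation_ok(p, n, threshold=10):
    """Rule-of-thumb check (an assumption): expect at least `threshold` successes and failures."""
    return n * p >= threshold and n * (1 - p) >= threshold

p, n, p_hat = 0.5, 100, 0.6
z = proportion_z_score(p_hat, p, n)
upper_tail = 1 - NormalDist().cdf(z)  # P(sample proportion >= p_hat) under the approximation
print(normal_approximation_ok(p, n), round(z, 2), round(upper_tail, 4))
```

The size check only covers the shape of the sampling distribution; the sample still has to be random for the probability to mean anything.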

59

What does the Central Limit Theorem (CLT) state about the sampling distribution of the sample mean for a large enough sample size?

It is approximately normal even if the population distribution is not

60

Hypothesis Testing

CLT allows us to use normal approximation to determine the significance of results

61

Confidence intervals

Provides a range around the sample estimate that reflects uncertainty due to sample variability

62

Approximation of probabilities

Helps in estimating probabilities for sample statistics using the normal distribution

63

Data visualization

Plays a critical role in creating visuals to represent raw data

64

CLT

States that the distribution of sample means approaches a normal distribution as sample size increases

65

Standard Error (SE)

Calculated as the standard deviation divided by the square root of the sample size
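
Note: the CLT and Standard Error cards can be checked with a small simulation. The sketch below assumes an exponential population (clearly skewed, with mean 1 and standard deviation 1), a sample size of 50, and 2,000 repeated samples; all of these choices, and the helper `sample_means`, are illustrative only.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def sample_means(draw, sample_size, num_samples):
    """Draw `num_samples` samples of `sample_size` values each and return their means."""
    return [mean([draw() for _ in range(sample_size)]) for _ in range(num_samples)]

rng = random.Random(0)                 # fixed seed so the run is repeatable
draw = lambda: rng.expovariate(1.0)    # skewed population: mean 1, standard deviation 1

n = 50
means = sample_means(draw, n, 2000)

observed_se = pstdev(means)            # spread of the simulated sample means
theoretical_se = 1.0 / sqrt(n)         # SE = sigma / sqrt(n)
print(round(mean(means), 3), round(observed_se, 3), round(theoretical_se, 3))
```

The simulated spread should land close to sigma / sqrt(n), and a histogram of `means` would look roughly bell-shaped even though the population is not.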

66

Sampling Distribution

The probability distribution of a statistic obtained through a large number of samples drawn from a specific population

67

Population proportion

The true proportion of a certain characteristic in the entire population

68

Which measure of central tendency can be used for categorical data?

Mode
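
Note: a quick Python illustration of why the mode works for categorical data. The survey responses are made up; `mode` and `multimode` come from the standard `statistics` module.

```python
from statistics import mode, multimode

# Hypothetical answers to "How do you commute?" (nominal data: no order, no arithmetic).
responses = ["bus", "car", "bike", "car", "bus", "car"]

most_common = mode(responses)                     # the single most frequent category
tied_modes = multimode(["a", "b", "a", "b"])      # all categories tied for most frequent
print(most_common, tied_modes)
```

A mean or median of these labels would be meaningless, since categories cannot be summed or ranked.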

69

In which scenario would you prefer to use the median over the mean?

When the data is skewed or contains outliers

70

Which measure of central tendency is calculated by summing all values and dividing by the number of values?

Mean

71

What is a key benefit of using stratified sampling?

It ensures representation across key subgroups of the population

72

Which of the following is an example of systematic sampling?

Selecting every 5th participant from a list of volunteers

73

In which scenario might cluster sampling be most advantageous?

For research involving large, geographically dispersed populations

74

What is the primary characteristic of probability sampling?

Every member of the population has a known, non-zero chance of being selected

75

Which method of probability sampling is most effective for ensuring subgroup representation?

Stratified sampling
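
Note: the probability sampling methods in this deck (simple random, systematic, stratified) can be sketched in a few lines of Python. The function names, the proportional allocation used for the strata, and the example populations are all assumptions for illustration.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Take every `step`-th member, beginning at index `start`."""
    return population[start::step]

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from each stratum (dict: stratum name -> list of members)."""
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one member per stratum
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(42)
volunteers = list(range(1, 21))  # participant IDs 1..20

srs = simple_random_sample(volunteers, 5, rng)
every_fifth = systematic_sample(volunteers, step=5, start=4)  # 5th, 10th, 15th, 20th
strata = {"first_year": list(range(60)), "senior": list(range(100, 140))}
by_stratum = stratified_sample(strata, 0.1, rng)
print(srs, every_fifth, {name: len(picked) for name, picked in by_stratum.items()})
```

Stratifying guarantees that each subgroup appears in the sample, which a simple random sample of the same size cannot promise.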