Data Types, Descriptive Statistics, and Basic Data Analysis Concepts

75 Terms

1

Which of the following describes categorical data

Can be either nominal or ordinal

2

In a perfectly symmetric, unimodal distribution

Mean = Median = Mode

3

A researcher wants to test if average exam scores of one class differ from the national average. Which test applies

One-sample t-test

4

Which of the following best describes structured data

Data organized in rows and columns

5

Which of the following is a numerical variable

Temperature in Celsius

6

Temperature in Fahrenheit is measured on which scale

Interval

7

Which of the following is a floating-point number

3

8

In the same list, the value 10 is

Integer

9

Metadata refers to

Data about data

10

Which of the following is a difference between metadata and big data

Metadata is small and descriptive, big data is large and complex

11

The difference between data and information is mainly that

Information is organized and meaningful, while data is raw and unorganized

12

What is the main goal of data cleaning

To ensure the dataset is accurate and consistent

13

What is a common approach to handling outliers

Investigate context and decide whether to keep, remove, or transform

14

What is the best way to resolve inconsistent date formats

Convert all dates into a standard format
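
A minimal sketch of standardizing dates, assuming the dataset only contains the three input formats listed in `KNOWN_FORMATS` (a hypothetical list chosen for illustration). Note that a string such as 03/05/2024 is ambiguous; this sketch assumes month/day/year.

```python
from datetime import datetime

# Assumed input formats for illustration; a real dataset may need others.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]

def to_iso_date(text):
    """Try each known format and return the date as YYYY-MM-DD, or None."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable values for manual review

raw = ["2024-03-05", "03/05/2024", "05.03.2024", "not a date"]
cleaned = [to_iso_date(d) for d in raw]
print(cleaned)  # ['2024-03-05', '2024-03-05', '2024-03-05', None]
```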

15

Which sampling method ensures every member of the population has an equal chance of being selected

Simple random sampling
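
As a rough illustration, simple random sampling without replacement can be sketched with Python's standard library. The population of 100 member IDs and the fixed seed are assumptions made only so the example is repeatable.

```python
import random

population = list(range(1, 101))       # hypothetical population of 100 member IDs
rng = random.Random(42)                # fixed seed only so the sketch is repeatable
sample = rng.sample(population, k=10)  # every member is equally likely to be drawn
print(sorted(sample))
```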

16

Which measure of spread is most influenced by extreme values

Range

17

What does standard deviation measure

The typical distance of data values from the mean (the square root of the average squared deviation)
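
A small worked sketch of the calculation, using made-up data. It computes the population version (divide by n) and the sample version (divide by n - 1) by hand and can be checked against Python's `statistics` module.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                    # illustrative values
mean = statistics.mean(data)                       # 5
squared_devs = [(x - mean) ** 2 for x in data]     # squared distances from the mean

pop_sd = math.sqrt(sum(squared_devs) / len(data))           # population SD
sample_sd = math.sqrt(sum(squared_devs) / (len(data) - 1))  # sample SD (n - 1)
print(pop_sd, round(sample_sd, 3))  # 2.0 2.138
```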

18

Which is NOT a measure of central tendency

Standard deviation

19

If a distribution has a longer tail on the right, it is

Positively skewed

20

If the mean is greater than the median, the distribution is likely

Positively skewed

21

Which function calculates the standard deviation for a sample in Excel

STDEV.S()

22

Which Excel function can be used to measure skewness of a dataset

SKEW()

23

If SKEW(data) returns a positive value, what does this indicate

Data is positively skewed
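
For intuition, here is a sketch of the adjusted Fisher-Pearson sample skewness, which is the formula Excel documents for SKEW(). The function name `sample_skewness` and both datasets are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]  # hypothetical data with a long right tail
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed) > 0)  # True: positive skew
print(abs(sample_skewness(symmetric)) < 1e-12)  # True: essentially zero skew
```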

24

The alternative hypothesis (Hₐ) represents

The claim we want to provide evidence for

25

Which correlation coefficient indicates the strongest relationship

r = −0.75
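
Strength depends on the absolute value of r; the sign only gives direction. A tiny sketch, where the list of candidate coefficients is a hypothetical set of answer choices:

```python
candidates = [0.10, -0.75, 0.60, 0.30]  # hypothetical answer choices
strongest = max(candidates, key=abs)    # compare by magnitude, ignoring sign
print(strongest)  # -0.75
```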

26

If X = temperature and Y = electricity use, a positive slope means

Higher temperatures are associated with higher electricity use

27

Categorical variables represent

Labels or groups without inherent numerical meaning

28

Nominal data is characterized by

Categories without a natural order

29

The main difference between interval and ratio data is

Interval data lacks a true zero point
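
One way to see why the missing true zero matters: ratios on an interval scale are not meaningful. The sketch below, with illustrative temperatures, compares a naive Fahrenheit ratio with the ratio after converting to Kelvin, a ratio scale.

```python
def fahrenheit_to_kelvin(f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, true zero)."""
    return (f - 32) * 5 / 9 + 273.15

f_low, f_high = 40.0, 80.0
naive_ratio = f_high / f_low  # 2.0, but 80 F is not "twice as hot" as 40 F
kelvin_ratio = fahrenheit_to_kelvin(f_high) / fahrenheit_to_kelvin(f_low)
print(naive_ratio, round(kelvin_ratio, 2))  # 2.0 1.08
```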

30

Big data is typically characterized by the four Vs. Which is NOT one of them

Validity

31

Which of the following correctly represents the steps in the SOAR analytic model

Specify, Obtain, Analyze, Report

32

A pivot table is best described as

A statistical tool that reorganizes and summarizes data in a spreadsheet or database to create a report

33

Which of the following best describes data

Raw numbers and facts with little meaning on their own

34

When raw data are organized in a way that is meaningful to the user, they become

Information

35

In analytics, why is context important

It determines the setting in which data can be better understood and evaluated

36

Which is NOT a method to handle missing data

Visualization

37

Which Excel function is commonly used to remove extra spaces from text

TRIM()
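
For comparison outside Excel, a rough Python equivalent of TRIM() might look like the sketch below. The helper name `trim_like_excel` is hypothetical, and the sketch assumes only ordinary space characters need handling.

```python
def trim_like_excel(text):
    """Drop leading/trailing spaces and collapse internal runs to one space."""
    return " ".join(part for part in text.split(" ") if part)

print(repr(trim_like_excel("  Data   Analysis  ")))  # 'Data Analysis'
```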

38

Which of the following best defines a population in statistics

The entire set of individuals or items of interest

39

What is a sample

A subset of the population used to make inferences

40

Which of the following is a parameter

Population mean

41

Which of the following statements is true

Parameters describe populations; statistics describe samples

42

The mean is defined as

The sum of values divided by the number of observations

43

The mode is

The most frequently occurring value

44

Which dataset has a larger spread

Mean = 50, SD = 15

45

Which measure of central tendency is most affected by extreme values

Mean
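
A quick sketch of this effect with hypothetical salary figures: adding one extreme value pulls the mean far more than the median, and leaves the mean above the median, as in a positively skewed distribution.

```python
import statistics

salaries = [40, 42, 45, 47, 50]  # hypothetical values, in thousands
with_outlier = salaries + [500]  # one extreme value added

print(statistics.mean(salaries), statistics.median(salaries))  # 44.8 45
print(round(statistics.mean(with_outlier), 1), statistics.median(with_outlier))  # 120.7 46.0
```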

46

Which of the following is most useful in identifying skewness

Comparison of mean and median

47

Which descriptive statistic is least reliable for skewed distributions

Mean

48

Which Excel function calculates the arithmetic mean of a dataset

AVERAGE()

49

Which function returns the middle value of an ordered dataset

MEDIAN()

50

Which function would you use to calculate the 25th percentile (Q1)

QUARTILE.EXC(array, 1)
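
For readers working in Python, a comparable calculation can be sketched with `statistics.quantiles`. The assumption here is that its "exclusive" method, which interpolates at position p(n + 1), corresponds to the exclusive quartile definition; the data are illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative values
# n=4 asks for quartile cut points; "exclusive" interpolates at position p * (n + 1)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 2.25 4.5 6.75
```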

51

What is the null hypothesis (H₀)

A statement of no effect or no difference

52

What does a p-value measure

Probability of observing results at least as extreme as the data, assuming H₀ is true

53

If p-value < α, the correct conclusion is

Reject H₀

54

If p-value > α, then

Fail to reject H₀

55

Which of the following is a two-tailed test

H₀: μ = 50, Hₐ: μ ≠ 50

56

Which test compares a sample mean to a known population mean when σ is unknown

One-sample t-test
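
A minimal sketch of the one-sample t statistic, t = (x̄ - μ₀) / (s / √n), where the sample standard deviation s stands in for the unknown σ. The scores and the reference mean of 70 are hypothetical; turning t into a p-value needs a t-distribution table or a statistics package, which this sketch leaves out.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, used because sigma is unknown
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

scores = [72, 75, 68, 80, 77, 74, 71, 79]  # hypothetical class scores
t_stat, df = one_sample_t(scores, 70)      # 70 is a hypothetical reference mean
print(f"t = {t_stat:.3f}, df = {df}")
```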

57

Which test compares means of two independent groups

Two-sample t-test

58

Which test checks differences in more than two group means

ANOVA
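
To make the idea concrete, here is a sketch of the one-way ANOVA F statistic: variation between group means divided by variation within groups. The helper `anova_f` and the three small groups are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # illustrative data
f_stat, df_b, df_w = anova_f(groups)
print(round(f_stat, 3), df_b, df_w)  # 13.0 2 6
```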

59

Which test is appropriate for paired data (before vs. after measurements)

Paired t-test

60

Which test examines whether two categorical variables are independent

Chi-square test of independence
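
A sketch of how the test statistic is built: expected counts under independence are (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The 2×2 table is made up for illustration.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[30, 10], [20, 40]]  # hypothetical counts
chi2, df = chi_square_statistic(observed)
print(round(chi2, 3), df)  # 16.667 1
```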

61

A doctor measures patient weights before and after a new diet plan. Which test applies

Paired t-test
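
The paired test reduces to a one-sample test on the before/after differences. A minimal sketch with hypothetical patient weights (in kg); it returns only the t statistic and degrees of freedom, not a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for the mean of after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

before = [82.0, 90.5, 77.3, 95.1, 88.4]  # hypothetical weights before the diet
after = [80.1, 88.0, 77.0, 91.6, 86.9]   # same patients after the diet
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")    # negative t: weights went down on average
```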

62

If r = 0, it means

No linear relationship exists between variables

63

A correlation coefficient of r = −0.25 suggests

Weak negative relationship

64

If two variables have r = 0.02, the relationship is

Almost no linear relationship

65

If all data points lie exactly on a straight line with positive slope, r =

1

66

High correlation between two variables means

A strong linear association exists

67

Which situation likely violates assumptions for correlation

Strong nonlinear relationship
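
The correlation cards above can be illustrated with a hand-written Pearson r. In this sketch (illustrative data), points on a straight line with positive slope give r = 1, while a perfect but nonlinear relationship (y = x² on symmetric x values) gives r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
line = [2 * x + 1 for x in xs]  # exact straight line, positive slope
parabola = [x ** 2 for x in xs]  # strong relationship, but not linear

r_line = pearson_r(xs, line)
r_parabola = pearson_r(xs, parabola)
print(round(r_line, 6), round(abs(r_parabola), 6))  # 1.0 0.0
```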

68

The main purpose of linear regression is to

Predict the value of a dependent variable using one or more independent variables

69

In the simple linear regression equation Y = a + bX + ε, the term b represents

Slope

70

The intercept in a regression line represents

The expected value of Y when X = 0

71

The slope coefficient tells us

The expected change in Y for a one-unit increase in X

72

If the regression equation is Ŷ = 20 + 3X, then when X = 5, predicted Y is

35
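
The arithmetic is 20 + 3 × 5 = 35. As a sketch, the same fitted line can be written as a small function (the name `predict` is hypothetical):

```python
def predict(x, intercept=20.0, slope=3.0):
    """Evaluate the fitted line Y-hat = intercept + slope * x."""
    return intercept + slope * x

print(predict(5))  # 35.0
```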

73

If R² = 0.80, this means

80% of the variation in Y is explained by the regression on X

74

A company predicts sales (Y) based on advertising spending (X). Which method should they use

Linear regression
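
A minimal sketch of simple linear regression by ordinary least squares, tying together the slope, intercept, and R² cards above. The advertising and sales figures are hypothetical, and the helper names `fit_line` and `r_squared` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: return (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

ad_spend = [1, 2, 3, 4, 5]     # hypothetical advertising spending (X)
sales = [25, 24, 31, 30, 35]   # hypothetical sales (Y)

intercept, slope = fit_line(ad_spend, sales)
print(round(intercept, 2), round(slope, 2))  # 21.2 2.6
print(round(r_squared(ad_spend, sales), 3))  # 0.824
```

Here the slope of 2.6 is read as the expected change in sales per one-unit increase in spending, and R² of about 0.82 as the share of variation in sales that the line accounts for.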

75

A researcher finds slope = 0.75 for predicting GPA from study hours. This means

Each additional hour studied is associated with a predicted GPA increase of 0.75 units