LSU ISDS 2000 Test 1 Study Guide: David Whitchurch

92 Terms

1

Statistics

a branch of applied mathematics which involves the collection, organization, analysis, interpretation, and presentation of data

2

The study of statistics consists of two branches

descriptive statistics and inferential statistics

3

Population

includes ALL observations for which conclusions are to be made. In many situations, it is either impossible or impractical to collect information from the entire population, so the analyst takes a sample instead

4

Sample

a subset of the population

5

descriptive statistics

methods used to summarize your data so that you can explain the important characteristics

6

descriptive statistics examples

Examples include creating pie charts, histograms, or line graphs; calculating the mean, median, and mode of home values by geographic region; and reporting crime rates by type of crime, the unemployment rate over time, the DJIA, the number of freshmen entering LSU this past fall by academic major, etc.

7

inferential statistics

methods that use data from a sample to make conclusions and decisions about the population

8

inferential statistics example

According to the Centers for Disease Control, 'people who smoke cigarettes are 15 to 30 times more likely to get lung cancer or die from lung cancer than people who do not smoke.'

9

Parameter

a summary measure that describes a characteristic of an entire population

10

Statistic

a summary measure that describes a characteristic of a sample

11

Cross-sectional data

contains measurements of observations at one point in time (e.g., results from a survey taken on January 1, 2024)

12

Time series data

contains measurements of observations over multiple periods of time (e.g., results from a survey taken every year from 2019 to 2024)

13

Structured data

data that are stored in spreadsheets or relational databases and have a pre-defined row-column format

14

Unstructured data

has no structure and does not follow a pre-defined format.

15

Examples of unstructured data include

email messages, blogs, customer comments, medical imaging, photos, videos, music clips

16

Big data

a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools

17

Variable

the characteristic of an observation that is apt to change or vary

18

Data

the values associated with each variable

19

Categorical Variables (also known as Qualitative variables)

have values that facilitate placing an observation into a specific category

20

Categorical Variables examples

Examples: gender, political affiliation, city of birth, whether a product is defective (yes or no), product quality (superior, good, fair, poor)

21

Numerical Variables (also known as Quantitative Variables)

have values that represent quantities and are the result of a measuring process

22

Numerical Variables Examples

salary, revenue, expenses, return on investment, amount spent, number of items purchased, GPA, number of children

23

Subtypes of numerical variables include

Discrete and Continuous

24

Discrete

result of counting

25

Continuous

measurements can take on infinitely many values within an interval

26

Variables are also identified by their

scales or levels of measurement

27

The four scales of measurement

Nominal, Ordinal, Interval, Ratio

28

a categorical variable has a nominal scale if

its values allow us only to categorize observations into mutually exclusive groups

29

Nominal examples

gender, academic major, race, state of birth, commute to campus or not, etc.

30

a categorical variable has an ordinal scale if

its values allow us to both categorize and rank the observations according to some quantity or trait

31

Ordinal examples

grade in your class (A, B, C, D, or F), customer rating when purchasing a product (Excellent, Good, Fair, Poor), how often you skip class (Never, Very Rarely, Somewhat Often, Very Often), salary level (Low, Middle, High), etc.

32

a numeric variable has an interval scale if

its values allow us to both categorize and rank observations, and, in addition, the differences in values have a consistent meaning

33

Interval examples

Temperature in Fahrenheit or Celsius. Ninety degrees is hotter than 80 degrees, and a 10-degree difference has the same meaning across the entire range (i.e., it is equivalent to the difference between 30 degrees and 20 degrees).

34

a numeric variable has a ratio scale if

it has all characteristics of an interval-scaled variable and has a true zero point

35

Ratio examples

example: salary

36

Variables having nominal and ordinal scales of measurement are always

categorical

37

Variables having interval and ratio scales are always

numerical

38

Frequency Table

a tabular summary of data showing the frequency (or percent) of items in each of the distinct categories of a categorical variable.
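
As a rough illustration of this definition, the following Python sketch builds a frequency table for a categorical variable. The function name `frequency_table` and the customer ratings are hypothetical and are not part of the course material.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (frequency, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


# Hypothetical customer ratings, invented for illustration only.
ratings = ["Good", "Fair", "Good", "Excellent", "Poor", "Good", "Fair", "Good"]
table = frequency_table(ratings)

for category, (freq, pct) in table.items():
    print(f"{category:<10} {freq:>3} {pct:6.1f}%")
```

Each row of the output corresponds to one category, so the frequencies add up to the number of observations and the percents add up to 100.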

39

Bar Graph

a graphical display of data where each category is depicted by a unique bar with the height of the bar representing the frequency, or proportion, of observations in that category

40

Pie Chart

a graphical display of data where each category is depicted by a unique slice of the pie, in degrees, which represents the frequency, or proportion, of observations in that category

41

The number of categories usually ranges from ___________, depending upon the data set size

5 to 20

42

Larger data sets require _________ categories, whereas smaller data sets require ________ categories; # of classes = # of bars in the histogram.

more, fewer

43

The categories are _____________ so that they do not overlap, and each observation is placed in only one category

mutually exclusive

44

The categories are exhaustive in that they all cover the....

entire range of data

45

The endpoints and width of the categories are.....

(note that the width is the same across all categories)

easy to interpret

46

Steps to Construct a Frequency Table for a Numerical Variable

1. Determine the range of the data from an ordered array

2. Specify the number of categories and calculate the WIDTH of each category

3. Determine the limits, or interval, that make up each category

4. Using the ordered array, count and record the number of observations in each category

47

Width =

(Max - Min) / # of categories
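
The four construction steps and the width formula can be sketched in Python as follows. This is a minimal sketch, assuming equal-width categories in which the last category also includes the maximum value and the data are not all identical; the function names and exam scores are invented for illustration.

```python
def class_width(data, num_categories):
    """Width = (max - min) / number of categories."""
    return (max(data) - min(data)) / num_categories


def numeric_frequency_table(data, num_categories):
    """Return a list of ((lower, upper), count) pairs, one per category."""
    ordered = sorted(data)                        # Step 1: ordered array and range
    low = ordered[0]
    width = class_width(ordered, num_categories)  # Step 2: width of each category
    limits = [(low + i * width, low + (i + 1) * width)
              for i in range(num_categories)]     # Step 3: limits of each category
    counts = [0] * num_categories
    for x in ordered:                             # Step 4: count the observations
        # The maximum value would land one category too far, so clamp it.
        index = min(int((x - low) / width), num_categories - 1)
        counts[index] += 1
    return list(zip(limits, counts))


# Hypothetical exam scores, invented for illustration only.
scores = [52, 55, 61, 64, 68, 70, 71, 75, 78, 80, 83, 85, 88, 90, 92]
for (lower, upper), count in numeric_frequency_table(scores, 4):
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

With these scores the range is 40, so four categories give a width of 10. In practice the width is often rounded to a convenient number so the limits are easy to interpret.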

48

When creating a frequency table, the original observations are lost in the grouping process, but you gain...

the power of interpretation that you don't have with the original list of raw numbers

49

Histogram

a visual representation of numerical data where the horizontal axis represents the values of the variable of interest and the vertical axis (or the height of the bars) represents the frequencies or relative frequencies in each category.

50

Frequency Polygon

an alternative to the histogram, formed by connecting the midpoints at the top of each category's bar, then anchoring the line on the x-axis on each side, maintaining the same width

51

If you have too many categories, where the frequencies in each category are low, your resulting histogram may suffer from the...

pancake effect (a histogram that is too wide and flat)

52

If you have too few categories, the frequencies will 'pile up' in those categories, and you may see the ______________ _______________ within your histogram results

skyscraper effect (a histogram that is tall and narrow)

53

Ogive

a graphical representation of cumulative values (either cumulative frequencies or cumulative relative frequencies), where the X-coordinates represent the upper limit of each category, and the Y-coordinates represent the cumulative values in the corresponding category

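
To make the ogive definition concrete, here is a small Python sketch that computes the points to plot: the upper limit of each category paired with its cumulative value. The function name `ogive_points`, the category limits, and the frequencies are hypothetical.

```python
from itertools import accumulate


def ogive_points(upper_limits, frequencies, relative=False):
    """Pair each category's upper limit with its cumulative (relative) frequency."""
    cumulative = list(accumulate(frequencies))
    if relative:
        total = cumulative[-1]
        cumulative = [c / total for c in cumulative]
    return list(zip(upper_limits, cumulative))


# Hypothetical categories with upper limits 62, 72, 82, 92.
upper_limits = [62, 72, 82, 92]
frequencies = [3, 4, 3, 5]

print(ogive_points(upper_limits, frequencies))
# [(62, 3), (72, 7), (82, 10), (92, 15)]
print(ogive_points(upper_limits, frequencies, relative=True)[-1])
# (92, 1.0)
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.
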
54

A Stem-and-Leaf Diagram

separates data into leaves, each made up of the rightmost single digit of each number, and stems, made up of the leftmost remaining digits of each number after the leaf has been lopped off

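
The stem and leaf split can be sketched in Python as below. This is a minimal sketch, assuming non-negative whole numbers; the function name `stem_and_leaf` and the data are invented for illustration, and stems with no observations are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # leaf = rightmost digit, stem = the rest
        stems[stem].append(leaf)
    return dict(stems)


# Hypothetical data set, invented for illustration only.
data = [12, 15, 21, 24, 24, 30, 37, 9]
diagram = stem_and_leaf(data)

for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is kept, the original observations can be read back from the diagram, which is why the minimum, maximum, range, and mode remain visible.
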
55

Four attributes of a stem-and-leaf diagram

1. is most effective for relatively small data sets

2. can be used to determine minimum, maximum, range, mode, and shape

3. gives an idea of how the individual values are distributed across the range of the data

4. retains all the original data so that each observation remains distinctly identifiable

56

The numeric indices describe three major properties of numeric data:

1. Center

2. Variation (Dispersion or Spread)

3. Shape

57

Measures of Center

are used to describe a typical value, the center, and where data seem to cluster.

There are three types: (1) mean, (2) median, and (3) mode

58

When describing the histogram, the _______ is the balance point of the histogram.

It is calculated by adding all the observations and dividing the sum by the total number of observations in the data set.

mean

59

Population mean, denoted by µ, is calculated using:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size
60

Sample mean is calculated using:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size
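
The mean calculation described above (add all the observations, then divide by how many there are) can be sketched in Python. The function name `mean` and the home values are hypothetical.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


# Hypothetical home values in thousands of dollars, invented for illustration.
home_values = [150, 200, 250, 300, 1100]
print(mean(home_values))  # 400.0
```

The arithmetic is the same for a population and a sample; only the notation differs.
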
61

median

the point, in an ordered array, at which half the data lie above and half the data lie below.

62

the median is calculated:

n = size of data set

63

If the size of the data set is _________, the median is the average of the two middle values.

even

64

If the size of the data set is _________, the median is the middle value.

odd
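
The odd and even cases for the median can be sketched in Python as follows. The function name `median` and the data are invented for illustration.

```python
def median(values):
    """Middle value of the ordered array, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


print(median([150, 200, 250, 300, 1100]))  # 250
print(median([150, 200, 250, 300]))        # 225.0
```

In the first data set the value 1100 is far from the rest, yet the median stays at 250.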

65

the _________ is a better reflection of the center when data are skewed or have outliers.

median

66

Mode

the data value that occurs most often
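
A short Python sketch of finding the mode is shown below. It returns a list because a data set can have more than one most frequent value; the function name `modes` and the data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([3, 5, 5, 7, 9, 5, 7]))  # [5]
print(modes([1, 1, 2, 2, 3]))        # [1, 2]
```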

67

Measures of variation are used to

describe the spread or dispersion of the data.

68

3 measures of variation

(1) range, (2) variance, and (3) standard deviation.

69

Range

the difference between the maximum value and the minimum value; it is influenced by outliers

70

Variance

a measure of variability that utilizes all data values and reflects how the observations vary or deviate from the mean.

71

population variance formula

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
72

sample variance formula

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
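
A minimal Python sketch of the two variances, assuming the standard definitions: the population variance divides the sum of squared deviations from the mean by N, and the sample variance divides by n - 1. The function names and data are invented for illustration.

```python
def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data set with mean 5, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))        # 4.0
print(round(sample_variance(data), 3))  # 4.571
```

Here the squared deviations sum to 32, so the population variance is 32 / 8 and the sample variance is 32 / 7.
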
73

Characteristics of both the population and sample variances:

(1) Both population and sample variances are influenced by outliers.

(2) Both are either zero or positive (never negative).

(3) As data spread out, variance increases.

(4) As data become more concentrated, variance decreases.

(5) Data where all values are the same have no variation (variance = 0).

74

Standard deviation

the square root of the variance

75

Sample standard deviation (s)

the square root of the sample variance

76

Population standard deviation

the square root of the population variance

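
As a sketch of these two definitions, the Python below takes the square root of each variance, assuming the population variance divides by N and the sample variance divides by n - 1. The function names and data are hypothetical.

```python
import math


def population_std(values):
    """Square root of the population variance (divide by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_std(values):
    """Square root of the sample variance (divide by n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


# Hypothetical data set, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))        # 2.0
print(round(sample_std(data), 3))  # 2.138
```

Unlike the variance, the standard deviation is expressed in the same units as the original data.
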
77

The shape describes the...

distribution or pattern of the values within the dataset

78

The shape of data is either

symmetric or skewed

79

Data are considered __________ if one half of the data is a mirror image of the other half

symmetric

80

Data are considered skewed if they are

not symmetric and are considered either right-skewed or left-skewed

81

If mean > mode and median > mode, then the data are

right-skewed

82

How often does mean > median > mode hold, and what skew does it indicate?

most of the time, right skewed

83

If mean < mode and median < mode, then the data are

left skewed

84

How often does mean < median < mode hold, and what skew does it indicate?

most of the time, left skewed
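
The skew cards can be turned into a rough check in Python. This is a simplified heuristic that compares only the mean and the median (it ignores the mode), so it is an assumption-laden sketch and not the course's full rule; the function name `skew_direction` and the data are hypothetical.

```python
import statistics


def skew_direction(values):
    """Guess the skew from the mean and median alone (a rough heuristic)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"   # a long right tail pulls the mean above the median
    if mean < median:
        return "left-skewed"    # a long left tail pulls the mean below the median
    return "roughly symmetric"


print(skew_direction([1, 2, 3, 4, 20]))     # right-skewed
print(skew_direction([1, 17, 18, 19, 20]))  # left-skewed
print(skew_direction([1, 2, 3, 4, 5]))      # roughly symmetric
```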

85

The Z-Score is a

measure of relative location that describes how far an individual observation is from the mean

86

sample z-score

$z = \frac{x - \bar{x}}{s}$
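
A minimal Python sketch of the sample z-score, assuming the usual definition: the observation minus the sample mean, divided by the sample standard deviation. The function name `z_score` and the data are invented for illustration.

```python
import statistics


def z_score(x, sample):
    """Number of sample standard deviations that x lies from the sample mean."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x - x_bar) / s


# Hypothetical sample with mean 30, invented for illustration only.
sample = [10, 20, 30, 40, 50]
print(z_score(30, sample))            # 0.0
print(round(z_score(50, sample), 3))  # 1.265
```

A positive z-score means the observation is above the mean, a negative one means it is below, and zero means it equals the mean.
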
87

When data are bell-shaped, probabilities about the distance from the mean can be estimated using the

Empirical Rule

88

Approximately __% of the observations are within 1 standard deviation of the mean

68

89

Approximately ___% of the observations are within 2 standard deviations of the mean

95

90

Approximately ___% of the observations are within 3 standard deviations of the mean (x̄ ± 3s).

99.7 (nearly 100)
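
The Empirical Rule intervals can be sketched in Python as below. The function name `empirical_rule_intervals` and the mean and standard deviation are hypothetical, and the rule applies only when the data are bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (lower, upper)} for k = 1, 2, 3 standard deviations from the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Approximate share of observations inside each interval for bell-shaped data.
COVERAGE = {1: "about 68%", 2: "about 95%", 3: "almost all (about 99.7%)"}

# Hypothetical bell-shaped variable with mean 100 and standard deviation 15.
intervals = empirical_rule_intervals(mean=100, sd=15)
for k, (low, high) in intervals.items():
    print(f"within {k} sd: {low} to {high} -> {COVERAGE[k]}")
```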

91

outlier

A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

92

Outlier Rule

Upper Bound = Q3 + 1.5(IQR)

Lower Bound = Q1 - 1.5(IQR)

IQR = Q3 - Q1
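
The outlier rule can be sketched in Python as follows. Quartiles can be computed in several ways; this sketch assumes the default method of the standard library's `statistics.quantiles`, which may differ slightly from the method used in class. The function names and data are invented for illustration.

```python
import statistics


def outlier_bounds(values):
    """Return (lower bound, upper bound) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # Q1, median, Q3
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the values that fall below the lower bound or above the upper bound."""
    lower, upper = outlier_bounds(values)
    return [x for x in values if x < lower or x > upper]


# Hypothetical data set with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
print(outlier_bounds(data))  # (-6.0, 18.0)
print(find_outliers(data))   # [50]
```

With these data Q1 = 3 and Q3 = 9, so IQR = 6 and any value below -6 or above 18 is flagged.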