LSU ISDS 2000 Test 1 Study Guide: David Whitchurch

Last updated 7:19 PM on 11/3/25

92 Terms

New cards

Statistics

a branch of applied mathematics which involves the collection, organization, analysis, interpretation, and presentation of data

New cards

The study of statistics consists of two branches

descriptive statistics and inferential statistics

New cards

Population

includes ALL observations for which conclusions are to be made. In many situations, it is either impossible or not practical to collect information from this, so the analyst will take a sample instead

New cards

Sample

a subset of the population

New cards

descriptive statistics

methods used to summarize your data so that you can explain the important characteristics

New cards

descriptive statistics examples

Examples include creating pie charts, histograms, or line graphs; calculating the mean, median, and mode of home values by geographic region; and reporting crime rates by type of crime, the unemployment rate over time, the DJIA, the number of freshmen entering LSU this past fall by academic major, etc.

New cards

inferential statistics

methods that use data from a sample to make conclusions and decisions about the population

New cards

inferential statistics example

According to the Centers for Disease Control, 'people who smoke cigarettes are 15 to 30 times more likely to get lung cancer or die from lung cancer than people who do not smoke.'

New cards

Parameter

a summary measure that describes a characteristic of an entire population

New cards

Statistic

a summary measure that describes a characteristic of a sample

New cards

Cross-sectional data

contains measurements of observations at one point in time (e.g., results from a survey taken on January 1, 2024)

New cards

Time series data

contains measurements of observations over multiple periods of time (e.g., results from a survey taken every year from 2019 to 2024)

New cards

Structured data

data that are stored in spreadsheets or relational databases and have a pre-defined row-column format

New cards

Unstructured data

has no structure and does not follow a pre-defined format.

New cards

Examples of unstructured data include

email messages, blogs, customer comments, medical imaging, photos, videos, music clips

New cards

Big data

a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools

New cards

Variable

the characteristic of an observation that is apt to change or vary

New cards

Data

the values associated with each variable

New cards

Categorical Variables (also known as Qualitative variables)

have values that facilitate placing an observation into a specific category

New cards

Categorical Variables examples

Examples: gender, political affiliation, city of birth, whether a product is defective (yes or no), product quality (superior, good, fair, poor)

New cards

Numerical Variables (also known as Quantitative Variables)

have values that represent quantities and are the result of a measuring process

New cards

Numerical Variables Examples

salary, revenue, expenses, return on investment, amount spent, number of items purchased, GPA, number of children

New cards

Subtypes of numerical variables include

Discrete and Continuous

New cards

Discrete

result of counting

New cards

Continuous

measurements can take on infinitely many values within an interval

New cards

Variables are also identified by their

scales or levels of measurement

New cards

The four scales of measurement

Nominal, Ordinal, Interval, Ratio

New cards

a categorical variable has a nominal scale if

its values allow us only to categorize observations into mutually exclusive groups

New cards

Nominal examples

gender, academic major, race, state of birth, commute to campus or not, etc.

New cards

a categorical variable has an ordinal scale if

its values allow us to both categorize and rank the observations according to some quantity or trait

New cards

Ordinal examples

grade in your class (A, B, C, D, or F), customer rating when purchasing a product (Excellent, Good, Fair, Poor), Skip Class (Never, Very Rarely, Somewhat Often, Very Often), Salary (Low, Middle, High), etc

New cards

a numeric variable has an interval scale if

its values allow us to both categorize and rank observations, and, in addition, the differences in values have a consistent meaning

New cards

Interval examples

Temperature in Fahrenheit or Celsius. Ninety degrees is hotter than 80 degrees, and a 10-degree difference has the same meaning across the entire range (e.g., it is equivalent to the difference between 30 degrees and 20 degrees).

New cards

a numeric variable has a ratio scale if

it has all characteristics of an interval-scaled variable and has a true zero point

New cards

Ratio examples

example: salary

New cards

Variables having nominal and ordinal scales of measurement are always

categorical

New cards

Variables having interval and ratio scales are always

numerical

New cards

Frequency Table

a tabular summary of data showing the frequency (or percent) of items in each of the distinct categories of a categorical variable.

New cards

Bar Graph

a graphical display of data where each category is depicted by a unique bar with the height of the bar representing the frequency, or proportion, of observations in that category

New cards

Pie Chart

a graphical display of data where each category is depicted by a unique slice of the pie, in degrees, which represents the frequency, or proportion, of observations in that category

New cards

The number of categories usually ranges from ___________, depending upon the data set size

5 to 20

New cards

Larger data sets require _________ categories, whereas smaller data sets require ________ categories; # of classes = # of bars in the histogram.

more, fewer

New cards

The categories are _____________ so that they do not overlap, and each observation is placed in only one category

mutually exclusive

New cards

The categories are exhaustive in that they all cover the....

entire range of data

New cards

The endpoints and width of the categories are.....

(note that the width is the same across all categories)

easy to interpret

New cards

Steps to Construct a Frequency Table for a Numerical Variable

1. Determine the range of the data from an ordered array

2. Specify the number of categories and calculate the WIDTH of each category

3. Determine the limits, or interval, that make up each category

4. Using the ordered array, count and record the number of observations in each category

New cards

Width =

(Max - Min) / # of categories
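
The four construction steps and the width formula can be sketched in Python. This is a minimal illustration, not the course's method: the function name `frequency_table` and the `scores` data are hypothetical, the values are assumed not to be all equal, and the last category is assumed to include its upper limit so that the maximum is counted.

```python
def frequency_table(values, num_categories):
    """Group numeric values into equal-width categories and count each one."""
    ordered = sorted(values)                 # step 1: ordered array
    lo, hi = ordered[0], ordered[-1]         # range = hi - lo
    width = (hi - lo) / num_categories       # step 2: (Max - Min) / # of categories
    counts = [0] * num_categories
    for v in ordered:                        # step 4: count observations
        # Assumption: the maximum value belongs to the last category.
        index = min(int((v - lo) / width), num_categories - 1)
        counts[index] += 1
    # step 3: limits of each category, paired with its frequency
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_categories)]

# Hypothetical exam scores, used only for illustration.
scores = [52, 55, 61, 64, 68, 70, 73, 75, 79, 82, 85, 88, 91, 97, 102]
table = frequency_table(scores, 5)
for lower, upper, count in table:
    print(f"{lower:>6.1f} to {upper:>6.1f}: {count}")
```

Every observation lands in exactly one category (mutually exclusive), and the categories together cover the entire range of the data (exhaustive).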

New cards

When creating a frequency table, the original observations are lost in the grouping process, but you gain...

the power of interpretation that you don't have with the original list of raw numbers

New cards

Histogram

a visual representation of numerical data where the horizontal axis represents the values of the variable of interest and the vertical axis (or the height of the bars) represents the frequencies or relative frequencies in each category.

New cards

Frequency Polygon

an alternative to the histogram, formed by connecting the midpoints at the top of each category's bar, then anchoring the line on the x-axis on each side, maintaining the same width

New cards

If you have too many categories, where the frequencies in each category are low, your resulting histogram may suffer from the...

pancake effect (a histogram that is too wide and flat)

New cards

If you have too few categories, the frequencies will 'pile up' in those categories, and you may see the ______________ _______________ within your histogram results

skyscraper effect (a histogram that is tall and narrow)

New cards

Ogive

a graphical representation of cumulative values (either cumulative frequencies or cumulative relative frequencies), where the X-coordinates represent the upper limit of each category, and the Y-coordinates represent the cumulative values in the corresponding category
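
As a rough sketch of how the points of an ogive could be computed, the snippet below accumulates category frequencies and pairs each running total with its category's upper limit. The `upper_limits` and `frequencies` values are hypothetical.

```python
from itertools import accumulate

# Hypothetical categories: upper limit of each category and its frequency.
upper_limits = [62, 72, 82, 92, 102]
frequencies = [3, 3, 3, 4, 2]

# Cumulative frequencies: running total of the category frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequencies: each running total divided by the overall total.
total = cumulative[-1]
ogive_points = [(x, c / total) for x, c in zip(upper_limits, cumulative)]

for x, y in ogive_points:
    print(f"upper limit {x}: cumulative relative frequency {y:.3f}")
```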

New cards

A Stem-and-Leaf Diagram

separates data into leaves, each made up of the rightmost single digit of each number, and stems, made up of the leftmost remaining digits of each number after the leaf has been lopped off
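
A minimal sketch of the split described above, assuming non-negative whole numbers. The function name `stem_and_leaf` and the `ages` data are hypothetical, and stems with no leaves are simply skipped here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # leaf = rightmost digit, stem = the rest
        groups[stem].append(leaf)
    return dict(groups)

# Hypothetical ages, used only for illustration.
ages = [23, 27, 31, 35, 35, 38, 42, 104]
diagram = stem_and_leaf(ages)
for stem, leaves in diagram.items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```

Because every leaf is kept, each original observation can be rebuilt as `stem * 10 + leaf`.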

New cards

Four attributes of a stem-and-leaf diagram

1. is most effective for relatively small data sets

2. can be used to determine minimum, maximum, range, mode, and shape

3. gives an idea of how the individual values are distributed across the range of the data

4. retains all the original data so that each observation remains distinctly identifiable

New cards

The numeric indices describe three major properties of numeric data:

1. Center

2. Variation (Dispersion or Spread)

3. Shape

New cards

Measures of Center

are used to describe a typical value, the center, and where data seem to cluster.

There are three types: (1) mean, (2) median, and (3) mode

New cards

When describing the histogram, the _______ is the balance point of histogram.

It is calculated by adding all the observations and dividing the sum by the total number of observations in the data set.

mean

New cards

Population mean, denoted by µ, is calculated using:
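
In standard notation, assuming $N$ is the population size and $X_i$ is the $i$-th observation (symbols not given on this card), the sum of all observations divided by their number is written as:

```latex
\mu = \frac{\sum_{i=1}^{N} X_i}{N}
```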

New cards

Sample mean is calculated using:
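
In standard notation, assuming $n$ is the sample size and $X_i$ is the $i$-th observation in the sample, the sample mean $\bar{X}$ is written as:

```latex
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
```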

New cards

median

the point, in an ordered array, at which half the data lie above and half lie below.

New cards

the median is calculated:

n = size of data set
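
Assuming the usual textbook convention, the median's position in the ordered array is:

```latex
\text{median position} = \frac{n + 1}{2}
```

For example, with $n = 5$ the median is the 3rd value; with $n = 4$ the position is 2.5, which points halfway between the 2nd and 3rd values.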

New cards

If the size of the data set is _________, the median is the average of the two middle values.

even

New cards

If the size of the data set is _________, the median is the middle value.

odd

New cards

the _________ is a better reflection of the center when data are skewed or have outliers.

median

New cards

Mode

the data value that occurs most often
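
The three measures of center can be compared with Python's standard `statistics` module. This is only a sketch: the `home_values` data are hypothetical and include one deliberate outlier to show why the median can be the better reflection of the center.

```python
import statistics

# Hypothetical home values in thousands of dollars; 950 is an outlier.
home_values = [150, 180, 180, 210, 950]

mean_value = statistics.mean(home_values)      # sum of observations / count
median_value = statistics.median(home_values)  # middle value of the ordered array
mode_value = statistics.mode(home_values)      # most frequent value

# With an even number of observations, the median is the
# average of the two middle values.
even_median = statistics.median([150, 180, 210, 950])

print(mean_value, median_value, mode_value, even_median)
```

The single large value pulls the mean well above the median, while the median and mode stay where most of the data cluster.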

New cards

Measures of variation are used to

describe the spread or dispersion of the data.

New cards

3 measures of variation

(1) range, (2) variance, and (3) standard deviation.

New cards

Range

the difference between the maximum value and the minimum value and is influenced by outliers

New cards

Variance

a measure of variability that utilizes all data values and reflects how the observations vary or deviate from the mean.

New cards

population variance formula
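
In standard notation, assuming $\sigma^2$ denotes the population variance, $\mu$ the population mean, and $N$ the population size:

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
```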

New cards

sample variance formula
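
In standard notation, assuming $s^2$ denotes the sample variance, $\bar{X}$ the sample mean, and $n$ the sample size (note the divisor $n - 1$ rather than $n$):

```latex
s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
```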

New cards

Characteristics of both the population and sample variances:

(1) Both population and sample variances are influenced by outliers.

(2) Both are either zero or positive (never negative).

(3) As data spread out, variance increases.

(4) As data become more concentrated, variance decreases.

(5) Data where all values are the same have no variation (variance = 0).

New cards

Standard deviation

the square root of the variance

New cards

Sample standard deviation (s)

the square root of the sample variance

New cards

Population standard deviation

the square root of the population variance
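
The population and sample versions of variance and standard deviation can be compared with Python's standard `statistics` module. This is a sketch on hypothetical `data`; which version applies depends on whether the values are the whole population or a sample from it.

```python
import math
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

# Data where all values are the same have no variation.
no_variation = statistics.pvariance([3, 3, 3])

print(pop_var, samp_var, pop_sd, samp_sd, no_variation)
```

For the same data, the sample variance is slightly larger than the population variance because it divides by `n - 1` instead of `N`.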

New cards

The shape describes the ___________.

distribution or pattern of the values within the dataset

New cards

The shape of data is either

symmetric or skewed

New cards

Data are considered __________ if one half of the data is a mirror image of the other half

symmetric

New cards

Data are considered skewed if they are

not symmetric and are considered either right-skewed or left-skewed

New cards

If mean > mode and median > mode, then the data are

right-skewed

New cards

How often does mean > median > mode and what skew

most of the time, right skewed

New cards

If mean < mode and median < mode, then the data are

left skewed

New cards

How often does mean < median < mode and what skew

most of the time, left skewed

New cards

The Z-Score is a

measure of relative location that describes how far an individual observation is from the mean

New cards

sample z-score
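
In standard notation, assuming $X$ is an individual observation, $\bar{X}$ the sample mean, and $s$ the sample standard deviation:

```latex
z = \frac{X - \bar{X}}{s}
```

A positive $z$ means the observation lies above the mean, a negative $z$ means it lies below, and the magnitude counts how many standard deviations away it is.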

New cards

When data are bell-shaped, probabilities about the distance from the mean can be estimated using the

Empirical Rule

New cards

Approximately __% of the observations are within 1 standard deviation of the mean

68

New cards

Approximately ___% of the observations are within 2 standard deviations of the mean

95

New cards

Approximately ___% of the observations are within 3 standard deviations of the mean (X̄ ± 3s).

99.7 (nearly 100)

New cards

outlier

A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

New cards

Outlier Rule

Upper Bound = Q3 + 1.5(IQR)

Lower Bound = Q1 - 1.5(IQR)

IQR = Q3 - Q1
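
The outlier rule can be sketched in Python. The function name `outlier_bounds` and the `data` values are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is an assumption: quartile conventions differ, so a course's hand method may give slightly different Q1 and Q3.

```python
import statistics

def outlier_bounds(values):
    """Return (lower bound, upper bound) from the 1.5 * IQR rule."""
    # Assumption: "inclusive" quartiles; other conventions shift Q1 and Q3 slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                        # IQR = Q3 - Q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one unusually large value.
data = [10, 12, 13, 15, 16, 18, 19, 21, 60]
lower, upper = outlier_bounds(data)
outliers = [v for v in data if v < lower or v > upper]
print(lower, upper, outliers)
```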
