data types & descriptive statistics


New cards

what is a sample?

subset of the population, used to find out information about the population as a whole

New cards

why should a sample have the same characteristics as the population it is representing?

- Generalizability: It allows you to apply findings from the sample to the whole population.

- Bias Reduction: A similar sample minimizes bias, ensuring results are accurate for everyone.

- Statistical Validity: Many statistical methods assume the sample reflects the population; if it doesn't, results can be misleading.

- Understanding Variability: A representative sample shows the diversity within the population, helping identify trends.

- Informed Decision-Making: It ensures that decisions based on research are relevant to the entire population.

New cards

What type of people are excluded from trials?

- Pregnant or Nursing Individuals: To protect the baby.

- Children: Due to ethical concerns and different reactions to treatments.

- Elderly: Often excluded because of multiple health issues.

- People with Multiple Health Conditions: To focus on specific effects of the treatment.

- Certain Ethnic or Racial Groups: Sometimes excluded for targeted studies or due to historical biases.

- Those on Other Medications: To avoid drug interactions.

- Individuals with Specific Allergies: To reduce risk.

- People with Mental Health Disorders: If it could affect study results

New cards

2 classifications of data

Qualitative (categorical) data

Quantitative (numerical) data

New cards

types of qualitative data?

nominal

ordinal

New cards

types of quantitative data

discrete

continuous

interval

ratio

New cards

nominal data

consists of names, labels, or categories

categories without any intrinsic order, size

no units

e.g gender, blood types, types of animal

New cards

ordinal data

ranking of some kind

meaningful order, but the intervals between ranks are not uniform, so the size of the difference between them is unknown

no units

e.g Customer satisfaction ratings (Poor, Fair, Good, Excellent), Education level (Secondary School, Bachelor's, Master's), Social class

New cards

interval data

ordered categories

equal differences between adjacent points on the scale

e.g Temperature in Celsius, dates, IQ scores, pH

Can perform arithmetic operations like addition and subtraction but no meaningful multiplication or division because of the lack of a true zero

zero does not mean an absence of the quantity being measured (e.g. 0 °C is not "no temperature")
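A small sketch of why ratios are not meaningful on an interval scale. The temperatures and the helper name celsius_to_kelvin are illustrative choices, not from the cards.

```python
# Interval scale: differences are meaningful, ratios are not,
# because 0 °C is an arbitrary zero rather than "no temperature".
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

a_c, b_c = 10.0, 20.0

diff_c = b_c - a_c            # 10.0 degrees: a meaningful difference
naive_ratio = b_c / a_c       # 2.0, but 20 °C is NOT "twice as hot" as 10 °C

# On the Kelvin scale, which has a true zero, the ratio is about 1.035.
kelvin_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(diff_c, naive_ratio, round(kelvin_ratio, 3))
```

The same pair of temperatures gives a ratio of 2 in Celsius and roughly 1.035 in Kelvin, which is why multiplication and division are reserved for ratio data.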

New cards

ratio data

has true zero point, which indicates absence of the variable being measured

zero point designates where measurement begins

A meaningful conclusion can be made on the ratio between scores

e.g weight, height, income

can multiply, divide

New cards

discrete data

counting things

whole numbers

e.g number of students in a class, number of cars in a parking lot, or number of pills in a bottle

New cards

continuous data

units of measurement

not limited to whole numbers

e.g weight, height, blood pressure

New cards

summary

nominal- mutually exclusive, no order

ordinal- order, no units

metric- order, units/ no of things

discrete- counting

continuous/ratio- measuring, true zero point

interval- measuring, no true zero point

New cards

what data is this: data on student grades: A, B, C, D, E, F

ordinal

New cards

what data is this: The number of pages in a book

ratio

New cards

what data is this: A survey asking "Which mode of transportation do you prefer?"

(Bus, Car, Bicycle, Walk)

nominal

New cards

descriptive statistics

describe or summarize a set of data

collect data e.g survey

most time consuming, most expensive, most difficult

This data could be presented e.g. tables and graphs

simplify large amounts of data, show patterns, make comparisons between groups, present in informative way

(see next cards)

New cards

graphical techniques of presenting data

diagrams for numerical data

diagrams for categorical data

New cards

graphs for qualitative (categorical) data

bar graph, pie chart

New cards

graphs for numerical data

dotplots, histograms, and stemplots

New cards

histogram

normal distribution curve: symmetric bell shape, with the peak at the centre

mean, median and mode are the same here

skewness: one tail stretches out farther than the other, so the mean and median differ

Positively or right skewed: the long tail is on the right and pulls the mean towards it, so the median is typically less than the mean

Negatively or left skewed: the long tail is on the left and pulls the mean towards it, so the median is usually greater than the mean
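A quick check of the right-skew rule, using made-up values with one long right tail (the data are an illustrative assumption):

```python
import statistics

# Mostly small values with a single large one: a right (positive) skew.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]

mean = statistics.mean(right_skewed)      # 42 / 9, about 4.67
median = statistics.median(right_skewed)  # middle (5th) value: 3

# The tail pulls the mean above the median.
print(round(mean, 2), median, mean > median)
```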

New cards

mean or median

in normal distribution we normally look at mean

in skewed we normally look at median

New cards

diff between bar chart + histogram

Histogram- continuous data grouped into intervals, bars touch

Bar chart- separate categories, gaps between the bars

New cards

numerical techniques of presenting data

- frequencies

- central tendency

- dispersions

e.g Mean, standard deviation, range, mode, median, frequencies, percentages, incidence, prevalence, risk, odds

New cards

frequency distribution

number of occurrences in each of several categories

used to summarize large volumes of data values- you might group values into intervals like 0-10, 11-20, 21-30 etc
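A minimal sketch of a grouped frequency distribution using the intervals from the card. The scores and the helper name interval_label are invented for illustration.

```python
from collections import Counter

# Hypothetical scores between 0 and 30.
scores = [3, 7, 12, 15, 18, 21, 22, 25, 29, 30, 8, 14]

def interval_label(value):
    """Map a score to one of the intervals 0-10, 11-20, 21-30."""
    if value <= 10:
        return "0-10"
    if value <= 20:
        return "11-20"
    return "21-30"

# Count how many scores fall in each interval.
frequencies = Counter(interval_label(s) for s in scores)

for label in ("0-10", "11-20", "21-30"):
    print(label, frequencies[label])
```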

New cards

mean

"balance point" of a data set, representing its central tendency

Advantages:

Incorporates every data point, making it comprehensive.

Easily combined with other statistical measures

Disadvantages:

Sensitive to extreme values (outliers), which can skew the result and introduce bias.
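To illustrate the outlier sensitivity, here is a sketch with invented values, comparing the mean and median before and after one extreme value is added:

```python
import statistics

values = [4, 5, 5, 6, 7]
with_outlier = values + [60]   # one extreme value added

mean_before = statistics.mean(values)           # 27 / 5 = 5.4
mean_after = statistics.mean(with_outlier)      # 87 / 6 = 14.5

median_before = statistics.median(values)       # 5
median_after = statistics.median(with_outlier)  # (5 + 6) / 2 = 5.5

print(mean_before, mean_after)
print(median_before, median_after)
```

The single outlier moves the mean from 5.4 to 14.5, while the median only moves from 5 to 5.5.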

New cards

lower SD

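a lower standard deviation means the values cluster more tightly around the mean (less variability, more consistent); it does not by itself mean the data are more accurate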

New cards

median

middle value

It divides the data into two equal halves—50% above and 50% below

New cards

mode

The value that occurs most frequently in a given data set

possible to have more than one mode

may not be at the centre of a distribution
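A short example of a data set with more than one mode, using statistics.multimode from the Python standard library (3.8+). The values are made up.

```python
import statistics

values = [1, 2, 2, 3, 3, 4]

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(values)

print(modes)  # [2, 3]
```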

New cards

range

Distance between the smallest and largest values (largest − smallest)

Depends only on the two extreme values, so it is very sensitive to the addition or removal of an outlier

New cards

why is a smaller range better?

Consistency: Less variability in results.

Predictability: More reliable outcomes.

Reduced Risk: Fewer unexpected results.

Easier Analysis: Simpler to interpret data.

Improved Quality: Generally means higher quality and reliability.

New cards

Interquartile range

The Interquartile Range (IQR) measures the spread of the middle 50% of data.

Q1: Find the first quartile (25th percentile).

Q3: Find the third quartile (75th percentile).

Calculate IQR: Subtract Q1 from Q3:

IQR = Q3 − Q1

Why It Matters:

Less Affected by Outliers: IQR is robust and not influenced by extreme values.

Shows Data Spread: It helps you understand how data is distributed
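A sketch of the IQR calculation with invented data. It uses statistics.quantiles with its default "exclusive" method. Other software may use slightly different quartile conventions and give slightly different values.

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 - Q1 (default 'exclusive' quartile method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 14]
data_with_outlier = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 100]

# The range changes a lot when the largest value becomes an outlier...
range_plain = max(data) - min(data)                            # 12
range_outlier = max(data_with_outlier) - min(data_with_outlier)  # 98

# ...but the IQR of these two data sets is the same.
iqr_plain = iqr(data)
iqr_outlier = iqr(data_with_outlier)

print(range_plain, range_outlier, iqr_plain, iqr_outlier)
```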

New cards

standard deviation

roughly the average distance of data values from the mean (formally, the square root of the average squared deviation from the mean)

Unlike the IQR, it uses all information in the data

Use alongside mean
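A small comparison of two invented data sets with the same mean but different spread. statistics.stdev gives the sample standard deviation (dividing by n − 1); statistics.pstdev would give the population version.

```python
import statistics

tight = [9, 10, 11]    # values close to the mean
spread = [0, 10, 20]   # same mean, much more variability

mean_tight = statistics.mean(tight)     # 10
mean_spread = statistics.mean(spread)   # 10

sd_tight = statistics.stdev(tight)      # sqrt((1 + 0 + 1) / 2) = 1.0
sd_spread = statistics.stdev(spread)    # sqrt((100 + 0 + 100) / 2) = 10.0

print(mean_tight, sd_tight)
print(mean_spread, sd_spread)
```

Both sets have a mean of 10, so the mean alone cannot distinguish them. Reporting the SD alongside it shows the difference in spread.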

New cards

scatterplot correlation

visually displays the relationship between two variables

Types of Correlation:

Positive Correlation:

As one variable increases, the other also increases.

Points trend upwards from left to right.

Negative Correlation:

As one variable increases, the other decreases.

Points trend downwards from left to right.

No Correlation:

There is no apparent relationship between the variables.

Points are scattered randomly

Strength of Correlation:

Strong Correlation: Points are close to a straight line.

Weak Correlation: Points are more spread out but still show a trend.

Correlation Coefficient:

A numerical value (between -1 and 1) that quantifies the correlation:

1: Perfect positive correlation

-1: Perfect negative correlation

0: No correlation
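A minimal sketch of one common correlation coefficient, Pearson's r, written out by hand. The cards do not say which coefficient is meant, so Pearson's r is an assumption here, and the data are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # points lie exactly on an upward line
falling = [10, 8, 6, 4, 2]    # points lie exactly on a downward line

r_pos = pearson_r(xs, rising)   # approximately 1
r_neg = pearson_r(xs, falling)  # approximately -1

print(round(r_pos, 6), round(r_neg, 6))
```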

New cards

linear regression

Linear regression is a method used to find the relationship between two variables by fitting a straight line to the data

we assume that y changes in a straight-line way with x (a one-unit change in x goes with a fixed change in y)

Key Points:

Dependent Variable (Y): What you want to predict.

Independent Variable (X): The predictor you use.

Equation:

The relationship is expressed as:

Y = a + bX

Y: Predicted value

a: Y-intercept (value of Y when X is 0)

b: Slope (how much Y changes for a one-unit change in X)

Steps:

Collect Data: Gather information for both variables.

Plot Data: Create a scatterplot to see the relationship.

Fit the Line: Find the best-fitting straight line.

Evaluate: Check how well the line predicts Y.
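The "fit the line" step can be sketched with ordinary least squares, one common way of finding the best-fitting line. The cards do not name a fitting method, so this is an assumption, and the function name fit_line and the data are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # these points lie exactly on Y = 1 + 2X

a, b = fit_line(xs, ys)
predicted = a + b * 5   # predicted Y when X = 5

print(a, b, predicted)
```

With real data the points will not sit exactly on the line, so the "evaluate" step would compare the predicted and observed Y values.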

New cards

Why is it important to produce a scatter plot first before fitting a regression line?

because it:

Shows Relationships: Helps you see if there's a relationship between the two variables.

Reveals Patterns: Allows you to spot trends or patterns that might need a different model.

Identifies Outliers: Highlights unusual data points that could affect results.

Checks Assumptions: Helps verify if the data meets linear regression assumptions.

Guides Analysis: Informs whether you need to adjust your approach or use a more complex model

In short, a scatter plot provides valuable insights to ensure a proper regression analysis

New cards

systolic bp slide?

New cards

box & whisker plot

top whisker- max value

bottom whisker- lowest value

The top of the box is the value below which 75% of the values lie. This value is known as the 75th percentile or upper quartile

The middle of the box is the central value in the data after it has been arranged in ascending order. This value is known as the median.

The bottom of the box is the value below which 25% of the values lie. This value is known as the 25th percentile or lower quartile.
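The five values a simple box and whisker plot displays can be computed as below. The data are invented, and the quartiles use Python's default "exclusive" method. Plotting software may use a different quartile convention or draw whiskers differently when outliers are present.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 14]

q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = {
    "bottom whisker (min)": min(data),
    "bottom of box (Q1)": q1,
    "middle of box (median)": median,
    "top of box (Q3)": q3,
    "top whisker (max)": max(data),
}

for name, value in five_number_summary.items():
    print(name, value)
```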

New cards

Summary-Questions to ask when assessing descriptive statistics in published literature

Have several tests of normality been considered and reported?

Are appropriate statistics used to describe the centre and spread of the data?

Do the values of the mean ± 2SD represent a reasonable 95% range?

If a distribution is skewed, has the mean of either group been underestimated or overestimated?

If the data are skewed, have the median and inter-quartile range been reported?
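A sketch of the "mean ± 2SD" check on invented, right-skewed data (for example, lengths of stay in days, which cannot be negative). If the lower limit falls below any possible value, the mean and SD are a poor summary and the median and IQR are a better choice.

```python
import statistics

# Hypothetical right-skewed data: all values are positive.
stays = [1, 1, 2, 2, 3, 3, 4, 5, 8, 30]

mean = statistics.mean(stays)   # 5.9
sd = statistics.stdev(stays)    # sample standard deviation

lower = mean - 2 * sd
upper = mean + 2 * sd

# A negative lower limit is impossible for this kind of variable,
# so mean +/- 2SD is not a reasonable 95% range here.
range_is_plausible = lower >= min(stays)

print(round(lower, 1), round(upper, 1), range_is_plausible)
print(statistics.median(stays))
```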