data types & descriptive statistics


40 Terms

1
New cards

what is a sample?

subset of the population, used to find out information about the population as a whole

2
New cards

why should a sample have the same characteristics as the population it is representing?

- Generalizability: It allows you to apply findings from the sample to the whole population.

- Bias Reduction: A similar sample minimizes bias, ensuring results are accurate for everyone.

- Statistical Validity: Many statistical methods assume the sample reflects the population; if it doesn't, results can be misleading.

- Understanding Variability: A representative sample shows the diversity within the population, helping identify trends.

- Informed Decision-Making: It ensures that decisions based on research are relevant to the entire population.

3
New cards

What type of people are excluded from trials?

- Pregnant or Nursing Individuals: To protect the baby.

- Children: Due to ethical concerns and different reactions to treatments.

- Elderly: Often excluded because of multiple health issues.

- People with Multiple Health Conditions: To focus on specific effects of the treatment.

- Certain Ethnic or Racial Groups: Sometimes excluded for targeted studies or due to historical biases.

- Those on Other Medications: To avoid drug interactions.

- Individuals with Specific Allergies: To reduce risk.

- People with Mental Health Disorders: If it could affect study results

4
New cards

2 classifications of data

Qualitative (categorical) data ​

Quantitative (numerical) data​

5
New cards

types of qualitative data?

nominal

ordinal

6
New cards

types of quantitative data

discrete

continuous

interval

ratio

7
New cards

nominal data

consists of names, labels, or categories

categories without any intrinsic order, size​

no units

e.g gender, blood types, types of animal

8
New cards

ordinal data

ranking of some kind

meaningful order, but the intervals between categories are not uniform, so the size of the difference between them is unknown

no units

e.g Customer satisfaction ratings (Poor, Fair, Good, Excellent), Education level (Secondary School, Bachelor's, Master's), Social class

9
New cards

interval data

ordered values

equal differences between adjacent points on the scale

e.g. temperature in Celsius, dates, IQ scores, pH

Addition and subtraction are meaningful, but multiplication and division are not, because there is no true zero

zero does not mean an absence of the quantity (0 °C is not "no temperature")

10
New cards

ratio data

has true zero point, which indicates absence of the variable being measured​

zero point designates where measurement begins​

A meaningful conclusion can be made on the ratio between scores

e.g weight, height, income​

can multiply, divide
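
The interval vs ratio distinction can be sketched in Python. This is only an illustration: the temperatures and weights are made-up values, and the variable names are not from the original cards.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
temp_a_c, temp_b_c = 10.0, 20.0      # made-up temperatures in degrees Celsius
diff_c = temp_b_c - temp_a_c         # differences ARE meaningful: 10 degrees

# Converting to Kelvin (a scale with a true zero) shows that
# 20 degrees C is not "twice as hot" as 10 degrees C.
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
ratio_celsius = temp_b_c / temp_a_c  # 2.0, but this number has no physical meaning
ratio_kelvin = temp_b_k / temp_a_k   # roughly 1.035

# Ratio scale: weight has a true zero, so "twice as heavy" is a valid statement.
weight_a_kg, weight_b_kg = 40.0, 80.0  # made-up weights
ratio_weight = weight_b_kg / weight_a_kg

print(diff_c, ratio_celsius, round(ratio_kelvin, 3), ratio_weight)
```

The same subtraction works on both scales; only the division changes meaning.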

11
New cards

discrete data

counting things

whole numbers

e.g number of students in a class, number of cars in a parking lot, or number of pills in a bottle​

12
New cards

continuous data

units of measurement

not limited to whole numbers

e.g weight, height, blood pressure​

13
New cards

summary

nominal- mutually exclusive categories, no order

ordinal- order, no units

metric (quantitative)- order, units/ number of things

discrete- counting

continuous- measuring

ratio- true zero point

interval- no true zero point

14
New cards

what data is this: data on student grades: A, B, C, D, E, F

ordinal

15
New cards

what data is this: The number of pages in a book

ratio

16
New cards

what data is this: A survey asking "Which mode of transportation do you prefer?" ​

(Bus, Car, Bicycle, Walk) ​

nominal

17
New cards

descriptive statistics

describe or summarize a set of data

first collect the data, e.g. by survey

data collection is usually the most time-consuming, most expensive and most difficult step

the data can then be presented, e.g. in tables and graphs

purpose: simplify large amounts of data, show patterns, make comparisons between groups, present results in an informative way

(see next cards)

18
New cards

graphical techniques of presenting data

diagrams for numerical data

diagrams for categorical data

19
New cards

graphs for qualitative (categorical) data

bar graph, pie chart

20
New cards

graphs for numerical data

dotplots, histograms, and stemplots

21
New cards

histogram

normal distribution curve: symmetric and bell-shaped, with the peak at the median

mean + median are the same here

skewness: one tail stretches out farther than the other; mean + median differ here

Positively or right skewed: the long tail is on the right; the median is typically less than the mean, which is pulled toward the tail

Negatively or left skewed: the long tail is on the left; the median is typically greater than the mean, which is pulled toward the tail
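
A small Python sketch of the skewness point, using made-up numbers and the standard-library `statistics` module. It only illustrates that a long right tail pulls the mean above the median.

```python
import statistics

# Made-up right-skewed sample: most values are small, one value forms a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

mean_value = statistics.mean(right_skewed)      # pulled toward the long tail
median_value = statistics.median(right_skewed)  # stays with the bulk of the data

print(mean_value, median_value)  # 5.7 3.0
```

Here the mean (5.7) is larger than nine of the ten values, which is why the median is usually preferred for skewed data.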

22
New cards

mean or median

in normal distribution we normally look at mean

in skewed we normally look at median

23
New cards

diff between bar chart + histogram

Histogram- continuous data​

Bar chart- separate entities, gap in between

24
New cards

numerical techniques of presenting data

- frequencies

- central tendency

- dispersions

e.g. mean, standard deviation, range, mode, median, frequencies, percentages, incidence, prevalence, risk, odds

25
New cards

frequency distribution

number of occurrences in each of several categories

used to summarize large volumes of data values- you might group values into intervals like 0-10, 11-20, 21-30 etc
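
A minimal sketch of building a frequency distribution with the intervals mentioned above. The scores are made up and `interval_label` is a hypothetical helper name.

```python
from collections import Counter

# Made-up scores between 0 and 30.
scores = [3, 7, 10, 11, 12, 15, 18, 20, 21, 22, 25, 29, 30, 8]

def interval_label(value):
    """Assign a value to one of the intervals 0-10, 11-20, 21-30."""
    if value <= 10:
        return "0-10"
    if value <= 20:
        return "11-20"
    return "21-30"

# Count how many scores fall into each interval.
frequency_table = Counter(interval_label(score) for score in scores)

for interval in ("0-10", "11-20", "21-30"):
    print(interval, frequency_table[interval])
```

The three counts add up to the number of scores, so no value is lost or counted twice.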

26
New cards

mean

"balance point" of a data set (sum of all values divided by the number of values), representing its central tendency

Advantages:

Incorporates every data point, making it comprehensive.

Easily combined with other statistical measures

Disadvantages:

Sensitive to extreme values (outliers), which can skew the result and introduce bias.

27
New cards

lower SD

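a lower standard deviation means the values lie closer to the mean (less variability, so the mean summarizes the data more precisely); it does not by itself mean the data are more accurate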

28
New cards

median

middle value

It divides the data into two equal halves—50% above and 50% below

29
New cards

mode

The value that occurs most frequently in a given data set

possible to have more than one mode​

may not be at the centre of a distribution
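
A short sketch showing a data set with more than one mode, using `statistics.multimode` from the Python standard library (Python 3.8+). The blood groups are made-up data.

```python
import statistics

# Made-up nominal data: blood groups of eight people.
blood_groups = ["A", "O", "B", "O", "A", "AB", "O", "A"]

# multimode returns every value tied for the highest count,
# in the order each value first appears in the data.
modes = statistics.multimode(blood_groups)

print(modes)  # ['A', 'O']
```

The mode is also the only one of the three averages that works for nominal data, since categories cannot be summed or ordered.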

30
New cards

range

Difference between the smallest and the largest value

Not affected by skewness but is sensitive to the addition or removal of an outlier value

31
New cards

why is a smaller range better?

Consistency: Less variability in results.

Predictability: More reliable outcomes.

Reduced Risk: Fewer unexpected results.

Easier Analysis: Simpler to interpret data.

Improved Quality: Generally means higher quality and reliability.

32
New cards

Interquartile range

The Interquartile Range (IQR) measures the spread of the middle 50% of data.

Q1: Find the first quartile (25th percentile).

Q3: Find the third quartile (75th percentile).

Calculate IQR: Subtract Q1 from Q3:

IQR = Q3 − Q1

Why It Matters:

Less Affected by Outliers: IQR is robust and not influenced by extreme values.

Shows Data Spread: It helps you understand how data is distributed
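
A sketch of the IQR calculation on made-up data. It assumes the "inclusive" quartile method of Python's `statistics.quantiles`; other software uses slightly different quartile conventions, so Q1 and Q3 can differ a little between tools. `iqr` is a hypothetical helper name.

```python
import statistics

def iqr(values):
    """Return (Q1, Q3, IQR) using the 'inclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # largest value replaced by an outlier

print(iqr(data))               # (3.0, 7.0, 4.0)
print(iqr(data_with_outlier))  # (3.0, 7.0, 4.0)

# The range, by contrast, jumps from 8 to 89.
range_plain = max(data) - min(data)
range_outlier = max(data_with_outlier) - min(data_with_outlier)
print(range_plain, range_outlier)
```

The outlier leaves the IQR unchanged, which is the robustness described above.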

33
New cards

standard deviation

a measure of how far data values typically lie from the mean (the square root of the average squared deviation)

Unlike the IQR, it uses all information in the data​

Use alongside mean​
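
The cards do not give a formula, so the sketch below uses the standard textbook definitions as an assumption: divide the summed squared deviations by n for a whole population, or by n − 1 for a sample. The data are made up.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up data
mean_value = statistics.mean(values)  # 40 / 8 = 5

# Squared distance of every value from the mean (all data points are used).
squared_deviations = [(x - mean_value) ** 2 for x in values]

population_sd = math.sqrt(sum(squared_deviations) / len(values))    # divide by n
sample_sd = math.sqrt(sum(squared_deviations) / (len(values) - 1))  # divide by n - 1

print(population_sd, round(sample_sd, 3))
```

The hand calculation should agree with `statistics.pstdev(values)` and `statistics.stdev(values)` respectively.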

34
New cards

scatterplot correlation

visually displays the relationship between two variables

Types of Correlation:

Positive Correlation:

As one variable increases, the other also increases.

Points trend upwards from left to right.

Negative Correlation:

As one variable increases, the other decreases.

Points trend downwards from left to right.

No Correlation:

There is no apparent relationship between the variables.

Points are scattered randomly

Strength of Correlation:

Strong Correlation: Points are close to a straight line.

Weak Correlation: Points are more spread out but still show a trend.

Correlation Coefficient:

A numerical value (between -1 and 1) that quantifies the correlation:

1: Perfect positive correlation

-1: Perfect negative correlation

0: No correlation
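
The cards do not say which coefficient is meant. The sketch below assumes Pearson's r, the usual choice for linear relationships, and computes it by hand on made-up data. `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising straight line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line
r_moderate = pearson_r(x, [2, 1, 4, 3, 5])   # upward trend with some scatter

print(r_positive, r_negative, round(r_moderate, 2))
```

Pearson's r only measures straight-line association, so a value near 0 can still hide a curved relationship; that is one more reason to look at the scatterplot.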

35
New cards

linear regression

36
New cards

Why is it important to produce a scatter plot first before fitting a regression line (trend line)?​

Linear regression is a method used to find the relationship between two variables by fitting a straight line to the data

the model treats X as the predictor: a change in X is assumed to produce a proportional change in Y (a fitted line alone does not prove causation)

Key Points:

Dependent Variable (Y): What you want to predict.

Independent Variable (X): The predictor you use.

Equation:

The relationship is expressed as:

Y = a + bX

Y: Predicted value

a: Y-intercept (value of Y when X is 0)

b: Slope (how much Y changes for a one-unit change in X)

Steps:

Collect Data: Gather information for both variables.

Plot Data: Create a scatterplot to see the relationship.

Fit the Line: Find the best-fitting straight line.

Evaluate: Check how well the line predicts Y.
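
The "fit" and "evaluate" steps can be sketched as follows. This assumes ordinary least squares, the usual way of finding the best-fitting line, and uses R² as one possible check of fit. The data and the helper name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x  # intercept: predicted Y when X is 0
    return a, b

# Made-up data for the independent (X) and dependent (Y) variables.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x_values, y_values)

# Evaluate: R squared is the share of the variation in Y explained by the line.
mean_y = sum(y_values) / len(y_values)
predicted = [intercept + slope * x for x in x_values]
ss_residual = sum((y - p) ** 2 for y, p in zip(y_values, predicted))
ss_total = sum((y - mean_y) ** 2 for y in y_values)
r_squared = 1 - ss_residual / ss_total

print(round(intercept, 2), round(slope, 2), round(r_squared, 2))
```

For these numbers the fitted line is approximately Y = 2.2 + 0.6X, with R² of about 0.6.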

37
New cards

Why is it important to produce a scatter plot first before fitting a regression line?​

because it:

Shows Relationships: Helps you see if there's a relationship between the two variables.

Reveals Patterns: Allows you to spot trends or patterns that might need a different model.

Identifies Outliers: Highlights unusual data points that could affect results.

Checks Assumptions: Helps verify if the data meets linear regression assumptions.

Guides Analysis: Informs whether you need to adjust your approach or use a more complex model

In short, a scatter plot provides valuable insights to ensure a proper regression analysis

38
New cards

systolic bp slide?

39
New cards

box & whisker plot

top whisker- max value

bottom whisker- lowest value

The top of the box is the value below which 75% of the values lie. This value is known as the 75th percentile or upper quartile

The middle of the box is the central value in the data after it has been arranged in ascending order. This value is known as the median.​

The bottom of the box is the value below which 25% of the values lie. This value is known as the 25th percentile or lower quartile.

40
New cards

Summary: questions to ask when assessing descriptive statistics in published literature

Have several tests of normality been considered and reported?​

Are appropriate statistics used to describe the centre and spread of the data?​

Do the values of the mean ± 2SD represent a reasonable 95% range?​

If a distribution is skewed, has the mean of either group been underestimated or overestimated?​

If the data are skewed, have the median and inter-quartile range been reported? ​
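
The "mean ± 2SD" question can be checked with a quick calculation. The sketch below uses made-up, right-skewed waiting times. It relies on the rule of thumb that mean ± 2SD covers about 95% of values only when the data are roughly normal.

```python
import statistics

# Made-up, right-skewed waiting times in minutes (a waiting time cannot be negative).
waiting_times = [1, 2, 2, 3, 3, 4, 5, 6, 8, 40]

mean_value = statistics.mean(waiting_times)
sd_value = statistics.stdev(waiting_times)  # sample standard deviation

lower = mean_value - 2 * sd_value
upper = mean_value + 2 * sd_value

# A negative lower limit is impossible for waiting times, which signals
# that mean and SD are poor summaries of these skewed data.
lower_bound_is_plausible = lower >= 0

median_value = statistics.median(waiting_times)

print(round(lower, 1), round(upper, 1), lower_bound_is_plausible, median_value)
```

Here the lower limit falls well below zero, so the median and inter-quartile range would describe these data better.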