STOR 120 UNC FINAL EXAM MCLEAN

Last updated 4:40 AM on 5/6/26

113 Terms

1
New cards

What is an int and what does it never have?

an integer of any size that never has a decimal point

2
New cards

What is a float and what does it always have? What other form can it also be in?

a number with an optional fractional part that always has a decimal point.

Can be in scientific notation as well
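A quick check of the two number types in plain Python. This is only a sketch, and the variable names are made up for illustration:

```python
# Integers have no decimal point; floats always print with one.
whole = 7
fraction = 7.0
scientific = 1.5e3  # scientific notation for 1.5 * 10**3

print(type(whole).__name__)      # int
print(type(fraction).__name__)   # float
print(scientific)                # 1500.0

# Python ints can grow as large as memory allows.
big = 2 ** 100
print(big)
```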

3
New cards

What are the three limitations to floats?

1. They have limited size (but the limit is huge)

2. They have limited precision of about 15-16 significant digits

3. After arithmetic, the final few decimal places can be wrong
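The three limitations can be seen directly in Python. This is a small sketch using the standard `sys` module; the variable names are illustrative:

```python
import sys

# 1. Limited size: the largest float is huge but finite.
largest = sys.float_info.max
print(largest)               # about 1.8e308

# 2. Limited precision: about 15 decimal digits are always preserved.
print(sys.float_info.dig)    # 15

# 3. After arithmetic, the last digits can be slightly off.
total = 0.1 + 0.2
print(total)                 # 0.30000000000000004
print(total == 0.3)          # False
```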

4
New cards

What is a string? What else can you convert it to if it has numbers in it?

a snippet of text of any length.

Can convert it into an int or float

5
New cards

Can any value be converted into a string?

Yes!

str(5)

'Water' + 'melon' gives 'Watermelon'

'Water' + ' ' + 'melon' gives 'Water melon'

'Ha' * 5 gives 'HaHaHaHaHa', but you can't multiply a string by a float
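The string operations on this card and the previous one, as a runnable sketch (variable names are illustrative):

```python
# Any value can be turned into a string with str().
five = str(5)                       # '5'

# + joins strings; capitalization stays exactly as written.
joined = 'Water' + 'melon'          # 'Watermelon'
spaced = 'Water' + ' ' + 'melon'    # 'Water melon'

# * repeats a string, but only by an int.
laugh = 'Ha' * 5                    # 'HaHaHaHaHa'

# Strings that look like numbers convert back with int() or float().
count = int('12')                   # 12
price = float('3.5')                # 3.5

print(five, joined, spaced, laugh, count, price)
```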

6
New cards

==

Is equal to (comparison operator)

7
New cards

!=

Is NOT equal to

8
New cards

<

is less than

9
New cards

>

is more than

10
New cards

<=

less than or equal to

11
New cards

>=

greater than or equal to

12
New cards

What values do you get when using comparisons?

Boolean values: either True (counts as 1) or False (counts as 0)
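A short sketch of comparisons producing Booleans. The `scores` list is hypothetical data:

```python
# Each comparison evaluates to a Boolean: True or False.
print(3 == 3.0)    # True
print(3 != 4)      # True
print(5 <= 2)      # False

# True behaves like 1 and False like 0 in arithmetic,
# so summing comparisons counts how many are true.
scores = [72, 88, 95, 60]
passed = sum(score >= 70 for score in scores)
print(passed)      # 3
```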

13
New cards

What is an array?

An array contains a sequence of values

14
New cards

What are some characteristics of arrays?

- All elements of an array should have the same type

- Arithmetic is applied to each element individually

- Adding arrays adds elements (if same length!)

- indices start at 0
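These characteristics can be checked with a small example. The sketch below builds the arrays with NumPy's `np.array` rather than the course's `make_array`, and the values are made up:

```python
import numpy as np

prices = np.array([2, 4, 6])      # all elements share one type
extra = np.array([10, 20, 30])

doubled = prices * 2              # arithmetic applies to each element
combined = prices + extra         # same-length arrays add element by element
first = prices[0]                 # indices start at 0

print(doubled)     # [ 4  8 12]
print(combined)    # [12 24 36]
print(first)       # 2
```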

15
New cards

What is a column of a table, and what method do you use to extract it?

An array!

use table.column('label')

16
New cards

len(array)

counts how many elements in an array

17
New cards

sum(array)

adds all items of an array together

18
New cards

np.average(array)

np.mean(array)

np.sqrt(array)

np.average and np.mean both give the average (mean) of the array; np.sqrt gives the square root of each element
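The array functions from the last three cards, applied to one small array (hypothetical values, built with `np.array` for the sake of a self-contained sketch):

```python
import numpy as np

values = np.array([1, 4, 9, 16])

count = len(values)          # number of elements
total = sum(values)          # all elements added together
mean = np.mean(values)       # same result as np.average(values)
roots = np.sqrt(values)      # square root of each element

print(count)   # 4
print(total)   # 30
print(mean)    # 7.5
print(roots)   # [1. 2. 3. 4.]
```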

19
New cards

if you divide an array by an integer...

the result is always an array of floats, whether or not decimals are needed
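For instance, dividing an array of even integers by 2 still produces floats (a minimal sketch with made-up values):

```python
import numpy as np

counts = np.array([2, 4, 6])
halves = counts / 2

print(halves)          # [1. 2. 3.]
print(halves.dtype)    # float64
```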

20
New cards

array.item(index)

picks out the single item at the given index of an array (the first item is at index 0)

21
New cards

table.with_column('column name', array)

Returns a new table with the array added as a new column; start from Table() to build a table from scratch

22
New cards

How do you make an array from scratch?

make_array(x,x,x,x,...)

23
New cards

What is a range?

A range is an array of consecutive numbers

24
New cards

np.arange(end)

An array of increasing integers from 0 up to, but not including, end

25
New cards

np.arange(start, end):

An array of increasing integers from start up to, but not including, end

26
New cards

np.arange(start, end, step):

A range with step between consecutive values

27
New cards

What does a range always include and exclude?

The range always includes start but excludes end
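The three forms of `np.arange` side by side, showing that the start is included and the end is left out:

```python
import numpy as np

to_end = np.arange(5)           # 0 up to, not including, 5
start_end = np.arange(2, 6)     # 2 up to, not including, 6
stepped = np.arange(0, 10, 3)   # steps of 3, stopping before 10

print(to_end)      # [0 1 2 3 4]
print(start_end)   # [2 3 4 5]
print(stepped)     # [0 3 6 9]
```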

28
New cards

What is a table?

A Table is a sequence of labeled columns

29
New cards

What does each row in a table represent?

one individual (one observation); the values in the row are that individual's attributes

30
New cards

What does each column in a table represent?

Data within a column represents one attribute of the individuals

31
New cards

What are some ways to create tables?

Table.read_table(filename): reads a table from a spreadsheet file (for example, a CSV)

Table() creates an empty table; .with_column(s) then adds columns to it

32
New cards

What should all columns of a table be?

All values in a column of a table should be both the same type and be comparable to each other in some way

33
New cards

What are numerical attributes of a table?

Numerical — Each value is from a numerical scale

- Numerical measurements are ordered

- Differences are meaningful

34
New cards

What are categorical attributes of a table?

Categorical — Each value is from a fixed inventory

- May or may not have an ordering

- Categories are the same or different

35
New cards

table.labels

list of column labels

36
New cards

table.relabeled('old label', 'new label')

To relabel a column in a table

37
New cards

table.num_columns

number of columns in a table

38
New cards

table.num_rows

number of rows in a table

39
New cards

table.select(label)

constructs a new table with just the specified columns

40
New cards

table.drop(label)

constructs a new table in which the specified columns are omitted

41
New cards

table.sort(label)

constructs a new table with rows sorted by the specified column

- Sorts in ascending order

Add descending=True to sort from largest to smallest

42
New cards

table.where(label, condition)

constructs a new table with just the rows that match the condition

43
New cards

table.group(label)

Produces a table with one row for each unique value in the label column, plus a count of how many rows have that value

44
New cards

Creating and extending tables:

Table().with_column(s) and

Table.read_table

45
New cards

Using array methods to work with data in columns:

item, sum, min, max, np.average, np.median, ...

46
New cards

table.take(row_numbers)

keeps the numbered rows. Each row has an index, starting at 0

47
New cards

t.where(column, value)

keeps all rows containing a certain value in a column

48
New cards

What does a bar plot show and when do you use it?

Heights of bars = amounts or frequencies for each category (categorical data).

Use it to visualize comparisons across different groups when data is categorical (e.g., colors, genres, brands).

49
New cards

What is necessary before making a bar plot

Grouping the data to get the counts of each category you are visualizing

50
New cards

What goes on the x and y axes for bar plots?

x = categories you want to analyze

y = the counts of them

51
New cards

table.barh('Category')

creates a horizontal bar plot with one bar per row, labeled by 'Category'; group first so that each row holds one category and its count

52
New cards

What is a histogram and when do you use it?

Chart that displays the distribution of a numerical variable

Use when:

You have quantitative data (e.g., Heights, Ages, Exam Scores).

You want to study the shape, center, spread, skewness of the data.

53
New cards

what are bins?

A bin is a range of numeric values on the x-axis of a histogram.

54
New cards

What is binning?

Binning is counting the number of numerical values that lie within ranges called bins

55
New cards

What are some important characteristics of bins?

Bins are defined by their lower bounds (inclusive)

The upper bound is the lower bound of the next bin (not inclusive) like np.arange
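A sketch of binning with NumPy's `np.histogram`, used here as a stand-in for the table methods from the course. The `ages` values are invented, and note one difference: NumPy's last bin also includes its upper edge, so the data below avoids the value 30.

```python
import numpy as np

ages = np.array([1, 5, 10, 19, 20, 29])
bin_edges = np.arange(0, 40, 10)     # lower bounds 0, 10, 20; final edge 30

# 10 falls in [10, 20) and 20 falls in [20, 30): lower bounds are inclusive.
counts, edges = np.histogram(ages, bins=bin_edges)

print(edges)    # [ 0 10 20 30]
print(counts)   # [2 2 2]
```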

56
New cards

What goes on x and y axes for histograms?

X-axis = number line ( ex. years) in numeric bins (ranges of values)

Y-axis = the rate (ex. Percent per year) (area under histogram reflects total proportion = 1)

57
New cards

What is the area principle in histograms?

What percent (or proportion) of the data lies in that bin.

The area of each bar is a percentage of the whole

58
New cards

How do you calculate the area of a histogram?

Area=Height×Width

Example: the [20, 30) bin contains 5408 out of 16327 games, which is 33.123% (the area of the bar)

The bin is 30 - 20 = 10 years wide

So the height is 33.123% / 10 years = 3.31 percent per year

59
New cards

How do you calculate the height of each bar and what does it measure?

It measures the % of data in the bin relative to the width of the bin --> measures crowdedness or density

Height = Percent in the bin (area) / Bin width
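The height formula worked through in code. The numbers assume a [20, 30) bin holding 5408 of 16327 values, which is the count consistent with the 33.123% figure in the previous card; treat them as an illustration only.

```python
count_in_bin = 5408      # assumed count for the [20, 30) bin
total_count = 16327
bin_width = 30 - 20      # 10 units wide

percent_in_bin = 100 * count_in_bin / total_count   # area of the bar
height = percent_in_bin / bin_width                 # percent per unit

print(round(percent_in_bin, 3))   # 33.123
print(round(height, 2))           # 3.31
```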

60
New cards

Use line plots for sequential data: if...

...your x-axis has an order

...sequential differences in y values are meaningful

...there's only one y-value for each x- value

Usually: x-axis is time or distance

61
New cards

What is on the x and y axes for a line graph?

X- Axis = Ordered values (like dates)

Y- axis = Numeric outcome (like price, sales)

62
New cards

How do you plot a line graph?

table.plot('x', 'y')

63
New cards

When to use a scatter plot?

Use scatter plots for non-sequential, NUMERICAL data when you're looking for associations

64
New cards

How do you plot a scatterplot?

table.scatter('X', 'Y')

65
New cards

What can you do in a scatter plot to compare two different groups?

table.scatter('X', 'Y', group='label')

66
New cards

What goes on the x and y axes of a scatterplot?

X = independent (explanatory) variable

Y = dependent (response) variable; the plot shows association, not proof of cause and effect

67
New cards

What is the purpose of overlaid histograms?

Compare distributions of a numeric variable between two groups.

68
New cards

when should you use overlaid histograms?

You want to compare the same variable across different groups (e.g., Heights of men vs women).

69
New cards

What do you need to do to plot an overlaid histogram?

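✅ Keep both groups in one table, with a column that labels each row's group.

✅ Pass that column to hist() with group= so both histograms share the same axes and bins. (overlay=True is the default and only overlays columns drawn in the same call; it does not draw onto an earlier plot.)

example:

# Overlay male and female heights on the same axes

students.hist('Height', group='Gender')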

70
New cards

What is a function?

Takes input(s) (called parameters)

Does something with those inputs

Returns an output

71
New cards

What is function syntax?

def name_of_function(argument1, argument2):
    computation = do_something(argument1, argument2)
    return computation
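A concrete function following that pattern. `percent_change` is a made-up example, not a function from the course:

```python
def percent_change(old_value, new_value):
    """Return the percent change from old_value to new_value."""
    change = new_value - old_value
    return 100 * change / old_value

result = percent_change(50, 60)
print(result)   # 20.0
```

The names in the `def` line are the parameters; the values 50 and 60 passed in the call are the arguments.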

72
New cards

table.apply(function_name, 'ColumnLabel')

This takes every value in 'ColumnLabel', feeds it into function_name, and returns an array of results.

73
New cards

What is randomness?

Randomness means outcomes are uncertain — we can't predict individual results, but we can understand the pattern over many repetitions.

74
New cards

Examples of Random Processes:

Flipping a coin

Rolling a die

Drawing a name from a hat

Shuffling labels in an A/B test

75
New cards

Outcome

A single result (e.g., "Heads")

76
New cards

event

A set of outcomes (e.g., "getting Heads or Tails")

77
New cards

probability

A long-run relative frequency of an event

78
New cards

sample space

All possible outcomes (e.g., {1, 2, 3, 4, 5, 6})

79
New cards

What is iteration?

Iteration means repeating an experiment or simulation many times to see long-run behavior.

This is how you build empirical distributions (e.g., simulating 10,000 coin flips).

80
New cards

np.random.choice(array, size)

Selects uniformly at random

with replacement

from an array,

a specified number of times

examples:

np.random.choice(['Heads', 'Tails'])
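A slightly fuller sketch showing the optional size argument. The output changes from run to run, so no expected values are shown for the draws themselves:

```python
import numpy as np

coin = np.array(['Heads', 'Tails'])

one_flip = np.random.choice(coin)        # a single random value
ten_flips = np.random.choice(coin, 10)   # an array of 10 draws, with replacement

print(one_flip)
print(ten_flips)

# Count how many of the 10 flips came up heads.
num_heads = np.count_nonzero(ten_flips == 'Heads')
print(num_heads)
```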

81
New cards

table.sample(size)

Samples rows from a table at random, with replacement unless with_replacement=False is given.

examples:

students.sample(5, with_replacement=False)

82
New cards

When Do You Use Loops? (iteration)

You use a for loop when you want to repeat a simulation multiple times — like:

Generating a bootstrap distribution

Estimating a sampling distribution

Repeating A/B shuffles

Repeating a coin flip trial

83
New cards

What is the syntax for loops?

results = make_array()

for i in np.arange(1000):
    simulated_value = do_something_random()
    results = np.append(results, simulated_value)
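The same loop pattern filled in with a real random step. This sketch assumes a trial is "roll two dice and add them", uses `np.array([])` in place of `make_array()`, and fixes a seed only so the run is repeatable:

```python
import numpy as np

np.random.seed(0)
die = np.arange(1, 7)          # faces 1 through 6

results = np.array([])         # start with an empty array
for i in np.arange(1000):
    roll_total = sum(np.random.choice(die, 2))   # one trial: sum of two dice
    results = np.append(results, roll_total)     # save this trial's result

print(len(results))       # 1000
print(np.mean(results))   # close to 7
```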

84
New cards

np.append(array, item)

returns a new array with the item added to the end of an existing array (the original array is unchanged)

85
New cards

np.append(array1, array2)

joins two arrays into one new array; the elements should be the same type
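Both uses of `np.append` in one small sketch (values are made up). Note that the array is passed as a variable, not as a quoted string:

```python
import numpy as np

first = np.array([1, 2, 3])

longer = np.append(first, 4)                  # add a single item
joined = np.append(first, np.array([7, 8]))   # join two arrays

print(longer)   # [1 2 3 4]
print(joined)   # [1 2 3 7 8]
print(first)    # [1 2 3]
```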

86
New cards

What is simulation?

Simulation is using randomness in code to imitate a real-world process and study what happens.

You want to estimate probabilities

You're testing a hypothesis (A/B test)

You're bootstrapping

87
New cards

what are the steps to simulation?

Define a model (e.g., assume coin is fair)

Simulate a trial (e.g., flip coin 1,000 times)

Repeat many times (iteration!)

Analyze the results (e.g., histogram of % heads)
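The four steps can be sketched with the coin example. `percent_heads` is a hypothetical helper, and the seed and the 500 repetitions are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(1)
coin = np.array(['Heads', 'Tails'])      # step 1: the model is a fair coin

def percent_heads(num_flips):
    """Step 2: simulate one trial and return the percent of heads."""
    flips = np.random.choice(coin, num_flips)
    return 100 * np.count_nonzero(flips == 'Heads') / num_flips

trials = np.array([])
for i in np.arange(500):                 # step 3: repeat many times
    trials = np.append(trials, percent_heads(1000))

# Step 4: analyze the results; a histogram of trials would show the spread.
print(np.mean(trials))   # close to 50
print(np.std(trials))    # roughly 1.6
```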

88
New cards

What are the probability basics?

lowest value = 0 (chance of event that is impossible)

highest value = 1 ( chance of event that is certain)

complement = if an event has a chance of 70%, then the chance it doesn't happen is 100% - 70% = 30%

89
New cards

Equally Likely Outcomes in Probability

assuming all outcomes are equally likely, the chance of event A is :

P(A) = # of outcomes that makes A happen / total # of outcomes
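As an illustration, take one roll of a fair die and the event "roll an even number" (an example chosen here, not from the original card):

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
event = [face for face in sample_space if face % 2 == 0]   # 2, 4, 6

probability = Fraction(len(event), len(sample_space))
print(probability)   # 1/2
```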

90
New cards

If I have three tickets: red, blue, green. I shuffle them and draw two tickets at random w/out replacement.

What is the chance I get the green ticket followed by the red?

So if you draw 2 at random

then the probability of getting a green is 1/3

then the probability of getting a red after that is 1/2 so

1/3 * 1/2 = 1/6
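One way to check this answer is to list every equally likely ordered pair of draws and count the ones that match. A sketch using the standard library:

```python
from fractions import Fraction
from itertools import permutations

tickets = ['red', 'blue', 'green']
draws = list(permutations(tickets, 2))    # all ordered pairs without replacement

matches = [d for d in draws if d == ('green', 'red')]
chance = Fraction(len(matches), len(draws))

print(len(draws))   # 6
print(chance)       # 1/6
```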

91
New cards

If I have three tickets: red, blue, green... What is the chance that I get the green ticket followed by a red or blue ticket?

So green first = 1/3

get a blue on second = 1/2

get a red on second = 1/2

so add the two ways: 1/3 * 1/2 = 1/6 for green then red, plus 1/3 * 1/2 = 1/6 for green then blue, so 1/6 + 1/6 = 2/6 = 1/3
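The same arithmetic with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

green_first = Fraction(1, 3)
second_given_green = Fraction(1, 2)   # red or blue, each 1/2 once green is gone

green_then_red = green_first * second_given_green    # multiplication rule
green_then_blue = green_first * second_given_green

total = green_then_red + green_then_blue             # addition rule
print(total)   # 1/3
```

The result 2/6 simplifies to 1/3, which is just the chance that green is drawn first, since the second ticket must then be red or blue.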

92
New cards

Multiplication rule in probability

Chance that 2 events A and B BOTH happen:

= P(A) * P(B happens given A happens)

the answer will be LESS than or equal to each of the two chances being multiplied

93
New cards

In probability, the more conditions you have to satisfy...

the less likely you are to satisfy them all

94
New cards

What is the addition rule in probability

If event A can happen in exactly 1 of 2 ways then...

P(A) = P(first way) + P(second way)

the answer is greater than or equal to each individual way

95
New cards

what is sampling?

Sampling is the process of selecting a subset (a sample) from a larger group (a population) to study.

Instead of measuring every person or thing, we collect data from a few and use it to make inferences about the whole.

96
New cards

What does with_replacement=True mean?

Each time you draw, the same item can be selected again — you're putting it back after each draw.

You can get duplicates

Some values might not appear at all

It reflects a resampling process, not a one-time draw

97
New cards

What does with_replacement=False mean?

Once you draw an item, it's removed from the pool — you cannot select it again in that sample.

You get unique values

You're mimicking simple random sampling

This reflects selecting without replacement, like drawing raffle tickets

98
New cards

When to use with_replacement = False?

Drawing a real sample from a population

99
New cards

When to use with_replacement = True?

Resampling from your own data (bootstrap)

You use with_replacement=True because:

You're treating your sample as a stand-in for the population

To simulate new samples, you must allow repeats (just like in a population)

If you resampled the full sample size without replacement, every "resample" would contain exactly the original values → useless!
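A sketch of why this matters, using NumPy's `np.random.choice` on an array instead of the course's table methods (NumPy calls the option `replace`). The sample values and the seed are invented:

```python
import numpy as np

np.random.seed(2)
sample = np.array([3, 8, 8, 12, 15])     # hypothetical original sample
n = len(sample)

no_repl = np.random.choice(sample, n, replace=False)
with_repl = np.random.choice(sample, n, replace=True)

# Without replacement the resample is the original values reordered,
# so its mean always equals the original mean.
print(np.mean(no_repl) == np.mean(sample))   # True

# With replacement, values can repeat or go missing, so the mean can vary.
print(with_repl, np.mean(with_repl))
```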

100
New cards

What is a probability distribution?

A probability distribution describes the theoretical likelihood of different outcomes for a random variable.

Applies to things like rolling a die, flipping a coin

Based on known probabilities, not data