What is an int and what does it never have?
an integer of any size that never has a decimal point
What is a float and what does it always have? What other form can it also be in?
a number with an optional fractional part that always has a decimal point.
Can be in scientific notation as well
What are the three limitations to floats?
1. They have limited size (but the limit is huge)
2. They have limited precision of 15-16 significant digits
3. After arithmetic, the final few decimal places can be wrong
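A minimal sketch of the three float limitations in plain Python. The variable names are only illustrative.

```python
import sys

# 1. Limited size: the largest finite float is huge, and going past it gives infinity.
largest = sys.float_info.max
overflowed = largest * 10

# 2. Limited precision: only about 15-16 significant digits are kept.
stored = 0.12345678901234567890

# 3. After arithmetic, the last few digits can be wrong.
total = 0.1 + 0.2

print(largest)
print(overflowed)
print(stored)
print(total)
print(total == 0.3)
```

Printing `total` shows that it is very close to 0.3 but not exactly equal to it.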
What is a string? What else can you convert it to if it has numbers in it?
a snippet of text of any length.
Can convert it into an int or float
Can any value be converted into a string?
Yes!
str(5)
'Water' + 'melon' = 'Watermelon'
'Water' + ' ' + 'melon' = 'Water melon'
'Ha' * 5 = 'HaHaHaHaHa', but you can't multiply a string by a float
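A short sketch of the string conversions and operations above, in plain Python. The variable names are made up for illustration.

```python
# Strings that contain numbers can be converted to int or float,
# and any value can be converted to a string.
whole = int('42')
fraction = float('3.5')
text = str(5)

# + joins strings exactly as written, so capital letters are kept.
joined = 'Water' + 'melon'
spaced = 'Water' + ' ' + 'melon'

# * repeats a string a whole number of times.
laugh = 'Ha' * 5

# Multiplying a string by a float is not allowed and raises a TypeError.
try:
    'Ha' * 2.5
    float_repeat_allowed = True
except TypeError:
    float_repeat_allowed = False

print(whole, fraction, text, joined, spaced, laugh, float_repeat_allowed)
```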
==
Is equal to (comparison operator)
!=
Is NOT equal to
<
is less than
>
is more than
<=
less than or equal to
>=
greater than or equal to
What values do you get when using comparisons?
Boolean values: either True (1) or False (0)
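A small sketch of the six comparison operators, assuming an example value x = 5. Because True counts as 1 and False as 0, summing the results counts how many comparisons are true.

```python
x = 5

# Each comparison evaluates to a Boolean value.
checks = [x == 5, x != 5, x < 3, x > 3, x <= 5, x >= 6]

# True acts like 1 and False like 0 in arithmetic.
number_true = sum(checks)

print(checks)
print(number_true)
```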
What is an array?
An array contains a sequence of values
What are some characteristics of arrays?
- All elements of an array should have the same type
- Arithmetic is applied to each element individually
- Adding arrays adds elements (if same length!)
- indices start at 0
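A minimal sketch of these array characteristics. It assumes NumPy's np.array to build the arrays, which is a stand-in for whatever array constructor your course uses.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

doubled = a * 2   # arithmetic is applied to each element individually
added = a + b     # arrays of the same length add element by element
first = a[0]      # indices start at 0, so this is the first element

print(doubled, added, first)
```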
What is a column of a table, and what method do you use to extract it?
An array!
use table.column('label')
len(array)
counts how many elements in an array
sum(array)
adds all items of an array together
np.average(array)
np.mean(array)
np.sqrt(array)
np.average and np.mean both give the average (mean) of the array; np.sqrt gives the square root of each element
If you divide an array by an integer...
the result is always an array of floats, whether or not decimals are needed
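A quick sketch of these array functions on a made-up array of scores, assuming NumPy is imported as np.

```python
import numpy as np

scores = np.array([4, 8, 12])

count = len(scores)                     # how many elements
total = sum(scores)                     # all elements added together
mean = np.mean(scores)                  # same value as np.average(scores)
roots = np.sqrt(np.array([4, 9, 16]))   # square root of each element
halved = scores / 2                     # division gives floats even when the results are whole

print(count, total, mean, roots, halved)
```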
array.item(index)
picks out the single element at the given index of an array
table.with_column('column name', array)
Returns a new table with the array added as a new column
How do you make an array from scratch?
make_array(x,x,x,x,...)
What is a range?
A range is an array of consecutive numbers
np.arange(end)
An array of increasing integers from 0 up to end
np.arange(start, end):
An array of increasing integers from start up to end
np.arange(start, end, step):
A range with step between consecutive values
What does a range always include and exclude?
The range always includes start but excludes end
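A short sketch of the three forms of np.arange, showing that the start is included and the end is excluded.

```python
import numpy as np

up_to_five = np.arange(5)          # 0, 1, 2, 3, 4 (5 is excluded)
two_to_six = np.arange(2, 6)       # 2, 3, 4, 5 (6 is excluded)
by_threes = np.arange(0, 10, 3)    # 0, 3, 6, 9 (step of 3, 10 is excluded)

print(up_to_five, two_to_six, by_threes)
```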
What is a table?
A Table is a sequence of labeled columns
What does each row in a table represent?
one individual
What does each column in a table represent?
Data within a column represents one attribute of the individuals
What are some ways to create tables?
Table.read_table(filename): reads a table from a spreadsheet
Table().with_column(s): starts from an empty table, Table(), and adds columns to it
What should all columns of a table be?
All values in a column of a table should be both the same type and be comparable to each other in some way
What are numerical attributes of a table?
Numerical — Each value is from a numerical scale
- Numerical measurements are ordered
- Differences are meaningful
What are categorical attributes of a table?
Categorical — Each value is from a fixed inventory
- May or may not have an ordering
- Categories are the same or different
table.labels
list of column labels
table.relabeled('old label', 'new label')
To relabel a column in a table
table.num_columns
number of columns in a table
table.num_rows
number of rows in a table
table.select(label)
constructs a new table with just the specified columns
table.drop(label)
constructs a new table in which the specified columns are omitted
table.sort(label)
constructs a new table with rows sorted by the specified column
- Sorts in ascending order
Add descending=True to sort from largest to smallest
table.where(label, condition)
constructs a new table with just the rows that match the condition
table.group(label)
constructs a new table with one row for each unique value in the column and a count of the rows that have that value
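A rough sketch of the counting that grouping performs. It uses Python's collections.Counter on a made-up list standing in for one column, not the Table method itself, so treat it as an illustration of the idea only.

```python
from collections import Counter

# Hypothetical column of categorical values, one entry per row of a table.
flavors = ['vanilla', 'chocolate', 'vanilla', 'strawberry', 'vanilla']

# Count how many rows have each unique value.
counts = Counter(flavors)

# One (value, count) pair per unique value, listed alphabetically here.
rows = sorted(counts.items())

print(rows)
```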
Creating and extending tables:
Table().with_column(s) and
Table.read_table
Using array methods to work with data in columns:
item, sum, min, max, np.average, np.median, ...
table.take(row_numbers)
keeps the numbered rows. Each row has an index, starting at 0
t.where(column, value)
keeps all rows containing a certain value in a column
What does a bar plot show and when do you use it?
Heights of bars = amounts or frequencies for each category (categorical data).
Use it to visualize comparisons across different groups when data is categorical (e.g., colors, genres, brands).
What is necessary before making a bar plot?
Grouping the data to get the counts of each category you are visualizing
What goes on the x and y axes for bar plots?
x = categories you want to analyze
y = the counts of them
table.barh('Category')
creates a horizontal bar plot with one bar per value in the category column you pick; the bar lengths are the counts, so group the data first
What is a histogram and when do you use it?
Chart that displays the distribution of a numerical variable
Use when:
You have quantitative data (e.g., Heights, Ages, Exam Scores).
You want to study the shape, center, spread, skewness of the data.
What are bins?
A bin is a range of numeric values on the x-axis of a histogram.
What is binning?
Binning is counting the number of numerical values that lie within ranges called bins
What are some important characteristics of bins?
Bins are defined by their lower bounds (inclusive)
The upper bound is the lower bound of the next bin (not inclusive), like the end of np.arange
What goes on x and y axes for histograms?
X-axis = number line ( ex. years) in numeric bins (ranges of values)
Y-axis = the rate (ex. Percent per year) (area under histogram reflects total proportion = 1)
What is the area principle in histograms?
What percent (or proportion) of the data lies in that bin.
The area of each bar is a percentage of the whole
How do you calculate the area of a histogram bar?
Area = Height × Width
Example: the [20, 30) bin contains 5408 out of 16327 games, which is 33.123 percent (the area)
The bin is 30 - 20 = 10 years wide
So the height is 33.123 / 10 years ≈ 3.31 percent per year
How do you calculate the height of each bar and what does it measure?
It measures the percent of data in the bin relative to the width of the bin, i.e. crowdedness or density
Height = Percent in the bin (area) / Bin width
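A worked sketch of the area and height calculation for the 20-to-30 bin. The count of 5408 is an assumption inferred from the 33.123 percent figure (5408 out of 16327 is about 33.123 percent).

```python
count_in_bin = 5408      # assumed count, inferred from the 33.123 percent figure
total_count = 16327
bin_left, bin_right = 20, 30

# Area of the bar = percent of all the data that falls in the bin.
percent_in_bin = 100 * count_in_bin / total_count

# Width of the bar = upper bound minus lower bound.
bin_width = bin_right - bin_left

# Height of the bar = area / width, measured in percent per unit (density).
height = percent_in_bin / bin_width

print(round(percent_in_bin, 3), bin_width, round(height, 2))
```

Multiplying the height back by the width recovers the area, which is the area principle.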
Use line plots for sequential data: if...
...your x-axis has an order
...sequential differences in y values are meaningful
...there's only one y-value for each x- value
Usually: x-axis is time or distance
What is on the x and y axes for a line graph?
X-axis = Ordered values (like dates)
Y-axis = Numeric outcome (like price, sales)
How do you plot a line graph?
table.plot('x', 'y')
When to use a scatter plot?
Use scatter plots for non-sequential, NUMERICAL data when you're looking for associations
How do you plot a scatterplot?
table.scatter('X', 'Y')
What can you do in a scatter plot to compare two different groups?
table.scatter('X', 'Y', group='label')
What goes on the x and y axes of a scatterplot?
X = independent (explanatory) variable
Y = dependent (response) variable
What is the purpose of overlaid histograms?
Compare distributions of a numeric variable between two groups.
When should you use overlaid histograms?
You want to compare the same variable across different groups (e.g., Heights of men vs women).
What do you need to do to plot an overlaid histogram?
✅ Split the rows into groups with the group argument of hist().
✅ Keep overlay=True (the default) so the groups are drawn on the same axes.
example:
# Overlay the Height histograms for each Gender on one set of axes
students.hist('Height', group='Gender')
What is a function?
Takes input(s) (called parameters)
Does something with those inputs
Returns an output
What is function syntax?
def name_of_function(argument1, argument2):
    computation = do_something(argument1, argument2)
    return computation
table.apply(function_name, 'ColumnLabel')
This takes every value in 'ColumnLabel', feeds it into function_name, and returns an array of results.
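A small sketch of defining a function and then running it on every value of a column. The function to_celsius and the list of temperatures are made up for illustration, and the list comprehension only imitates what apply does, it is not the Table method.

```python
def to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    celsius = (fahrenheit - 32) * 5 / 9
    return celsius

# Calling the function on one input returns one output.
boiling = to_celsius(212)

# Hypothetical column of values; each one is fed into the function in turn.
fahrenheit_column = [32, 50, 212]
celsius_results = [to_celsius(value) for value in fahrenheit_column]

print(boiling, celsius_results)
```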
What is randomness?
Randomness means outcomes are uncertain — we can't predict individual results, but we can understand the pattern over many repetitions.
Examples of Random Processes:
Flipping a coin
Rolling a die
Drawing a name from a hat
Shuffling labels in an A/B test
Outcome
A single result (e.g., "Heads")
event
A set of outcomes (e.g., "getting Heads or Tails")
probability
A long-run relative frequency of an event
sample space
All possible outcomes (e.g., {1, 2, 3, 4, 5, 6})
What is iteration?
Iteration means repeating an experiment or simulation many times to see long-run behavior.
This is how you build empirical distributions (e.g., simulating 10,000 coin flips).
np.random.choice(array, size)
Selects uniformly at random
with replacement
from an array,
a specified number of times
examples:
np.random.choice(['Heads', 'Tails'])
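A short sketch of np.random.choice with and without a size. The results are random, so no particular output is shown.

```python
import numpy as np

coin = np.array(['Heads', 'Tails'])

# One draw: a single value chosen uniformly at random.
one_flip = np.random.choice(coin)

# Ten draws with replacement: an array of ten values.
ten_flips = np.random.choice(coin, 10)

print(one_flip)
print(ten_flips)
```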
table.sample(size)
Samples rows from a table.
examples:
students.sample(5, with_replacement=False)
When Do You Use Loops? (iteration)
You use a for loop when you want to repeat a simulation multiple times — like:
Generating a bootstrap distribution
Estimating a sampling distribution
Repeating A/B shuffles
Repeating a coin flip trial
What is the syntax for loops?
results = make_array()
for i in np.arange(1000):
    simulated_value = do_something_random()
    results = np.append(results, simulated_value)
np.append(array, item)
returns a new array with the item added to the end of an existing array
np.append(array1, array2)
returns a new array with the elements of array2 added after those of array1; the elements should be of the same type
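A minimal sketch of np.append inside a loop and on two arrays. It assumes np.array([]) as the empty starting array. Note that np.append returns a new array rather than changing the old one, which is why the loop reassigns results each time.

```python
import numpy as np

# Collect one value per repetition of the loop.
results = np.array([])
for i in np.arange(5):
    results = np.append(results, i * i)

# Appending one array to another; the lengths do not have to match.
first = np.array([1, 2])
second = np.array([3, 4, 5])
combined = np.append(first, second)

print(results)
print(combined)
print(first)   # the original array is unchanged
```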
What is simulation?
Simulation is using randomness in code to imitate a real-world process and study what happens.
Use it when:
You want to estimate probabilities
You're testing a hypothesis (A/B test)
You're bootstrapping
What are the steps to simulation?
Define a model (e.g., assume coin is fair)
Simulate a trial (e.g., flip coin 1,000 times)
Repeat many times (iteration!)
Analyze the results (e.g., histogram of % heads)
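The four steps above can be sketched as a coin-flip simulation. The function name percent_heads, the 200 repetitions, and the seed are assumptions chosen to keep the example small and repeatable.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so reruns give the same numbers

# 1. Define a model: the coin is fair, so both faces are equally likely.
coin = np.array(['Heads', 'Tails'])

# 2. Simulate one trial: flip the coin num_flips times, return the percent of heads.
def percent_heads(num_flips):
    flips = np.random.choice(coin, num_flips)
    return 100 * np.count_nonzero(flips == 'Heads') / num_flips

# 3. Repeat many times (iteration), collecting one result per trial.
results = np.array([])
for i in np.arange(200):
    results = np.append(results, percent_heads(1000))

# 4. Analyze the results; here, the average percent of heads across trials.
average_percent = np.mean(results)

print(len(results), average_percent)
```

A histogram of results would show the empirical distribution of the percent of heads.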
What are the probability basics?
lowest value = 0 (chance of event that is impossible)
highest value = 1 ( chance of event that is certain)
complement = if an event has a chance of 70%, then the chance it doesn't happen is 100% - 70% = 30%
Equally Likely Outcomes in Probability
assuming all outcomes are equally likely, the chance of event A is :
P(A) = # of outcomes that makes A happen / total # of outcomes
If I have three tickets: red, blue, green. I shuffle them and draw two tickets at random w/out replacement.
What is the chance I get the green ticket followed by the red?
The probability of getting green on the first draw is 1/3
Given green first, the probability of getting red on the second draw is 1/2
So 1/3 * 1/2 = 1/6
If I have three tickets: red, blue, green... What is the chance that I get the green ticket followed by a red or blue ticket?
So green first = 1/3
get a blue on second = 1/2
get a red on second = 1/2
So add the two ways: 1/3 * 1/2 for green then red, plus 1/3 * 1/2 for green then blue, so 1/6 + 1/6 = 2/6 = 1/3
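Both ticket answers can be checked by listing every equally likely ordered pair of draws. This is a sketch using Python's itertools and fractions; the helper name chance is made up.

```python
from fractions import Fraction
from itertools import permutations

tickets = ['red', 'blue', 'green']

# Every ordered pair of two different tickets (drawing without replacement).
draws = list(permutations(tickets, 2))

def chance(event):
    """P(event) = number of outcomes where the event happens / total outcomes."""
    favorable = [draw for draw in draws if event(draw)]
    return Fraction(len(favorable), len(draws))

# Green followed by red.
green_then_red = chance(lambda draw: draw == ('green', 'red'))

# Green followed by red or blue.
green_then_other = chance(lambda draw: draw[0] == 'green' and draw[1] in ('red', 'blue'))

print(len(draws), green_then_red, green_then_other)
```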
Multiplication rule in probability
Chance that 2 events A and B BOTH happen:
= P(A) * P(B happens given A happens)
the answer will be LESS than or equal to each of the two chances being multiplied
In probability, the more conditions you have to satisfy...
the less likely you are to satisfy them all
What is the addition rule in probability?
If event A can happen in exactly 1 of 2 ways then...
P(A) = P(first way) + P(second way)
the answer is greater than or equal to each individual way
What is sampling?
Sampling is the process of selecting a subset (a sample) from a larger group (a population) to study.
Instead of measuring every person or thing, we collect data from a few and use it to make inferences about the whole.
What does with_replacement=True mean?
Each time you draw, the same item can be selected again — you're putting it back after each draw.
You can get duplicates
Some values might not appear at all
It reflects a resampling process, not a one-time draw
What does with_replacement=False mean?
Once you draw an item, it's removed from the pool — you cannot select it again in that sample.
You get unique values
You're mimicking simple random sampling
This reflects selecting without replacement, like drawing raffle tickets
When to use with_replacement=False?
Drawing a real sample from a population
When to use with_replacement=True?
Resampling from your own data (bootstrap)
You use with_replacement=True because:
You're treating your sample as a stand-in for the population
To simulate new samples, you must allow repeats (just like in a population)
If you resampled the full sample size without replacement, every "resample" would just be the original sample in a different order → useless!
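A small sketch of why the bootstrap needs replacement. It uses np.random.choice with its replace argument on a made-up sample of five values; the seed is an assumption so reruns match.

```python
import numpy as np

np.random.seed(1)  # assumed seed, only for repeatability

sample = np.array([3, 7, 8, 12, 15])

# Without replacement at full size: the same five values, just reordered.
without = np.random.choice(sample, len(sample), replace=False)

# With replacement: values can repeat and some can be missing.
with_repl = np.random.choice(sample, len(sample), replace=True)

print(without, np.mean(without))
print(with_repl, np.mean(with_repl))
```

The mean of the resample drawn without replacement always equals the mean of the original sample, so repeating it would never show any variation.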
What is a probability distribution?
A probability distribution describes the theoretical likelihood of different outcomes for a random variable.
Applies to things like rolling a die, flipping a coin
Based on known probabilities, not data