What is the key difference between a statement and an expression?
Statements don't have a value (they perform actions); expressions evaluate to a value.
What does an assignment statement do?
It binds the name on the left of the = symbol to the value of the expression on the right, changing what that name refers to.
What do >, <, >=, <= mean in comparisons?
Greater than, less than, greater than or equal to, less than or equal to, respectively.
What is the difference between == and !=?
== means "equal to", while != means "not equal to".
How are strings compared?
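Character by character, in alphabetical (lexicographic) order; for example, "apple" < "banana" is True. In Python, uppercase letters sort before lowercase ones.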
How do arithmetic and comparison operations apply to arrays?
They are applied to each element of an array individually (e.g., make_array(1,2,3) ** 2 results in array([1, 4, 9])).
How are element-wise operations performed on arrays of the same size?
They operate on corresponding elements (e.g., make_array(3, 2) * make_array(5, 4) results in array([15, 8])).
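make_array comes from the course's datascience library. This minimal sketch assumes only NumPy is installed, so np.array stands in for make_array; the element-wise behavior is the same.

```python
import numpy as np

# np.array stands in for make_array (assumption: datascience may not be installed)
a = np.array([1, 2, 3])
squared = a ** 2          # applied to each element: array([1, 4, 9])
is_big = a > 1            # comparisons are element-wise too: array([False, True, True])

# Arrays of the same size combine position by position
products = np.array([3, 2]) * np.array([5, 4])   # array([15, 8])
print(squared, is_big, products)
```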
What is the basic syntax for defining a function with arguments?
def function_name(arg1, arg2, …): followed by the function body and a return statement for output.
What is the basic syntax for defining a function without arguments?
def function_name(): followed by the function body and a return statement.
How do you call a function without arguments?
By its name followed by parentheses: function_name().
Describe the execution of a for statement.
The body is executed for every item in a sequence. The body can have multiple lines and should perform actions like assigning, sampling, or printing.
What is the general structure of if-elif-else conditional statements?
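if <condition>: <body>, followed by any number of elif <condition>: <body> clauses, optionally ending with else: <body>. Python runs the body of the first condition that is true; the else body runs only if none of them are.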
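A small illustration of the if-elif-else structure; sign_label is a hypothetical function name used only for this example.

```python
def sign_label(x):
    """Classify a number; only the body of the first true condition runs."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

print(sign_label(5), sign_label(-2), sign_label(0))
```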
What does Total Variation Distance (TVD) represent?
A statistic that measures the distance between two categorical distributions: half the sum of the absolute differences between the corresponding proportions.
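A minimal sketch of the TVD computation with NumPy, assuming both distributions are given as proportions over the same categories in the same order; total_variation_distance is a hypothetical helper name.

```python
import numpy as np

def total_variation_distance(dist1, dist2):
    """Half the sum of absolute differences between two categorical distributions."""
    return np.sum(np.abs(np.array(dist1) - np.array(dist2))) / 2

# Proportions over the same three categories (each list sums to 1)
tvd = total_variation_distance([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
print(tvd)   # about 0.1
```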
What are the four basic arithmetic operations and their Python symbols?
Addition (+), subtraction (-), multiplication (*), division (/). (Also modulo % and exponentiation **.)
List the primary data types.
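The operators in action on two integers; note that / always produces a float.

```python
total = 7 + 2        # 9
difference = 7 - 2   # 5
product = 7 * 2      # 14
quotient = 7 / 2     # 3.5 (division returns a float)
remainder = 7 % 2    # 1 (modulo: remainder after division)
power = 7 ** 2       # 49 (exponentiation)
print(total, difference, product, quotient, remainder, power)
```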
String ("hello"), boolean (True, False), integer (1, 5), float (2.3, -52.52).
How can you negate any Table.where predicate?
By adding not_ in front of it (e.g., are.not_equal_to(x)).
List some common Table.where predicates and their meanings.
1. are.equal_to(x): val == x
2. are.above(x): val > x
3. are.above_or_equal_to(x): val >= x
4. are.below(x): val < x
5. are.between(x, y): x <= val < y
6. are.containing(s): val contains the string s
What are the defining properties of a histogram?
Bins are continuous and drawn to scale, the area of each bar equals the percent of entries in the bin, and the total area is 100%.
How do theoretical and empirical histograms relate, especially with more data?
A theoretical histogram shows actual probabilities. An empirical histogram shows observed distributions. With more data (e.g., more rolls of a die), the empirical histogram is likely to look more like the theoretical one.
State the Complement Rule.
P(event does not happen) = 1 - P(event happens).
State the Multiplication Rule.
P(two events both happen) = P(one happens) * P(the other happens, given that the first happened).
State the Addition Rule for mutually exclusive events.
P(event happens) = P(first way it can happen) + P(second way it can happen).
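A worked example of the three rules using exact fractions; the dice, cards, and coins are illustrative scenarios, not from the original deck.

```python
from fractions import Fraction

# Complement rule: P(at least one six in 3 rolls) = 1 - P(no sixes in 3 rolls)
p_no_six = Fraction(5, 6) ** 3
p_at_least_one_six = 1 - p_no_six                  # 91/216

# Multiplication rule: two aces when drawing 2 cards without replacement
# P(first is an ace) * P(second is an ace, given the first was)
p_two_aces = Fraction(4, 52) * Fraction(3, 51)     # 1/221

# Addition rule: exactly one head in two tosses happens as HT or TH (mutually exclusive)
p_one_head = Fraction(1, 4) + Fraction(1, 4)       # 1/2
print(p_at_least_one_six, p_two_aces, p_one_head)
```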
What are the general steps for simulating a statistic?
1. Create an empty array to collect simulated values.
2. For each repetition, simulate one value of the statistic and append it to the collection array.
3. At the end, the collection array holds all the simulated values.
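The three steps above as a NumPy sketch. The statistic (number of sixes in 10 rolls of a fair die) and the helper name simulate_one_statistic are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_one_statistic():
    """Hypothetical statistic: number of sixes in 10 rolls of a fair die."""
    rolls = np.random.choice(np.arange(1, 7), 10)
    return np.count_nonzero(rolls == 6)

num_repetitions = 1000
simulated_values = np.array([])                 # 1. empty collection array
for _ in np.arange(num_repetitions):            # 2. simulate one value, then append it
    simulated_values = np.append(simulated_values, simulate_one_statistic())
# 3. all simulated values are now in simulated_values
print(len(simulated_values))
```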
What do max(array) and min(array) do?
Return the maximum or minimum value in an array.
What does sum(array) do, especially with boolean values?
Sums all elements in an array. For boolean arrays, it counts the number of True values.
What does len(array) return?
The number of elements in an array.
What do round(num) and np.round(array) do?
Round a single number or each number in an array to the nearest integer.
What do abs(num) and np.abs(array) do?
Return the absolute value of a single number or each number in an array.
How do you calculate the average of values in an array?
Use np.average(array) or np.mean(array).
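The summary functions from the cards above applied to one small array; the values are arbitrary.

```python
import numpy as np

values = np.array([2.4, -1.2, 4.0, 3.7])

largest = max(values)            # 4.0
smallest = min(values)           # -1.2
count_above_3 = sum(values > 3)  # True counts as 1, so this is 2
n = len(values)                  # 4
rounded = np.round(values)       # rounds each element
absolute = np.abs(values)        # absolute value of each element
avg = np.mean(values)            # same result as np.average(values)
print(largest, smallest, count_above_3, n, rounded, absolute, avg)
```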
Explain np.arange(start, stop, step).
Creates an array of numbers starting from start, incrementing by step, up to but excluding stop. Default start is 0, default step is 1.
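A few np.arange calls showing the defaults and that stop is excluded.

```python
import numpy as np

first_five = np.arange(5)          # start 0, step 1: [0 1 2 3 4]
stepped = np.arange(2, 10, 3)      # [2 5 8]; 10 itself is excluded
quarters = np.arange(1, 2, 0.25)   # [1.   1.25 1.5  1.75]
print(first_five, stepped, quarters)
```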
How do you access an item in an array by its index?
array.item(index) (e.g., array.item(0) for the first item).
How do you add an item to the end of an array?
np.append(array, item) returns a new array with item added at the end; the original array is unchanged. If item is another array, all its elements are appended.
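A short check of item access and of the fact that np.append returns a new array rather than modifying the original.

```python
import numpy as np

arr = np.array([10, 20, 30])
first = arr.item(0)                             # 10
longer = np.append(arr, 40)                     # [10 20 30 40]
combined = np.append(arr, np.array([50, 60]))   # every element of the second array is appended
print(first, longer, combined, arr)             # arr is still [10 20 30]
```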
What does np.exp(array) calculate?
The exponential (e^x) for all elements in the array.
How do you randomly select items from an array?
np.random.choice(array) selects one item. np.random.choice(array, n) selects n items with replacement (default n is 1).
What does np.ones(n) create?
An array of length n consisting of all ones.
What does np.diff(array) return?
An array of length len(array)-1 containing the differences between adjacent elements.
What does np.count_nonzero(array) count?
The number of non-zero (or True) elements in an array.
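np.diff, np.ones, and np.count_nonzero on a small example array.

```python
import numpy as np

a = np.array([3, 7, 7, 12])
gaps = np.diff(a)                      # [4 0 5]: length len(a) - 1
ones = np.ones(3)                      # [1. 1. 1.]
nonzero_gaps = np.count_nonzero(gaps)  # 2 (the 0 is not counted)
print(gaps, ones, nonzero_gaps)
```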
What does sample_proportions(sample_size, model_proportions) do?
Returns an array of proportions summing to 1, representing the result of sampling sample_size elements from a specified distribution.
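sample_proportions belongs to the datascience library. As an assumption, its behavior can be approximated in plain NumPy by drawing multinomial counts and dividing by the sample size; sample_proportions_sketch is a hypothetical name, not the library's function.

```python
import numpy as np

def sample_proportions_sketch(sample_size, model_proportions):
    """Assumed NumPy stand-in for datascience's sample_proportions."""
    counts = np.random.multinomial(sample_size, model_proportions)  # counts per category
    return counts / sample_size                                     # convert to proportions

props = sample_proportions_sketch(100, [0.25, 0.5, 0.25])
print(props, props.sum())
```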
How do you create an empty table and read a table from a file?
Table() creates an empty table. Table.read_table(filename) reads from a file.
How do you get the number of rows, columns, and column labels of a table?
tbl.num_rows, tbl.num_columns, tbl.labels.
How do you add or replace columns in a table?
tbl.with_column(name, values) for one, tbl.with_columns(n1, v1, n2, v2…) for multiple.
How do you retrieve the values of a specific column?
tbl.column(column_name_or_index).
How do you select or drop columns from a table?
tbl.select(col1, col2, …) and tbl.drop(col1, col2, …).
How do you change a column's label?
tbl.relabeled(old_label, new_label).
How do you select or exclude rows by index?
tbl.take(row_index) or tbl.take(row_indices) to select. tbl.exclude(row_index) or tbl.exclude(row_indices) to exclude.
How do you sort a table by a column?
tbl.sort(column_name_or_index). Use descending=True for descending order.
How do you filter a table based on a predicate?
tbl.where(column, predicate).
How do you apply a function to each item in a column?
tbl.apply(function, column_or_columns).
How do you group rows in a table?
tbl.group(column_or_columns) counts rows by unique values. tbl.group(column_or_columns, func) aggregates other values using func.
How do you join two tables?
tblA.join(colA, tblB, colB). Joins on colA from tblA and colB from tblB.
How do you create a pivot table?
tbl.pivot(col1, col2) for row counts. tbl.pivot(col1, col2, vals, collect) to aggregate values from a third column.
How do you sample rows from a table?
tbl.sample(n) samples n rows with replacement (default). Use with_replacement=False for sampling without replacement.
How do you create a scatter plot from a table?
tbl.scatter(x_column, y_column).
How do you create a horizontal bar chart?
tbl.barh(categories) or tbl.barh(categories, values).
How do you bin values in a column?
tbl.bin(column, bins).
How do you create a histogram from a table?
tbl.hist(column, unit, bins, group). unit, bins, and group are optional.
What is the implication if a p-value is small and the null hypothesis is true?
Something very unlikely has happened.
What conclusion can be drawn when the data supports the alternative hypothesis more than the null?
Conclude that the data support the alternative hypothesis.
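What is a "cutoff" for the p-value?
The threshold (conventionally 5%) below which a p-value leads you to reject the null hypothesis. It is also the approximate probability that the test wrongly rejects the null hypothesis when the null is true.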
Why is using a small cutoff important?
It limits the probability of wrongly rejecting the null hypothesis when it is true.
How is a 95% Confidence Interval (CI) defined?
An interval constructed so that it will contain the true population parameter for approximately 95% of samples.
What does the 95% confidence level mean for one particular sample's CI?
For a specific sample, the interval either contains the true parameter or it doesn't. The 95% describes the process: it produces an interval containing the parameter for about 95% of samples.
What is "Bootstrap" sampling?
When we want to sample again from the population but can't, we resample at random with replacement from the original large random sample, drawing as many values as the original sample contains.
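A minimal bootstrap sketch for a confidence interval of the mean. bootstrap_mean_ci is a hypothetical helper, and the sample is the numbers 1 to 100 chosen only for illustration. It uses np.percentile, which interpolates between values and so can differ slightly from the course's percentile function.

```python
import numpy as np

def bootstrap_mean_ci(sample, repetitions=2000, level=95):
    """Percentile bootstrap confidence interval for the mean (a sketch)."""
    sample = np.array(sample)
    boot_means = np.array([])
    for _ in np.arange(repetitions):
        # resample with replacement, same size as the original sample
        resample = np.random.choice(sample, len(sample))
        boot_means = np.append(boot_means, np.mean(resample))
    tail = (100 - level) / 2
    return np.percentile(boot_means, tail), np.percentile(boot_means, 100 - tail)

left, right = bootstrap_mean_ci(np.arange(1, 101))
print(left, right)
```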
How are Null and Alternative Hypotheses typically stated for testing a numerical parameter with a CI?
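Null hypothesis: population parameter = x. Alternative hypothesis: population parameter ≠ x.
What is the method for testing a hypothesis using a confidence interval?
Construct a confidence interval for the parameter (e.g., a 95% CI). If x is outside the interval, reject the null hypothesis at a cutoff of 100% minus the confidence level (5% for a 95% CI); if x is inside the interval, don't reject it.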
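The decision rule for a CI-based test can be sketched as a tiny function; ci_test is a hypothetical name, and the interval is assumed to have been computed already (for example, by bootstrapping).

```python
def ci_test(interval, null_value):
    """Reject H0 (parameter == null_value) when null_value lies outside the interval."""
    left, right = interval
    if left <= null_value <= right:
        return "Do not reject the null hypothesis"
    return "Reject the null hypothesis"

print(ci_test((0.4, 0.6), 0.5))
print(ci_test((0.4, 0.6), 0.7))
```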
Give an example of a question for an A/B test.
"Among babies born at some hospital, is there an association between birth weight and whether the mother smokes?"
What is the Null Hypothesis for comparing two samples in an A/B test?
The distributions of the two groups (e.g., birth weights for smoking vs. non-smoking mothers) are the same.
What is the Inferential Idea behind simulating under the null in an A/B test?
If there's no association, we can simulate new samples by randomly re-pairing outcomes (e.g., birth weights) with groups (e.g., smoking status) from the combined data.
How do you simulate the test statistic under the null for an A/B test?
Permute (shuffle) the outcome column many times. Each time:
1. Create a shuffled table that pairs each individual with a random outcome.
2. Compute a sampled test statistic that compares the two groups (e.g., difference in mean birth weights).
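A NumPy sketch of the permutation procedure, using arrays in place of a Table. The six birth weights and smoking labels are hypothetical toy values, and the helper names are illustrative. The p-value here is one-sided, assuming the alternative is that the True group has lower outcomes.

```python
import numpy as np

def difference_in_means(outcomes, groups):
    """Mean outcome of the True group minus mean outcome of the False group."""
    return np.mean(outcomes[groups]) - np.mean(outcomes[~groups])

def simulate_null_statistics(outcomes, groups, repetitions=2000):
    simulated = np.array([])
    for _ in np.arange(repetitions):
        shuffled = np.random.permutation(outcomes)   # randomly re-pair outcomes with groups
        simulated = np.append(simulated, difference_in_means(shuffled, groups))
    return simulated

# Hypothetical toy data: birth weights and whether the mother smokes
weights = np.array([120, 113, 128, 108, 136, 138])
smoker = np.array([True, True, True, False, False, False])

observed = difference_in_means(weights, smoker)      # about -7
null_stats = simulate_null_statistics(weights, smoker)
p_value = np.count_nonzero(null_stats <= observed) / len(null_stats)
print(observed, p_value)
```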
Define the Nth percentile.
The smallest value in the set that is at least as large as N% of the elements in the set.
How do you determine the percentile if it doesn't exactly correspond to an element in the ordered data?
Take the next greater element. For example, for s = [1, 7, 3, 9, 5] (sorted: 1, 3, 5, 7, 9), 21% of 5 elements is 1.05, which rounds up to the 2nd element, so percentile(21, s) is 3.
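A pure-Python sketch of the percentile definition above: sort, compute the rank as N% of the count, and round up. data8_percentile is a hypothetical name; for N = 0 it returns the minimum.

```python
import math

def data8_percentile(p, values):
    """Smallest value at least as large as p% of the values."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based position, rounded up
    return ordered[max(rank, 1) - 1]

print(data8_percentile(21, [1, 7, 3, 9, 5]))   # 3
```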
What does percentile(n, arr) do?
Returns the n-th percentile of array arr.
What does np.std(arr) calculate?
The standard deviation of an array arr of numbers.
What does minimize(fn) return?
An array of arguments that minimize the function fn.
What is the purpose of tbl1.append(tbl2) and tbl1.append(row)?
tbl1.append(row) adds one row and tbl1.append(tbl2) adds all rows of tbl2; both modify tbl1 in place. The appended row or table must have the same columns as tbl1.
What does table.rows represent?
All rows of a table (useful for iterating, e.g., for row in table.rows:).
How do you access a specific row or an item within a row?
table.row(i) returns the row at index i. row.item(j) returns item j from row.
What is the "mean" or "average" sometimes referred to as in the context of a histogram?
The balance point of the histogram.
What does Standard Deviation (SD) measure?
Roughly how far off the values are from the average.
What does a z-score (or standard unit) measure?
"How many SDs above average" a value is.
What does a negative z-score indicate?
The value is below average.
What is Chebyshev's Inequality?
At most 1/z^2 of the values are z or more SDs from the mean.
What is the typical range for almost all standard unit values?
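(-5, 5). By Chebyshev's Inequality, at least 96% (1 - 1/25) of the values in any distribution fall in this range.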
How do you convert a value to standard units (z-score)?
(value - average) / SD.
How do you convert a z-score back to the original value?
z * SD + average.
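Converting to standard units and back with NumPy. The data are arbitrary, chosen so the average is 5 and the SD (np.std, which divides by the number of values) is 2; the helper names are hypothetical.

```python
import numpy as np

def standard_units(values):
    """z-scores: (value - average) / SD."""
    values = np.array(values)
    return (values - np.mean(values)) / np.std(values)

def from_standard_units(z, average, sd):
    """Undo the conversion: z * SD + average."""
    return z * sd + average

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # average 5, SD 2
z = standard_units(data)                    # 9 becomes 2.0, 2 becomes -1.5
print(z, from_standard_units(z, 5, 2))
```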
According to Chebyshev's Inequality ("All Distributions"), what percentage of values are within 1, 2, and 3 standard deviations of the mean?
1 SD: at least 0%
2 SDs: at least 75%
3 SDs: at least 88.888…%
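The bounds above come from 1 - 1/z^2. This sketch computes the bound and checks it against a skewed sample; the exponential data are an arbitrary choice, and the helper names are hypothetical.

```python
import numpy as np

def chebyshev_lower_bound(z):
    """Minimum fraction of any distribution within z SDs of the mean (z > 0)."""
    return max(0.0, 1 - 1 / z ** 2)

def fraction_within(values, z):
    """Observed fraction of values strictly within z SDs of the mean."""
    values = np.array(values)
    su = (values - np.mean(values)) / np.std(values)
    return np.count_nonzero(np.abs(su) < z) / len(values)

data = np.random.exponential(size=10000)   # a skewed, non-normal sample
for z in (1, 2, 3):
    print(z, chebyshev_lower_bound(z), fraction_within(data, z))
```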
For a Normal Distribution, what percentage of values are within 1, 2, and 3 standard deviations of the mean?
1 SD: about 68%
2 SDs: about 95%
3 SDs: about 99.73%
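The normal-curve percentages can be reproduced with the error function: for a standard normal Z, P(-z < Z < z) = erf(z / sqrt(2)). normal_within is a hypothetical helper name.

```python
import math

def normal_within(z):
    """Probability that a standard normal value lies within z SDs of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(z, round(normal_within(z) * 100, 2))   # 68.27, 95.45, 99.73
```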