Data science final

87 Terms

1
New cards

What is the key difference between a statement and an expression?

Statements don't have a value (they perform actions); expressions evaluate to a value.

2
New cards

What does an assignment statement do?

It binds the name to the left of the = symbol to the value of the expression on the right, changing what that name means.

3
New cards

What do >, <, >=, <= mean in comparisons?

Greater than, less than, greater than or equal to, less than or equal to, respectively.

4
New cards

What is the difference between == and !=?

== means "equal to", while != means "not equal to".

5
New cards

How are strings compared?

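By their alphabetical order. Python compares character by character, and uppercase letters sort before lowercase ones (e.g., "Z" < "a" is True).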

6
New cards

How do arithmetic and comparison operations apply to arrays?

They are applied to each element of an array individually (e.g., make_array(1,2,3) ** 2 results in array([1, 4, 9])).

7
New cards

How are element-wise operations performed on arrays of the same size?

They operate on corresponding elements (e.g., make_array(3, 2) * make_array(5, 4) results in array([15, 8])).
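A minimal sketch of both array cards, assuming NumPy is available. np.array is used here as a stand-in for make_array so the example runs without the course library; the variable names are illustrative only.

```python
import numpy as np

# np.array stands in for make_array (an assumption for self-containment).
a = np.array([1, 2, 3])

squared = a ** 2   # arithmetic is applied to each element
is_big = a > 1     # comparison is applied to each element

# Two arrays of the same size combine corresponding elements.
product = np.array([3, 2]) * np.array([5, 4])

print(squared)   # [1 4 9]
print(is_big)    # [False  True  True]
print(product)   # [15  8]
```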

8
New cards

What is the basic syntax for defining a function with arguments?

def function_name(arg1, arg2, …): followed by the function body and a return statement for output.

9
New cards

What is the basic syntax for defining a function without arguments?

def function_name(): followed by the function body and a return statement.

10
New cards

How do you call a function without arguments?

By its name followed by parentheses: function_name().
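The three function cards can be illustrated with a short sketch. The function names percent_of and greeting are made up for this example.

```python
def percent_of(part, whole):
    """Function with arguments: return part as a percentage of whole."""
    return part / whole * 100


def greeting():
    """Function without arguments: return a fixed string."""
    return "hello"


# Calling: name followed by parentheses, with arguments if the function takes any.
print(percent_of(1, 4))   # 25.0
print(greeting())         # hello
```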

11
New cards

Describe the execution of a for statement.

The body is executed for every item in a sequence. The body can have multiple lines and should perform actions like assigning, sampling, or printing.

12
New cards

What is the general structure of if-elif-else conditional statements?

if <condition>: <body> elif <condition>: <body> else: <body> (multiple elif blocks are possible).
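A small sketch combining a for statement with an if/elif/else chain. The function name sign_label and the input values are invented for illustration.

```python
def sign_label(x):
    # Exactly one branch runs: the first whose condition is True, else the else block.
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"


labels = []
for value in [3, -2, 0]:      # the body runs once per item in the sequence
    labels.append(sign_label(value))

print(labels)  # ['positive', 'negative', 'zero']
```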

13
New cards

What does Total Variation Distance (TVD) represent?

It's a statistic that represents the difference between two distributions.
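The card does not give a formula. One common definition for two categorical distributions over the same categories is half the sum of the absolute differences between their proportions; the sketch below assumes that definition, and the two example distributions are made up.

```python
def total_variation_distance(dist1, dist2):
    """Half the sum of absolute differences between matching proportions."""
    return sum(abs(p - q) for p, q in zip(dist1, dist2)) / 2


# Two made-up distributions over the same three categories.
tvd = total_variation_distance([0.5, 0.25, 0.25], [0.25, 0.25, 0.5])
print(tvd)  # 0.25
```

Identical distributions give a TVD of 0, and larger values mean the distributions differ more.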

14
New cards

What are the four basic arithmetic operations and their Python symbols?

Addition (+), subtraction (-), multiplication (*), division (/). (Also, modulo % and exponentiation **).

15
New cards

List the primary data types mentioned.

String ("hello"), boolean (True, False), integer (1, 5), float (2.3, -52.52).

16
New cards

How can you negate any Table.where predicate?

By adding not_ in front of it (e.g., are.not_equal_to(x)).

17
New cards

List some common Table.where predicates and their meanings.

  1. are.equal_to(x): val == x
  2. are.above(x): val > x
  3. are.above_or_equal_to(x): val >= x
  4. are.below(x): val < x
  5. are.between(x, y): x <= val < y
  6. are.containing(s): val contains the string s

18
New cards

What are the defining properties of a histogram?

Bins are contiguous (adjacent, with no gaps, though some may be empty) and drawn to scale, the area of each bar equals the percent of entries in the bin, and the total area is 100%.

19
New cards

How do theoretical and empirical histograms relate, especially with more data?

A theoretical histogram shows actual probabilities. An empirical histogram shows observed distributions. With more data (e.g., more rolls of a die), the empirical histogram is likely to look more like the theoretical one.

20
New cards

State the Complement Rule.

P(event does not happen) = 1 - P(event happens).

21
New cards

State the Multiplication Rule.

P(two events both happen) = P(one happens) * P(the other happens, given that the first happened).

22
New cards

State the Addition Rule for mutually exclusive events.

P(event happens) = P(first way it can happen) + P(second way it can happen).
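A worked example of the complement, multiplication, and addition rules using exact fractions. The dice and card scenarios are illustrative choices, not taken from the cards.

```python
from fractions import Fraction

# Multiplication rule: no six on either of two die rolls.
p_no_six = Fraction(5, 6) * Fraction(5, 6)

# Complement rule: at least one six in two rolls.
p_at_least_one_six = 1 - p_no_six

# Multiplication rule with a conditional chance: two aces drawn without replacement.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

# Addition rule for mutually exclusive events: one roll shows a 1 or a 2.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

print(p_at_least_one_six)  # 11/36
print(p_two_aces)          # 1/221
print(p_one_or_two)        # 1/3
```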

23
New cards

What are the general steps for simulating a statistic?

  1. Create an empty array to collect simulated values.
  2. For each repetition: simulate one value of the statistic and append it to the collection array.
  3. All simulated values will be in the collection array at the end.
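The simulation steps can be sketched as follows, assuming NumPy. The statistic chosen here (the mean of 10 die rolls), the repetition count, and the helper name one_simulated_mean are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
faces = np.arange(1, 7)          # die faces 1 through 6


def one_simulated_mean(num_rolls):
    """Simulate one value of the statistic: the mean of num_rolls die rolls."""
    rolls = rng.choice(faces, num_rolls)
    return np.mean(rolls)


# Step 1: empty collection array.
simulated_means = np.array([])

# Step 2: simulate one value per repetition and append it.
for _ in range(1000):
    simulated_means = np.append(simulated_means, one_simulated_mean(10))

# Step 3: the collection array now holds every simulated value.
print(len(simulated_means))
```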

24
New cards

What do max(array) and min(array) do?

Return the maximum or minimum value in an array.

25
New cards

What does sum(array) do, especially with boolean values?

Sums all elements in an array. For boolean arrays, it counts the number of True values.

26
New cards

What does len(array) return?

The number of elements in an array.

27
New cards

What do round(num) and np.round(array) do?

Round a single number or each number in an array to the nearest integer.

28
New cards

What do abs(num) and np.abs(array) do?

Return the absolute value of a single number or each number in an array.

29
New cards

How do you calculate the average of values in an array?

Use np.average(array) or np.mean(array).

30
New cards

Explain np.arange(start, stop, step).

Creates an array of numbers starting from start, incrementing by step, up to but excluding stop. Default start is 0, default step is 1.

31
New cards

How do you access an item in an array by its index?

array.item(index) (e.g., array.item(0) for the first item).

32
New cards

How do you add an item to the end of an array?

np.append(array, item) returns a new array with item added at the end; the original array is not changed. If item is another array, all its elements are appended.

33
New cards

What does np.exp(array) calculate?

The exponential (e^x) for all elements in the array.

34
New cards

How do you randomly select items from an array?

np.random.choice(array) selects one item. np.random.choice(array, n) selects n items with replacement (default n is 1).

35
New cards

What does np.ones(n) create?

An array of length n consisting of all ones.

36
New cards

What does np.diff(array) return?

An array of length len(array)-1 containing the differences between adjacent elements.

37
New cards

What does np.count_nonzero(array) count?

The number of non-zero (or True) elements in an array.
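A short sketch exercising several of the array functions from the preceding cards, assuming NumPy. The specific numbers are arbitrary.

```python
import numpy as np

values = np.arange(2, 11, 3)          # start 2, step 3, stop before 11
gaps = np.diff(values)                # differences between adjacent elements
longer = np.append(values, 11)        # new array; values itself is unchanged

above_four = values > 4               # boolean array
count_by_sum = sum(above_four)        # True counts as 1
count_by_nonzero = np.count_nonzero(above_four)

print(values)            # [2 5 8]
print(gaps)              # [3 3]
print(longer)            # [ 2  5  8 11]
print(values.item(0))    # 2
print(count_by_nonzero)  # 2
```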

38
New cards

What does sample_proportions(sample_size, model_proportions) do?

Returns an array of proportions summing to 1, representing the result of sampling sample_size elements from a specified distribution.
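To make the behaviour concrete, here is a stand-in written with NumPy's multinomial sampler. It is a sketch of the idea, not the course library's own code; the function name sample_proportions_sketch and the model proportions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_proportions_sketch(sample_size, model_proportions):
    """Draw sample_size items from the model and return the proportion in each category."""
    counts = rng.multinomial(sample_size, model_proportions)
    return counts / sample_size


# Hypothetical model with three categories.
proportions = sample_proportions_sketch(100, [0.5, 0.3, 0.2])
print(proportions)
```

The result has one entry per category, and the entries sum to 1.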

39
New cards

How do you create an empty table and read a table from a file?

Table() creates an empty table. Table.read_table(filename) reads from a file.

40
New cards

How do you get the number of rows, columns, and column labels of a table?

tbl.num_rows, tbl.num_columns, tbl.labels.

41
New cards

How do you add or replace columns in a table?

tbl.with_column(name, values) for one, tbl.with_columns(n1, v1, n2, v2…) for multiple.

42
New cards

How do you retrieve the values of a specific column?

tbl.column(column_name_or_index).

43
New cards

How do you select or drop columns from a table?

tbl.select(col1, col2, …) and tbl.drop(col1, col2, …).

44
New cards

How do you change a column's label?

tbl.relabeled(old_label, new_label).

45
New cards

How do you select or exclude rows by index?

tbl.take(row_index) or tbl.take(row_indices) to select. tbl.exclude(row_index) or tbl.exclude(row_indices) to exclude.

46
New cards

How do you sort a table by a column?

tbl.sort(column_name_or_index). Use descending=True for descending order.

47
New cards

How do you filter a table based on a predicate?

tbl.where(column, predicate).

48
New cards

How do you apply a function to each item in a column?

tbl.apply(function, column_or_columns).

49
New cards

How do you group rows in a table?

tbl.group(column_or_columns) counts rows by unique values. tbl.group(column_or_columns, func) aggregates other values using func.

50
New cards

How do you join two tables?

tblA.join(colA, tblB, colB). Joins on colA from tblA and colB from tblB.

51
New cards

How do you create a pivot table?

tbl.pivot(col1, col2) for row counts. tbl.pivot(col1, col2, vals, collect) to aggregate values from a third column.

52
New cards

How do you sample rows from a table?

tbl.sample(n) samples n rows with replacement (default). Use with_replacement=False for sampling without replacement.

53
New cards

How do you create a scatter plot from a table?

tbl.scatter(x_column, y_column).

54
New cards

How do you create a horizontal bar chart?

tbl.barh(categories) or tbl.barh(categories, values).

55
New cards

How do you bin values in a column?

tbl.bin(column, bins).

56
New cards

How do you create a histogram from a table?

tbl.hist(column, unit, bins, group). unit, bins, and group are optional.

57
New cards

What is the implication if a p-value is small and the null hypothesis is true?

Something very unlikely has happened.

58
New cards

What conclusion can be drawn when the data supports the alternative hypothesis more than the null?

Conclude that the data support the alternative hypothesis.

59
New cards

What is a "cutoff" for P-value?

The threshold below which a P-value leads you to reject the null hypothesis. It equals the probability that your test makes the wrong conclusion when the null hypothesis is true.

60
New cards

Why is using a small cutoff important?

It limits the probability of wrongly rejecting the null hypothesis when the null is actually true.

61
New cards

How is a 95% Confidence Interval (CI) defined?

An interval constructed so that it will contain the true population parameter for approximately 95% of samples.

62
New cards

What does it mean for a particular sample's CI?

For a specific sample, the generated interval either contains the true parameter or it doesn't. The process itself works 95% of the time.

63
New cards

What is "Bootstrap" sampling?

When we want to sample again from the population but can't, we resample from the original large random sample instead, drawing with replacement as many times as there are data points in the sample.
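A minimal bootstrap sketch, assuming NumPy. The sample values, the choice of the median as the statistic, and the repetition count are made up for illustration. np.percentile interpolates between values, so its endpoints may differ slightly from a percentile function that always returns an element of the data.

```python
import numpy as np

rng = np.random.default_rng(42)
sample = np.array([12, 15, 9, 20, 14, 11, 17, 13])   # made-up original sample


def bootstrap_medians(data, repetitions):
    """Resample data with replacement, same size as data, and collect the medians."""
    medians = np.array([])
    for _ in range(repetitions):
        resample = rng.choice(data, len(data))   # replace=True is the default
        medians = np.append(medians, np.median(resample))
    return medians


medians = bootstrap_medians(sample, 2000)

# Middle 95% of the bootstrap medians: an approximate 95% confidence interval.
left = np.percentile(medians, 2.5)
right = np.percentile(medians, 97.5)
print(left, right)
```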

64
New cards

How are Null and Alternative Hypotheses typically stated for testing a numerical parameter with a CI?

Null Hypothesis: Population parameter = x. Alternative Hypothesis: Population parameter ≠ x.

65
New cards

What is the method for testing a hypothesis using a confidence interval?

  1. Construct a (100-p)% confidence interval for the population parameter, where p% is the P-value cutoff.
  2. If the hypothesized value x is not in the interval, reject the null.
  3. If x is in the interval, fail to reject the null.
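The decision rule can be written as a small function. The name decide_with_interval and the interval endpoints are hypothetical; in practice the endpoints would come from a confidence interval you have already constructed.

```python
def decide_with_interval(left, right, hypothesized_value):
    """Reject the null only when the hypothesized value falls outside the interval."""
    if left <= hypothesized_value <= right:
        return "fail to reject the null"
    return "reject the null"


# Hypothetical 95% interval (10.5, 16.0), used with a 5% cutoff.
print(decide_with_interval(10.5, 16.0, 12))   # fail to reject the null
print(decide_with_interval(10.5, 16.0, 18))   # reject the null
```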
66
New cards

Give an example of a question for an A/B test.

"Among babies born at some hospital, is there an association between birth weight and whether the mother smokes?"

67
New cards

What is the Null Hypothesis for comparing two samples in an A/B test?

The distributions of the two groups (e.g., birth weights for smoking vs. non-smoking mothers) are the same.

68
New cards

What is the Inferential Idea behind simulating under the null in an A/B test?

If there's no association, we can simulate new samples by randomly re-pairing outcomes (e.g., birth weights) with groups (e.g., smoking status) from the combined data.

69
New cards

How do you simulate the test statistic under the null for an A/B test?

Permute (shuffle) the outcome column many times. Each time: 1. Create a shuffled table that pairs each individual with a random outcome. 2. Compute a sampled test statistic that compares the two groups (e.g., difference in mean birth weights).
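A sketch of the shuffling procedure using plain NumPy arrays instead of a table. The ten birth weights are made up, and the one-sided comparison (non-smokers heavier on average) is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: group label and birth weight for ten babies.
group = np.array(["smoker"] * 5 + ["nonsmoker"] * 5)
weights = np.array([110, 115, 108, 120, 112, 125, 118, 130, 122, 127])


def difference_of_means(outcomes, labels):
    """Mean outcome for non-smokers minus mean outcome for smokers."""
    return outcomes[labels == "nonsmoker"].mean() - outcomes[labels == "smoker"].mean()


observed = difference_of_means(weights, group)

simulated = np.array([])
for _ in range(2000):
    shuffled = rng.permutation(weights)   # re-pair outcomes with groups at random
    simulated = np.append(simulated, difference_of_means(shuffled, group))

# Empirical P-value: share of shuffles at least as extreme as the observed difference.
p_value = np.count_nonzero(simulated >= observed) / len(simulated)
print(observed, p_value)
```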

70
New cards

Define the Nth percentile.

The smallest value in a set that is at least as large as N% of the elements in the set.

71
New cards

How do you determine the percentile if it doesn't exactly correspond to an element in the ordered data?

Take the next greater element instead. (e.g., percentile(21, s) for s=[1,7,3,9,5] is 3).
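The rule on these two cards can be sketched in plain Python. The function name percentile_sketch is hypothetical, and this is an illustration of the stated rule rather than the course library's implementation.

```python
import math


def percentile_sketch(p, values):
    """Smallest element that is at least as large as p% of the elements."""
    ordered = sorted(values)
    # Position in the sorted list, rounded up when it is not a whole number.
    k = math.ceil(p * len(ordered) / 100)
    return ordered[max(k, 1) - 1]


s = [1, 7, 3, 9, 5]
print(percentile_sketch(21, s))   # 3
print(percentile_sketch(50, s))   # 5
```

For the 21st percentile, 21% of 5 elements is 1.05, which rounds up to the 2nd element of the sorted list [1, 3, 5, 7, 9].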

72
New cards

What does percentile(n, arr) do?

Returns the n-th percentile of array arr.

73
New cards

What does np.std(arr) calculate?

The standard deviation of an array arr of numbers.

74
New cards

What does minimize(fn) return?

An array of arguments that minimize the function fn.

75
New cards

What is the purpose of tbl1.append(tbl2) and tbl1.append(row)?

They append a row or all rows of tbl2 to tbl1, mutating tbl1. The appended object must have identical columns.

76
New cards

What does table.rows represent?

All rows of a table (useful for iterating, e.g., for row in table.rows:).

77
New cards

How do you access a specific row or an item within a row?

table.row(i) returns the row at index i. row.item(j) returns item j from row.

78
New cards

What is the "mean" or "average" sometimes referred to as in the context of a histogram?

The balance point of the histogram.

79
New cards

What does Standard Deviation (SD) measure?

Roughly how far off the values are from the average.

80
New cards

What does a z-score (or standard unit) measure?

"How many SDs above average" a value is.

81
New cards

What does a negative z-score indicate?

The value is below average.

82
New cards

What is Chebyshev's Inequality?

At most 1/z^2 of the values are z or more SDs from the mean.

83
New cards

What is the typical range for almost all standard unit values?

(-5, 5).

84
New cards

How do you convert a value to standard units (z-score)?

(value - average) / SD.

85
New cards

How do you convert a z-score back to the original value?

z * SD + average.
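Both conversions in a short sketch, assuming NumPy. The data values are made up so that the average is 5 and the SD is 2; the function names are illustrative.

```python
import numpy as np


def standard_units(values):
    """(value - average) / SD for each value."""
    return (values - np.mean(values)) / np.std(values)


def original_units(z, average, sd):
    """z * SD + average."""
    return z * sd + average


data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # average 5, SD 2
z = standard_units(data)
restored = original_units(z, np.mean(data), np.std(data))

print(z)          # negative entries are below average
print(restored)   # matches data
```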

86
New cards

According to Chebyshev's Inequality ("All Distributions"), what percentage of values are within 1, 2, and 3 standard deviations of the mean?

1 SD: at least 0%
2 SDs: at least 75%
3 SDs: at least 88.888…%
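These bounds follow from Chebyshev's inequality: if at most 1/z^2 of the values are z or more SDs from the mean, then at least 1 - 1/z^2 are within z SDs. A quick check, with a function name chosen for illustration:

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z SDs of the mean, for any distribution."""
    return 1 - 1 / z ** 2


for z in [1, 2, 3]:
    print(z, chebyshev_lower_bound(z))
```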

87
New cards

For a Normal Distribution, what percentage of values are within 1, 2, and 3 standard deviations of the mean?

1 SD: about 68%
2 SDs: about 95%
3 SDs: about 99.73%
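These percentages can be checked with the standard library, using the fact that for a normal distribution the proportion within k SDs of the mean is erf(k / sqrt(2)). The function name is illustrative.

```python
import math


def normal_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in [1, 2, 3]:
    print(k, round(normal_within(k) * 100, 2))
```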