BU DS110 Midterm 2

Last updated 12:36 AM on 3/30/26

108 Terms

New cards

Pandas Series

1D homogeneous (all entries of the same type) labeled array.

New cards

Pandas DataFrame

2D heterogeneous (columns can have different types) table, where each column is a Pandas Series.
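
As a rough illustration of the two cards above, here is a minimal sketch that builds one Series and one DataFrame. The player names and numbers are invented for the example and are not from the course material.

```python
import pandas as pd

# A Series: one column of values that share a single dtype, plus an index.
homers = pd.Series([33, 21, 18], index=["Devers", "Duran", "Casas"], name="homers")

# A DataFrame: several columns (each one a Series) sharing the same index.
# The values here are made up for illustration.
players = pd.DataFrame(
    {"homers": [33, 21, 18], "position": ["3B", "OF", "1B"]},
    index=["Devers", "Duran", "Casas"],
)

print(homers.dtype)             # one dtype for the whole Series
print(players.dtypes)           # each column keeps its own dtype
print(type(players["homers"]))  # a single column comes back as a Series
```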

New cards

How to import a Python library

import {library} as {alias} (examples: import pandas as pd, import numpy as np)

New cards

Index

The row labels of a Pandas Series or DataFrame (its explicit axis).

New cards

TSV and CSV files

Tab-Separated Values and Comma-Separated Values: plain-text table files whose columns are separated by tabs or by commas.

New cards

Import a CSV file as a DataFrame

import pandas as pd

dataframe = pd.read_csv('csv_title.csv', index_col = 'index')

New cards

.head()

Displays the first few rows of a DataFrame (5 by default; pass a number to change it)

New cards

.tail()

Displays the last few rows of a DataFrame (5 by default; pass a number to change it)

New cards

Sort a DataFrame with .sort_values()

sorted_df = dataframe.sort_values(by='example_col', ascending=False)

New cards

Access one column in a DataFrame

dataframe['access_col']

New cards

Access one row in a DataFrame

dataframe.iloc[0] (by position) or dataframe.loc['example_row'] (by index label)

New cards

Access one cell in a DataFrame

Row comes first

dataframe.loc['example_row', 'access_col']

New cards

Indexing with conditions

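A comparison such as dataframe['col'] > 5 creates a Series of booleans, one per row. Passing that back into dataframe[...] returns a smaller table with only the rows where the value is True. You can link multiple criteria with & (and) and | (or), wrapping each condition in parentheses.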
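
A small sketch of boolean indexing, assuming a made-up table of teams and win totals (the column names and values are hypothetical):

```python
import pandas as pd

# Invented data for illustration only.
df = pd.DataFrame(
    {"team": ["BOS", "NYY", "BOS", "TB"], "wins": [92, 82, 78, 99]}
)

mask = df["wins"] > 80   # Series of True/False, one entry per row
winners = df[mask]       # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); each condition needs parentheses.
bos_winners = df[(df["wins"] > 80) & (df["team"] == "BOS")]

print(winners)
print(bos_winners)
```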

New cards

Computing new columns

dataframe['new_col'] = dataframe['col_1'] + dataframe['col_2']

New cards

.max(), .min(), .mean()

Returns the maximum value, minimum value, and mean value in a column

New cards

.idxmax(), .idxmin()

Returns the index label of the row holding the maximum or minimum value in a column (the first one if there are ties)
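
To see how the value methods and the index methods differ, here is a short sketch on a hypothetical points column (names and numbers are invented):

```python
import pandas as pd

# Invented scores, indexed by player name.
scores = pd.DataFrame({"points": [12, 30, 25]}, index=["Ana", "Ben", "Cy"])

highest = scores["points"].max()   # the largest value itself
who = scores["points"].idxmax()    # the index label where that value lives
lowest_who = scores["points"].idxmin()
average = scores["points"].mean()

print(highest, who, lowest_who, average)
```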

New cards

.describe()

Returns the count of entries, mean value, standard deviation, minimum, 25th percentile, 50th percentile, 75th percentile, and maximum of each column in the DataFrame

New cards

.corr()

Returns a table of the correlation (-1 to 1) between each pair of numeric columns. -1 indicates a perfectly inverse relationship between the two variables, 1 indicates a perfectly positive relationship, and 0 indicates no linear relationship.

New cards

.columns

No parentheses! Returns the names of each column.

New cards

.dtypes

No parentheses! Returns the types of each column, since each Series can only hold one type.

New cards

.isnull(), .dropna()

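.isnull() returns a table of booleans marking which entries are missing. .dropna() returns a copy with the rows that contain missing values dropped (use axis=1 to drop columns instead).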
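
A minimal sketch of finding and dropping missing values, using a tiny invented table:

```python
import numpy as np
import pandas as pd

# Invented data: row 1 is missing "a", row 2 is missing "b".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

missing = df.isnull()        # same shape as df, True where a value is missing
per_col = df.isnull().sum()  # number of missing values in each column
cleaned = df.dropna()        # drops every row that has any missing value

print(per_col)
print(cleaned)
```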

New cards

.iterrows()

Iterates through every row in a DataFrame. Use it as 'for index, row in df.iterrows():'; each row comes back as a Series.

New cards

Histogram

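A bar chart showing how numeric data is distributed: values are grouped into ranges (bins) and each bar counts the values in one bin. You can plot a DataFrame or column this way by using .hist(bins=x). This creates a histogram with x bins.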

New cards

Box Plot

Created using .boxplot(), this gives…

median value
middle 50% of data
range of non-outliers

New cards

F-strings

Formatted strings. Easier to use than concatenated normal strings, because values go directly inside {} in the string. An example of a formatted string, if cost = 15.0925, would be f'The total cost was {cost:.2f} dollars.' The :.2f rounds the number to two decimal places when it is displayed.
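
The formatting example above can be sketched as runnable code; note the colon before the format spec:

```python
cost = 15.0925

# The part after the colon is the format spec: .2f means two decimal places.
message = f"The total cost was {cost:.2f} dollars."
print(message)
```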

New cards

.split()

Turns a string with multiple parts separated by the argument into a list of separate strings.

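If redsoxplayers = 'Abreu,Devers,Duran,Casas', and sox_split = redsoxplayers.split(',') is called…

sox_split = ['Abreu','Devers','Duran','Casas'].

If there were spaces after the commas, they would stay attached to the names (' Devers'), so split on ', ' or call .strip() on each piece.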

New cards

.join()

The opposite of .split(). You need to call this on the separator and input a list though. ','.join(['Abreu','Devers','Duran','Casas']) = 'Abreu,Devers,Duran,Casas'.
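
A short sketch tying .split(), .strip(), and .join() together. It assumes a string with a space after each comma, which is a common source of stray whitespace:

```python
raw = "Abreu, Devers, Duran, Casas"

parts = raw.split(",")                     # pieces keep their leading spaces
clean = [name.strip() for name in parts]   # strip whitespace off each piece
rejoined = ",".join(clean)                 # called on the separator, given a list

print(parts)
print(clean)
print(rejoined)
```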

New cards

.strip()

Strips whitespace off the ends of strings.

New cards

.splitlines()

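Splits a string into a list of its lines. Close to .split('\n'), but it also handles other line endings (such as '\r\n') and does not leave an empty string at the end when the text ends with a newline.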

New cards

.startswith(), .endswith()

Returns True if the string starts or ends with the argument, and False otherwise

New cards

in operator

Detects if a substring is in a string.

New cards

Calling string functions on DataFrames

Use the .str accessor on a column (a Series), not on the whole DataFrame. For example, dataframe['col'].str.strip().
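
A minimal sketch of the .str accessor on one column; the column name and values are hypothetical:

```python
import pandas as pd

# Invented names with stray whitespace.
df = pd.DataFrame({"name": ["  Abreu ", "Devers  ", " Duran"]})

# .str applies the string method to every entry in the column.
df["name"] = df["name"].str.strip()
starts_with_d = df["name"].str.startswith("D")

print(df)
print(starts_with_d)
```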

New cards

Regular Expressions ("Regex")

Search for patterns in the data. You need to import re!

New cards

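Regex escape sequence for any digit

\d

New cards

Regex escape sequence for any whitespace

\s

New cards

Regex escape sequence for any alphanumeric character

\w (letters, digits, and the underscore)

New cards

Regex escape sequence for zero or more characters

* (a quantifier: zero or more of the preceding item, so .* matches zero or more of any character)

New cards

Regex escape sequence for one or more characters

+ (a quantifier: one or more of the preceding item)

New cards

Regex escape sequence for a character that may be there or not

? (a quantifier: zero or one of the preceding item)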

New cards

Regex or operator

(option1|option2)

New cards

Regex capturing information in groups

Use parentheses ()
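
The regex cards above can be tried out with Python's re module. This is a sketch on an invented sentence; the pattern choices are just one way to show digits, the or operator, groups, and an optional character:

```python
import re

text = "Game 12: Red Sox 7, Yankees 3"  # invented example text

# \d+ means one or more digits in a row.
numbers = re.findall(r"\d+", text)

# (a|b) is the or operator; each pair of parentheses also captures a group.
match = re.search(r"(Red Sox|Yankees) (\d+)", text)
team = match.group(1)
runs = match.group(2)

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour")

print(numbers, team, runs, spellings)
```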

New cards

with, open()

Used for reading and writing files. open() takes a filename string and returns the file object if the file is found (in read mode, a missing file raises FileNotFoundError). The with keyword cleans up resources associated with the file, for example by closing the file after it is used.
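
A minimal sketch of with and open(), writing a small file and reading it back. The file name roster.txt and the temporary directory are assumptions made so the example does not touch real files:

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "roster.txt")

# "w" opens the file for writing; the file is closed when the block ends.
with open(path, "w") as f:
    f.write("Abreu\nDevers\nDuran\n")

# The default mode is reading.
with open(path) as f:
    lines = f.read().splitlines()

print(lines)
print(f.closed)  # the with block already closed the file
```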

New cards

Reader and writer

Objects that read and write files.

New cards

!ls

Notebook shell command that lists the files in the current directory, so you can check that a file is there.

New cards

JSON

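A text file format that is an alternative to CSV and can hold nested data. To write a JSON object to a file, call json.dump(dict, file) on a dictionary and provide the open file to write it into. JSON files can be read back into dictionaries with json.load(file). You need to import json!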
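
A short round-trip sketch with the json module. The dictionary contents and the file name are invented for illustration:

```python
import json
import os
import tempfile

# Invented record; JSON handles nested lists and dictionaries.
player = {"name": "Devers", "number": 11, "positions": ["3B", "DH"]}

path = os.path.join(tempfile.mkdtemp(), "player.json")  # hypothetical file

with open(path, "w") as f:
    json.dump(player, f)   # write the dictionary to the file as JSON

with open(path) as f:
    loaded = json.load(f)  # read the JSON back into a dictionary

print(loaded)
```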

New cards

Serialization

Converting data in memory into a format that can be committed to a file (for example, a dictionary into JSON text).

New cards

pandas.read_csv(filename, index_col)

Reads a CSV file directly into a DataFrame.

New cards

df.to_csv(filename)

Writes a DataFrame to a CSV file.

New cards

Exceptions

Objects that are raised when an error occurs while the code is running.

Examples are FileNotFoundError, ZeroDivisionError, and ValueError (occurs when attempting to parse a non-int string as int).

If an exception isn't caught by the program, the program immediately terminates.

Try not to generate exceptions, even if there are ways to work around them.

New cards

try and except keywords

If an exception occurs within the try block, the code jumps to the first except block that matches the exception's type. This prevents the program from crashing even if the code is faulty.

New cards

Else and finally

Else can occur after except blocks to run only if there were no errors in the try block.

Finally blocks always run after everything else, whether or not an exception occurred.
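
To make the order of the four blocks concrete, here is a sketch with a hypothetical helper, parse_int, that records which blocks ran:

```python
def parse_int(text):
    """Try to convert text to an int and log which blocks ran."""
    log = []
    try:
        value = int(text)      # may raise ValueError
    except ValueError:
        log.append("except")   # runs only when the conversion fails
        value = None
    else:
        log.append("else")     # runs only when there was no exception
    finally:
        log.append("finally")  # always runs
    return value, log


print(parse_int("42"))
print(parse_int("abc"))
```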

New cards

Object-oriented programming

Programming that uses a system of object classes

New cards

How to define a class

class ExampleClass:
    pass

New cards

pass keyword

Use to create an empty class ("nothing interesting here")

New cards

Initialize objects of classes

example1 = ExampleClass()

example2 = ExampleClass()

New cards

isinstance()

Checks if an object is an instance of a certain class and returns a boolean.

isinstance(example2, ExampleClass) returns True

isinstance(not_example, ExampleClass) returns False

New cards

Attribute

Variables associated with a class or with its objects. Can be defined ahead of time in the class.

New cards

Method in object

class ExampleClass:
    def example_function(self):
        pass

ALL methods in objects must take self as their first parameter; this represents the object itself

New cards

Constructor method

Sets object attributes for the first time.

In Python it is always named "__init__".

class ExampleClass:
    def __init__(self, attribute1, attribute2):
        self.attribute1 = attribute1
        self.attribute2 = attribute2

New cards

Instance

One specific object created from a class

New cards

Getter

Returns an attribute of an object.

def get_attribute(self):
    return self.attribute

Try to avoid direct attribute access if you can.

New cards

Setter

Sets an attribute of an object.

def set_attribute(self, attribute):
    self.attribute = attribute

Try to avoid direct attribute access if you can.

New cards

Validation in the Constructor

Constructors can do more than initialize attributes! They can also test to make sure a value works for an object.

class GoodTeams:
    def __init__(self, championships_21st_century):
        if championships_21st_century < 3:
            raise ValueError("This team is a bunch of bums, like the New York Yankees!")
        else:
            print("Wow! What an elite team, such as the Boston Red Sox and the Patriots!")
        self.championships_21st_century = championships_21st_century

New cards

Default parameter values

Give a parameter a default value, which is used when the caller does not pass one.

class HomeRuns:
    def __init__(self, homers=0):
        self.homers = homers

HomeRuns().homers returns 0.

New cards

What should be an object?

Something that has multiple attributes attached to it and needs several functions that relate to it specifically.

New cards

Inheritance

A way of sharing code between classes.

The "child" class inherits from the "parent" class, which means it has all the code of the parent class plus any extra code that is entered into the child class.

New cards

How to create a child class

class Child(Parent):

Child classes are instantiated just like parent classes. You can also use the pass keyword to indicate that the child class does nothing but inherit from the parent, which seems useless but can be functional, for example if you want to use isinstance() to check if an object is an instance of a certain class.

New cards

super()

A function that gives the child class access to the parent class, so it can call the parent's version of a method.

class Player:
    def __init__(self, age, number, position):
        self.age = age
        self.number = number
        self.position = position

class Pitcher(Player):
    def __init__(self, age, number, position, velocity, is_starter):
        super().__init__(age, number, position)
        self.velocity = velocity
        self.is_starter = is_starter

New cards

When to use inheritance

If A inherits from B, A should satisfy an "is-a" relationship with B.

For example, every baseball pitcher is a baseball player.

Since not every baseball player is a pitcher, the reverse would not make sense.

New cards

Refactoring

Changing and improving the structure of existing code without changing what it does, for example moving code shared by several classes into a parent class so the child classes can build upon it.

New cards

Override a method

Rewrite a method in a child class to make it work differently from its equivalent in the parent class.
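
A minimal sketch of overriding, assuming a hypothetical describe method that is not part of the course's Player class. The child version replaces the parent version but can still reuse it through super():

```python
class Player:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a player"


class Pitcher(Player):
    # Same method name as in Player, so this version overrides it.
    def describe(self):
        base = super().describe()  # reuse the parent's version
        return base + " who pitches"


print(Player("Duran").describe())
print(Pitcher("Sale").describe())
```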

New cards

Recursion

Calling a function within itself.

def countdown(num):
    print(num)
    if num > 0:
        countdown(num - 1)

countdown(5) prints 5, 4, 3, 2, 1, 0 (one number per line).

New cards

Infinite recursion

A recursive function that never stops calling itself (in Python it eventually crashes with a RecursionError). Make sure you always avoid these by causing your recursive function to eventually reach a condition that causes it to stop (base case).

New cards

Base case

The condition that prevents the recursive function from running infinitely.

One of two parts of recursion.

New cards

Recursive call

The step where the function calls itself, usually on a smaller version of the problem. One of two parts of recursion.
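
The two parts of recursion can be seen side by side in a factorial function. This is a standard textbook sketch, not taken from the course material:

```python
def factorial(n):
    # Base case: stops the recursion.
    if n <= 1:
        return 1
    # Recursive call: the same function on a smaller input.
    return n * factorial(n - 1)


print(factorial(5))
```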

New cards

Data structure

An object that holds and organizes multiple pieces of data.

Simple examples are lists and dictionaries; more complicated ones would be DataFrames and linked lists.

New cards

Linked list

A system of nodes where each node holds both a value and a link to another node.
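
A minimal sketch of a linked list, assuming a hypothetical Node class and a helper, to_list, that follows the links from the first node to the end:

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value     # the data stored in this node
        self.next = next_node  # link to the next node, or None at the end


def to_list(head):
    """Walk the links starting at head and collect every value."""
    values = []
    current = head
    while current is not None:
        values.append(current.value)
        current = current.next
    return values


# Build 1 -> 2 -> 3 by linking nodes together.
head = Node(1, Node(2, Node(3)))
print(to_list(head))
```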

New cards

Trees

A structure like a linked list, except that a "parent" node can link to multiple "child" nodes.

The node at the top of the tree (with no parent itself) is the root.

A node with no children is called a leaf.

New cards

Binary tree

A tree that limits each parent node to at most 2 children.

New cards

Typical implementation of a tree (detailed for people like me who struggle with trees!)

class Tree:
    def __init__(self, val):  # constructor
        self.left = None
        self.right = None  # initializes only the first node, no children
        self.val = val  # sets value to that specified in the initialization of the class

    # adding specified nodes (this is a binary tree)
    def addLeft(self, node):
        self.left = node

    def addRight(self, node):
        self.right = node

    def find(self, v):  # finds value in the tree
        if self.val == v:  # if the value is here return True
            return True
        # checking all child nodes recursively
        if self.left and self.left.find(v):
            return True
        if self.right and self.right.find(v):
            return True
        # if not found
        return False

New cards

Supervised machine learning

Learns from labeled training data, then predicts labels or values for new data such as the test data.

Two types are classification and regression.

Classification

Goal: To predict a categorical label or discrete outcome (e.g., True/False, Red/Blue/Green).
Output: Labels/Classes.

Regression

Goal: To predict a continuous, numerical output variable.
Output: A continuous numerical value.

New cards

Classification

One type of supervised machine learning that predicts which group (class) each item belongs to.

New cards

Regression

One type of machine learning that fits a function to data.

New cards

scikit-learn

A free and open-source machine learning library for Python.

New cards

k-nearest neighbors

Predicts a value for a data point based on the values of its k nearest neighbors in the training data (for classification, the most common label among them).

New cards

KNeighborsClassifier

scikit-learn's class for k-nearest neighbors classification.

from sklearn.neighbors import KNeighborsClassifier

Important: models are created as objects.

nbrs = KNeighborsClassifier(n_neighbors=3).fit(digits.data, digits.target)

(Here digits is a dataset that was loaded earlier.)

New cards

Transforming

Turns raw data into usable data for machine learners.

New cards

Preprocessing

Like transforming but broader, for example can handle missing values.

New cards

Pipeline

The process of different steps of machine learning: transforming and pre-processing, training, then predicting/testing.

New cards

.fit(X, y)

On a transformer, learns the parameters for the preprocessing transformation. On a model, trains the machine learner on the training data.

New cards

.score(X, y)

Runs the trained machine learner on the test data and gives a score: the accuracy for classifiers, and R² for regressors.

New cards

.transform()

Applies a learned transformation to new data.

New cards

.fit_transform()

Combines .fit() and .transform().
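
A small sketch of .fit_transform() and .transform() using scikit-learn's StandardScaler. The choice of StandardScaler and the tiny arrays are assumptions for illustration; the point is that parameters are learned from the training data only and then reused on new data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented one-column data.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# Learn the mean and standard deviation from the training data, then apply them.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the already-learned parameters on new data (no refitting).
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)
print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
```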

New cards

Overfitting

This occurs when the model fits itself too closely to the training data, so it performs well on that data but poorly on new data.

To detect this, you should split the data into training and testing data and check the score on the testing data.

New cards

Train/test split

Splitting one set of data into training and testing pieces.

New cards

train_test_split

Splits the dataset into training and testing data for you.

from sklearn.model_selection import train_test_split

# The four objects created here are then passed to other functions as the training and testing data. They can be named whatever you want, but X_train, X_test, y_train, y_test is good practice. X is capitalized because it is a 2D table (matrix) of features, while y is lowercase because it is a 1D list (vector) of labels, following math notation.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, train_size=0.75, random_state=69, shuffle=True)

Relevant parameters:

X and y are the features and the labels to split. They must be passed in first.

test_size and train_size control the ratio of testing data to training data. As fractions they should add to one, and setting only one of them is enough.

random_state sets the seed for the shuffling before the data is split, so the same number gives the same split every time.

shuffle determines whether or not the data should be shuffled. This is useful if you have ordered data where splitting it normally would cause bad sampling.
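
Putting the split, .fit(), and .score() together, here is a hedged end-to-end sketch. It assumes scikit-learn's built-in digits dataset (load_digits) and a random_state of 0, neither of which is specified by the cards:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()  # assumed example dataset

# Features and labels go in first; the split is reproducible because of random_state.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)             # train on the training data only
accuracy = model.score(X_test, y_test)  # evaluate on data the model has not seen

print(len(X_train), len(X_test))
print(accuracy)
```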

New cards

Random

Setting the seed of the random number generator makes operations that require randomness repeatable. Without a fixed seed, shuffling would produce a different result on each run. The random_state parameter handles this in train_test_split.

New cards

Validation data

A portion split off from the training data, used to evaluate the performance of a model during training without touching the test data.

New cards

Cross-validation

Repeatedly using different sections of the data as validation data. For example, with sections of 20% each:

-Training on the first 80% then testing on the remaining 20%

-Then, testing on the fourth 20% while training on everything else, including the 20% that was previously used as the test data

-Repeating until every section has been used as the test data once

New cards

cross_val_score

scikit-learn's function for cross-validation. Returns one score for each round (accuracy by default for classifiers).
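
A brief sketch of cross_val_score, again assuming the built-in digits dataset and a 3-neighbor classifier as stand-ins. The cv=5 argument asks for five rounds, each holding out a different fifth of the data:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()  # assumed example dataset

# One score per round; each round trains on 4/5 of the data and tests on the rest.
scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=3), digits.data, digits.target, cv=5
)

print(scores)
print(scores.mean())
```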