Data Science with Python -- Exam #1

45 Terms

1

Uses of data science

  1. Exploration

  2. Inference

  3. Prediction

2

Association

Any relation or link between two variables. An association does not by itself mean there is causality.

3

How to identify causality

You usually need to conduct an experiment with both a treatment (or experimental) group and a control group (no treatment, or a placebo). If the groups are similar apart from the treatment, differences between the group outcomes can be attributed to the treatment.

4

Confounding

A systematic difference between the groups other than the treatment. It creates a false sense that there is a causal relationship and leads researchers astray.

5

Benefits of random assignment

The groups are likely to be similar apart from the treatment. It also lets you account for variability due to the assignment.
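
A minimal sketch of random assignment, assuming 20 hypothetical subject IDs and an even split. The names `subjects`, `treatment_group`, and `control_group` are illustrative and not from the course materials.

```python
import random

# Hypothetical subject IDs; any list of participants would work.
subjects = list(range(1, 21))

# A fixed seed is used only so this sketch is repeatable.
rng = random.Random(0)
shuffled = subjects[:]
rng.shuffle(shuffled)

# The first half gets the treatment; the second half is the control group.
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]
```

Because chance alone decides who lands in each group, neither the researcher nor the subjects can introduce a systematic difference between the groups.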

6

Observational Study

Researchers only observe subjects. They do not assign treatments or otherwise intervene.

7

Assignment Statement

Changes the meaning of the name to the left of the "=" symbol. The name is bound to the value of the expression on the right.

ex: A_New_Name = 5 * 10

8

Functions

Code that performs a task. Organized and reusable. Breaks complex problems into smaller parts.

Functions can be built into Python or defined by you.

9

What happens to a past calculation if you change the variable

It does not change. You must run the calculation again for the result to reflect the new value.
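
A small sketch of this behavior. The names `price` and `total` are made up for illustration.

```python
price = 10
total = price * 3        # total is bound to 30 at this point

price = 20               # rebinding price does not touch total
unchanged_total = total  # still 30

total = price * 3        # recompute to pick up the new value: 60
```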

10

How a function works

  1. Call the function

  2. Pass in the input values (arguments)

  3. The interpreter runs the function's code

  4. The function returns its output

11

Data Structures

Vector - 1 dimensional

Table - 2 dimensional

12

Table

Sequence of labeled columns.

Each row: one individual and all of their data

Each column: the observations for one variable

13

Table Operations

t.drop(label) - makes a table where the chosen columns are omitted

t.sort(label, descending=True) - makes a table where the rows are sorted by a specific column. The default is increasing order; descending=True reverses it.

t.where(label, condition) - makes a table with just the rows matching the condition

t.select(label) - makes a table that only has the selected columns

14

Condition arguments

are.equal_to(value)

are.below(value)

are.above(value)

are.above_or_equal_to(value)

are.below_or_equal_to(value)

15

Data Types

All data values have a type.

  • int

  • float

  • str

  • Table

  • less common: builtin_function_or_method (the type of a built-in function such as abs)

16

type function

gives the type of a value. Based on value, not appearance. ex: type(5+5) returns int and type(5.5) returns float.
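
A few illustrative calls, including two cases where the type depends on how the value was produced rather than how it looks. The variable names are chosen only for this sketch.

```python
sum_type = type(5 + 5)        # int
float_type = type(5.5)        # float
division_type = type(10 / 2)  # float: / always produces a float (here 5.0)
text_type = type("5")         # str: the quotes make it text, not a number
```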

17

Column/Array Index

Indexing starts at zero: the first element is at index 0.

18

Float

Number with a fractional/decimal part. Displayed with a decimal point (very large or very small values may be shown in scientific notation instead).

19

int

Integer of any size. Never has a decimal point.

20

Float Limitations

  1. Final decimal places can be wrong after arithmetic

  2. Limited precision of about 15-16 significant digits

  3. Limited size, but the limit is still very large
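
The limitations above can be seen with a couple of standard examples. This is a sketch using only built-in floats and the standard `math` module.

```python
import math

rounding_error = 0.1 + 0.2                    # stored as 0.30000000000000004
is_exact = (0.1 + 0.2 == 0.3)                 # False: the last digits differ
is_close = math.isclose(0.1 + 0.2, 0.3)       # True: compare with a tolerance

# Multiplying past the largest float gives infinity instead of a number.
overflowed = 1e308 * 10
is_infinite = math.isinf(overflowed)          # True
```

When comparing floats, a tolerance-based check such as `math.isclose` is usually safer than `==`.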

21

Operators

Addition: +

Subtraction: -

Multiplication: *

Division: /

Remainder: %

Exponent: **
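
Each operator applied to the same pair of numbers, as a quick reference. The variable names are illustrative.

```python
addition = 7 + 2        # 9
subtraction = 7 - 2     # 5
multiplication = 7 * 2  # 14
division = 7 / 2        # 3.5 (division always gives a float)
remainder = 7 % 2       # 1
exponent = 7 ** 2       # 49
```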

22

String

Text of any length. Can be created with double quotes ("...") or single quotes ('...').

23

Converting from strings

int("15")

float("1.9")

Make sure the string actually represents a number of that type, or you will get an error.

24

Converting to strings

Any value can be converted to a string with str().

25

Converting between types

Floats and integers can be converted to each other with float() and int(). Converting a float to an int drops the fractional part (it does not round), so information is lost if there is one.
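
The three conversion cards above can be sketched together. The variable names are illustrative only.

```python
as_int = int("15")       # 15
as_float = float("1.9")  # 1.9
as_text = str(1.9)       # "1.9"

truncated = int(1.9)     # 1: the fractional part is dropped, not rounded
widened = float(15)      # 15.0

# A string that is not a whole number cannot be converted directly to int.
try:
    int("1.9")
    conversion_failed = False
except ValueError:
    conversion_failed = True
```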

26

Array

A sequence of values that all have the same type. A table column is an array. Arithmetic between two arrays of the same length is done element by element, and arithmetic between an array and a single number is applied to each element individually.

27

Array functions

t.column(label) - makes an array out of a column. Can also use a column index.

a.item(index) - value at a particular index

To aggregate array values, use either:

np.mean(a), np.sum(a), np.max(a), np.min(a)

a.mean(), a.sum(), a.max(), a.min()

make_array(values) - makes an array from the given values
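
A short sketch of element-wise arithmetic and aggregation. It assumes NumPy is installed and builds the arrays with `np.array` in place of the course's `make_array`; the values are made up.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

elementwise_sum = a + b   # pairs up elements: 11, 22, 33
doubled = a * 2           # applies to each element: 2, 4, 6

second = b.item(1)        # 20 (index 1 is the second element)
total = np.sum(b)         # 60
average = b.mean()        # 20.0
largest = np.max(b)       # 30
```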

28

Repeating a string

Multiplying a string by an integer gives a longer, repeated string. ex: "hi" * 2 gives "hihi"

29

np.arange

makes an array.

np.arange(end) - increasing integers from 0 up to (but not including) end

np.arange(start, end) - increasing integers from start up to (but not including) end

np.arange(start, end, step) - values from start up to end, with a defined amount (step) between each value

The end value is never included in the range.
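
The three call forms side by side, assuming NumPy is installed. The variable names are illustrative.

```python
import numpy as np

first_five = np.arange(5)        # 0, 1, 2, 3, 4
three_to_six = np.arange(3, 7)   # 3, 4, 5, 6
by_twos = np.arange(0, 10, 2)    # 0, 2, 4, 6, 8
```

In every case the end value itself is left out, so `np.arange(5)` has five elements but stops at 4.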

30

Creating tables

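Table.read_table(filename) - makes a table from a data file

Table() - empty table

Table().with_column(label, array) - makes a table with one labeled column

Table().with_columns(label1, array1, label2, array2, ...) - makes a table with multiple labeled columns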

31

num_rows and num_columns

Find the size of a table: t.num_rows is the number of rows and t.num_columns is the number of columns.

32

Row-related functions

t.take(row_numbers) - keeps only the rows at the given row indices

t.where(label, condition) - keeps the rows whose column value matches the condition

33

Lists

Sequence of values that can have different types. ex: [1, "two", 3.0]

34

Numerical attributes

each value is from a numerical scale

35

categorical attributes

Values are from a preset inventory

36

Ways to plot 2 numerical variables

  • line graph - t.plot(x, y). Better for stuff with order and trends over a period.

  • Scatter graph - t.scatter(x, y). Better for associations

37

Bins

defined by their lower and upper bounds. Upper bound is the lower bound of the next bin.

38

Histogram

Shows the distribution of a numerical variable

One bar for each bin.

Area of a bar is the percent of individuals in the bin.

Height of Bar = (% in bin) / (bin width)

height measures the percent of data in the bin relative to space in the bin. Height measures density.

39

Histogram area

Area measures percent.

Area of bar = % in bin = Height x width of bin
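
A worked sketch of the height and area formulas in plain Python. The data values and bin edges are hypothetical, and each bin is assumed to include its lower bound and exclude its upper bound.

```python
# Hypothetical data and bin edges: [0, 5), [5, 10), [10, 20)
values = [1, 2, 2, 3, 6, 7, 12, 15, 18, 19]
edges = [0, 5, 10, 20]

percents = []  # area of each bar: percent of individuals in the bin
heights = []   # height of each bar: percent per unit on the horizontal axis

for lower, upper in zip(edges[:-1], edges[1:]):
    count = sum(1 for v in values if lower <= v < upper)
    percent = 100 * count / len(values)
    percents.append(percent)
    heights.append(percent / (upper - lower))

# percents is [40.0, 20.0, 40.0]; heights is [8.0, 4.0, 4.0]
```

The first and last bins hold the same percent of the data (equal areas), but the last bin is twice as wide, so its bar is half as tall: the data there are less dense.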

40

Understanding histograms

Looking for the percent of individuals in a bin → use area

Looking for how crowded or dense a bin is → use height

41

Bar Chart vs histogram

  • Bar chart focuses on distribution of a categorical variable while a histogram focuses on distribution of a numerical variable.

  • Bar charts have bars of arbitrary (but equal) width and spacing that can appear in any order, while histograms have a numerical horizontal axis and are drawn to scale.

  • Bar chart - height and area of bars is proportional to percent of individuals. Histogram has area of bars proportional to percent of individuals and height measuring density.

42

Building functions

def name(argument names):

    return expression

Made up of def, a name, argument names (parameters), and a return expression. The body is indented under the def line.
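
A minimal example that follows this pattern. The function `percent_change` and its argument names are invented for illustration.

```python
def percent_change(old_value, new_value):
    """Return the percent change from old_value to new_value."""
    return (new_value - old_value) / old_value * 100


growth = percent_change(50, 75)  # 50.0
```

Calling the function with 50 and 75 binds `old_value` to 50 and `new_value` to 75, evaluates the return expression, and hands back the result.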

43

Apply Function

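t.apply(function, label1, label2, etc.)

first argument - the function to apply

remaining arguments - labels of the input columns

Calls the function once per row on the values in those columns and returns an array of the results.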

44

table.with_column(label, array)

table.with_row(list)

Add a column or a row to a table. Each returns a new table with the column or row added.

45

len()

Finds the length of something, such as the number of characters in a string or the number of elements in a list or array.