MGSC 291 Exam 1 University of South Carolina

105 Terms

New cards

putting in a dataset

data<-c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,80,70,131,75,88,102,115,89,82,79,106)

New cards

length(data)

tells you how many entries are in your vector

New cards

sort(data)

puts data smallest to largest

New cards

summary(data)

returns the five number summary and the sample mean

New cards

mean(data)

sd(data)

sum(data)

mean(data): sample mean

sd(data): sample standard deviation

sum(data): adds all the elements of the vector
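
A minimal sketch that runs the summary commands from the cards above on the vector from the first card. The helper names n, total, xbar, and s are only for illustration.

```r
# Vector from the "putting in a dataset" card.
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)

n     <- length(data)  # number of entries
total <- sum(data)     # adds all the elements
xbar  <- mean(data)    # sample mean
s     <- sd(data)      # sample standard deviation

print(sort(data))      # smallest to largest
print(summary(data))   # five-number summary plus the mean
cat("n =", n, " sum =", total, " mean =", round(xbar, 2),
    " sd =", round(s, 2), "\n")
```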

New cards

seq(1,100,1)

seq(2,100,2)

seq(1,100,1) tells R to create the sequence of numbers from 1 to 100 by 1 (1,2,3,...,100); 1:100 does the same thing

seq(2,100,2) counts by 2 instead (2,4,6,...,100)
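
A quick check of the two sequences, as a sketch. The names by_one and evens are made up for the example.

```r
by_one <- seq(1, 100, 1)   # 1, 2, 3, ..., 100
evens  <- seq(2, 100, 2)   # 2, 4, 6, ..., 100

# seq(1, 100, 1) holds the same values as 1:100.
print(all(by_one == 1:100))
print(length(evens))
print(head(evens))
```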

New cards

data<-read.table(file.choose(),header=TRUE)

read.csv (same idea, for comma-separated files)

reads the file you choose into R as a dataset, and you can name it whatever you want

New cards

getwd()

returns the current working directory (the folder where R looks for files); it does not read in any data

New cards

dim(data)

checks dimensions of data

New cards

data

type the name of the data to see it

New cards

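data[1:5,]

shows the first five rows of the dataset, which helps if you are working with a large dataset (without the comma, data[1:5] gives the first five elements of a vector or the first five columns of a data frame)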

New cards

head(data)

shows the first few rows (six by default)

New cards

data[,1:2]

all rows for columns 1 and 2

New cards

data[1:5,1:2]

first five rows and first two columns
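
A small sketch of row and column indexing. The data frame here is hypothetical (made-up columns id, shoes, height) and stands in for a file read with read.table().

```r
# Hypothetical data frame; the column names and values are invented.
data <- data.frame(
  id     = 1:8,
  shoes  = c(3, 10, 6, 4, 12, 7, 5, 9),
  height = c(64, 70, 66, 61, 72, 68, 65, 69)
)

print(dim(data))              # rows, then columns
first_rows <- data[1:5, ]     # first five rows, all columns
first_cols <- data[, 1:2]     # all rows, columns 1 and 2
block      <- data[1:5, 1:2]  # first five rows, first two columns
shoes      <- data$shoes      # the column called shoes

print(block)
str(data)                     # structure of the object
```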

New cards

str(data)

structure of the object

New cards

View(data)

(capital V) puts your data in a viewable popup window

New cards

data$shoes

calls in the column called shoes

New cards

attach(data)

attaches data so you can work with it and don't have to keep calling it in

(if you attach several datasets that have the same column names, R cannot tell which column you mean, so use detach(data) before attaching another one)

New cards

data<-subset(data,Type=="WT")

keeps only the rows where Type is "WT"; you can combine more conditions (more == and "" joined with & or |) to be more specific, and use select= to keep certain columns
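
A sketch of subset() on a hypothetical data frame. The Type and Oil columns follow the names used on these cards, but the values are invented.

```r
# Hypothetical data; only the column names come from the cards.
data <- data.frame(
  Type = c("WT", "WT", "WT", "MUT", "MUT", "MUT"),
  Oil  = c(20.1, 22.4, 19.8, 15.2, 16.9, 14.7)
)

wild      <- subset(data, Type == "WT")              # rows where Type is "WT"
wild_high <- subset(data, Type == "WT" & Oil > 20)   # two conditions at once
oil_only  <- subset(data, Type == "WT", select = Oil) # keep one column

print(wild)
print(wild_high)
print(oil_only)
```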

New cards

hist(data)

gives you a histogram

New cards

hist(data,breaks=c(60,70,80,90))

sets the bin boundaries (break points) yourself; the breaks must cover the whole range of the data or R gives an error

New cards

xlab="percents of ..."

x axis name

New cards

ylab="Frequency"

y axis name

New cards

main="..."

title of the graph
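
A sketch that puts breaks, xlab, ylab, and main together in one hist() call, using the vector from the first card. The label text and the break points are assumptions chosen so the breaks cover 70 to 131.

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)

# Breaks every 10 units from 60 to 140, wide enough for every value.
h <- hist(data,
          breaks = seq(60, 140, 10),
          xlab = "Value",              # x axis name (placeholder label)
          ylab = "Frequency",          # y axis name
          main = "Histogram of data")  # title of the graph

print(h$breaks)
print(h$counts)  # how many observations fall in each bin
```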

New cards

boxplot(data)

gives a box plot

New cards

boxplot(...~...)

gives side by side boxplots

New cards

boxplot(Oil~Type,names=c("Wild Type","Mutated"))

creates your own label for the boxplot
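
A sketch of side-by-side boxplots with custom labels, on invented data. It passes data= instead of relying on attach(). One point worth hedging: names= is matched to the groups in their existing order (alphabetical by default for text codes), so with the made-up codes "MUT" and "WT" the label for "MUT" has to come first.

```r
# Hypothetical data; group codes and values are invented.
plants <- data.frame(
  Type = c("WT", "WT", "WT", "MUT", "MUT", "MUT"),
  Oil  = c(20.1, 22.4, 19.8, 15.2, 16.9, 14.7)
)

# Groups are ordered "MUT", then "WT", so the labels follow that order.
b <- boxplot(Oil ~ Type, data = plants,
             names = c("Mutated", "Wild Type"),
             ylab = "Oil", main = "Oil by type")

print(b$n)      # number of observations in each box
print(b$stats)  # five-number summary for each group, one column per box
```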

New cards

plot(X,Y)

create a scatterplot

New cards

X1 <- c(8,5,14,13,29)

X2 <- c(13,8,6,18,4)

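X1 is the first variable, entered as a vector with the numbers you want

X2 is the second variable, entered the same way (for example, to use in plot(X1,X2))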

New cards

dbinom(j,n,p)

gives P(Y = j) discrete binomial probability

New cards

pbinom(J,n,p)

gives P(Y <= J) = P(Y = 0) + P(Y = 1) + ... + P(Y = J), the cumulative binomial probability
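
A sketch showing how dbinom and pbinom relate. The values n = 10, p = 0.3, and J = 3 are arbitrary choices for illustration.

```r
n <- 10   # number of trials (made-up)
p <- 0.3  # probability of success (made-up)
J <- 3

exactly_J <- dbinom(J, n, p)       # P(Y = 3)
at_most_J <- pbinom(J, n, p)       # P(Y <= 3)
more_than <- 1 - pbinom(J, n, p)   # P(Y > 3), i.e. P(Y >= 4)

# pbinom is the running total of dbinom from 0 up to J.
by_hand <- sum(dbinom(0:J, n, p))

print(c(exactly_J, at_most_J, more_than, by_hand))
```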

New cards

dpois(j,lambda)

gives P(X=j) poisson discrete

New cards

ppois(J,lambda)

gives P(X <= J), the cumulative Poisson probability

New cards

pexp(x,lambda)

gives P(Y <= x) for an exponential distribution with rate lambda

New cards

qexp(p, lambda)

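gives the percentile of the exponential distribution that has proportion p to the left of it (enter p as a decimal, so 0.90 for the 90th percentile)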
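
A sketch checking pexp and qexp against the closed-form exponential formulas. The rate lambda = 2 is an arbitrary example value; in R the second argument is the rate, so the mean is 1/lambda.

```r
lambda <- 2  # rate (made-up); mean waiting time is 1 / lambda

below_one <- pexp(1, lambda)     # P(Y <= 1)
by_formula <- 1 - exp(-lambda * 1)

p90 <- qexp(0.90, lambda)        # 90th percentile
p90_formula <- -log(1 - 0.90) / lambda

print(c(below_one, by_formula))
print(c(p90, p90_formula))
print(pexp(p90, lambda))         # qexp and pexp undo each other
```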

New cards

pnorm(x,mu,sigma)

Pr{X < x} for X~N(mu,sigma) .....so, 1-pnorm(x,mu,sigma) gives Pr(X > x)

New cards

qnorm(p,mu,sigma)

gives the value in the normal distribution (with mean mu and sd sigma) that has p to the left of it

norm prob
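
A sketch of pnorm and qnorm working as inverses. The mean 100 and standard deviation 15 are made-up example values; note that the third argument is the standard deviation, not the variance.

```r
mu    <- 100  # made-up mean
sigma <- 15   # made-up standard deviation

below <- pnorm(115, mu, sigma)      # Pr{X < 115}
above <- 1 - pnorm(115, mu, sigma)  # Pr{X > 115}

cutoff <- qnorm(0.90, mu, sigma)    # value with 0.90 to the left of it

print(c(below, above))
print(cutoff)
print(pnorm(cutoff, mu, sigma))     # recovers the 0.90
```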

New cards

pt(t,df)

Pr{T < t} for T~t(df)

t distribution

New cards

qt(p,df)

gives the value of the t distribution with df=df that has p to the left of it

New cards

qqnorm(data)

normal QQplot of the data.

New cards

t.test(data,conf.level=0.95)

Using it for a 95% confidence interval for the population mean
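
A sketch that pulls the interval out of t.test() and rebuilds it by hand with qt(), using the vector from the first card. The by-hand formula, mean plus or minus t times s over the square root of n, is the standard one-sample t interval.

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)

result <- t.test(data, conf.level = 0.95)
ci <- result$conf.int            # lower and upper limits

# Same interval by hand: xbar +/- t* x s / sqrt(n), with df = n - 1.
n      <- length(data)
t_star <- qt(0.975, df = n - 1)  # 0.975 leaves 2.5% in each tail
margin <- t_star * sd(data) / sqrt(n)
manual <- c(mean(data) - margin, mean(data) + margin)

print(ci)
print(manual)
```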

New cards

descriptive statistics

collecting, organizing, and presenting the data.

New cards

inferential statistics

drawing conclusions about a population based on sample data from that population.

New cards

statistic

is a number calculated from a sample and is used to estimate the parameter.

New cards

Parameter

is a number used to describe a population.

New cards

time series data

A variable that is measured at regular intervals over time

New cards

cross-sectional data

When a characteristic is measured on many subjects at the same time point (or same time frame)

New cards

data warehouse

These data are recorded and stored electronically, in vast digital repositories

New cards

big data

describe data sets so large that traditional methods of storage and analysis are inadequate.

New cards

types of variables

quantitative, qualitative

New cards

categorical

arise from descriptive responses to questions like "What kind of advertising do you use?"

may only have two possible values (like "yes" or "no")

may be a number like a zip code

New cards

Qualitative

don't have a meaningful numerical value

aka categorical variables

New cards

nominal

Categorical variables used only to name categories that don't have order (grocery, clothing, hardware)

New cards

ordinal

When data values can be ordered (freshman, sophomore, junior, senior)

New cards

quantitative

have a numerical value that works like a number

New cards

discrete

there are jumps between the possible values

New cards

continuous

there is another possible value between any two values

New cards

identifier variables

a unique identifier assigned to each individual or item in a group (social security number, student ID number)

New cards

pattern of a distribution

skewed left is when the tail stretches to the left and most of the data is on the right

New cards

qualities of a good graphical display(and things to avoid)

Avoid 3-D

Good:

good title

sample size

units

New cards

Frequency

New cards

Relative Frequency

the proportion (or percent) of observations in a category or class: its frequency divided by the total number of observations

New cards

Pie Chart

Categorical (qualitative) data

Displays parts of a whole

Not good when there are too many categories

Don't ever make it 3-D or "tilted"!

New cards

Bar Graph (frequency and relative frequency)

Categorical (qualitative) data

Can be horizontal or vertical

Can display parts of a whole or separate values

For nominal data, put bars in ascending or descending order

For ordinal data, put bars in order of categories

New cards

Pictograms

A pictorial symbol or sign representing an object or concept. Used by many non-alphabetic written scripts. Can be misleading when the pictures differ in size (for example, people pictures where one is bigger and one is smaller)

New cards

Line Graph

Displays quantitative data changing over time

Time should go on the horizontal axis

Variable should go on the vertical axis

Use different lines to denote separate categories or groups

New cards

Boxplot

A graph of the five-number summary.

good at comparing two datasets next to each other

New cards

Histogram (frequency and relative frequency)

Medium to large quantitative datasets

Bins touch

Choice of number of bins can distort features of the shape of the distribution

(Notice, a boxplot displays the same data as the histogram, but the histogram shows more details about the shape.)

New cards

Scatterplot

is used to depict two potentially related quantitative variables.

Each point is a pairing: (x1,y1), (x2,y2), etc.

Linear, curvilinear, or no relationships

Positive vs. negative relationships

New cards

Five Number Summary

Minimum, Q1, median, Q3, maximum

Boxplot: An efficient way to communicate the measures of center and variation all at once

New cards

IQR

Q3-Q1

New cards

Range

Max-Min
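
A sketch computing the IQR and the range on the vector from the first card. R's quantile() uses its default interpolation rule here, so quartiles found by hand with a different textbook rule may differ slightly.

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)

q1 <- quantile(data, 0.25)
q3 <- quantile(data, 0.75)

iqr_by_hand <- unname(q3 - q1)     # Q3 - Q1
iqr_builtin <- IQR(data)

range_width <- max(data) - min(data)  # Max - Min
# Note: range(data) returns c(min, max), not the difference.

print(c(iqr_by_hand, iqr_builtin))
print(range_width)
```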

New cards

Sample Mean (x bar)

average of the sample

balancing point

New cards

Sample Standard Deviation (s)(just its properties not how to compute)

the square root of the sample variance: s = sqrt(s^2)

New cards

Mode

Most frequently occurring value

New cards

According to the shape of the distribution know when to use:

Mean vs. Median

Sample Standard Deviation vs. Quartiles (IQR and Range)

New cards

Coefficient of variation and What it is and why it's used

Standard deviation expressed as a percent of the mean

Compare variation in datasets with different units or means
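
A sketch of the coefficient of variation as a small function. The name cv is made up, and the second vector is the same data rescaled (times 10) to show that changing units leaves the CV unchanged.

```r
# Coefficient of variation: standard deviation as a percent of the mean.
cv <- function(x) sd(x) / mean(x) * 100

data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)
rescaled <- data * 10  # same data in different (hypothetical) units

print(cv(data))
print(cv(rescaled))    # same percent, even though sd is 10 times larger
```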

New cards

Estimating percent of observations within certain standard deviations

Chebychev's Inequality

Empirical Rule

K = 2

1 − 1/2^2

= 1 − 1/4

= 0.75, so at least 75% of observations lie within 2 standard deviations

New cards

Chebychev's Inequality

For any population with a mean, μ, and standard deviation, σ, the percent of observations that lie within k standard deviations of the mean is at least (1 − 1/k^2) × 100 (for k > 1)
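
A sketch comparing the Chebychev lower bound with what actually happens in the vector from the first card. The function names are made up, and the sample mean and sample sd stand in for μ and σ.

```r
# Chebychev lower bound, as a percent, for k standard deviations (k > 1).
chebychev_min_pct <- function(k) (1 - 1 / k^2) * 100

# Actual percent of observations within k sd of the mean.
pct_within <- function(x, k) mean(abs(x - mean(x)) <= k * sd(x)) * 100

data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)

for (k in c(2, 3)) {
  cat("k =", k,
      " bound =", round(chebychev_min_pct(k), 1), "%",
      " actual =", round(pct_within(data, k), 1), "%\n")
}
```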

New cards

Empirical Rule

For unimodal distributions that are roughly normal, approximately

68% of observations are within 1 sd of the mean

95% of observations are within 2 sd of the mean

99.7% of observations are within 3 sd of the mean
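
A sketch showing where the 68, 95, and 99.7 figures come from: the area under the standard normal curve between −k and +k, computed with pnorm.

```r
k <- 1:3

# Area within k standard deviations of the mean for a normal distribution.
within <- pnorm(k) - pnorm(-k)

print(round(within * 100, 1))
```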

New cards

Z-scores - calculation, interpretation, properties and when to use

A Z-score represents the number of standard deviations an observation is above or below the mean

Beware of using Z-scores from skewed distributions (the Z-scores have the same shaped distribution as the original observations, so skewed data stay skewed)

Cannot compare a Z-score from a skewed distribution to a Z-score from a symmetric distribution
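
A sketch that computes Z-scores for the vector from the first card, using the sample mean and sample sd. It also checks two properties: the Z-scores have mean 0 and standard deviation 1.

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,
          80,70,131,75,88,102,115,89,82,79,106)

# Number of standard deviations each observation is above (+) or below (-) the mean.
z <- (data - mean(data)) / sd(data)

print(round(z, 2))
print(data[which.max(z)])      # the observation farthest above the mean
print(round(max(z), 2))        # its Z-score
print(c(mean(z), sd(z)))       # about 0 and exactly 1
```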

New cards

2 basic rules (probability is on scale from 0 to 1; sum of probability of all (disjoint) events in sample space = 1)

P(A) = 0 → Event A will not occur

P(A) = 1 → Event A will surely occur

P(A) = ½ → Event A will happen 50% of the time

New cards

Compute probabilities for complements, unions, intersections, conditional events

Complement:

The probability of the complement of an event, P(A^c), is equal to one minus the probability of the event: P(A^c) = 1 − P(A)

Union:

The probability that event A or B occurs (at least one of the events happens) is: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Intersection:

General intersection rule for two events both occurring (always works): P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
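
A sketch applying the complement, union, and intersection rules to made-up probabilities. The conditional probability line uses P(B|A) = P(A ∩ B) / P(A), which is the intersection rule rearranged.

```r
# Made-up probabilities for two events A and B.
p_A       <- 0.5
p_B       <- 0.4
p_A_and_B <- 0.2

p_not_A   <- 1 - p_A                 # complement
p_A_or_B  <- p_A + p_B - p_A_and_B   # union
p_B_given_A <- p_A_and_B / p_A       # conditional, from the intersection rule

# Independent if P(A and B) equals P(A) * P(B), i.e. P(B|A) equals P(B).
independent <- isTRUE(all.equal(p_A_and_B, p_A * p_B))

print(c(p_not_A, p_A_or_B, p_B_given_A))
print(independent)
```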

New cards

Disjoint events

Two events, A and B, are said to be disjoint (or mutually exclusive) if they share no outcomes in common. Disjoint events have no intersection. P(A ∩ B) = 0

New cards

independent events

Two events are independent if the occurrence of one event does not affect the probability of the occurrence of the other event. Examples: Two flips of a coin, two rolls of a die, two spins of a roulette wheel

New cards

tree diagram( how to use and determine independence)

New cards

Bayes' Theorem

go over some examples in the notes

a theorem describing how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome given each cause.
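
A sketch of Bayes' Theorem on an invented example (not from the course notes): two suppliers, A and B, with made-up shares and defect rates. It finds the probability that a defective part came from each supplier.

```r
# Invented numbers: prior probability of each cause (supplier).
prior <- c(A = 0.6, B = 0.4)

# Invented conditional probability of the outcome (defect) given each cause.
p_defect_given <- c(A = 0.02, B = 0.05)

# Total probability of a defect, then Bayes' Theorem.
p_defect  <- sum(prior * p_defect_given)
posterior <- prior * p_defect_given / p_defect  # P(cause | defect)

print(p_defect)
print(posterior)
```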

New cards

mean for a discrete random variable

The mean or expected value of a discrete random variable X is μ_X = E(X) = Σ x_i P(x_i)

New cards

variance(sd) for a discrete random variable

The variance of a discrete random variable X is σ_X^2 = Var(X) = Σ[(x_i − μ_X)^2 P(x_i)]; the standard deviation is the square root of the variance
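
A sketch applying the mean and variance formulas to a made-up probability table with values 0 to 3.

```r
# Made-up distribution: possible values and their probabilities.
x <- c(0, 1, 2, 3)
p <- c(0.1, 0.3, 0.4, 0.2)  # must add up to 1

mu_x    <- sum(x * p)               # E(X) = sum of x_i * P(x_i)
var_x   <- sum((x - mu_x)^2 * p)    # Var(X) = sum of (x_i - mu)^2 * P(x_i)
sigma_x <- sqrt(var_x)              # standard deviation

print(c(mean = mu_x, variance = var_x, sd = sigma_x))
```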

New cards

Binomial (when to use, mean and variance(sd))

A fixed number, n, of trials take place, where:

1. Each trial has only two possible outcomes ("Success" or "Failure")

2. Probability of "Success" is a constant p for every trial

3. Trials are identical and independent

These are called Bernoulli trials.

Mean = np, variance = np(1 − p), sd = sqrt(np(1 − p))
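
A sketch checking the binomial mean np and variance np(1 − p) against the general discrete formulas, with probabilities from dbinom. The values n = 10 and p = 0.3 are arbitrary.

```r
n <- 10   # number of trials (made-up)
p <- 0.3  # probability of success (made-up)

y     <- 0:n
probs <- dbinom(y, n, p)  # P(Y = 0), ..., P(Y = n)

mean_from_table <- sum(y * probs)
var_from_table  <- sum((y - mean_from_table)^2 * probs)

print(c(mean_from_table, n * p))            # both give the mean
print(c(var_from_table, n * p * (1 - p)))   # both give the variance
```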

New cards

Poisson (when to use; lambda (λ) and its relation to the exponential distribution)

1)The event cannot occur twice at exactly the same time.

2)No occurrence of the event being analyzed affects the probability of the event re-occurring (events occur independently).

3)The expected number of occurrences of the event during any such interval is a constant

Note: The Poisson Distribution is only designed to be applied to events that occur relatively rarely.
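
A sketch of how lambda ties the Poisson to the exponential, under the usual assumption that events follow a Poisson process: if counts per unit of time are Poisson with mean lambda, the waiting time between events is exponential with rate lambda. The values lambda = 3 and t = 0.5 are made up.

```r
lambda <- 3    # expected number of events per unit of time (made-up)
t      <- 0.5  # length of the interval (made-up)

# Poisson view: no events in an interval of length t (mean count is lambda * t).
p_no_events <- dpois(0, lambda * t)

# Exponential view: the wait for the next event is longer than t.
p_wait_longer <- 1 - pexp(t, lambda)

print(c(p_no_events, p_wait_longer))  # the same probability, two ways

# ppois is the running total of dpois.
print(c(ppois(2, lambda), sum(dpois(0:2, lambda))))
```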

New cards

Normal distribution (Properties, Standard normal distribution, z-scores and their interpretation)

Properties:

Lengths and weights of newborn babies

Scores on SAT

Cumulative debt in college students

Advertising expenditure of firms

New cards

Standard Normal Distribution

has a mean of 0 and a standard deviation of 1

(a normal variable can be transformed to standard normal by z = (x − μ)/σ, where μ is the mean and σ is the standard deviation)

New cards

z-scores

is the number of standard deviations from the mean a data point is. But more technically it's a measure of how many standard deviations below or above the population mean a raw score is

New cards

μ (mu, sometimes written u) is the population mean and σ (sigma) is the population standard deviation

New cards

Parameter (ch. 7)

number used to describe a population

(usually do not know the value of a parameter; it is a fixed number)

New cards

Statistic (ch. 7)

is a number calculated from a sample and is used to estimate the parameter

(we know the value; it will change from sample to sample)

New cards

sample survey