MGSC 291 Exam 1 University of South Carolina

105 Terms

1

putting in a dataset

data<-c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,80,70,131,75,88,102,115,89,82,79,106)

2

length(data)

tells you how many entries are in your vector

3

sort(data)

puts data smallest to largest

4

summary(data)

returns the five number summary and the sample mean

5

mean(data)

sd(data)

sum(data)

sample mean

sample standard deviation

adds all the elements of the vector
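
As a quick check on the three functions above, the sample mean is just the sum divided by the number of entries, and the sample standard deviation uses n - 1 in the denominator. A minimal sketch using the dataset from card 1 (the names n, xbar, and s are illustrative, not from the course notes):

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,80,70,131,75,88,102,115,89,82,79,106)

n    <- length(data)                        # number of entries
xbar <- sum(data) / n                       # same value as mean(data)
s    <- sqrt(sum((data - xbar)^2) / (n - 1))  # same value as sd(data)

c(xbar, mean(data))  # the two numbers agree
c(s, sd(data))       # the two numbers agree
```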

6

seq(1,100,1)

seq(2,100,2)

tells R to create the sequence of numbers from 1 to 100 by 1 (1,2,3...)

1:100 does the same thing

(2,4,6...100)

7

data<-read.table(file.choose(),header=TRUE)

read.csv

reads the file you choose into R as a dataset, and you can name it whatever you want (read.csv does the same for comma-separated files)

8

getwd()

returns the current working directory (the folder where R looks for files by default); it does not load any data

9

dim(data)

checks dimensions of data

10

data

type the name of the data to see it

11

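data[1:5,]

shows the first five rows of the dataset, which is handy with a large dataset (the comma matters: data[1:5] without it picks the first five elements of a vector, or the first five columns of a data frame)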

12

head(data)

shows the first few rows

13

data[,1:2]

all rows for columns 1 and 2

14

data[1:5,1:2]

first five rows and first two columns
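
The row and column indexing on the last few cards can be tried on a small data frame. This is only a sketch: the data frame df and its columns are made up for illustration and are not from the course data.

```r
# Hypothetical data frame, invented for this example
df <- data.frame(id     = 1:8,
                 height = c(60, 62, 65, 67, 68, 70, 71, 73),
                 shoes  = c(6, 7, 8, 9, 9, 10, 11, 12))

first_rows <- df[1:5, ]     # rows 1-5, all columns
two_cols   <- df[, 1:2]     # all rows, columns 1-2
block      <- df[1:5, 1:2]  # rows 1-5, columns 1-2

dim(block)  # number of rows, then number of columns
```

Anything before the comma selects rows and anything after it selects columns; leaving one side empty means "all".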

15

str(data)

structure of the object

16

View(data)

(capital V) puts your data in a viewable popup window

17

data$shoes

calls in the column called shoes

18

attach(data)

attaches data so you can work with it and don't have to keep calling it in

(if you attach several datasets that have the same column names, R will be confused, so detach one before attaching another)

19

data<-subset(data,Type=="WT")

keeps only the rows where Type equals "WT"; you can combine more == conditions with & (and) or | (or) to be more specific, and the select= argument picks out certain columns
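
A small sketch of subset() with one condition and with two. The data frame plants and its values are invented for illustration; only the column names Type and Oil echo the cards.

```r
# Hypothetical data, invented for this example
plants <- data.frame(Type = c("WT", "WT", "MUT", "MUT", "WT"),
                     Oil  = c(3.1, 2.8, 4.0, 4.4, 3.5))

wt      <- subset(plants, Type == "WT")            # rows where Type is "WT"
wt_high <- subset(plants, Type == "WT" & Oil > 3)  # both conditions must hold

nrow(wt)       # how many rows were kept
nrow(wt_high)
```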

20

hist(data)

gives you a histogram

21

hist(data,breaks=c(60,70,80,90))

sets the bin boundaries yourself (here the bins are 60-70, 70-80, and 80-90); the breaks have to cover the whole range of the data
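
One caution when setting breaks by hand: they must cover every value in the data, otherwise hist() stops with an error. A sketch using the dataset from card 1, whose values run from 70 to 131; the axis label and title are placeholders, not from the course notes.

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,80,70,131,75,88,102,115,89,82,79,106)

range(data)  # check the smallest and largest values first

# seq(70, 140, 10) gives breaks 70, 80, ..., 140, which span the data
h <- hist(data, breaks = seq(70, 140, 10),
          xlab = "value", ylab = "Frequency", main = "Histogram of data")

h$counts  # number of observations in each bin
```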

22

xlab="percents of ..."

x axis name

23

ylab="Frequency"

y axis name

24

main="..."

title of the graph

25

boxplot(data)

gives a box plot

26

boxplot(...~...)

gives side by side boxplots

27

boxplot(Oil~Type,names=c("Wild Type","Mutated"))

lets you give each box in the side by side boxplot your own label

28

plot(X,Y)

create a scatterplot

29

X1 <- c(8,5,14,13,29)

X2 <- c(13,8,6,18,4)

X1 is first line with the numbers you want

X2...

30

dbinom(j,n,p)

gives P(Y = j) discrete binomial probability

31

pbinom(J,n,p)

gives P(Y <= J) = P(Y = 0) + P(Y = 1) + ... + P(Y = J) cumulative binomial probability
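
The link between dbinom and pbinom can be checked directly: adding up the individual probabilities gives the cumulative one. A minimal sketch; the values of n, p, and J are arbitrary choices for illustration.

```r
n <- 10; p <- 0.3; J <- 4   # example values, chosen only for illustration

cumulative <- pbinom(J, n, p)          # P(Y <= 4)
by_hand    <- sum(dbinom(0:J, n, p))   # P(Y = 0) + ... + P(Y = 4)
upper      <- 1 - pbinom(J, n, p)      # P(Y > 4), which is P(Y >= 5)

c(cumulative, by_hand)  # the two numbers agree
```

For "at least" questions, remember that P(Y >= j) is 1 - pbinom(j - 1, n, p), not 1 - pbinom(j, n, p).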

32

dpois(j,lambda)

gives P(X=j) poisson discrete

33

ppois(J,lambda)

gives P(X<=J) cumulative Poisson probability

34

pexp(x,lambda)

gives P(Y<=x) for an exponential Y with rate lambda

35

qexp(p, lambda)

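gives the value of the exponential distribution that has probability p to the left of it (for example, p = 0.9 gives the 90th percentile)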
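
A short sketch of how pexp and qexp undo each other, and how the exponential ties back to the Poisson: if events arrive at rate lambda per unit of time, the chance of zero events in a stretch of length t equals the chance that the wait for the first event exceeds t. The rate and time below are arbitrary illustrative values.

```r
lambda <- 2    # illustrative rate: 2 events per unit of time
t      <- 0.5  # illustrative length of time

p_wait <- pexp(t, lambda)      # P(Y <= 0.5), wait is at most 0.5
med    <- qexp(0.5, lambda)    # median wait: half the probability lies to the left
pexp(med, lambda)              # feeding the median back in returns 0.5

no_events <- dpois(0, lambda * t)  # P(no events in a stretch of length 0.5)
c(no_events, 1 - p_wait)           # the two numbers agree
```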

36

pnorm(x,mu,sigma)

Pr{X < x} for X~N(mu,sigma), so 1-pnorm(x,mu,sigma) gives Pr{X > x}

37

qnorm(p,mu,sigma)

gives the value in the normal distribution (with mean mu and sd sigma) that has p to the left of it

norm prob
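
A sketch of the usual normal calculations: area below, area above, area between two values, and a percentile. The mean of 100 and standard deviation of 15 are made-up values for illustration.

```r
mu <- 100; sigma <- 15   # illustrative values, not from the course notes

below   <- pnorm(115, mu, sigma)                         # P(X < 115)
above   <- 1 - pnorm(115, mu, sigma)                     # P(X > 115)
between <- pnorm(115, mu, sigma) - pnorm(85, mu, sigma)  # P(85 < X < 115)
cutoff  <- qnorm(0.90, mu, sigma)                        # value with 90% to the left

pnorm(cutoff, mu, sigma)  # returns 0.90, so qnorm undoes pnorm
```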

38

pt(t,df)

Pr{T < t} for T~t(df)

t distribution

39

qt(p,df)

gives the value of the t distribution with df=df that has p to the left of it

40

qqnorm(data)

normal QQplot of the data.

41

t.test(data,conf.level=0.95)

Using it for a 95% confidence interval for the population mean
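
To see where the interval comes from, it can be rebuilt by hand as sample mean plus or minus a t critical value times s/sqrt(n). A minimal sketch using the dataset from card 1; the variable names are illustrative.

```r
data <- c(81,85,93,93,99,76,75,84,78,84,81,82,89,81,96,82,74,70,84,86,80,70,131,75,88,102,115,89,82,79,106)

n      <- length(data)
margin <- qt(0.975, df = n - 1) * sd(data) / sqrt(n)  # 0.975 leaves 2.5% in each tail

by_hand     <- c(mean(data) - margin, mean(data) + margin)
from_t_test <- t.test(data, conf.level = 0.95)$conf.int

by_hand      # lower and upper limits
from_t_test  # same limits as reported by t.test
```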

42

descriptive statistics

collecting, organizing, and presenting the data.

43

inferential statistics

drawing conclusions about a population based on sample data from that population.

44

statistic

is a number calculated from a sample and is used to estimate the parameter.

45

Parameter

is a number used to describe a population.

46

time series data

A variable that is measured at regular intervals over time

47

cross-sectional data

When a characteristic is measured on many subjects at the same time point (or same time frame)

48

data warehouse

These data are recorded and stored electronically, in vast digital repositories

49

big data

describe data sets so large that traditional methods of storage and analysis are inadequate.

50

types of variables

quantitative, qualitative

51

categorical

arise from descriptive responses to questions like "What kind of advertising do you use?". May only have two possible values (like "yes" or "no"). May be a number like a zip code.

52

Qualitative

don't have a meaningful numerical value

aka categorical variables

53

nominal

Categorical variables used only to name categories that don't have order (grocery, clothing, hardware)

54

ordinal

When data values can be ordered (freshman, sophomore, junior, senior)

55

quantitative

have a numerical value that works like a number

56

discrete

there are jumps between the possible values

57

continuous

there is another possible value between any two values

58

identifier variables

a unique identifier assigned to each individual or item in a group (social security number, student ID number)

59

pattern of a distribution

skewed left is when the thin tail stretches out to the left and most of the data is on the right

60

qualities of a good graphical display(and things to avoid)

Avoid 3-D

Good:

good title

sample size

units

61

Frequency

62

Relative Frequency

the proportion (or percent) of observations that fall in a category or class: the frequency divided by the total number of observations

63

Pie Chart

Categorical (qualitative) data

Displays parts of a whole

Not good when there are too many categories

Don't ever make it 3-D or "tilted"!

64

Bar Graph (frequency and relative frequency)

Categorical (qualitative) data

Can be horizontal or vertical

Can display parts of a whole or separate values

For nominal data, put bars in ascending or descending order

For ordinal data, put bars in order of categories

65

Pictograms

A graph that uses pictures or symbols to represent data values. Can be misleading when the pictures differ in size (like the people pictures where one is drawn bigger and one smaller)

66

Line Graph

Displays quantitative data changing over time

Time should go on the horizontal axis

Variable should go on the vertical axis

Use different lines to denote separate categories or groups

67

Boxplot

A graph of the five-number summary.

good at comparing two datasets next to each other

68

Histogram (frequency and relative frequency)

Medium to large quantitative datasets

Bins touch

Choice of number of bins can distort features of the shape of the distribution

(Notice, a boxplot displays the same data as the histogram, but the histogram shows more details about the shape.)

69

Scatterplot

is used to depict two potentially related quantitative variables.

Each point is a pairing: (x1,y1), (x2,y2), etc.

Linear, curvilinear, or no relationships

Positive vs. negative relationships

70

Five Number Summary

Minimum, Q1, median, Q3, maximum. A boxplot graphs it: an efficient way to communicate the measure of center and variation all at once

71

IQR

Q3-Q1

72

Range

Max-Min

73

Sample Mean (x bar)

average of the sample

balancing point

74

Sample Standard Deviation (s)(just its properties not how to compute)

the square root of the sample variance: s = sqrt(s squared)

75

Mode

Most frequently occurring value

76

According to the shape of the distribution know when to use:

Mean vs. Median

Sample Standard Deviation vs. Quartiles (IQR and Range)

77

Coefficient of variation and What it is and why it's used

Standard deviation expressed as a percent of the mean

Compare variation in datasets with different units or means
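
A small sketch of the coefficient of variation in R. The helper cv and both datasets are made up for illustration; R has no built-in function with this name.

```r
# Hypothetical measurements, invented for this example
heights_cm <- c(150, 160, 170, 180, 190)
weights_kg <- c(50, 60, 70, 80, 90)

# Illustrative helper: standard deviation as a percent of the mean
cv <- function(x) sd(x) / mean(x) * 100

cv(heights_cm)
cv(weights_kg)        # larger, even though both standard deviations are equal
cv(heights_cm / 100)  # same as cv(heights_cm): changing units does not change it
```

Both datasets have the same standard deviation, but the weights vary more relative to their mean, which is exactly what the coefficient of variation picks up.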

78

Estimating percent of observations within certain standard deviations

Chebychev's Inequality

Empirical Rule

Chebychev with k = 2: 1 − 1/2² = 1 − ¼ = 0.75, so at least 75% of observations lie within 2 standard deviations

79

Chebychev's Inequality

For any population with mean μ and standard deviation σ, the percent of observations that lie within k standard deviations of the mean is at least (1 − 1/k²) × 100, for k > 1

80

Empirical Rule

For unimodal distributions that are roughly normal, approximately

68% of observations are within 1 sd of the mean

95% of observations are within 2 sd of the mean

99.7% of observations are within 3 sd of the mean
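
The two rules can be put side by side in R. The Empirical Rule percentages come straight from the standard normal curve, while Chebychev's bound holds for any shape and is therefore lower. A minimal sketch; the variable names are illustrative.

```r
k <- 1:3   # number of standard deviations from the mean

normal_share    <- pnorm(k) - pnorm(-k)  # share of a normal distribution within k sd
chebyshev_bound <- 1 - 1 / k^2           # guaranteed minimum share for any distribution

round(normal_share, 3)     # 0.683 0.954 0.997
round(chebyshev_bound, 3)  # 0.000 0.750 0.889
```

For k = 1 Chebychev's bound is 0, which says nothing useful; that is why the inequality is normally used with k greater than 1.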

81

Z-scores - calculation, interpretation, properties and when to use

A Z-score represents the number of standard deviations an observation is above or below the mean

Beware of using Z-scores from skewed distributions (the Z-scores have the same shape distributions as the original observations)

Cannot compare a Z-score from a skewed distribution to a Z-score from a symmetric distribution

82

2 basic rules (probability is on scale from 0 to 1; sum of probability of all (disjoint) events in sample space = 1)

P(A) = 0 → Event A will not occur

P(A) = 1 → Event A will surely occur

P(A) = ½ → Event A will happen 50% of the time

83

Compute probabilities for complements, unions, intersections, conditional events

Complement:

The probability of the complement of an event, P(Aᶜ), is equal to one minus the probability of the event: P(Aᶜ) = 1 − P(A)

Union:

The probability that event A or B occurs (at least one of the events happens) is: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Intersection:

General intersection rule for two events both occurring (always works): P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
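
A worked example of the complement, intersection, union, and conditional rules, done as plain arithmetic in R. The three starting probabilities are made up for illustration.

```r
# Hypothetical starting values, invented for this example
p_A         <- 0.5   # P(A)
p_B         <- 0.4   # P(B)
p_B_given_A <- 0.6   # P(B|A)

p_not_A     <- 1 - p_A                # complement: P(A^c)
p_A_and_B   <- p_A * p_B_given_A      # intersection: P(A and B) = P(A) P(B|A)
p_A_or_B    <- p_A + p_B - p_A_and_B  # union: P(A or B)
p_A_given_B <- p_A_and_B / p_B        # conditional: P(A|B) = P(A and B) / P(B)

c(p_not_A, p_A_and_B, p_A_or_B, p_A_given_B)  # 0.50 0.30 0.60 0.75
```

Here P(B|A) = 0.6 differs from P(B) = 0.4, so A and B are not independent.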

84

Disjoint events

Two events, A and B, are said to be disjoint (or mutually exclusive) if they share no outcomes in common. Disjoint events have no intersection. P(A ∩ B) = 0

85

independent events

Two events are independent if the occurrence of one event does not affect the probability of the occurrence of the other event. Examples: Two flips of a coin, two rolls of a die, two spins of a roulette wheel

86

tree diagram( how to use and determine independence)

87

Bayes' Theorem

go over some examples in the notes

a theorem describing how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome of each cause.
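
In symbols, for a cause A and an observed outcome B, Bayes' Theorem says P(A|B) = P(B|A) P(A) / P(B), where P(B) is found by adding P(B|cause) P(cause) over all the causes. A sketch with two possible causes; the machines, their shares, and their defect rates are hypothetical numbers chosen for illustration, not an example from the notes.

```r
# Hypothetical setup: two machines make all the parts
prior       <- c(M1 = 0.6, M2 = 0.4)    # P(machine): share of parts from each
defect_rate <- c(M1 = 0.02, M2 = 0.05)  # P(defect | machine)

p_defect  <- sum(prior * defect_rate)       # total probability of a defect
posterior <- prior * defect_rate / p_defect # P(machine | defect)

p_defect   # 0.032
posterior  # M1 = 0.375, M2 = 0.625
```

Even though machine M1 makes most of the parts, a defective part is more likely to have come from M2 because its defect rate is higher.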

88

mean for a discrete random variable

The mean or expected value of a discrete random variable X is μX = E(X) = Σ xi P(xi)

89

variance(sd) for a discrete random variable

The variance of a discrete random variable X is σX² = Var(X) = Σ[(xi − μX)² P(xi)]; the standard deviation σX is its square root
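
Both formulas are one line each in R once the values and their probabilities are stored as vectors. The probability table below is made up for illustration.

```r
# Hypothetical probability distribution, invented for this example
x <- c(0, 1, 2, 3)          # possible values
p <- c(0.1, 0.3, 0.4, 0.2)  # their probabilities (they add up to 1)

mu     <- sum(x * p)           # mean: sum of value times probability
sigma2 <- sum((x - mu)^2 * p)  # variance: sum of squared deviation times probability
sigma  <- sqrt(sigma2)         # standard deviation

c(mu, sigma2, sigma)  # 1.70 0.81 0.90
```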

90

Binomial (when to use, mean and variance(sd))

A fixed number, n, of trials take place, where:

1. Each trial has only two possible outcomes ("Success" or "Failure")

2. Probability of "Success" is a constant p for every trial

3. Trials are identical and independent

These are called Bernoulli trials.
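
For the mean and variance named in the card title, the standard binomial results are mean = np and variance = np(1 − p). A sketch that checks both against the general discrete formulas using dbinom; the values of n and p are arbitrary choices for illustration.

```r
n <- 10; p <- 0.3   # example values, chosen only for illustration

y     <- 0:n               # every possible number of successes
probs <- dbinom(y, n, p)   # P(Y = y) for each one

mean_from_table <- sum(y * probs)                        # general formula for the mean
var_from_table  <- sum((y - mean_from_table)^2 * probs)  # general formula for the variance

c(mean_from_table, n * p)           # both are 3
c(var_from_table, n * p * (1 - p))  # both are 2.1
```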

91

Poisson (When to use)

Lambda (λ) and its relation to the exponential distribution

1)The event cannot occur twice at exactly the same time.

2)No occurrence of the event being analyzed affects the probability of the event re-occurring (events occur independently).

3)The expected number of occurrences of the event during any such interval is a constant

Note: The Poisson Distribution is only designed to be applied to events that occur relatively rarely.

92

Normal distribution (Properties, Standard normal distribution, z-scores and their interpretation)

Properties:

Lengths and weights of newborn babies

Scores on SAT

Cumulative debt in college students

Advertising expenditure of firms

93

Standard Normal Distribution

has a mean of 0 and a standard deviation of 1

(a normal variable can be converted to standard normal by z = (x − μ)/σ)
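
Standardizing means subtracting the mean first and then dividing by the standard deviation, z = (x − μ)/σ. A sketch showing that the probability is the same on either scale; the mean, standard deviation, and x below are made-up values.

```r
mu <- 70; sigma <- 4; x <- 76   # illustrative values, not from the course notes

z <- (x - mu) / sigma   # note the parentheses: subtract first, then divide

z                     # 1.5, so x is 1.5 standard deviations above the mean
pnorm(x, mu, sigma)   # P(X < 76) on the original scale
pnorm(z)              # same probability on the standard normal scale
```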

94

z-scores

is the number of standard deviations from the mean a data point is. But more technically it's a measure of how many standard deviations below or above the population mean a raw score is

95

μ is the population mean and σ is the population standard deviation

96

Parameter (ch. 7)

number used to describe a population

(usually do not know the value of a parameter; it is a fixed number)

97

Statistic (ch. 7)

is a number calculated from a sample and is used to estimate the parameter

(we know the value; it will change from sample to sample)

98

sample survey

when the respondents in a survey provide their own data

99

census

when a survey attempts to use the entire population as the sample

100

Central Limit Theorem

the sampling distribution of a sample mean, sum, or percentage will become approximately normal as the sample size gets larger, even when the population itself is not normal
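
A simulation makes the theorem easier to believe. The sketch below draws many samples from a strongly right-skewed population (exponential with rate 1, so its mean and sd are both 1) and looks at the sample means, which are just sums divided by n. The sample size, number of repetitions, and seed are arbitrary choices.

```r
set.seed(1)   # arbitrary seed so the run can be repeated

n    <- 40     # size of each sample
reps <- 5000   # number of samples drawn

# One sample mean per repetition
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean, 1
sd(sample_means)     # close to 1 / sqrt(40)
hist(sample_means)   # roughly bell-shaped, unlike the skewed population
qqnorm(sample_means) # points fall close to a straight line
```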