Intro to Data Science Midterm

Description and Tags

COMPSCI 181

114 Terms

1

Computer

A programmable machine designed to follow instructions

2

Program

A set of instructions stored in computer memory that tells the computer what to do

3

Programmer

A person who writes instructions (programs) to make a computer perform a task

4

Central Processing Unit (CPU)

The main hardware component responsible for executing instructions and performing calculations

5

Main Memory

The primary storage area in a computer where data and instructions are stored for immediate access

6

Secondary Memory / Storage

Non-volatile storage devices that retain data even when the computer is turned off

7

Input Devices

Devices that send information to the computer from outside, such as a keyboard or mouse

8

Output Devices

Devices that display or present information from the computer to the user, such as a monitor or printer

9

Random Access Memory (RAM)

Another term for main memory, which is volatile and erased when the program terminates or the computer is turned off

10

Byte

A unit of memory that consists of 8 consecutive bits and has its own unique address

11

Address

A unique number that identifies each byte in the main memory

12

Secondary Storage

Non-volatile storage media, such as hard drives or flash drives, that retain data when the program is not running or the computer is turned off

13

Machine Language

The language that the computer understands, consisting of binary numbers (0s and 1s)

14

Programming Languages

Languages used by programmers to write instructions for the computer to execute

15

Algorithm

A set of well-defined steps that a program follows to perform a task

16

Low-level Language

A programming language that is close to the computer hardware, such as assembly language or binary machine code

17

Data Science

A blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data

Primarily used to make decisions and predictions, making use of causal and prescriptive analytics and machine learning

18

Cons of Business Intelligence (BI) tools

Most data in use is unstructured or semi-structured, and BI tools are not capable of processing that kind of data

19

R language

Software environment for statistical computing and graphics

Supported by the R Foundation

20

<-

Assignment operator

21

#

Comments

22

Variable Name

These must start with a letter or a dot, and can only contain:

  • Letters

  • Numbers

  • Underscores

  • Dots
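
A minimal sketch of the naming rules together with assignment and comments. The variable names are made up for illustration and are not part of the original set.

```r
# A comment: R ignores everything after '#' on a line.
total_score <- 10      # letters and an underscore
.rate <- 0.5           # a leading dot is allowed
item2.price <- 4       # digits and dots may appear after the first character

result <- total_score * .rate + item2.price
print(result)          # prints [1] 9
```

A name such as 2items would be rejected because it starts with a number.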

23

Data Type

These are what classify the different values present in R

  • Numeric

  • Logical

  • Character

24

as.integer( )

This is used to create or convert a value into the ‘integer’ type
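
As a rough illustration, class( ) (a base R function not listed in this set) can be used to check a value's data type before and after conversion. The values below are arbitrary examples.

```r
x <- 3.7
class(x)                # "numeric"
flag <- TRUE
class(flag)             # "logical"
label <- "3"
class(label)            # "character"

n <- as.integer(x)      # 3: the fractional part is dropped, not rounded
m <- as.integer(label)  # 3: a character value holding digits is converted
class(n)                # "integer"
```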

25

print( )

The explicit function used to display a variable

26

paste( )

Concatenates strings and other values into a single string, separated by a space by default

Example:

print(paste("My name is", name, "."))
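
One detail worth seeing in action: because paste( ) inserts a space between its arguments, a trailing "." ends up preceded by a space. The sketch below uses paste0( ), a base R variant that joins without a separator, to avoid that. The value of name is an assumed example.

```r
name <- "Ada"

spaced <- paste("My name is", name, ".")   # "My name is Ada ."
greeting <- paste("My name is", name)      # "My name is Ada"
sentence <- paste0(greeting, ".")          # "My name is Ada."
print(sentence)
```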

27

^

Exponentiation

28

*

Multiplication

29

/

Division

30

%%

Modulo

5 %% 2 = 1

31

%/%

Integer division

5 %/% 2 = 2
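
The arithmetic operators above can be tried side by side. This small sketch also shows that integer division and modulo fit together: quotient times divisor plus remainder gives back the original number. The numbers are arbitrary.

```r
2 ^ 3           # 8    exponentiation
4 * 5           # 20   multiplication
7 / 2           # 3.5  division

q <- 17 %/% 5   # 3, integer division
r <- 17 %% 5    # 2, modulo (remainder)
rebuilt <- q * 5 + r   # 17, the original number
```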

32

install.packages( )

Downloads and installs an add-on package so that its features can then be loaded into the R environment with library( )

33

Vectors

One-dimensional arrays that can hold:

  • Numeric data

  • Character data

  • Logical data

34

c( )

Function used to concatenate and form a vector

35

Elements

Components of a vector or array

36

length( )

The number of members in a vector or array

37

seq( )

Creating a general sequence vector

38

rep( )

Creating a vector of replicated elements

39

:

This operator generates regular sequences
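
The vector-building tools above can be compared in one short sketch. The particular values and argument choices are only examples.

```r
v <- c(2, 4, 6)                         # combine values into a vector
length(v)                               # 3

s <- seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
r <- rep(c("a", "b"), times = 2)        # "a" "b" "a" "b"
k <- 1:5                                # 1 2 3 4 5
```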

40

sum( )

Returns the total of all elements in the vector

41

prod( )

Returns the product of all elements in the vector

42

mean( )

Returns the sum of the vector divided by the length of the vector

43

sort( )

This can be used to sort vectors

Defaults to ascending order unless decreasing = TRUE is set
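
A quick sketch of the summary functions above applied to one small vector (the values are arbitrary):

```r
v <- c(3, 1, 4, 2)

sum(v)                      # 10
prod(v)                     # 24
mean(v)                     # 2.5, the same as sum(v) / length(v)

up <- sort(v)                       # 1 2 3 4
down <- sort(v, decreasing = TRUE)  # 4 3 2 1
```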

44

Value Coercion

When values of one type are automatically converted to another type so that every element of a vector has the same primitive data type
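
For instance, mixing types inside c( ) shows coercion at work. The sketch below uses made-up values: logical values become numbers when mixed with numbers, and everything becomes character when a string is present.

```r
nums <- c(1, TRUE, FALSE)     # 1 1 0, logicals become numeric
class(nums)                   # "numeric"

mixed <- c(1, TRUE, "a")      # "1" "TRUE" "a", everything becomes character
class(mixed)                  # "character"
```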

45

is.element( )

This function is used to test if a vector contains a given element

46

unique( )

Finds the unique elements of a vector

47

rev( )

Reverses the elements

48

which.max/min( )

Returns the index of the first maximum or minimum value in a vector

49

fivenum( )

Returns the five-number summary of a set of values (minimum, lower hinge, median, upper hinge, maximum)
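
The vector helpers above can be checked on one small example vector (the values are arbitrary):

```r
v <- c(5, 2, 9, 2, 7)

is.element(9, v)    # TRUE
unique(v)           # 5 2 9 7
rev(v)              # 7 2 9 2 5
which.max(v)        # 3, the position of 9
which.min(v)        # 2, the position of the first 2
fivenum(v)          # 2 2 5 7 9
```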

50

Recycling Rule

If there are two vectors of unequal length, the shorter will be reused in order to match the longer vector
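
A short sketch of recycling with arbitrary values. Here the longer length is a multiple of the shorter one; when it is not, R still recycles but issues a warning.

```r
long <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is reused as 10 20 10 20 to match the length of long
total <- long + short   # 11 22 13 24
```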

51

Negative Index

Returns the vector without the member whose position equals the absolute value of the index
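
For example, with an arbitrary four-element vector:

```r
v <- c(10, 20, 30, 40)

v[-2]           # 10 30 40, everything except the 2nd member
v[-c(1, 4)]     # 20 30, several positions can be dropped at once
v               # 10 20 30 40, the original vector is unchanged
```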

52

sample( )

Takes a random sample of a specified size from the elements of a vector

53

Boolean Expression

This is an expression that evaluates to either TRUE or FALSE

54

!

Logical NOT operator

55

&

Logical AND operator

56

|

Logical OR operator

57

Selection by Comparison

This is when a comparison operator is used on an entire vector at once. The result is a logical vector, in the same order, that can be used to select the matching elements
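
A small sketch combining comparisons with the logical operators !, & and |. The vector ages is a made-up example.

```r
ages <- c(12, 25, 17, 40)

adult <- ages >= 18                # FALSE TRUE FALSE TRUE
ages[adult]                        # 25 40
ages[!adult]                       # 12 17
ages[ages >= 18 & ages < 30]       # 25
ages[ages < 13 | ages > 30]        # 12 40
```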

58

plot( )

Creates a basic plot that can then be tweaked and customized

59

Plot Types

  • ‘p’ for points (default value)

  • ‘l’ for lines

  • ‘b’ for both points and lines

  • ‘o’ for both overplotted

  • ‘h’ for histogram-like vertical lines

  • ‘s’ for stair steps

  • ‘n’ for no plotting

60

names( )

How we assign names to vector members
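
A brief sketch of naming vector members and then selecting by name. The temperatures and day labels are invented for illustration.

```r
temps <- c(20, 25, 18)
names(temps) <- c("Mon", "Tue", "Wed")

temps["Tue"]      # the member named "Tue", value 25
names(temps)      # "Mon" "Tue" "Wed"
```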

61

TableName$Column

Accessing a column of a table (data frame) by name
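
As a sketch, assuming the table is a base R data frame (the names scores, name and score are hypothetical):

```r
scores <- data.frame(name = c("Ana", "Ben", "Cy"),
                     score = c(90, 85, 77))

scores$score          # 90 85 77, the column comes back as a vector
mean(scores$score)    # 84
```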

62

Matrix

A two-dimensional generalization of a vector

63

matrix( )

Function to create a matrix

  • data =

  • nrow =

  • ncol =

  • byrow =

64

x[y, z]

Accessing row y and column z
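
A minimal sketch of building a matrix with the arguments listed above and then indexing it. The data values are arbitrary.

```r
# byrow = TRUE fills the matrix one row at a time
m <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

m[2, 3]     # 6, row 2 and column 3
m[1, ]      # 1 2 3, all of row 1
m[, 2]      # 2 5, all of column 2
```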

65

col/rownames( )

Assign names to the columns or rows of a matrix x: colnames(x) <- name_vector or rownames(x) <- name_vector

66

Multiplying Matrices

We can only do this if the number of columns in the first matrix is the same as the number of rows in the second matrix

Example: a 2 × 3 matrix times a 3 × 4 matrix gives a 2 × 4 matrix

67

%*%

Matrix multiplication operator
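
A sketch of the dimension rule with arbitrary data: a 2 × 3 matrix times a 3 × 4 matrix yields a 2 × 4 matrix.

```r
A <- matrix(1:6, nrow = 2)     # 2 x 3, filled column by column
B <- matrix(1:12, nrow = 3)    # 3 x 4

C <- A %*% B
dim(C)                         # 2 4

# Entry [1, 1] is row 1 of A times column 1 of B: 1*1 + 3*2 + 5*3 = 22
C[1, 1]
```

Note that A * B would be element-by-element multiplication, which requires matching dimensions and is a different operation.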

68

c/rbind( )

These functions both create matrices by combining several vectors of the same length: cbind( ) joins them as columns, rbind( ) joins them as rows

69

dim( )

This function returns the dimension of the matrix
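
To illustrate, the sketch below binds two example vectors both ways and checks the result with dim( ):

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_col <- cbind(a, b)   # a and b become columns
dim(by_col)             # 3 2

by_row <- rbind(a, b)   # a and b become rows
dim(by_row)             # 2 3
by_row[2, 3]            # 6
```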

70

t( )

The transpose of a matrix

71

det( )

Determinant of the matrix

72

solve( )

Inverse of the matrix

73

apply( )

Apply a function to the margins of a matrix

74

Transpose

This is a matrix whose rows are the columns of the original

75

diag( )

Used to create the identity matrix for a desired dimension
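
A small sketch tying together t( ), det( ), solve( ) and diag( ) on an arbitrary 2 × 2 matrix. Multiplying a matrix by its inverse should give the identity matrix, up to floating-point rounding.

```r
m <- matrix(c(2, 0, 1, 3), nrow = 2)
#      [,1] [,2]
# [1,]    2    1
# [2,]    0    3

t(m)                  # rows and columns swapped
det(m)                # 6, since 2*3 - 1*0 = 6
inv <- solve(m)       # inverse of m
identity2 <- diag(2)  # 2 x 2 identity matrix

all.equal(inv %*% m, identity2)   # TRUE
```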

76

Deconstruction

When we apply the c( ) function to a matrix to combine all of its columns into a single vector

77

Matrix[, -x]

Deleting a matrix column

78

col/rowSums( )

Used to calculate the totals for each column/row of a matrix
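
The last few matrix operations can be seen together in a short sketch on an arbitrary 2 × 3 matrix:

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

m[, -2]        # column 2 removed, leaving columns 1 and 3
colSums(m)     # 3 7 11
rowSums(m)     # 9 12
c(m)           # 1 2 3 4 5 6, the columns joined into one vector
```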

79

as.matrix( )

This function converts a data.table into a character matrix

Uses one of the columns in the table as the row names

80

data.matrix( )

Converts a data.table into a numeric matrix

81

round( )

This function rounds off values in the first argument to the specified number of places

82

barplot( )

Used to create a barplot

Can be used with either a vector or matrix

83

names.arg

Argument to barplot( ) that gives a vector of names to be plotted below each bar or group of bars

84

horiz = TRUE

This changes the orientation of a barplot to be horizontal

85

text(x, y, label )

This adds labels to a data plot

Specify values for the x variable, the y variable and the labels

86

Values placed on a plot

We can display the data’s values near data points/bars with custom code (here b is assumed to hold the bar positions returned by barplot( )):

text(b, matrix$column + 5, labels = as.character(matrix$column), col = "red")

87

legend( )

This adds a legend to an R plot

Placed with locations such as ‘topleft’

88

col = heat.colors(x)

Generates x colors from the heat color palette for use in plots

89

col = cm.colors(x)

Generates x colors from the cm (cyan-magenta) color palette for use in plots

90

col = topo.colors(x)

Generates x colors from the topo (topographic) color palette for use in plots

91

beside = TRUE

This argument can be used to create clustered bar plots
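
The barplot options above can be combined into one sketch. The sales matrix and its labels are invented, and pdf(NULL) is used only so the code runs without opening a plot window; in an interactive session those two device lines can be dropped.

```r
pdf(NULL)   # assumption: draw to a null device so no window or file is needed

# Hypothetical data: rows are products, columns are quarters
sales <- matrix(c(3, 5, 2, 4), nrow = 2,
                dimnames = list(c("A", "B"), c("Q1", "Q2")))

# beside = TRUE draws one cluster of bars per column
mids <- barplot(sales, beside = TRUE, col = heat.colors(2),
                names.arg = colnames(sales), ylim = c(0, 8))

# barplot( ) returns the bar midpoints, used here to place value labels
text(mids, sales + 0.5, labels = as.character(sales), col = "red")
legend("topleft", legend = rownames(sales), fill = heat.colors(2))

dev.off()
```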

92

pairs( )

This can be used to create a matrix of scatter plots

Used to compare multiple variables (columns) of a matrix at once

93

Point Shapes

[Image: chart of point shapes]

94

lower.panel = NULL

Argument to pairs( ) that hides the lower half of a scatterplot matrix; upper.panel = NULL hides the upper half instead

Used to only view half of the usual plots

95

apply(x, Margin, fun )

This applies a function to each row or each column of a matrix or data frame

  • x is a matrix

  • Margin is:

    • 1 for rows

    • 2 for columns

  • fun is the function to be applied
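
A short sketch of apply( ) over rows and over columns of an arbitrary matrix:

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_totals <- apply(m, 1, sum)   # 9 12, one result per row
col_maxes <- apply(m, 2, max)    # 2 4 6, one result per column
```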

96

cex

Numeric values indicating point size

97

col

Color name for points

98

Factors

Variables that hold categorical data and store the distinct categories as levels

99

Categorical variable

Variables that take values based on labels or names
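
As an illustration, a character vector of labels can be turned into a factor with factor( ), a base R function not listed in this set. The size labels are made up.

```r
sizes <- c("small", "large", "small", "medium")
f <- factor(sizes)

levels(f)           # "large" "medium" "small", sorted alphabetically by default
as.integer(f)       # 3 1 3 2, the level index stored for each value
counts <- table(f)  # how many times each level occurs
```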

100

Continuous variable

Variables that can take any numeric value within a range