Computer
A programmable machine designed to follow instructions
Program
A set of instructions stored in computer memory that tells the computer what to do
Programmer
A person who writes instructions (programs) to make a computer perform a task
Central Processing Unit (CPU)
The main hardware component responsible for executing instructions and performing calculations
Main Memory
The primary storage area in a computer where data and instructions are stored for immediate access
Secondary Memory / Storage
Non-volatile storage devices that retain data even when the computer is turned off
Input Devices
Devices that send information to the computer from outside, such as a keyboard or mouse
Output Devices
Devices that display or present information from the computer to the user, such as a monitor or printer
Random Access Memory (RAM)
Another term for main memory, which is volatile and erased when the program terminates or the computer is turned off
Byte
A unit of memory that consists of 8 consecutive bits and has its own unique address
Address
A unique number that identifies each byte in the main memory
Secondary Storage
Non-volatile storage media, such as hard drives or flash drives, that retain data when the program is not running or the computer is turned off
Machine Language
The language that the computer understands, consisting of binary numbers (0s and 1s)
Programming Languages
Languages used by programmers to write instructions for the computer to execute
Algorithm
A set of well-defined steps that a program follows to perform a task
Low-level Language
A programming language that works close to the computer hardware, such as assembly language or binary machine code
Data Science
A blend of various tools, algorithms, and machine learning principles with the goal to discover hidden patterns from raw data
Primarily used to make decisions and predictions using causal and prescriptive analytics and machine learning
Cons of Business Intelligence (BI) tools
Most data in use is unstructured or semi-structured, and BI tools are not capable of processing it
R language
Software environment for statistical computing and graphics
Supported by the R Foundation
Assignment operator
Comments
Variable Name
These must start with a letter or a dot (a leading dot may not be followed by a number), and can only contain:
Letters
Numbers
Underscores
Dots
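A minimal sketch of these naming rules together with assignment and comments, assuming the cards refer to R's usual `<-` operator and `#` comment marker. The variable names are made up for illustration.

```r
# Everything after '#' on a line is a comment and is ignored by R.
total_score <- 42       # letters and an underscore
.hidden.value <- 3.5    # may start with a dot, as long as no digit follows it
x2 <- "two"             # digits are allowed after the first character

# make.names() shows how R would repair an invalid name
fixed <- make.names("2nd_try")
print(fixed)            # "X2nd_try"
```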
Data Type
These classify the different kinds of values in R, including:
Numeric
Logical
Character
This is used to create or convert a value into the ‘integer’ type
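The types above can be inspected with `class()`. The sketch below assumes the integer card refers to base R's `as.integer()`; the values are arbitrary.

```r
num  <- 3.14
flag <- TRUE
txt  <- "abc"

types <- c(class(num), class(flag), class(txt))
print(types)               # "numeric" "logical" "character"

whole <- as.integer(3.9)   # drops the fractional part, giving 3
print(class(whole))        # "integer"
```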
The explicit function used to display a variable
This function concatenates strings and characters
Example:
print(paste("My name is", name, "."))
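A runnable version of the example, assuming the cards refer to `print()` and `paste()`. The value of `name` is hypothetical. Note that `paste()` inserts a space between its arguments by default, while `paste0()` does not.

```r
name <- "Ada"   # hypothetical value for illustration

greeting <- paste("My name is", name, ".")
print(greeting)   # "My name is Ada ."

tidy <- paste0("My name is ", name, ".")
print(tidy)       # "My name is Ada."
```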
Exponentiation
Multiplication
Division
Modulo
5 %% 2 = 1
Integer division
5 %/% 2 = 2
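The arithmetic operators can be tried directly, assuming the cards refer to R's `^`, `*`, `/`, `%%` and `%/%`:

```r
power    <- 2^3         # exponentiation: 8
product  <- 6 * 7       # multiplication: 42
quotient <- 7 / 2       # division: 3.5
rest     <- 5 %% 2      # modulo: 1
whole    <- 5 %/% 2     # integer division: 2

# With a negative left operand, the result of %% takes the sign of the divisor
neg_rest <- (-5) %% 2   # 1

print(c(power, product, quotient, rest, whole, neg_rest))
```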
Imports additional features from the library into the R environment
Vectors
One-dimensional arrays that can hold:
Numeric data
Character data
Logical data
Function used to concatenate and form a vector
Elements
Components of a vector or array
The number of members in a vector or array
Creating a general sequence vector
Creating a vector of replicated elements
This operator generates regular sequences
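A short sketch of building vectors, assuming the cards above refer to `c()`, `length()`, `seq()`, `rep()` and the `:` operator. The values are arbitrary.

```r
v <- c(10, 20, 30)                      # concatenate values into a vector
n <- length(v)                          # number of members: 3

steps  <- seq(0, 1, by = 0.25)          # 0.00 0.25 0.50 0.75 1.00
copies <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
run    <- 1:5                           # 1 2 3 4 5

print(steps)
print(copies)
print(run)
```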
Returns the total of all elements in the vector
Returns the product of all elements in the vector
Returns the sum of the vector divided by the length of the vector
This can be used to sort vectors
Defaults to ascending order unless decreasing = TRUE is set
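These summaries can be sketched as follows, assuming the cards refer to `sum()`, `prod()`, `mean()` and `sort()`:

```r
v <- c(4, 1, 3)

total   <- sum(v)                      # 8
product <- prod(v)                     # 12
average <- mean(v)                     # same as sum(v) / length(v)

ascending  <- sort(v)                  # 1 3 4
descending <- sort(v, decreasing = TRUE)   # 4 3 1

print(average == total / length(v))    # TRUE
```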
Value Coercion
When values of one type are converted to another so that every element of a vector shares the same primitive data type
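A small illustration of coercion when mixed values are combined with `c()`:

```r
mixed_num <- c(1, TRUE)        # the logical becomes the number 1
mixed_chr <- c(1, TRUE, "a")   # everything becomes character

print(class(mixed_num))   # "numeric"
print(mixed_chr)          # "1" "TRUE" "a"
```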
This function is used to test if a vector contains a given element
Finds the unique elements of a vector
Reverses the elements
This shows the index of the max or min
Shows the five-number summary of a set
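A sketch of these helpers, assuming the cards refer to `%in%`, `unique()`, `rev()`, `which.max()` / `which.min()`, and a five-number summary function. The last card may mean `summary()`, which also reports the mean; `fivenum()` is used here because it returns exactly five numbers.

```r
v <- c(3, 7, 3, 9, 1)

has_seven <- 7 %in% v        # TRUE
distinct  <- unique(v)       # 3 7 9 1
backwards <- rev(v)          # 1 9 3 7 3
idx_max   <- which.max(v)    # 4, the position of 9
idx_min   <- which.min(v)    # 5, the position of 1
five      <- fivenum(v)      # min, lower hinge, median, upper hinge, max

print(five)                  # 1 3 3 7 9
```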
Recycling Rule
If there are two vectors of unequal length, the shorter will be reused in order to match the longer vector
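For example, adding a vector of length 2 to a vector of length 4 reuses the shorter one twice:

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is recycled as 10, 20, 10, 20
result <- long + short
print(result)   # 11 22 13 24
```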
Negative Index
This removes the member whose position equals the absolute value of the negative index
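A brief example with an arbitrary vector:

```r
v <- c(10, 20, 30, 40)

without_second <- v[-2]   # drops the member at position 2
print(without_second)     # 10 30 40
```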
This creates a random set of data
Boolean Expression
This is an expression that evaluates to either TRUE or FALSE
Logical NOT operator
Logical AND operator
Logical OR operator
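A sketch of the logical operators, assuming the cards refer to `!`, `&` and `|`, which all work element by element on vectors:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

not_a   <- !a      # FALSE TRUE FALSE
a_and_b <- a & b   # TRUE FALSE FALSE
a_or_b  <- a | b   # TRUE TRUE TRUE

print(a_and_b)
```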
Selection by Comparison
This is when a comparison operator is applied to an entire vector at once. It produces one TRUE or FALSE result per element, in order
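For instance, comparing a vector against a threshold gives a logical vector, which can then be used to select the matching members:

```r
v <- c(5, 12, 8, 20)

above <- v > 8        # FALSE TRUE FALSE TRUE
picked <- v[above]    # keeps only members where the comparison is TRUE

print(picked)         # 12 20
```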
This creates a basic plot that can then be tweaked and altered
Plot Types
‘p’ for points (default value)
‘l’ for lines
‘b’ for both points and lines
‘o’ for both points and lines, overplotted
‘h’ for histogram-like vertical lines
‘s’ for stair steps
‘n’ for no plotting
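A minimal sketch of choosing a plot type, assuming the card above refers to base R's `plot()` and its `type` argument. The data are made up, and the `pdf(NULL)` line is only there so the sketch runs without opening a window or writing a file; drop it in an interactive session.

```r
pdf(NULL)   # assumption: discard graphical output when run non-interactively

x <- 1:10
y <- x^2

# type = "b" draws both points and lines
plot(x, y, type = "b", main = "Squares", xlab = "x", ylab = "x squared")

invisible(dev.off())
```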
How we assign names to vector members
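A short example, assuming the card refers to `names()`. The labels are arbitrary.

```r
scores <- c(90, 75, 82)
names(scores) <- c("ann", "bob", "cy")   # attach a name to each member

print(scores["bob"])        # access a member by name
bob_score <- unname(scores["bob"])
```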
Accessing a column in an R function
Matrix
A two-dimensional generalization of a vector
Function to create a matrix
data =
nrow =
ncol =
byrow =
Accessing row y and column z
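The arguments listed above belong to base R's `matrix()`. The sketch below also assumes the access card refers to `m[y, z]` indexing; the values are arbitrary.

```r
# byrow = TRUE fills the matrix one row at a time
m <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

entry <- m[2, 3]   # row 2, column 3
print(entry)       # 6
```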
col/rownames( )
names( ) <- name_vector
Multiplying Matrices
We can only do this if the number of columns in the first matrix is the same as the number of rows in the second matrix
Example: a 2 × 3 matrix times a 3 × 4 matrix gives a 2 × 4 matrix
Matrix multiplication operator
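The dimension rule can be checked with R's `%*%` operator. The matrices here are arbitrary.

```r
A <- matrix(1:6,  nrow = 2)   # 2 x 3
B <- matrix(1:12, nrow = 3)   # 3 x 4

P <- A %*% B                  # inner dimensions match (3 and 3)
print(dim(P))                 # 2 4

# First entry: row 1 of A (1, 3, 5) times column 1 of B (1, 2, 3)
print(P[1, 1])                # 22
```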
These functions both create matrices by combining several vectors of the same length
This function returns the dimension of the matrix
The transpose of a matrix
Determinant of the matrix
Inverse of the matrix
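A sketch of these matrix functions, assuming the cards refer to `cbind()` / `rbind()`, `dim()`, `t()`, `det()` and `solve()`. The numbers are arbitrary.

```r
m <- cbind(c(2, 0), c(1, 3))   # columns (2, 0) and (1, 3)

size       <- dim(m)           # 2 2
transposed <- t(m)             # rows become columns
d          <- det(m)           # 2 * 3 - 1 * 0 = 6
inverse    <- solve(m)

# rbind() with the same vectors builds the transpose of the cbind() result
same <- isTRUE(all.equal(transposed, rbind(c(2, 0), c(1, 3))))

# A matrix times its inverse gives the identity (up to rounding)
check <- isTRUE(all.equal(inverse %*% m, diag(2)))
print(c(same, check))
```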
Apply a function to the margins of a matrix
Transpose
This is a matrix whose rows are the columns of the original
Used to create the identity matrix for a desired dimension
Deconstruction
When we apply the c( ) function to a matrix to combine all its columns into one vector
Deleting a matrix column
Used to calculate the totals for each column/row of a matrix
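A combined sketch of the last few cards, assuming they refer to `diag()`, `c()`, negative column indexing and `colSums()` / `rowSums()`:

```r
identity3 <- diag(3)          # 3 x 3 identity matrix

m <- matrix(1:6, nrow = 2)    # columns (1, 2), (3, 4), (5, 6)

flat    <- c(m)               # deconstruction, column by column: 1 2 3 4 5 6
trimmed <- m[, -2]            # delete column 2

col_totals <- colSums(m)      # 3 7 11
row_totals <- rowSums(m)      # 9 12

print(trimmed)
```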
This function converts a data.table into a character matrix
Uses one of the columns in the table as the row names
Converts a data.table into a numeric matrix
This function rounds off values in the first argument to the specified number of places
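For example, assuming the card refers to `round()`, whose second argument is the number of decimal places:

```r
two_places <- round(3.14159, 2)     # 3.14
hundreds   <- round(1234.567, -2)   # negative places round to the left of the point: 1200

print(c(two_places, hundreds))
```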
Used to create a barplot
Can be used with either a vector or matrix
Argument used when plotting that plots names of each bar/point
This changes the orientation of a barplot to be horizontal
This adds labels to a data plot
Specify values for the x variable, the y variable and the labels
Values placed on a plot
We can display the data’s values near data points/bars with custom code:
text(b, matrix$column + 5, labels = as.character(matrix$column), col = 'red')
This adds a legend to an R plot
Placed with locations such as ‘topleft’
Creates x colors from the heat color palette for use in plots
Creates x colors from the cm color palette for use in plots
Creates x colors from the topo color palette for use in plots
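The plotting cards above can be combined into one sketch, assuming they refer to `barplot()`, its `names.arg` argument, `text()`, `legend()` and `heat.colors()`. The data are made up, and `pdf(NULL)` is only there so the sketch runs without a display.

```r
pdf(NULL)   # assumption: discard graphical output when run non-interactively

sales  <- c(12, 30, 21)            # hypothetical values
labels <- c("A", "B", "C")
shades <- heat.colors(3)           # three colors from the heat palette

# barplot() returns the x positions of the bar midpoints
b <- barplot(sales, names.arg = labels, col = shades, ylim = c(0, 40))

# Write each value just above its bar
text(b, sales + 2, labels = as.character(sales), col = "red")

legend("topleft", legend = labels, fill = shades)

invisible(dev.off())
```

Passing `horiz = TRUE` to `barplot()` would draw the bars horizontally; the `text()` coordinates would then need to be swapped.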
This argument can be used to create clustered bar plots
This can be used to create a matrix of scatter plots
Used to compare multiple values of a matrix at once
Point Shapes
This can be used on the upper or lower half of a scatterplot matrix
Used to only view half of the usual plots
This allows us to make entry-by-entry changes to data frames and matrices
x is a matrix
Margin is:
1 for rows
2 for columns
fun is the function to be applied
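A short sketch of these arguments, assuming the card refers to base R's `apply(X, MARGIN, FUN)`:

```r
m <- matrix(1:6, nrow = 2)    # rows: (1, 3, 5) and (2, 4, 6)

row_sums <- apply(m, 1, sum)  # margin 1: one result per row
col_max  <- apply(m, 2, max)  # margin 2: one result per column

print(row_sums)   # 9 12
print(col_max)    # 2 4 6
```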
Numeric values indicating point size
Color name for points
Factors
Variables that take categorical values and store them as levels
Categorical variable
Variables that take values based on labels or names
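A minimal example of storing a categorical variable as a factor with `factor()`. The labels are made up.

```r
sizes <- factor(c("low", "high", "low", "mid"))

lv     <- levels(sizes)       # sorted alphabetically by default: "high" "low" "mid"
codes  <- as.integer(sizes)   # each value is stored as the index of its level
counts <- table(sizes)        # how often each level occurs

print(lv)
print(counts)
```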
Continuous variable
Variables that can take any numeric value within a range