R programming language
A language and environment for statistical computing and graphics, data analysis, and the development of statistical software.
Hypertext Markup Language v5
Defines the content and structure of web pages.
Java
A general-purpose, object-oriented programming language designed to have as few implementation dependencies as possible; widely used for Android app development.
Swift
A programming language developed by Apple.
C++
A general-purpose programming language often used for systems software such as operating systems.
CSS
Cascading Style Sheets language used for specifying the presentation and style of a document.
Pipe tool in R
(%>%) chains multiple operations together, using the output of the left expression as the first argument of the function on the right.
Tidyverse package tidyr
Provides functions that help you achieve tidy data (data in a consistent form).
Tidyverse package readxl
Used for importing Excel files into R.
YAML
A human-readable data serialization language commonly used for configuration files and for data storage in applications.
Metadata
Data that provides information about other data; descriptive metadata gives information about a resource.
Calculated field
A new field in a Pivot Table that performs calculations based on existing fields.
Algorithm
A process or set of rules to be followed in calculations, especially by a computer.
Pivot tables
A powerful Excel tool for calculating, summarizing, and analyzing data to visualize patterns and comparisons.
Code chunk
A runnable piece of R code in an R Markdown file, which lets you share your input history with other analysts.
Jupyter Notebook
A web-based interactive computing platform that combines live code, equations, and narrative text; commonly used with Python.
Rich Text Format
A file format that enables the exchange of text files between different word processing programs.
Dashboard
A graphical user interface that provides information on key performance indicators.
GitHub document
Management of documentation for software projects created and hosted by GitHub.
.Rmd format
An R Markdown file format that combines Markdown text with embedded R code chunks for reproducible research and data analysis.
The knit option
Converts a file from .Rmd format to another file type for sharing.
Geoms layer
Combines data, an aesthetic mapping, a geometric object (geom), a statistical transformation (stat), and a position adjustment to create a visualization layer.
Annotate layer
Adds text and other notes about features of a plot, serving as a comments section for visualizations.
Tidyverse package ggplot2
Creates various plots (e.g., box plots, scatter plots, bar graphs) from data.
ggplot() function
Initializes a plot and specifies the data (and default aesthetic mappings) to be used in a graph or chart.
geom_boxplot() function
Displays the distribution of a continuous variable using a boxplot.
geom_line() function
Creates line graphs by connecting observations in order of the variable on the x-axis.
geom_bar() function
Makes the height of the bar proportional to the number of cases in each group.
geom_point() function
Creates a layer of points for scatter plots.
Data Validation tool
Restricts the type of data or values that users can enter in Excel cells.
LEN function
Returns the number of characters in a given cell.
CONVERT function
Converts a number from one measurement system to another.
MIN and MAX functions
Return the minimum and maximum values within a range of cells in Excel.
VALUE function
Converts a text string that represents a number into a number.
VLOOKUP function
A spreadsheet function that searches vertically for a value in the first column of a range and returns a corresponding piece of information from the same row.
HLOOKUP function
Searches for a value in the top row of an array and retrieves a value from a specified row.
TRIM function
Removes leading, trailing, and repeated spaces from a text string, leaving single spaces between words.
SUMPRODUCT function
Multiplies ranges of cells and arrays and returns the sum of the products.
COUNT function
Returns the number of cells that contain numbers within a range or array.
COUNTIF function
Counts the number of cells that meet a specified criterion.
COUNTIFS function
Counts the cells that meet all of one or more specified criteria.
MATCH function
Searches for a specified item in a range and returns its position.
SUM function
Performs the addition operation on specified values.
SUMIF function
Adds values if a certain criterion is met.
SUMIFS function
Adds values if multiple criteria are met.
SQL DATETIME format
YYYY-MM-DD HH:MI:SS.
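The layout above maps directly onto a date format string. A minimal Python sketch follows; the helper names `to_sql_datetime` and `from_sql_datetime` are hypothetical and exist only for illustration.

```python
from datetime import datetime

# Format string matching the SQL DATETIME layout YYYY-MM-DD HH:MI:SS
SQL_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_sql_datetime(moment: datetime) -> str:
    """Render a datetime as a SQL DATETIME string (hypothetical helper)."""
    return moment.strftime(SQL_DATETIME_FORMAT)


def from_sql_datetime(text: str) -> datetime:
    """Parse a SQL DATETIME string back into a datetime (hypothetical helper)."""
    return datetime.strptime(text, SQL_DATETIME_FORMAT)


example = datetime(2024, 3, 9, 14, 5, 7)
formatted = to_sql_datetime(example)
print(formatted)  # 2024-03-09 14:05:07
```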
CONCAT_WS function
Adds two or more strings together with a separator.
JOIN
Combines rows from two or more tables based on a related column.
INNER JOIN
Selects records with matching values in both tables.
OUTER JOIN
Returns all records with matching values in either the left or the right table.
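The difference between the join types is easiest to see on a small data set. The sketch below uses Python's built-in `sqlite3` module with made-up `customers` and `orders` tables (both are assumptions for illustration). It shows a LEFT OUTER JOIN, one kind of outer join, because a FULL OUTER JOIN is not available in older SQLite versions.

```python
import sqlite3

# In-memory database with two small, made-up tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers that have at least one matching order
inner_rows = conn.execute("""
    SELECT c.name AS customer, o.amount AS amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT OUTER JOIN: every customer, with NULL (None) where no order matches
left_rows = conn.execute("""
    SELECT c.name AS customer, o.amount AS amount
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(inner_rows)  # [('Ada', 25.0), ('Ada', 40.0), ('Grace', 15.0)]
print(left_rows)   # [('Ada', 25.0), ('Ada', 40.0), ('Grace', 15.0), ('Linus', None)]
```

Linus has no orders, so he appears only in the outer join result.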
AS
Renames a column or table with an alias.
FROM
Specifies which table to select or delete data from.
WITH
Lets you give a subquery block a name (a process also called subquery factoring).
ORDER BY
Sorts the records in ascending order by default; add DESC for descending order.
GROUP BY
A SQL clause that groups rows that have the same values into summary rows; often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()).
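WITH, GROUP BY, and ORDER BY often appear together in one query. The following is a minimal sketch using Python's built-in `sqlite3` module; the `sales` table and its values are invented for illustration.

```python
import sqlite3

# In-memory database with a small, made-up sales table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 100.0), ('East', 50.0),
        ('West', 200.0),
        ('North', 75.0), ('North', 25.0), ('North', 20.0);
""")

query = """
    -- WITH names the subquery so the outer query can refer to it
    WITH region_totals AS (
        SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
        FROM sales
        GROUP BY region          -- one summary row per region
    )
    SELECT region, n_sales, total
    FROM region_totals
    ORDER BY total DESC          -- DESC overrides the ascending default
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('West', 1, 200.0), ('East', 2, 150.0), ('North', 3, 120.0)]
```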
mean() - R
Calculates the mean (average) of the given data set.
median() - R
Finds the middle value in a data series.
sd() - R
Calculates the sample standard deviation of a set of values.
var() - R
Calculates the sample variance of a set of numbers.
max() - R
Returns the maximum of the specified vector or values.
min() - R
Returns the minimum of the specified vector or values.
quantile() - R
Returns sample quantiles of a numeric vector; by default the minimum, lower quartile, median, upper quartile, and maximum.
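For anyone cross-checking R results in Python, the standard `statistics` module offers rough counterparts to these functions. This is a sketch on made-up data, not part of the R material. It assumes that `method="inclusive"` is the option corresponding to the default behavior of R's `quantile()`.

```python
import statistics

# Made-up sample data
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean_value = statistics.mean(data)          # counterpart of mean()
median_value = statistics.median(data)      # counterpart of median()
sd_value = statistics.stdev(data)           # sample standard deviation, like sd()
var_value = statistics.variance(data)       # sample variance, like var()

# Quartile cut points (lower quartile, median, upper quartile)
quartiles = statistics.quantiles(data, n=4, method="inclusive")

print(mean_value, median_value)  # 5.0 4.5
print(quartiles)                 # [4.0, 4.5, 5.5]
```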
summary() - R
Quickly summarizes the values in a vector, data frame, or model in R.
typeof() - R
Returns the internal type of an R object, such as "double", "integer", or "character".
sum() - R
Returns the sum of the values passed as arguments to the function.
print() - R
Prints its argument; it is a generic function, which means that new printing methods can easily be added.
range() - R
Returns a vector containing the minimum and maximum of all the given arguments.
str() - R
Displays the internal structure of an object such as an array, list, matrix, factor, or data frame.
ncol() - R
Returns the total number of columns present in the object.
length() - R
Returns the number of items in a vector.
annotate() - R
Adds annotations such as text and shapes to a ggplot2 chart.
names() - R
Gets or sets the names of an object, such as the elements of a list.
unite()
A convenience function that pastes together multiple columns into one.
split()
Takes a vector or other object and splits it into groups determined by a factor or list of factors.
Pipe operator (%>%) - R
A way to chain multiple operations together concisely and expressively.
bias function - R
Returns the average amount by which actual values are greater than predicted values (the closer to 0.0, the less bias).
Data Validation tool
Restricts the type of data or values that users can enter in cells.
4 Phases of Analysis
Organize Data
Format and adjust data
Ask for input
Transform Data
Data Validation
Allows you to control what can and can’t be entered in your work.
Examples of Data Validation
Add dropdown lists with predetermined options to choose from
Examples of Data Validation
Create custom check boxes
Examples of Data Validation
Protect structured data and formulas
SQL
ORDER BY sorts query results in ascending order by default.
Subqueries
must be enclosed within parentheses.
Subquery
Can have only one column specified in its SELECT clause. If you want a subquery to compare multiple columns, those columns must be selected in the main query.
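Both subquery rules (parentheses, and a single column in the SELECT clause) can be seen in a short query. The sketch below uses Python's built-in `sqlite3` module with an invented `employees` table.

```python
import sqlite3

# In-memory database with a small, made-up employees table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL);
    INSERT INTO employees VALUES ('Ada', 90.0), ('Grace', 120.0), ('Linus', 60.0);
""")

# The subquery sits inside parentheses and selects exactly one column,
# so it can be compared against salary in the main query.
rows = conn.execute("""
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

print(rows)  # [('Grace', 120.0)]
```

The average salary here is 90.0, so only Grace is returned.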
CONCAT
CONCAT('GOOGLE', '.com')
CONCAT
GOOGLE.com
CONCAT_WS
CONCAT_WS('.', 'WWW', 'GOOGLE', 'COM')
CONCAT_WS
WWW.GOOGLE.COM
|| operator
'GOOGLE' || '.COM'
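The `||` operator can be tried directly in SQLite through Python's built-in `sqlite3` module. This sketch uses only `||`, since CONCAT and CONCAT_WS may be missing from older SQLite versions; the separator is written out by hand instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two strings joined with the || operator
joined = conn.execute("SELECT 'GOOGLE' || '.COM'").fetchone()[0]

# A separator-style result built by repeating || with '.' between the parts
with_separator = conn.execute(
    "SELECT 'WWW' || '.' || 'GOOGLE' || '.' || 'COM'"
).fetchone()[0]

print(joined)          # GOOGLE.COM
print(with_separator)  # WWW.GOOGLE.COM
```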
Simple Spreadsheets
Use a pivot table
Multiple data sets or large spreadsheets
Use SQL
For larger complex data
Use R
Aggregation
collecting or gathering many separate pieces into a whole.
Unique Advantages of R
Data manipulation, data visualization, and statistics packages
Unique Advantages of Python
Easy syntax for machine learning
Unique Advantages of Python
Integrates with cloud platforms like Google Cloud, Amazon Web Services, and Azure
Unique Challenges of Python
Many more decisions for beginners to make about data input/output, structure, variables, packages, and objects
What is a primary advantage of SQL?
Allows users to manipulate and reorganize data as needed to aid analysis