465 Data Applications --- Terms and Concepts to Review

142 Terms

1
New cards

R programming language

A language and environment for statistical computing and graphics, data analysis, and the development of statistical software.

2
New cards

Hypertext Markup Language v5

Defines the content and structure of web pages displayed in a browser.

3
New cards

Java

A general-purpose, object-oriented programming language designed to minimize implementation dependencies; widely used for Android development.

4
New cards

Swift

A programming language developed by Apple.

5
New cards

C++

A general-purpose programming language used for developing operating systems.

6
New cards

CSS

Cascading Style Sheets language used for specifying the presentation and style of a document.

7
New cards

Pipe tool in R

(%>%) chains multiple operations together, using the output of the left expression as the first argument of the function on the right.
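As a rough illustration of the idea (not R itself), the Python sketch below feeds each result in as the first argument of the next function. The `pipe` helper and the sample steps are hypothetical names made up for this example.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step; a step is a tuple (function, extra_args...)."""
    def apply(acc, step):
        func, *extra = step
        # The running result becomes the FIRST argument, as with %>% in R.
        return func(acc, *extra)
    return reduce(apply, steps, value)


# Roughly comparable to: c(4, 9, 16) %>% sqrt() %>% sum() %>% round(1)
result = pipe(
    [4, 9, 16],
    (lambda xs: [x ** 0.5 for x in xs],),  # square root of each element
    (sum,),                                # add the roots together
    (round, 1),                            # round to one decimal place
)
print(result)
```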

8
New cards

Tidyverse package tidyr

Provides functions that help produce tidy data (data in a consistent form).

9
New cards

Tidyverse package readxl

Used for importing Excel files into R.

10
New cards

YAML

A human-readable data serialization language commonly used for configuration files and data storage in applications.

11
New cards

Metadata

Data that provides information about other data; descriptive metadata gives information about a resource.

12
New cards

Calculated field

A new field in a Pivot Table that performs calculations based on existing fields.

13
New cards

Algorithm

A process or set of rules to be followed in calculations, especially by a computer.

14
New cards

Pivot tables

A powerful Excel tool for calculating, summarizing, and analyzing data to visualize patterns and comparisons.

15
New cards

Code chunk

A runnable piece of R code in R markdown, allowing for sharing of input history with other analysts.

16
New cards

Jupyter Notebook

A web-based interactive computing platform that combines live code, equations, and narrative text; commonly used with Python.

17
New cards

Rich Text Format

A file format that enables the exchange of text files between different word processing programs.

18
New cards

Dashboard

A graphical user interface that provides information on key performance indicators.

19
New cards

GitHub document

Management of documentation for software projects created and hosted by GitHub.

20
New cards

Dot RMD format

An R markdown file format that combines markdown text with embedded R code chunks for reproducible research and data analysis.

21
New cards

The knit option

Converts a file from dot RMD format to another file type for sharing.

22
New cards

Geoms layer

Combines data, an aesthetic mapping, a geometric object (geom), a statistical transformation, and a position adjustment to create visualizations.

23
New cards

Annotate layer

Adds explanatory text or shapes to a plot, serving as a comments layer for visualizations.

24
New cards

Tidyverse package ggplot2

Transforms and creates various plots (e.g., box plots, scatter plots, bar graphs).

25
New cards

ggplot() function

Initializes a plot and declares the data (and default aesthetic mappings) that later layers draw from.

26
New cards

geom_boxplot() function

Displays the distribution of a continuous variable using a boxplot.

27
New cards

geom_line() function

Creates line graphs by connecting specified variables in the data.

28
New cards

geom_bar() function

Makes the height of the bar proportional to the number of cases in each group.

29
New cards

geom_point() function

Creates a layer of points for scatter plots.

30
New cards

Data Validation tool

Restricts the type of data or values that users can enter in Excel cells.

31
New cards

LEN function

Returns the number of characters in a given cell.

32
New cards

CONVERT function

Converts a number from one measurement system to another.

33
New cards

MIN and MAX functions

Return the minimum and maximum values in a set of values or a range.

34
New cards

VALUE function

Converts a text string that represents a number into a number.

35
New cards

VLOOKUP function

A spreadsheet function that searches down the first column of a range for a value and returns a corresponding value from another column in the same row.
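The lookup logic can be sketched in Python as follows. This is only an exact-match illustration; the `vlookup` helper and the product table are hypothetical and are not an Excel API.

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match vertical lookup over a list of rows (col_index is 1-based)."""
    for row in table:
        # Only the first column is searched, as in a spreadsheet VLOOKUP.
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # a spreadsheet would show #N/A here


# Hypothetical product table: (code, name, price)
products = [
    ("A101", "Keyboard", 25.00),
    ("B202", "Monitor", 180.00),
    ("C303", "Mouse", 12.50),
]

price = vlookup("B202", products, 3)
print(price)
```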

36
New cards

HLOOKUP function

Searches for a value in the top row of an array and retrieves a value from a specified row.

37
New cards

TRIM function

Removes leading, trailing, and repeated spaces from a text string, leaving single spaces between words.

38
New cards

SUMPRODUCT function

Multiplies ranges of cells and arrays and returns the sum of the products.
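As a small worked example of the same arithmetic, the Python sketch below multiplies two equal-length lists pairwise and sums the products. The quantities and prices are made-up values.

```python
# Made-up data: units sold and the price per unit for three items.
quantities = [2, 5, 1]
unit_prices = [10.0, 4.0, 25.0]

# Pairwise products are 20.0, 20.0, and 25.0; SUMPRODUCT adds them up.
total = sum(q * p for q, p in zip(quantities, unit_prices))
print(total)
```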

39
New cards

COUNT function

Counts the cells in a range or array that contain numbers.

40
New cards

COUNTIF function

Counts the number of cells that meet a specified criterion.

41
New cards

COUNTIFS function

Counts the cells, across one or more ranges, that meet all of the specified criteria.

42
New cards

MATCH function

Searches for a specified item in a range and returns its position.

43
New cards

SUM function

Performs the addition operation on specified values.

44
New cards

SUMIF function

Adds values if a certain criterion is met.

45
New cards

SUMIFS function

Adds values if multiple criteria are met.
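The conditional counting and summing functions above (COUNTIF, COUNTIFS, SUMIF, SUMIFS) share one pattern: filter rows by one or more criteria, then count or add. A minimal Python sketch of that pattern, using a made-up sales list, might look like this:

```python
# Made-up rows standing in for a spreadsheet range.
sales = [
    {"region": "East", "product": "Pen", "amount": 120},
    {"region": "West", "product": "Pen", "amount": 80},
    {"region": "East", "product": "Ink", "amount": 200},
    {"region": "East", "product": "Pen", "amount": 50},
]

# COUNTIF: one criterion.
countif_east = sum(1 for r in sales if r["region"] == "East")

# COUNTIFS: every criterion must hold.
countifs_east_pen = sum(
    1 for r in sales if r["region"] == "East" and r["product"] == "Pen"
)

# SUMIF: add amounts where one criterion holds.
sumif_east = sum(r["amount"] for r in sales if r["region"] == "East")

# SUMIFS: add amounts where every criterion holds.
sumifs_east_pen = sum(
    r["amount"] for r in sales
    if r["region"] == "East" and r["product"] == "Pen"
)

print(countif_east, countifs_east_pen, sumif_east, sumifs_east_pen)
```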

46
New cards

SQL DATETIME format

YYYY-MM-DD HH:MI:SS.
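For illustration, the same layout can be produced and parsed in Python with the format string `%Y-%m-%d %H:%M:%S`. The timestamp below is an arbitrary example value.

```python
from datetime import datetime

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # year-month-day hour:minute:second

stamp = datetime(2024, 3, 5, 14, 7, 9)        # arbitrary example timestamp
formatted = stamp.strftime(DATETIME_FORMAT)   # datetime -> text
parsed = datetime.strptime(formatted, DATETIME_FORMAT)  # text -> datetime

print(formatted)
```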

47
New cards

CONCAT_WS function

Adds two or more strings together with a separator.

48
New cards

JOIN

Combines rows from two or more tables based on a related column.

49
New cards

INNER JOIN

Selects records with matching values in both tables.

50
New cards

OUTER JOIN

Returns all records that have a match in either the left or the right table (a full outer join).
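To see how join types differ, the sketch below runs an INNER JOIN and a LEFT OUTER JOIN against an in-memory SQLite database from Python. The tables and rows are made up, and a left outer join is used as one kind of outer join because older SQLite versions lack FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT OUTER JOIN: every customer; unmatched ones get NULL (None) totals.
left_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The customer without orders appears only in the outer join result, paired with a NULL total.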

51
New cards

AS

Renames a column or table with an alias.

52
New cards

FROM

Specifies which table to select data from or delete data from.

53
New cards

WITH

Allows you to give a subquery block a name (a process also called subquery factoring); the named block is known as a common table expression.
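A minimal sketch of a named subquery, run through Python's built-in sqlite3 module, is shown here. The `orders` table, its rows, and the `customer_totals` name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, total REAL);
INSERT INTO orders VALUES ('Ana', 25.0), ('Ana', 40.0), ('Ben', 15.0);
""")

# WITH names the subquery so the main query can read from it like a table.
rows = conn.execute("""
    WITH customer_totals AS (
        SELECT customer, SUM(total) AS spent
        FROM orders
        GROUP BY customer
    )
    SELECT customer, spent
    FROM customer_totals
    WHERE spent > 20
    ORDER BY customer
""").fetchall()

print(rows)
```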

54
New cards

ORDER BY

Sorts the records in ascending order by default; add DESC for descending order.

55
New cards

GROUP BY

A SQL clause that groups rows that have the same values into summary rows; often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()).
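The sketch below shows GROUP BY with several aggregate functions, plus ORDER BY ... DESC, using an in-memory SQLite database from Python. The `sales` table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES
    ('East', 120), ('East', 200), ('West', 80), ('West', 40), ('North', 90);
""")

# One summary row per region, largest total first.
rows = conn.execute("""
    SELECT region,
           COUNT(*)    AS n,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
""").fetchall()

for row in rows:
    print(row)
```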

56
New cards

mean() - R

Calculates the mean (average) of the given data set.

57
New cards

median() - R

Finds the middle value of a data series.

58
New cards

sd() - R

Calculates the standard deviation of a set of values.

59
New cards

var() - R

Calculates the variance of a set of numbers.
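As a cross-check of what mean(), median(), sd(), and var() compute, here is a small sketch using Python's standard `statistics` module on made-up data. Like R's sd() and var(), `statistics.stdev` and `statistics.variance` use the sample (n - 1) denominator.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data

avg = statistics.mean(values)       # comparable to mean(values) in R
mid = statistics.median(values)     # comparable to median(values) in R
var = statistics.variance(values)   # sample variance, comparable to var(values)
sd = statistics.stdev(values)       # sample standard deviation, comparable to sd(values)

print(avg, mid, round(var, 3), round(sd, 3))
```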

60
New cards

max() - R

Returns the maximum of its arguments, such as the values in a numeric vector.

61
New cards

min() - R

Returns the minimum of its arguments, such as the values in a numeric vector.

62
New cards

quantile() - R

Returns sample quantiles of a numeric vector; by default the minimum, lower quartile (25%), median (50%), upper quartile (75%), and maximum.
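The quartile cut points can be sketched with Python's `statistics.quantiles`. The data is made up, and the `inclusive` method is chosen on the assumption that it corresponds to the linear interpolation R's quantile() uses by default.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up data

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The middle cut point is the median.
print(q1, q2, q3)
```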

63
New cards

summary() - R

Quickly summarizes the values in a vector, data frame, or model.

64
New cards

typeof() - R

Returns the internal type of an R object, such as "double", "character", or "list".

65
New cards

sum() - R

Returns the sum of the values passed as arguments to the function.

66
New cards

print() - R

Prints its argument; it is a generic function, which means that new printing methods can easily be added.

67
New cards

range() - R

Returns a vector containing the minimum and maximum of all the given arguments.

68
New cards

str() - R

Displays the internal structure of an object such as an array, list, matrix, factor, or data frame.

69
New cards

ncol() - R

Returns the total number of columns present in the object.

70
New cards

length() - R

Returns the number of items in a vector.

71
New cards

annotate() - R

Adds annotations such as text, segments, or rectangles to a ggplot2 chart.

72
New cards

names() - R

Gets or sets the names of an object, such as a list or the columns of a data frame.

73
New cards

unite()

A tidyr convenience function that pastes the values of multiple columns together into one.

74
New cards

split()

Takes a vector or other object and splits it into groups determined by a factor or list of factors.

75
New cards

Pipe operator (%>%) - R

A way to chain multiple operations together concisely and expressively.

76
New cards

bias function - R

Computes the average amount by which the actual values exceed the predicted values; the closer the result is to 0.0, the less biased the predictions.
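The calculation described here is just the mean of the differences between actual and predicted values. A minimal Python sketch, with a hypothetical `bias` helper and made-up numbers:

```python
def bias(actual, predicted):
    """Mean of (actual - predicted); results near 0.0 suggest little bias."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


# Made-up values: the differences are 5, -5, 5, and 5.
actual_sales = [100, 120, 90, 110]
predicted_sales = [95, 125, 85, 105]

result = bias(actual_sales, predicted_sales)
print(result)
```

A positive result means the predictions ran low on average; a negative result means they ran high.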

77
New cards
78
New cards

Data Validation tool

Restricts the type of data or values that users can enter in cells.

79
New cards

4 Phases of Analysis

  1. Organize data

  2. Format and adjust data

  3. Ask for input

  4. Transform Data

80
New cards

Data Validation

Allows you to control what can and can’t be entered in your work.

81
New cards

Examples of Data Validation

Add dropdown lists with predetermined options to choose from

82
New cards

Examples of Data Validation

Create custom check boxes

83
New cards

Examples of Data Validation

Protect structured data and formulas

84
New cards

SQL

ORDER BY sorts the query results in ascending order by default.

85
New cards

Subqueries

must be enclosed within parentheses.

86
New cards

Subquery

can have only one column specified in the SELECT clause. If you want a subquery to compare multiple columns, those columns must be selected in the main query.

87
New cards

CONCAT

CONCAT('Google', '.com')

88
New cards

CONCAT

Google.com

89
New cards

CONCAT_WS

CONCAT_WS('.', 'WWW', 'GOOGLE', 'COM')

90
New cards

CONCAT_WS

WWW.GOOGLE.COM

91
New cards

|| operator

'GOOGLE' || '.COM'
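The `||` concatenation can be tried directly in SQLite from Python, as sketched below. Python's `str.join` is included only as a rough analogue of joining strings with a separator, since CONCAT_WS is not available in every SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# || concatenates its two operands with nothing in between.
joined = conn.execute("SELECT 'GOOGLE' || '.COM'").fetchone()[0]

# Rough Python analogue of joining strings with a '.' separator.
with_separator = ".".join(["WWW", "GOOGLE", "COM"])

print(joined)
print(with_separator)
```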

92
New cards

Simple Spreadsheets

Use a pivot table

93
New cards

Multiple data sets or large spreadsheets

Use SQL

94
New cards

For larger complex data

Use R

95
New cards

Aggregation

Collecting or gathering many separate pieces into a whole.

96
New cards

Unique Advantages of R

Data manipulation, data visualization, and statistics packages

97
New cards

Unique Advantages of Python

Easy syntax for machine learning

98
New cards

Unique Advantages of Python

Integrates with cloud platforms like Google Cloud, Amazon Web Services, and Azure

99
New cards

Unique Challenges of Python

Many more decisions for beginners to make about data input/output, structure, variables, packages, and objects

100
New cards

What is a primary advantage of SQL?

Allows users to manipulate and reorganize data as needed to aid analysis