Types of Data and Working with Data Frames

Description and Tags

These flashcards cover key concepts related to types of data, terminology, methods of data manipulation in programming languages (R and Python), clustering methods, evaluation metrics, and essential linear algebra for data analysis.

20 Terms

1

Quantitative Data

Numeric data that can be divided into continuous or discrete types.

2

Continuous Data

Quantitative data that can take any value within a range, so infinitely many values are possible between any two points, such as height or time.

3

Discrete Data

Quantitative data that can only take countable values, such as the number of classes.

4

Qualitative Data

Categorical data that can be classified into nominal or ordinal types.

5

Nominal Data

Qualitative data that consist of unordered labels, such as eye color.

6

Ordinal Data

Qualitative data that consist of ordered labels, such as a satisfaction scale.
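A minimal sketch of the nominal/ordinal distinction: ordinal labels can be sorted by rank, while nominal labels can only be compared for equality and counted. The level names and sample values are illustrative assumptions, not part of the flashcards.

```python
# Hypothetical satisfaction levels, listed from lowest to highest.
LEVELS = ["poor", "fair", "good", "excellent"]
RANK = {level: i for i, level in enumerate(LEVELS)}

responses = ["good", "poor", "excellent", "fair", "good"]

# Ordinal data: sorting by rank is meaningful.
ordered = sorted(responses, key=RANK.get)

# Nominal data (e.g. eye color): only equality and counts are meaningful.
eye_colors = ["brown", "blue", "brown", "green"]
counts = {c: eye_colors.count(c) for c in set(eye_colors)}
```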

7

Data Frame

A rectangular table of data consisting of columns (variables) and rows (observations).

8

dplyr

An R package used for data manipulation with key functions such as select(), filter(), and mutate().
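The three verbs can be illustrated without R. The sketch below is a plain-Python analogy, not dplyr itself: a small table is held as a list of row dictionaries, and the column names and values are made up for illustration.

```python
# A tiny "data frame" as a list of row dicts (rows = observations, keys = variables).
rows = [
    {"name": "Ana", "height_cm": 170.0, "classes": 4},
    {"name": "Ben", "height_cm": 182.5, "classes": 3},
    {"name": "Cho", "height_cm": 165.0, "classes": 5},
]

# Analogue of filter(): keep rows that satisfy a condition.
tall = [r for r in rows if r["height_cm"] > 168]

# Analogue of mutate(): add a column computed from existing columns.
with_m = [{**r, "height_m": r["height_cm"] / 100} for r in tall]

# Analogue of select(): keep only some columns.
result = [{k: r[k] for k in ("name", "height_m")} for r in with_m]
```

In R, dplyr would express the same pipeline by chaining filter(), mutate(), and select() on a data frame.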

9

Accuracy

A metric defined as the ratio of correct predictions to total predictions.

10

Confusion Matrix

A table used to describe the performance of a classification model by comparing actual and predicted labels.
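As a rough sketch, a confusion matrix can be built by counting (actual, predicted) pairs, and accuracy then falls out of the matching pairs. The labels below are hypothetical sample data.

```python
from collections import Counter

# Hypothetical actual and predicted labels for a binary classifier.
actual    = ["spam", "spam", "ham", "ham",  "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]

# Confusion matrix as counts of (actual, predicted) pairs.
confusion = Counter(zip(actual, predicted))

# Accuracy = correct predictions / total predictions.
correct = sum(n for (a, p), n in confusion.items() if a == p)
accuracy = correct / len(actual)
```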

11

Feature Scaling

The process of standardizing or normalizing features so that features measured on larger numeric scales do not dominate distance-based methods.

12

Z-Score Standardization

A method of standardizing data by subtracting the mean and dividing by the standard deviation, so that the result has mean 0 and standard deviation 1.
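A minimal sketch of z-score standardization using only the standard library. The sample values are made up, and the population standard deviation is an assumption here; some tools use the sample standard deviation instead.

```python
from statistics import mean, pstdev

heights = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up values

mu = mean(heights)
sigma = pstdev(heights)  # population standard deviation (assumed convention)

# z = (x - mean) / standard deviation, applied to every value.
z = [(x - mu) / sigma for x in heights]
```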

13

Hierarchical Clustering

A method of cluster analysis that builds a hierarchy of clusters, either by repeatedly merging smaller clusters (agglomerative) or by repeatedly splitting larger ones (divisive).

14

Dendrogram

A tree diagram that represents the sequence of merges or splits in hierarchical clustering.
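The merge sequence that a dendrogram draws can be sketched with a small agglomerative routine. This is an illustrative version under stated assumptions: 1-D points, single linkage (distance between the closest members), and a hypothetical function name. It covers only the merging direction, not splitting.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points; returns the merge sequence."""
    clusters = [[p] for p in points]   # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Record which clusters merged and at what distance (the dendrogram height).
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

merges = single_linkage([1.0, 2.0, 6.0, 7.0, 15.0])
```

Each entry of merges corresponds to one join in the dendrogram, with the recorded distance giving the height of that join.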

15

k-Nearest Neighbors (KNN)

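A non-parametric method for classification or regression that predicts from the k training points closest to a query under a distance metric, using a majority vote of their labels for classification or an average of their values for regression.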
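A minimal classification sketch, assuming Euclidean distance and a simple majority vote. The function name, training points, and labels are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of (features, label) pairs; Euclidean distance is assumed.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
         ((6.0, 6.0), "b"), ((7.0, 5.5), "b"), ((6.5, 7.0), "b")]

label_near_a = knn_predict(train, (1.2, 1.4))
label_near_b = knn_predict(train, (6.4, 6.1))
```

Because the prediction depends on raw distances, features are usually scaled before KNN is applied.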

16

Manhattan Distance

A distance metric that calculates the sum of absolute differences between coordinates.
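The definition translates directly into a few lines of code. The function name and the sample vectors are illustrative.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))

d = manhattan((1, 2, 3), (4, 0, 3))  # |1-4| + |2-0| + |3-3|
```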

17

Cosine Similarity

A measure of similarity that calculates the cosine of the angle between two vectors.
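A minimal sketch: the cosine of the angle is the dot product divided by the product of the vector lengths. The function name is illustrative, and raising an error for zero vectors is a design assumption, since the angle is undefined there.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

perpendicular = cosine_similarity((1, 0), (0, 1))   # vectors at a right angle
same_direction = cosine_similarity((1, 2), (2, 4))  # vectors pointing the same way
```

Vectors at a right angle score close to 0 and vectors pointing the same way score close to 1, regardless of their lengths.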

18

Matrix Multiplication

The operation of multiplying two matrices, defined only when the inner dimensions match: an m × n matrix times an n × p matrix gives an m × p matrix.
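A small sketch of the row-by-column rule with the inner-dimension check made explicit. Matrices are represented as lists of rows, and the function name and sample matrices are illustrative.

```python
def matmul(A, B):
    """Multiply an m x n matrix by an n x p matrix (lists of rows)."""
    n = len(A[0])
    if n != len(B):
        raise ValueError("inner dimensions must match")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, 2]]           # 3 x 2
C = matmul(A, B)       # 2 x 2
```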

19

Identity Matrix

A square matrix with ones on the main diagonal and zeros elsewhere that, when multiplied by a compatible matrix, returns that matrix unchanged.
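A quick check of this property on a 2 × 2 example. The helper name and the sample matrix are illustrative, and the products are written inline to keep the sketch self-contained.

```python
def identity(n):
    """n x n matrix with ones on the main diagonal and zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

M = [[2, -1],
     [0, 5]]
I = identity(2)

# Row-by-column products I*M and M*I.
left = [[sum(I[i][k] * M[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
right = [[sum(M[i][k] * I[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
```

Both left and right equal M, so the identity leaves the matrix unchanged on either side.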

20