Data Management Final

90 Terms

1
New cards

SQL Code: Create a table based on its meta data using CREATE TABLE

CREATE TABLE schema.table (column1name datatype NULL/NOT NULL, column2name datatype NULL/NOT NULL, PRIMARY KEY (column_name), FOREIGN KEY (column_name) REFERENCES other_table (column_name));
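
A minimal runnable sketch of the pattern above, using Python's built-in sqlite3 as a stand-in engine (SQLite doesn't use schema prefixes the same way, so they're dropped; the table and column names here are hypothetical):

```python
import sqlite3

# In-memory database for the sketch; table/column names are made up
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
CREATE TABLE department (
    dept_id   INT NOT NULL,      -- primary key column: always NOT NULL
    dept_name VARCHAR(50) NULL,
    PRIMARY KEY (dept_id)
)""")

conn.execute("""
CREATE TABLE employee (
    emp_id  INT NOT NULL,
    dept_id INT NULL,
    PRIMARY KEY (emp_id),
    FOREIGN KEY (dept_id) REFERENCES department (dept_id)
)""")

# Confirm both tables now exist in the catalog
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['department', 'employee']
```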

2
New cards

which column always has NOT NULL when creating a table

primary key

3
New cards

SQL Code: change the structure of a table using ALTER TABLE - adding a column to a table

ALTER TABLE schema.table ADD column_name datatype NULL/NOT NULL;

4
New cards

SQL Code: change the structure of a table using ALTER TABLE - removing a column and its data from a table

ALTER TABLE schema.table DROP COLUMN column_name

5
New cards

SQL Code: change the structure of a table using ALTER TABLE - changing a column datatype and/or NULL in a table

ALTER TABLE schema.table ALTER COLUMN column_name datatype NULL/NOT NULL;
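
A quick sketch of the ADD case from cards 3-5, again using sqlite3 (note SQLite only partially supports ALTER TABLE: ADD COLUMN works everywhere, DROP COLUMN needs SQLite 3.35+, and the ALTER COLUMN form above is SQL Server-style syntax SQLite lacks; `product` and `price` are hypothetical names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INT NOT NULL, PRIMARY KEY (product_id))")

# Add a column to the existing table (a structure change, affecting every row);
# columns added without NOT NULL are nullable by default
conn.execute("ALTER TABLE product ADD COLUMN price DECIMAL(10,2)")

# PRAGMA table_info lists (cid, name, type, notnull, default, pk) per column
cols = [r[1] for r in conn.execute("PRAGMA table_info(product)")]
print(cols)  # ['product_id', 'price']
```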

6
New cards

which SQL command to use for adding a column

ALTER TABLE, because you are changing the table structure (this affects every row)

7
New cards

which SQL command to use for adding a row

INSERT INTO, because you are changing the table data but not the structure

8
New cards

SQL Code: add a record to a table using INSERT INTO

INSERT INTO schema.table (column1, column2, column3) VALUES (value1, value2, value3);

9
New cards

which datatype values must be surrounded with single quotes using INSERT INTO

VARCHAR, DATE, and DATETIME
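
Cards 8-9 together, as a runnable sqlite3 sketch (hypothetical `sale` table): VARCHAR and DATE values take single quotes inside the SQL, while the INT value does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sale (
    sale_id   INT NOT NULL,
    item_name VARCHAR(30) NOT NULL,
    sale_date DATE NULL,
    PRIMARY KEY (sale_id))""")

# Single quotes around the VARCHAR and DATE values; none around the INT
conn.execute("INSERT INTO sale (sale_id, item_name, sale_date) "
             "VALUES (1, 'widget', '2024-05-01')")

row = conn.execute("SELECT sale_id, item_name, sale_date FROM sale").fetchone()
print(row)  # (1, 'widget', '2024-05-01')
```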

10
New cards

SQL Code: delete a record from a table using DELETE

DELETE FROM schema.table WHERE condition;

11
New cards

SQL Code: delete a table using DROP TABLE

DROP TABLE schema.table

12
New cards

order of dropping tables

if the primary key of a table is referenced as a FK in another table, must drop the table w/ the FK reference first

13
New cards

SQL Code: create a database view using CREATE VIEW

CREATE VIEW schema.view AS SELECT column_name(s) FROM schema.table WHERE condition

14
New cards

SQL Code: to see a view

SELECT * FROM view

15
New cards

SQL Code: deleting a view

DROP VIEW schema.view
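
Cards 13-15 in one sqlite3 sketch (hypothetical `orders` table and `big_orders` view): a view is a stored SELECT that is queried like a table, and dropping it leaves the underlying table untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT NOT NULL, total DECIMAL(8,2), PRIMARY KEY (order_id))")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.25), (2, 500.5)])

# Create the view, then query it with an ordinary SELECT
conn.execute("CREATE VIEW big_orders AS "
             "SELECT order_id, total FROM orders WHERE total > 100")
rows = conn.execute("SELECT * FROM big_orders").fetchall()
print(rows)  # [(2, 500.5)]

# Dropping the view does not delete the base table's data
conn.execute("DROP VIEW big_orders")
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 2
```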

16
New cards

SQL CODE: updating a row w/ condition

UPDATE schema.table SET column1=value1, column2=value2 WHERE condition;

17
New cards

what is the safest way to update records in SQL

one row at a time, based on primary key
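
Cards 10, 16, and 17 combined in a sqlite3 sketch (hypothetical `customer` table): the UPDATE targets exactly one row via its primary key, and DELETE likewise takes a WHERE condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INT NOT NULL, city VARCHAR(30), PRIMARY KEY (cust_id))")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, 'Austin'), (2, 'Boston')])

# Safest update: one row at a time, matched on the primary key
conn.execute("UPDATE customer SET city = 'Dallas' WHERE cust_id = 1")

# DELETE with a WHERE condition removes only the matching row(s)
conn.execute("DELETE FROM customer WHERE cust_id = 2")

rows = conn.execute("SELECT cust_id, city FROM customer").fetchall()
print(rows)  # [(1, 'Dallas')]
```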

18
New cards

SQL Code: enforcing FK/PK relations in tables

PRIMARY KEY (column_name), FOREIGN KEY (fk_column) REFERENCES other_table (pk_column)

19
New cards

SQL Code: add records to one table using data from another (extract and load w/ SQL)

INSERT INTO target_table (column1, column2, column3) SELECT column1, column2, column3 FROM source_table WHERE condition;
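
An extract-and-load sketch with sqlite3 (hypothetical `staging` and `sales_mart` tables): INSERT INTO ... SELECT copies the qualifying rows from the source into the target in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INT, amount DECIMAL(8,2))")
conn.execute("CREATE TABLE sales_mart (id INT, amount DECIMAL(8,2))")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, 10.5), (2, -5.0), (3, 20.5)])

# Extract and load with SQL: only rows meeting the condition are copied
conn.execute("INSERT INTO sales_mart (id, amount) "
             "SELECT id, amount FROM staging WHERE amount > 0")

loaded = conn.execute("SELECT id, amount FROM sales_mart ORDER BY id").fetchall()
print(loaded)  # [(1, 10.5), (3, 20.5)]
```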

20
New cards

SQL data types

INT, DECIMAL (p,s), VARCHAR(n), DATE, DATETIME

21
New cards

DECIMAL (p,s)

p = precision (maximum # of digits to be stored) and s = scale (# of digits stored to the right of the decimal point)
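
Scale can be illustrated with Python's decimal module: for a hypothetical DECIMAL(5,2) column (p=5 digits total, s=2 after the point, so at most 999.99), rounding a value to two decimal places mirrors what the database stores.

```python
from decimal import Decimal, ROUND_HALF_UP

# s=2: quantize to two digits right of the decimal point
value = Decimal("123.456").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(value)  # 123.46
```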

22
New cards

DATE format

YYYY-MM-DD

23
New cards

advantages of normalization

1. less redundancy

2. less storage

24
New cards

disadvantages of normalization

takes longer to do joins

25
New cards

advantages of denormalization

1. quicker (fewer joins)

2. easier to understand structure

3. simpler to construct queries

26
New cards

disadvantages of denormalization

1. more storage

2. redundancy

27
New cards

first normal form (1NF) test

is each row uniquely identified and do all fields contain one value (atomicity)

28
New cards

first normal form (1NF) fix

create additional rows and/or identifiers (maybe composite key)

29
New cards

second normal form (2NF) test

is the table free from partial functional dependencies?

30
New cards

partial functional dependencies

columns that are functionally dependent on just a part of the composite key (is knowing that part of the key sufficient to determine the value?)

31
New cards

second normal form (2NF) fix

split the table into separate tables based on dependencies

32
New cards

third normal form (3NF) test

is the table free from transitive functional dependencies?

33
New cards

transitive functional dependency

occurs when a non-key column is functionally dependent on another non-key column

34
New cards

candidate key

a column (or group of columns) that is not the current primary key but could serve as one (e.g., as the key of a new table)

35
New cards

third normal form (3NF) fix

split the table according to these dependencies

36
New cards

denormalization

process of reintroducing redundant data into a normalized database

37
New cards

connection between views and denormalization

views make normalized data look denormalized, but data integrity is maintained because the original tables are still fully normalized

38
New cards

data warehouse

repository for many sources of data across the organization, where data is "preprocessed"

39
New cards

data mart

type of data warehouse, more focused and topic specific (ex: sales)

40
New cards

data cube

aggregates data from data mart in a multidimensional format with dimensions and facts and allows you to analyze data across dimensions

41
New cards

data cubes describe _____

a business event

42
New cards

dimensions

aspects of the event (product, store, time)

43
New cards

facts

raw data about the event (quantity sold, price, etc)

44
New cards

star schema (OLAP)

a. optimized for analytics and reporting (aggregation, slicing and dicing)

b. large fact table with denormalized dimension tables

c. redundant (month, day as attributes)

d. stored in data marts/data warehouses

45
New cards

relational schema (OLTP)

a. optimized for day to day transactions (insert, update, delete)

b. normalized tables

c. no redundancy, attributes only appear once

d. stored in operational databases

46
New cards

role of denormalization within star schema

dimension tables are denormalized, which reduces the number of joins (faster and easier analysis)

47
New cards

advantages of star schema

1. faster

2. fewer joins

3. easier for non-tech users

4. allows slicing and dicing

48
New cards

disadvantages of star schema

1. data redundancy increases

2. slower to update

3. not good for frequent insert/delete

4. storage

49
New cards

in star schema, the fact table has FKs from ______

each dimension table

50
New cards

___ relationship between dimension tables and fact table

1:n (one store can be associated with many sales, but a sale can only occur at one store)

51
New cards

which table in star schema is normalized

fact table (can get very big)

52
New cards

each row in the fact table is...

a block in the data cube

53
New cards

slicing data

constraining a dimension on a value (specific store, date, etc)

54
New cards

dicing data

filtering by multiple dimensions (sales by product for a certain store in a certain quarter)

55
New cards

how to slice and dice in SQL

using WHERE and HAVING
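
A slicing/dicing sketch against a tiny hypothetical fact table, run through sqlite3: WHERE constrains a dimension before grouping, HAVING filters the aggregated groups afterward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (store VARCHAR(10), product VARCHAR(10), qty INT)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ('S1', 'pen', 5), ('S1', 'pad', 2), ('S2', 'pen', 7), ('S1', 'pen', 3)])

rows = conn.execute("""
    SELECT product, SUM(qty) AS total
    FROM fact_sales
    WHERE store = 'S1'        -- slice: constrain the store dimension
    GROUP BY product
    HAVING SUM(qty) > 3       -- keep only aggregated groups over the threshold
    ORDER BY product
""").fetchall()
print(rows)  # [('pen', 8)]
```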

56
New cards

data mart granularity

level of detail in the data

57
New cards

costs of higher granularity

1. more space, less speed

2. effort and expense gathering and storing the data

3. more complex analysis and reporting

58
New cards

benefits of higher granularity

1. more accurate reporting

2. more use cases for the data mart

3. more flexibility in how the data is sliced and diced

59
New cards

data granularity rule of thumb

use the lowest level of granularity that still allows you to answer the question

60
New cards

ETL: extract

pulling data from the operational data store

61
New cards

ETL: transform

make sure data is in an analysis format (data consistency, missing data)

62
New cards

ETL: load

load the data into analytical data store

63
New cards

data quality

the degree to which the data reflects the actual environment (consistent with purpose, verified measurement)

64
New cards

role of SQL in ETL process

SELECT * then save results as Excel file to load into application

65
New cards

what does power query do

allows users to extract data from variety of sources, transform it and clean it, and load it into worksheet/dashboard

66
New cards

major functions of power query

1. find and replace

2. add/split/merge columns

3. filter rows

4. remove duplicates

5. change data type

6. calculated fields

7. join tables (append queries)

67
New cards

why you would use SQL instead of other data analysis tools

1. better at scale

2. more control

3. easier to automate

68
New cards

SQL data analysis: better at scale

a. more efficient at joins, filtering, and sorting

b. allows you to filter/sort data so Excel/Power BI don't have to process the entire database

69
New cards

SQL data analysis: control

more flexible since it is a language

70
New cards

SQL data analysis: easier to automate

a. write a query once, run it multiple times

b. can run a query within a programming language
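
A sketch of both automation points using sqlite3 (hypothetical `metric` table): the query is written once as a parameterized string, then run repeatedly from inside the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric (day VARCHAR(10), clicks INT)")
conn.executemany("INSERT INTO metric VALUES (?, ?)",
                 [('mon', 10), ('tue', 20), ('wed', 30)])

# Write the query once; run it many times with different parameters
query = "SELECT clicks FROM metric WHERE day = ?"
results = {day: conn.execute(query, (day,)).fetchone()[0]
           for day in ('mon', 'wed')}
print(results)  # {'mon': 10, 'wed': 30}
```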

71
New cards

data visualization vs dashboard

data visualization = single chart/graph, dashboard = collection

72
New cards

dashboard

collection of data visualizations that summarize key metrics, facilitate understanding, and can be interactive

73
New cards

dashboard interaction

clickable elements impose filters

74
New cards

Power BI and Pivot Table similarities

1. drag and drop

2. Power Query for ETL

3. star schemas

4. slicing and dicing (filtering)

75
New cards

unstructured data

no data model and no predefined organization (text documents, images)

76
New cards

semi-structured data

no formal data model, but contains symbols to separate and label data elements (CSV, XML, JSON)

77
New cards

semistructured data purpose

common way to transfer data between software applications

78
New cards

CSV (comma separated value file)

a. each value is separated by a comma; otherwise it's plain text

b. no specified field lengths

c. first row is often the field names

d. quotes don't indicate data type, but rather allow commas to be considered part of the value instead of a separator

e. cannot skip values in a row

f. no hierarchies
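
Points a-d above in a runnable sketch with Python's csv module (the field names are made up): the first row carries the field names, and the quotes keep the comma inside "Smith, Jane" from acting as a separator.

```python
import csv
import io

# CSV text: header row of field names, then one data row with a quoted comma
text = 'name,title\n"Smith, Jane",Analyst\n'
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['name', 'title'], ['Smith, Jane', 'Analyst']]
```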

79
New cards

XML (extensible markup language)

a. plain text file

b. text between an opening tag and its closing tag is the value; the tags themselves create the labels

c. values can be any length, and include commas and quotes

d. field can be skipped

e. can have hierarchies (attributes nested underneath)
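
A small XML sketch with Python's xml.etree (hypothetical element names): values sit between tags, and nesting gives the hierarchy described in point e.

```python
import xml.etree.ElementTree as ET

# Values live between tags; <customer> nests <name> beneath it
doc = "<order><id>7</id><customer><name>Jane</name></customer></order>"
root = ET.fromstring(doc)
print(root.find("id").text)             # 7
print(root.find("customer/name").text)  # Jane
```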

80
New cards

JSON (javascript object notation)

a. plain text file, organized as objects within braces {}

b. uses key-value pairs

c. fields can be skipped

d. values can be any length

e. can have hierarchies (nested)

81
New cards

key-value pairs (JSON)

keys = column names (strings in quotes); values = data (can be strings, integers, or decimals)
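
The same idea as a runnable sketch with Python's json module (hypothetical keys): quoted keys act as column names, values can be numbers or strings, and nesting the "customer" object shows the hierarchy from card 80.

```python
import json

# Keys are quoted strings; values may be strings, numbers, or nested objects
text = '{"order_id": 7, "customer": {"name": "Jane"}, "total": 19.95}'
data = json.loads(text)
print(data["customer"]["name"])  # Jane
print(data["total"])             # 19.95
```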

82
New cards

semi-structured data comparison: space

least = CSV, middle = XML (tags), most = JSON (tags and spaces)

83
New cards

semi-structured data comparison: hierarchies

XML and JSON

84
New cards

semi-structured data comparison: lightweight

most = JSON, then XML then CSV

85
New cards

semi-structured data comparison: skipping fields

XML and JSON

86
New cards

standard for transferring data across web

XML and JSON

87
New cards

NoSQL

category of databases that use flexible schemas, are optimized for unstructured/semi-structured data, and are designed for high performance through horizontal scaling

88
New cards

horizontal scaling (NoSQL)

add more computers instead of upgrading computers

89
New cards

NOSQL is used for

high volume transaction processing with semi-structured data

90
New cards

examples of NoSQL use cases

online shopping carts, product catalogs, real-time chat, etc