CompTIA Data+ Chapter 1: Identifying Basic Concepts of Data Schemas


1
New cards

Data schema

used to describe both the organization of data & the relationships between tables in a given database

2
New cards

Database Engineers

plan the database schema before they begin to create the system

3
New cards

Relational Database

uses tables to store the data that's being captured

4
New cards

Spreadsheet

stores data in a single flat table of rows and columns; unlike a relational database, it does not link multiple tables together through relationships

5
New cards

Tabular Schema

uses rows and columns in a table format to store all of the data

6
New cards

Relational Database Mgmt. System (RDBMS)

Used with lots of different database software, such as MySQL, MariaDB, and even Amazon's serverless database service, Aurora

7
New cards

SQL

a programming language for data that works across all relational databases. Structured Query Language is a domain-specific language used to manage data, especially in a relational database management system. It is particularly useful for handling structured data, i.e., data incorporating relations among entities and variables

8
New cards

Non-relational Databases

Databases not based on relationships that do not use SQL (Structured Query Language); able to handle large amounts of traffic and data, and it is often faster to get info from a non-relational DB than from a relational DB. Remember: SQL = relational databases; NoSQL or GraphQL = non-relational DBs

9
New cards

Document oriented databases

store the data inside of XML documents or using JSON format, which is the JavaScript Object Notation

10
New cards

Key value stores

store data as simple key-value pairs, where each unique key points to its associated value

11
New cards

Column oriented databases

store data in columns instead of rows, unlike a traditional relational database

12
New cards

Graph Stores

x are used to store individual elements as nodes inside of this database

13
New cards

Relational DB

Tables are used to store fields inside of the different columns

Each row is a record that holds the data for that relational db

Tables are optimally designed to hold the least amount of redundant info possible

Uses SQL to write data into the DB

The amount of data that can be held in each field, and its data type, is limited

Each particular field in the db has a specific purpose and field type

14
New cards

Non relational DB

Alternative to relational db

Stores as much info as you want in a key-value pair, as opposed to being limited to 255 characters

Easier to scale and build out for web-based or cloud-based applications

Can be used with any programming language

Stores both structured and unstructured data, because it offers more flexibility than a relational DB

15
New cards

Data normalization

The goal of x is to establish the relationships between the different data in normal forms so that we have the data we need when we run our reports; optimizes the storage and use of data within a given DB

16
New cards

First normal form (1NF)

eliminates any redundant info in individual tables

17
New cards

Second normal form (2NF)

has all the related info applicable to multiple tables using a foreign key

18
New cards

Third normal form (3NF)

eliminates fields that do not depend on a given key

19
New cards

Fourth normal form (4NF)

the data has to have a relationship that is in BCNF, or Boyce-Codd normal form, and has no multi-valued dependencies in it

20
New cards

Fifth Normal form (5NF)

has all the same characteristics as 4NF and ensures there are no join dependencies

21
New cards

Primary key-

unique identifier for a record that cannot contain duplicates, every table needs to have a primary key

22
New cards

Foreign key-

a primary key from one table that is referenced by a field in a different table
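As a minimal SQL sketch (using hypothetical Students and Enrollments tables), a primary key and a foreign key might be declared like this:

    CREATE TABLE Students (
        student_id   INT PRIMARY KEY,       -- primary key: unique, no duplicates
        student_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE Enrollments (
        enrollment_id INT PRIMARY KEY,
        student_id    INT,                  -- foreign key: references the Students primary key
        course_name   VARCHAR(100),
        FOREIGN KEY (student_id) REFERENCES Students(student_id)
    );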

23
New cards

DB Relationships- One to one-

one record in the table will be associated with only one record in another table

24
New cards

DB Relationships: One to many

one record in the table with a primary key is associated with multiple records in other tables

25
New cards

Referential integrity

Remember: make sure that all of the changes we're doing to any part of our db are cascading throughout that db, either through an update or a deletion

Used to establish and maintain record relationships in tables

Ensures and guarantees that a primary key being used as a foreign key in a table actually exists in the first table before records are created in the second table

Prevents the occurrence of bad or missing data in any of the tables

Referential integrity comes under attack when people start modifying or deleting parts of the db

26
New cards

Cascade delete/update

makes sure that no data is left orphaned as you make updates or deletions to the primary key and the records it controls
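A hedged SQL sketch of cascading updates and deletes (assuming a hypothetical Customers table with customer_id as its primary key; exact support varies by RDBMS):

    CREATE TABLE Orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
            ON DELETE CASCADE   -- deleting a customer also deletes that customer's orders
            ON UPDATE CASCADE   -- changing a customer_id updates the matching orders
    );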

27
New cards

Data denormalization

Occurs when the data is not structured in tables using normalization

Will have many redundancies and repetitive data

Happens when dealing with big data, data warehousing, mining, data analysis and data visualization

28
New cards

Data systems

any info tech system that captures, processes, stores, queries or reports on the data contained within them

29
New cards

Data processing types

It's important to determine whether you should use OLTP or OLAP to prevent performance issues

30
New cards

DB transaction

an insertion, deletion, or simply a query of the db

31
New cards

Online Transactional Processing (OLTP)

systems designed to handle very large volumes of transactions; built for large numbers of real-time db transactions

32
New cards

Online Analytical Processing (OLAP)

systems designed to handle longer-running queries for more complex data processing; built for longer-running and more complex db queries, and able to handle big queries that might involve a lot of joins and large data sets

33
New cards

Data Warehouse

Ingests data from the source systems of record and combines them together; considered the single source of truth; combines all data into one place so it can be accessed efficiently; an example of an OLAP system because it's built to handle really big, heavy queries

34
New cards

Source System

system of record for any particular kind of data

35
New cards

Clickstream data

data capturing where individual users are clicking on different pages of the website

36
New cards

Purchase data

involves people who use credit cards to buy something that we then have to fulfill and ship to them

37
New cards

Catalog information

another source system that contains the categorization and the descriptions about individual items

38
New cards

OLAP Cubes

3D structure that provides data grouped together by different dimensions; optimizes the expected queries from each data mart's customers

39
New cards

Data warehouse schemas

Fact Table- contains all of the main keys associated with the queries

Dimension table- info associated with a fact table that is tied to it using a shared key (such as a course ID)

Star schema- individual dimension tables branching off from the fact table so that it looks like a star, with one layer of dimensions

Snowflake schema- additional dimensions associated with each dimension table, branching out like a snowflake
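A minimal star-schema sketch in SQL, using hypothetical fact and dimension tables (names are illustrative only):

    CREATE TABLE DimProduct (
        product_key  INT PRIMARY KEY,       -- dimension table
        product_name VARCHAR(100),
        category     VARCHAR(50)
    );

    CREATE TABLE DimDate (
        date_key  INT PRIMARY KEY,          -- dimension table
        full_date DATE
    );

    CREATE TABLE FactSales (
        product_key  INT REFERENCES DimProduct(product_key),  -- keys into the dimensions
        date_key     INT REFERENCES DimDate(date_key),
        sales_amount DECIMAL(10,2),                           -- measure being analyzed
        quantity     INT
    );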

40
New cards

Data Lakes

Centralized repository that can hold both structured and unstructured data, does not require data to be structured, but giving it structure makes it easier to query

41
New cards

Data Lakehouse

queries data in place on the data lake; in order to query, it needs structured data and schema info that has been built up off to the side. You get the flexibility and cost effectiveness of a data lake, but you can still conduct queries across the entire data set. Either a data lakehouse or a true data warehouse can be used to analyze large amounts of data

42
New cards

Dimensional table-

contains metadata about the items in your fact table; raises the questions of how to manage changes to that dimensional data and how to retain a history of what it used to be

43
New cards

Type 1 dimensional table

slowly changing dimension where new info simply overwrites the old info; can no longer query for previous names that existed in the past

44
New cards

Type 2 dimensional table

slowly changing dimension that has the complete history of the info and retains a history of all previous data changes

45
New cards

Type 3 dimensional table

approach is to maintain the current and previous data
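A hedged SQL sketch contrasting the three slowly changing dimension types, assuming a hypothetical DimCustomer dimension table (column names are illustrative):

    -- Type 1: overwrite the old value; history is lost
    UPDATE DimCustomer SET city = 'Denver' WHERE customer_key = 42;

    -- Type 2: insert a new row and keep the full history with effective dates
    INSERT INTO DimCustomer (customer_key, customer_id, city, valid_from, valid_to)
    VALUES (99, 42, 'Denver', '2024-01-01', NULL);

    -- Type 3: keep the current and previous values in separate columns
    ALTER TABLE DimCustomer ADD COLUMN previous_city VARCHAR(50);
    UPDATE DimCustomer SET previous_city = city, city = 'Denver' WHERE customer_key = 42;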

46
New cards

Discrete data

data that can be counted with a certain # of values

47
New cards

Continuous data

data that can be measured and can take changing values. Continuous data is data that can take any value. Height, weight, temperature and length are all examples of continuous data. Some continuous data will change over time; the weight of a baby in its first year or the temperature in a room throughout the day.

48
New cards

Quantitative

data that is defined through #s

49
New cards

Qualitative

data arranged into groups or categories based on some kind of quality

50
New cards

Nominal data

Nominal data is data that can be labelled or classified into mutually exclusive categories within a variable. These categories cannot be ordered in a meaningful way. For example, for the nominal variable of preferred mode of transportation, you may have the categories of car, bus, train, tram or bicycle.

51
New cards

Ordinal data

type of data that has a natural ordering or ranking. It is categorical data that can be ranked or ordered in accordance with a specific attribute or characteristic. Examples of ordinal data are the level of education, the range of income, or the grades.

52
New cards

Data fields

contain the different pieces of info in different databases; data types are something we can control

53
New cards

Text and alphanumeric field data types-

the most common type, and you'll hear this called either a character, a text or a string

54
New cards

Character type

a single character, either a letter or a # that's being stored in a field

55
New cards

Text or string type

grouping of characters that contains letters or #s in this alphanumeric data field type; can be uppercase or lowercase; cannot do mathematical operations on a # stored in a string

56
New cards

Date type

stores a calendar date in a month/day/year or day/month/year format; can also store time

57
New cards

Number data type

will not allow any text, as it only stores numbers (as you might expect). Use of the number data type allows for calculations. It is very important to use the number data type if you need to do any basic arithmetic in your reporting.

58
New cards

Currency data type

special type of number that represents $, allows for two decimal places

59
New cards

Boolean data type

used for things that have only two values, either a yes or a no, a true or a false, on off, one or zero
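A rough SQL sketch showing these field data types in one table definition (exact type names vary by RDBMS, e.g. BOOLEAN vs. TINYINT(1); the Products table is hypothetical):

    CREATE TABLE Products (
        product_code CHAR(1),         -- character: a single letter or #
        product_name VARCHAR(255),    -- text/string: up to 255 characters
        release_date DATE,            -- date (a TIMESTAMP type can also store the time)
        units_sold   INT,             -- number: allows calculations
        unit_price   DECIMAL(10,2),   -- currency-style number with two decimal places
        in_stock     BOOLEAN          -- boolean: true or false
    );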

60
New cards

Structured Data

follows an existing convention; fits into tables and DBs nicely; a specific format in a specific data field type for each particular piece of data

61
New cards

Unstructured Data

not organized in a pre-defined manner that meets standards for structured data, can be text, images or video

62
New cards

Semi-structured data

a mix of both; a webpage is a great example, as are XML files, zip files, emails, and JSON files

63
New cards

Delimited file

fields in which some form of character is used to separate each field of data from the other data fields; the most common type is the CSV, or comma-separated value file

64
New cards

Tab delimited file-

uses a tab to separate each of the fields in the file; can also use a different character, like a pipe (|), to separate fields; common extensions include .CSV, .TAB, .TSV, .TXT

65
New cards

Flat file

any delimited file that is exported out of a db system and can then be sent to someone else; may be exported from the db in real time, or may be backup data or a point-in-time snapshot of the db

66
New cards

SQL

x is most commonly used when working with a db and its data; uses a series of statements to provide info to the db

67
New cards

Select statement-

how to query info from the db and select specific fields

68
New cards

Where keyword-

allows you to select records only where a certain condition is met
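A minimal example combining the SELECT statement and the WHERE keyword (the Customers table and its columns are hypothetical):

    SELECT first_name, last_name   -- fields to return
    FROM Customers                 -- table to query
    WHERE state = 'CO';            -- only rows meeting this condition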

69
New cards

HTML

hypertext markup language; used to write webpages in a semi-structured environment; uses tags to dictate which parts of the info are displayed

70
New cards

XML

extensible markup language; a text-based markup language like HTML, but with a different purpose; interacts really well with JavaScript; its goal is to transfer data, not display it to the screen

71
New cards

JavaScript Object Notation (JSON)

language used to get data to and from different websites; X is the de facto file format when sending info; uses arrays, which make it easier to use in modern networks

72
New cards

Data System

any info tech system that captures, processes, stores, queries or reports on the data contained within it

73
New cards

Extract Transform Load (ETL)

process that occurs when moving data from a source system to a data warehouse by extracting data from the sources, transforming the data and then loading it to the data warehouse
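A very simplified sketch of the idea in SQL (real ETL pipelines typically use dedicated tooling; table and column names here are hypothetical):

    -- Extract rows from the source system, transform them, and load them into the warehouse
    INSERT INTO warehouse_sales (sale_date, customer_id, total_usd)
    SELECT CAST(order_timestamp AS DATE),     -- transform: timestamp -> calendar date
           customer_id,
           ROUND(amount_cents / 100.0, 2)     -- transform: cents -> dollars
    FROM source_orders                        -- extract: read from the source table
    WHERE order_status = 'complete';          -- load happens via the INSERT above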

74
New cards

Extract Load and Transform (ELT)

a modern method used when preparing data for data lakes by holding data in preparation for future transformation

75
New cards

Extracting data

Process of extracting source data and importing it into the system; the objective of extraction is to connect to the data source; SQL, Power BI and Power Query are tools for extracting data from external databases

76
New cards

Transforming data

Process of transforming data to another table format, timestamps are very useful for end users

77
New cards

API-

connection between computers or other programs; designed to present a set of questions and define answers in the system. Application Programming Interface: in the context of APIs, the word Application refers to any software with a distinct function. Interface can be thought of as a contract of service between two applications; this contract defines how the two communicate with each other using requests and responses

78
New cards

Pull model-

continuously pulls data into system

79
New cards

Push model

only sends notifications when data changes

80
New cards

Web service

communication between or among electronic devices; JSON and XML both encode structured data; has a specific function to provide different kinds of info

81
New cards

Synchronous

sends a request to the web service and waits for the response

82
New cards

Asynchronous

allows you to do other tasks while waiting for the response

83
New cards

Web/Data/Screen Scraping

act of extracting data from a website

84
New cards

Machine data

data generated by web servers; machine data can be used as a predictive maintenance tool

85
New cards

Observation

the act of collecting data by observing and then analyzing afterwards, observation data should not be manipulated

86
New cards

Sampling

creating a smaller data set from a larger data set, random sampling, systematic sampling, stratified sampling
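Hedged SQL sketches of two sampling approaches (table and column names are hypothetical; random-ordering syntax varies by RDBMS):

    -- Random sample of 100 records (PostgreSQL/SQLite syntax; MySQL uses RAND())
    SELECT * FROM survey_responses ORDER BY RANDOM() LIMIT 100;

    -- Rough systematic sample: every 10th record by id
    SELECT * FROM survey_responses WHERE MOD(id, 10) = 0;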

87
New cards

Data Profiling

the process of working with data to begin to discern information and trends present in that data

88
New cards

Steps of Data Profiling

1. ID and document data source

2. ID the field names and data types

3. Determine the fields to be ID'd for reporting

4. Check for the primary, natural or foreign keys

5. Recognize all the data in the dataset

89
New cards

Redundant data

ID data stored in multiple places, determine the redundant data and work on how to minimize it

90
New cards

Duplicated data

data repeated within the same dataset, to find duplicated data, use the built-in tools in a data analytics software
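One common way to surface duplicates with SQL (assuming a hypothetical customers table with an email column):

    SELECT email, COUNT(*) AS occurrences
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1;   -- values that appear more than once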

91
New cards

Unnecessary data

It's important to understand what data you need and what data you can ignore; extra data slows down your system. Tools: Excel, Power BI, Tableau

92
New cards

Missing values

Missing data is referred to as Null and is represented as blank fields, Null, or n/a. A value is Null when it is not applicable to the field, when the dataset doesn't have the information, when the datasets do not match the expected information, or when the survey data is incomplete. What can you do about it? Filter out NULL values or replace the missing values
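A small SQL sketch of both options, assuming a hypothetical survey_responses table with an age column:

    -- Filter out rows where the value is missing
    SELECT * FROM survey_responses WHERE age IS NOT NULL;

    -- Replace missing values with a default when reporting
    SELECT COALESCE(age, 0) AS age_or_default FROM survey_responses;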

93
New cards

Invalid Data

Invalid data = incorrect data. Different reasons data can be invalid: hard-coded data, invalid data questions, extreme values, incorrect data, and invisible characters (ASCII codes inside a computer system that are invisible or non-printable). Look for leading and/or trailing spaces, and remove or replace invalid data

94
New cards

Meeting specification

Specifications: certain types or qualities set by DB engineers when designing systems; the most common reason that data doesn't meet specifications is the wrong data type, along with improper storage of numeric characters

95
New cards

Data Outliers

Any piece of data that is outside the normal distance from the other values in a sample. Nonparametric statistics identify data not assumed to come from prescribed models that are determined by a small number of parameters

96
New cards

Parametric

normal baseline

97
New cards

Nonparametric

distribution independent

98
New cards

SCD type 3 relies on creating _____ to reflect historical data

new columns

99
New cards

Schema that will improve performance and allow for faster queries:

star schema

100
New cards

Common raster graphic format

GIF