CompTIA Data+ Chp1 Identifying Basic Concepts of Data Schemas


117 Terms

1

Data schema

used to describe both the organization of data & the relationships between tables in a given database

New cards
2

Database Engineers

plan the database schema before they begin to create the system

New cards
3

Relational Database

uses tables to store the data that's being captured

New cards
4

Spreadsheet

stores data in a single flat table of rows and columns; unlike a relational database, it does not have multiple tables linked together with different relationships

New cards
5

Tabular Schema

uses rows and columns in a table format to store all of its data

New cards
6

Relational Database Mgmt. System (RDBMS)

software used to create and manage relational databases; examples include MySQL, MariaDB, and Amazon's Aurora, which has a serverless option

New cards
7

SQL

a programming language for data that works, with some dialect differences, across relational databases. Structured Query Language is a domain-specific language used to manage data, especially in a relational database management system. It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables

New cards
8

Non-relational Databases

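Databases not based on relationships between tables; do not use SQL (Structured Query Language) as their main interface; able to handle large amounts of traffic and data; often faster to get info from a non-relational DB than from a relational DB for simple lookups. Remember: SQL = relational databases, NoSQL = non-relational databases. GraphQL is sometimes grouped with NoSQL, but it is a query language for APIs rather than a type of database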

New cards
9

Document oriented databases

store the data inside of XML documents or in JSON (JavaScript Object Notation) format

New cards
10

Key value stores

store each piece of data as a value paired with a unique key; the key is used to look the value up
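A rough illustration of the difference between a key-value store and a document store, using only plain Python objects. The key name `session:42` and the sample document are made up; real products add persistence and networking on top of this idea.

```python
import json

# A key-value store maps a unique key to a value (a plain dict stands in for one here).
kv_store = {}
kv_store["session:42"] = "user=ana;cart=3"
value = kv_store.get("session:42")

# A document store keeps whole JSON-style documents, which can nest fields.
document = {"_id": 1, "name": "Ana", "orders": [{"sku": "A1", "qty": 2}]}
as_json = json.dumps(document)
round_trip = json.loads(as_json)

print(value)                           # user=ana;cart=3
print(round_trip["orders"][0]["sku"])  # A1
```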

New cards
11

Column oriented databases

store data in columns instead of rows like we do in a traditional relational database
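A small sketch of the same data laid out by row and by column. The sample records are hypothetical; the point is only that an aggregate over one field touches a single column in the column layout.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "city": "Lima", "sales": 120},
    {"id": 2, "city": "Oslo", "sales": 80},
    {"id": 3, "city": "Lima", "sales": 50},
]

# Column-oriented layout: each field is stored together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one field only needs to read that one column.
total_sales = sum(columns["sales"])
print(columns["sales"], total_sales)  # [120, 80, 50] 250
```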

New cards
12

Graph Stores

store individual elements as nodes inside the database, with the relationships between them stored as edges connecting the nodes

New cards
13

Relational DB

Tables store fields in the different columns

Each row is a record that holds the data for that relational db

Tables are designed to hold the least amount of redundant info possible

Uses SQL to read and write data in the DB

The data type and the amount of data that can be held in each field are limited

Each particular field in your db has a specific purpose and field type

New cards
14

Non relational DB

Alternative to relational db

Stores as much info as you want in a key-value pair, as opposed to a fixed field limit (for example, 255 characters)

Easier to scale and build out for web-based or cloud-based applications

Can be queried from many programming languages rather than only through SQL

Stores both structured and unstructured data, because you have more flexibility than in a relational DB

New cards
15

Data normalization

Goal of normalization is to establish the relationships between the different pieces of data, organized into normal forms, so that we have the data we need when we run our reports; it optimizes the storage and use of data within a given DB

New cards
16

First normal form (1NF)

eliminates repeating groups in individual tables: each field holds a single value and each record can be uniquely identified

New cards
17

Second normal form (2NF)

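is in 1NF, and info that applies to multiple records is moved into separate tables that are related using a foreign key, so every non-key field depends on the whole primary key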

New cards
18

Third normal form (3NF)

is in 2NF and eliminates fields that do not depend on the key
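A minimal sketch of the idea behind moving a field that does not depend on the key into its own table. The order and customer data are invented for illustration, and plain Python lists stand in for database tables.

```python
# Before: the customer's city is repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer_id": 7, "customer_city": "Lima", "item": "pen"},
    {"order_id": 2, "customer_id": 7, "customer_city": "Lima", "item": "ink"},
    {"order_id": 3, "customer_id": 9, "customer_city": "Oslo", "item": "cap"},
]

# customer_city depends on customer_id, not on order_id, so it moves to its own table.
customers = {row["customer_id"]: {"city": row["customer_city"]} for row in orders_flat}
orders = [
    {"order_id": r["order_id"], "customer_id": r["customer_id"], "item": r["item"]}
    for r in orders_flat
]

print(customers)  # {7: {'city': 'Lima'}, 9: {'city': 'Oslo'}}
```

After the split, each city is stored once, and `customer_id` in `orders` acts as the foreign key back to `customers`.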

New cards
19

Fourth normal form (4NF)

the data has to be in Boyce-Codd normal form (BCNF) and have no multivalued dependencies

New cards
20

Fifth Normal form (5NF)

has all the same characteristics as 4NF and ensures there are no join dependencies beyond those implied by the keys

New cards
21

Primary key-

unique identifier for a record that cannot contain duplicates, every table needs to have a primary key

New cards
22

Foreign key-

a field in one table that references the primary key of a different table

New cards
23

DB Relationships- One to one-

one record in the table will be associated with only one record in another table

New cards
24

DB Relationships: One to many

one record in the table with the primary key is associated with multiple records in another table

New cards
25

Referential integrity

Remember: make sure that all of the changes we're doing to any part of our db are cascading throughout that db, either through an update or a deletion

Used to establish and maintain record relationships in tables

Guarantees that every foreign key value in one table matches an existing primary key in the table it references; the primary key record must exist before records that reference it are created in the second table

Prevents the occurrence of bad or missing data in any of the tables

Referential integrity comes under attack when people start modifying or deleting parts of the db

New cards
26

Cascade delete/update

makes sure that no data is left orphaned when you update or delete a primary key: the change is applied automatically to the records that reference it
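A minimal sketch of referential integrity and cascade delete, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other database systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE CASCADE,"
    " item TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "pen"), (11, 1, "ink")])

# Referential integrity: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'cap')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Cascade delete: removing the customer also removes that customer's orders.
conn.execute("DELETE FROM customers WHERE id = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, orders_left)  # True 0
```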

New cards
27

Data denormalization

Occurs when the data is not structured in tables using normalization

Will have many redundancies and repetitive data

Happens when dealing with big data, data warehousing, mining, data analysis and data visualization

New cards
28

Data systems

any info tech system that captures, processes, stores, queries or reports on the data contained within them

New cards
29

Data processing types

It's important to determine whether you should use OLTP or OLAP, to prevent performance issues

New cards
30

DB transaction

an insertion, update, deletion, or simply a query of the db

New cards
31

Online Transactional Processing (OLTP)

systems designed to handle a very large volume of transactions; built for large numbers of real-time db transactions

New cards
32

Online Analytical Processing (OLAP)

systems designed for longer-running, more complex db queries; built to handle big queries that might involve a lot of joins and large data sets

New cards
33

Data Warehouse

Ingests data from the source systems of record and combines it in one place where it can be accessed efficiently; considered the single source of truth; an example of an OLAP system, built to handle really big, heavy queries

New cards
34

Source System

system of record for any particular kind of data

New cards
35

Clickstream data

data that records which pages of the website individual users click on

New cards
36

Purchase data

data about people who used credit cards to buy something that then had to be fulfilled and shipped to them

New cards
37

Catalog information

another source system that contains the categorization and the descriptions about individual items

New cards
38

OLAP Cubes

multidimensional structure (often pictured as a 3D cube) that provides data grouped together by different dimensions; optimized for the queries expected from each data mart's customers

New cards
39

Data warehouse schemas

Fact table- contains the values being measured plus the main keys that link them to the dimension tables used in queries

Dimension table- descriptive info associated with a fact table, tied to it by a shared key field (for example, a course id)

Star schema- individual dimension tables branching off from the fact table so that it looks like a star; only one layer of dimensions

Snowflake schema- dimension tables have further dimension tables associated with them, branching out like a snowflake
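A minimal star schema sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the data are assumptions chosen for illustration; the fact table holds the measured amounts and the keys, and each dimension table is a single join away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 2, 40.0);
""")

# Each dimension is one join away from the fact table (one layer, hence "star").
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.year = 2024
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)  # [('books', 15.0), ('games', 40.0)]
```

In a snowflake schema, `dim_product` would itself point to a further table (for example a separate category table), adding one more join.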

New cards
40

Data Lakes

Centralized repository that can hold both structured and unstructured data, does not require data to be structured, but giving it structure makes it easier to query

New cards
41

Data Lakehouse

queries data in place on the data lake; to query it, the data needs structure, so schema data is built up off on the side. You get the flexibility and cost effectiveness of a data lake but can still conduct queries across the entire data set. Either a lakehouse or a true data warehouse can be used to analyse large amounts of data

New cards
42

Dimensional table-

contains metadata about the items in your fact table. Key questions: how do you manage changes to that dimensional data, and how do you retain a history of what it used to be?

New cards
43

Type 1- slowly changing dimension,

slowly changing dimension where new info simply overwrites the old info, so you can no longer query for previous values (such as names that existed in the past)

New cards
44

Type 2 dimensional table

slowly changing dimension that retains a complete history of all previous data changes by adding a new row for each change

New cards
45

Type 3 dimensional table

approach is to maintain the current and previous data
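The three slowly changing dimension types can be compared side by side with a small sketch. The customer record, the `is_current` flag and the `previous_name` column are hypothetical names chosen for the example; real warehouses often use effective dates as well.

```python
# Hypothetical customer dimension row whose name changes from "Ana Diaz" to "Ana Ruiz".
old = {"customer_id": 7, "name": "Ana Diaz"}

# Type 1: overwrite, so the old value is lost.
type1 = dict(old, name="Ana Ruiz")

# Type 2: keep every version as its own row, flagging the current one.
type2 = [
    dict(old, is_current=False),
    dict(old, name="Ana Ruiz", is_current=True),
]

# Type 3: keep the current and previous values in separate columns.
type3 = {"customer_id": 7, "name": "Ana Ruiz", "previous_name": old["name"]}

print(type1["name"], len(type2), type3["previous_name"])  # Ana Ruiz 2 Ana Diaz
```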

New cards
46

Discrete data

data that can be counted and takes only a certain number of distinct values

New cards
47

Continuous data

data that is measured rather than counted. Continuous data is data that can take any value within a range. Height, weight, temperature and length are all examples of continuous data. Some continuous data will change over time, such as the weight of a baby in its first year or the temperature in a room throughout the day.

New cards
48

Quantitative

data that is defined through #s

New cards
49

Qualitative

data arranged into groups or categories based on some kind of quality

New cards
50

Nominal data

Nominal data is data that can be labelled or classified into mutually exclusive categories within a variable. These categories cannot be ordered in a meaningful way. For example, for the nominal variable of preferred mode of transportation, you may have the categories of car, bus, train, tram or bicycle.

New cards
51

Ordinal data

type of data that has a natural ordering or ranking. It is categorical data that can be ranked or ordered in accordance with a specific attribute or characteristic. Examples of ordinal data are the level of education, the range of income, or the grades.

New cards
52

Data fields

hold the different pieces of info in a database; the data type of each field is something we can control

New cards
53

Texts and alphanumeric field data types-

the most common type, and you'll hear this called either a character, a text or a string

New cards
54

Character type

a single character, either a letter or a # that's being stored in a field

New cards
55

Text or string type

grouping of characters that contains letters or #s in this alpha numeric data field type, can be uppercase or lowercase, cannot do mathematical operations on a # in a string

New cards
56

Date type

stores an exact calendar date in month-day-year or day-month-year format; can also store a time

New cards
57

Number data type

will not allow any text, as it only stores numbers (as you might expect). Use of the number data type allows for calculations. It is very important to use the number data type if you need to do any basic arithmetic in your reporting.
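A short illustration of why the data type matters for arithmetic. The variable names are made up; the behaviour shown is standard Python, and databases and spreadsheets treat text and number fields in a broadly similar way.

```python
quantity_text = "12"   # stored as text
quantity_num = 12      # stored as a number

# "+" on text joins characters instead of adding values.
joined = quantity_text + "3"
added = quantity_num + 3

# Converting the text to a number first makes arithmetic possible.
converted = int(quantity_text) + 3
print(joined, added, converted)  # 123 15 15
```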

New cards
58

Currency data type

special type of number that represents money (for example $); typically allows for two decimal places

New cards
59

Boolean data type

used for things that have only two values, either a yes or a no, a true or a false, on off, one or zero

New cards
60

Structured Data

follows an existing convention, fits into tables and dbs nicely, specific format in a specific data field type for each particular data

New cards
61

Unstructured Data

not organized in a pre-defined manner that meets standards for structured data, can be text, images or video

New cards
62

Semi-structured data

mix of both structured and unstructured data; a webpage is a great example, as are XML files, zip files, emails and JSON files

New cards
63

Delimited file

file in which some character is used to separate each field of data from the other data fields; the most common type is CSV, a comma-separated values file

New cards
64

Tab delimited file-

uses a tab to separate each of the fields in the file; other delimited files use a different character, like a pipe |, to separate fields. Common extensions: .CSV, .TAB, .TSV, .TXT
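A minimal sketch of reading delimited text with Python's built-in `csv` module. The sample data is made up and held in memory with `io.StringIO` instead of a real file; only the `delimiter` argument changes between the comma and pipe versions.

```python
import csv
import io

comma_text = "id,name\n1,Ana\n2,Ben\n"
pipe_text = "id|name\n1|Ana\n2|Ben\n"

# The delimiter argument tells the reader which character separates fields.
comma_rows = list(csv.reader(io.StringIO(comma_text)))
pipe_rows = list(csv.reader(io.StringIO(pipe_text), delimiter="|"))

print(comma_rows == pipe_rows)  # True
print(pipe_rows[1])             # ['1', 'Ana']
```

For a tab-delimited file the same call would use `delimiter="\t"`.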

New cards
65

Flat file

any delimited file that is exported out of a db system and can then be sent to someone else; it can be a real-time export, backup data, or a point-in-time snapshot of the db

New cards
66

SQL

the language most commonly used when working with a db and its data; uses a series of statements to send instructions to the db

New cards
67

Select statement-

how you query info from a db, by selecting the fields you want returned

New cards
68

Where keyword-

allows you to select something where a certain condition happens
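A minimal sketch of SELECT with WHERE, run through Python's built-in `sqlite3` module against an in-memory database. The `students` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database; the table and rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students (id, name, score) VALUES (?, ?, ?)",
    [(1, "Ana", 91), (2, "Ben", 67), (3, "Cy", 78)],
)

# SELECT picks the fields; WHERE keeps only the rows that meet the condition.
passed = conn.execute(
    "SELECT name, score FROM students WHERE score >= 70 ORDER BY id"
).fetchall()
print(passed)  # [('Ana', 91), ('Cy', 78)]
```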

New cards
69

HTML

Hypertext Markup Language, used to write webpages; a semi-structured format that uses tags to dictate how parts of the info are displayed

New cards
70

XML

Extensible Markup Language, a text-based markup language like HTML but with a different purpose: the goal is to transfer data, not display it on screen; interacts really well with JavaScript

New cards
71

JavaScript Object Notation (JSON)

text-based data format used to get data to and from different websites; JSON is the de facto format when sending info between web services; represents data as key-value objects and arrays, which makes it easy to use in modern networks
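A small sketch showing the same made-up customer record expressed as JSON and as XML, parsed with Python's standard `json` and `xml.etree.ElementTree` modules. The element and field names are assumptions for illustration only.

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"name": "Ana", "tags": ["new", "vip"]}'
xml_text = "<customer><name>Ana</name><tag>new</tag><tag>vip</tag></customer>"

# JSON maps directly onto dictionaries and lists (arrays).
from_json = json.loads(json_text)

# XML is parsed into a tree of tagged elements.
root = ET.fromstring(xml_text)
from_xml = {"name": root.findtext("name"), "tags": [t.text for t in root.findall("tag")]}

print(from_json == from_xml)  # True
```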

New cards
72

Data System

any info tech system that captures, processes, stores, queries or reports on the data contained within it

New cards
73

Extract Transform Load (ETL)

process that occurs when moving data from a source system to a data warehouse by extracting data from the sources, transforming the data and then loading it to the data warehouse

New cards
74

Extract Load and Transform (ELT)

a modern method used when preparing data for data lakes: data is extracted and loaded in its raw form first, then held there for future transformation
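A toy comparison of ETL and ELT using plain Python lists in place of real systems. The `source_rows` data, the `transform` helper, and the `warehouse` and `data_lake` names are all hypothetical; the only point is the order of the steps.

```python
# Hypothetical raw rows from a source system.
source_rows = [{"name": " ana ", "amount": "10.50"}, {"name": "BEN", "amount": "4"}]

def transform(row):
    """Clean one raw row: trim and title-case the name, parse the amount."""
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}

# ETL: transform first, then load only the cleaned rows into the warehouse.
warehouse = [transform(row) for row in source_rows]

# ELT: load the raw rows as-is, and transform later when they are needed.
data_lake = list(source_rows)
later_view = [transform(row) for row in data_lake]

print(warehouse == later_view)  # True
```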

New cards
75

Extracting data

Process of extracting source data and importing it into the system; the objective of extraction is to connect to the data source. SQL, Power BI and Power Query are tools for extracting data from external databases

New cards
76

Transforming data

Process of transforming data to another table format, timestamps are very useful for end users

New cards
77

API-

Application Programming Interface: a connection between computers or other programs, designed to accept a defined set of requests and return defined responses. In the context of APIs, the word Application refers to any software with a distinct function. Interface can be thought of as a contract of service between two applications. This contract defines how the two communicate with each other using requests and responses.

New cards
78

Pull model-

the receiving system continuously requests (pulls) data from the source into the system

New cards
79

Push model

the source only sends notifications when data changes

New cards
80

Web service

communication between or among electronic devices; has a specific function to provide different kinds of info; JSON and XML are both used to encode the structured data exchanged

New cards
81

Synchronous

request from web service and wait for response

New cards
82

Asynchronous

allows you to do other tasks while waiting for the response
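A minimal sketch of asynchronous requests using Python's standard `asyncio` module. The `call_service` coroutine is a hypothetical stand-in for a web service call; `asyncio.sleep` simulates the network wait, so no real service is contacted.

```python
import asyncio

async def call_service(name, delay):
    """Stand-in for a web service call; asyncio.sleep simulates the network wait."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Both requests are in flight at once instead of one after the other.
    return await asyncio.gather(call_service("a", 0.02), call_service("b", 0.01))

results = asyncio.run(main())
print(results)  # ['a done', 'b done']
```

A synchronous version would await each call in turn, so the total wait would be the sum of the delays rather than roughly the longest one.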

New cards
83

Web/Data/Screen Scraping

act of extracting data from a website

New cards
84

Machine data

data generated automatically by machines such as web servers; machine data can be used as a predictive maintenance tool

New cards
85

Observation

the act of collecting data by observing and then analyzing afterwards, observation data should not be manipulated

New cards
86

Sampling

creating a smaller data set from a larger data set, random sampling, systematic sampling, stratified sampling
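The three sampling methods named above can be sketched with Python's standard `random` module. The population of 100 record ids, the seed, the sample sizes and the two strata are all arbitrary choices made for the example.

```python
import random

population = list(range(1, 101))  # hypothetical dataset of 100 record ids
rng = random.Random(0)            # fixed seed so runs are repeatable

# Random sampling: every record has the same chance of selection.
random_sample = rng.sample(population, 10)

# Systematic sampling: every 10th record after a starting point.
systematic_sample = population[4::10]

# Stratified sampling: split into groups (strata), then sample within each group.
strata = {"low": population[:50], "high": population[50:]}
stratified_sample = [x for group in strata.values() for x in rng.sample(group, 5)]

print(len(random_sample), systematic_sample[:3], len(stratified_sample))
```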

New cards
87

Data Profiling

the process of working with data to begin to discern the information and trends present in that data

New cards
88

Steps of Data Profiling

1. ID and document data source

2. ID the field names and data types

3. Determine the fields to be ID'd for reporting

4. Check for the primary, natural or foreign keys

5. Recognize all the data in the dataset

New cards
89

Redundant data

the same data stored in multiple places; identify the redundant data and work out how to minimize it

New cards
90

Duplicated data

data repeated within the same dataset, to find duplicated data, use the built-in tools in a data analytics software

New cards
91

Unnecessary data

It's important to understand what data you need and what data you can ignore; extra data slows down your system. Tools: Excel, Power BI, Tableau

New cards
92

Missing values

Missing data is referred to as Null and is represented as blank fields, Null or n/a. Null occurs when the value is not applicable to the field, when the dataset doesn't have the information, when the datasets do not match the expected information, or when the survey data is incomplete. What can you do about it? Filter out "NULL" values or replace the missing values
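A minimal sketch of the two options for handling missing values, with Python's `None` standing in for Null. The `ages` list is invented, and replacing with the mean is only one possible replacement strategy.

```python
# Hypothetical survey values; None stands in for a NULL (missing) value.
ages = [30, None, 20, None, 40]

# Option 1: filter out the missing values.
known = [a for a in ages if a is not None]

# Option 2: replace missing values, here with the mean of the known ones.
mean_age = sum(known) / len(known)
filled = [a if a is not None else mean_age for a in ages]

print(known)   # [30, 20, 40]
print(filled)  # [30, 30.0, 20, 30.0, 40]
```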

New cards
93

Invalid Data

Invalid data = incorrect data. Different reasons data can be invalid: hard-coded data, invalid data questions, extreme values, incorrect data, invisible characters. ASCII includes non-printable (invisible) control characters that can end up stored in a field. Look for leading and/or trailing spaces, and remove or replace invalid data
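A short sketch of cleaning invisible characters and stray spaces from a text value. The `raw` string is a made-up example containing a NUL character and a tab; `str.isprintable` and `str.strip` are standard Python string methods.

```python
raw = "  Ana\x00 Diaz\t "

# Drop non-printable characters (such as NUL and tab), then trim leading/trailing spaces.
printable_only = "".join(ch for ch in raw if ch.isprintable())
cleaned = printable_only.strip()

print(repr(cleaned))  # 'Ana Diaz'
```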

New cards
94

Meeting specification

Specifications: the data types and quality rules set by DB engineers when designing systems. The most common reason data doesn't meet specifications is a wrong data type; another is improper storage of numeric characters

New cards
95

Data Outliers

Any piece of data that lies an abnormal distance from the other values in a sample. Nonparametric statistics: methods for data that are not assumed to come from a prescribed model determined by a small number of parameters
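One common way to flag outliers without assuming a particular distribution is the interquartile range (IQR) rule, sketched below with Python's standard `statistics` module. The 1.5 multiplier is a widely used convention rather than something the cards specify, and the `values` list is made up.

```python
import statistics

values = [10, 12, 11, 13, 12, 11, 95]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Anything more than 1.5 * IQR beyond the quartiles is flagged as an outlier.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(outliers)  # [95]
```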

New cards
96

Parametric

assumes the data follow a known distribution described by a small number of parameters, such as a normal baseline

New cards
97

Nonparametric

distribution independent: makes no assumption about the shape of the underlying distribution

New cards
98

SCD type 3 relies on creating _____ to reflect historical data

new columns

New cards
99

Schema that will improve performance and allow for faster queries:

star schema

New cards
100

Common raster graphic format

GIF

New cards
