ISC vocabulary 2: data engineering verbs (EN-definitions)


30 Terms

1

ingest

to bring data into a system from external or internal sources

2

transform

to modify data to meet specific requirements or formats

3

extract

to pull raw data from various sources such as databases

4

load

to write transformed or extracted data into a storage system such as a data warehouse
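
The extract, transform, and load verbs above can be sketched together as one tiny pipeline. This is only an illustration under stated assumptions: the `RAW_CSV` sample, the `payments` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = "id,amount\n1, 10.5 \n2, 4.0 \n"

def extract(text):
    """Extract: pull raw rows out of the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert string fields into typed values."""
    return [(int(r["id"]), float(r["amount"].strip())) for r in rows]

def load(conn, records):
    """Load: write the transformed records into a storage table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()

# In-memory SQLite is an assumption standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
```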

5

clean

to identify and correct or remove errors in data

6

validate

to ensure data meets predefined quality criteria
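
A minimal sketch of validation, assuming two made-up quality rules (an email must contain `@`, an age must be an integer from 0 to 130). The rules and field names are hypothetical; real criteria depend on the dataset.

```python
def validate(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    if not record.get("email") or "@" not in record["email"]:
        errors.append("email is missing or malformed")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 130:
        errors.append("age must be an integer between 0 and 130")
    return errors

# Hypothetical sample records.
good = {"email": "a@example.com", "age": 30}
bad = {"email": "not-an-email", "age": -5}
```

Returning the list of violations, rather than a single boolean, makes it possible to report every failed rule for a record at once.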

7

aggregate

to combine data from multiple records or datasets to provide summarized information
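
As a small illustration of aggregation, the sketch below sums amounts per region. The `sales` records and the `total_by_region` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail records.
sales = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 30},
]

def total_by_region(rows):
    """Combine many records into one summarized total per region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

summary = total_by_region(sales)
```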

8

integrate

to merge data from multiple systems or sources into a unified system for a more comprehensive view or analysis.

9

parse

to break down raw data into a structured form
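
One possible illustration of parsing: turning a raw log line into named fields. The log layout (date, level, message) and the `LOG_PATTERN` name are assumptions made for this sketch.

```python
import re

# Assumed layout: "YYYY-MM-DD LEVEL free-text message"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.*)$"
)

def parse_line(line):
    """Return the line's fields as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

parsed = parse_line("2024-01-15 ERROR disk almost full")
```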

10

optimize

to improve the performance or efficiency of data processing tasks

11

schedule

to set up automated tasks to run at specified intervals

12

monitor

to observe systems

13

stream

to process data in real-time as it is being produced

14

query

to retrieve specific data from databases or systems using a query language such as SQL or a NoSQL query language

15

index

to create data structures to speed up retrieval operations by mapping specific fields to database records
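
The idea of mapping a field's values to records can be sketched with a plain dictionary. This is a simplified in-memory stand-in, not how a database engine implements its indexes; `records` and `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical table of records.
records = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Oslo"},
]

def build_index(rows, field):
    """Map each value of `field` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[field]].append(position)
    return index

city_index = build_index(records, "city")
# Lookup goes straight to the matching rows instead of scanning every record.
oslo_rows = [records[i] for i in city_index["Oslo"]]
```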

16

partition

to divide data into distinct sections or chunks based on specific criteria (like date ranges) to improve performance and organization.
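
A minimal sketch of partitioning by date range, assuming monthly partitions keyed as `YYYY-MM`. The `events` data and the `partition_by_month` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records.
events = [
    {"day": date(2024, 1, 5), "value": 1},
    {"day": date(2024, 1, 20), "value": 2},
    {"day": date(2024, 2, 3), "value": 3},
]

def partition_by_month(rows):
    """Group rows into chunks keyed by the year and month of their date."""
    partitions = defaultdict(list)
    for row in rows:
        key = row["day"].strftime("%Y-%m")
        partitions[key].append(row)
    return dict(partitions)

by_month = partition_by_month(events)
```

A query that only needs January can then read the `"2024-01"` chunk and skip the rest.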

17

cluster

to organize data into groups that have similar characteristics OR to organize servers to process data in parallel

18

scale

to increase or decrease the capacity of a system to handle more or less data efficiently

19

replicate

to create copies of data across multiple systems or nodes to ensure high availability

20

archive

to store older or less frequently used data in long-term storage solutions

21

migrate

to move data from one system to another

22

profile

to analyze the content

23

secure

to implement safeguards such as encryption

24

version

to manage different versions of data or schema to ensure traceability and compatibility

25

catalog

to create an organized inventory of available data assets

26

visualize

to represent data graphically (for example, as charts)

27

deploy

to implement data pipelines

28

automate

to reduce manual intervention by configuring systems to carry out repetitive data-related tasks

29

backfill

to process and load historical data into a system when data was previously unavailable or missing to ensure completeness.
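
One way to picture a backfill is to find the days in a range that were never loaded, then process only those. The sketch assumes daily loads tracked as a set of dates; `missing_days` and the sample dates are hypothetical.

```python
from datetime import date, timedelta

def missing_days(start, end, loaded):
    """List every day from start to end (inclusive) that is not in `loaded`."""
    day = start
    gaps = []
    while day <= end:
        if day not in loaded:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

# Hypothetical record of days that already have data.
already_loaded = {date(2024, 3, 1), date(2024, 3, 3)}
to_backfill = missing_days(date(2024, 3, 1), date(2024, 3, 4), already_loaded)
```

Each date in `to_backfill` would then be passed through the normal load step so the history becomes complete.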

30

log

to record events or actions taken by data systems