Data

Anything you can encode using binary representation
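
As a small illustration, assuming UTF-8 as the text encoding (one choice among many), the string "Hi" becomes a sequence of bytes, each of which is eight binary digits:

```python
# Encode a short piece of text as bytes using UTF-8 (an assumed encoding)
text = "Hi"
raw = text.encode("utf-8")

# Show each byte as eight binary digits
bits = " ".join(format(byte, "08b") for byte in raw)
print(bits)  # 01001000 01101001

# Decoding with the same encoding recovers the original text
assert raw.decode("utf-8") == text
```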

Aim of storing it

To retrieve information

How?

Process the data by establishing a flexible, understandable and common representation for it
Use databases
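
A minimal sketch of both steps, using Python's built-in sqlite3 module with an in-memory database. The table name `readings`, its columns and the sample rows are hypothetical, chosen only to show data being stored in a common representation and then queried to retrieve information:

```python
import sqlite3

# In-memory database; table and column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, taken_at TEXT, value REAL)")

# Store raw data in one agreed-upon representation (one row per reading)
rows = [
    ("s1", "2024-01-01T10:00", 21.5),
    ("s1", "2024-01-01T11:00", 22.0),
    ("s2", "2024-01-01T10:00", 19.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Retrieve information: the average value per sensor
result = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(result)  # [('s1', 21.75), ('s2', 19.0)]
conn.close()
```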

Data Engineering

Process of designing and building systems that allow people to collect, manage, and analyse data

Data Engineers

Work to make raw data usable for data scientists and business analysts so that organisations can use it to improve performance

Responsible for

  • Data pipelines - creating data pipelines (flows) to manage and process large sets of data

  • Data integration - ensuring that data from different sources is integrated seamlessly

  • Data quality - ensuring that data is of high quality and that the data infrastructure is reliable and efficient

    • Low-quality data - data that doesn’t fit your requirements as a developer

  • Data analysis - analysing raw data to build predictive models and reveal trends

  • Data security - managing and storing data securely to protect it from loss or theft

  • Automation - creating ways to automate tasks within the data pipeline to improve efficiency
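
Several of these responsibilities can be seen together in a toy pipeline. The sketch below is only an illustration under stated assumptions: the two sources (`crm_records`, `billing_records`), their fields and the helper functions are hypothetical. It integrates the sources, drops low-quality records that fail a simple requirement (a numeric total), and chains the steps into one function so the flow can be rerun automatically:

```python
from typing import Iterable, Iterator

# Two hypothetical sources describing the same customers differently
crm_records = [
    {"id": 1, "name": "Ada", "country": "UK"},
    {"id": 2, "name": "Linus", "country": "FI"},
]
billing_records = [
    {"customer_id": 1, "total": "120.50"},
    {"customer_id": 2, "total": "not available"},
]

def integrate(crm: Iterable[dict], billing: Iterable[dict]) -> Iterator[dict]:
    """Data integration: join the two sources on the customer identifier."""
    totals = {r["customer_id"]: r["total"] for r in billing}
    for record in crm:
        yield {**record, "total": totals.get(record["id"])}

def check_quality(records: Iterable[dict]) -> Iterator[dict]:
    """Data quality: keep only records whose total parses as a number."""
    for record in records:
        try:
            yield {**record, "total": float(record["total"])}
        except (TypeError, ValueError):
            continue  # low-quality record: does not meet our requirements

def run_pipeline() -> list:
    """Automation: chain the steps so the whole flow can be rerun as one call."""
    return list(check_quality(integrate(crm_records, billing_records)))

clean = run_pipeline()
print(clean)  # [{'id': 1, 'name': 'Ada', 'country': 'UK', 'total': 120.5}]
```

Using generators keeps each step small and lets records stream through the pipeline one at a time rather than being copied into intermediate lists.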

Types

Data can be broadly classified into four types, forming two opposing pairs: structured vs. unstructured and dynamic vs. static

Structured Data

  • Has a predefined model, which organises data into a form that is relatively easy to store, process, retrieve and manage

  • e.g., relational data
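
A small sketch of what a "predefined model" buys you, assuming a hypothetical `Employee` record with three typed fields: because every row follows the same model, parsing and querying are straightforward.

```python
import csv
import io
from dataclasses import dataclass

# A predefined (hypothetical) model: every record has the same typed fields
@dataclass
class Employee:
    emp_id: int
    name: str
    salary: float

# Tabular text that follows the model: a header row, then one record per line
raw = "emp_id,name,salary\n1,Grace,5200\n2,Alan,4800\n"

employees = [
    Employee(int(row["emp_id"]), row["name"], float(row["salary"]))
    for row in csv.DictReader(io.StringIO(raw))
]

# Because the structure is known in advance, querying is simple
high_earners = [e.name for e in employees if e.salary > 5000]
print(high_earners)  # ['Grace']
```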

Unstructured Data

  • Opposite of structured data: has no predefined model, so it is harder to store, process and query

  • e.g., flat binary files containing text, video or audio

  • Note: data may not be completely devoid of structure (e.g., an audio file may still have an encoding structure and some metadata associated with it)
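
The note about audio can be checked with Python's standard `wave` module. This sketch builds one second of silent audio in memory (the parameters are arbitrary choices) and then reads the header back: the sample bytes themselves carry no model, but the file still records its channel count, sample rate and length.

```python
import io
import wave

# Write one second of silent mono audio into an in-memory WAV file
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)      # mono
    out.setsampwidth(2)      # 16-bit samples
    out.setframerate(8000)   # 8 kHz sample rate (arbitrary choice)
    out.writeframes(b"\x00\x00" * 8000)

# The samples are opaque bytes, but the header still carries structure
buffer.seek(0)
with wave.open(buffer, "rb") as audio:
    channels = audio.getnchannels()
    rate = audio.getframerate()
    frames = audio.getnframes()

print(channels, rate, frames)  # 1 8000 8000
```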

Dynamic Data

  • Data that changes relatively frequently

  • e.g., office documents and transactional entries in a financial database

Static Data

  • Opposite of dynamic data: rarely changes once created

  • e.g., medical imaging data from MRI or CT scans

Why Classify?

Can help in designing and developing an appropriate storage solution:

  • Relational databases are usually used for structured data

  • File systems or NoSQL databases can be used for (static), unstructured data
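
The two rules of thumb above can be written as a tiny decision function. `suggest_storage` is a hypothetical helper, not a standard API, and it encodes only what these notes state; it returns `None` for dynamic, unstructured data because the notes give no specific guidance for that case.

```python
from typing import Optional

def suggest_storage(structured: bool, static: bool) -> Optional[str]:
    """Hypothetical helper encoding the rules of thumb in these notes.

    Returns None when the notes give no specific guidance.
    """
    if structured:
        return "relational database"
    if static:
        return "file system or NoSQL database"
    return None  # dynamic, unstructured data: not covered by these rules

examples = {
    "customer orders table": suggest_storage(structured=True, static=False),
    "archive of MRI scans": suggest_storage(structured=False, static=True),
}
for name, choice in examples.items():
    print(f"{name}: {choice}")
```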