Data Mapping

  • process of taking the fields in one set of data and assigning (mapping) each one to its destination in another

  • acts as a translator that bridges the gap between how the source and the destination structure their data
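A minimal sketch of the idea, assuming a made-up source record and destination schema (every field name here is hypothetical): the mapping is just a lookup from source field to destination field.

```python
# Hypothetical source record; the field names are invented for illustration.
source_record = {"fname": "Ada", "lname": "Lovelace", "mail": "ada@example.com"}

# The mapping itself: source field -> destination field.
FIELD_MAP = {
    "fname": "first_name",
    "lname": "last_name",
    "mail": "email",
}


def map_record(record, field_map):
    """Return a new record whose keys follow the destination schema."""
    return {target: record[source] for source, target in field_map.items()}


mapped = map_record(source_record, FIELD_MAP)
print(mapped)
```

Keeping the map as data rather than as code makes it easy to review and to change when either side's schema changes.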

Use Cases of Data Mapping

Data Integration

  • bring all your data to a centralized location and normalize the different data sets into a single stream (take both data sets, remove duplicate info, and format the data consistently)
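The three steps (take both data sets, remove duplicates, format) might look like the sketch below. The two sources, their field names, and the choice of email as the duplicate key are all assumptions made for illustration.

```python
# Two hypothetical sources that describe customers with different field names.
crm_rows = [
    {"Email": "Ada@Example.com ", "Name": "Ada Lovelace"},
    {"Email": "alan@example.com", "Name": "Alan Turing"},
]
shop_rows = [
    {"email_address": "ada@example.com", "full_name": "Ada Lovelace"},
    {"email_address": "grace@example.com", "full_name": "Grace Hopper"},
]


def normalize(row, email_key, name_key):
    """Map a source row onto the shared schema and tidy its formatting."""
    return {
        "email": row[email_key].strip().lower(),
        "name": row[name_key].strip(),
    }


def integrate(*sources):
    """Merge normalized rows into one stream, keeping the first row per email."""
    seen = {}
    for rows, email_key, name_key in sources:
        for row in rows:
            record = normalize(row, email_key, name_key)
            seen.setdefault(record["email"], record)
    return list(seen.values())


combined = integrate(
    (crm_rows, "Email", "Name"),
    (shop_rows, "email_address", "full_name"),
)
print(combined)
```

Formatting happens before de-duplication here, so "Ada@Example.com " and "ada@example.com" are recognized as the same customer.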

Data Migration

  • move data from one location to a similar but structurally different location
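As a rough illustration, the sketch below migrates rows between two in-memory SQLite tables whose structure differs: the old one stores a single full name, the new one splits it. The table and column names are hypothetical, and the name-splitting rule is deliberately naive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old location: one column holds the whole name.
conn.execute("CREATE TABLE old_users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany(
    "INSERT INTO old_users (id, full_name) VALUES (?, ?)",
    [(1, "Ada Lovelace"), (2, "Alan Turing")],
)

# New location: similar data, different structure.
conn.execute(
    "CREATE TABLE new_users "
    "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)

# Read everything from the old table, reshape each row, write to the new one.
rows = conn.execute("SELECT id, full_name FROM old_users ORDER BY id").fetchall()
for user_id, full_name in rows:
    # Naive rule: everything before the first space is the first name.
    first_name, _, last_name = full_name.partition(" ")
    conn.execute(
        "INSERT INTO new_users (id, first_name, last_name) VALUES (?, ?, ?)",
        (user_id, first_name, last_name),
    )
conn.commit()

migrated = conn.execute(
    "SELECT id, first_name, last_name FROM new_users ORDER BY id"
).fetchall()
print(migrated)
```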

Data Transformation

  • translate data from one format to another
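One possible example of a format-to-format translation is CSV to JSON. The sample data and the column names (id, name, active) are assumptions; only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical CSV input.
csv_text = "id,name,active\n1,Ada,true\n2,Alan,false\n"


def csv_to_json(text):
    """Translate CSV text into a JSON array of objects with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    records = [
        {
            "id": int(row["id"]),              # CSV values are always strings
            "name": row["name"],
            "active": row["active"] == "true",  # turn the text flag into a boolean
        }
        for row in reader
    ]
    return json.dumps(records, indent=2)


json_text = csv_to_json(csv_text)
print(json_text)
```

The type conversions matter: CSV carries no type information, so the mapping has to say which columns become numbers or booleans in the target format.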

Techniques in Data Mapping

Automated

  • requires specialized software that will take new data and match it to your existing structure/schema
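Real mapping tools are far more sophisticated, but the core idea of matching new fields to an existing schema can be sketched with simple name similarity. This is a toy stand-in, not how any particular product works; the schema, the incoming field names, and the 0.6 cutoff are assumptions.

```python
import difflib

# Hypothetical existing schema.
EXISTING_SCHEMA = ["first_name", "last_name", "email", "phone_number"]


def suggest_mapping(incoming_fields, schema, cutoff=0.6):
    """Suggest the closest schema field for each incoming field, or None."""
    mapping = {}
    for field in incoming_fields:
        # Normalize spelling a little before comparing names.
        candidate = field.strip().lower().replace(" ", "_")
        matches = difflib.get_close_matches(candidate, schema, n=1, cutoff=cutoff)
        mapping[field] = matches[0] if matches else None
    return mapping


suggested = suggest_mapping(
    ["First Name", "lastname", "e-mail", "favorite_color"],
    EXISTING_SCHEMA,
)
print(suggested)
```

Fields that come back as None have no close match and would need a human decision, which is where the semi-automated approach picks up.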

Semi-automated

  • also known as “schema mapping”. Software creates the connections between the different sources and targets. Once the mapping has been generated, the team manually checks it and makes any necessary changes.

Manual

  • requires a developer who can code the rules that transfer or inject data from each source field into its target field.

Metadata

  • Information that describes and explains data.

  • Provides context with details such as source, type, owner, and relationships to other data sets.

Metadata Types

  • Technical: structural details (row and column counts, data types, etc.)

  • Governance: governance terms, ownership info, etc.

  • Operational: flow of data (dependencies, code, runtime)

  • Collaboration: data-related comments, discussions, and issues

  • Quality: quality metrics and measures (dataset status, test runs, etc.)

  • Usage: how much a dataset is used (view count, popularity, top users, etc.)
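To make the six types concrete, here is one way a metadata record for a single dataset could be modelled. The class, its attribute names, and the sample values are all hypothetical; real data catalogs define their own schemas.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    # Technical
    row_count: int
    column_types: dict
    # Governance
    owner: str
    # Operational
    upstream_dependencies: list = field(default_factory=list)
    # Collaboration
    comments: list = field(default_factory=list)
    # Quality
    tests_passed: int = 0
    tests_run: int = 0
    # Usage
    view_count: int = 0

    def pass_rate(self):
        """Share of quality tests that passed, or None if none were run."""
        return self.tests_passed / self.tests_run if self.tests_run else None


orders_meta = DatasetMetadata(
    row_count=1200,
    column_types={"order_id": "int", "total": "decimal"},
    owner="sales-data-team",
    upstream_dependencies=["raw_orders"],
    tests_passed=9,
    tests_run=10,
    view_count=42,
)
print(orders_meta.pass_rate())
```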

XML

  • Extensible Markup Language (XML)

  • Defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

  • Designed to store and transport data.

  • Self-descriptive.

  • Design goals focus on simplicity, generality, and usability across the Internet.

Syntax Rules

  • XML Prolog – optional, but if present it must be the first thing in the document

  • Root – every document has exactly one root element, the parent of all other elements

  • Case sensitive

  • Proper Nesting

  • Escape reserved characters with the predefined entity references – &lt; &gt; &amp; &apos; &quot; (for < > & ' ")
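These rules can be checked with Python's standard-library XML parser. The sketch below builds a small document with a prolog and a single root, escapes the reserved characters, and then shows that bad nesting and mismatched case are rejected. The element names and the sample title are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing all five reserved characters.
raw_title = "Tom & Jerry's <\"uncut\"> cut"

# escape() handles &, < and > by default; the quotes are added explicitly.
safe_title = escape(raw_title, {'"': "&quot;", "'": "&apos;"})

document = (
    '<?xml version="1.0"?>'          # prolog: optional, but must come first
    "<library>"                      # single root element
    f"<book><title>{safe_title}</title></book>"
    "</library>"
)

root = ET.fromstring(document)
title_text = root.find("book/title").text  # parser turns entities back into characters


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(root.tag, title_text)
print(is_well_formed("<a><b></a></b>"))   # improper nesting
print(is_well_formed("<Note></note>"))    # tag names are case sensitive
```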

XSLT

  • Extensible Stylesheet Language Transformations (XSLT)

  • Allows a stylesheet author to transform a primary XML document in two significant ways:

    • Manipulating and sorting the content, including wholesale reordering

    • Transforming the content into a different format.

DTD

  • Document Type Definition

  • Used to define document structure with a list of legal elements and attributes

JSON

  • JavaScript Object Notation

  • Format for structuring data

  • Supports data structures such as arrays and objects; JSON documents are parsed (not executed) quickly on the server

  • Language-independent format that is derived from JavaScript
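A short round trip using Python's built-in json module shows objects, arrays, and the language-independent part: Python's True and None become JSON's true and null in the text form. The order data is invented for illustration.

```python
import json

# A JSON object containing a nested object, an array, a boolean and a null.
order = {
    "order_id": 17,
    "customer": {"name": "Ada", "vip": True},
    "items": ["keyboard", "mouse"],
    "coupon": None,
}

text = json.dumps(order)      # Python structure -> JSON text
restored = json.loads(text)   # JSON text -> Python structure

print(text)
print(restored == order)
```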

Features of JSON

  • Easy to Understand: easy to read and write

  • Format: text-based interchange format; an array can hold any JSON value (strings, numbers, booleans, null, objects, other arrays)

  • Support: lightweight and supported by almost every language and OS

  • Dependency: independent of any one language; generally faster to parse than more verbose text-based formats such as XML
