Data Mapping
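The process of taking the fields in one set of data (the source) and assigning, or mapping, each one to its destination (the target).
Acts as a translator that bridges the gap between how the source and the target structure their data.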
Use Cases of Data Mapping
Data Integration
Bring all your data to a centralized location and normalize two different data sets into a single stream (take both data sets, remove duplicate info, and apply a consistent format).
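The three steps in parentheses (take both sets, remove duplicates, format) can be sketched as follows. This is only an illustration: the field names (Email, FullName, email_address, customer) and the choice of email as the de-duplication key are assumptions, not part of these notes.

```python
def integrate(crm_rows, billing_rows):
    """Merge two record lists into one normalized, de-duplicated list."""
    merged = {}

    def add(email, name):
        # Format: trim whitespace and lower-case the key so duplicates match.
        key = email.strip().lower()
        # De-duplicate: keep the first record seen for each email.
        merged.setdefault(key, {"email": key, "name": name.strip()})

    for row in crm_rows:  # hypothetical source 1
        add(row["Email"], row["FullName"])
    for row in billing_rows:  # hypothetical source 2
        add(row["email_address"], row["customer"])
    return list(merged.values())


crm = [{"Email": "Ana@Example.com ", "FullName": "Ana Ruiz"}]
billing = [
    {"email_address": "ana@example.com", "customer": "Ana Ruiz"},
    {"email_address": "bo@example.com", "customer": "Bo Chen"},
]
unified = integrate(crm, billing)
```

Here unified holds two records, because the two entries for Ana collapse into one once the email is normalized.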
Data Migration
move data from one location to a similar but structurally different location
Data Transformation
translate data from one format to another
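As one possible illustration of translating between formats, the sketch below converts CSV text to JSON using only the Python standard library. The sample columns (id, name) are made up for the example.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Translate CSV text into a JSON array of objects, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a dict keyed by the header line.
    return json.dumps([dict(row) for row in reader], indent=2)


sample_csv = "id,name\n1,Ana\n2,Bo\n"
json_text = csv_to_json(sample_csv)
```

Note that CSV carries no type information, so every value arrives in the JSON output as a string unless a conversion step is added.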
Techniques in Data Mapping
Automated
requires specialized software that will take new data and match it to your existing structure/schema
Semi-automated data mapping
Also known as “schema mapping”. Works with software that creates the connections between different sources and targets. Once the process has been mapped, the team manually checks the result and makes any necessary changes.
Manual
requires a developer who can code rules to transfer or inject data from one source field to another.
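A hand-coded mapping often amounts to a table of rules, one per source field. The sketch below is a minimal example of that idea; the field names and conversion rules in FIELD_MAP are hypothetical.

```python
# Each rule: source field -> (target field, conversion function).
# All field names here are illustrative assumptions.
FIELD_MAP = {
    "cust_name": ("full_name", str.strip),
    "dob": ("birth_date", lambda value: value.replace("/", "-")),
    "zip": ("postal_code", str),
}


def apply_mapping(record, field_map=FIELD_MAP):
    """Copy each mapped source field into its target field, converting it."""
    target = {}
    for source_field, (target_field, convert) in field_map.items():
        if source_field in record:  # skip fields the source did not supply
            target[target_field] = convert(record[source_field])
    return target


mapped = apply_mapping(
    {"cust_name": " Ana Ruiz ", "dob": "1990/04/12", "zip": 94107}
)
```

Keeping the rules in a single table means a new field can be mapped by adding one entry rather than editing the transfer logic.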
Metadata
Information that describes and explains data.
Provides context with details such as source, type, owner, and relationships to other data sets.
Metadata Types
Technical: schema and structure details (row and column counts, data types, etc.)
Governance: business terms, classification, ownership info, etc.
Operational: flow of data (dependencies, code, runtime)
Collaboration: data-related comments, discussions, and issues
Quality: quality metrics and measures (dataset status, test runs, etc.)
Usage: how much the dataset is used (view count, popularity, top users, etc.)
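To make the six types concrete, here is a sketch of what a metadata record for a single dataset might look like, grouped by type. Every key and value is invented for illustration; real catalog tools define their own schemas.

```python
# Hypothetical metadata record for one dataset, grouped by metadata type.
dataset_metadata = {
    "technical": {
        "row_count": 1200,
        "column_count": 8,
        "column_types": {"id": "int", "email": "string"},
    },
    "governance": {"owner": "data-platform-team", "classification": "internal"},
    "operational": {"upstream": ["crm_export"], "last_run": "2024-01-01T00:00:00Z"},
    "collaboration": {"open_issues": 1},
    "quality": {"status": "passing", "tests_run": 12},
    "usage": {"view_count": 340, "top_users": ["ana", "bo"]},
}


def summarize(metadata):
    """Build a one-line description from a few of the metadata fields."""
    tech = metadata["technical"]
    return (
        f"{tech['row_count']} rows x {tech['column_count']} columns, "
        f"owned by {metadata['governance']['owner']}, "
        f"quality: {metadata['quality']['status']}"
    )


summary = summarize(dataset_metadata)
```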
XML
Extensible Markup Language (XML)
Defines a set of rules for encoding documents in a format that is both human and machine-readable.
Designed to store and transport data.
Self-descriptive.
Design goals focus on simplicity, generality, and usability across the Internet.
Syntax Rules
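XML Prolog – optional, but if present it must come first in the document
Root – exactly one root element, the parent of all other elements
Case sensitive – opening and closing tag names must match exactly
Proper nesting – elements must be closed in the reverse order they were opened
Escape reserved characters with the predefined entity references – &lt; (<), &gt; (>), &amp; (&), &apos; ('), &quot; (")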
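The syntax rules above can be checked with a parser. The sketch below uses Python's standard xml.etree.ElementTree to test well-formedness and xml.sax.saxutils.escape to replace reserved characters; the sample documents are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = '<?xml version="1.0"?><order><item>Tea</item></order>'
bad_nesting = "<order><item>Tea</order></item>"  # closed in the wrong order
bad_case = "<Order><item>Tea</item></order>"  # Order and order differ
raw_ampersand = "<note>Tea & milk</note>"  # & must be written as &amp;

# escape() handles &, < and > by default; quotes are added via the dict.
safe_text = escape('Tea & "milk" <hot>', {'"': "&quot;", "'": "&apos;"})
escaped_doc = f"<note>{safe_text}</note>"

results = {
    "good": is_well_formed(good),
    "bad_nesting": is_well_formed(bad_nesting),
    "bad_case": is_well_formed(bad_case),
    "raw_ampersand": is_well_formed(raw_ampersand),
    "escaped_doc": is_well_formed(escaped_doc),
}
```

Only good and escaped_doc parse; the other three each break one of the rules listed above.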
XSLT
Extensible Stylesheet Language Transformations (XSLT)
Allows a stylesheet author to transform a primary XML document in two significant ways:
Manipulating and sorting the content, including wholesale reordering
Transforming the content into a different format.
DTD
Document Type Definition
Used to define document structure with a list of legal elements and attributes
JSON
JavaScript Object Notation
Format for structuring data
Supports data structures such as objects and arrays; JSON documents are parsed (not executed) quickly on the server
Language-independent format that is derived from JavaScript
Features of JSON
Easy to Understand: easy to read and write
Format: text-based interchange format; values can be strings, numbers, booleans, null, objects, or arrays
Support: lightweight and supported by almost every language and OS
Speed: generally faster to parse than other text-based structured data formats
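A short round trip shows the object and array structures and the language independence described above: a Python dict is serialized to JSON text and parsed back with the standard json module. The order record is an invented example.

```python
import json

# A Python dict with a nested list; both map directly onto JSON types.
order = {
    "id": 42,
    "customer": "Ana",
    "items": ["tea", "milk"],  # becomes a JSON array
    "paid": True,  # becomes JSON true
    "coupon": None,  # becomes JSON null
}

text = json.dumps(order)  # serialize to a JSON string
restored = json.loads(text)  # parse the string back into Python objects
```

The intermediate text is plain JSON that any other language's parser could read, and restored compares equal to the original order.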