Cleaning Data

Frameworks/Thinking

C.L.E.A.N

  • Conceptualize: Understand the problem.

    • Understand what the data is for

    • This helps you prioritize what to fix and what to optimize for

    • Identify the grain, measures, and dimensions

      • what each row represents (the grain)

    • Identify critical vs. non-critical columns

    • Understand the definitions behind each column

  • Locate Solvable Issues: Find fixable problems.

  • Evaluate Unsolvable Issues: Identify issues you can’t fix.

  • Augment & Improve: Clean and enhance the data.

  • Note & Document: Record what you did and learned.
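
As a rough illustration of the Conceptualize step, the grain can be written down and then checked against the data. This is a minimal sketch using only the Python standard library. The sample rows, the column names, and the helper functions grain_violations and missing_critical are all hypothetical, made up for this example. In practice the grain, measures, dimensions, and critical columns come from understanding what the data is for.

```python
from collections import Counter

# Hypothetical sample data: one row is assumed to represent one order line.
rows = [
    {"order_id": 1, "product": "A", "region": "EU", "quantity": 2, "note": "gift"},
    {"order_id": 1, "product": "B", "region": "EU", "quantity": 1, "note": None},
    {"order_id": 2, "product": "A", "region": None, "quantity": 3, "note": None},
    {"order_id": 2, "product": "A", "region": None, "quantity": 3, "note": None},
]

# Assumptions written down explicitly so they can be tested and documented.
GRAIN = ("order_id", "product")               # columns that should identify one row
DIMENSIONS = ("region",)                      # attributes used to slice the data
MEASURES = ("quantity",)                      # numeric values to aggregate
CRITICAL = ("order_id", "product", "region")  # "note" is treated as non-critical


def grain_violations(rows, grain):
    """Return each grain key that appears on more than one row, with its count."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def missing_critical(rows, critical):
    """Count missing (None) values per critical column."""
    return {col: sum(row.get(col) is None for row in rows) for col in critical}


print(grain_violations(rows, GRAIN))
print(missing_critical(rows, CRITICAL))
```

A repeated grain key may be a duplicate that can be dropped (a solvable issue), or a sign that the assumed grain is wrong. Missing values in a critical column matter more than missing values in a non-critical one such as "note" here.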