Cleaning Data
Frameworks/Thinking
C.L.E.A.N
Conceptualize: Understand the problem.
Understand what the data is for.
This helps you prioritize what to fix and what to optimize the data for.
Identify the grain, measures, and dimensions
The grain is what each row represents.
Identify critical vs. non-critical columns
Understand how each column is defined
Locate Solvable Issues: Find fixable problems.
Evaluate Unsolvable Issues: Identify issues you can’t fix.
Augment & Improve: Clean and enhance the data.
Note & Document: Record what you did and learned.
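
As a rough illustration of the Conceptualize step, the grain check and the measure/dimension split can be sketched in plain Python. Everything here is an assumption made for the example: the sample rows, the column names (order_id, line_no, region, amount), and the helper functions duplicate_keys and split_columns are hypothetical and not part of the framework itself. The numeric-means-measure rule is only a first guess that still needs a human check against the column definitions.

```python
from collections import Counter

# Hypothetical sample rows; column names are illustrative only.
rows = [
    {"order_id": 1, "line_no": 1, "region": "East", "amount": 20.0},
    {"order_id": 1, "line_no": 2, "region": "East", "amount": 5.5},
    {"order_id": 2, "line_no": 1, "region": "West", "amount": 12.0},
]


def duplicate_keys(rows, key_columns):
    """Return key tuples that appear more than once for a candidate grain."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def split_columns(rows, key_columns):
    """First guess: numeric non-key columns are measures, the rest dimensions."""
    measures, dimensions = [], []
    for col in rows[0]:
        if col in key_columns:
            continue
        values = [row[col] for row in rows if row[col] is not None]
        is_numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        (measures if is_numeric else dimensions).append(col)
    return measures, dimensions


# order_id alone is not the grain: order 1 appears on two rows.
print(duplicate_keys(rows, ["order_id"]))
# One row per order line: no duplicates, so this candidate grain holds.
print(duplicate_keys(rows, ["order_id", "line_no"]))
# Remaining columns, split into (measures, dimensions).
print(split_columns(rows, ["order_id", "line_no"]))
```

If the candidate grain returns duplicates, either the grain is misunderstood or the data contains true duplicate rows. Telling those two cases apart is part of understanding the problem before fixing anything.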