Designing Accurate Data Entry Procedures Notes
Code as a Tool for High-Quality Data Entry
- Data quality axiom: “High-quality input⇒High-quality output.”
- Four analyst objectives for better input:
- Create meaningful codes.
- Design efficient data-capture approaches.
- Assure complete & effective capture.
- Assure data quality via validation.
- Quality of data = degree to which data remain consistently correct within preset limits.
Purposes Humans Have for Coding
- Keeping track (identification only).
- Classifying information.
- Concealing information.
- Revealing information.
- Requesting (triggering) appropriate action.
Major Code Families & Illustrative Examples
- Simple sequence code
- Arbitrary or sequential numbers w/ no intrinsic meaning.
- Ex: Order #5676 vs. full verbal description of furniture order.
- Advantages: uniqueness, implicit timing.
- Alphabetic derivation code
- Digest meaningful substrings (ZIP, consonants, street digits) into account #.
- Ex: 68506KND7533TVG for magazine subscriber.
- Pitfalls: short names (ROE→RXX), data change revokes key.
- Classification code
- Single symbol representing a mutually-exclusive class.
- Ex: Tax categories I,M,T,C,D,S.
- Danger: duplicate letters → forced, unnatural symbols (computer=P, insurance=N, etc.).
- Block sequence code
- Sequential numbers reserved in “blocks.”
- Ex: Software types — browser 100–199, DB 200–299.
- Hybrid benefits: quick next-number, intrinsic grouping.
- Cipher code
- Direct substitution (letters↔digits) to hide info.
- Store markdown price of 25.00\(retail) → cipher “BIMC” → 18.75\.
- Security through obscurity; easy for insiders.
- Significant-digit subset code
- Concatenate mini-codes with specificity.
- Example garment ticket 2023954010 → dept 202 (maternity) | style 395 | color 40 (red) | size 10.
- Mnemonic code
- Human memory aid; mix letters/numbers meaningfully.
- Blood-center city list: BUF, ROS, KEN … minimize shipping errors.
- Unicode & glyph codes
- ISO character set 0–65,535 accommodates all writing systems.
- Web notation: “こ” → Japanese “こ”.
- Empowers multilingual input & display.
- Function code
- Triggers system action.
- Inventory example: 3 ⇒ “Spoiled”, 8 ⇒ “Journal Add”.
Eight General Guidelines When Designing Codes
- Concise.
- Stable (rarely change).
- Unique.
- Sortable.
- Non-confusing (avoid O/0, I/1, Z/2).
- Uniform format.
- Modifiable/expandable.
- Meaningful (unless secrecy needed).
Effective & Efficient Data Capture
Decide What to Capture
- Two data types:
- Variable/transaction-specific (e.g., quantity).
- Differentiating/static (e.g., SSN + 3-letter last-name key).
- Capture only once; let system store & reuse.
Let the Computer Do Repetitive Work
- Auto-insert date/time from system clock.
- Retrieve stored item descriptions; operator enters only key.
- Example OCLC: one catalog record per title shared by thousands of libraries.
- Provide adequate capacity at “neck.”
- Eliminate forms if online capture feasible; otherwise streamline form.
- Source documents logically arranged, machine-ready.
- Online forms: radio buttons w/ default, drop-down with “--Select--” sentinel.
Contemporary Data Entry Technologies
Keyboards
- QWERTY plus function keys, macro keys, ergonomic & wireless variants.
Bar Codes (1-D)
- UPC encodes manufacturer + product + check-digit.
- Mobile cams + apps (e.g., Delicious Library) now read them.
2-D / Matrix Codes (QR etc.)
- Denser, orientation-independent.
- QR recognisable via three nested corner squares.
- Free to create; used for URLs, coupons, ticketing, study-room scheduling, etc.
- Security reminder: malicious stickers possible → use secure scanners (e.g., Norton Snap).
RFID (Radio Frequency Identification)
- Tag = antenna + chip; passive (<0.05) or active (>1).
- Uses: Walmart inventory, toll passes, cattle tracing, supply-chain blockchain logging.
- Privacy & ethics: potential person-tracking.
NFC (Near Field Communication)
- Two-way RFID within ~10 cm.
- Mobile wallet (Apple Pay, Google Pay), transit cards, coupon delivery.
- Security via short range + cryptographic layers.
OCR (Optical Character Recognition)
- Scans print/handwriting → text; boosts speed 60!%–90% over keystrokes.
MICR (Magnetic Ink Character Recognition)
- Bank checks; magnetic ink line.
- Resistant to stray marks; doubles as security feature.
- Bubble sheets (surveys, exams).
- Pros: low training, high volume.
- Cons: stray marks; limited alpha capture; alignment errors.
Ensuring Data Quality through Validation
- Wrong data for system.
- Unauthorized submitter.
- Request to perform unacceptable function.
- Missing data test.
- Field length test.
- Class / composition (numeric vs. alpha).
- Range / reasonableness (day 1!≤!d≤!31; age < 120).
- Invalid value list (active/inactive/closed).
- Cross-reference (price ≥ cost; area-code ↔ state).
- Compare w/ stored data (part # exists).
- Check-digit / self-validating codes.
Check-Digit Example (Luhn Algorithm)
- Original numeric code d<em>1d</em>2…dn.
- Double every second digit from right; if >9, add digits (e.g., 14→1+4=5).
- Sum all digits S.
- If Smod10=0 ⇒ valid.
- Visa 16-digit, AmEx 15-digit numbers embed Luhn check digit.
Validation Order & Techniques
- Syntax first (missing, length, composition).
- Semantics next (range, value, cross-ref, check digit).
- Regular expressions for pattern tests.
- JS snippet:
/^[A-Za-z0-9]\w{2,}@[A-Za-z0-9]{3,}\.[A-Za-z]{3}$/ for email.
- XML validation via DTD or powerful schema constraints.
Data Accuracy Advantages in Ecommerce
- Self-service entry: customer knows own data best.
- Stored info (cookies, autocomplete) ⇒ faster, fewer typos.
- Reuse through entire fulfillment chain (invoice → pick list → shipping → re-order).
- Immediate electronic feedback (confirmations, status) lets customers fix errors quickly.
Ethical, Practical & Forward-Looking Implications
- Privacy vs. utility: RFID & cookies collect rich data; analysts must balance benefit w/ consent.
- Coding stability vital for longitudinal databases; redesigns disrupt analytics (consulting story “Summer Code”).
- Transparent mnemonic or revealing codes enhance job satisfaction, but may expose sensitive info.
- UX design in ecommerce must weigh frictionless input against security (password, card verification value).
- Blockchain integration with RFID offers tamper-evident supply chain but raises scalability & data-governance issues.
Quick Reference: Common Validation Problem Sources
- Transposition (digits swapped).
- Data truncation (field too short).
- Mis-classification (code ambiguous).
- Overlapping class ranges.
- Look-alike symbols (O/0, S/5, Z/2).
- Check-digit remainder: Check=(10−(Smod10))mod10.
- Unicode entity: \text{{&#xNNNN;} where NNNN is hex.
- Range test example: 0≤quantity≤9999.
Study Connections
- Links to Chapter 8: data dictionary entries store code definitions, aid uniqueness & validation tables.
- Links to Chapter 12: form design principles overlap with data-capture preliminaries.
- Quality assurance theme (Part V) culminates here; accurate entry is first gate of QA lifecycle.