Designing Accurate Data Entry Procedures Notes

Code as a Tool for High-Quality Data Entry

  • Data quality axiom: “High-quality input    High-quality output\text{High-quality input} \; \Rightarrow \; \text{High-quality output}.”
  • Four analyst objectives for better input:
    • Create meaningful codes.
    • Design efficient data-capture approaches.
    • Assure complete & effective capture.
    • Assure data quality via validation.
  • Quality of data = degree to which data remain consistently correct within preset limits.

Purposes Humans Have for Coding

  • Keeping track (identification only).
  • Classifying information.
  • Concealing information.
  • Revealing information.
  • Requesting (triggering) appropriate action.

Major Code Families & Illustrative Examples

  • Simple sequence code
    • Arbitrary or sequential numbers w/ no intrinsic meaning.
    • Ex: Order #56765676 vs. full verbal description of furniture order.
    • Advantages: uniqueness, implicit timing.
  • Alphabetic derivation code
    • Digest meaningful substrings (ZIP, consonants, street digits) into account #.
    • Ex: 68506KND7533TVG68506KND7533TVG for magazine subscriber.
    • Pitfalls: short names (ROE→RXX), data change revokes key.
  • Classification code
    • Single symbol representing a mutually-exclusive class.
    • Ex: Tax categories I,M,T,C,D,S{I,M,T,C,D,S}.
    • Danger: duplicate letters → forced, unnatural symbols (computer=P, insurance=N, etc.).
  • Block sequence code
    • Sequential numbers reserved in “blocks.”
    • Ex: Software types — browser 100199100\text{–}199, DB 200299200\text{–}299.
    • Hybrid benefits: quick next-number, intrinsic grouping.
  • Cipher code
    • Direct substitution (letters↔digits) to hide info.
    • Store markdown price of 25.00\(retail) → cipher “BIMC” → 18.75\.
    • Security through obscurity; easy for insiders.
  • Significant-digit subset code
    • Concatenate mini-codes with specificity.
    • Example garment ticket 2023954010202\,395\,40\,10 → dept 202 (maternity) | style 395 | color 40 (red) | size 10.
  • Mnemonic code
    • Human memory aid; mix letters/numbers meaningfully.
    • Blood-center city list: BUF, ROS, KEN … minimize shipping errors.
  • Unicode & glyph codes
    • ISO character set 065,5350\text{–}65{,}535 accommodates all writing systems.
    • Web notation: “こ” → Japanese “こ”.
    • Empowers multilingual input & display.
  • Function code
    • Triggers system action.
    • Inventory example: 33 ⇒ “Spoiled”, 88 ⇒ “Journal Add”.

Eight General Guidelines When Designing Codes

  • Concise.
  • Stable (rarely change).
  • Unique.
  • Sortable.
  • Non-confusing (avoid O/0, I/1, Z/2).
  • Uniform format.
  • Modifiable/expandable.
  • Meaningful (unless secrecy needed).

Effective & Efficient Data Capture

Decide What to Capture

  • Two data types:
    • Variable/transaction-specific (e.g., quantity).
    • Differentiating/static (e.g., SSN + 3-letter last-name key).
  • Capture only once; let system store & reuse.

Let the Computer Do Repetitive Work

  • Auto-insert date/time from system clock.
  • Retrieve stored item descriptions; operator enters only key.
  • Example OCLC: one catalog record per title shared by thousands of libraries.

Avoid Bottlenecks & Extra Steps

  • Provide adequate capacity at “neck.”
  • Eliminate forms if online capture feasible; otherwise streamline form.

Start with a Good Form / GUI

  • Source documents logically arranged, machine-ready.
  • Online forms: radio buttons w/ default, drop-down with “--Select--” sentinel.

Contemporary Data Entry Technologies

Keyboards

  • QWERTY plus function keys, macro keys, ergonomic & wireless variants.

Bar Codes (1-D)

  • UPC encodes manufacturer + product + check-digit.
  • Mobile cams + apps (e.g., Delicious Library) now read them.

2-D / Matrix Codes (QR etc.)

  • Denser, orientation-independent.
  • QR recognisable via three nested corner squares.
  • Free to create; used for URLs, coupons, ticketing, study-room scheduling, etc.
  • Security reminder: malicious stickers possible → use secure scanners (e.g., Norton Snap).

RFID (Radio Frequency Identification)

  • Tag = antenna + chip; passive (<0.050.05) or active (>11).
  • Uses: Walmart inventory, toll passes, cattle tracing, supply-chain blockchain logging.
  • Privacy & ethics: potential person-tracking.

NFC (Near Field Communication)

  • Two-way RFID within ~10 cm.
  • Mobile wallet (Apple Pay, Google Pay), transit cards, coupon delivery.
  • Security via short range + cryptographic layers.

OCR (Optical Character Recognition)

  • Scans print/handwriting → text; boosts speed 60!%90%60!\%–90\% over keystrokes.

MICR (Magnetic Ink Character Recognition)

  • Bank checks; magnetic ink line.
  • Resistant to stray marks; doubles as security feature.

Mark-Sense / OMR Forms

  • Bubble sheets (surveys, exams).
  • Pros: low training, high volume.
  • Cons: stray marks; limited alpha capture; alignment errors.

Ensuring Data Quality through Validation

Validate Input Transactions (macro-level)

  • Wrong data for system.
  • Unauthorized submitter.
  • Request to perform unacceptable function.

Validate Input Data (field-level)

  • Missing data test.
  • Field length test.
  • Class / composition (numeric vs. alpha).
  • Range / reasonableness (day 1!!d!311!\leq!d\leq!31; age < 120120).
  • Invalid value list (active/inactive/closed).
  • Cross-reference (price ≥ cost; area-code ↔ state).
  • Compare w/ stored data (part # exists).
  • Check-digit / self-validating codes.
Check-Digit Example (Luhn Algorithm)
  1. Original numeric code d<em>1d</em>2dnd<em>1d</em>2\dots d_n.
  2. Double every second digit from right; if >9, add digits (e.g., 141+4=514→1+4=5).
  3. Sum all digits SS.
  4. If Smod10=0S \bmod 10 = 0 ⇒ valid.
    • Visa 1616-digit, AmEx 1515-digit numbers embed Luhn check digit.

Validation Order & Techniques

  1. Syntax first (missing, length, composition).
  2. Semantics next (range, value, cross-ref, check digit).
  3. Regular expressions for pattern tests.
    • JS snippet: /^[A-Za-z0-9]\w{2,}@[A-Za-z0-9]{3,}\.[A-Za-z]{3}$/ for email.
  4. XML validation via DTD or powerful schema constraints.

Data Accuracy Advantages in Ecommerce

  • Self-service entry: customer knows own data best.
  • Stored info (cookies, autocomplete) ⇒ faster, fewer typos.
  • Reuse through entire fulfillment chain (invoice → pick list → shipping → re-order).
  • Immediate electronic feedback (confirmations, status) lets customers fix errors quickly.

Ethical, Practical & Forward-Looking Implications

  • Privacy vs. utility: RFID & cookies collect rich data; analysts must balance benefit w/ consent.
  • Coding stability vital for longitudinal databases; redesigns disrupt analytics (consulting story “Summer Code”).
  • Transparent mnemonic or revealing codes enhance job satisfaction, but may expose sensitive info.
  • UX design in ecommerce must weigh frictionless input against security (password, card verification value).
  • Blockchain integration with RFID offers tamper-evident supply chain but raises scalability & data-governance issues.

Quick Reference: Common Validation Problem Sources

  • Transposition (digits swapped).
  • Data truncation (field too short).
  • Mis-classification (code ambiguous).
  • Overlapping class ranges.
  • Look-alike symbols (O/0, S/5, Z/2).

Formulas & Notation Cheat-Sheet

  • Check-digit remainder: Check=(10(Smod10))mod10\text{Check} = (10 - (S \bmod 10)) \bmod 10.
  • Unicode entity: \text{{&#xNNNN;} where NNNN\text{NNNN} is hex.
  • Range test example: 0quantity99990 \le \text{quantity} \le 9999.

Study Connections

  • Links to Chapter 8: data dictionary entries store code definitions, aid uniqueness & validation tables.
  • Links to Chapter 12: form design principles overlap with data-capture preliminaries.
  • Quality assurance theme (Part V) culminates here; accurate entry is first gate of QA lifecycle.