Data Collection & Data Presentation

Data Collection

  • Definition & Context:
    • Process of gathering raw, unprocessed facts needed to answer a research question or test a hypothesis.
    • In statistics for real-estate (REV250) it underpins decisions such as pricing, valuation, or market demand analysis.
    • Conducted after a sample has been identified.
    • Key criterion for choosing a method: maximum information at minimum cost, subject to time and respondent comfort.
  • Two macro-approaches covered in the lecture:
    • Interview (oral questioning recorded by the researcher)
    • Questionnaire (self-administered written/online form)
  • Ethical / Practical notes:
    • Must obtain informed consent, protect confidentiality, and avoid introducing interviewer bias.
    • In real-estate, data often involve personal finance → strict data-protection compliance (e.g. PDPA, GDPR).

Interview

  • Three delivery modes:
    • Face-to-Face (personal)
    • Telephone
    • Online (video-conferencing)
  • Generic workflow:
    1. Researcher poses pre-designed questions.
    2. Respondent answers in real time.
    3. Researcher records responses verbatim or via audio.
  • Strengths common to all interviews:
    • Ability to probe and clarify ambiguous answers.
    • Suitable for complex or sensitive topics where non-verbal cues matter.
  • Weaknesses common to all interviews:
    • Higher cost than questionnaires (travel, time, equipment).
    • Potential interviewer influence (tone, wording, body language).
    • Only smaller sample sizes are feasible.

Face-to-Face Interview

  • Mechanics:
    • Conducted in person; researcher may use an interview guide or semi-structured schedule.
  • Advantages:
    • Highest response rate (>80\% typical in small samples).
    • Clarification possible immediately.
    • Observation of non-verbal cues (posture, hesitation) → richer qualitative insight.
  • Disadvantages:
    • Requires travel & scheduling; cost per respondent is high.
    • Time intensive.
    • Respondent may feel nervous in formal setting; social-desirability bias.
    • Unsuitable if respondents geographically dispersed.
  • Significance for real-estate:
    • Useful for in-depth stakeholder interviews (developers, valuers, policy makers).

Telephone Interview

  • Mechanics:
    • Questions asked via voice call; responses written in template or recorded.
  • Advantages:
    • Cheaper than face-to-face; no travel.
    • Faster turnaround; ideal for time-sensitive surveys.
  • Disadvantages:
    • Response rate lower (often 40\%-60\%) because of call-screening and distrust of unknown numbers.
    • Practical limit on number/length of questions; fatigue sets in.
    • No visual cues; rapport building harder.
  • Example: Short survey of tenants’ satisfaction across multiple complexes.

Online Interview (Video / VOIP)

  • Mechanics:
    • Platforms: Zoom, Teams, Google Meet, Skype.
    • Combines some visual cues with remote convenience.
  • Advantages:
    • Cheapest among interview modes; no travel cost.
    • Ability to share screens, show plans / maps → beneficial in built-environment discussions.
  • Disadvantages:
    • Response rate still lower than face-to-face; technology barriers.
    • Question set must remain concise; long sessions risk dropout.
    • Digital divide may bias sample toward tech-savvy demographics.

Questionnaire

  • Forms: Direct, Mailed, Online.
  • General features:
    • Consists of a structured set of items (closed-ended, Likert-scale, open-ended).
    • Self-completion → reduces interviewer bias.
    • Can reach large, geographically dispersed samples.
  • Design best-practice reminders:
    • Clear, concise wording; avoid double-barrel questions.
    • Logical flow; demographics last.
    • Pilot test to ensure reliability (Cronbach’s \alpha \ge 0.7) and validity.
    • Include an introductory statement explaining purpose, confidentiality, approximate completion time.
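
The reliability check mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard formula \alpha = \dfrac{k}{k-1}\left(1 - \dfrac{\sum s_j^2}{s_T^2}\right), where k is the number of items, s_j^2 the variance of item j, and s_T^2 the variance of respondents' total scores. The pilot scores and the function name cronbach_alpha are hypothetical, not taken from the lecture.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical pilot data: 5 respondents x 4 Likert items (scored 1-5)
pilot = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]

alpha = cronbach_alpha(pilot)
print(f"alpha = {alpha:.3f}, meets 0.7 threshold: {alpha >= 0.7}")
```

A real pilot would normally use more respondents; five rows are used here only to keep the arithmetic easy to check by hand.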

Direct Questionnaire (Hand-delivered & collected)

  • Steps:
    1. Researcher meets respondent, explains study & instructions.
    2. Respondent fills form immediately while researcher waits.
  • Advantages:
    • Minimal interviewer influence on answers because the researcher does not read out the questions; clarification is given only at the start.
    • High response rate (often >85\%) because questionnaire is returned on the spot.
  • Disadvantages:
    • Travel & waiting time raise cost.
    • Logistically challenging for large samples.
    • Respondent may rush due to researcher presence.
  • Usage: Building-occupant satisfaction forms collected during site visits.

Mailed Questionnaire (Paper via postal service)

  • Mechanics:
    • Questionnaire packet contains: cover letter, instrument, stamped self-addressed envelope.
    • Follow-up reminder letters boost return.
  • Advantages:
    • Wide geographic coverage (national / international).
    • Respondent can reflect, consult documents → more considered answers.
  • Disadvantages:
    • Low raw response rate (typical 10\%-30\% without incentives).
    • Delay in receiving responses.
    • No real-time clarification; must keep questions simple & unambiguous.
  • Cost note: Printing & postage → moderate; still cheaper than in-person interviews.

Online Questionnaire

  • Platforms: Google Forms, SurveyMonkey, Qualtrics, Microsoft Forms.
  • Advantages:
    • Lowest cost; automated data capture in CSV/Excel.
    • Potentially global reach; mobile-friendly.
    • Skip logic / branching easily implemented.
  • Disadvantages:
    • Coverage bias: sample restricted to internet users, so results may not generalize to the wider population.
    • Response rate still modest (typical 15\%-35\%).
    • Risk of multiple submissions (mitigated via IP checks, tokens).
  • Real-estate application: quick sentiment poll of property investors across multiple cities.

Data Presentation

  • Purpose:
    • Convert raw numbers into digestible visuals enabling pattern recognition, comparison, and communication to stakeholders.
    • Facilitates exploratory data analysis (EDA) & reporting.
  • Common formats taught:
    • Table (ungrouped & grouped)
    • Graph / Bar Graph
    • Pie Chart
    • Histogram
    • Ogive (Cumulative Frequency Curve)
  • Selection guidelines:
    • Nature of data (categorical vs. numerical, discrete vs. continuous).
    • Audience preference & interpretability.
    • Maximize data-ink ratio (Tufte principle) – avoid clutter.

Tables

Ungrouped Data Table
  • Lists each observation or category with its frequency/percentage.
  • Suitable for small data sets or categorical variables with few categories.
  • Example: Sales of five property types with associated counts.
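
An ungrouped table of this kind can be tallied in a few lines. The sketch below assumes a hypothetical list of ten sales records across five property types; the type names and counts are invented for illustration and are not the slide's figures.

```python
from collections import Counter

# Hypothetical sales records: one property type per transaction
sales = ["Terrace", "Condo", "Terrace", "Bungalow", "Condo",
         "Terrace", "Semi-D", "Apartment", "Condo", "Terrace"]

counts = Counter(sales)
n = sum(counts.values())

# Each row: category, frequency, percentage of all observations
table = [(cat, f, 100 * f / n) for cat, f in counts.most_common()]

print(f"{'Type':<10}{'Freq':>6}{'Percent':>9}")
for cat, f, pct in table:
    print(f"{cat:<10}{f:>6}{pct:>8.1f}%")
```
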
Grouped Data Table
  • Continuous data binned into class intervals.
  • Components:
    • Class limits (e.g. lower limit: 50, upper limit: 59)
    • Class boundaries (e.g. 49.5 \text{ to } 59.5) to prevent gaps.
    • Class mark x_i = \dfrac{\text{lower}+\text{upper}}{2} (used in mean estimation).
    • Frequency (f_i) and maybe Percent (\%).
  • Significance:
    • Essential precursor for histogram & ogive construction.
    • Enables computation of grouped mean \bar X = \dfrac{\sum f_i x_i}{\sum f_i} and variance s^2 = \dfrac{\sum f_i (x_i - \bar X)^2}{\sum f_i - 1}.
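
As a worked illustration of the grouped-table components and formulas above, the sketch below computes class boundaries, class marks, the grouped mean, and the grouped sample variance. The first class reuses the 50-59 limits from the notes; the other classes and all frequencies are hypothetical.

```python
# Hypothetical grouped table: (lower limit, upper limit, frequency)
classes = [(50, 59, 4), (60, 69, 7), (70, 79, 6), (80, 89, 3)]

# Class boundaries close the gap between adjacent limits (e.g. 49.5 to 59.5)
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi, _ in classes]

# Class mark x_i = (lower + upper) / 2
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
n = sum(freqs)

# Grouped mean: sum(f_i * x_i) / sum(f_i)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Grouped sample variance: sum(f_i * (x_i - mean)^2) / (sum(f_i) - 1)
var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)

print(f"n = {n}, mean = {mean:.2f}, variance = {var:.2f}")
```

Because every observation in a class is treated as if it sat at the class mark, both results are estimates of the values that the raw data would give.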

Bar Graph

  • Visualizes categorical data frequencies.
  • Bars separated (unlike histogram).
  • Height/length proportional to frequency or percentage.
  • Example snapshot from slide:
    • Games played: Football, Basketball, Tennis, Cricket.
    • Highest participation (“Towers” label indicates top category though slide text ambiguous).
  • Design tips:
    • Consistent width; labeled axes.
    • Avoid 3-D distortions.

Pie Chart

  • Circle divided into sectors.
  • Each sector’s angle \theta_i proportional to category percentage: \theta_i = 360^{\circ}\times (\%/100).
  • Best when number of categories \le 6.
  • Quickly communicates highest & lowest shares.
  • Limitation: Similar-sized slices are hard to compare accurately; avoid if precise reading is needed.
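
The sector-angle rule above amounts to scaling each category's share to 360 degrees. A minimal sketch, assuming hypothetical land-use categories and counts:

```python
# Hypothetical category counts (not from the lecture slides)
counts = {"Residential": 45, "Commercial": 30, "Industrial": 15, "Agricultural": 10}
total = sum(counts.values())

# theta_i = 360 degrees x (share of category i)
angles = {cat: 360 * f / total for cat, f in counts.items()}

for cat, theta in angles.items():
    pct = 100 * counts[cat] / total
    print(f"{cat:<13}{pct:>6.1f}%{theta:>8.1f} deg")
```

The angles should always sum to 360 degrees, which is a quick check on hand-drawn charts.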

Histogram

  • Visual representation of continuous data distribution.
  • Adjacent bars touch, reflecting continuum.
  • X-axis: class boundaries of a grouped table.
  • Y-axis: frequency or density.
  • Key insights derived:
    • Shape (skewness, modality)
    • Outliers
    • Range & spread
  • Example slide values range from 1360 to 1940 (unit unspecified – possibly height in mm, or rent in RM). Central tendency appears near 1520-1690.
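
The binning step behind a histogram can be sketched without a plotting library. The raw values below are hypothetical; only their range (1360 to 1940) is taken from the slide. The starting boundary of 1300, the class width of 100, and the convention that each class includes its lower boundary are assumptions made for illustration.

```python
# Hypothetical raw values spanning the slide's range (unit unspecified)
values = [1360, 1410, 1480, 1520, 1530, 1575, 1600,
          1610, 1640, 1690, 1700, 1750, 1820, 1940]

# Assumed class boundaries: 1300, 1400, ..., 2000 (7 classes of width 100)
start, width, k = 1300, 100, 7

freqs = [0] * k
for v in values:
    # Each class includes its lower boundary; the last class also takes the top edge
    idx = min((v - start) // width, k - 1)
    freqs[idx] += 1

# Text histogram: adjacent classes share a boundary, so there are no gaps
for i, f in enumerate(freqs):
    lo = start + i * width
    print(f"{lo}-{lo + width}: {'#' * f}")
```
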

Ogive (Cumulative Frequency / Percentage Curve)

  • Constructed by plotting upper class boundaries (x-axis) against cumulative frequency or cumulative percent (y-axis).
  • Graph rises monotonically left to right.
  • Uses:
    • Estimating median (at 50\%) & quartiles (at 25\%, 75\%).
    • Comparing two distributions (overlay ogives).
  • Interpretation from slide:
    • Ogive for speed or energy consumption across locations; the curve flattens where few observations fall and levels off once the total (100\%) is reached.
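
Reading the median and quartiles off an ogive is equivalent to linear interpolation between its plotted points. The sketch below assumes a hypothetical grouped table given by class boundaries and frequencies; the helper names ogive_points and value_at are illustrative, not from the lecture.

```python
# Hypothetical grouped table: (lower boundary, upper boundary, frequency)
classes = [(49.5, 59.5, 4), (59.5, 69.5, 7), (69.5, 79.5, 6), (79.5, 89.5, 3)]


def ogive_points(classes):
    """Return (upper boundary, cumulative frequency) pairs, starting at zero."""
    points = [(classes[0][0], 0)]
    cum = 0
    for _, upper, f in classes:
        cum += f
        points.append((upper, cum))
    return points


def value_at(points, fraction):
    """Interpolate the x-value where the ogive reaches the given fraction of n."""
    target = fraction * points[-1][1]
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("fraction must lie between 0 and 1")


points = ogive_points(classes)
q1, median, q3 = (value_at(points, p) for p in (0.25, 0.50, 0.75))
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
```

On a hand-drawn ogive the same values are found by reading across from 25\%, 50\%, and 75\% on the vertical axis and down to the horizontal axis.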

Connections to Previous Lectures & Broader Principles

  • Sampling methods (covered earlier) determine representativeness that underlies data-collection validity.
  • Measurement scale (nominal, ordinal, interval, ratio) drives appropriate presentation choice.
  • Ethical frameworks (Belmont principles: Respect, Beneficence, Justice) apply to both data collection and presentation – avoid deceptive graphics (e.g. truncated axes).
  • Real-world relevance:
    • Property valuation models rely on accurate data; poor collection leads to appraisal error \Delta V which propagates to investment decisions.
    • Planners use histograms of household income to classify affordable-housing need.

Numerical / Statistical References Recap

  • Response rate face-to-face: above 80\% typical (contextual estimate).
  • Telephone typical 40\%-60\%, Mailed 10\%-30\%, Online 15\%-35\%.
  • Grouped mean formula: \bar X = \dfrac{\sum f_i x_i}{\sum f_i}.
  • Class mark: x_i = \dfrac{L_i + U_i}{2}.
  • Percentage sector angle in pie chart: \theta_i = 3.6^{\circ}\times \text{percentage}.
  • Cronbach’s alpha threshold for reliability: \alpha \ge 0.7.
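
The response rates recapped above can be turned into a rough planning figure: how many people must be contacted to obtain a target number of completed responses. The sketch below is illustrative only. It assumes 80\% for face-to-face and the midpoint of each quoted range for the other modes, and the target of 200 completed responses is hypothetical.

```python
# Assumed response rates in percent: 80 for face-to-face, range midpoints otherwise
rates_pct = {"Face-to-face": 80, "Telephone": 50, "Mailed": 20, "Online": 25}

target = 200  # hypothetical number of completed responses wanted

# Contacts needed = target / rate, rounded up (integer ceiling division)
contacts = {mode: -(-target * 100 // pct) for mode, pct in rates_pct.items()}

for mode, needed in contacts.items():
    print(f"{mode:<13}{needed:>6} contacts for {target} completed responses")
```

The lower the response rate, the more contacts are needed, which is one reason cost per contact alone does not settle the choice of method.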

Practical Tips for Exam Preparation

  • Memorize definitions, advantages, disadvantages for each data-collection method; flash-card friendly.
  • Practice choosing presentation type given scenario (e.g. income distribution → histogram).
  • Be ready to draw / interpret a simple ogive, compute median from it.
  • Understand how mis-presentation (e.g. uneven class widths) can mislead – potential short-answer topic.
  • Work through numeric examples: create grouped table, compute \bar X, build histogram & ogive.