Data Collection & Data Presentation
Data Collection
- Definition & Context:
- Process of gathering raw, unprocessed facts needed to answer a research question or test a hypothesis.
- In statistics for real-estate (REV250) it underpins decisions such as pricing, valuation, or market demand analysis.
- Conducted after a sample has been identified.
- Key criterion for choosing a method: maximum information at minimum cost, subject to time and respondent comfort.
- Two macro-approaches covered in the lecture:
- Interview (oral questioning recorded by the researcher)
- Questionnaire (self-administered written/online form)
- Ethical / Practical notes:
- Must obtain informed consent, protect confidentiality, and avoid introducing interviewer bias.
- In real-estate, data often involve personal finance → strict data-protection compliance (e.g. PDPA, GDPR).
Interview
- Three delivery modes:
- Face-to-Face (personal)
- Telephone
- Online (video-conferencing)
- Generic workflow:
- Researcher poses pre-designed questions.
- Respondent answers in real time.
- Researcher records responses verbatim or via audio.
- Strengths common to all interviews:
- Ability to probe and clarify ambiguous answers.
- Suitable for complex or sensitive topics where non-verbal cues matter.
- Weaknesses common to all interviews:
- Higher cost than questionnaires (travel, time, equipment).
- Potential interviewer influence (tone, wording, body language).
- Only smaller sample sizes are feasible.
Face-to-Face Interview
- Mechanics:
- Conducted in person; researcher may use an interview guide or semi-structured schedule.
- Advantages:
- Highest response rate (>80% typical in small samples).
- Clarification possible immediately.
- Observation of non-verbal cues (posture, hesitation) → richer qualitative insight.
- Disadvantages:
- Requires travel & scheduling; cost per respondent is high.
- Time intensive.
- Respondent may feel nervous in formal setting; social-desirability bias.
- Unsuitable if respondents geographically dispersed.
- Significance for real-estate:
- Useful for in-depth stakeholder interviews (developers, valuers, policy makers).
Telephone Interview
- Mechanics:
- Questions asked via voice call; responses written in template or recorded.
- Advantages:
- Cheaper than face-to-face; no travel.
- Faster turnaround; ideal for time-sensitive surveys.
- Disadvantages:
- Lower response rate (often 40-60%) because of call screening and distrust of unknown numbers.
- Practical limit on number/length of questions; fatigue sets in.
- No visual cues; rapport building harder.
- Example: Short survey of tenants’ satisfaction across multiple complexes.
Online Interview (Video / VOIP)
- Mechanics:
- Platforms: Zoom, Teams, Google Meet, Skype.
- Combines some visual cues with remote convenience.
- Advantages:
- Cheapest among interview modes; no travel cost.
- Ability to share screens, show plans / maps → beneficial in built-environment discussions.
- Disadvantages:
- Response rate still lower than face-to-face; technology barriers.
- Question set must remain concise; long sessions risk dropout.
- Digital divide may bias sample toward tech-savvy demographics.
Questionnaire
- Forms: Direct, Mailed, Online.
- General features:
- Consists of a structured set of items (closed, Likert-scale, open-ended).
- Self-completion → reduces interviewer bias.
- Can reach large, geographically dispersed samples.
- Design best-practice reminders:
- Clear, concise wording; avoid double-barrel questions.
- Logical flow; demographics last.
- Pilot test to ensure reliability (Cronbach’s \alpha \ge 0.7) and validity.
- Include an introductory statement explaining purpose, confidentiality, approximate completion time.
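A minimal Python sketch of the Cronbach's alpha check used in pilot testing, assuming the pilot responses are stored as a respondents-by-items matrix of numeric scores (e.g. Likert 1-5). The function name `cronbach_alpha` and the pilot data are hypothetical; the formula is the standard alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used throughout; the n/(n-1) factor cancels in
    the ratio, so sample variances would give the same alpha.
    """
    k = len(scores[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])  # per-respondent totals
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical pilot: 5 respondents x 3 Likert items (1-5).
pilot = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(pilot)
print(f"alpha = {alpha:.3f}; meets 0.7 threshold: {alpha >= 0.7}")
# prints: alpha = 0.897; meets 0.7 threshold: True
```

If alpha came out below 0.7, the usual next step would be to reword or drop weak items and re-pilot before full data collection.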
Direct Questionnaire (Hand-delivered & collected)
- Steps:
- Researcher meets respondent, explains study & instructions.
- Respondent fills form immediately while researcher waits.
- Advantages:
- Little interviewer influence on answers: the researcher does not read the questions aloud and only clarifies instructions at the start.
- High response rate (often >85%) because the questionnaire is returned on the spot.
- Disadvantages:
- Travel & waiting time raise cost.
- Logistically challenging for large samples.
- Respondent may rush due to researcher presence.
- Usage: Building-occupant satisfaction forms collected during site visits.
Mailed Questionnaire (Paper via postal service)
- Mechanics:
- Questionnaire packet contains: cover letter, instrument, stamped self-addressed envelope.
- Follow-up reminder letters boost return.
- Advantages:
- Wide geographic coverage (national / international).
- Respondent can reflect, consult documents → more considered answers.
- Disadvantages:
- Low raw response rate (typically 10-30% without incentives).
- Delay in receiving responses.
- No real-time clarification; must keep questions simple & unambiguous.
- Cost note: printing and postage make it moderately expensive, though still cheaper than in-person interviews.
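To see what the typical response-rate ranges in these notes imply for fieldwork, the sketch below estimates how many questionnaires (or calls) are needed to end up with a target number of returns. It assumes every contact responds independently at the quoted rate and ignores the boost from follow-up reminders; the names `contacts_needed`, `contacts_range` and the 200-return target are hypothetical.

```python
# Typical response-rate ranges (percent) as quoted in these notes.
RESPONSE_RATE_PCT = {
    "telephone": (40, 60),
    "mailed": (10, 30),
    "online": (15, 35),
}


def contacts_needed(target_completes: int, response_rate_pct: int) -> int:
    """Smallest number of contacts whose expected returns reach the target.

    Integer ceiling division keeps the result exact (no float rounding).
    """
    if not 0 < response_rate_pct <= 100:
        raise ValueError("response rate must be in (0, 100]")
    return (target_completes * 100 + response_rate_pct - 1) // response_rate_pct


def contacts_range(target_completes: int, method: str) -> tuple[int, int]:
    """(best case, worst case) contacts over a method's typical rate range."""
    low, high = RESPONSE_RATE_PCT[method]
    return contacts_needed(target_completes, high), contacts_needed(target_completes, low)


# Hypothetical target: 200 usable returns.
for method in RESPONSE_RATE_PCT:
    best, worst = contacts_range(200, method)
    print(f"{method:>9}: contact {best}-{worst}")
```

With a 10-30% return rate, a mailed survey aiming at 200 returns needs roughly 667 to 2000 packets, which is why reminder letters and incentives matter.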
Online Questionnaire
- Platforms: Google Forms, SurveyMonkey, Qualtrics, Microsoft Forms.
- Advantages:
- Lowest cost; automated data capture in CSV/Excel.
- Potentially global reach; mobile-friendly.
- Skip logic / branching easily implemented.
- Disadvantages:
- Coverage (non-probability) bias: the sample is restricted to internet users.
- Response rate still modest (typically 15-35%).
- Risk of multiple submissions (mitigated via IP checks, tokens).
- Real-estate application: quick sentiment poll of property investors across multiple cities.
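One way to apply the token mitigation is sketched below, assuming each invited respondent receives a unique token that is stored with the submission and that the export is in arrival order. The field name `token`, the helper name and the keep-first policy are assumptions; IP checks are not shown because several legitimate respondents can share one IP address (e.g. an office network).

```python
def deduplicate_by_token(submissions: list[dict]) -> tuple[list[dict], int]:
    """Keep the first submission per token; drop repeats and untokened rows.

    Returns (kept submissions, number dropped).
    """
    seen: set[str] = set()
    kept: list[dict] = []
    dropped = 0
    for sub in submissions:
        token = sub.get("token")
        if token is None or token in seen:
            dropped += 1  # cannot be verified, or a repeat submission
            continue
        seen.add(token)
        kept.append(sub)
    return kept, dropped


# Hypothetical raw export: respondent "t2" submitted twice.
raw = [
    {"token": "t1", "q1": 4},
    {"token": "t2", "q1": 2},
    {"token": "t2", "q1": 5},
    {"q1": 3},  # no token
    {"token": "t3", "q1": 4},
]
clean, dropped = deduplicate_by_token(raw)
print(len(clean), dropped)  # prints: 3 2
```

Keeping the first rather than the last submission is a judgment call; some studies prefer the latest answer if respondents are allowed to revise.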
Data Presentation
- Purpose:
- Convert raw numbers into digestible visuals enabling pattern recognition, comparison, and communication to stakeholders.
- Facilitates exploratory data analysis (EDA) & reporting.
- Common formats taught:
- Table (ungrouped & grouped)
- Graph / Bar Graph
- Pie Chart
- Histogram
- Ogive (Cumulative Frequency Curve)
- Selection guidelines:
- Nature of data (categorical vs. numerical, discrete vs. continuous).
- Audience preference & interpretability.
- Maximize data-ink ratio (Tufte principle) – avoid clutter.
Tables
Ungrouped Data Table
- Lists each observation or category with its frequency/percentage.
- Suitable for small data sets or categorical variables with few categories.
- Example: Sales of five property types with associated counts.
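A short Python sketch of building an ungrouped frequency table with percentages using `collections.Counter`. The sales records and the helper name `frequency_table` are hypothetical, standing in for the five property types mentioned above.

```python
from collections import Counter


def frequency_table(observations: list[str]) -> list[tuple[str, int, float]]:
    """Ungrouped table rows: (category, frequency, percentage), most frequent first."""
    counts = Counter(observations)
    n = sum(counts.values())
    return [(cat, f, 100 * f / n) for cat, f in counts.most_common()]


# Hypothetical sales records for five property types (n = 20).
sales = (["Terrace"] * 8 + ["Condominium"] * 6 + ["Semi-detached"] * 3
         + ["Bungalow"] * 2 + ["Shop-office"] * 1)

print(f"{'Property type':<15}{'f':>4}{'%':>7}")
for cat, f, pct in frequency_table(sales):
    print(f"{cat:<15}{f:>4}{pct:>7.1f}")
```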
Grouped Data Table
- Continuous data binned into class intervals.
- Components:
- Class limits (e.g. lower limit 50, upper limit 59).
- Class boundaries (e.g. 49.5 to 59.5) to prevent gaps between adjacent classes.
- Class mark x_i = \dfrac{\text{lower}+\text{upper}}{2} (used in mean estimation).
- Frequency (f_i) and, optionally, percentage (%).
- Significance:
- Essential precursor for histogram & ogive construction.
- Enables computation of the grouped mean \bar X = \dfrac{\sum f_i x_i}{\sum f_i} and variance s^2 = \dfrac{\sum f_i (x_i - \bar X)^2}{\sum f_i - 1}.
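The sketch below builds a grouped table from raw integer data and then applies the grouped mean and variance formulas above. It assumes integer observations (so class boundaries lie 0.5 outside the limits) and equal class widths; the class name `ClassInterval`, the helper functions and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassInterval:
    lower: int  # lower class limit L_i
    upper: int  # upper class limit U_i
    freq: int   # frequency f_i

    @property
    def boundaries(self) -> tuple[float, float]:
        # Integer-valued data assumed, so boundaries sit 0.5 outside the limits.
        return self.lower - 0.5, self.upper + 0.5

    @property
    def mark(self) -> float:
        return (self.lower + self.upper) / 2  # class mark x_i


def grouped_table(data: list[int], first_lower: int, width: int,
                  n_classes: int) -> list[ClassInterval]:
    table = []
    for i in range(n_classes):
        lo = first_lower + i * width
        hi = lo + width - 1
        table.append(ClassInterval(lo, hi, sum(1 for x in data if lo <= x <= hi)))
    if sum(c.freq for c in table) != len(data):
        raise ValueError("some observations fall outside the classes")
    return table


def grouped_mean(table: list[ClassInterval]) -> float:
    n = sum(c.freq for c in table)
    return sum(c.freq * c.mark for c in table) / n  # sum f_i x_i / sum f_i


def grouped_variance(table: list[ClassInterval]) -> float:
    n = sum(c.freq for c in table)
    xbar = grouped_mean(table)
    return sum(c.freq * (c.mark - xbar) ** 2 for c in table) / (n - 1)


# Hypothetical integer scores, n = 14.
scores = [52, 55, 58, 61, 63, 64, 67, 69, 71, 74, 76, 78, 83, 88]
table = grouped_table(scores, first_lower=50, width=10, n_classes=4)
for c in table:
    lo_b, hi_b = c.boundaries
    print(f"{c.lower}-{c.upper}  boundaries {lo_b}-{hi_b}  x={c.mark}  f={c.freq}")
print(f"mean = {grouped_mean(table):.2f}, s^2 = {grouped_variance(table):.2f}")
```

Because every value is replaced by its class mark, the grouped mean is only an estimate: here it gives about 68.07, while the raw scores average 68.5.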
Bar Graph
- Visualizes categorical data frequencies.
- Bars are separated by gaps (unlike a histogram).
- Height/length proportional to frequency or percentage.
- Example snapshot from slide:
- Games played: Football, Basketball, Tennis, Cricket.
- Highest participation: the slide labels the top category “Towers”, which is not one of the listed games, so the leading game is ambiguous.
- Design tips:
- Consistent width; labeled axes.
- Avoid 3-D distortions.
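Without assuming any plotting library, the rule that bar length is proportional to frequency can be shown with a plain-text bar chart. The participation counts below are hypothetical (the slide's figures are not recoverable), and `text_bar_chart` is an illustrative helper; a real report would draw the chart with a plotting tool such as matplotlib.

```python
def text_bar_chart(counts: dict[str, int], max_width: int = 40) -> str:
    """Horizontal text bar chart: each bar's length is proportional to its frequency.

    Each bar sits on its own line, so categories stay visually separate,
    as in a bar graph (unlike the touching bars of a histogram).
    """
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    rows = []
    for label, f in counts.items():
        length = round(f * max_width / peak)  # scale relative to the longest bar
        rows.append(f"{label:<{label_width}} | {'#' * length} {f}")
    return "\n".join(rows)


# Hypothetical participation counts for the games listed on the slide.
games = {"Football": 20, "Basketball": 12, "Tennis": 8, "Cricket": 5}
print(text_bar_chart(games))
```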
Pie Chart
- Circle divided into sectors.
- Each sector’s angle \theta_i is proportional to the category’s percentage p_i: \theta_i = 360^{\circ} \times p_i / 100.
- Best when number of categories \le 6.
- Quickly communicates highest & lowest shares.
- Limitation: similar-sized slices are hard to compare accurately; avoid pie charts when precise readings are needed.
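A minimal sketch of the sector-angle calculation, assuming the input is category counts rather than ready-made percentages; 360 * f_i / n is the same as 3.6° times the percentage. The helper name `pie_angles` and the counts are hypothetical, and the warning simply encodes the six-category guideline above.

```python
import warnings


def pie_angles(counts: dict[str, int]) -> dict[str, float]:
    """Sector angle per category: theta_i = 360 * f_i / n (= 3.6 * percentage)."""
    if len(counts) > 6:
        warnings.warn("more than 6 categories: a bar graph may read better")
    n = sum(counts.values())
    return {cat: 360 * f / n for cat, f in counts.items()}


# Hypothetical sales counts by property type (n = 20).
sales = {"Terrace": 8, "Condominium": 6, "Semi-detached": 3, "Bungalow": 2, "Shop-office": 1}
angles = pie_angles(sales)
for cat, theta in angles.items():
    print(f"{cat:<14}{theta:6.1f} deg")
print("total:", sum(angles.values()))  # prints: total: 360.0
```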
Histogram
- Visual representation of continuous data distribution.
- Adjacent bars touch, reflecting continuum.
- X-axis: class boundaries of a grouped table.
- Y-axis: frequency or density.
- Key insights derived:
- Shape (skewness, modality)
- Outliers
- Range & spread
- Example slide: values range from 1360 to 1940 (unit unspecified; possibly height in mm or rent in RM). The central tendency appears to lie around 1520-1690.
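When class widths differ, bar heights should be frequency densities (f_i divided by class width) so that each bar's area, not its height, tracks the frequency. The sketch below computes these heights; the rent classes and counts are hypothetical, and `histogram_bars` is an illustrative name.

```python
def histogram_bars(boundaries: list[float],
                   freqs: list[int]) -> list[tuple[float, float, float]]:
    """Bars as (left boundary, right boundary, frequency density = f / width).

    With equal widths, density is proportional to frequency; with unequal
    widths, plotting density keeps each bar's AREA proportional to f.
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need one more boundary than frequencies")
    bars = []
    for left, right, f in zip(boundaries, boundaries[1:], freqs):
        bars.append((left, right, f / (right - left)))
    return bars


# Hypothetical monthly rents (RM): the last class is twice as wide.
rent_boundaries = [1000, 1500, 2000, 3000]
rent_freqs = [10, 20, 20]
for left, right, density in histogram_bars(rent_boundaries, rent_freqs):
    print(f"{left}-{right}: density {density:.3f} per RM")
```

Here the 1500-2000 and 2000-3000 classes both contain 20 rents, but the wider class gets half the height; plotting raw frequencies instead would make 2000-3000 look as crowded as 1500-2000.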
Ogive (Cumulative Frequency / Percentage Curve)
- Constructed by plotting each upper class boundary against the cumulative frequency (or cumulative percentage) up to that boundary.
- The curve never decreases from left to right.
- Uses:
- Estimating the median (at 50%) and quartiles (at 25% and 75%).
- Comparing two distributions (overlay ogives).
- Interpretation from slide:
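- Ogive of speed or energy consumption across locations (variable not clearly labelled on the slide); the final plateau marks the upper limit of the data, where the cumulative total (100%) is reached.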
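Reading the median or a quartile off an ogive amounts to linear interpolation between its vertices, which is equivalent to the grouped-median formula L + (p n - CF) / f * h. The sketch below builds the ogive points and interpolates at a chosen cumulative proportion p; the class boundaries and frequencies are hypothetical, and `ogive_points` / `ogive_read` are illustrative names. Multiplying each cumulative frequency by 100/n gives the cumulative-percentage version of the same curve.

```python
def ogive_points(boundaries: list[float], freqs: list[int]) -> list[tuple[float, int]]:
    """Ogive vertices: (first lower boundary, 0), then (upper boundary, cumulative f)."""
    points = [(boundaries[0], 0)]
    cum = 0
    for upper, f in zip(boundaries[1:], freqs):
        cum += f
        points.append((upper, cum))
    return points


def ogive_read(points: list[tuple[float, int]], p: float) -> float:
    """Value at cumulative proportion p, by linear interpolation along the ogive."""
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    target = p * points[-1][1]  # p * n
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    return points[-1][0]  # only reached if every frequency is zero


# Hypothetical classes 49.5-59.5, ..., 79.5-89.5 with frequencies 3, 5, 4, 2 (n = 14).
pts = ogive_points([49.5, 59.5, 69.5, 79.5, 89.5], [3, 5, 4, 2])
median = ogive_read(pts, 0.50)
q1, q3 = ogive_read(pts, 0.25), ogive_read(pts, 0.75)
print(pts)
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
# prints: Q1 = 60.5, median = 67.5, Q3 = 75.75
```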
Connections to Previous Lectures & Broader Principles
- Sampling methods (covered earlier) determine representativeness that underlies data-collection validity.
- Measurement scale (nominal, ordinal, interval, ratio) drives appropriate presentation choice.
- Ethical frameworks (Belmont principles: Respect for Persons, Beneficence, Justice) apply to both data collection and presentation; avoid deceptive graphics (e.g. truncated axes).
- Real-world relevance:
- Property valuation models rely on accurate data; poor collection leads to appraisal error \Delta V which propagates to investment decisions.
- Planners use histograms of household income to classify affordable-housing need.
Numerical / Statistical References Recap
- Response rate face-to-face \approx 80% (contextual estimate).
- Telephone typically 40-60%, mailed 10-30%, online 15-35%.
- Grouped mean formula: \bar X = \dfrac{\sum f_i x_i}{\sum f_i}.
- Class mark: x_i = \dfrac{L_i + U_i}{2}, where L_i and U_i are the lower and upper class limits.
- Percentage sector angle in pie chart: \theta_i = 3.6^{\circ}\times \text{percentage}.
- Cronbach’s alpha threshold for reliability: \alpha \ge 0.7.
Practical Tips for Exam Preparation
- Memorize definitions, advantages, disadvantages for each data-collection method; flash-card friendly.
- Practice choosing a presentation type for a given scenario (e.g. income distribution → histogram).
- Be ready to draw / interpret a simple ogive, compute median from it.
- Understand how mis-presentation (e.g. uneven class widths) can mislead – potential short-answer topic.
- Work through numeric examples: create grouped table, compute \bar X, build histogram & ogive.