CH 16 Data Handling & Preparation / Data Handling and Data Coding Lecture 9

Last updated 7:03 PM on 3/30/26

29 Terms

steps in data preparation for analysis

coding
editing, validating and cleaning
entering
transforming

1. data editing, 2. coding, 3. statistically adjusting data if required

data editing

identifies omissions, ambiguities, and errors in the responses; conducted in the field by the interviewer and field supervisor

problems to identify in editing

interviewer error (may not be giving the respondent correct instructions)
omissions (respondent fails to answer all questions)
ambiguity (response unclear or not legible)
inconsistencies (2 responses are inconsistent with each other)
lack of cooperation (rebelling & checking the same response throughout)
ineligible respondent (inappropriate respondent, ex. women only 18)

alternatives available for data editing

contact the respondent again (done by the interviewer)
throw out the whole questionnaire as not usable
throw out the problem questions
bypass the problem questions
code illegible answers as "don't know" or "no opinion" to simplify
impute missing values for certain variables through use of mean profiles
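
The last alternative can be sketched in a few lines. This is a simplified illustration only: it fills gaps with the overall mean of the answered values, whereas a mean profile would normally use the mean of similar respondents. The ratings and the 1-5 scale are made up.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale; None marks a missing answer.
ratings = [4, None, 5, 3, None, 4]

# Mean of the answers that were actually given.
observed = [r for r in ratings if r is not None]
fill_value = mean(observed)

# Replace each missing answer with that mean; leave real answers alone.
imputed = [fill_value if r is None else r for r in ratings]
```

Imputing keeps the record usable, but it also shrinks the variance of the variable, which is one reason the other alternatives are sometimes preferred.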

by-product of editing

helps in evaluating and guiding the interviewers; an interviewer's tendency to allow a certain type of error should be detected by editing

flawed and missing data in editing

discard flawed records with missing answers
treat flawed record as missing data and treat missing data as separate category
obtain additional information

data coding

one or multiple columns for each question, each row corresponds to 1 response
indicators or identification #s assigned for each type of response
different code for diff. answers and missing answers
assign same code for similar answers

data coding closed-ended q

specify exactly how responses are to be entered
column reference (synonymous with variable identification) links the question # to the column(s) for each response
each q goes in a separate column, and the range of permissible values provides key info on the value to be entered
response values are entered into a spreadsheet to generate info but need to be checked for errors
assign same code for similar answers

data coding open-ended q

long list of possible responses is generated & each response is placed into one of the list items
assignment of a response involves a judgement decision if the response doesn't match a list item exactly
EX: the q "why did you select your instrument from Ajax?" may get many different responses; decisions must be made about the response categories, and answers are difficult to put into categories

general rules

one or multiple columns for each question, each row corresponds to 1 response
indicators or identification #s assigned for each type of response

data coding example

which is your favorite online book store

amazon (A)
barnes and noble (B)
other (o)

records

#1 (2)

#2 (1)

#3 (3)
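
A minimal sketch of this coding step, under stated assumptions: the card labels the options A, B, O while the records hold numbers, so the numeric codebook below (1, 2, 3) is a guess at the intended mapping, and the code 9 for a missing answer is hypothetical.

```python
# Assumed numeric codebook (the card itself labels the options A, B, O).
CODEBOOK = {"amazon": 1, "barnes and noble": 2, "other": 3}
MISSING = 9  # hypothetical code reserved for missing answers


def code_response(answer):
    """Return the numeric code for a raw answer, or MISSING if it is blank."""
    if answer is None or not answer.strip():
        return MISSING
    # Anything not in the codebook falls into the "other" category.
    return CODEBOOK.get(answer.strip().lower(), CODEBOOK["other"])


# One row per respondent, one column for this question.
raw = ["Barnes and Noble", "Amazon", "Powell's", ""]
coded = [code_response(a) for a in raw]
```

The first three coded values reproduce records #1 to #3 on the card; the fourth shows a missing answer getting its own code instead of being mixed in with "other".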

statistically adjusting the data

to enhance its quality for data analysis
weighting

weighting

procedure by which each response in the database is assigned a number according to some prespecified rule
done to make sample data more representative of a target population on specific characteristics
underrepresented categories given higher weights
overrepresented categories given lower weights
also used for adjusting the sample so that greater importance is attached to respondents with certain characteristics

example: if a study is conducted to determine the market potential of a new sports drink → researcher might attach greater weight to the opinions of young people because they are the main users of the product

ex: 2.0 to a person who is under 30

1.0 to a person who is over 30
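
A small sketch of how these weights change a summary statistic. The respondents and their 1-5 purchase-intent scores are invented, and treating someone who is exactly 30 as weight 1.0 is an assumption, since the card does not say.

```python
# Hypothetical respondents: (age, purchase-intent score on a 1-5 scale).
respondents = [(22, 5), (27, 4), (45, 2), (60, 1)]


def weight_for(age):
    """Weights from the card: 2.0 under 30, otherwise 1.0 (age 30 assumed 1.0)."""
    return 2.0 if age < 30 else 1.0


weights = [weight_for(age) for age, _ in respondents]
scores = [score for _, score in respondents]

# Plain mean treats every respondent equally.
unweighted_mean = sum(scores) / len(scores)

# Weighted mean: each score counts in proportion to its weight.
weighted_mean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Because the under-30 respondents score higher and count double, the weighted mean ends up above the unweighted one.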

variable respecification

a procedure in which the existing data are modified to create new variables, or in which a large number of variables are collapsed into fewer variables
purpose: create variables consistent with study objectives

ex: original variable represented the reasons for purchasing a car without response categories. may be put into 4 categories: performance, price, appearance, service
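
A rough sketch of collapsing detailed reasons into the four categories. The specific reasons and the mapping are hypothetical; in practice the analyst builds the mapping by judgement.

```python
# Hypothetical mapping from detailed reasons to the four categories on the card.
CATEGORY_OF = {
    "fast acceleration": "performance",
    "good mileage": "performance",
    "low sticker price": "price",
    "dealer discount": "price",
    "liked the styling": "appearance",
    "long warranty": "service",
}

# Original variable: one detailed reason per respondent.
reasons = ["good mileage", "dealer discount", "liked the styling", "fast acceleration"]

# Respecified variable: the same respondents, now in four broad categories.
collapsed = [CATEGORY_OF[r] for r in reasons]
```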

dummy variables

variable that takes the value 0 or 1 to represent categories
used extensively for respecifying categorical variables
also known as binary, dichotomous, instrumental and qualitative var.
to represent m levels of a qualitative variable we use m-1 dummy variables
m categories → m-1 dummy var

ex: 2 categories (1st half, 2nd half): m = 2, m - 1 = 1, so 1 dummy needed
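
The m-1 rule can be sketched as below. The helper name `dummy_code` and the three-region example are made up for illustration; the first level listed is treated as the baseline that gets all zeros.

```python
def dummy_code(values, levels):
    """Encode each value as m-1 zero/one indicators; levels[0] is the baseline."""
    indicator_levels = levels[1:]  # one dummy per non-baseline level
    return [[1 if v == level else 0 for level in indicator_levels] for v in values]


# m = 2 categories, so 1 dummy variable per record.
halves = dummy_code(["1st half", "2nd half", "1st half"], ["1st half", "2nd half"])

# m = 3 categories, so 2 dummy variables per record.
regions = dummy_code(["north", "south", "west"], ["north", "south", "west"])
```

Only m-1 dummies are needed because the baseline category is already identified by every dummy being 0.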

scale transformation

for interval, ratio data
manipulation of scale values to ensure comparability with other scales
in the same study, different scales may be employed for measuring different variables
standardization

standardization

type of scale transformation
allows researchers to compare variables that have been measured using different types of scales
ex: the same amounts measured in dollars vs. cents give different variance values, so the scale needs to be changed
to compare variances, both variables are brought down to a common unit of measurement

z_i = (X_i - \bar{X}) / s_X

where \bar{X} is the mean of X and s_X is its standard deviation
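
A short sketch of the formula in code, using the dollars-vs-cents idea from the card. The data are invented, and using the sample standard deviation (n - 1 in the denominator) is an assumption, since the card does not say which one is meant.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    x_bar = mean(values)
    s_x = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - x_bar) / s_x for x in values]


# The same three amounts, once in dollars and once in cents.
dollars = [10.0, 20.0, 30.0]
cents = [1000.0, 2000.0, 3000.0]

z_dollars = standardize(dollars)
z_cents = standardize(cents)
```

The raw variances differ by a factor of 10,000, but the z-scores of the two lists come out the same, which is what makes the variables comparable.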

strategy for data analysis

1st step in data analysis after data preparation is to analyze each question or measure by itself, using tabulation

tabulation

consists of counting the number of cases that fall into various categories or counting how many responses fall into each category.

ex: "what's your fav color?" tabulated into a table: red: 2, orange: 1, green: 2
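
The color example can be tabulated with a simple counter. The list of raw answers is made up to match the counts on the card.

```python
from collections import Counter

# Hypothetical raw answers to "what's your favorite color?"
answers = ["red", "green", "orange", "red", "green"]

# Count how many responses fall into each category.
counts = Counter(answers)

# Turn the counts into percentages of all responses.
percentages = {color: 100 * n / len(answers) for color, n in counts.items()}
```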

primary use for tabulation

determining empirical distribution (frequency distribution) of the variable in question (ex. 40% said blue, 10% said orange)
calculating the descriptive (summary) statistics, particularly means or %'s (computing %'s and totals also helps spot missing values and odd mistakes)

cross tabulations

compares 2 variables at same time instead of 1
to assess whether any association is present between two nominal or ordinal (non-numeric) variables, instead of looking at each separately
if variables are measured as interval or ratio they can be transformed to nominally scaled variables
sample is divided into subgroups in order to learn how the dependent variable varies from subgroup to subgroup
%'s are computed on a cell basis or by rows or columns; when computations are by rows or columns, cross tabs are usually referred to as contingency tables

ex: the income of household can be rescaled as <30,000 and 39,000 to cross tab with another nominally scaled variable
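
A minimal sketch of a cross tabulation with row percentages. Everything here is hypothetical: the households, the second variable (car ownership), and the 30,000 cut point used to rescale income into two nominal groups.

```python
from collections import Counter

# Hypothetical respondents: (household income, owns a car?)
households = [(25000, "no"), (42000, "yes"), (28000, "yes"), (55000, "yes"), (19000, "no")]

CUTOFF = 30000  # assumed cut point for rescaling income into two groups


def income_group(income):
    """Rescale a ratio-scaled income into a nominal group."""
    return "under 30,000" if income < CUTOFF else "30,000 or more"


# Cell counts: one cell per (income group, ownership) combination.
cells = Counter((income_group(income), owns) for income, owns in households)

# Row totals, then each cell as a percentage of its row.
row_totals = Counter(income_group(income) for income, _ in households)
row_pct = {cell: 100 * n / row_totals[cell[0]] for cell, n in cells.items()}
```

Reading across a row shows how ownership varies from one income subgroup to the next, which is the comparison the card describes.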

Frequency Distribution

simply reports the number of responses that each question received; it's the simplest way of determining the empirical distribution of the variable
part of tabulation
organizes data into classes or groups of values and shows the number of observations from the dataset that fall into each of the classes
for many questions, response categories may be combined,
resulting in categories with a worthwhile # of respondents
histogram: series of rectangles, each proportional in width to the range of values within a class and proportional in height to the # of items falling into the class
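
A rough sketch of grouping values into classes and drawing a text histogram. The ages and the class width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical ages of ten respondents.
ages = [18, 22, 25, 31, 34, 38, 41, 47, 52, 29]
CLASS_WIDTH = 10  # assumed width of each class


def class_label(value):
    """Return the class a value falls into, e.g. 25 -> '20-29'."""
    lower = (value // CLASS_WIDTH) * CLASS_WIDTH
    return f"{lower}-{lower + CLASS_WIDTH - 1}"


# Number of observations falling into each class.
frequency = Counter(class_label(a) for a in ages)

# Text histogram: bar length stands in for the height of each rectangle.
for label in sorted(frequency):
    print(f"{label}: {'#' * frequency[label]}")
```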

descriptive statistics

statistics normally associated with a frequency distribution that help summarize the information in the frequency table

includes

measures of central tendency (mean, median, mode)
measures of dispersion (range, standard deviation, coefficient of variation)
measures of shape (skewness & kurtosis)

requires data to be collected using interval or ratio scaled questions
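
A short sketch of the central tendency and dispersion measures using Python's standard library. The scores are invented, the sample standard deviation is assumed, and the shape measures (skewness, kurtosis) are left out because the standard library has no function for them.

```python
from statistics import mean, median, mode, stdev

# Hypothetical interval-scaled scores.
scores = [2, 4, 4, 5, 10]

# Measures of central tendency.
central = {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}

# Measures of dispersion.
value_range = max(scores) - min(scores)
sd = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
coeff_of_variation = sd / mean(scores)  # standard deviation relative to the mean
```

Here the mean sits above the median because the single high score of 10 pulls it up.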

factors influencing the choice of statistical techniques

type of data
research design
assumptions underlying the test statistic & related considerations

type of data

nominal, ordinal , interval, ratio

nominal

most primitive form of data; #s are assigned to objects based on the particular categories the objects belong to; mode is the important measure

ordinal

represents a higher level of measurement than nominal, because numbers are assigned to reflect order as well as to identify the objects; mode, median and nonparametric tests apply

interval & ratio

metric data; best for data analysis; a wide range of parametric and nonparametric tests can be used, along with the mean, median and mode

research design

a second consideration that affects the choice of analysis is the research design (RD) used to generate the data
decisions the analyst has to face involve the dependency of observations, the number of observations per project, the # of groups being analyzed & the control exercised over the variable of interest
independent or dependent samples?