L1 Data Management (copy)

Last updated 9:37 PM on 10/28/25

38 Terms

Data

Raw facts or observations that become useful information when organized in a meaningful way. Data can be qualitative or quantitative.

What is Data Management

The process of "looking after" and processing data.

Examples of data management

· Looking after field data sheets

· Checking and correcting the raw data

· Preparing data for analysis

· Documenting and archiving the data and meta-data

Why is data management important?

· Ensures that data for analysis are of high quality so that conclusions are correct

· Good data management allows further use of the data in the future and enables efficient integration of results with other studies.

· Good data management leads to improved processing efficiency, improved data quality, and improved meaningfulness of the data.

Methods of data collection when planning and conducting an experiment or study

Census
Sample Survey
Experiment
Observational Study

Census

the procedure of systematically acquiring and recording information about all members of a given population. Researchers rarely survey the entire population for two (2) reasons: the cost is too high and the population is dynamic in that the individuals making up the population may change over time.

Sample Survey

sampling is the selection of a subset of a population to gain knowledge about the population of concern. The three main advantages of sampling are that (i) the cost is lower, (ii) data collection is faster, and (iii) because the data set is smaller, it is possible to improve the accuracy and quality of the data.

Experiment

the researcher controls some variables (such as a particular medical treatment) in order to study their effect on other observed variables (such as the health of patients). A key requirement of an experiment is that it can be replicated.

Observational Study

appropriate when there are no controlled variables and replication is impossible. This type of study typically uses a survey. An example is a study of the association between smoking and lung cancer: the researchers collect observations of both smokers and non-smokers and then count the cases of lung cancer in each group.

Characteristics of a Well-Designed and Well-Conducted Survey

a. A good survey must be representative of the population.

b. For probability-based results to be valid, the survey must incorporate chance in selecting subjects, for example by using a random number generator. Often there is no complete list of the population, so care is needed in exactly how chance is applied. Even when the frame is correctly specified, subjects may choose not to respond or may be unable to respond.

c. The wording of the question must be neutral; subjects give different answers depending on the phrasing.

d. Possible sources of errors and biases should be controlled. The population of concern as a whole may not be available for a survey. The subset of it that can actually be measured is called the sampling frame (from which the sample will be selected). The survey plan should specify a sampling method, determine the sample size, and describe the steps for implementing the plan, including sampling and data collection.

Characteristic (d) of a well-designed and well-conducted survey

Possible sources of errors and biases should be controlled. The population of concern as a whole may not be available for a survey. The subset of it that can actually be measured is called the sampling frame (from which the sample will be selected). The survey plan should specify a sampling method, determine the sample size, and describe the steps for implementing the plan, including sampling and data collection.

Nonprobability Sampling

is any sampling method where some elements of the population have no chance of selection or where the probability of selection can’t be accurately determined. The selection of elements is based on some criteria other than randomness.

Probability Sampling

every unit in the population has a known, nonzero chance of being selected, so it is possible to determine both which sampling units belong to which sample and the probability that each sample will be selected.

Simple Random Sampling (SRS)

all samples of a given size have an equal probability of being selected, so every member of the population has the same chance of inclusion. The frame is not subdivided or partitioned. The sample variance is a good indicator of the population variance, which makes it relatively easy to estimate the accuracy of results.
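
A minimal sketch of drawing a simple random sample, assuming the sampling frame is a Python list. The function name `simple_random_sample` and the 100-ID frame are hypothetical; `random.Random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n without replacement (every n-subset equally likely)."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical sampling frame: 100 student IDs
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```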

However, SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn’t reflect the makeup of the population.

True

Systematic Sampling

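relies on arranging the target population in some order, dividing it into consecutive intervals of equal size, randomly selecting one element from the first interval, and then selecting the element in the corresponding position in every other interval. In other words, every k-th element is taken after a random start.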
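
A minimal sketch of systematic sampling, assuming an ordered list as the frame and an interval size k. The function name `systematic_sample` and the example frame are hypothetical. One position is chosen at random in the first interval, and the slice `start::k` then takes the element in that same position from every later interval.

```python
import random

def systematic_sample(population, k, seed=None):
    """Every k-th element of the ordered population, after a random start."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position 0..k-1 within the first interval
    return population[start::k]

frame = list(range(1, 101))  # hypothetical ordered frame of 100 units
sample = systematic_sample(frame, k=10, seed=1)
print(sample)
```

When the population size is a multiple of k, as here, every interval contributes exactly one element, giving a sample of 100 / 10 = 10 units.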

Stratified Sampling

when the population contains a number of distinct categories, the frame can be organized by these categories into separate "strata". Each stratum is then sampled as an independent sub-population. Dividing the population into strata can enable researchers to draw inferences about specific subgroups that may be lost in a more generalized random sample.
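
A minimal sketch of stratified sampling: units are grouped by a stratum label, then an independent simple random sample is drawn within each stratum. The names `stratified_sample` and `stratum_of`, the year-level strata, and the choice of an equal number of units per stratum are all assumptions for illustration; proportional allocation is an equally common alternative.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw an independent SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical frame of (id, year level) pairs: 60 first-years, 40 second-years
students = [(i, "first-year" if i <= 60 else "second-year") for i in range(1, 101)]
result = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=7)
print(result)
```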

A stratified sampling approach is most effective when three conditions are met:

a. Variability within strata is minimized

b. Variability between strata is maximized

c. The variables upon which the population is stratified are strongly correlated with the desired dependent variable (e.g., beer consumption is strongly correlated with gender).

Cluster Sampling

sometimes it is cheaper to ‘cluster’ the sample in some way (e.g. by selecting respondents from certain areas only, or certain time-periods only). Cluster sampling is an example of two-stage random sampling: in the first stage a random sample of areas is chosen; in the second stage a random sample of respondents within those areas is selected. This works best when each cluster is a small copy of the population.
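
A minimal sketch of the two-stage procedure just described: stage 1 draws a random sample of areas (clusters), and stage 2 draws a random sample of respondents inside each chosen area. The function name `two_stage_cluster_sample` and the eight hypothetical areas of 20 respondents each are assumptions for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick areas
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}  # stage 2

# Hypothetical areas, each with 20 respondents
areas = {f"area-{a}": [f"a{a}-r{r}" for r in range(20)] for a in range(8)}
result = two_stage_cluster_sample(areas, n_clusters=3, n_per_cluster=4, seed=3)
print(result)
```

Only the chosen areas need to be visited, which is where the cost saving comes from.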

Matched Random Sampling

there are two (2) samples in which the members are clearly paired, or are matched explicitly by the researcher (for example, by IQ measurements, or pairs of identical twins). Alternatively, the same attribute, or variable, may be measured twice on each subject, under different circumstances (e.g. the milk yields of cows before and after being fed a particular diet).
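
A minimal sketch of matching explicitly on a measurement such as IQ: subjects are sorted by the matching variable and neighbours are paired, then a coin flip decides which member of each pair goes into which sample. The function names, the subjects, and their IQ scores are hypothetical, and the coin-flip step is an assumption added to show how the two paired samples could be formed.

```python
import random

def match_pairs(subjects, key):
    """Sort by the matching variable and pair neighbours (an odd last subject is left out)."""
    ordered = sorted(subjects, key=key)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered) - 1, 2)]

def split_pairs(pairs, seed=None):
    """Randomly send one member of each pair to sample 1 and the other to sample 2."""
    rng = random.Random(seed)
    sample_1, sample_2 = [], []
    for a, b in pairs:
        if rng.random() < 0.5:  # coin flip decides the order within the pair
            a, b = b, a
        sample_1.append(a)
        sample_2.append(b)
    return sample_1, sample_2

# Hypothetical subjects: (name, IQ)
subjects = [("A", 98), ("B", 131), ("C", 102), ("D", 127), ("E", 115), ("F", 112)]
pairs = match_pairs(subjects, key=lambda s: s[1])
sample_1, sample_2 = split_pairs(pairs, seed=0)
print(pairs)
```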

Topics in planning and conducting experiments

Characteristics of a Well-Designed and Well-Conducted Experiment
Treatment, control groups, experimental units, random assignments and replication
Sources of bias and confounding, including placebo effect and blinding
Completely randomized design, randomized block design and matched pairs

Characteristics of a well-designed and well-conducted experiment: a good statistical experiment includes:

a. Stating the purpose of the research, including estimates of the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Experiments must compare the new treatment with at least one (1) standard treatment, to allow an unbiased estimate of the difference in treatment effects.

b. Design of experiments, using blocking (to reduce the influence of confounding variables) and randomized assignment of treatments to subjects

c. Examining the data set in secondary analyses, to suggest new hypotheses for future study

d. Documenting and presenting the results of the study

Topics under treatment, control groups, experimental units, random assignment and replication:

Control groups and experimental units
Random Assignments
Replication

Control groups and experimental units

To compare effects and make inferences about associations or predictions, one typically has to subject different groups to different conditions. Usually, one group of experimental units (the treatment group) receives the treatment and a control group does not.

Random Assignments

the random allocation of treatments (the controlled variables) to experimental units. Because chance alone decides which unit gets which treatment, the treatment effects, if present, will be similar within each group.
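
A minimal sketch of random assignment, assuming twelve hypothetical experimental units and two treatments. The function name `randomly_assign` is illustrative. Units are shuffled and then dealt out to the treatments in turn, so chance alone decides each unit's group and group sizes stay as equal as possible.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = [f"plot-{i}" for i in range(1, 13)]  # hypothetical experimental units
groups = randomly_assign(units, ["treatment", "control"], seed=5)
print(groups)
```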

Replication

All measurements, observations or data collected are subject to variation, as there are no completely deterministic processes. To reduce the effect of this variability, measurements in the experiment must be repeated. The experiment itself should also allow for replication, so that it can be checked by other researchers.
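
A small simulation of why repeating measurements helps, assuming (hypothetically) that each measurement equals a true value of 50 plus Gaussian error with standard deviation 2. It compares the spread of single measurements with the spread of results that are each the mean of 10 repeated measurements; the names `measure`, `single`, and `replicated` are illustrative.

```python
import random
import statistics

def measure(true_value, rng, noise_sd=2.0):
    """One noisy measurement under a hypothetical Gaussian error model."""
    return rng.gauss(true_value, noise_sd)

rng = random.Random(0)
true_value = 50.0
single = [measure(true_value, rng) for _ in range(1000)]
# Each replicated result averages 10 repeated measurements of the same quantity
replicated = [statistics.mean(measure(true_value, rng) for _ in range(10)) for _ in range(1000)]
print(statistics.stdev(single), statistics.stdev(replicated))
```

Under this error model the spread of the averaged results should be roughly 2 / sqrt(10) ≈ 0.63, compared with about 2 for single measurements.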

Sources of Bias and Confounding include:

Confounding
Placebo and Blinding
Blocking

Confounding

a confounding variable (confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable.

Placebo and blinding

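a placebo is an imitation pill identical to the actual treatment pill, but without the treatment ingredients. Blinding means that subjects do not know whether they received the placebo or the actual treatment; in a double-blind study, the researchers evaluating the subjects do not know either.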

Blocking

is the arranging of experimental units in groups (blocks) that are similar to one another.

Completely randomized design, randomized block design and matched pairs include:

Completely randomized designs
Randomized block design
Matched pairs design

Completely randomized designs

are for studying the effects of one primary factor without the need to take other nuisance variables into account.

Randomized Block Design

is a collection of completely randomized experiments, each run within one of the blocks of the total experiment.
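
A minimal sketch of a randomized block design, assuming hypothetical blocks (three fields, each split into three plots) and three hypothetical fertilizers A, B and C. Inside each block, the treatments are shuffled and assigned one per unit, so each block is its own small completely randomized experiment. The function name `randomized_block_design` is illustrative.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Run a separate random assignment of treatments to units inside each block."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # randomization happens within the block only
        plan[block] = dict(zip(units, order))
    return plan

blocks = {"field-1": ["p1", "p2", "p3"], "field-2": ["p4", "p5", "p6"], "field-3": ["p7", "p8", "p9"]}
plan = randomized_block_design(blocks, ["A", "B", "C"], seed=2)
print(plan)
```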

Chi-Square

The chi-square test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
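
For reference, the comparison is usually made with Pearson's chi-square statistic (the standard formula, not stated on the original card), where O_i is the observed count and E_i the expected count in category or cell i. The goodness-of-fit test uses k - 1 degrees of freedom for k categories; the test for independence uses (r - 1)(c - 1) for an r-by-c contingency table.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```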

Chi-Square Goodness of Fit Test

determines whether sample data match a hypothesized population distribution.
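
A minimal sketch of the goodness-of-fit statistic, using hypothetical data: 120 rolls of a die tested against a fair die, so 20 rolls are expected per face. The function name `chi_square_statistic` is illustrative. The squared deviations are 25, 9, 25, 9, 16 and 16, which sum to 100, so the statistic is 100 / 20 = 5.0 with 6 - 1 = 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1-6 over 120 rolls
observed = [25, 17, 15, 23, 24, 16]
expected = [sum(observed) / 6] * 6  # 20 per face if the die is fair
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1
print(stat, df)
```

Since 5.0 is below the 5% critical value for 5 degrees of freedom (about 11.07), these hypothetical rolls give no significant evidence that the die is unfair. SciPy's `scipy.stats.chisquare` computes the same statistic along with a p-value.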

Chi-Square Test for Independence

compares two (2) variables in a contingency table to see if they are related. It tests to see whether the distributions of categorical variables differ from each other.
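
A minimal sketch of the test for independence on a hypothetical 2×2 table (smokers vs non-smokers, lung cancer vs none; the counts are invented for illustration). Each expected count is row total × column total / grand total. Here every row total is 100 and the column totals are 40 and 160, so the expected counts are 20 and 80 in each row. The statistic is 5 + 1.25 + 5 + 1.25 = 12.5 with (2 - 1)(2 - 1) = 1 degree of freedom.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the Pearson statistic and degrees of freedom for an r x c table."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = smoker, non-smoker; columns = lung cancer, no lung cancer
table = [[30, 70],
         [10, 90]]
stat, df = chi_square_independence(table)
print(stat, df)
```

12.5 exceeds the 5% critical value for 1 degree of freedom (about 3.84), so these hypothetical counts would suggest the two variables are related. `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic would differ unless `correction=False` is passed.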

Small chi-square test statistic

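indicates that your observed data fit your expected data well; the observed and expected frequencies are close. In a test for independence, this means there is no evidence of a relationship between the variables.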

Large chi-square test statistic

indicates that the observed data do not fit the expected data well. In a test for independence, this is evidence that the variables are related.