absolute vs derived data
Absolute = raw data, counts or measurements
Derived = standardized; a count of data divided by something else → e.g., an average
5 types of derived data
Proportion = a proportion of the whole → geography students/total students
Percent = a proportion per 100 (or per 1,000) → geography students/total students * 100
Ratio = a proportion between two variables → geography students/non-geography students
Density = a proportion within land → geography students/sq.km
Rate = calculated for many things → number of cases/population * 100,000
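A quick sketch of these five calculations in Python; the student counts, land area, and case counts are made-up numbers, not real course data:

```python
geog_students = 30          # hypothetical count
total_students = 120
non_geog_students = total_students - geog_students
area_sq_km = 2.5            # hypothetical land area
cases = 40                  # hypothetical case count
population = 50_000

proportion = geog_students / total_students       # 0.25
percent = proportion * 100                        # 25.0
ratio = geog_students / non_geog_students         # ~0.33
density = geog_students / area_sq_km              # 12.0 per sq. km
rate = cases / population * 100_000               # 80 per 100,000

print(proportion, percent, ratio, density, rate)
```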
population vs sample
Population = the total set of elements that one can study
Ex: everyone in the class of GEOG 372 is a population, the prof can ask us anything
Sample = portion of the population you actually study
Ex: asking only one row in the classroom questions and assuming it applies to everyone
statistics vs descriptive statistics vs inferential statistics
Statistics = numerical characteristics that summarize data
Descriptive statistics = statistics that describe the population or sample you actually have
Inferential statistics = when you apply the sample values to the whole population
Ex: voting results in the newspaper report a margin of error along with the results, because the sample results are assumed to be reflective of the whole population
extrapolation
= describing a population based on the sample
unordered data vs ordered data
Unordered data = not sorted by the data value (e.g., the proportion)
Ordered data = sorted by the data value
Ordering lets you see the minimum, maximum, range (expressed either as a span or as a single number, max minus min), duplicates, and outliers
You can sometimes see a geographic pattern if you know where the places are -> it is easier if you actually make the map to visualize the data
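A small Python sketch of what ordering makes visible (the values are made up):

```python
values = [12, 7, 7, 15, 9, 42, 11]       # hypothetical data

ordered = sorted(values)                 # [7, 7, 9, 11, 12, 15, 42]
minimum, maximum = ordered[0], ordered[-1]
data_range = maximum - minimum           # range as a single number (max - min)
has_duplicates = len(set(ordered)) < len(ordered)

print(ordered, minimum, maximum, data_range, has_duplicates)
# 42 stands out as an outlier once the data are ordered
```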
measures of central tendency
= measures of where the data is centered
Mean = average value across all data -> affected by outliers
Median = middle number -> not affected by outliers
Mode = the most often occurring value -> can be useful for nominal data
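A minimal sketch of the three measures using Python's built-in statistics module (made-up values):

```python
import statistics

values = [2, 3, 3, 4, 5, 6, 40]          # 40 is an outlier

print(statistics.mean(values))           # 9.0 -> pulled up by the outlier
print(statistics.median(values))         # 4   -> not affected by the outlier
print(statistics.mode(values))           # 3   -> most frequently occurring value
```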
data distribution
= how the data are spread out; pay attention to the range and the outliers. Ways to show data distribution:
Point graph/number line = looks at the range of data through a line and marking each data value within the line
Makes it more obvious that there are outliers
Drawback: obscures duplicates -> you can't see how many duplicates there are
Histogram = a bar graph; the x axis shows the range of values (in bins), the y axis shows how many values fall in each bin
Useful for looking at how the data are distributed
Useful to include these on a map if the value distribution is important to the map
Can tell you a lot about how the data are distributed
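A sketch of both displays (point graph and histogram), assuming matplotlib is available; the values are made up:

```python
import matplotlib.pyplot as plt

values = [3, 5, 5, 6, 7, 7, 7, 8, 9, 12, 25]

fig, (ax1, ax2) = plt.subplots(2, 1)

# point graph / number line: mark each value along a line (duplicates overlap)
ax1.plot(values, [0] * len(values), "o")
ax1.set_yticks([])

# histogram: x axis covers the range, y axis counts values in each bin
ax2.hist(values, bins=5)

plt.show()
```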
four types of data distribution
Normal = a bell curve, the largest occurrence is in the middle of the range
Common in nature and in many kinds of data
Uniform = the same or almost the same number of occurrences of each value in the data set (also known as an even distribution)
Less common to have a perfectly uniform distribution
Skewed = a lot of the values fall at one end of the range or the other
Not the same as an outlier distribution, where only one or two values sit on the other side
Outlier = a roughly normal distribution with one or two values far off on one side
two measures of dispersion
Range = shows how spread out the data is
Standard deviation = another measure of how spread out the values in a data set are; roughly, the average distance between each value and the mean
The larger the standard deviation, the larger the variation
The smaller the standard deviation, the smaller the variation
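A minimal sketch of both measures of dispersion (made-up values):

```python
import statistics

values = [4, 6, 7, 7, 9, 15]

data_range = max(values) - min(values)   # how spread out the data is
std_dev = statistics.pstdev(values)      # average distance from the mean (population SD)

print(data_range, round(std_dev, 2))     # larger SD -> more variation
```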
scatterplot graph
= plots two variables against each other, which is a way to show correlation between variables
A line of best fit will show the correlation
Correlation does not imply causation
The strength of a correlation can depend on the units (scale) at which it is measured; if the correlation holds across several different units, you know the correlation is strong
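A sketch of a scatterplot with a line of best fit and a correlation coefficient, assuming numpy and matplotlib are available; the two variables are made-up example data:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

r = np.corrcoef(x, y)[0, 1]              # strength of the correlation
slope, intercept = np.polyfit(x, y, 1)   # line of best fit

plt.scatter(x, y)
plt.plot(x, slope * x + intercept)
plt.title(f"correlation r = {r:.2f}")
plt.show()
```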
data classification
= process of combining data into groups, or classes, with each class represented by a different symbol
Usually these are choropleth maps with areas shaded by different hues -> need classification to differentiate between data symbols
Usually use 4-6 different classes -> anything more than six is hard to distinguish between the classifications
There usually isn't one "right" classification to use
Depends on the audience
Depends on what data you are classifying
four methods of data classification
Equal Interval = each class covers an equal-sized interval of the data range
Proper equal intervals consider only the range of your data
Easy to compute and easy to understand -> but there can be an empty class (as in the example map) because the values are not evenly distributed
Doesn't show data distribution
Best used when data are evenly distributed, so that each class is represented
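A minimal sketch of equal-interval breaks, with made-up values chosen so that one class ends up empty:

```python
values = [2, 3, 5, 8, 9, 11, 30, 34, 35, 38]
num_classes = 4

width = (max(values) - min(values)) / num_classes              # 9.0
breaks = [min(values) + width * i for i in range(1, num_classes + 1)]
print(breaks)   # [11.0, 20.0, 29.0, 38.0] -> the 20-29 class is empty
```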
Quantiles = put an equal number of data points in each class -> the number of values in each class is equal
Ex: quartiles (4 classes), quintiles (5 classes), sextiles (6 classes)
For the number of classes, you can just pick how many you want
Sometimes when quantiles don't work out perfectly, you can manually move values so that at least identical values end up in the same class
Easy to compute and easy to understand
Good for ordinal data
Map often turns out looking nice
Doesn't consider data distribution: widely different values can get classed together, and close values can end up in different classes
Good for evenly distributed data
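A minimal sketch of quantile classification with four classes (made-up values):

```python
values = sorted([2, 3, 5, 8, 9, 11, 30, 34, 35, 38, 40, 52])
num_classes = 4

per_class = len(values) // num_classes      # 3 values per class
classes = [values[i * per_class:(i + 1) * per_class] for i in range(num_classes)]
print(classes)
# [[2, 3, 5], [8, 9, 11], [30, 34, 35], [38, 40, 52]]
# each class holds the same number of values, regardless of how far apart they are
```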
Mean-SD = class breaks are placed at the mean and at standard deviations above and below it. With four classes: everything below mean - SD is the first class, mean - SD to the mean is the second, the mean to mean + SD is the third, and above mean + SD is the fourth
Generally works very well when the mean is a useful dividing point in the data and creates an obvious dividing point
Does a good job of showing that there is one small outlier and one big outlier
Good for normally distributed data
Needs an understanding of statistics to compute, and it's important that the map user can interpret the map well (if they don't have a background in statistics, it might not make sense)
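A minimal sketch of mean-SD breaks for four classes, assuming breaks at mean - SD, the mean, and mean + SD (made-up values):

```python
import statistics

values = [2, 3, 5, 8, 9, 11, 30, 34, 35, 38]
mean = statistics.mean(values)      # 17.5
sd = statistics.pstdev(values)      # ~14.0

breaks = [mean - sd, mean, mean + sd]
print([round(b, 1) for b in breaks])
# class 1: below mean - SD, class 2: up to the mean,
# class 3: up to mean + SD, class 4: above mean + SD
```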
Natural breaks = use naturally occurring breakpoints/gaps to minimize the difference between values that fall in the same class and maximize the difference between values in different classes
Can do this manually, by looking at the data and looking for obvious breaking points -> different people will break groups at different points (subjective)
Another way to do this is with the Jenks optimal method/Jenks optimization algorithm = an algorithm that does what you would do subjectively, optimizing sameness within classes -> often the best choice
The default classification method in ArcGIS
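A sketch of the simple, manual version of natural breaks: sort the data and put class breaks at the largest gaps between consecutive values (the Jenks algorithm does this more rigorously by minimizing within-class variance). The values are made up:

```python
values = sorted([2, 3, 5, 8, 15, 16, 18, 30, 34, 35])
num_classes = 3

gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
largest = sorted(gaps, reverse=True)[:num_classes - 1]   # the biggest gaps
break_points = sorted(values[i] for _, i in largest)     # upper bound of each class
print(break_points)   # [8, 18] -> classes: 2-8, 15-18, 30-35
```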
how to pick data classification methods
Equal Interval doesn't represent the data well when there are outliers, and doesn't show the overall distribution
Quantiles don't take outliers into account
Mean-SD places too much importance on the mean and shouldn't be used for a general audience
Natural Breaks shows similarities and outliers, and overall data distribution