data processing and classification

14 Terms

1

absolute vs derived data

absolute = raw data: counts or measurements

derived = standardized data: a raw count divided by some other value → e.g., an average

2

5 types of derived data

  • Proportion = a part divided by the whole → geography students/total students

  • Percent = a proportion expressed per 100 (or per 1,000) → geography students/total students * 100

  • Ratio = a comparison between two quantities → geography students/non-geography students

  • Density = a count per unit of land area → geography students/sq. km

  • Rate = occurrences relative to a population, usually scaled to a standard size → number of cases/population * 100,000
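The five formulas above are simple arithmetic on counts. This sketch computes each one from a set of made-up numbers (all of the values are hypothetical, chosen only so the results are easy to check by hand).

```python
# Hypothetical counts, invented for illustration only
geog_students = 120       # students majoring in geography
total_students = 4_000    # all students
region_area_km2 = 2.5     # area of the study region in sq. km
cases = 45                # e.g., reported cases of an illness
population = 90_000       # population the cases come from

proportion = geog_students / total_students                 # part / whole
percent = geog_students / total_students * 100              # proportion per 100
ratio = geog_students / (total_students - geog_students)    # geography : non-geography
density = geog_students / region_area_km2                   # count per sq. km
rate_per_100k = cases / population * 100_000                # cases per 100,000 people

print(f"proportion = {proportion:.3f}")
print(f"percent    = {percent:.1f}%")
print(f"ratio      = {ratio:.3f}")
print(f"density    = {density:.1f} per sq. km")
print(f"rate       = {rate_per_100k:.1f} per 100,000")
```

Scaling a rate to "per 100,000" keeps small numbers readable and lets places with very different populations be compared directly.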

3

population vs sample

Population = the complete set of elements that you are interested in studying

  • Ex: everyone enrolled in GEOG 372 is a population; the prof could ask every one of us

Sample = the portion of the population you actually study

  • Ex: asking questions of only one row in the classroom and assuming the answers apply to everyone
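A quick way to see the difference is to compute the same statistic on a population and on a sample drawn from it. This sketch assumes a hypothetical population of 60 exam scores and draws a simple random sample of 10 with `random.sample`; a random draw is used here instead of "one row" so every student has the same chance of being picked.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population: exam scores for a class of 60 students
population = [random.randint(50, 100) for _ in range(60)]

# Simple random sample of 10 students (drawn without replacement)
sample = random.sample(population, k=10)

print("population mean:", round(statistics.mean(population), 1))
print("sample mean:    ", round(statistics.mean(sample), 1))
```

The two means will usually be close but not identical; that gap is exactly what inferential statistics tries to account for.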

4

statistics vs descriptive statistics vs inferential statistics

Statistics = numerical characteristics computed from data

Descriptive statistics = summaries (e.g., mean, range) that describe a population or sample

Inferential statistics = using values computed from a sample to draw conclusions about the whole population

  • Ex: poll results in the newspaper are reported with a margin of error, because the sample results are being generalized to the whole population
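For a poll that reports a percentage, a common approximation of the margin of error at about 95% confidence is `1.96 * sqrt(p * (1 - p) / n)`, where `p` is the sample proportion and `n` the sample size. This is a sketch assuming a simple random sample; the poll numbers and the helper name `margin_of_error` are hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z = 1.96 corresponds to ~95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 52% of 1,000 sampled voters support candidate A
moe = margin_of_error(0.52, 1000)
print(f"52% ± {moe * 100:.1f} percentage points")
```

Because `n` sits under a square root, quadrupling the sample size only halves the margin of error.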

5

extrapolation

= describing a population based on the sample

6

unordered data vs ordered data

Unordered Data = not sorted by value

Ordered data = sorted by value (e.g., from the lowest proportion to the highest)

  • Allows you to see the minimum, maximum, range (expressed either as the span from min to max or as a single number by subtracting the min from the max), duplicates, and outliers

  • Can hint at a geographic pattern if you know where these places are -> it is easier to see if you actually map the data
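Sorting the values is all it takes to read off the properties listed above. The values in this sketch are hypothetical (imagine a derived percentage for each of seven counties).

```python
# Hypothetical unordered values (e.g., a derived percentage per county)
values = [12.4, 8.1, 15.0, 8.1, 41.7, 10.3, 13.9]

ordered = sorted(values)                 # ascending order
minimum, maximum = ordered[0], ordered[-1]
value_range = maximum - minimum          # range as a single number (max - min)
duplicates = sorted({v for v in values if values.count(v) > 1})

print("ordered:   ", ordered)
print("min / max: ", minimum, maximum)
print("range:     ", round(value_range, 1))
print("duplicates:", duplicates)
```

In the sorted list the duplicate 8.1s sit side by side and the 41.7 outlier stands out at the end, neither of which is obvious in the unordered list.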

7

measures of central tendency

= measures of where the data is centered

  • Mean = average value across all data -> affected by outliers

  • Median = middle number -> not affected by outliers

  • Mode = the most often occurring value -> can be useful for nominal data
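Python's standard `statistics` module computes all three measures. The data below are hypothetical and include one high outlier so the different sensitivity to outliers is visible.

```python
import statistics

values = [3, 4, 4, 5, 6, 7, 40]  # hypothetical data with one high outlier

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle value, unaffected by the outlier
mode_value = statistics.mode(values)      # most frequent value

print("mean:  ", round(mean_value, 2))
print("median:", median_value)
print("mode:  ", mode_value)

# Mode also works on nominal (categorical) data, where mean and median don't apply
land_use = ["forest", "urban", "forest", "farmland", "forest"]
print("mode of land use:", statistics.mode(land_use))
```

The single value of 40 drags the mean to almost twice the median.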

8

data distribution

= how the data values are spread out across the range, paying attention to the range and to any outliers. Ways to show data distribution:

  • Point graph/number line = draws the range of the data as a line and marks each data value on it

    • Makes it more obvious that there are outliers

    • Drawback: obscures duplicates, because identical values fall on the same point, so you can't see how many duplicates there are

  • Histogram = a bar graph; the x axis shows the range of values (divided into bins), the y axis shows how many values fall in each bin

    • Useful for seeing how the data is distributed; it can tell you a lot about the distribution

    • Useful to include on a map if the value distribution is important to the map
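A histogram is just a count of values per bin. This sketch assumes equal-width bins between the data minimum and maximum and prints a text histogram; the helper `histogram_counts` and the data are hypothetical.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bins run from min(values) to max(values); the maximum value is placed
    in the last bin so it isn't dropped.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / width) if width else 0
        counts[min(i, n_bins - 1)] += 1  # clamp the maximum into the last bin
    return lo, width, counts

values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]  # hypothetical data
lo, width, counts = histogram_counts(values, n_bins=4)
for i, c in enumerate(counts):
    start = lo + i * width
    print(f"{start:5.2f}-{start + width:5.2f} | {'#' * c}")
```

Unlike a point graph, the bar heights make duplicates visible: the three 4s and two 5s all add to the height of the second bar.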

9

four types of data distribution

  • Normal = a bell curve, the largest occurrence is in the middle of the range

    • Common in nature and in many other kinds of data

  • Uniform = the same or almost the same number of occurrences across the range of the data set (also known as an even distribution)

    • A perfectly uniform distribution is less common

  • Skewed = most of the values fall toward one end of the range or the other

    • Not exactly an outlier distribution, because outliers are only one or two values on the other side

  • Outlier = a normal distribution with one or two values far out on one side

10

two measures of dispersion

  • Range = the difference between the maximum and minimum values; shows how spread out the data is

  • Standard deviation = another measure of how spread out the values in a data set are: the square root of the average squared distance between each value and the mean (roughly, the typical distance of a value from the mean)

    • The larger the standard deviation, the greater the variation

    • The smaller the standard deviation, the smaller the variation
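Two hypothetical data sets with the same mean but different spread make the comparison concrete. The population standard deviation is `sqrt(sum((x - mean)**2) / N)`, which `statistics.pstdev` computes; `statistics.stdev` divides by `N - 1` instead and is the usual choice for a sample. The helper name `spread` is hypothetical.

```python
import statistics

a = [48, 49, 50, 51, 52]   # hypothetical values clustered near the mean
b = [30, 40, 50, 60, 70]   # same mean (50), much more spread out

def spread(data):
    """Return (range, population standard deviation) for a list of numbers."""
    return max(data) - min(data), statistics.pstdev(data)

range_a, sd_a = spread(a)
range_b, sd_b = spread(b)
print(f"a: range={range_a}, SD={sd_a:.2f}")
print(f"b: range={range_b}, SD={sd_b:.2f}")
```

Both data sets are centered on 50, but `b` has ten times the range and ten times the standard deviation of `a`.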

11

scatterplot graph

= plots two variables against each other, which is a way to show the correlation between them

  • A line of best fit summarizes the correlation

  • Correlation does not imply causation

  • The strength of a correlation depends on the units at which it is measured; if the correlation holds at several different units, you can be more confident that it is strong

12

data classification

= the process of combining data into groups, or classes, with each class represented by a different symbol

  • Usually these are choropleth maps with areas shaded in different colors -> classification is needed so the data symbols can be told apart

  • Usually use 4-6 classes -> with more than six, the classes become hard to tell apart

  • There usually isn't one "right" classification to use

    • Depends on the audience

    • Depends on what data you are classifying
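Whatever method is used to choose the class breaks, each value then has to be assigned to a class. This sketch assumes each class is described by its inclusive upper limit and uses the standard `bisect` module to find the class; the helper name `classify` and the bounds are hypothetical.

```python
from bisect import bisect_left

def classify(value, upper_bounds):
    """Return the 0-based class index for value.

    upper_bounds holds each class's inclusive upper limit in ascending order,
    e.g. [10, 20, 30, 40] defines four classes: <=10, >10-20, >20-30, >30-40.
    """
    i = bisect_left(upper_bounds, value)   # first bound >= value
    return min(i, len(upper_bounds) - 1)   # values above the last bound go in the top class

bounds = [10, 20, 30, 40]
values = [3, 10, 14, 27, 40]
classes = [classify(v, bounds) for v in values]
print(classes)
```

Each class index would then map to one symbol (for example, one shade on a choropleth map).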

13

four methods of data classification

  1. Equal Interval = every class covers a range of values of equal width

    • Proper equal intervals = based only on the range of your data (minimum to maximum)

    • Easy to compute and easy to understand -> but can leave empty classes when the values are not evenly distributed (as in the example map)

    • Doesn't take the data distribution into account

    • Best used when data is evenly distributed, so that every class has some values in it
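Equal-interval breaks follow directly from the range: width = (max - min) / number of classes. This sketch (hypothetical helper `equal_interval_breaks`, hypothetical data with one high outlier) also counts the values per class, which shows how an outlier can leave classes empty.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper limit of each class for an equal-interval classification."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes  # every class spans the same width
    # Set the last limit to the exact maximum so rounding can't exclude it
    return [lo + width * (i + 1) for i in range(n_classes - 1)] + [hi]

values = [2, 3, 5, 6, 8, 9, 11, 40]  # hypothetical data with one high outlier
breaks = equal_interval_breaks(values, n_classes=4)

counts = [0] * len(breaks)
for v in values:
    # The first class whose upper limit is >= v receives the value
    idx = next(i for i, b in enumerate(breaks) if v <= b)
    counts[idx] += 1

print("breaks:", breaks)
print("values per class:", counts)
```

The single value of 40 stretches the range so far that seven of the eight values land in the first class and the two middle classes are empty.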

  2. Quantiles = put an equal number of data points in each class -> each class holds the same number of values

    • Ex: quartiles (4 classes), quintiles (5 classes), sextiles (6 classes)

    • You can pick whatever number of classes you want

    • When the values don't divide perfectly, you can manually shift values between classes so that at least identical values end up in the same class

    • Easy to compute and easy to understand

    • Good for ordinal data

    • The map often turns out looking nice

    • Doesn't consider data distribution: widely different values can end up in the same class, and close values in different classes

    • Good for evenly distributed data
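A quantile classification sorts the values and cuts them into groups of equal size. In this sketch (hypothetical helper `quantile_classes`), when the count doesn't divide evenly the first classes get one extra value each; ties that straddle a break are not adjusted, so the manual fix described above would still be needed.

```python
def quantile_classes(values, n_classes):
    """Split sorted values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_classes)
    classes, start = [], 0
    for i in range(n_classes):
        # The first `extra` classes take one additional value
        end = start + size + (1 if i < extra else 0)
        classes.append(ordered[start:end])
        start = end
    return classes

values = [2, 3, 5, 6, 8, 9, 11, 40]  # hypothetical data with one high outlier
print(quantile_classes(values, 4))   # quartiles: two values per class
```

Here 11 and 40 share the top class even though they are far apart, which is the "widely different values classed together" drawback.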

  3. Mean-SD = class breaks are placed at the mean and at standard-deviation steps from it. With four classes the breaks are mean - 1 SD, the mean, and mean + 1 SD, so the classes are: below mean - 1 SD, mean - 1 SD to the mean, the mean to mean + 1 SD, and above mean + 1 SD

    • Generally works very well when the mean is a useful dividing point in the data and creates an obvious dividing point

    • Does a good job of showing outliers at both ends (e.g., one small outlier and one big outlier)

    • Good for normally distributed data

    • Requires an understanding of statistics to compute, and the map user needs it too in order to interpret the map well (without a background in statistics, the map might not make sense)
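The four-class mean-SD scheme needs only the mean and the standard deviation. This sketch (hypothetical helper `mean_sd_breaks`) uses the population SD and assumes a value exactly on a break goes into the higher class; both are choices made for illustration, not a fixed rule.

```python
import statistics

def mean_sd_breaks(values):
    """Return the three breaks of a four-class mean-standard deviation scheme."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; statistics.stdev gives the sample SD
    return [m - sd, m, m + sd]

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, SD 2
low, mid, high = mean_sd_breaks(values)

classes = [
    [v for v in values if v < low],           # below mean - 1 SD
    [v for v in values if low <= v < mid],    # mean - 1 SD up to the mean
    [v for v in values if mid <= v < high],   # the mean up to mean + 1 SD
    [v for v in values if v >= high],         # mean + 1 SD and above
]
print("breaks:", [low, mid, high])
print("classes:", classes)
```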

  4. Natural Breaks = use naturally occurring breakpoints/gaps in the data to minimize the differences between values in the same class and maximize the differences between classes

    • Can be done manually by looking at the data for obvious break points -> different people will break the groups at different points (subjective)

    • Another way is the Jenks optimal method/Jenks optimization algorithm = an algorithm that does objectively what you would do by eye, making the values within each class as similar as possible -> often the best choice

      • The default classification method in ArcGIS
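The idea behind Jenks optimization can be written as a small dynamic program: among all ways to cut the sorted values into k classes, choose the one with the smallest total within-class sum of squared deviations. This is a minimal, unoptimized sketch of that idea (hypothetical helper `jenks_breaks`, O(k·n²)), not ArcGIS's implementation, which may differ in details and results.

```python
def jenks_breaks(values, n_classes):
    """Return class upper limits minimizing the total within-class
    sum of squared deviations (SSD) for sorted 1-D data."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the SSD of any run data[i:j] in constant time
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j] = lowest total SSD when data[:j] is split into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i

    # Walk back through the stored split points to recover each class's upper limit
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = split[k][j]
    return breaks[::-1]

values = [1, 2, 3, 10, 11, 12, 30, 31]  # hypothetical data with clear gaps
print(jenks_breaks(values, 3))
```

With obvious gaps in the data, the algorithm places the breaks exactly where you would by eye, which is the point made above about it automating the manual approach.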

14

how to pick data classification methods

  • Equal Interval doesn't represent the data well when there are outliers, and it ignores the overall distribution

  • Quantiles don't take outliers into account

  • Mean-SD places too much importance on the mean and shouldn't be used for a general audience

  • Natural Breaks shows similarities, outliers, and the overall data distribution
