Data model concepts
simplified view of the real world, physical entities/phenomena are approximated by data in GIS with the spatial location and extent of physical entities and their non-spatial properties, each entity is represented by a spatial feature/spatial object, essential characteristics are also defined for each entity, spatial objects are abstractions of entities in a spatial database
Spatial objects
objects in a spatial database representing real-world entities with associated attributes
Spatial data
describes location (where), don’t attach a million attributes to any one type of data, never put everything into 1 single file since it becomes hard to keep organized
Attribute data
specifies characteristics at that location (what, how much, when, etc.)
How do we represent data digitally in GIS?
group into layers based on similar characteristics using a vector or raster model, selecting appropriate data properties for each layer with respect to projection, scale, accuracy, and resolution, database management systems don’t exist in CAD
Vector data model
coverage with ARC/INFO, shapefile with ArcView
Raster data model
GRID/Image with ARC/INFO and ArcView
Spatial data types
continuous, areas, networks, points/lines/areas
Areas
unbounded with land use and rock types, bounded with zoning areas and counties, moving with air masses and animal herds, networks with groves, utilities, and streams, points/lines/areas being fixed like wells, streets, or lamps and moving with cars, planes, and any living organism
Nominal
no order, soil types, county names
Ordinal
ordering of things, roads, zip codes, streams
Binary
this/that = red/blue, yes/no, landslide/not landslide
Discrete
only take on certain values, categorized into a classification scheme
Continuous
can take any value, interval data lacks an absolute 0, ratio data has a defined 0 such as income, height, and weight
Goal of relational DBMS
produce map of values by district/neighborhood
Problem with relational DBMS
no district code available in parcel table
Solution with relational DBMS
join parcel table containing values with geography table containing location codings, using Block as key field
Secondary/foreign key with relational DBMS
common info in both tables
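The join described in these cards can be sketched in plain Python. This is a minimal illustration, not a real DBMS: the table contents, the field names (`parcel_id`, `block`, `value`, `district`) and the function name are all made up for the example. Only the idea of joining on Block as the key field comes from the notes.

```python
# Hypothetical parcel table: holds the values but has no district code.
parcels = [
    {"parcel_id": 1, "block": "B1", "value": 100000},
    {"parcel_id": 2, "block": "B1", "value": 150000},
    {"parcel_id": 3, "block": "B2", "value": 90000},
]

# Hypothetical geography table: holds the location codings for each block.
geography = [
    {"block": "B1", "district": "North"},
    {"block": "B2", "district": "South"},
]


def total_value_by_district(parcel_rows, geography_rows):
    """Join the two tables on the shared Block field and sum values per district."""
    # Build a lookup from the key field (block) to the district code.
    block_to_district = {row["block"]: row["district"] for row in geography_rows}
    totals = {}
    for parcel in parcel_rows:
        district = block_to_district[parcel["block"]]  # the join step
        totals[district] = totals.get(district, 0) + parcel["value"]
    return totals


district_totals = total_value_by_district(parcels, geography)
print(district_totals)
```

The resulting per-district totals are what would then be mapped by district/neighborhood.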
Field types
numeric (default to a short integer since it takes fewer bytes and less storage space, with different precisions available), text, date, BLOBs, object identifiers, global identifiers (GUID)
Text
up to 64,000 characters, text strings such as names and descriptions, could be numerical classifications
Date
mm/dd/yyyy, hh:mm:ss, AM/PM
BLOBs (binary large objects)
annotation and dimensions, images and other multimedia, use custom loader/viewer
Object identifiers
ObjectID field, guarantees a unique ID for each row in the table
Global identifiers (GUID)
registry style string consisting of 36 characters enclosed in curly brackets, uniquely identify a row within a geodatabase
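As a rough illustration of the GUID format, Python's standard `uuid` module produces a 36-character string (32 hex digits plus 4 hyphens) that can be wrapped in curly brackets. The helper name `make_guid` and the uppercase styling are assumptions for the example, and this is not how a geodatabase itself generates the value.

```python
import uuid


def make_guid():
    """Return a random identifier as 36 characters enclosed in curly brackets."""
    # str(uuid.uuid4()) is 36 characters: 32 hex digits and 4 hyphens.
    return "{" + str(uuid.uuid4()).upper() + "}"


guid = make_guid()
inner = guid[1:-1]  # the part between the curly brackets
print(guid, len(inner))
```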
Raster
location is referenced by a grid cell in a rectangular array (matrix), attribute is represented as a single value for that cell, faster than vector, gets more pixelated the more you zoom in, data comes from images from remote sensing, scanned maps, and elevation data from USGS, best for continuous features like elevation, temp, soil type, and land use
Vector
location referenced by x, y coordinates that can be linked to form lines and polygons, attributes referenced through unique ID number to tables, more correct/accurate than raster, data comes from DIME and TIGER files from US Census and DLG from USGS for streams, roads and census data, best for features with discrete boundaries like property lines, political boundaries, and transportation
Representing data using raster
area covered by grid with equal-sized cells, location of each cell calculated from origin of grid (2 down and 3 over), pixels are cells, image data/imagery are raster data, attributes are recorded by assigning each cell a single value based on the majority feature in the cell like land use type, easy to do overlays/analyses by combining corresponding cell values, yield = rainfall + fertilizer
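A small sketch of the two ideas in this card: finding a cell's location from the grid origin, and overlaying two rasters by combining corresponding cell values. The sample numbers and function names are invented for illustration. The code assumes the origin is the upper-left corner, that rows count down and columns count over starting from 0, and that y decreases as you move down.

```python
def cell_center(origin_x, origin_y, cell_size, rows_down, cols_over):
    """Map coordinates of a cell center, counted from an upper-left origin (assumed)."""
    x = origin_x + (cols_over + 0.5) * cell_size
    y = origin_y - (rows_down + 0.5) * cell_size
    return (x, y)


def overlay_sum(grid_a, grid_b):
    """Overlay two rasters of the same shape by adding corresponding cells."""
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


# Hypothetical input layers on the same grid.
rainfall = [[10, 20], [30, 40]]
fertilizer = [[1, 2], [3, 4]]

# yield = rainfall + fertilizer, cell by cell.
crop_yield = overlay_sum(rainfall, fertilizer)
print(crop_yield)
print(cell_center(0.0, 100.0, 10.0, 2, 3))  # the cell 2 down and 3 over
```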
Raster orientation
angle between true north and direction defined by raster columns
Class
set of cells with same value (type = sandy soil)
Zone
set of contiguous cells with same value
Neighborhood
set of cells adjacent to a target cell in some systematic manner
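The difference between a class and a zone can be shown on a tiny grid. In this sketch "contiguous" is taken to mean 4-connected (cells sharing an edge), which is an assumption; the soil grid and function names are made up for the example.

```python
def cells_in_class(grid, value):
    """Class: every cell with the given value, wherever it is in the grid."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, cell in enumerate(row)
        if cell == value
    ]


def zones(grid, value):
    """Zones: groups of contiguous (4-connected, assumed) cells with the same value."""
    unvisited = set(cells_in_class(grid, value))
    found = []
    while unvisited:
        start = unvisited.pop()
        zone, stack = {start}, [start]
        while stack:
            r, c = stack.pop()
            # Neighborhood of the target cell: up, down, left, right.
            for neighbor in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    zone.add(neighbor)
                    stack.append(neighbor)
        found.append(zone)
    return found


soil = [
    ["sand", "sand", "clay"],
    ["clay", "clay", "clay"],
    ["sand", "clay", "sand"],
]
sand_cells = cells_in_class(soil, "sand")
sand_zones = zones(soil, "sand")
print(len(sand_cells), len(sand_zones))
```

Here the sand class has four cells, but they fall into three separate zones because only the two top-left cells touch.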
Resolution and storage size
say we have an 8-bit image that’s 500 columns by 500 rows, storage size is 500 × 500 × 8 = 2,000,000 bits, divide by 8 bits per byte = 250,000 bytes, about 250 kbytes or 0.25 Mbytes
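The storage arithmetic can be checked with a few lines of Python. Working in bits first and then dividing by 8 gives the size in bytes. The function name is made up, and the calculation ignores headers and compression, so it is only an estimate of uncompressed size.

```python
def raster_storage_bytes(columns, rows, bits_per_cell):
    """Uncompressed storage for a raster, ignoring any file header (assumed)."""
    total_bits = columns * rows * bits_per_cell
    return (total_bits + 7) // 8  # 8 bits per byte, rounded up


size_8bit = raster_storage_bytes(500, 500, 8)
size_16bit = raster_storage_bytes(500, 500, 16)
print(size_8bit, size_16bit)
```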
Tessellations
square grid, rectangular, triangular and hexagonal, triangulated irregular network
Square grid
equal length sides, conceptually simplest, cells can be recursively divided into cells of same shape, 4-connected neighborhood with all neighboring cells being equidistant, 8-connected neighborhood with all neighboring cells not equidistant since centers of cells on the diagonal are 1.41 units away (square root of 2)
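The neighbor distances for a square grid can be computed directly from row/column offsets. This is a small sketch with made-up names; it assumes a cell size of 1 unit unless told otherwise and measures center to center.

```python
import math

# Row/column offsets to the edge-sharing neighbors (4-connected).
EDGE_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# Offsets to the corner-sharing neighbors, added for the 8-connected case.
DIAGONAL_OFFSETS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbor_distances(offsets, cell_size=1.0):
    """Center-to-center distance from a target cell to each neighbor offset."""
    return {off: math.hypot(off[0], off[1]) * cell_size for off in offsets}


four_connected = neighbor_distances(EDGE_OFFSETS)
eight_connected = neighbor_distances(EDGE_OFFSETS + DIAGONAL_OFFSETS)
print(four_connected)
print(round(eight_connected[(1, 1)], 2))
```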
Rectangular
commonly occurs for lat/long when projected, data collected at 1 degree by 1 degree will be varying sized rectangles
Triangular and hexagonal
all adjacent cells and points are equidistant
Triangulated irregular network (TIN)
vector model used to represent continuous surfaces (elevation) and more later under vector
Example of tessellations
neverending data
What happens to the size of a square grid cell when it is divided evenly on a recursive basis?
side length decreases by half, number of cells increases fourfold, area of each cell decreases to 1/4
What does resampling do with combining the 4 cell values?
storage increases if you save all samples but can save processing costs if some operations don’t need high resolution
How does nominal/binary data save storage?
using maximum block representation, all blocks with the same value at any one level in the tree can be stored as a single value
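One way to picture maximum block representation is a quadtree-style tree, where a square block that holds a single value is stored once and a mixed block is split into four. The sketch below is an illustration under assumptions: the grid is square with a side length that is a power of 2, the quadrant order (NW, NE, SW, SE) is arbitrary, and the names are made up.

```python
def build_tree(grid, row=0, col=0, size=None):
    """Store a uniform block as one value; split a mixed block into 4 quadrants."""
    if size is None:
        size = len(grid)  # assumes a square grid, side a power of 2
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first  # the whole block is stored as a single value
    half = size // 2
    return [
        build_tree(grid, row, col, half),                # NW
        build_tree(grid, row, col + half, half),         # NE
        build_tree(grid, row + half, col, half),         # SW
        build_tree(grid, row + half, col + half, half),  # SE
    ]


def count_stored_values(node):
    """Number of single values the tree needs to store."""
    if isinstance(node, list):
        return sum(count_stored_values(child) for child in node)
    return 1


binary = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_tree(binary)
print(tree, count_stored_values(tree))
```

For this binary grid the tree stores 7 values instead of 16 cells, because three of the four quadrants are uniform.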
Band sequential (BSQ)
each characteristic in separate file, elevation file, temp file, good for compression, good if focus on one area
Band interleaved by pixel (BIP)
all measurements for a pixel grouped together, good if focus on multiple characteristics of geographical area, bad if you want to remove/add a layer
Band interleaved by line (BIL)
rows follow each other for each characteristic, not good at compressing anything
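The three interleaving schemes differ only in the order values are written out. The sketch below flattens two tiny made-up bands (elevation and temp) into one sequence per scheme so the orderings can be compared. Putting BSQ into a single sequence rather than separate files is a simplification, and all names are hypothetical.

```python
# Two hypothetical 2 x 2 bands covering the same area.
bands = {
    "elevation": [[1, 2], [3, 4]],
    "temp": [[5, 6], [7, 8]],
}


def to_bsq(layers):
    """Band sequential: all of band 1, then all of band 2."""
    return [v for band in layers.values() for row in band for v in row]


def to_bip(layers):
    """Band interleaved by pixel: every band's value for pixel 1, then pixel 2, ..."""
    stack = list(layers.values())
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    return [
        band[r][c]
        for r in range(n_rows)
        for c in range(n_cols)
        for band in stack
    ]


def to_bil(layers):
    """Band interleaved by line: row 1 of each band, then row 2 of each band, ..."""
    stack = list(layers.values())
    n_rows = len(stack[0])
    return [v for r in range(n_rows) for band in stack for v in band[r]]


bsq, bip, bil = to_bsq(bands), to_bip(bands), to_bil(bands)
print(bsq)
print(bip)
print(bil)
```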
Raster data structures database representation
raw data may come in BSQ, BIP, BIL but these aren’t good for efficient GIS processing, represented as standard database table, joins based on ID as the key field can be used to relate variables in different tables
Generic raster data model is actually implemented in several different computer file formats
GRID is ESRI’s proprietary format for storing and processing raster data, used only in ESRI software, standard industry image formats can be used to display raster data but not for analysis, georeferencing information is required to display images with mapped vector data, which requires an accompanying world file to provide locational information, geotiff
Geotiff
single file incorporates both the image and the world information
Point (node)
0-dimension, single x, y coordinate pair, 0 area, tree, oil well, label location
Line (arc, polyline)
1-dimension, 2+ connected x, y coordinates, road and stream
Polygon
2-dimension, 4+ ordered and connected x, y coordinates (4 minimum), first and last x, y pairs are the same, encloses an area, census tracts, county, and lakes
Whole polygon (boundary structure)
polygons described by listing coordinates of points in order as you walk around the outside boundary of the polygon, all data stored in 1 file, attribute data for polygon can also be stored (inefficiently) in same file, coordinates/borders for adjacent polygons are stored 2x and may not be the same, which results in slivers (gaps)/overlaps, all lines are doubled (except for those on the outside periphery), no topological information about polygons, used by 1st computer mapping program, SYMAP, in late 60s, adopted by SAS/GRAPH and many business thematic mapping programs
Points and polygons
polygons described by listing ID numbers of points in order as you walk around the outside boundary, a second file lists all points and their coordinates, solves duplicate coordinate/double border problem, lines can be handled similar to polygons, no topological information, used 1st by CALFORM, 2nd generation mapping package from Laboratory for Computer Graphics and Spatial Analysis at Harvard in early 70s and good for saving room
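The contrast between the whole polygon structure and the points and polygons structure can be sketched with two adjacent unit squares. The coordinates, IDs, and names below are invented for illustration; the only point is that the shared border's coordinates are stored twice in the first structure and once in the second.

```python
# Whole polygon structure: every polygon repeats its own coordinates,
# so the shared border (1,0)-(1,1) is stored twice.
whole_polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

# Points and polygons structure: one table of points with coordinates...
points = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 0), 6: (2, 1)}
# ...and polygons that list point IDs in boundary order.
polygons = {"A": [1, 2, 3, 4, 1], "B": [2, 5, 6, 3, 2]}


def polygon_coordinates(polygon_id):
    """Rebuild a polygon's coordinate list by looking up each point ID."""
    return [points[point_id] for point_id in polygons[polygon_id]]


coords_stored_whole = sum(len(ring) for ring in whole_polygons.values())
coords_stored_points = len(points)
print(coords_stored_whole, coords_stored_points)
```

Because both polygons reference points 2 and 3, editing a shared point moves both borders together, which is what avoids slivers and overlaps.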
Node/arc/polygon topology
topological components which permit relationships between spatial elements to be defined (always going clockwise)
ARC
defines relations between points by specifying which are connected to form arcs, defines relationships between arcs by specifying which arcs are connected to form routes and networks
Polygon
define polygons (areas) by specifying which arcs comprise their boundary
Left-right
defines from nodes and to nodes that permit left polygon and right polygon to be specified for each arc, needs a separate topology file
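A minimal sketch of left-right topology, assuming a made-up arc table in which each arc records its from node, to node, left polygon, and right polygon. Polygons A and B share arc a2, and "outside" stands for the area beyond the map. None of the names come from a real coverage file.

```python
# Hypothetical arc topology table. Left and right are judged while
# travelling along the arc from its from node to its to node.
arcs = {
    "a1": {"from": "n1", "to": "n2", "left": "A", "right": "outside"},
    "a2": {"from": "n2", "to": "n1", "left": "A", "right": "B"},
    "a3": {"from": "n1", "to": "n2", "left": "outside", "right": "B"},
}


def arcs_bounding(polygon_id):
    """Polygon topology: the arcs that make up a polygon's boundary."""
    return sorted(
        arc_id
        for arc_id, rec in arcs.items()
        if polygon_id in (rec["left"], rec["right"])
    )


def shared_arcs(polygon_a, polygon_b):
    """Adjacency: arcs with one polygon on the left and the other on the right."""
    return sorted(
        arc_id
        for arc_id, rec in arcs.items()
        if {rec["left"], rec["right"]} == {polygon_a, polygon_b}
    )


print(arcs_bounding("A"))
print(shared_arcs("A", "B"))
```

The shared border is stored once as arc a2, and adjacency falls out of the table without comparing any coordinates.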
Features in theme (coverage) have unique identifiers
common identifiers with links to coordinate table and attribute table, concepts are those of a relational database and are really a prerequisite for the vector model
Points
elevation points (nodes) chosen based on relief complexity and their 3D location determined
Polygons
elevation points connected to form a set of triangular polygons then represented in a vector structure
Attribute
attribute via relational DBMS
Coverage
multiple physical files (12+) in folder, proprietary
KML/KMZ
keyhole markup language, KMZ is zipped KML file and associated files
OpenStreetMap
crowdsourced GIS data project, XML based file formats, .osm.pbf denotes an OSM Protocolbuffer Binary Format file
Shapefile
comprised of several files all of which must be present, additional files with .sbn and .sbx, openly published specs so other vendors can create them, 2-5 files
Geodatabase
multiple layers saved in single .gdb
Object view
real world is a series of entities located in space, object is digital representation of an entity with 3 types of point, line, area objects, same entity can be represented at different scales by different object types, behavior can be associated with objects
Field view
real world has properties which vary continuously over space, can be represented as raster or vector, if the value is a categorical/integer variable then places with the same value can be grouped, floating point data can be categorized and then treated as objects
Representing surfaces
surfaces add a 3rd elevation (z) value to x, y horizontal values, complex to represent since there are an infinite number of potential points to model, 3-4 alternative digital terrain model approaches available: raster-based digital elevation model, vector based triangulated irregular networks, vector-based contour lines, massed points and breaklines
Raster-based digital elevation model
regularly spaced set of elevation points
Vector based triangulated irregular networks
irregular triangles with elevations at the 3 corners
Vector-based contour lines
lines joining points of equal elevation at a specified interval
Massed points and breaklines
raw data from which 1 of the other approaches is derived, massed points are regular/irregularly spaced point elevations, plus breaklines
Breaklines
point elevations along a line of significant change in slope (valley floor, ridge crest)
Digital elevation model
sampled array of elevations at regularly spaced intervals in x and y directions, 2 approaches for determining surface z value of a location between sample points: in a lattice each mesh point represents a value on the surface only at the center of the grid cell, the z value between points is approximated by interpolation between adjacent sample points and doesn’t imply an area of constant value; a surface grid considers each sample as a square cell with constant surface value
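The two ways of reading a z value between sample points can be compared on a tiny made-up DEM. The lattice reading here uses bilinear interpolation, which is only one possible interpolation and is an assumption; the surface grid reading just returns the value of the cell the location falls in. Coordinates are in cell units with sample points at whole numbers, and the function names are hypothetical.

```python
import math


def lattice_value(dem, x, y):
    """Lattice: interpolate (bilinear, assumed) between the 4 surrounding samples."""
    c0, r0 = math.floor(x), math.floor(y)
    c1 = min(c0 + 1, len(dem[0]) - 1)
    r1 = min(r0 + 1, len(dem) - 1)
    fx, fy = x - c0, y - r0  # fractional position between sample points
    top = dem[r0][c0] * (1 - fx) + dem[r0][c1] * fx
    bottom = dem[r1][c0] * (1 - fx) + dem[r1][c1] * fx
    return top * (1 - fy) + bottom * fy


def surface_grid_value(dem, x, y):
    """Surface grid: each sample is a square cell with one constant value."""
    col = math.floor(x + 0.5)  # nearest sample point = the containing cell
    row = math.floor(y + 0.5)
    return dem[row][col]


dem = [
    [10.0, 20.0],
    [30.0, 40.0],
]
print(lattice_value(dem, 0.25, 0.25))
print(surface_grid_value(dem, 0.25, 0.25))
```

At the same location the lattice gives a blended elevation while the surface grid returns the flat value of the nearest sample's cell.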