“You can have data without information, but you cannot have information without data.”

— Daniel Keys Moran, author and computer programmer

Spatial Data

What’s so special about spatial?

“GIS is like the machinery that transforms data into the commodity–information–that is needed to solve problems or create opportunities.”

DiBiase (2018)

GIS data files are complicated. One GIS dataset often consists of multiple linked files that contain different bits of information. It is common for a particular GIS dataset to have a data file and then separate files used to describe the spatial parts of the data.

Spatial data is traditionally found in one of two formats: vector or raster.

Vector | Raster
Mathematical equations | Pixel based
Easily scaled without losing quality | Does not scale up very well; typically in a prescribed resolution
Large dimensions maintain small file size | Large dimensions have large file sizes
Can be easily converted into raster format | Depending on complexity, conversion to vector can be very time consuming
Vector points | Raster points
Vector lines | Raster lines
Vector areas | Raster areas
Example: Vectors in Action | Example: Raster drawing

Vector Data

Vector Data
a classification of spatially referenced data, which includes points, lines (polylines) and areas (polygons)
Points
objects represented by a single pair of locational coordinates (for example, x and y or latitude and longitude); dimension = 0
Lines
sets of ordered and connected points that may describe linear features (for example, boundaries, roads, or rivers); dimension = 1
Areas
collections of line segments that close to form discrete units; dimension = 2
Main Concepts:
  • Attributes
  • Feature Class
  • Shapefile

Vector data is the generic term for GIS data built up from defined points in space. These can simply be points or can be linked into lines and polygons. These features may also carry additional non-spatial information called attributes. A GIS stores these attributes in a table that is linked to the spatial data.
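To make the link between geometry and attributes concrete, here is a minimal sketch in plain Python. It is not how any particular GIS stores its data; the feature IDs, names and coordinates are made up for illustration, and the coordinates are assumed to be planar (x, y) pairs in arbitrary units.

```python
import math

# Geometry: each feature is built from coordinate pairs.
point = (2.0, 3.0)                                            # dimension 0
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]                   # dimension 1
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]    # dimension 2, ring closes implicitly

# Features are keyed by a hypothetical feature ID ...
features = {1: point, 2: line, 3: polygon}

# ... and the attribute table uses the same IDs to link back to the geometry.
attributes = {
    1: {"name": "Well A", "kind": "point"},
    2: {"name": "Creek Road", "kind": "line"},
    3: {"name": "Parcel 12", "kind": "area"},
}


def line_length(coords):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Looking up `attributes[3]` alongside `polygon_area(features[3])` mirrors what happens when you click a polygon in a GIS and see both its shape and its table row.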

Vector data does not really have a resolution because each vertex identifies a unique location in space. Each location is recorded with a certain precision (e.g., GPS is often ±10 meters or so), but this precision may not be known. So even without a resolution, vector data can still be imprecise or inaccurate!

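A common file format for storing vector data is the shapefile. A shapefile is actually a collection of files, all of which share the same base name but have different file extensions. Three are mandatory: .shp (the feature geometry), .shx (an index into the geometry) and .dbf (the attribute table). A fourth, .prj (the coordinate system and projection), is almost always included as well.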

A collection of same-type features (point, line or polygon) is called a feature class.

The key point here is that you have to keep all of the different parts of shapefiles together for the data to work.

Be careful when moving GIS files!
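One way to guard against a broken copy is to check that the companion files travelled with the .shp file. The sketch below assumes the usual shapefile components (.shp, .shx and .dbf, plus the commonly included .prj) and lowercase extensions; the function name is hypothetical and not part of any GIS library.

```python
from pathlib import Path

# Mandatory parts of a shapefile, plus the projection file that is usually present.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
RECOMMENDED_EXTENSIONS = (".prj",)


def missing_shapefile_parts(shp_path):
    """Return the extensions of companion files that are not found on disk.

    Only the final suffix is swapped, so a name such as 'my.roads.shp'
    is checked against 'my.roads.shx', 'my.roads.dbf' and so on.
    """
    shp = Path(shp_path)
    wanted = REQUIRED_EXTENSIONS + RECOMMENDED_EXTENSIONS
    return [ext for ext in wanted if not shp.with_suffix(ext).exists()]
```

Calling `missing_shapefile_parts("data/roads.shp")` after moving a dataset returns an empty list when all four files arrived together.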

Raster Data

Raster data is the GIS term for gridded data: an equally spaced grid where each cell (or pixel) has one value (digital number or DN) that represents the dominant value of that cell. These values can be continuous (e.g., elevation or temperature) or discrete (e.g., population density classes or habitat categories). Sometimes raster data may contain null or ‘no data’ cells. Examples of raster data include aerial and satellite imagery.
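As a rough illustration, a raster can be thought of as a list of rows, each holding one value per cell. The grid values below are invented, and the `-9999` 'no data' marker is an assumption made for this sketch (real datasets declare their own value).

```python
NODATA = -9999  # assumed 'no data' marker for this sketch

# A tiny 3 x 3 elevation raster (continuous data), one DN per cell.
elevation = [
    [120, 125, 130],
    [118, NODATA, 128],
    [115, 119, 124],
]


def valid_cells(grid, nodata=NODATA):
    """Flatten the grid into a list, skipping 'no data' cells."""
    return [value for row in grid for value in row if value != nodata]


def mean_value(grid, nodata=NODATA):
    """Mean of all valid cells, or None if every cell is 'no data'."""
    cells = valid_cells(grid, nodata)
    return sum(cells) / len(cells) if cells else None
```

Skipping the 'no data' cell matters: including `-9999` in the average would drag the mean elevation far below any real value in the grid.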

Raster datasets come in three varieties (Dempsey 2012):

The main advantage of raster data is the ability to portray continuous data that cannot be well represented by points, lines or polygons.

The main disadvantage of raster data is its lower positional accuracy compared to vector data: a feature can be located no more precisely than the size of a cell.

Raster Data
a classification of spatially referenced data, which comprises cell-based surfaces
Cell
a square or rectangular pixel in a raster dataset
Continuous data
data that do not have well-defined geographic boundaries and when mapped appear to have smooth transitions from one value to another (e.g., temperature, elevation)
Discrete data
data that have specific values in fixed locations (e.g., buildings, roads, land cover types)
DN
digital number; the value of a raster cell
Pixel
short for picture element; the smallest element of information in an image

Georeferencing

A georeferenced raster dataset has several pieces of geographic information:

  • the projection of the raster data
  • the resolution of the cells (dimension of each cell in the units of the projection)
  • a reference point with known coordinates (usually the top left corner of the grid)
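With the cell resolution and the reference point in hand, any cell can be placed in map coordinates. The sketch below assumes a north-up raster with square cells, a reference point at the outer corner of the top left cell, and y values that decrease as the row number increases; the function names and the example coordinates are hypothetical.

```python
def cell_center(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of the center of the cell at (row, col).

    Rows count downward from the top edge and columns count rightward
    from the left edge, both starting at zero.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


def cell_index(x, y, origin_x, origin_y, cell_size):
    """Row and column of the cell that contains the map coordinate (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col
```

For example, with a 30-unit cell size and a top left corner at (500000, 4200000), `cell_center(2, 3, 500000.0, 4200000.0, 30.0)` gives the center of the cell in the third row and fourth column.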

Many raster datasets are already georeferenced. In these cases, when the raster is added to a GIS map, the data will automatically appear in the correct place with reference to the projection.

In other cases, the raster data is not georeferenced—this is often the case with aerial photos, some satellite images or scans of paper maps. In order to turn such data into a GIS dataset, the data need to be georeferenced.

In georeferencing, a set of ground control points (GCPs) is used to orient the image in space, and this oriented image is then resampled into a new dataset with cell boundaries and cell size defined in the projection.
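The orienting step can be sketched as fitting a transformation from image positions to map coordinates. The example below assumes the simplest case, an affine transformation determined exactly by three GCPs; GIS software typically accepts more points and fits them by least squares, and the resampling step is not shown. All function names and coordinates are hypothetical.

```python
def solve_3x3(rows, rhs):
    """Solve a 3 x 3 linear system with Cramer's rule."""
    def det(m):
        (a, b, c), (d, e, f), (g, h, i) = m
        return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

    d0 = det(rows)
    if abs(d0) < 1e-12:
        raise ValueError("control points are collinear; choose different GCPs")
    solution = []
    for k in range(3):
        m = [list(r) for r in rows]
        for r in range(3):
            m[r][k] = rhs[r]          # replace column k with the right-hand side
        solution.append(det(m) / d0)
    return solution


def fit_affine(gcps):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f.

    gcps is a list of exactly three ((col, row), (x, y)) pairs.
    """
    rows = [(col, row, 1.0) for (col, row), _ in gcps]
    x_params = solve_3x3(rows, [xy[0] for _, xy in gcps])
    y_params = solve_3x3(rows, [xy[1] for _, xy in gcps])
    return x_params, y_params


def apply_affine(params, col, row):
    """Convert an image position to map coordinates."""
    (a, b, c), (d, e, f) = params
    return a * col + b * row + c, d * col + e * row + f
```

Three GCPs that fall on a straight line cannot fix the orientation, which is why the sketch rejects them; in practice, control points are spread across the whole image.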

Georeferencing
to define something in physical space (e.g., location, coordinate reference system, map projection)

Metadata

“The data about other data.”

—Merriam-Webster

Evaluating the quality of data can be difficult, especially if it was created by someone else. Those who create data therefore have an obligation to include a report that summarizes the data quality (e.g., the spatial accuracy) along with other aspects of the data (e.g., the creation date or date of last update, the geographic area/extent, the coordinate system, explanations of attributes, and copyright information and/or use restrictions). The report should inform potential users of the data’s limitations and uses so that they can determine whether it is suited to their purpose. This collection of information about data is called metadata (Price 2012).

Geographic metadata seeks to answer questions, such as:

Standards

The Federal Geographic Data Committee (FGDC) and the International Organization for Standardization (ISO) have worked together to develop metadata standards, including:

  • Content Standard for Digital Geospatial Metadata (CSDGM)
  • ISO 19115

With several standards to choose from, preparing a complete set of metadata can be a daunting task; however, in most cases, some information is better than none. Regardless of the standard, the core components of the metadata record should include (https://www.fgdc.gov/metadata):

Metadata Record Information
information about the metadata record including the language in which the record is written, a unique file identifier for the metadata record, the metadata standard used to organize the record, a point of contact for the metadata record, and the date that the metadata record was written
Identification Information
citation-level information about the data including the title, abstract, purpose for creation, status, keywords (theme and place), and extent (temporal, vertical and horizontal)
Constraints Information
information about legal and security limitations to data access and use
Data Quality Information
information about the processes and sources used to develop the data and positional and/or accuracy assessments performed
Maintenance Information
information about the scope and frequency of data updates
Spatial Representation
information about the mechanism used to represent spatial data (grid, point, vector)
Reference System Information
information about the reference systems used to represent geographic position and time
Content Information
information about the data set entities and attributes
Symbology Information
information about the symbols used to represent spatial features
Distribution Information
information about the data distributors and methods for obtaining the data
Metadata Extension Information
information about custom, user-based changes to the elements, domains or conditionality of the standard
Application Schema Information
information about the schema or data models used to structure the data
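Because some information is better than none, it can help to see at a glance which core components a record still lacks. The sketch below is only an illustration: the section keys paraphrase the components listed above and are not element names from CSDGM or ISO 19115, and the example record is made up.

```python
# Hypothetical keys, one per core component listed above.
CORE_SECTIONS = (
    "metadata_record",
    "identification",
    "constraints",
    "data_quality",
    "maintenance",
    "spatial_representation",
    "reference_system",
    "content",
    "symbology",
    "distribution",
    "metadata_extension",
    "application_schema",
)


def missing_sections(record):
    """Return the core sections that are absent or empty in a record."""
    return [name for name in CORE_SECTIONS if not record.get(name)]


# A deliberately incomplete, invented record.
example_record = {
    "identification": {"title": "County roads", "purpose": "Teaching example"},
    "reference_system": {"horizontal": "NAD83 / UTM zone 17N"},
    "constraints": {"use": "Educational use only"},
}

if __name__ == "__main__":
    for name in missing_sections(example_record):
        print("missing:", name)
```

A partial record like this one is still useful to a reader; the list of missing sections simply shows what remains to be documented.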

Content Standard for Digital Geospatial Metadata (CSDGM) graphical representation. FGDC.gov.

Databases

Geographic data (e.g., vector, raster and table) may be collectively stored together in what is called a geodatabase—a versatile format convenient for data editing and management.

There are several types of geodatabases. The one we focus on in this class is the file geodatabase.

File Geodatabase

This database type is designed for individuals or small groups. Each dataset is stored as a separate file within a file system folder, and each file can be up to one terabyte in size.

File geodatabases are best for cross-platform operations (i.e., accessible by multiple operating system architectures, such as Windows, Macintosh, and Linux) (Price 2012).

Database design

Good databases do not happen by accident (Price 2012). Geodatabases are created as empty shells. These shells have a defined organizational structure (this data model is called a schema) into which feature classes and other objects can be added.

One organizational method is to create feature datasets. A feature dataset is a collection of feature classes that are related to one another (e.g., they are categorically similar) and share the same spatial reference.

One advantage of geodatabases is the ability to define a set of rules for data attributes. This is accomplished through the use of an attribute domain—pre-defined attribute value constraints.

There are two types of domains: range domains and coded domains.

Range domains
A specification of the lowest and highest acceptable values.
Coded domains
A specified list of acceptable values. These values are made up of code and textual description pairs.
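The two domain types can be sketched as simple validation rules. This is not how a geodatabase implements domains internally; the field names, the speed limit range and the surface codes are hypothetical values chosen for illustration.

```python
# Range domain: lowest and highest acceptable values (hypothetical limits).
SPEED_LIMIT_RANGE = (5, 85)

# Coded domain: code and textual description pairs (hypothetical codes).
SURFACE_CODES = {"PAV": "Paved", "GRV": "Gravel", "DRT": "Dirt"}


def in_range_domain(value, low, high):
    """True when the value lies between the limits, inclusive."""
    return low <= value <= high


def in_coded_domain(code, coded_values):
    """True when the code is one of the listed values."""
    return code in coded_values


def validate_road(row):
    """Return a list of domain violations for one attribute table row."""
    errors = []
    if not in_range_domain(row["speed_limit"], *SPEED_LIMIT_RANGE):
        errors.append("speed_limit outside range domain")
    if not in_coded_domain(row["surface"], SURFACE_CODES):
        errors.append("surface not in coded domain")
    return errors
```

A row such as `{"speed_limit": 55, "surface": "PAV"}` passes both checks, while a mistyped code or an implausible speed is caught before it enters the table.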

References

Dempsey, Caitlin. 2012. “GIS Data Explored - Vector and Raster Data.” https://www.gislounge.com/geodatabases-explored-vector-and-raster-data/.

DiBiase, David. 2018. The Nature of Geographic Information. University Park, PA: Penn State’s College of Earth and Mineral Sciences. https://www.e-education.psu.edu/natureofgeoinfo/.

Price, Maribeth. 2012. Mastering ArcGIS. 5th ed. New York, NY: McGraw Hill.