“You can have data without information, but you cannot have information without data.”
— Daniel Keys Moran, author and computer programmer
“GIS is like the machinery that transforms data into the commodity–information–that is needed to solve problems or create opportunities.”
— DiBiase (2018)
GIS data files are complicated. One GIS dataset often consists of multiple linked files that contain different bits of information. It is common for a particular GIS dataset to have a data file and then separate files used to describe the spatial parts of the data.
Spatial data is traditionally found in one of two formats: vector or raster.
Vector | Raster |
---|---|
Mathematical equations | Pixel based |
Easily scaled without losing quality | Does not scale up very well; typically in a prescribed resolution |
Large dimensions maintain small file size | Large dimensions have large file sizes |
Can be easily converted into raster format | Depending on complexity, conversion to vector can be very time consuming |
Example: Vectors in Action | Example: Raster drawing |
Vector data is the generic term for GIS data built up from defined points in space. These can simply be points or can be linked into lines and polygons. These features may also include additional non-spatial information called attributes. A GIS stores these attributes in a table, which is linked to the spatial data.
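The link between geometry and attributes can be sketched in a few lines of Python. This is only an illustration, not how any particular GIS stores its data: the `Feature` class, the feature IDs and the attribute values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    fid: int        # feature ID: the key that links geometry to attributes
    geom_type: str  # "Point", "LineString" or "Polygon"
    coords: list    # list of (x, y) tuples


# Spatial part: geometries built up from defined points in space.
# (All three types are mixed here only to show them side by side.)
features = [
    Feature(1, "Point", [(10.0, 20.0)]),
    Feature(2, "LineString", [(0.0, 0.0), (5.0, 5.0), (10.0, 5.0)]),
    Feature(3, "Polygon", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]),
]

# Non-spatial part: an attribute table keyed by the same feature ID.
attributes = {
    1: {"name": "Well A", "depth_m": 35},
    2: {"name": "Creek", "kind": "stream"},
    3: {"name": "Parcel 7", "owner": "unknown"},
}


def describe(feature):
    """Join one feature to its attribute row via the shared ID."""
    row = attributes[feature.fid]
    return f"{feature.geom_type} '{row['name']}' with {len(feature.coords)} vertices"


for feature in features:
    print(describe(feature))
```

The only thing tying the two parts together is the shared ID, which is why losing or reordering one part without the other breaks the dataset.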
Vector data does not really have a resolution, because each point identifies a unique location in space. Each point is recorded with a certain precision (e.g., GPS is often ±10 meters or so), but this precision may not be known. Having no fixed resolution does not make vector data exact: it can still be imprecise or inaccurate!
A common file format for storing vector data is the shapefile. A shapefile is actually a collection of files (at least three are required, and a fourth, the projection file, is almost always present), all of which share the same name but have different file extensions, each holding a different kind of information.
A collection of same-type features (point, line or polygon) is called a feature class.
The key point here is that you have to keep all of the different parts of shapefiles together for the data to work.
Be careful when moving GIS files!
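One way to avoid leaving parts behind is to check for, and move, all the pieces together. The sketch below uses only the Python standard library. It assumes the four most commonly bundled extensions (`.shp`, `.shx`, `.dbf`, `.prj`); the function names and the `data/roads.shp` path are hypothetical, and other sidecar files (such as `.cpg` or `.sbn`) would need to be added to the list if your data uses them.

```python
import shutil
from pathlib import Path

# Assumed set of parts: geometry, index, attribute table, coordinate system.
# (The .prj file is optional in the format itself but usually present.)
PARTS = (".shp", ".shx", ".dbf", ".prj")


def shapefile_parts(shp_path):
    """Return (found, missing) lists of file names for the assumed parts."""
    src = Path(shp_path)
    found, missing = [], []
    for ext in PARTS:
        part = src.with_suffix(ext)  # same name, different extension
        (found if part.exists() else missing).append(part.name)
    return found, missing


def move_shapefile(shp_path, dest_dir):
    """Move every existing part of the shapefile into dest_dir together."""
    src = Path(shp_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for ext in PARTS:
        part = src.with_suffix(ext)
        if part.exists():
            shutil.move(str(part), str(dest / part.name))
            moved.append(part.name)
    return moved


found, missing = shapefile_parts("data/roads.shp")  # hypothetical path
if missing:
    print("Missing parts:", ", ".join(missing))
```

Checking before moving makes an incomplete dataset visible at the point where it can still be fixed, rather than later when the layer fails to load.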
Raster data is the GIS term for gridded data: an equally spaced grid where each cell (or pixel) has one value (digital number or DN) that represents the dominant value of that cell. These values can be continuous (e.g. elevation or temperature) or discrete (e.g. population density classes or habitat categories). Sometimes raster data may contain null or ‘no data’ cells. Examples of raster data include aerial and satellite imagery.
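A raster can be pictured as a plain grid of numbers. The following sketch uses a nested Python list for a tiny elevation grid; the values and the `-9999` sentinel for ‘no data’ cells are assumptions chosen for illustration (real datasets declare their own no-data value).

```python
NODATA = -9999  # assumed sentinel marking 'no data' cells

# A tiny 3 x 4 elevation raster (meters); each cell holds exactly one value.
elevation = [
    [120, 125, 130, NODATA],
    [118, 122, 128, 131],
    [NODATA, 119, 124, 127],
]


def valid_cells(grid, nodata=NODATA):
    """Flatten the grid and drop the 'no data' cells."""
    return [value for row in grid for value in row if value != nodata]


def mean_ignoring_nodata(grid, nodata=NODATA):
    """Mean of the valid cells, or None if every cell is 'no data'."""
    cells = valid_cells(grid, nodata)
    return sum(cells) / len(cells) if cells else None


print(len(valid_cells(elevation)), "valid cells")
print("mean elevation:", mean_ignoring_nodata(elevation))
```

Skipping the sentinel matters: averaging the raw grid would pull the result far below any real elevation in the data.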
Raster datasets come in three varieties (Dempsey 2012):
The main advantage of raster data is the ability to portray continuous data that cannot be well represented by points, lines or polygons.
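The main disadvantage of raster data is its lower spatial precision compared to vector data: every feature is generalized to the cell size, so boundaries and locations can be no more exact than the grid allows.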
A georeferenced raster dataset has several pieces of geographic information:
Many raster datasets are already georeferenced. In these cases, when the raster is added to a GIS map, the data will automatically appear in the correct place with reference to the projection.
In other cases, the raster data is not georeferenced—this is often the case with aerial photos, some satellite images or scans of paper maps. In order to turn such data into a GIS dataset, the data need to be georeferenced.
In georeferencing, a set of ground control points (GCPs) is used to orient the image in space; the oriented image is then resampled into a new dataset whose cell boundaries and cell size are defined in the projection.
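The orientation step can be illustrated with a least-squares affine fit. This is a minimal sketch in plain Python, not what any particular GIS package does internally: it assumes a first-order (affine) transformation, the function names are hypothetical, the GCP coordinates are invented, and the resampling step is not shown.

```python
def solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]  # augmented matrix
    n = 3
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        # A (near-)zero pivot means the system is degenerate,
        # e.g. all GCPs lie on one line.
        if abs(a[pivot][i]) < 1e-12:
            raise ValueError("degenerate GCP configuration")
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n + 1):
                a[r][c] -= factor * a[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][c] * x[c] for c in range(i + 1, n))) / a[i][i]
    return x


def fit_affine(gcps):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f by least squares.

    gcps is a list of ((col, row), (x, y)) pairs; at least three are needed.
    """
    if len(gcps) < 3:
        raise ValueError("need at least three GCPs")
    rows = [(col, row, 1.0) for (col, row), _ in gcps]
    # Normal equations: (A^T A) p = A^T t, solved once per map axis.
    ata = [[sum(p[i] * p[j] for p in rows) for j in range(3)] for i in range(3)]

    def fit(targets):
        atb = [sum(p[i] * t for p, t in zip(rows, targets)) for i in range(3)]
        return solve3(ata, atb)

    px = fit([xy[0] for _, xy in gcps])
    py = fit([xy[1] for _, xy in gcps])
    return px, py


def pixel_to_map(px, py, col, row):
    """Apply the fitted transform to one pixel position."""
    return (px[0] * col + px[1] * row + px[2],
            py[0] * col + py[1] * row + py[2])


# Invented GCPs: image (col, row) matched to map (x, y), 30 m cells.
gcps = [
    ((0, 0), (500000.0, 4200000.0)),
    ((100, 0), (503000.0, 4200000.0)),
    ((0, 100), (500000.0, 4197000.0)),
    ((100, 100), (503000.0, 4197000.0)),
]
px, py = fit_affine(gcps)
print(pixel_to_map(px, py, 50, 50))
```

With more than three GCPs the fit is overdetermined, so the leftover misfit at each control point gives a rough check on how well the image has been oriented.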
“The data about other data.”
— Merriam-Webster
Evaluating the quality of data can be difficult, especially if it was created by someone else. Those who create data therefore have an obligation to include a report that summarizes the data quality (e.g., the spatial accuracy) along with other aspects of the data (e.g., the creation date or date of last update, the geographic area/extent, the coordinate system, explanations of attributes, and copyright information and/or use restrictions). The report should inform potential users of the data’s limitations and uses so that they can determine whether it is suited to their purpose. This collection of information about data is called metadata (Price 2012).
Geographic metadata seeks to answer questions, such as:
The Federal Geographic Data Committee (FGDC) and the International Organization for Standardization (ISO) have worked together to develop metadata standards, including:
Faced with several standards, preparing a complete set of metadata can be a daunting task; however, in most cases, some information is better than none. Regardless of the standard, the core components of the metadata record should include (https://www.fgdc.gov/metadata):
Geographic data (e.g., vector, raster and table) may be collectively stored together in what is called a geodatabase—a versatile format convenient for data editing and management.
There are several types of geodatabases. The one we focus on in this class is the File geodatabase.
This database type is designed for individuals or small groups. Each dataset is stored as a separate file within a system folder, and each file can be up to one terabyte in size.
File geodatabases are best for cross-platform operations (i.e., accessible by multiple operating system architectures, such as Windows, Macintosh, and Linux). (Price 2012)
Good databases do not happen by accident (Price 2012). Geodatabases are created as empty shells. These shells have a defined organizational structure (this data model is called a schema) into which feature classes and other objects can be added.
One organizational method is to create feature datasets. A feature dataset is a collection of feature classes that are related to one another (e.g., they are categorically similar) and share the same spatial reference.
One advantage of geodatabases is the ability to define a set of rules for data attributes. This is accomplished through the use of an attribute domain—pre-defined attribute value constraints.
There are two types of domains: range domains and coded domains.
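The two domain types can be sketched as simple validation rules. This is an illustration in plain Python rather than the geodatabase's own implementation; the class names, the `speed_limit` and `surface` fields, and their allowed values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeDomain:
    """Numeric attribute must fall between a minimum and a maximum."""
    minimum: float
    maximum: float

    def is_valid(self, value):
        return self.minimum <= value <= self.maximum


@dataclass
class CodedDomain:
    """Attribute must be one of a fixed set of codes (code -> description)."""
    codes: dict

    def is_valid(self, value):
        return value in self.codes


# Hypothetical domains for a roads feature class.
DOMAINS = {
    "speed_limit": RangeDomain(5, 75),
    "surface": CodedDomain({"P": "Paved", "G": "Gravel", "D": "Dirt"}),
}


def invalid_fields(row, domains=DOMAINS):
    """Return the names of fields whose values violate their domain."""
    return [name for name, domain in domains.items()
            if name in row and not domain.is_valid(row[name])]


print(invalid_fields({"speed_limit": 55, "surface": "P"}))
print(invalid_fields({"speed_limit": 120, "surface": "X"}))
```

Defining the rule once and applying it to every row is the point of a domain: invalid values are caught at entry instead of being discovered during analysis.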
Dempsey, Caitlin. 2012. “GIS Data Explored - Vector and Raster Data.” https://www.gislounge.com/geodatabases-explored-vector-and-raster-data/.
DiBiase, David. 2018. The Nature of Geographic Information. University Park, PA: Penn State’s College of Earth and Mineral Sciences. https://www.e-education.psu.edu/natureofgeoinfo/.
Price, Maribeth. 2012. Mastering ArcGIS. 5th ed. New York, NY: McGraw Hill.