GIS Data Elements and Models

Although the terms data and information are often used interchangeably, each has a specific meaning. Data can be described as observations that are collected and stored. Information is data that is useful in answering a query or solving a problem. Digitizing a large number of maps yields a large amount of data after hours of painstaking work, but that data only becomes useful information when it is used in analysis.

GIS DATA TYPES:

1.      Spatial Data:  Geographic position refers to the fact that each feature has a location that must be specified uniquely. To specify position in an absolute way, a coordinate system is used. For small areas, the simplest coordinate system is a regular square grid. For larger areas, established cartographic projections are commonly used, and many different coordinate systems are in use internationally. This locational information is provided in maps by using points, lines and polygons. These geometric descriptions are the basic data elements of a map. Thus spatial data describes the absolute and relative location of geographic features.

The coordinate location of a forest would be spatial data, while the characteristics of that forest, e.g. cover group, dominant species, crown closure, height, etc., would be attribute data. Other data types, in particular image and multimedia data, have become more prevalent with changing technology. Depending on its specific content, such data may be considered either spatial, e.g. photographs, animation, movies, etc., or attribute, e.g. sound, descriptions, narrations, etc.

2.      Attribute Data:  The attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not in themselves represent location information. This type of data describes characteristics of the spatial features. These characteristics can be quantitative and/or qualitative in nature. Attribute data is often referred to as tabular data.

GIS DATA MODELS:

A GIS is based on data, so a data model must be followed to standardize procedures. Two kinds of data model are used:

1.      Spatial Data Models

2.      Attribute Data Models

SPATIAL DATA MODELS:

Traditionally spatial data has been stored and presented in the form of a map. Three basic types of spatial data models have evolved for storing geographic data digitally. These are referred to as:

Raster

Vector

Image

The selection of a particular data model, vector or raster, is dependent on the source and type of data, as well as the intended use of the data. Certain analytical procedures require raster data while others are better suited to vector data.

Raster Data Formats:  A simple raster data set is a regular grid of cells divided into rows and columns. In a raster data set, data values for a given parameter are stored in each cell – these values may represent an elevation in meters above sea level, a land use class, a plant biomass in grams per square meter, and so forth. The spatial resolution of the raster data set is determined by the size of the cell.  For example, Landsat TM satellite imagery data are raster data that are corrected to have a cell size of approximately 30 meters on a side. However, spatial resolution can be much finer, or much coarser than 30 meters. In general, spatial resolution is a function of the data collection techniques used, and the desired outcomes.

The size of cells in a tessellated data structure is selected on the basis of the data accuracy and the resolution needed by the user. No explicit coding of geographic coordinates is required, since the coordinates are implicit in the layout of the cells. A raster data structure is in fact a matrix in which any coordinate can be quickly calculated if the origin point and the size of the grid cells are known. Since grid cells can be handled as two-dimensional arrays in computer encoding, many analytical operations are easy to program. This makes tessellated data structures a popular choice for many GIS software packages. Topology is not a relevant concept with tessellated structures, since adjacency and connectivity are implicit in the location of a particular cell in the data matrix.
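The calculation of coordinates from an origin point and a cell size can be sketched as follows. This is a minimal illustration, not the method of any particular GIS package: the class name RasterGrid, its fields, the lower-left origin convention and the sample coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterGrid:
    """Georeferencing of a regular grid (hypothetical names, for illustration)."""
    origin_x: float   # x of the lower-left corner of the grid (assumed convention)
    origin_y: float   # y of the lower-left corner of the grid
    cell_size: float  # length of one cell side, in map units
    n_rows: int
    n_cols: int

    def cell_center(self, row: int, col: int) -> tuple[float, float]:
        """Map coordinates of the centre of cell (row, col); row 0 is the bottom row."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y + (row + 0.5) * self.cell_size
        return x, y

    def cell_at(self, x: float, y: float) -> tuple[int, int]:
        """(row, col) of the cell that contains the map coordinate (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((y - self.origin_y) // self.cell_size)
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("point lies outside the grid")
        return row, col


# A 100 x 100 grid of 30 m cells; the origin values are made up.
grid = RasterGrid(origin_x=500000.0, origin_y=3300000.0,
                  cell_size=30.0, n_rows=100, n_cols=100)
print(grid.cell_center(0, 0))
print(grid.cell_at(500045.0, 3300075.0))
```

Only the five georeferencing values are stored. Every cell location is derived from them by arithmetic, which is why no per-cell coordinates are needed.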

Since geographic features rarely follow regularly spaced shapes, each cell must be classified according to the most common attribute within it. Determining the proper resolution for a particular data layer can therefore be a concern. If the cell size is too coarse, data may be overly generalized. If it is too fine, too many cells may be created, resulting in a large data volume, slower processing times, and a more cumbersome data set. A fine cell size can also imply an accuracy greater than that of the original data capture process, which may produce erroneous results during analysis.

Since most data is captured in a vector format, e.g. by digitizing, it must be converted to the raster data structure. This is called vector-raster conversion. Most GIS software allows the user to define the raster grid (cell) size for vector-raster conversion. It is imperative that the original scale, and hence the accuracy, of the data be known prior to conversion. The accuracy of the data, often referred to as the resolution, should determine the cell size of the output raster map during conversion.

Most raster based GIS software requires that each raster cell contain only a single discrete value. Accordingly, a data layer, e.g. forest inventory stands, may be broken down into a series of raster maps, each representing an attribute type, e.g. a species map, a height map, a density map, etc. These are often referred to as one attribute maps. This is in contrast to most conventional vector data models that maintain data as multiple attribute maps.
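The effect of cell size on vector-raster conversion can be illustrated with a small sketch. Note the assumptions: it assigns a cell to the polygon when the cell centre falls inside it, which is a simpler rule than the most-common-attribute rule described above, and the function names, the rectangle and the grid dimensions are invented for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, value, origin, cell_size, n_rows, n_cols, nodata=0):
    """Return a list of rows (row 0 at the bottom); a cell receives
    `value` when its centre lies inside the polygon (assumed rule)."""
    ox, oy = origin
    grid = []
    for row in range(n_rows):
        cy = oy + (row + 0.5) * cell_size
        grid.append([
            value if point_in_polygon(ox + (col + 0.5) * cell_size, cy, polygon)
            else nodata
            for col in range(n_cols)
        ])
    return grid


# A hypothetical rectangular stand, 6 x 4 map units (true area 24).
stand = [(1.0, 1.0), (7.0, 1.0), (7.0, 5.0), (1.0, 5.0)]

coarse = rasterize(stand, 1, origin=(0.0, 0.0), cell_size=4.0, n_rows=2, n_cols=2)
fine = rasterize(stand, 1, origin=(0.0, 0.0), cell_size=1.0, n_rows=8, n_cols=8)

coarse_area = sum(map(sum, coarse)) * 4.0 ** 2
fine_area = sum(map(sum, fine)) * 1.0 ** 2
print(coarse_area, fine_area)
```

With the coarse grid the stand is generalized to two large cells, so its rasterized area differs from the true area. The fine grid recovers the area here but needs sixteen times as many cells.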

A Simple Raster Data Set

Each cell in the raster is assigned a single data value. In the above example simple binary data values have been used, meaning that each cell is limited to one of two values, 0 or 1. This is an example of a 1-bit raster data file. By contrast, in an 8-bit data file there are 256 possible data values for each pixel. In the above example, the computer “sees” the cells that contain 0 as “turned off” and the cells that contain 1 as “turned on”.
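The relationship between bit depth and the number of possible cell values is simply a power of two. The short sketch below makes this concrete; the function names and the sample grid are illustrative assumptions.

```python
def distinct_values(bit_depth: int) -> int:
    """Number of different values one cell can hold at the given bit depth."""
    return 2 ** bit_depth


def fits(raster, bit_depth: int) -> bool:
    """True if every cell value can be stored at the given bit depth."""
    limit = distinct_values(bit_depth)
    return all(0 <= value < limit for row in raster for value in row)


# A made-up 1-bit raster: every cell is either 0 ("off") or 1 ("on").
one_bit = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

print(distinct_values(1), distinct_values(8))
print(fits(one_bit, 1))
```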

A One Bit Raster Image

Advantages of Raster Data:

1.      The geographic location of each cell is implied by its position in the cell matrix. Accordingly, other than an origin point, e.g. bottom left corner, no geographic coordinates are stored.

2.      Due to the nature of the data storage technique data analysis is usually easy to program and quick to perform.

3.      The inherent nature of raster maps, e.g. one attribute maps, is ideally suited for mathematical modeling and quantitative analysis.

4.      Discrete data, e.g. forestry stands, is accommodated as well as continuous data, e.g. elevation data, which facilitates integrating the two data types.

Disadvantages of Raster Data:

1.      The cell size determines the resolution at which the data is represented.

2.      It is especially difficult to adequately represent linear features depending on the cell resolution. Accordingly, network linkages are difficult to establish.

3.      Processing of associated attribute data may be cumbersome if large amounts of data exist. Raster maps inherently reflect only one attribute or characteristic for an area.

4.      Since most input data is in vector form, data must undergo vector-to-raster conversion. Besides increased processing requirements this may introduce data integrity concerns due to generalization and choice of inappropriate cell size.

Vector Data Models:

The vector data model is based upon vectors as opposed to space occupancy of raster data structures. The fundamental primitive of the vector model is a point. The various objects are created by connecting the points with straight lines, but some systems allow the points to be connected using arcs of circles.

Areas are defined in this model by sets of lines. The term polygon is synonymous with area in vector databases because of the use of straight-line connections between points. Very large vector databases have been built for different purposes, as vectors dominate in fields such as transportation, utility and marketing applications.

Vector and Raster Representations

Several different vector data models exist; however, only two are commonly used in GIS data storage. The topologic data structure is often referred to as an intelligent data structure because spatial relationships between geographic features are easily derived when using it. Primarily for this reason the topologic model is the dominant vector data structure currently used in GIS technology. Many of the complex data analysis functions cannot effectively be undertaken without a topologic vector data structure.

The second vector data structure that is common among GIS software is the computer-aided drafting (CAD) data structure. This structure lists elements, not features, each defined by a string of vertices, to represent geographic features, e.g. points, lines, or areas. There is considerable redundancy with this data model since the boundary segment between two polygons can be stored twice, once for each feature. The CAD structure emerged from the development of computer graphics systems without specific consideration of processing geographic features. Accordingly, since features, e.g. polygons, are self-contained and independent, questions about the adjacency of features can be difficult to answer. The CAD vector model lacks the definition of spatial relationships between features that is provided by the topologic data model.
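The difference between the two structures can be sketched with two adjacent square polygons. This is a simplified illustration under stated assumptions: the dictionary layouts, the field names "left" and "right", and the coordinates are invented for the example and do not reproduce any specific GIS file format.

```python
from collections import defaultdict

# CAD-style storage: each polygon is a self-contained ring of vertices.
# The boundary between A and B, (2, 0)-(2, 2), appears in both rings.
cad_polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}


def ring_edges(ring):
    """Undirected edges of a closed ring, each as a frozenset of two vertices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


# With CAD data, adjacency has to be discovered by comparing geometry.
shared = ring_edges(cad_polygons["A"]) & ring_edges(cad_polygons["B"])

# Topologic storage: each arc is stored once and records the polygon on
# its left and right side (None stands for the outside of the map).
arcs = [
    {"id": 1, "vertices": [(2, 0), (2, 2)], "left": "A", "right": "B"},
    {"id": 2, "vertices": [(2, 2), (0, 2), (0, 0), (2, 0)], "left": "A", "right": None},
    {"id": 3, "vertices": [(2, 0), (4, 0), (4, 2), (2, 2)], "left": "B", "right": None},
]


def adjacency(arc_table):
    """Derive polygon neighbours directly from the left/right attributes."""
    neighbours = defaultdict(set)
    for arc in arc_table:
        left, right = arc["left"], arc["right"]
        if left is not None and right is not None and left != right:
            neighbours[left].add(right)
            neighbours[right].add(left)
    return dict(neighbours)


print(shared)
print(adjacency(arcs))
```

In the topologic table the adjacency question is answered by reading attributes, with no coordinate comparison at all. In the CAD structure the same answer requires a geometric search through both rings.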

Advantages of Vector Data:

1.      Data can be represented at its original resolution without generalization.

2.      Graphic output is usually more aesthetically pleasing.

3.      Since most data, e.g. hard copy maps, is in vector form, no conversion is required.

4.      Accurate geographic location of data is maintained.

5.      Allows for efficient encoding of topology, and as a result more efficient operations that require topological information, e.g. proximity, network analysis.

Disadvantages of Vector Data:

1.      The location of each vertex needs to be stored explicitly.

2.      Algorithms for manipulation and analysis functions are complex and may be processing intensive.

3.      Continuous data, such as elevation data, is not effectively represented in vector form.

4.      Spatial analysis and filtering within polygons are impossible.

Image Data Format:

Image data is most often used to represent graphic or pictorial data. The term image inherently reflects a graphic representation, and in the GIS world, differs significantly from raster data. Most often, image data is used to store remotely sensed imagery, e.g. satellite scenes or orthophotos, or ancillary graphics such as photographs, scanned plan documents, etc. Image data is typically used in GIS systems as background display data (if the image has been rectified and georeferenced); or as a graphic attribute. Remote sensing software makes use of image data for image classification and processing. Typically, this data must be converted into a raster format (and perhaps vector) to be used analytically with the GIS.

Image data is typically stored in a variety of de facto industry standard proprietary formats. These often reflect the most popular image processing systems. Other graphic image formats, such as TIFF, GIF, PCX, etc., are used to store ancillary image data. Most GIS software will read such formats and allow you to display this data.

ATTRIBUTE DATA MODELS (DBMS Models used in GIS):

A separate data model is used to store and maintain attribute data for GIS software. These data models may exist internally within the GIS software, or may be reflected in external commercial Database Management Software (DBMS). A variety of different data models exist for the storage and management of attribute data. The most common are:

Tabular Model:

The simple tabular model stores attribute data as sequential data files with a fixed format (or a comma-delimited format for ASCII data), in which attribute values occupy set locations within a predefined record structure. This type of data model is outdated in the GIS arena. It lacks any method of checking data integrity and is inefficient with respect to data storage, e.g. limited indexing capability for attributes or records, etc.
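A fixed-format record of this kind can be read by slicing each line at predefined column positions. The sketch below assumes a hypothetical layout (4 characters for a stand number, 10 for the cover group, 5 for the height in metres); neither the layout nor the sample records come from a real data set.

```python
# Hypothetical record layout: (field name, start column, end column, converter).
LAYOUT = [
    ("stand_id", 0, 4, int),
    ("cover_group", 4, 14, str.strip),
    ("height_m", 14, 19, float),
]


def parse_record(line):
    """Slice one fixed-width line into a dictionary of attribute values."""
    return {name: convert(line[start:end]) for name, start, end, convert in LAYOUT}


# Two made-up records, each exactly 19 characters wide.
lines = [
    "0001conifer    22.5",
    "0002deciduous  17.0",
]

records = [parse_record(line) for line in lines]
print(records)
```

Nothing in the file itself guards against a misplaced column or a duplicated stand number, and finding one record means scanning the file sequentially. These are the integrity and indexing limitations noted above.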

Hierarchical Model:

The hierarchical database organizes data in a tree structure. Data is structured downward in a hierarchy of tables. Any level in the hierarchy can have unlimited children, but any child can have only one parent. Hierarchical DBMS have not gained any noticeable acceptance for use within GIS. They are oriented toward data sets that are very stable, where primary relationships among the data change infrequently or never at all. Also, the limitation on the number of parents that an element may have does not always suit actual geographic phenomena.

Network Model:

The network database organizes data in a network or plex structure. Any column in a plex structure can be linked to any other. Like a tree structure, a plex structure can be described in terms of parents and children. This model allows for children to have more than one parent.
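The one-parent restriction of the hierarchical model and its relaxation in the network model can be contrasted in a few lines. This is only a sketch using plain dictionaries, not a database engine, and the feature names (stands, compartments, a boundary road) are hypothetical.

```python
# Hierarchical (tree) structure: each child maps to exactly one parent.
tree_parent = {
    "stand_12": "compartment_3",
    "stand_13": "compartment_3",
    "compartment_3": "forest_district",
}


def set_parent(parent_of, child, parent):
    """Attach a child in the tree; a second parent is rejected."""
    if child in parent_of:
        raise ValueError(f"{child} already has a parent")
    parent_of[child] = parent


def ancestors(node, parent_of):
    """Walk upward from a node to the root of the tree."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


# Network (plex) structure: each child maps to a set of parents.
plex_parents = defaultdict_free = {}


def add_parent(parents_of, child, parent):
    """Attach a child in the plex; any number of parents is allowed."""
    parents_of.setdefault(child, set()).add(parent)


# A road segment running along a boundary belongs to both municipalities.
add_parent(plex_parents, "road_segment_7", "municipality_A")
add_parent(plex_parents, "road_segment_7", "municipality_B")

print(ancestors("stand_12", tree_parent))
print(sorted(plex_parents["road_segment_7"]))
```

The boundary road is the kind of geographic feature that a strict tree cannot represent without duplicating it under each parent.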

Relational Model:

The relational database organizes data in tables. Each table is identified by a unique table name and is organized by rows and columns. Each column within a table also has a unique name. Columns store the values for a specific attribute, e.g. cover group, tree height. Each row represents one record in the table. In a GIS each row is usually linked to a separate spatial feature, e.g. a forestry stand. Accordingly, each row comprises several columns, each column containing a specific value for that geographic feature.

Data is often stored in several tables. Tables can be joined or referenced to each other by common columns (relational fields). Usually the common column is an identification number for a selected geographic feature, e.g. a forestry stand polygon number. This identification number acts as the primary key for the table. The ability to join tables through use of a common column is the essence of the relational model. Such relational joins are usually ad hoc in nature and form the basis for querying in a relational GIS product. Unlike the previously discussed database types, relationships are implicit in the character of the data rather than explicit characteristics of the database setup. The relational database model is the most widely accepted for managing the attributes of geographic data.
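A relational join on a common stand number can be sketched with Python's built-in sqlite3 module. The table names, column names and values below are invented for illustration; only the idea of joining on a shared identification number comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two hypothetical attribute tables that share the stand number.
conn.executescript("""
CREATE TABLE stands (
    stand_id    INTEGER PRIMARY KEY,
    cover_group TEXT,
    height_m    REAL
);
CREATE TABLE owners (
    stand_id INTEGER REFERENCES stands(stand_id),
    owner    TEXT
);
INSERT INTO stands VALUES (1, 'conifer', 22.5), (2, 'deciduous', 17.0), (3, 'conifer', 9.5);
INSERT INTO owners VALUES (1, 'Crown'), (2, 'Private'), (3, 'Crown');
""")

# An ad hoc query: tall conifer stands together with their owner.
rows = conn.execute("""
    SELECT s.stand_id, s.cover_group, s.height_m, o.owner
    FROM stands AS s
    JOIN owners AS o ON o.stand_id = s.stand_id
    WHERE s.cover_group = 'conifer' AND s.height_m > 15
    ORDER BY s.stand_id
""").fetchall()

print(rows)
conn.close()
```

The query names only tables and columns. It says nothing about how the rows are physically stored, and the join was not declared in advance when the tables were created.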

The relational DBMS is attractive because of its:

·        Simplicity in organization and data modeling.

·        Flexibility - data can be manipulated in an ad hoc manner by joining tables.

·        Efficiency of storage - proper design of data tables can reduce redundancy.

·        Queries do not need to take into account the internal organization of data.

The relational DBMS has emerged as the dominant commercial data management tool in GIS implementation and application.

Object Oriented Model:

The object-oriented database model manages data through objects. An object is a collection of data elements and operations that together are considered a single entity. The object-oriented database is a relatively new model. This approach has the attraction that querying is very natural, as features can be bundled together with attributes at the database administrator's discretion. To date, only a few GIS packages are promoting the use of this attribute data model. However, initial impressions indicate that this approach may hold many operational benefits with respect to geographic data processing. Fulfilment of this promise with a commercial GIS product remains to be seen.
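The idea of an object as data elements and operations bundled into a single entity can be sketched with an ordinary Python class. This is not an object-oriented DBMS, only an illustration of the bundling; the class name ForestStand, its fields and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ForestStand:
    """One geographic feature: geometry, attributes and operations together."""
    stand_id: int
    boundary: list[tuple[float, float]]  # polygon vertices in map units
    dominant_species: str
    height_m: float

    def area(self) -> float:
        """Polygon area by the shoelace formula."""
        n = len(self.boundary)
        twice_area = sum(
            self.boundary[i][0] * self.boundary[(i + 1) % n][1]
            - self.boundary[(i + 1) % n][0] * self.boundary[i][1]
            for i in range(n)
        )
        return abs(twice_area) / 2.0

    def is_mature(self, threshold_m: float = 20.0) -> bool:
        """Hypothetical rule: a stand is mature above a height threshold."""
        return self.height_m >= threshold_m


stand = ForestStand(
    stand_id=12,
    boundary=[(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)],
    dominant_species="spruce",
    height_m=21.0,
)

print(stand.area(), stand.is_mature())
```

A query such as "the area of stand 12" is asked of the feature itself, rather than assembled from separate spatial and attribute stores.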


This website is hosted by

S. Farooq

Department of Geology

Aligarh Muslim University, Aligarh - 202 002 (India)

Phone: 91-571-2721150

email: farooq.amu@gmail.com