PRINCIPAL COMPONENT ANALYSIS
Adjacent bands in a multispectral remotely sensed image are often highly correlated. A simple analogy is a pair of datasets, one recording children’s ages and the other their heights. Height is expected to increase with age, so the correlation between the two datasets is positive: as one parameter increases, so does the other. Thus, although the two datasets are separate, they are not independent because, given one parameter, we are able to infer the other.
A similar situation arises if the digital number (DN) values of adjacent bands are plotted against each other (Fig 1). A high correlation may exist between adjacent bands, meaning that the two datasets are not statistically independent. False color composites (FCCs) produced from highly correlated bands show subdued pastel colors that lack the contrast needed to distinguish between different land cover types (Fig 2). Since the human eye is better at discriminating different colors than shades of a single color, an image intended for visual interpretation needs to be as varied in color as possible.
Multiband visible/near-infrared images of vegetated areas show negative correlations between the near-infrared and visible red bands and positive correlations among the visible bands. This is because, as the vigour or greenness of the vegetation increases, the red reflectance diminishes and the near-infrared reflectance increases. The presence of correlations among the bands of a multispectral image thus implies that there is redundancy in the data, and Principal Component Analysis aims to remove this redundancy.
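The band correlations described above can be checked numerically with a correlation matrix. The sketch below is only an illustration: the reflectance values are synthetic, and the linear dependence of each band on a hypothetical "greenness" variable is an assumption made for the example, not a measured spectral response.

```python
import numpy as np

# Synthetic, hypothetical reflectances for 5000 vegetated pixels.
# "greenness" stands in for vegetation vigour (0 = sparse, 1 = dense).
rng = np.random.default_rng(0)
greenness = rng.uniform(0.0, 1.0, size=5000)


def noise():
    """Small random sensor noise, one value per pixel."""
    return rng.normal(0.0, 0.01, size=greenness.shape)


# Assumed toy model: visible reflectance falls and near-infrared
# reflectance rises as greenness increases.
blue = 0.08 - 0.04 * greenness + noise()
green = 0.12 - 0.03 * greenness + noise()
red = 0.15 - 0.10 * greenness + noise()
nir = 0.20 + 0.30 * greenness + noise()

names = ["blue", "green", "red", "nir"]
# Each row passed to corrcoef is one band; the result is a 4 x 4 matrix.
corr = np.corrcoef(np.vstack([blue, green, red, nir]))

for name, row in zip(names, corr):
    print(name.ljust(6), " ".join(f"{value:6.2f}" for value in row))
```

In this toy model the entries linking the visible bands are positive, while the red/near-infrared entry is strongly negative. Off-diagonal values far from zero are the redundancy that PCA is intended to remove.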
Principal Component Analysis (PCA) is related to another statistical technique called factor analysis. It can be used to transform a set of image bands such that the new bands (called principal components) are uncorrelated with one another and are ordered in terms of the amount of image variation they explain. Thus the first principal component (PC1) accounts for the largest share of the variance in a scene, followed by PC2, PC3 and so on. The components are thus a statistical abstraction of the variability inherent in the original band set.
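One common way to carry out such a transformation is through the eigenvectors of the band covariance matrix. The following is a minimal sketch under that assumption (unstandardized, covariance-based PCA); the function name principal_components and the synthetic four-band image are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def principal_components(image):
    """Covariance-based PCA of an image array shaped (rows, cols, bands).

    Returns the PC images (same shape as the input), the eigenvalues
    (variance of each PC, largest first) and the eigenvectors (one per column).
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)

    cov = np.cov(centered, rowvar=False)      # bands x bands covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder so PC1 comes first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    pcs = centered @ eigvecs                  # project pixels onto the new axes
    return pcs.reshape(rows, cols, bands), eigvals, eigvecs


# Synthetic 50 x 50 image with four correlated bands (illustrative only).
rng = np.random.default_rng(1)
signal = rng.uniform(0.0, 1.0, size=(50, 50))
gains = [100.0, 90.0, 80.0, 60.0]
image = np.stack(
    [gain * signal + rng.normal(0.0, 3.0, size=signal.shape) for gain in gains],
    axis=-1,
)

pcs, eigvals, eigvecs = principal_components(image)
pc_cov = np.cov(pcs.reshape(-1, pcs.shape[-1]), rowvar=False)
print("variance of each PC:", np.round(eigvals, 2))
print("largest off-diagonal covariance:",
      np.abs(pc_cov - np.diag(np.diag(pc_cov))).max())
```

The covariance matrix of the transformed bands is diagonal to within rounding error, which is what "uncorrelated" means here, and the diagonal entries (the eigenvalues) decrease from PC1 onwards.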
For an n-dimensional dataset, n principal components can be produced. The information contained in a multiband image is spread almost, but not quite, uniformly across all the bands (for example, the seven bands of ETM datasets). An FCC produced by using any three of these bands would therefore contain less than 50% of the information in the scene. An important advantage of PCA is that most of the information dispersed throughout the seven bands may be compressed into a few bands with virtually no loss of information. The first three principal components typically contain over 98% of the variance in the data and hence most of the information in the scene. Using the principal components, we may prepare FCCs in which the correlation between the bands used (now PCs) is zero. A false color composite produced by using the first principal component (PC1) as red, the second principal component (PC2) as green and the third principal component (PC3) as blue will thus contain almost all the information in a scene (Fig 3). It must be remembered, however, that although the higher-order PC images (PC6, for example) contain little variance, they must not be discarded without thorough examination because they may well contain information not contained in the lower-order principal components.
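The share of variance carried by each component can be read from the eigenvalues of the band covariance matrix. The sketch below assumes covariance-based PCA and uses a synthetic seven-band image built from three hypothetical underlying factors; the mixing weights and the function name explained_variance_percent are invented for illustration, so the percentages it prints describe this toy data only, not any real ETM scene.

```python
import numpy as np


def explained_variance_percent(image):
    """Percentage of total variance carried by each PC, largest first.

    image is an array shaped (rows, cols, bands).
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    cov = np.cov(pixels, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh is ascending; reverse it
    return 100.0 * eigvals / eigvals.sum()


# Synthetic 60 x 60 image with seven bands driven by three factors
# plus a little noise (all values are assumptions for the example).
rng = np.random.default_rng(2)
n_pixels = 60 * 60
factors = rng.normal(size=(n_pixels, 3)) * np.array([30.0, 10.0, 5.0])
mixing = np.array([
    [1.0, 0.9, 0.8, 0.7, 0.8, 0.9, 1.0],
    [0.5, 0.3, 0.0, -0.6, -0.4, 0.2, 0.4],
    [0.2, -0.3, 0.4, 0.0, -0.4, 0.3, -0.2],
])
bands = factors @ mixing + rng.normal(0.0, 1.0, size=(n_pixels, 7))
image = bands.reshape(60, 60, 7)

percent = explained_variance_percent(image)
cumulative = np.cumsum(percent)
for index, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{index}: {p:6.2f}%  cumulative {c:6.2f}%")
```

A cumulative total that is already close to 100% at PC3 indicates that a PC1/PC2/PC3 composite retains nearly all of the variance. The small percentages attached to the remaining components are the reason they still deserve a visual check before being discarded.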
This website is hosted by
Department of Geology
Aligarh Muslim University, Aligarh - 202 002 (India)