Process_Description:
"This dataset was created by Earth Satellite Corporation (EarthSat).
This version of the classification is the late-date (2000-era) product. The
study area is the Coastal California region. An early-date (1995-era)
classification is also available for the same area.
Summary:
This section outlines the classification procedure for the California
C-CAP (Coastal Change Analysis Program). First, the imagery was
pre-processed to remove cloud contamination. Field points were then collected
for use in both training and accuracy assessment. The training points were
used as the dependent variable in a CART (Classification and Regression Tree)
approach, with tasseled cap Landsat TM imagery from three dates and several
ancillary datasets as the independent variables. After several iterations, a
rough classification was produced. Continuous regression tree masks of urban
and other features were then created to refine certain categories, yielding
the provisional classification. Models were next applied to this data to
incorporate information from the ancillary datasets; the result was the
final-no-edits version of the classification, a fully automated product.
This product was then refined further by hand edits, producing the
final-with-edits version, which is the final version of the classification
and the one described here.
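The summary notes that field points served for accuracy assessment as well as training, but does not say which statistics were reported. A common choice is overall accuracy and Cohen's kappa computed from reference (field) labels versus mapped labels. The sketch below is illustrative only; the function names and the example labels are hypothetical.

```python
from collections import Counter


def confusion_counts(reference, mapped):
    """Count (reference label, mapped label) pairs over the validation points."""
    if len(reference) != len(mapped):
        raise ValueError("reference and mapped must have the same length")
    return Counter(zip(reference, mapped))


def overall_accuracy(reference, mapped):
    """Fraction of validation points whose mapped class matches the field label."""
    counts = confusion_counts(reference, mapped)
    agree = sum(n for (ref, mp), n in counts.items() if ref == mp)
    return agree / len(reference)


def cohens_kappa(reference, mapped):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(reference)
    p_o = overall_accuracy(reference, mapped)
    ref_totals = Counter(reference)  # row marginals (field labels)
    map_totals = Counter(mapped)     # column marginals (map labels)
    p_e = sum(ref_totals[c] * map_totals[c] for c in ref_totals) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical validation points: field label vs. mapped class.
reference = ["grass", "grass", "scrub", "forest"]
mapped = ["grass", "scrub", "scrub", "forest"]
print(overall_accuracy(reference, mapped))        # 0.75
print(round(cohens_kappa(reference, mapped), 3))  # 0.636
```

Kappa discounts the agreement expected by chance from the class totals, so it is lower than overall accuracy whenever the map and field labels share common classes.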
Pre-processing steps:
Each Landsat TM scene was geo-referenced by the U.S. Geological Survey (USGS)
EROS Data Center. EarthSat staff then verified the spatial accuracy of each
scene to within 2 pixels. The data are referenced to the Albers Conical Equal
Area projection, using the GRS 1980 spheroid and the WGS84 datum, with map
units in meters. The California TM data were delivered as USGS zone mosaics
and were tasseled cap transformed. Three dates of TM data were received for
each zone: leaf-on, leaf-off, and spring. Clouds were removed with an
automated cloud removal process, and the resulting gaps were filled using a
CART regression technique that predicted the missing values from the other
two dates.
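The gap-filling idea can be illustrated with a small regression tree written in plain Python. This is a sketch under stated assumptions: each pixel is represented by one band's value on the target date and the same band on the other two dates, the tree is trained only on cloud-free pixels, and `max_depth` and `min_leaf` are hypothetical settings. The production CART software and its parameters are not described in this document.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (the CART regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_depth=3, min_leaf=2):
    """Grow a regression tree; X rows are predictor tuples, y the target values."""
    best = None
    if max_depth > 0 and len(y) >= 2 * min_leaf:
        for j in range(len(X[0])):
            for threshold in sorted({row[j] for row in X}):
                left = [i for i, row in enumerate(X) if row[j] <= threshold]
                right = [i for i, row in enumerate(X) if row[j] > threshold]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, threshold, left, right)
    if best is None:
        return mean(y)  # leaf: predict the mean of its training targets
    _, j, threshold, left, right = best
    return (
        j,
        threshold,
        build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_leaf),
        build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_leaf),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree


def fill_cloud_gaps(target, other_a, other_b, cloud_mask, **tree_args):
    """Replace cloud-masked target pixels with values predicted from the other two dates."""
    X = [(a, b) for a, b, c in zip(other_a, other_b, cloud_mask) if not c]
    y = [t for t, c in zip(target, cloud_mask) if not c]
    tree = build_tree(X, y, **tree_args)
    return [predict(tree, (a, b)) if c else t
            for t, a, b, c in zip(target, other_a, other_b, cloud_mask)]


# Hypothetical 6-pixel band: the last two pixels were clouded on the target date.
target = [10, 12, 30, 32, 0, 0]
date_b = [1, 1, 5, 5, 1, 5]
date_c = [2, 2, 6, 6, 2, 6]
clouded = [False, False, False, False, True, True]
print(fill_cloud_gaps(target, date_b, date_c, clouded))  # clouded pixels become 11 and 31
```

Clear pixels keep their observed values; only the masked pixels receive predictions, which come from leaves fitted on cloud-free pixels with similar values on the other two dates.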
Field-Collected Data:
EarthSat's primary method of field point collection uses the locations generated by the
statistical sample selection to guide the selection of both training and validation points.
Training and validation points were collected continuously along routes that pass through all
or most of these sample areas. Using laptop computers linked to GPS (Global Positioning
System) receivers, C-CAP field teams can visit up to 1,000 sites per day. The technology includes:
Laptop computers, a real-time GPS receiver, and interface software with database applications
Computer-based, real-time entry and manipulation of the fieldwork database
Geo-referenced digital satellite imagery and classified land cover analysis imagery
GIS ancillary data, such as roads, other land cover analyses, paper maps, and digital elevation models
The first step in field data collection is to determine where to take points. This is
done with EarthSat's GeoTools software, a spatial statistics package that is to be included
as a standard part of the next release of ERDAS IMAGINE. First, TIGER (Topologically
Integrated Geographic Encoding and Referencing System) roads are acquired, registered to the
digital imagery, and mosaicked. The land cover imagery is then limited to a 300-meter
buffer around the TIGER roads. This sampling design helps ensure that the many factors
that influence a pixel's spectral value are represented during field collection. Next, a
data layer is created containing hundreds of strata derived from all of the input data types.
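The road buffer restricts sampling to pixels that field teams can reach. Landsat TM pixels are 30 m, so a 300-meter buffer is a ten-pixel radius (300 / 30 = 10). A minimal sketch, assuming roads have already been rasterized to a boolean grid and using a brute-force Euclidean distance test between cell centers; `buffer_mask` and `apply_mask` are hypothetical helpers, not GeoTools or IMAGINE functions.

```python
import math


def buffer_mask(road, distance_m, pixel_size_m=30.0):
    """Return a boolean grid marking cells within distance_m of any road cell.

    road: list of rows of booleans (True where a rasterized road falls).
    Distance is measured between cell centers, in map units (meters).
    """
    radius = distance_m / pixel_size_m  # buffer radius in pixels
    reach = math.floor(radius)
    rows, cols = len(road), len(road[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not road[r][c]:
                continue
            # Mark every cell inside the circular neighborhood of this road cell.
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and math.hypot(dr, dc) <= radius:
                        mask[rr][cc] = True
    return mask


def apply_mask(band, mask, nodata=None):
    """Keep band values inside the mask; set everything else to nodata."""
    return [[v if m else nodata for v, m in zip(brow, mrow)]
            for brow, mrow in zip(band, mask)]


# Hypothetical 1 x 25 strip with a road in the middle column.
road = [[col == 12 for col in range(25)]]
mask = buffer_mask(road, distance_m=300)  # 300 m / 30 m = 10-pixel radius
print(sum(mask[0]))                       # 21 (10 pixels on each side plus the road)
```

A production implementation would use a distance transform rather than this quadratic loop, but the result is the same set of accessible pixels.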
The three seasons of tasseled cap data are layer-stacked into a nine-band file so that
seasonality is incorporated into the sample selection. The stack is masked with the
ten-pixel (300-meter) buffered TIGER road file so that only the accessible portions of the
imagery remain, and the masked file is clustered into 250 classes using ISODATA. Data
layers representing the image dates used in each season's zone mosaics are then
incorporated by matrixing the dates together. The NLCD (National Land Cover Dataset),
recoded to match the C-CAP classification scheme and masked by the buffered TIGER roads
file, is also matrixed into the dataset, and this result is matrixed with the 250-cluster
file. The resulting one-band layer combines information from all of these datasets to form
the strata for random sampling and is input into GeoTools, which produces a 10,000-meter
grid. Fifty grid cells per mosaic are then selected by stratified sampling based on these
strata. The selected grid cells determine the field route, which passed through nearly all
of them, helping ensure a good mix of field points across the factors described above.
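Matrixing and stratified grid selection can be sketched as follows. The assumptions are labeled: matrixing is modeled as giving one new code to each distinct combination of input class values, each 10,000-meter grid cell is assigned the majority stratum of its pixels, and cells are allocated to strata roughly in proportion to how often each stratum occurs. The function names, the majority rule, and the proportional allocation are hypothetical; GeoTools' actual selection rules are not documented here.

```python
import random
from collections import Counter


def matrix_layers(*layers):
    """Combine co-registered class layers into one layer of combination codes.

    Each distinct tuple of input values (e.g. cluster, image date, recoded NLCD)
    gets its own integer code, numbered in order of first appearance.
    """
    codes = {}
    combined = []
    for values in zip(*layers):
        combined.append(codes.setdefault(values, len(codes) + 1))
    return combined, codes


def dominant_stratum(pixel_codes):
    """Majority stratum among the pixels that fall inside one grid cell."""
    return Counter(pixel_codes).most_common(1)[0][0]


def stratified_grid_sample(cell_strata, n_samples, seed=0):
    """Pick grid cells so that each stratum is represented roughly in proportion.

    cell_strata: dict mapping a grid-cell id to that cell's dominant stratum.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for cell, stratum in cell_strata.items():
        by_stratum.setdefault(stratum, []).append(cell)
    total = len(cell_strata)
    chosen = []
    for stratum, cells in sorted(by_stratum.items()):
        # Proportional allocation, but at least one cell per stratum.
        k = max(1, round(n_samples * len(cells) / total))
        chosen.extend(rng.sample(cells, min(k, len(cells))))
    return chosen


# Hypothetical per-pixel layers: ISODATA cluster, image-date code, recoded NLCD class.
clusters = [1, 1, 2, 2]
dates = [7, 7, 7, 8]
nlcd = [21, 21, 41, 41]
combined, codes = matrix_layers(clusters, dates, nlcd)
print(combined)  # [1, 1, 2, 3]
```

For one mosaic, a call such as `stratified_grid_sample(cell_strata, 50)` would return about fifty cells; because of rounding and the one-cell minimum, the count can differ slightly from fifty.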
A version of this layer that is not limited by the TIGER roads buffer was used to select
Digital Orthophoto Quadrangles (DOQs). A list of stratified random DOQs, fifteen per zone,
was submitted to the USGS EROS Data Center (EDC). These DOQs were used as ground truth for
impervious features when classifying the developed categories.
The field points were collected by GPS. The GPS receiver is connected to a laptop computer
that serves as a data logger. IMAGINE software (GPS Tool) allows the GPS location to be
tracked over the imagery displayed in the viewer. Another module, RGMID, designed by EarthSat
and programmed by ERDAS, lets the user select a pixel in the viewer and record various
characteristics observed in the field for that pixel in a table. The items typically
noted in the field include:
Canopy cover
Vegetation types by species (where applicable)
Land cover characterization
Soils (if relevant)
Special conditions and remarks
Photography/video
Date/time
X,Y location (Z if relevant)
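The items recorded for each field point fit a simple record type. The field names, units (for example, canopy cover as a percentage), and types below are hypothetical, chosen only to mirror the list above; RGMID's actual table schema is not described in this document.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class FieldPoint:
    """One field observation tied to an image pixel (hypothetical schema)."""
    x: float                   # map X coordinate (meters, Albers)
    y: float                   # map Y coordinate (meters, Albers)
    observed_at: datetime      # date/time of the observation
    land_cover: str            # land cover characterization
    canopy_cover_pct: Optional[float] = None            # assumed to be a percentage
    species: List[str] = field(default_factory=list)    # vegetation types by species
    soils: Optional[str] = None                         # recorded only if relevant
    z: Optional[float] = None                           # elevation, only if relevant
    photos: List[str] = field(default_factory=list)     # photo/video file names
    remarks: str = ""                                   # special conditions and remarks

    def __post_init__(self):
        # Reject canopy cover values outside 0-100 percent.
        if self.canopy_cover_pct is not None and not 0 <= self.canopy_cover_pct <= 100:
            raise ValueError("canopy_cover_pct must be between 0 and 100")


point = FieldPoint(x=-2_000_000.0, y=1_500_000.0,
                   observed_at=datetime(2000, 6, 1, 10, 30),
                   land_cover="scrub", canopy_cover_pct=40.0, species=["oak"])
print(asdict(point)["land_cover"])  # scrub
```

Optional fields default to empty values, so a point can be logged quickly when soils, elevation, or photos are not relevant.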
The data and equipment used for the fieldwork are as follows:
Ancillary datasets:
TIGER 2000
NLCD
NWI (National Wetlands Inventory), mosaicked into zones
State road map and DeLorme state atlas (www.delorme.com)
Hardware:
Laptops with IMAGINE and data
GARMIN GPS modules, external antennae, and redundant data cables
Digital cameras
Backup devices (CD writers)
Extra batteries (laptop and GPS)
DC-to-AC adapters and splitters
Car fuses, flashlights, and basic tools
Mobile phones (if available)
Calculator
System backup CDs with operating system and software
Compass
Field notebooks with instructions and road maps with pre-determined routes
Imagery:
Multi-spectral data for each zone
Initial classifications
EarthSat used the RGMID module for ERDAS IMAGINE to support field efforts. This software
allows GPS tracking over imagery in the native IMAGINE file format (.img).
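Tracking a GPS fix over an image, or recording observations for a selected pixel, requires converting map coordinates to a row and column. A minimal sketch, assuming the GPS position has already been projected into the image's Albers coordinates (the datum and projection transformation is omitted), a north-up image, and a hypothetical upper-left corner; this is not the GPS Tool or RGMID implementation.

```python
import math
from typing import NamedTuple, Optional, Tuple


class GeoGrid(NamedTuple):
    """North-up raster georeferencing (hypothetical values in the example)."""
    ulx: float          # X of the upper-left corner of the upper-left pixel (m)
    uly: float          # Y of that corner (m)
    pixel_size: float   # square pixel size (m), e.g. 30 for Landsat TM
    rows: int
    cols: int


def map_to_pixel(grid: GeoGrid, x: float, y: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the pixel containing map point (x, y), or None if outside."""
    col = math.floor((x - grid.ulx) / grid.pixel_size)
    row = math.floor((grid.uly - y) / grid.pixel_size)  # rows increase southward
    if 0 <= row < grid.rows and 0 <= col < grid.cols:
        return row, col
    return None


def pixel_center(grid: GeoGrid, row: int, col: int) -> Tuple[float, float]:
    """Map coordinates of a pixel's center, e.g. for labeling a recorded point."""
    return (grid.ulx + (col + 0.5) * grid.pixel_size,
            grid.uly - (row + 0.5) * grid.pixel_size)


grid = GeoGrid(ulx=-2_000_000.0, uly=2_000_000.0, pixel_size=30.0, rows=100, cols=100)
print(map_to_pixel(grid, -1_999_955.0, 1_999_925.0))  # (2, 1)
```

Flooring (rather than truncating) keeps points just west or north of the image edge from being mapped onto row or column 0.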
Classification:
After the training field points were collected, they were used as the dependent
variable in a CART classification approach. Many layers were used as independent
variables, including the tasseled cap imagery, DEMs (Digital Elevation Models), slope
and aspect, NWI, other classifications, and an image date file corresponding to the
mosaics. The rough classification was created using only the CART discrete
decision-tree approach. The provisional classification was then produced by applying
regression tree analysis to certain classes, such as urban, and to separate similar
feature types, such as grass from scrub or scrub from trees. The final-no-edits
version was created by running the provisional classification through a series of
models that incorporated ancillary data and spatial analysis. Finally, the data were
hand-edited by on-screen digitizing, using TerraServer orthophotos as reference, to
produce the final-with-edits classification.
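The discrete CART step treats the field-point class labels as the dependent variable and per-pixel values of the input layers as the independent variables. The sketch below grows a small classification tree using Gini impurity, the split criterion of standard CART. It is illustrative only: the feature tuples stand in for hypothetical stacked inputs such as a tasseled cap band and slope, and `max_depth` and `min_leaf` are illustrative; the tree software and settings used for this product are not given.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def grow(X, y, max_depth=4, min_leaf=1):
    """Grow a binary classification tree; each leaf holds its majority class."""
    best = None
    n = len(y)
    if max_depth > 0 and len(set(y)) > 1:
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                left = [i for i in range(n) if X[i][j] <= t]
                right = [i for i in range(n) if X[i][j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                # Size-weighted impurity of the two child nodes.
                cost = (len(left) * gini([y[i] for i in left])
                        + len(right) * gini([y[i] for i in right])) / n
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    left_tree = grow([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_leaf)
    right_tree = grow([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_leaf)
    return (j, t, left_tree, right_tree)


def classify(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical training pixels: (greenness, slope in degrees) -> field label.
X = [(0.10, 5.0), (0.15, 4.0), (0.60, 6.0), (0.65, 7.0)]
y = ["grass", "grass", "forest", "forest"]
tree = grow(X, y)
print(tree)                         # (0, 0.15, 'grass', 'forest')
print(classify(tree, (0.70, 3.0)))  # forest
```

Each internal node is a `(feature index, threshold, left, right)` tuple, so the fitted tree can be applied pixel by pixel to produce a classified layer.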
Ancillary Datasets:
Non-TM image datasets used are DEM, NWI, TIGER 2000, field-collected points, California
GAP (Gap Analysis Program), FRAP (Fire and Resource Assessment Program), and CERES
(California Environmental Resources Evaluation System). Non-TM image datasets used
specifically for this classification are DEM, slope, Topographic Position Index, and
National Wetlands Inventory.
Several QA/QC (quality assurance/quality control) steps were involved in the creation of
this product. First, there was an internal QA/QC, done by viewing the classification
frame-by-frame along with the TM imagery and the TerraServer orthophotos and recording a
point, with comments, wherever there was a classification error. NOAA staff then reviewed
the product in the same way.
A third review was conducted by a Boeing/Autometric representative, mainly for issues
with format, attributes, slivers, the grid, and similar concerns. Finally, a plant
identification specialist was hired to field-verify the late-date classification."