Using automatic classification of freely available, public-domain Landsat 7 satellite imagery, it is possible to obtain much more accurate and diverse landcover data for use in scenery.
The Landsat 7 data consists of 8 wavelength bands, covering visible light and several infrared ranges (near-IR, short-wavelength IR, thermal IR). The resolution of the bands ranges from 57 m/pixel for thermal IR (ETM+ dataset) via 28.5 m/pixel for red, green, blue, near-IR and short-wavelength IR to 14.25 m/pixel for the panchromatic (visible intensity) band.
On the right you can see a montage showing the Lake of Constance in different color compositions: real color (top left), false color or near-IR (NIR, top right), short-wavelength IR 1 (SWIR1, bottom left) and SWIR2 (bottom right). As the image shows, how easily different features (populated areas, different types of vegetation) can be identified depends on the composition. The SWIR compositions are particularly well suited for identifying vegetation, whereas the NIR image allows for good identification of rural areas.
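A colour composite like those in the montage is assembled by stretching three bands to the 0..255 range and assigning them to the red, green and blue display channels. The sketch below uses plain 2D lists keyed by band number. The band numbering in the comments is the standard Landsat 7 ETM+ one; the two combinations shown and the simple min/max stretch are assumptions for illustration, not necessarily how the montage was produced.

```python
# Standard Landsat 7 ETM+ band numbers: 1 blue, 2 green, 3 red,
# 4 near-IR, 5 SWIR1, 7 SWIR2 (6 is thermal IR, 8 panchromatic).
TRUE_COLOR = (3, 2, 1)   # red, green, blue shown as themselves
FALSE_COLOR = (4, 3, 2)  # near-IR shown as red: vegetation appears bright red

def stretch(band):
    """Linearly rescale a 2D band (list of rows) to integers in 0..255."""
    lo = min(min(row) for row in band)
    hi = max(max(row) for row in band)
    span = (hi - lo) or 1  # avoid division by zero on a flat band
    return [[round(255 * (v - lo) / span) for v in row] for row in band]

def composite(bands, channels):
    """Combine three bands (given by number) into rows of (R, G, B) tuples."""
    red, green, blue = (stretch(bands[n]) for n in channels)
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]

# Toy 2x2 bands (made-up values)
bands = {1: [[0, 10], [20, 30]], 2: [[5, 5], [5, 5]], 3: [[100, 200], [150, 250]]}
print(composite(bands, TRUE_COLOR))
```

Swapping `TRUE_COLOR` for another band triple is all it takes to switch between compositions.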
More information about the image dataset can be found in the Landsat GeoCover Tutorial (PDF).
For automatic classification of current data at the highest possible resolution, the Landsat 7 data (official NASA Landsat site, Wikipedia) is of most interest. Landsat 7 has an inclination of 98.2 degrees, meaning its orbital plane forms that angle with Earth's equatorial plane. This makes the Landsat 7 orbit nearly polar, so the satellite scans Earth in roughly north-south oriented stripes.
The orbital period of Landsat 7 is about 99 minutes, i.e., the satellite takes 99 minutes to circle the Earth once. The orbit is sun-synchronous, i.e., on its daytime pass the satellite crosses any given latitude at the same local solar time. This ensures that images are always taken on the sunlit side of Earth under similar lighting conditions.
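The two orbital properties can be checked with a short calculation. This sketch assumes a circular orbit at Landsat 7's nominal altitude of 705 km: Kepler's third law gives the period, and the first-order J2 (Earth oblateness) formula gives the rate at which the orbital plane turns. For a sun-synchronous orbit that rate has to match the Sun's apparent motion of 360 degrees per year. The constants are standard values; the circular-orbit and first-order J2 simplifications are assumptions.

```python
import math

MU_EARTH = 398600.4418          # km^3/s^2, Earth's gravitational parameter
R_EARTH = 6378.137              # km, equatorial radius
J2 = 1.08263e-3                 # Earth's oblateness coefficient
ALTITUDE = 705.0                # km, nominal Landsat 7 altitude (circular orbit assumed)
INCLINATION = math.radians(98.2)

a = R_EARTH + ALTITUDE                                # semi-major axis
period_s = 2 * math.pi * math.sqrt(a ** 3 / MU_EARTH) # Kepler's third law

# First-order J2 drift of the ascending node (rad/s); positive means eastward.
n = 2 * math.pi / period_s                            # mean motion
node_rate = -1.5 * J2 * (R_EARTH / a) ** 2 * n * math.cos(INCLINATION)
node_deg_per_day = math.degrees(node_rate) * 86400

# Sun-synchronous: the plane must turn once per tropical year.
required = 360 / 365.2422

print(f"period: {period_s / 60:.1f} min")
print(f"node drift: {node_deg_per_day:.3f} deg/day (required {required:.3f})")
```

An inclination slightly above 90 degrees makes `cos(INCLINATION)` negative, which is what turns the drift eastward to follow the Sun.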
Automatic raster classification
GRASS supports both supervised and unsupervised automatic classification of raster images. The classification is based on a maximum-likelihood algorithm: for each raster pixel and each landuse class, the probability that the pixel depicts that class is estimated, and the pixel is assigned the class with the highest probability.
In supervised classification, the algorithm is trained manually by marking suitable example areas on an image and assigning a class to each of them. In unsupervised classification, the algorithm tries to determine automatically a set of classes that differ enough to be distinguished by the classifier.
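The maximum-likelihood decision rule can be sketched in pure Python. The sketch assumes each class is described by a Gaussian signature (mean vector and covariance matrix over the bands) and that all classes are equally likely a priori. In GRASS this step is performed by `i.maxlik`; the code only illustrates the rule and is not that module's implementation. The class names and statistics in the usage lines are made up.

```python
import math

def cholesky(m):
    """Lower-triangular L with L L^T = m for a symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_likelihood(x, mean, cov):
    """Gaussian log-density of pixel vector x, constant term dropped."""
    L = cholesky(cov)
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution L y = d; then the squared Mahalanobis distance is y.y
    y = []
    for i in range(len(d)):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2 * sum(math.log(L[i][i]) for i in range(len(d)))
    return -0.5 * (log_det + sum(v * v for v in y))

def classify(pixel, classes):
    """Name of the class whose signature gives this pixel the highest likelihood."""
    return max(classes, key=lambda name: log_likelihood(pixel, *classes[name]))

# Hypothetical two-band signatures: (mean, covariance)
classes = {
    "water": ([20, 10], [[4, 0], [0, 4]]),
    "forest": ([40, 80], [[9, 2], [2, 16]]),
}
print(classify([22, 12], classes), classify([38, 75], classes))
```

Using the full covariance matrix, rather than per-band distances, lets the rule account for bands that vary together within a class.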
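Both training modes can be sketched briefly. In the supervised case, the marked example areas yield a class signature (mean and covariance of their pixels). In the unsupervised case, plain k-means clustering serves here as a simple stand-in for finding spectrally distinct groups. GRASS provides its own modules for these steps (`i.gensig` and `i.cluster`), whose algorithms are not reproduced here. Pixels are lists of band values. The starting centres for k-means are an assumption supplied by the caller.

```python
def signature(samples):
    """Mean vector and sample covariance matrix of training pixels (supervised case)."""
    n, dims = len(samples), len(samples[0])
    mean = [sum(p[i] for p in samples) / n for i in range(dims)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in samples) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    return mean, cov

def kmeans(pixels, centers, iterations=10):
    """Plain k-means starting from the given centres (unsupervised case)."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in pixels:
            # Assign each pixel to the spectrally nearest centre.
            nearest = min(range(len(centers)),
                          key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            groups[nearest].append(p)
        for k, group in enumerate(groups):
            if group:  # keep the old centre if a cluster ran empty
                centers[k] = [sum(v) / len(group) for v in zip(*group)]
    return centers

print(signature([[1, 2], [3, 4], [5, 6]]))
print(kmeans([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]], [[0, 0], [10, 10]]))
```

The centres found by clustering can then be treated like signatures from hand-marked training areas and fed to the same maximum-likelihood rule.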
On the right you can see the rasterised result of a supervised classification session on the Lake of Constance imagery shown above. Magenta areas represent water, yellow and greenish colors represent different types of populated areas, cyan represents forest and red represents all other vegetation. This classification is far from complete: the CORINE project initiated by the European Union classified regular Landsat TM data into over 40 different categories.
Challenges of use in FlightGear
As the classification process outputs a raster image, the data needs to be converted to vector format for FlightGear. One possibility is to transform each pixel directly into two triangles. With a resolution of 28.5 m/pixel, this would result in well over 600,000 triangles for a single scenery tile in the area of the Lake of Constance (around 47N 9E). This is clearly too many: even the first experiments yielded over 30,000 triangles per scenery tile (including the poorly scaling line features), which was already considered too many for the user base FlightGear targets.
Moreover, most of these triangles are redundant, since neighbouring pixels frequently share the same class. A proper vectorisation is therefore necessary. GRASS supports vectorisation with the tool r.to.vect and simplification with v.clean, but only for small areas, because the algorithms do not scale well; r.to.vect still scales better than v.clean.
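The triangle estimate can be reproduced approximately. The sketch assumes a spherical Earth and that FlightGear scenery tiles in this latitude band span 0.25 degrees of longitude by 0.125 degrees of latitude (both are assumptions); the pixel size is the 28.5 m quoted above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)
PIXEL_SIZE_M = 28.5         # Landsat 7 multispectral resolution

def tile_triangles(lat_deg, tile_width_deg=0.25, tile_height_deg=0.125):
    """Triangles needed if every pixel of a tile becomes two triangles.

    The default tile size is an assumption about FlightGear's tiling
    at mid latitudes.
    """
    metres_per_deg = math.pi * EARTH_RADIUS_M / 180
    height = tile_height_deg * metres_per_deg
    width = tile_width_deg * metres_per_deg * math.cos(math.radians(lat_deg))
    pixels = (width / PIXEL_SIZE_M) * (height / PIXEL_SIZE_M)
    return round(2 * pixels)

print(tile_triangles(47.0625))  # centre of the tile containing 47N 9E
```

The result lands at roughly 650,000 triangles, consistent with "well over 600,000".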
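The redundancy becomes obvious when one counts connected areas of identical class instead of pixels. The sketch below counts 4-connected regions in a toy classified raster with made-up class values. A vectorisation such as r.to.vect produces roughly one area per such region, not two triangles per pixel.

```python
from collections import deque

def count_regions(grid):
    """Count 4-connected regions of equal class value in a classified raster."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # flood-fill all connected pixels of the same class
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and grid[ny][nx] == grid[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions

classified = [
    [1, 1, 1, 2, 2],
    [1, 1, 2, 2, 2],
    [3, 3, 2, 2, 2],
]
print(count_regions(classified), "regions vs", 2 * 15, "triangles")  # 3 regions vs 30 triangles
```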
On the left you can see a very small excerpt of the larger image shown above, together with a simplified vectorisation of the classification, depicted by black boundary lines. It covers a relatively small area of about 17 km x 13 km. Simplifying to a minimum area of 0.028 km² per polygon and a maximum line error of 60 m took 2 minutes on a 1.5 GHz Pentium M. Attempts on larger areas, such as the whole Lake of Constance, ran for almost 8 hours before they had to be cancelled because the computer was needed for other work.
This is clearly unacceptable given the goal of providing coverage for the whole world.
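The "maximum line error" parameter can be illustrated with the classic Douglas-Peucker line simplification. It removes vertices as long as every removed vertex stays within a given distance of the simplified line. It is used here only as a well-known stand-in and is not necessarily the algorithm v.clean applies. Coordinates are assumed to be in metres. The last line converts the 0.028 km² minimum area into 28.5 m pixels.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, max_error):
    """Simplify a polyline so that no removed vertex deviates by more than max_error."""
    if len(points) < 3:
        return list(points)
    # Vertex farthest from the chord between the two end points
    index, dist = max(((i, point_segment_distance(points[i], points[0], points[-1]))
                       for i in range(1, len(points) - 1)), key=lambda item: item[1])
    if dist <= max_error:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:index + 1], max_error)
    right = douglas_peucker(points[index:], max_error)
    return left[:-1] + right  # the split vertex appears in both halves

boundary = [(0, 0), (100, 10), (200, -5), (300, 0), (400, 100), (500, 0)]
print(douglas_peucker(boundary, 60))
print(0.028e6 / 28.5 ** 2)  # minimum area in pixels, about 34.5
```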
Example vector data can be found on the FlightGear Landcover Database Mapserver (thanks go to Martin Spott for providing this service).
The simplification algorithm of v.clean, however, has to perform very expensive topology checks. Since GRASS represents all areas by polyline boundaries, v.clean must ensure after each removal of a polyline vertex that no boundaries intersect, a check that takes a huge amount of CPU power. If simplifying a boundary would lead to intersecting boundaries, no simplification takes place on that boundary at all.
A simplification approach based on a triangulated irregular network (TIN) can detect such topological distortions with much less effort. An example of such an approach is the quadric-based simplification by Michael Garland, which was already mentioned in the Line data simplification and Static Level-of-Detail articles.
For this, the initial vectorisation of the raster classification needs to be converted into a TIN and then simplified. After simplification, polygons can be reconstructed by merging adjacent geometry (triangle with triangle, triangle with polygon) of the same class into larger polygons.
Besides performance, another advantage is that we can specify whether we value certain boundaries more than others; e.g., we could try harder to preserve boundaries between water and land than boundaries between different types of landcover.
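A naive sketch makes the cost of this check visible. Every candidate vertex removal replaces two segments by a shortcut, and that shortcut has to be tested against every segment of every other boundary. The helper names are hypothetical and only illustrate the kind of test involved, not GRASS's implementation. Boundaries that legitimately meet at a shared node would be reported as intersecting here, so a real implementation has to exclude shared end points.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def on_segment(p, q, r):
    """For collinear p, q, r: True if q lies on the segment p-r."""
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))

def segments_intersect(a, b, c, d):
    """True if segment a-b and segment c-d share at least one point."""
    o1, o2 = orientation(a, b, c), orientation(a, b, d)
    o3, o4 = orientation(c, d, a), orientation(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # each segment straddles the other's supporting line
    # Collinear cases: an end point of one segment lies on the other.
    return ((o1 == 0 and on_segment(a, c, b)) or (o2 == 0 and on_segment(a, d, b))
            or (o3 == 0 and on_segment(c, a, d)) or (o4 == 0 and on_segment(c, b, d)))

def can_remove_vertex(line, i, other_boundaries):
    """Hypothetical check: may interior vertex line[i] be dropped?

    The shortcut line[i-1]-line[i+1] is tested against every segment of every
    other boundary, so each candidate removal costs O(total boundary segments).
    Boundaries that merely meet at a shared node would also count as
    intersecting; a real implementation must exclude shared end points.
    """
    a, b = line[i - 1], line[i + 1]
    for boundary in other_boundaries:
        for c, d in zip(boundary, boundary[1:]):
            if segments_intersect(a, b, c, d):
                return False
    return True
```

With thousands of boundaries per tile, this per-removal scan over all segments is what makes a straightforward approach so slow.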
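The core of Garland's approach is the quadric error metric. Every triangle defines a plane p = [a, b, c, d] with a² + b² + c² = 1. Each plane contributes the 4x4 matrix K_p = p pᵀ. The error of placing a vertex at v = [x, y, z, 1] is vᵀQv, the sum of squared distances to all planes accumulated in Q. Collapsing an edge (v1, v2) is then scored with Q1 + Q2. The pure-Python sketch below shows only these building blocks; the priority queue of edge collapses and the solve for the optimal vertex position are omitted.

```python
import math

def plane_of(p0, p1, p2):
    """Plane [a, b, c, d] (unit normal, ax + by + cz + d = 0) through three points."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    a, b, c = (x / length for x in n)
    return [a, b, c, -(a * p0[0] + b * p0[1] + c * p0[2])]

def quadric(planes):
    """Sum of the fundamental quadrics K_p = p p^T over the given planes."""
    q = [[0.0] * 4 for _ in range(4)]
    for p in planes:
        for i in range(4):
            for j in range(4):
                q[i][j] += p[i] * p[j]
    return q

def error(q, point):
    """v^T Q v: sum of squared distances from point to the planes in Q."""
    v = list(point) + [1.0]
    return sum(v[i] * q[i][j] * v[j] for i in range(4) for j in range(4))

def contraction_cost(q1, q2, target):
    """Cost of collapsing an edge (v1, v2) into target, using Q = Q1 + Q2."""
    q = [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]
    return error(q, target)

floor = plane_of((0, 0, 0), (1, 0, 0), (0, 1, 0))  # the plane z = 0
wall = plane_of((1, 0, 0), (1, 1, 0), (1, 0, 1))   # the plane x = 1
print(contraction_cost(quadric([floor]), quadric([wall]), (3, 4, 2)))
```

Because the quadrics are simply added, the cost of a merged vertex is cheap to maintain during simplification, which is the main reason the method is fast.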
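The reconstruction step amounts to finding groups of triangles that share both an edge and a class. A union-find sketch follows, assuming triangles are given as vertex-index triples with one class label each. Turning each group into an outline polygon is not shown. Adjacent triangles of different classes stay separate, and so do same-class triangles that do not touch.

```python
def merge_by_class(triangles, classes):
    """Group edge-adjacent triangles of the same class into polygons.

    triangles: list of vertex-index triples; classes: class label per triangle.
    Returns a sorted list of groups, each a sorted list of triangle indices.
    """
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edge_owner = {}  # undirected edge -> first triangle seen using it
    for t, (a, b, c) in enumerate(triangles):
        for edge in ((a, b), (b, c), (c, a)):
            key = tuple(sorted(edge))
            if key in edge_owner:
                other = edge_owner[key]
                if classes[other] == classes[t]:
                    parent[find(t)] = find(other)  # same class: merge the groups
            else:
                edge_owner[key] = t
    groups = {}
    for t in range(len(triangles)):
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())

triangles = [(0, 1, 2), (0, 2, 3), (1, 4, 2), (4, 5, 2)]
classes = ["water", "water", "forest", "water"]
print(merge_by_class(triangles, classes))  # [[0, 1], [2], [3]]
```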
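In a quadric-based simplifier, such preferences can be expressed by adding a constraint plane along each class boundary, perpendicular to the flat landcover mesh, and scaling its quadric by a weight. Garland and Heckbert's paper uses a comparable penalty plane to preserve open mesh boundaries. The sketch assumes a flat mesh (z = 0), coordinates in metres and a hypothetical weighting rule in which water/land boundaries count 100 times more than other boundaries; the factor is arbitrary.

```python
import math

def boundary_weight(class_a, class_b):
    """Hypothetical rule: water/land boundaries weigh 100x more than others."""
    return 100.0 if (class_a == "water") != (class_b == "water") else 1.0

def boundary_quadric(a, b, class_a, class_b):
    """Weighted quadric for the 2D boundary edge a-b between two classes.

    The constraint plane contains the edge and is perpendicular to the flat
    (z = 0) mesh, so v^T K v = weight * (distance from v to the edge's line)^2.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length           # unit normal of the edge
    p = [nx, ny, 0.0, -(nx * a[0] + ny * a[1])]  # plane nx*x + ny*y + d = 0
    w = boundary_weight(class_a, class_b)
    return [[w * p[i] * p[j] for j in range(4)] for i in range(4)]

def error(q, point):
    """Quadric error v^T Q v for a 2D point on the z = 0 mesh."""
    v = [point[0], point[1], 0.0, 1.0]
    return sum(v[i] * q[i][j] * v[j] for i in range(4) for j in range(4))

# Moving a vertex 2 m off a shoreline costs 100x more than off a forest edge.
shore = boundary_quadric((0, 0), (10, 0), "water", "forest")
field = boundary_quadric((0, 0), (10, 0), "forest", "vegetation")
print(error(shore, (5, 2)), error(field, (5, 2)))
```

Adding such boundary quadrics to the vertex quadrics makes the simplifier move water/land boundaries only when nothing cheaper is left.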