In this tutorial you shall learn how to derive landcover data from satellite imagery using the GRASS automatic classification tools and how to use that data for generating FlightGear scenery with TerraGear.
To the right you can see a comparison of the standard VMAP0 landcover data and the results of automatic classification of Landsat-7 satellite imagery in the area of Lake Constance, South Germany (click to get a bigger, animated version). These screenshots were taken from the FlightGear Mapserver.
Note that the level of detail shown in the automatically classified part of these images is currently too much for TerraGear and FlightGear, so simplification is necessary.
This tutorial is split in six parts:
- Fetching Landsat Imagery
- Setting up the GRASS workspace
- Creating a training vector layer
- Running automatic classification
- Simplifying and vectorising the results
- Exporting and reprojecting the vector data
The actual generation of scenery is explained in a different tutorial.
Fetch Landsat Imagery
Landsat imagery is available, e.g., from the Global Land Cover Facility, e.g., via their Earth Science Data Interface (ESDI). In their "Map Search" mode you can select a given area on the map and the ESDI will find for you all imagery overlapping that area. What we are specifically interested in is ETM+ imagery (check in the list on the left).
For this tutorial we will be using an image of South Germany which shows the Lake of Constance, the region where this project originally started.
The impatient may download a package containing only a small subset of that Landsat tile here (ftp, right click to download). Note that the full image is about 319MB to download, while this smaller package has only about 6MB.
In case you want to download the original imagery, go to the ESDI and enter the "Map Search". Then select the "Path Search" (1.) tab on top and select "ETM+" (2.) as imagery type. In the "Path" and "Row" fields enter "194" resp. "27" (3.) These are the path and row numbers for the image we want to acquire. Click "Update Map" (4.)
When the map has been updated, click "Preview & Download". The image we want has ID 036-166. Click on the ID in the ID column to activate and preview it. Then click on the "Download" button, which will lead you to an FTP directory containing the parts of the image. We only need the .tif.gz files and the .met file. Download them and extract the .tif.gz's.
Setting up the GRASS workspace
We assume that you already installed GRASS and have already some experience in working with vector and raster maps in GRASS.
There are general compilation instructions and precompiled binaries available for GRASS. You also might want to have a look at the tutorial section of the GRASS web page. We specifically recommend Markus Neterler's "GRASS 6 in a nutshell" tutorial.
Create a UTM location for your work on the classification. The UTM zone can be found in the .met file that was previously downloaded with the Landsat image files. In case of the South Germany image the projection is UTM zone 32 north.
After starting GRASS in the newly created location, change the directory to where the landsat image files were downloaded to, or in case you downloaded the starter package, to the directory where you extracted the package. Extract the .tif.gz's if they are not already extracted.
Then for each .tif file run a command like the following:
r.in.gdal -e input=p194r027_7k20010824_z32_nn61.tif output=p194r027_7k20010824_z32_nn61
This command imports the file p194r027_7k20010824_z32_nn61.tif as raster layer p194r027_7k20010824_z32_nn61 and due to the option -e extends the default region of your location to also include the bounds of the newly imported file. Repeat this for all the other files, replacing the names on the command line appropriately. Note that the import may take some time if you import whole Landsat tiles.
This is it. The files are imported!
Creating the vector training layer
We will now create the vector training layer in which examples of areas will be marked by polygons and be annotated with the type number we want to assign. We will use the same classification used in the CORINE project.
But first we need to create a false color image by performing a so-called Brovey fusion on some of the channels from the Landsat imagery. The resulting image uses the high-resolution (14.25m/Pixel) panchromatic channel and therefore has a higher resolution than the simple RGB composite. The false-color images also allow better identification and distinction of several types of urban fabric and vegetation areas. The brovey images can often be used instead of the RGB composite when digitizing the training map. See the image on the right for a comparison (left: brovey, right: RGB).
The brovey transformation uses channels 2 (green), 4 (Near Infrared), 5 (Mid Infrared) and 8 (panchromatic):
i.fusion.brovey ms1=p194r027_7t20010824_z32_nn20 ms2=p194r027_7t20010824_z32_nn40 ms3=p194r027_7t20010824_z32_nn50 pan=p194r027_7p20010824_z32_nn80 outputprefix=p194r027_brovey
This creates three raster images p194r027_brovey.red, p194r027_brovey.green and p194r027_brovey.blue. You can either composite them using r.composite or you can display them using d.rgb resp. the RGB composite function in the new GIS manager.
Now let's create the new vector layer:
v.digit -n map=corine_train bgcmd="d.rgb red=p194r027_brovey.red green=p194r027_brovey.green blue=p194r027_brovey.blue"
The new layer will be called corine_train. The -n option instructs v.digit to create a new layer. The bgcmd option tells v.digit to display the brovey image in the background using the given d.rgb call.
In v.digit we first need to create a new table. First click on the settings button, then select the "Table" tab from the dialog (1.+2.). Then click on "Add new column" (3.) In the new empty field, enter the name for the new column (4.) We'll use "corine_class" because we will be putting corine class numbers in that field. Make sure that "integer" is selected as type. Then click "Create table" (5.) and then click "OK" in the confirmation dialog.
Now that we have the new dialog, we can start digitising. GRASS describes areas by a combination of boundaries and so-called centroids. This is something one has to get used to a bit when coming from other GIS software.
The typically known concept uses polygons for specification of areas, where the outline of the polygon simply separates the inside of the area from its outside. In GRASS the boundaries are polylines which separate two areas from each other. One of these areas may be the "outside". So each ring of adjacent boundary lines forms the border of an area (see the screenshots on the left, click to enlarge).
So to define areas one simply digitises the boundaries between the areas and connects them at their ends so that around each area there is a ring of boundaries. This way a common boundary between two areas has to be digitised only once and no tricks like snapping of polygon outlines are necessary to make the two areas fit exactly at their boundary. Holes can be created by simply adding a boundary ring within an area.
To actually make the difference between a boundary ring and an attributed area, one places a centroid in the area. The centroid is a point and it is associated with the inner area by whose boundary ring it is surrounded. It is also connected to a row in the database via a reference number which in GRASS is called its category. Actually, every geographic primitive, i.e. points, lines, boundaries and centroids, can have zero or more categories assigned, and the same category can be assigned to multiple geographic primitives.
GRASS has three modes for automatic assignment of categories to newly created primitives:
- No category, where no category number is assigned,
- Manual entry, where a number can be entered which is then assigned to any new primitive, and
- Next not used, where the next unused category number is assigned, counting up on each new primitive.
We'll use "Next not used" for the centroids, as we want to assign classes independently to each of the areas we mark, and "No category" for the boundaries, as we do not want to attach any information to the boundaries.
The image to the right shows an example for how to partition the training area with boundaries. Note that no centroids have yet been placed. Boundaries that are not part of any complete ring will be shown grey (there are none of these in the picture). Boundaries that are part of exactly one ring will be shown in green and boundaries that are part of more than one ring will be shown in orange.
Also note that the individual boundary lines end at the points where multiple boundaries meet. The end points are called "nodes" and whenever you try to place a point of a boundary segment near such a node, the point will snap to the node. Only nodes placed exactly above each other will be seen by the algorithm as connected.
It is suggested that you first create a partitioning using the "No category" mode all the time, and then go on creating all the centroids with "Next not used" category assignment mode. You can also switch back and forth a few times, but try to stay in one mode for some time so you don't have to switch around too often.
You don't need to specify borders for all areas you see in the picture. The point with training is to give the automatic classification module a few examples of the different classes. So rather than trying to assign a class to every pixel you see on the image, try to find a good mixture of different areas. You also don't need to actually specify the exact borders.
It is not a problem at all if you only mark a part of a larger area and keep the rest unmarked, as long as the marked area seems representative enough. In the example there is only a small part of the lake marked. You will go back and forth between training and automatic classification a few times anyway and each time the training algorithm will have to process all the areas you marked. The larger they are, the longer the loop between training and classification will take. You need to find a good way in between too many/too large and too few/too small areas. If the classification doesn't classify some area the right way, you will simply add it to the training layer later.
Now begin placing the centroids. Select the "Digitize new centroid" tool, change to the "Next not used" category assignment mode and place the centroid (1.) In the form window that opens then enter the class number for the given area (2.) and commit the entry by clicking "Submit" (3.)
In the example the marked area is an industrial area, which according to the CORINE classes is number 121. The other types of areas in the images are classified as follows:
- Continuous Urban fabric: 111
- Airports: 124
- Green Urban areas: 141
- Broad-leaved forest: 311
- Coniferous forest: 312
- Water bodies: 512
There are 37 further classes in the CORINE catalogue, but for starters we'll leave it at that. On the second image to the right you can see the distribution of the classes in the example (click to enlarge).
Create the training raster map
The training module wants to have the training data in form of a raster map. Therefore it is necessary to convert the vector map we just created into a raster map using the v.to.rast module.
First we make sure that the resolution is properly set so that the training module can actually use the 14.25m resolution of the panchromatic band. We also want to restrict the size of the training raster to the area our training vector map takes:
g.region vect=corine_train res=14.25
Then we simply call the v.to.rast module as follows:
v.to.rast --o input=corine_train output=corine_train ENGINE=area use=attr column=corine_class
This command creates a new raster map corine_train from the input vector map corine_train, converting the areas and assigning the values from the column corine_class to the new pixels. The --o option instructs the v.to.rast module to overwrite a possibly already existing raster map corine_train.
Note that in GRASS raster and vector maps may have the same name and exist simultaneously. GRASS can distinguish them by the type of map the user is referring to.
The resulting training raster is shown in the picture on the left, draped over the brovey-transformed Landsat image.
Generate a signature set
Now that we have the training raster available we can use it to generate a so-called signature set. A signature is a piece of information that tells the classification algorithm what values in the different channels indicate which class of land according to the training raster map.
First we need to arrange the raster maps for the individual channels into an image group so that the classification algorithms recognise them as a bundle. We also save the time of typing the whole list of map names all the time, what we would otherwise have to do, were the algorithms not forcing us to use image groups anyway.
To create the image group we use the i.group module. We tell it the name of the new group and the name of a subgroup we want to create, as well as the list of maps to group:
i.group group=p194r027 subgroup=all input=`g.mlist ENGINE=rast pattern="p194r027_7*" | tr '\n' ','`
Trust me, it's better with the g.mlist part in there. It essentiall lists all the raster maps whose names start with "p194r027_7" and replaces the end-of-line characters in the list by commas. Note that the raster maps starting with "p194r027_" only would include our brovey-transformed rasters. You might have to change the "_7" into what applies for your tile.
Now we can finally generate the signature set using the i.gensigset module. First we make sure again that the GRASS region is set up properly so that i.gensigset will only process the area of our training map, but in the resolution of the best Landsat channel, i.e. 14.25m/pixel:
g.region rast=corine_train res=14.25
Now we generate the signature set:
i.gensigset trainingmap=corine_train group=p194r027 subgroup=all signaturefile=corine
This command line tells i.gensigset to create a signaturefile named corine from the trainingdata specified by the images in group p194r027, subgroup all and the training map corine_train.
This operation may take some time and you will see messages about i.gensigset processing areas of different size and identifying the unique characteristics of these areas.
Starting the classification
With the signature set available we will now start a first classification, but for the first try only on a small area. We'll just keep the region settings so that it covers just the training region:
g.region rast=corine_train res=14.25
You don't need to run this command again if you haven't changed the region since creating the signature set. Now to start the classification, we invoke the module i.smap, specifying the image group and the signature set it is to work with, as well as the name for the output raster map:
i.smap --o group=p194r027 subgroup=all signaturefile=corine output=corine_class
The --o flag again instructs the module to overwrite any previously existing output map. You will be repeating this a few times until you have the right training established, so you'll be glad not having to delete the old map each time.
The result of the first run can be seen in the image on the right. As expected, the algorithm also was able to classify areas not specified in the training map. However, we also see that the image has quite a lot of speckles which we need to get rid of. Some of these are singular mis-classifications, others are correct, but all in all they are just too small so that we would not want to have them in our scenery, where every triangle counts.
Cleaning up a little bit
To get rid of the speckles we will use a multilevel approach. In a first step we will reclassify some of the areas, because they are currently of no use in the FlightGear scenery. For example there is no specific texture for industrial areas, so we might as well reclassify these to be urban area. We thereby spare the additional polygons required later in the vectorisation step to express the "industrial" holes in the map.
In a second step, we will remove some of the areas smaller than a given limit. You will probably have to play with that limit a bit to get good results.
In a third step we will use the r.neighbors tool to smoothen the boundaries of the areas a little bit, to get rid of further speckles and to close some of the holes the previous step left over. We will do this a few times and again, you will have to play around with how many times you do this and with what parameters.
First the reclassification. Create a text-file named reclass_corine with the following contents:
Then invoke the r.recode module:
r.recode --o input=corine_class output=corine_reclass < reclass_corine
This recodes the classes 121 to 124 and 131 to 133, i.e. the industrial stuff, to 111, which is continuous urban fabric. The other classes, 111 to 112 and 141 to 523, are untouched.
Now we will remove all connected areas which are smaller than 3 hectars:
r.reclass.area --o input=corine_reclass output=corine_reclass_area greater=3
The result can be seen in the image on the left. As can be seen, there are a lot of holes in the image now. These are the holes where small areas have been removed.
Now we'll close up the holes a bit, smooth the borders between the areas and remove some more speckles. For that we use the r.neighbors module. This module can count the number of pixels of each class around a given pixel (i.e. in its neighborhood) and then assign the class that was seen the most to the pixel in the center.
We will use a neighborhood of size 3x3, i.e. taking the processed pixel in the middle we consider the pixels one row left and right and one row above and below the pixel:
r.neighbors --o input=corine_reclass_area output=corine_reclass_area_n1 method=mode size=3
The result shows a lot less speckles, smaller holes and more smooth borders. Repeat that a few times:
r.neighbors --o input=corine_reclass_area_n1 output=corine_reclass_area_n2 method=mode size=3
r.neighbors --o input=corine_reclass_area_n2 output=corine_reclass_area_n3 method=mode size=3
r.neighbors --o input=corine_reclass_area_n3 output=corine_reclass_area_n4 method=mode size=3
The result can be seen in the image below. It is obvious that the borders have been smoothed heavily. Comparing it to the original landsat image however it can be seen that the accuracy has not suffered much. Of course we are also able to see that the whole agricultural area was wrongly classified as either urban green or urban area. As we did not specify any agricultural areas, this was to be expected. It is left as an exercise to the reader to correct this by refining the training map.
Vectorisation and final cleanup
For scenery generation we will need the data in vector form, so we will now convert the raster map into a vector map. This can be done using the r.to.vect module:
r.to.vect -s --o input=corine_reclass_area_n4 output=corine_class feature=area
This command line instructs the module to convert the areas from corine_reclass_area_n4 to vector areas in the new vector map corine_class. The -s option instructs the module to smooth the artifacts which would otherwise be introduced by vectorising directly around the pixel borders.
The resulting map is still pretty edgy and detailed, so we'll do some more simplification:
v.clean --o input=corine_class output=corine_class_cleaned tool=rmarea,prune thresh=6000,28.5
This command removes all areas smaller than 6000 m2. It also removes points from the boundary lines as long as the maximum deviation resulting from such a removal is less than 28.5 m. This is double the resolution of our basic imagery, so this sound reasonable. Again, you might have to play around with these values if you are not pleased with the result.
The result of the vectorisation and of the cleaning can be seen in the pictures on the right.
Exporting and reprojecting the vector data
Now that you have your vector data you need to get it transported to TerraGear. As the classification was run in a workspace based on UTM projection, the vector data is also in this projection, but TerraGear needs it in form of Latitude-Longitude-coordinates based on the World Geodetic System of 1984, short: WGS84.
If you wanted to reproject your data within GRASS, you would have to create another location in WGS84 projection and reproject using GRASS' v.proj module. Instead you can export all your data using the v.out.ogr module and reproject using ogr2ogr.
Generating scenery for TerraGear will also require you to split up your vector data into several parts so that each resulting shapefile contains only areas which will be attributed the same texture. This splitup can be done using ogr2ogr as well, doing the reprojection on the fly.
First, export your data using v.out.ogr:
v.out.ogr -c input="corine_class_cleaned" ENGINE="area" dsn="exportdir" olayer=rawsource
This will create a directory named exportdir and will export the data to a shape layer named rawsource. Make sure that exportdir does not already contain such a layer, otherwise v.out.ogr might complain that it can't delete a shape layer.
Now you can split up and reproject into individual layers, e.g.
ogr2ogr -t_srs wgs84 -nln v0_town -where "value=111" exportdir exportdir rawsource
ogr2ogr -t_srs wgs84 -nln v0_urban -where "value=112 or (value>=141 and value<=142)" exportdir exportdir rawsource
I always use this script to export the data from GRASS in a form suitable to simply replace standard VMAP0 files. Call it from within your GRASS workspace using
export_as_vmap0 corine_class_cleaned exportdir value
This will export the data from corine_class_cleaned to the directory exportdir, separating according to the value of attribute value, which is expected to be set according to the CORINE classification table. Calling the script without parameters will give you a short help message.