Practical questions
How much does Biomapper cost?
Biomapper is free. You just have to download it.
Does Biomapper work with Idrisi32?
Yes, Biomapper can read both the Idrisi16 and Idrisi32 file formats.
I'm working with Arcview. How can I use Biomapper?
Several of Biomapper's users work with Arcview. You will have to prepare your maps in Arcview before converting them to the Idrisi/Biomapper format. Once the whole process has been completed, you can reimport the resulting maps into Arcview for further analysis or display.
If you are using Arcview 3.0, you can use the Biomapper module Grid Convertor.
How to convert an ESRI grid into an Idrisi/Biomapper raster map and vice versa?
There are several possibilities:
1° ArcView 3.x extension
There is an extension on the ESRI site that does the job, written by Holger Schäuble. Look for Grid Converter (av2idrisi.zip) on http://gis.esri.com/arcscripts/index.cfm. It works with ArcView 3.x.
2° Biomapper's GridConvertor
If you have Arcview 3.0 or 3.1 and Spatial Analyst, you can use the Biomapper module Grid Convertor. It allows you to convert several files in a single operation.
3° Manual conversion
If you have another version of ArcView, GridConvertor may not work (there are recurring problems due to ESRI's proprietary policy). You will therefore have to convert your grids manually. In the Biomapper help file you will find a description of the Idrisi/Biomapper file format that should help you transfer your data. You can try this: in ArcView, go into the File menu, select Export Data Source and export your image as a binary raster (it has a .flt extension). Exit ArcView and then change the file extension from .flt to .rst. You should then be able to create a documentation file (*.rdc) for this image with the information supplied in the help file; the .flt and .rst formats are identical. You will probably have to choose the "real" data type. (A minimal sketch of this renaming step is given after this list.)
4° Otherwise...
You can also find useful programs at http://www.pierssen.com/idrisi/grid.htm
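As a complement to option 3°, here is a minimal Python sketch of the renaming step described above. It is only a sketch: the .rdc fields shown are placeholders, and their exact names, order and values must be taken from the Idrisi/Biomapper file-format description in the Biomapper help file (the georeferencing values come from your ArcView export).
```python
import os

def flt_to_rst(basename, columns, rows):
    """Rename an ArcView binary raster export (.flt) to .rst and write a
    skeleton documentation file (.rdc). The .flt and .rst binary layouts are
    identical, so only the extension and the documentation file change."""
    os.rename(basename + ".flt", basename + ".rst")

    # Placeholder documentation file: check every field name and value against
    # the Idrisi/Biomapper format description in the Biomapper help file.
    with open(basename + ".rdc", "w") as rdc:
        rdc.write("file format : IDRISI Raster A.1\n")
        rdc.write("data type   : real\n")        # "real" is usually the right choice here
        rdc.write("file type   : binary\n")
        rdc.write("columns     : %d\n" % columns)
        rdc.write("rows        : %d\n" % rows)
        # ...reference system, min./max. X/Y, etc., as described in the help file

flt_to_rst("my_egv", columns=250, rows=180)      # hypothetical map name and size
```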
In which programming language is Biomapper developed?
I'm using Borland Delphi for all my programming work. It allows me to quickly build fast-running procedures cloaked in a user-friendly interface.
Could you kindly send me the user manual?
There is no user manual for Biomapper. There is the help file though, and on the web you will find the FAQ (www.unil.ch/biomapper/faq.html) and more general information.
How can I use satellite images (or other file formats) with Biomapper?
Biomapper has no importing capabilities. It can build a map from raw data but cannot convert alien file formats. It works only with the Idrisi (16 and 32) file format, but Idrisi itself has an extensive set of conversion tools.
How to cite Biomapper / ENFA?
You can either cite the main paper:
Hirzel, A., Hausser, J., Chessel, D., Perrin, N., in press. Ecological-Niche
Factor Analysis: How to compute habitat-suitability maps without absence
data? Ecology.
or the Biomapper software itself:
Hirzel, A., Hausser, J., Perrin, N., 2001. Biomapper 1.0. Lausanne,
Lab. for Conservation Biology. URL: http://www.unil.ch/biomapper.
Why is it not possible to use most of the modules of Biomapper with Idrisi16 (.img) files?
Only the main Biomapper program has a button to switch from one format to the other. The other modules use one or the other depending on the folder in which they are placed (it is not very elegant, I agree, and I will have to change that some time in the future).
When a module is launched, it checks whether there is an "idrisi.env" file in the same folder as the module executable. If there is one, it switches to Idrisi16 mode; otherwise it switches to Idrisi32 mode.
You can therefore enforce one mode or the other by placing a fake "idrisi.env" file in your Biomapper folder.
Statistics
I don't understand what all the validation statistics mean
I agree that this validation part is not yet very well documented... It is the fruit of hard work to find a way to evaluate a model without absence data. I therefore tried many methods that are still present in the output, without any further explanation... I shall try here to unveil a few details on this subject. This will eventually be incorporated into the help file.
You must have split your sample into two sets. For validation purposes, you must use as "validation map" the set you did NOT use for model calibration. This observed map is therefore a Boolean map indicating species presence. You can then evaluate the habitat suitability map produced with the ENFA model.
When no reliable absence data are available, evaluation consists in comparing various statistics computed on the "predicted map" (the habitat suitability map): 1° on the whole study area, and 2° only on the validation points.
The best way to understand this is to look at the box-plot displayed after the validation process. A good model should produce high HS values (80-100) for the species. The global box-plot gives you insight into how marginal the species is in the study area and thus how much these results could have been obtained by chance alone.
The results window gives a few statistics summarising this box-plot. It is composed of three parts: 1° species, 2° global statistics and 3° comparisons.
The first two parts give the same kind of information: first, common distribution statistics are computed (mean, median, quartiles, etc.), then a few more specific statistics. The one I find most useful is the "proportion of presence cells > 50": this is the proportion of validation points that have a predicted HS value over 50. The higher this value, the better your model. The "Prob. to be above this value by chance" statistic uses a bootstrap procedure to assess how likely it is that this value could have been obtained by chance. In practice it is not very interesting, as I have always got a 0.000 probability here (good news)... Then you can see the 90th and 95th percentiles. You can compare all the previous values between the global and validation sets, but only the validation set gives a really objective idea of the quality of your model.
The last part gives three comparison values between the two sets. The "Kappa" coefficient is a modified kappa statistic that integrates both how good the model is and how far from random it is. "Prop. of pix being significantly above 50" is the difference between the two "Prop. > 50" values. And the "Probability to be over 50%" is the probability of being over 50% by chance, computed on the global-set distribution histogram.
These last three values are useful to assess how different the model is from what a random model could achieve, but they say nothing about the absolute quality of your model. They are highly related to the global HS of the study area; thus, if your species is neither very marginal nor very specialised, the model could be very good and yet get a very bad "far from random" score.
What is the "modified Kappa coefficient"?
I gave this name to a home-made statistic whose behaviour was designed to be similar to Cohen's Kappa coefficient. It is computed as follows:
Kmod = (MS - MG) / (1 - MG)
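A small Python sketch of this formula; it assumes (following the naming convention used elsewhere in this FAQ) that MS and MG are the same statistic, for instance the "proportion of cells > 50", computed on the species (validation) set and on the global set respectively.
```python
def modified_kappa(m_species, m_global):
    """Home-made Kappa analogue: how much better the model does on the
    species (validation) cells than expected from the global distribution,
    rescaled so that a perfect score gives 1."""
    return (m_species - m_global) / (1.0 - m_global)

# Example: 82% of validation cells above HS 50, against 37% of all cells.
print(modified_kappa(0.82, 0.37))   # ~0.71
```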
Do you have an example data set to test Biomapper?
Alas, none of our data are in the public domain and we cannot give them away.
I have very good absence data. Can I use them
with Biomapper?
If you are really sure that your absence data are good and that no historical or spatial factors could have biased them, you will probably get a better model by using a GLM or a GAM. But you can always put your absence data aside and apply the ENFA on presence data only.
How to use abundance data with Biomapper?
So far, the only way to use such data is to "booleanise" them, i.e. transform them into presence data. But as many users seem to have this kind of data, I will probably improve the ENFA to take abundance into account. Stay tuned... (If you want to be alerted when new versions of Biomapper are published, you need to be registered on my "Biomapperians" list.)
What is a Box-Cox transformation and why is it
needed?
Box and Cox (1964) developed a procedure for estimating the
best transformation to normality within the family of power transformations:
Y' = (Y^λ - 1) / λ   (for λ ≠ 0)
Y' = ln Y            (for λ = 0)
See "Biometry", Sokal & Rohlf, 1995, pp.417-419 for further explanations.
Biomapper uses the Box-Cox algorithm to normalise the ecogeographical variables as well as possible. Empirically, we have found that normality was not a crucial factor and that this step could as well be skipped.
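If you want to reproduce the transformation outside Biomapper, the following sketch uses scipy's maximum-likelihood estimate of the exponent λ; it assumes strictly positive EGV values, since the Box-Cox transformation is only defined for Y > 0.
```python
import numpy as np
from scipy import stats

# Hypothetical EGV sample: strictly positive, right-skewed values.
y = np.random.lognormal(mean=1.0, sigma=0.8, size=1000)

y_transformed, lmbda = stats.boxcox(y)   # lmbda is the estimated exponent (lambda)
print("estimated lambda:", lmbda)

# The transformation itself, matching the two-case formula above:
def box_cox(y, lmbda):
    return np.log(y) if lmbda == 0 else (y ** lmbda - 1.0) / lmbda
```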
What is the "broken-stick advice"?
The distribution of the eigenvalues is compared to the distribution of MacArthur's broken stick, i.e. the expected distribution of the pieces when a stick is broken at random. The eigenvalues that are larger than what would have been obtained randomly may therefore be considered "significant". You can also keep only the factors with an eigenvalue larger than 1. These are objective means to choose how many factors to keep for the HS map computation. They are just indications, designed as a support when selecting the factors.
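A short sketch of the broken-stick comparison, assuming the eigenvalues are read from Biomapper's eigenvalue table; the expected values follow MacArthur's broken-stick model, though the exact scaling used by Biomapper may differ slightly.
```python
import numpy as np

def broken_stick(V):
    """Expected proportion of the total for each of the V pieces of a
    randomly broken stick: b_k = (1/V) * sum_{i=k..V} 1/i."""
    return np.array([sum(1.0 / i for i in range(k, V + 1)) / V
                     for k in range(1, V + 1)])

eigenvalues = np.array([3.1, 1.4, 0.9, 0.4, 0.2])        # hypothetical ENFA eigenvalues
expected = broken_stick(len(eigenvalues)) * eigenvalues.sum()
print(eigenvalues > expected)                            # True = larger than expected by chance
```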
The number of categories per factor for the making of the HS map changes by steps of two. I mean, you can only choose 2, 4, 6 and so on classes per factor. Is this normal?
Yes. It is because the HS computation is based on the median of the factor distributions; the median must fall between two classes, and so the number of classes must be even.
I have computed an ENFA model and I would like to extrapolate it to a wider or different area. Is there some equation I can use to do this?
There is no such equation but the scores matrix computed by
the ENFA may be used for extrapolation purposes. Here is the extrapolation
procedure:
-
Create a project for the calibration area and compute an ENFA model.
-
Save the score matrix.
-
Create a new project for the extrapolation area. The EGV maps must be equivalent to those used in the calibration model (same units). They must also be sorted in exactly the same order; the best way to achieve this is to give them the same names as in the calibration project, as EGV maps are sorted alphabetically.
-
Compute the covariance matrix of these EGVs.
-
Load the calibration score matrix.
-
Compute the HS map. It will be based on the calibration model applied to the extrapolation data set.
How is the scores matrix computed?
The full mathematical details of this operation are described in a paper currently in press in Ecology. Biomapperians will be alerted when it is published. You can get an intuitive understanding by looking at the help file or on this site at http://www.unil.ch/biomapper/enfa.html. For now, here is a short overview of the process:
The eigenvalues and eigenvectors are extracted as follows: compute the matrix W = Rs^-1 * Rg, where Rg is the global correlation matrix and Rs the species covariance matrix. From W, extract the marginality factor (I do not give the mathematical procedure here), which gives us the matrix W*. The specialisation factors are computed by extracting the eigensystem of W*.
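A deliberately incomplete numpy sketch of the step just described, with hypothetical matrices Rs and Rg; the step that turns W into W* (the extraction of the marginality factor) is omitted here just as it is in the text above, so this is not the full ENFA computation.
```python
import numpy as np

def enfa_w_matrix(Rs, Rg):
    """Rs: covariance matrix of the EGVs over the presence cells only.
    Rg: correlation matrix of the EGVs over the whole study area."""
    W = np.linalg.inv(Rs) @ Rg
    # The marginality factor is extracted from W, yielding W* (not shown here,
    # see the Ecology paper); the specialisation factors are then obtained from
    # the eigensystem of W*, e.g. np.linalg.eig(W_star).
    return W
```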
What are global marginality, specialisation
and tolerance?
Global marginality:    M = sqrt( sum_{i=1..V} m_i^2 ) / 1.96
Global specialisation: S = sqrt( sum_{i=1..V} λ_i / V )
Global tolerance:      T = 1 / S
where the m_i are the coefficients of the marginality factor, V is the number of variables and the λ_i are the eigenvalues.
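A direct transcription of these three formulas, assuming the marginality coefficients and the eigenvalues are taken from Biomapper's score matrix and eigenvalue table.
```python
import numpy as np

def global_indices(marginality_coeffs, eigenvalues):
    m = np.asarray(marginality_coeffs, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    V = len(m)                                  # number of variables (EGVs)
    M = np.sqrt(np.sum(m ** 2)) / 1.96          # global marginality
    S = np.sqrt(np.sum(lam) / V)                # global specialisation
    T = 1.0 / S                                 # global tolerance
    return M, S, T
```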
How do you compute the HS value for each cell
from the scores matrix?
This is a rather complex procedure. The full mathematical details of this operation are described in a paper currently in press in Ecology. Biomapperians will be alerted when it is published...
In short, for each retained factor a frequency histogram is computed over all the cells of the map, and the median of this distribution is computed. The further a cell is from this median, the lower its suitability for this factor. The global suitability is then obtained by computing a weighted mean of these "partial suitabilities": marginality has a weight of 1, and the specialisation factors together also have a weight of 1, shared proportionally to their eigenvalues.
How to interpret the biological meaning of the factors?
Look at the scores matrix. The first column of this matrix is the marginality factor; the other columns are the V-1 specialisation factors (V being the number of variables). The rows are the EGV contributions to each factor.
Box-Cox normalisation fails with some EGV
maps. Should I discard them?
For myself, when the Box-Cox fails, I keep the original map. A "Box-Coxised" map gives better results than a raw map, but a raw map is still better than no map at all.
Anyway, you may include it at the beginning, to compute the big, time-consuming covariance matrix (once it is computed, you can easily remove variables from it, but if you add new variables the whole matrix has to be recomputed), and then try removing it to see how it affects the result.
How are the ROC curves and kappa calculated
if Biomapper does not use absence data?
The kappa and the ROC curves that you find in the validation dialog box assume that blank values are true absences. They are therefore not suitable for most data sets, where absences are unreliable. I put them there for the cases where you can rely on absences.
How can I compare results from GLM and ENFA?
Comparing ENFA and GLM is a tricky business. In my recent paper (Hirzel et al., 2001, Ecological Modelling 145), I was able to compare them because I was using virtual data and so had access to the "reality", the "truth". But in the general case we do not know it, and so we are constrained to use the standard statistics (Kappa, ROC, etc.). There are three main problems when comparing presence/absence to presence-only methods:
1° If absences are thought to be unreliable for building a model, there is no sense in using them to validate it afterwards. So the standard statistics are not useful. I tried to develop a few statistics to replace them (see the FAQ on the Biomapper site), but the perfect statistic has still to be invented.
2° As it is based on presence data only, the ENFA is more efficient at modelling the areas with average to high suitability; its predictions for low-suitability regions should be taken with caution. By contrast, presence/absence methods, and in particular GLMs, tend to model good versus bad areas, producing a kind of "stepped" response that is different from the linear one of the ENFA.
3° Without absences - i.e. bad-habitat points - to "fix the floor", the ENFA must scale its suitability index to the ceiling. That means that, on an ENFA HS map, you will always, by construction, have at least one cell with an HS of 1 (or 100); with a GLM this is generally not the case, as it computes a "probability of presence" and so the maximum values are generally quite a bit lower than 1.
Thus, when visually comparing GLM and ENFA maps (computed on well-known species at equilibrium, i.e. when absence data are largely reliable), the results are obviously quite similar. But if you try to compare them statistically, you get strange results, biased towards one or the other depending on which base hypothesis you use. To compare them you must correct both results to make them comparable:
1° Remove the "ceiling effect" by stretching the GLM results between 0 and 1.
2° Synchronise the "step effects" by transforming both GLM and ENFA results into Boolean maps (by choosing a threshold).
3° Then you can compare these results with standard presence/absence statistics, as in the sketch below.
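A minimal numpy sketch of corrections 1° and 2°, assuming glm and enfa are arrays holding the two predicted maps (GLM probabilities and ENFA HS values in 0-100) and that an illustrative common threshold of 0.5 is used.
```python
import numpy as np

def make_comparable(glm, enfa, threshold=0.5):
    # 1° remove the "ceiling effect": stretch the GLM output to [0, 1]
    glm_stretched = (glm - glm.min()) / (glm.max() - glm.min())
    # 2° synchronise the "step effect": turn both maps into Boolean maps
    glm_bool = glm_stretched >= threshold
    enfa_bool = (enfa / 100.0) >= threshold
    # 3° these two Boolean maps can now be compared with standard
    #    presence/absence statistics (Kappa, ROC, etc.)
    return glm_bool, enfa_bool
```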
What is the difference between the unidimensional and the multidimensional histogram algorithms for the computation of HS maps, and how strong is the impact on the resulting HS maps?
I fear you will have to wait for the publication of our main paper on ENFA (Ecology, in press, hopefully early in 2002) to fully understand these algorithms, but I can give you here a "feeling" of what they do and how they differ:
Unidimensional algorithm:
Once the ENFA factors have been computed, it is possible to compute, for every cell of the map, a value along each of them (in fact one usually computes it only for the first few factors). The distribution histogram of each factor is then computed, taking into account only the presence points. Each histogram is used to attribute a "partial suitability" value to every cell (the more the cell's factor value departs from the median of the histogram, the lower its "partial suitability"). Then a weighted sum of these partial suitability values is made for each map cell, and finally these sums are stretched so that their maximum is 100.
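A toy sketch of this unidimensional procedure, assuming scores is a (cells x factors) array of factor values for all map cells, presence a Boolean array flagging the presence cells, and weights the factor weights (marginality first); the binning and stretching details are simplifications of what Biomapper actually does.
```python
import numpy as np

def unidimensional_hs(scores, presence, weights):
    n_cells, n_factors = scores.shape
    partial = np.zeros((n_cells, n_factors))
    for f in range(n_factors):
        med = np.median(scores[presence, f])            # median of the presence cells
        dist = np.abs(scores[:, f] - med)               # distance from that median
        partial[:, f] = 1.0 - dist / dist.max()         # farther from the median = less suitable
    w = np.asarray(weights, dtype=float)
    hs = partial @ (w / w.sum())                        # weighted mean of partial suitabilities
    return 100.0 * hs / hs.max()                        # stretch so that the maximum is 100
```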
Multidimensional algorithm:
Here the selected factors are not addressed independently. By crossing all the factors together, the factor space is divided into small units (hypercubes). Then we count how many cells belong to each hypercube: this is the multidimensional histogram. This multidimensional distribution is computed both for all cells (global distribution) and for the presence cells (species distribution). Finally, the HS value of each hypercube is computed by dividing the species count by the global count (and multiplying by 100).
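A toy version of the hypercube counting, assuming the same scores and presence arrays as in the previous sketch and a hypothetical number of bins per factor; empty global hypercubes are simply left at zero.
```python
import numpy as np

def multidimensional_hs(scores, presence, bins=10):
    # Use the same bin edges for the global and the species histograms
    global_hist, edges = np.histogramdd(scores, bins=bins)
    species_hist, _ = np.histogramdd(scores[presence], bins=edges)
    hs_cube = np.zeros_like(global_hist)
    filled = global_hist > 0
    # HS of a hypercube = 100 * (number of species cells) / (number of global cells)
    hs_cube[filled] = 100.0 * species_hist[filled] / global_hist[filled]
    return hs_cube
```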
The problem with the multidimensional algorithm is that it needs a huge amount of presence data to be accurate, in particular if you want to include more than two factors, which is generally the case. It is also very memory-consuming. So far, I have never got good results with this algorithm, which is why it will not be described in the paper. We strongly advise Biomapper users to use only the unidimensional algorithm. In fact, I could well have removed it from the interface...
I am currently working on new kinds of algorithms, but it is
another story... Stay tuned!
Can I compare the global marginality and specialisation coefficients of different species if I use the same area but different sets of ecogeographical maps, in particular if I have had to discard different correlated maps in the process?
Strictly speaking, you cannot. In practice, the main biasing effect is the study area. When you have to remove a variable, it is because it does not contain more information than is already included in the model, so removing it should not significantly alter the global marginality and specialisation coefficients. Thus, provided the map sets do not differ drastically, you can still compare the species by these statistics. Anyway, do not attach too much significance to small differences in marginality or specialisation between species.
To tell the truth, I have never tested this. You could test it by building a common minimal EGV set and applying it to all your species. You would then see how the marginality and specialisation differ between the common data set and the species-optimal one. Tell me what you get, should you decide to try this.
You say that the marginality factor sometimes also takes a part of the specialisation into account. Where can I find how much?
The amount of variance explained by the first factor is in fact the amount of specialisation it accounts for. It generally ranges from 10% to 70%. This value is given in the eigenvalue table.
To summarise, the marginality factor always explains 100% of the marginality and some part of the specialisation. That is why it always has a great weight (minimum 0.5) in the HS computation.
I know that the habitat suitability
of my species is linearly related to some variable. Nevertheless, in the
HS map, the optimum for this variable seems not to be at an extremum. Why?
The HS computing algorithm is not linear but is based on the observed distribution. Your problem generally arises on the marginality factor. Let's imagine a species linearly related to the frequency of forest, and that this variable is strongly correlated to the marginality factor (the reasoning also holds for specialisation factors); it means that the more forest there is around a given cell, the more suitable it is for the species, the maximum being a frequency of 100%. This optimum is what we know from our knowledge of the species, field studies, etc. Now let's look at Biomapper's point of view: to it, the optimal frequency is the one where the species is the most frequent (more precisely, the median of the species distribution along the marginality factor; as the distribution of this factor is generally unimodal and more or less symmetrical, the median also corresponds to the most frequent value). Therefore, if large forests are rare in the study area, points with 100% forest frequency will be rare too, and the optimum for the species will not be 100% but lower (say 80%). Then, when computing the HS index, a frequency of 80% will give the highest partial suitability value, and this value will decrease as the frequency either increases or decreases. The rarer the large forests, the steeper the rate of decrease.
Sometimes this effect is welcome (when dealing with median optima) and sometimes counterproductive (with extreme optima). I am currently working on new algorithms which will hopefully address this problem.
Procedures
How can I create
the species-presence map?
There are several possibilities depending on what kind of data you
have at hand:
-
List of observation coordinates: put them in an ASCII file, using the structure x-coordinate TAB y-coordinate (you can do this with Microsoft Excel), and then use the "Convertor" module to create a Boolean presence map from this file (see the sketch after this list).
-
Observation map in a point vector file: simply rasterise it (with the Idrisi function "PointRas"), using the same resolution and window as your ecogeographical maps.
-
Population map in a polygon vector file: first, rasterise it (with the Idrisi function "PolyRas"), using the same resolution and window as your ecogeographical maps. Then make this map Boolean (1 = inside populations). Finally, use the module "Sampler" to divide this map into calibration and validation data sets.
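As announced in the first option above, here is a tiny example of such a coordinates file, assuming a plain tab-separated X/Y layout with one observation per line (coordinates in the same units and reference system as the EGV maps); the file name and values are hypothetical.
```python
# Hypothetical observation coordinates
observations = [(563250.0, 158730.0), (564010.0, 159115.0), (562880.0, 157990.0)]

with open("species_presence.txt", "w") as f:
    for x, y in observations:
        f.write("%f\t%f\n" % (x, y))   # x-coordinate TAB y-coordinate
```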
How to convert a point vector map
into a species map?
In Idrisi32:
-
Menu Reformat/Raster-vector conversion/POINTRAS
-
Select your vector map, e.g. "Species.vct"
-
Choose a name for the "image file to be updated". It must be a new name
(not an existing raster) (e.g. "Species_bl.rst")
-
Choose "Change cells to record the presence of 1 or more points"
Idrisi asks you if you want to bring up INITIAL: Answer YES.
-
In the INITIAL dialog box:
-
Select "Copy spatial parmeters..."
-
Select one of your EGV maps in "Image to copy parameters from"
-
In "Output data type", select "Byte"
-
Click OK
And this should do the job. Species_bl can now be used as the species map. You may want to partition it into calibration and validation data sets. You can do this with the Biomapper module Sampler.
Remember that all the EGV maps and the species map must be in the same directory.
How to convert a polygon vector
map into a species map?
In Idrisi32:
-
Menu Reformat/Raster-vector conversion/POLYRAS
-
Select your vector map, e.g. "Species.vct"
-
Choose a name for the "image file to be updated". It must be a new name (not an existing raster), e.g. "Species.rst"
-
Idrisi asks you if you want to bring up INITIAL: Answer YES.
-
In the INITIAL dialog box:
-
Select "Copy spatial parmeters..."
-
Select one of your EGV maps in "Image to copy parameters from"
-
In "Output data type", select "Byte"
-
Click OK
Then, you must booleanise the rasterised polygons:
-
Menu Analysis/Database query/image calculator
-
Select "Logical expression"
-
Type Species_BL = [Species]>0
-
Click OK
Species_BL can now be used as the species map. You may want to partition it into calibration and validation data sets. You can do this with the Biomapper module Sampler.
Remember that all the EGV maps and the species map must be in the same directory.
How to insert the species-presence
map?
Once the species-presence map has been created, you must insert it in Biomapper in order to use it in the analyses. The species map must be inserted in the Work maps list (NOT the EGV maps list). This can be done in the Files/Work maps/Add maps menu. Once it is in this list, you must declare it as the current species map by selecting it, right-clicking on it and selecting "Mark as species map". The current species map is then marked by a red circle in front of its name.
Is it possible to obtain a better model by reducing the amount of explained variance (i.e. the number of factors used) and, consequently, increasing the number of classes per factor?
Yes. You must find the best trade-off between explained variance and smoothness of the HS model. Note that it is generally not useful to select more than 10 classes.
How to compare models obtained with various factor-number/class-number trade-offs?
You can use the validation module of Biomapper. Look at the box-plot it generates. Focus on the species box: it must be as high as possible - the higher and the narrower, the better. (The global box is not useful for assessing model quality; it is there to show how different from randomness your model is. If you built your model on an area that is globally good for your species, you will get a good model simply because the species can live everywhere.)
How can I print a map from Biomapper? I guess I can do it from Idrisi but I'd like to use the rainbow palette.
Display the map and save it as a BMP file. You can then insert this file in a Word document or print it with any picture software. The rainbow palette cannot be used in Idrisi because it uses more than 256 colours.
What are these "Biomapper extensions" used for? Can I ignore them?
These extensions are meant to simplify browsing and to help the user select, among all the maps, those having the right data type. But these extensions are not used by Biomapper to verify the maps; it uses the raster documentation file (*.rdc). Thus you can safely ignore the Biomapper extensions.
In the Options dialog box, it is possible to switch between correlation and covariance matrix, and to change the norming of the eigenvectors. However, this does not seem to affect the ENFA outputs.
These options are intended for the Principal Component Analysis. Indeed, they do not affect the ENFA.