About the database


All data in the IO database come from a wide array of published papers and books and from public repositories (notably GBIF, iNaturalist and GenBank). The data have been validated by the project team and are continuously checked, improved and updated.


The database includes all known butterfly species living in Europe. Its taxonomy largely follows the widely accepted assessment of the Fauna Europaea committee, made up of Lepidoptera experts from many countries, with a few minor changes introduced by the IO database team on the basis of more recent studies. The accepted taxonomy for each family can be inspected by clicking on “explore taxonomy”.

Kinds of Data

For each European butterfly species the IO database provides the following data:

Occurrence data
Contains the occurrence records of the species, each with its internal ID, the country, the latitude and longitude (in decimal degrees), the collection date, the GenBank accession for specimens with an available COI sequence, and a flag column indicating whether a COI sequence is available and therefore included in the FASTA file (see below). The last two fields indicate the sources of the DNA sequence and of the occurrence data. References for literature records can be downloaded in the file "publications". Occurrence data can be downloaded as CSV, JSON and GeoJSON files.
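As an illustration of the record structure, the Python sketch below parses an occurrence CSV into typed records and converts one record into a GeoJSON Point. The column names (`id`, `country`, `latitude`, `longitude`, `date`, `genbank`, `coi`) and the accepted spellings of the COI flag are hypothetical assumptions, not the documented layout of the IO download; adapt them to the actual header row.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column names; check the header row of the actual download.
# Flag spellings treated as "COI sequence available" (assumed, not documented).
TRUE_FLAGS = {"1", "true", "yes", "y"}


@dataclass
class Occurrence:
    occ_id: str
    country: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    date: str
    genbank: str      # empty when no COI sequence is available
    has_coi: bool


def parse_occurrences(csv_text: str) -> list[Occurrence]:
    """Parse occurrence rows and check that coordinates are valid decimal degrees."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = float(row["latitude"]), float(row["longitude"])
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinates out of range for record {row['id']}")
        records.append(Occurrence(
            occ_id=row["id"],
            country=row["country"],
            latitude=lat,
            longitude=lon,
            date=row["date"],
            genbank=row["genbank"],
            has_coi=row["coi"].strip().lower() in TRUE_FLAGS,
        ))
    return records


def to_geojson_feature(occ: Occurrence) -> dict:
    """Build a GeoJSON Point feature; RFC 7946 orders coordinates as [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [occ.longitude, occ.latitude]},
        "properties": {"id": occ.occ_id, "country": occ.country, "date": occ.date},
    }
```

GeoJSON stores longitude before latitude, the reverse of the order in which the coordinates are listed above, which is a common source of mirrored maps when converting between formats.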
COI sequences
Provided as a FASTA file of sequences aligned to a length of 658 bp. The samples appear in the FASTA file in the same order as in the occurrence data, so selecting the specimens flagged for a COI sequence in the occurrence table yields an occurrence matrix that matches the FASTA file record by record.
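Because the FASTA order mirrors the occurrence table, flagged rows can be paired with sequences by position. The Python sketch below assumes that the FASTA file contains exactly the flagged specimens, in occurrence order, and that the occurrence rows have already been reduced to (ID, flag) pairs; the function names are hypothetical, and the 658 bp check reflects the alignment length stated above.

```python
# Minimal FASTA reader (no external dependencies); headers keep their original text.
def parse_fasta(text: str) -> list[tuple[str, str]]:
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pair_flagged_with_fasta(occurrences, fasta_records, expected_length=658):
    """Match COI-flagged occurrence rows to FASTA records by position.

    occurrences: (occurrence_id, has_coi) pairs in download order.
    Assumes the FASTA holds exactly the flagged specimens, in occurrence order.
    """
    flagged = [occ_id for occ_id, has_coi in occurrences if has_coi]
    if len(flagged) != len(fasta_records):
        raise ValueError(f"{len(flagged)} flagged rows but {len(fasta_records)} FASTA records")
    paired = []
    for occ_id, (header, seq) in zip(flagged, fasta_records):
        # Aligned sequences keep gap characters, so every record should share one length.
        if len(seq) != expected_length:
            raise ValueError(f"{header}: length {len(seq)}, expected {expected_length}")
        paired.append((occ_id, header, seq))
    return paired
```

Pairing by position avoids assuming any particular FASTA header format, and the count and length checks catch a truncated or mismatched download early.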
Host plants
The host plant genera reported for each species in two main literature sources: Tolman and Lewington (2008) and Lafranchis (2004). The family of each plant genus is also provided.
Species traits
Species traits include the following information:
  1. Number of host plants. Given as the number of known plant genera used by the larvae.
  2. Mobility index. Measuring dispersal tendency in butterflies is complex: previous studies have mostly relied on agreement among subjective evaluations by experts and, less commonly, on objective species traits. Here we combined the indices provided by four studies (Balletto & Kudrna, 1985; Komonen et al. 2004; Heikkinen et al. 2010; Dennis, 2012) by rescaling each to range from 0 (low dispersal) to 1 (high dispersal), and computed, for each species, the average dispersal tendency across the available measurements.
  3. Wingspan. Provided as the minimum and maximum values reported by Higgins and Riley (1970). The average value is also provided, together with a variation value calculated as (max-min)/mean (a relative range rather than a statistical variance).
  4. Flying period and voltinism. The flying period is provided as the first and last months in which adults appear in Europe (Month min, Month max), the number of months in which adults fly (length(months)) and the central month of that period (Month central). Voltinism is provided as the minimum and maximum number of generations reported in Europe (Generations min and Generations max) and their difference (Generations max-Generations min). The phenological information comes from Tolman and Lewington (2008) and has been checked against our available data.
  5. Altitude. Provided as the minimum and maximum altitudes reported by Tolman and Lewington (2008). The range (maximum-minimum) is also provided.
  6. Range size. Calculated as the number of 50x50 km squares inhabited by the species in Europe, obtained from the CLIMBER resource (Schweiger et al. 2014).
  7. Climatic data. Obtained from the CLIMBER resource, these values describe the temperature and precipitation experienced by the species in the 50x50 km squares of Europe where it is predicted to occur on the basis of climatic envelope models (Schweiger et al. 2014). The 95% confidence interval and the standard deviation of temperature and precipitation among the cells predicted to host the species in Europe are also provided.
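The derived trait values in items 2 to 5 can be sketched in Python. This is a minimal illustration under stated assumptions, not the IO team's code: the function names are hypothetical, each study index is assumed to increase with dispersal and to be rescaled using its observed minimum and maximum, the wingspan average is assumed to be the midpoint of minimum and maximum, and the central flight month is assumed to round down when the period spans an even number of months and not to wrap past December.

```python
from statistics import mean


def min_max_standardize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one study's index to 0 (lowest score) .. 1 (highest score)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        raise ValueError("cannot rescale an index with a single distinct value")
    return {sp: (v - lo) / (hi - lo) for sp, v in scores.items()}


def mobility_index(studies: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average the rescaled indices over the studies that score each species."""
    rescaled = [min_max_standardize(s) for s in studies.values()]
    species = set().union(*rescaled)
    return {sp: mean(r[sp] for r in rescaled if sp in r) for sp in species}


def wingspan_summary(ws_min: float, ws_max: float) -> tuple[float, float]:
    """Return (average, (max - min) / average); the average is assumed to be the midpoint."""
    avg = (ws_min + ws_max) / 2
    return avg, (ws_max - ws_min) / avg


def flight_period(month_min: int, month_max: int) -> tuple[int, int]:
    """Return (number of flight months, central month).

    Assumes the period does not wrap past December and rounds the centre down
    when the number of months is even.
    """
    if not 1 <= month_min <= month_max <= 12:
        raise ValueError("months must satisfy 1 <= month_min <= month_max <= 12")
    return month_max - month_min + 1, (month_min + month_max) // 2


def value_range(lo: float, hi: float) -> float:
    """Simple range, as used for altitude and generations (max - min)."""
    return hi - lo
```

Because each study is rescaled before averaging, every study contributes on the same 0 to 1 scale, and a species scored by only one study simply takes that study's rescaled value.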


The taxonomy and all data types have been validated by the IO database team. Validation is particularly important for COI sequences, traits and occurrence data belonging to cryptic taxa that had not yet been separated at the time of the original sequencing, and for identifications from citizen science projects (e.g. iNaturalist). Validation has been conservative: whenever a doubt emerged, the record was removed.