trias-project / unified-checklist
🇧🇪 Global Register of Introduced and Invasive Species - Belgium
Home Page: https://trias-project.github.io/unified-checklist/
License: MIT License
File named taxa_after_verification.csv
taxa.csv
trias_verifiedKey
Note that `trias_verifiedKey` should only contain a single value. Taxa that are verified to multiple taxa should thus appear as multiple lines.
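A minimal base R sketch of the "one value per line" rule above. The `|` separator, the `taxonKey` column, and the key values are assumptions made only for illustration; the source does not say how a multi-valued key would be stored.

```r
# Hypothetical input: the second taxon was verified to two taxa
taxa <- data.frame(
  taxonKey = c(101, 102),
  trias_verifiedKey = c("1", "2|3"),  # "|" separator is an assumption
  stringsAsFactors = FALSE
)

# Split multi-valued keys and repeat each row once per key
keys <- strsplit(taxa$trias_verifiedKey, "|", fixed = TRUE)
taxa_long <- taxa[rep(seq_len(nrow(taxa)), lengths(keys)), , drop = FALSE]
taxa_long$trias_verifiedKey <- unlist(keys)
rownames(taxa_long) <- NULL

taxa_long
```

Each row of `taxa_long` then carries exactly one `trias_verifiedKey`.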
Lists the datasets considered for unifying information:
Sources considered for unifying information: https://DOI, https://DOI, https://DOI
Note: this is not the source for the taxon information (that would be the GBIF Backbone taxonomy), but just extra information (hence why we put it in `taxonRemarks`).
Todo:
- `dwc_mapping.Rmd`: add correct `taxonRemarks` format to taxa

Includes the link to the specific checklist taxon that was used for this piece of information:
https://www.gbif.org/species/141264581: Nymphaea marliacea Marliac in Verloove F, Groom Q, Brosens D, Desmet P, Reyserhove L (2018). Manual of the Alien Plants of Belgium. Version 1.7. Botanic Garden Meise. Checklist dataset https://doi.org/10.15468/wtda1m
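The source format shown in the example above could be assembled by a small helper along the lines of the `build_source_citation()` function proposed in the to-do list. This is only a sketch: the exact separators (`": "` and `" in "`) are inferred from the single example above, not from an agreed specification.

```r
# Sketch of a source citation builder; the format is inferred from one example
build_source_citation <- function(taxonKey, scientificName, datasetCitation) {
  paste0(
    "https://www.gbif.org/species/", taxonKey, ": ",
    scientificName, " in ", datasetCitation
  )
}

# Keys are passed as character here to avoid scientific notation for large numbers
build_source_citation(
  taxonKey = "141264581",
  scientificName = "Nymphaea marliacea Marliac",
  datasetCitation = "Verloove F, Groom Q, Brosens D, Desmet P, Reyserhove L (2018). Manual of the Alien Plants of Belgium. Version 1.7. Botanic Garden Meise. Checklist dataset https://doi.org/10.15468/wtda1m"
)
```

Because `paste0()` is vectorised, the same helper can be applied to whole columns of a data frame.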
Todo:
- `get_taxa.Rmd`: remove "accessed via GBIF.org on xxxx-xx-xx." from citation and add as `citation` to `checklists.csv`
- `unify_information.Rmd`: add `sourceScientificName` to distribution
- `unify_information.Rmd`: add `sourceScientificName` to species profile
- `unify_information.Rmd`: add `sourceScientificName` to description
- `dwc_mapping.Rmd`: create small `build_source_citation()` function, taking the arguments `taxonKey`, `scientificName`, `datasetCitation`
- `dwc_mapping.Rmd`: add correct source format to distribution
- `dwc_mapping.Rmd`: add correct source format to species profile
- `dwc_mapping.Rmd`: add correct source format to description

@timadriaens found it strange that there was no analysis of Trachemys scripta, although this species is on the list of species of Union concern. The answer was simple: the unified checklist contains three subspecies of it, but not the species itself (see www.gbif.org/species/163636890), although it is an alien species.
Why are there infraspecific taxa in unified and source checklists? There are two possible reasons, so we can divide such taxa in two groups:
Subspecies of Trachemys scripta belong to the second group, as the 3 subspecies have different dates of first observation in the ad-hoc checklist (see raw dump). But the species itself is alien...
So, how to distinguish infraspecific taxa of group 1 from those of group 2? An idea could be to add the species to the taxon core without extensions, so that no information about pathways, native range, distribution, etc. is present at species level, as it is specified at infraspecific level.
This solution will not affect checklist indicators at all, as they are based on the information in the extensions. For occurrence indicators, we should add a step in the pipeline that builds the cube to avoid double-counting occurrences, but this is feasible.
In this file, infraspecific_alien_taxa_source_info.txt, you can find a list of all infraspecific taxa in the unified checklist with their source checklist, where `key` is the key of the taxon in the unified checklist (e.g. 152543132) and `nubKey` is the key of the corresponding taxon in the GBIF backbone (e.g. 6157050).
@timadriaens and I think it is worth finding them. There should not be many.
@peterdesmet, @qgroom & co.: what do you think about it? Is it something to add at unified level? Or at source checklist level?
The unified checklist currently contains:
type | number of records |
---|---|
degree of establishment | 68 |
Introduced species abundance | 102 |
Introduced species impact | 195 |
Introduced species management | 4 |
Introduced species population trend | 25 |
Introduced species remark | 90 |
Introduced species vector dispersal | 359 |
invasion stage | 2,469 |
native range | 4,109 |
pathway | 3,153 |
@qgroom @timadriaens @SoVDH Should we restrict those to the ones we know about (i.e. those with lower case)? The others are from WRIMS.
I would like to start this issue as a form of quality control of the Belgian GRIIS checklist, after some vain attempts to discuss this at overcrowded TrIAS core group meetings and some discussions with @SanderDevisscher for which we lacked the time to follow up. After all, this is also one of the main reasons for having published DAISIE within the AlienCSI STSM. The procedure is relatively simple, I think; the big work will be in finding the appropriate source to complement GRIIS Belgium:
- `locality = Belgium`
- `occurrenceStatus`, `establishmentMeans`, `eventdata`
- be used to explore differences with GRIIS Belgium and perhaps to improve it, provided we have sources; we will also have an idea of the mistakes in both registers
- `region_of_first_record` is also interesting

get_taxa <- function(
  taxon_keys = NULL,
  checklist_keys = NULL,
  limit = NULL
)

If parameter `taxon_keys`:
- `name_usage()` to query e.g. http://api.gbif.org/v1/species/5231190
- `checklist_keys`: assertion error

If parameter `checklist_keys`:
- `name_usage(datasetKey = ...)`
- `taxon_keys`: assertion error

If parameter `limit` (e.g. 10):
- `taxon_keys`: limit to 10 taxa
- `checklist_keys`: limit to 10 taxa PER DATASET

key | nubKey | nameKey | taxonID | sourceTaxonKey | kingdom | ... |
---|---|---|---|---|---|---|
134087746 | 5567657 | 15913604 | 2346 | NA | Plantae | ... |
5567657 | 5567657 | 1439808 | gbif:5567657 | 123978664 | Plantae | ... |

- `key` (= input taxonKey) should be first.
- `limit`, `offset`
- `nameUsage()`: store in vector. At the end of retrieving all keys, add those to the returned dataframe, with column `key` populated and all other columns NA. Also provide a warning message, listing the keys that were not found at GBIF. Note: this should never happen with parameter `checklist_keys`. 😄

This function should replace:
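The argument handling described above for `get_taxa()` could be sketched as follows. This covers only the assertions and the `limit` semantics; it makes no GBIF request. The helper name `check_get_taxa_args()` and the shape of its return value are hypothetical, and the error raised when neither argument is given is an assumption, since the specification above does not cover that case.

```r
# Hypothetical helper: validates get_taxa() arguments, makes no GBIF call
check_get_taxa_args <- function(taxon_keys = NULL,
                                checklist_keys = NULL,
                                limit = NULL) {
  # Supplying both arguments is an assertion error
  if (!is.null(taxon_keys) && !is.null(checklist_keys)) {
    stop("Provide either taxon_keys or checklist_keys, not both.")
  }
  # Assumption: supplying neither is also an error
  if (is.null(taxon_keys) && is.null(checklist_keys)) {
    stop("Provide taxon_keys or checklist_keys.")
  }
  if (!is.null(limit)) {
    stopifnot(is.numeric(limit), length(limit) == 1, limit > 0)
  }
  if (!is.null(taxon_keys)) {
    # With taxon_keys, limit caps the total number of taxa
    keys <- if (is.null(limit)) taxon_keys else head(taxon_keys, limit)
    list(mode = "taxon_keys", keys = keys, limit_per_dataset = NULL)
  } else {
    # With checklist_keys, limit applies PER DATASET
    list(mode = "checklist_keys", keys = checklist_keys,
         limit_per_dataset = limit)
  }
}

check_get_taxa_args(taxon_keys = c(5231190, 5567657, 2498252), limit = 2)
```

Keeping the checks separate from the retrieval makes it easy to test the two modes without network access.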
I accidentally noticed that Crassula helmsii does not occur on the unified checklist. It is one of our most notorious invasive plant species. I see the accepted name (which also occurs in the other GRIIS checklists) is this one; however, when mapping RINSE and the Manual of Alien Plants, it was mapped onto the synonym.
@peterdesmet @damianooldoni can you check what went wrong here?
Some species are currently lacking the Flemish region locality on the `include-distribution-regions` branch. These include:
Species | Key |
---|---|
Alopochen aegyptiaca | 2498252 |
Oxyura jamaicensis | 2498305 |
Ludwigia grandiflora | 5421039 |
Ludwigia peploides | 5420991 |
According to Tim, the missing birds can be explained by the fact that the birds checklist is not yet published. This also explains why Sacred ibis is missing from the unified checklist.
I just wanted to flag that Persicaria wallichii seems to be lacking from the GRIIS checklist in Belgium, whereas it is quite well established and has many records on wnm.be. This might be due to the many synonyms (such as Koenigia polystachya) and the mapping of the Manual of Alien Plants onto the GRIIS unified checklist.
In the Manual it is under Rubrivena polystachya, which has GBIF code 4037343.
In waarnemingen.be it is under Persicaria wallichii, GBIF code 6391908.
@peterdesmet @LienReyserhove can you take a look please and make sure it is added to GRIIS Belgium (preferably under `8848208`)?
I noticed quite strange behavior while using rgbif's function `name_usage()` with `datasetKey` equal to a character vector. Try running the code below:
library(rgbif)

# Query each dataset separately
test1 <- name_usage(datasetKey = "73605f3a-af85-4ade-bbc5-522bfb90d847")
test2 <- name_usage(datasetKey = "d7c60346-44b6-400d-ba27-8d3fbeffc8a5")
# Query both datasets in a single call
test3 <- name_usage(datasetKey = c("73605f3a-af85-4ade-bbc5-522bfb90d847",
                                   "d7c60346-44b6-400d-ba27-8d3fbeffc8a5"))
# Same call, with the order of the datasetKeys inverted
test4 <- name_usage(datasetKey = c("d7c60346-44b6-400d-ba27-8d3fbeffc8a5",
                                   "73605f3a-af85-4ade-bbc5-522bfb90d847"))
# Compare the number of records and the dataset keys returned
c(nrow(test1$data), nrow(test2$data), nrow(test3$data), nrow(test4$data))
c(unique(test1$data$datasetKey), unique(test2$data$datasetKey),
  unique(test3$data$datasetKey), unique(test4$data$datasetKey))
I got this:
c(nrow(test1$data), nrow(test2$data), nrow(test3$data), nrow(test4$data))
10 7 10 10
unique(test1$data$datasetKey)
[1] "73605f3a-af85-4ade-bbc5-522bfb90d847"
unique(test2$data$datasetKey)
[1] "d7c60346-44b6-400d-ba27-8d3fbeffc8a5"
unique(test3$data$datasetKey)
[1] "73605f3a-af85-4ade-bbc5-522bfb90d847"
unique(test4$data$datasetKey)
[1] "73605f3a-af85-4ade-bbc5-522bfb90d847"
I was expecting:
c(nrow(test1$data), nrow(test2$data), nrow(test3$data), nrow(test4$data))
10 7 17 17
unique(test1$data$datasetKey)
[1] "73605f3a-af85-4ade-bbc5-522bfb90d847"
unique(test2$data$datasetKey)
[1] "d7c60346-44b6-400d-ba27-8d3fbeffc8a5"
unique(test3$data$datasetKey)
[1] "73605f3a-af85-4ade-bbc5-522bfb90d847" "d7c60346-44b6-400d-ba27-8d3fbeffc8a5"
unique(test4$data$datasetKey)
[1] "d7c60346-44b6-400d-ba27-8d3fbeffc8a5" "73605f3a-af85-4ade-bbc5-522bfb90d847"
Strange. What do you think? Should I start an issue on rgbif?
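Until the behavior is clarified, one possible workaround is to query one `datasetKey` at a time and bind the results. The sketch below is written against a generic `fetch` function so that it runs offline; in practice `fetch` could be something like `function(key) rgbif::name_usage(datasetKey = key)$data`. The helper name `name_usage_per_dataset()` and the stand-in `fake_fetch()` are hypothetical.

```r
# Query each dataset separately and combine the results.
# `fetch` is a placeholder for the real GBIF call (see lead-in).
name_usage_per_dataset <- function(dataset_keys, fetch) {
  results <- lapply(dataset_keys, fetch)
  # Assumes every result has the same columns
  do.call(rbind, results)
}

# Offline stand-in for the GBIF call, for illustration only:
# returns 10 rows for dataset "a" and 7 rows for any other dataset
fake_fetch <- function(key) {
  n <- if (key == "a") 10 else 7
  data.frame(key = seq_len(n), datasetKey = key, stringsAsFactors = FALSE)
}

combined <- name_usage_per_dataset(c("a", "b"), fake_fetch)
nrow(combined)
unique(combined$datasetKey)
```

With real GBIF results the columns may differ between datasets, in which case a more tolerant binder such as `dplyr::bind_rows()` would be needed instead of `rbind()`.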
I found three taxa in speciesProfiles which are missing in the taxon core.
> speciesProfiles %>% anti_join(taxon, by = "id")
anti_join: added no columns
> rows only in x 3
> rows only in y ( 37)
> matched rows (2,999)
> =======
> rows total 3
# A tibble: 3 x 7
id is_marine is_freshwater is_terrestrial is_invasive habitat source
<chr> <lgl> <lgl> <lgl> <lgl> <chr> <chr>
1 https://~ FALSE TRUE TRUE NA freshwa~ https://www.~
2 https://~ FALSE TRUE TRUE NA freshwa~ https://www.~
3 https://~ FALSE TRUE TRUE NA freshwa~ https://www.~
`id` of these three taxa:
I think the origin of this bug is in this section of the workflow:
https://trias-project.github.io/unified-checklist/4_unify_taxa.html#explicitely-remove-incorrect-taxa
We remove these taxa from the taxon core but not from the extension.
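One way to keep the extensions consistent with the core is to filter every extension on the ids that remain in the taxon core after the removal step. A minimal base R sketch with made-up ids follows; the helper name `keep_in_core()` is hypothetical.

```r
# Made-up data: "t3" was removed from the core but is still in the extension
taxon <- data.frame(id = c("t1", "t2"), stringsAsFactors = FALSE)
species_profiles <- data.frame(
  id = c("t1", "t2", "t3"),
  is_freshwater = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only extension rows whose id is still present in the taxon core
keep_in_core <- function(extension, core) {
  extension[extension$id %in% core$id, , drop = FALSE]
}

species_profiles_clean <- keep_in_core(species_profiles, taxon)

# Ids in the extension that are missing from the core (should be empty)
setdiff(species_profiles_clean$id, taxon$id)
```

The same helper can be applied to the distribution and description extensions.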
We reference `ISO_3166-2` for our locationIDs:
But `ISO_3166-2` is only for the subdivisions of countries. So it is correct for `BE-WAL`, but not for `BE`.
I propose to just reference the general `ISO_3166` (e.g. `ISO_3166:BE-WAL`, `ISO_3166:BE`), rather than one of its parts. That broader namespace is not going to create name clashes either (i.e. any code in ISO_3166-2 will not appear in ISO_3166-1 or 3).
@qgroom @LienReyserhove Suggestions?
Should be considered for change, now it's confusing.
To do after first publication:
The use of index.Rmd as first file to run in Rstudio (containing the packages etc.) should be documented in the README.
I found that the pathway info linked to types `"introduction pathway"` (Checklist of alien herpetofauna of Belgium) and `"pathway of introduction"` (Checklist of alien species in the Scheldt estuary in Flanders, Belgium) discussed in #68 is not standardized, as it doesn't start with the typical `cbd_2014_pathway:` prefix. @peterdesmet: I suppose this should be improved at checklist level as for #68, right?
Here is a table of the values I found in `description`:
suspect_type_pathways <- c(
  "introduction pathway",
  "pathway of introduction"
)
description %>%
  filter(type %in% suspect_type_pathways) %>%
  distinct(type, description) %>%
  arrange(type, description)
# A tibble: 19 x 2
type description
<chr> <chr>
1 introduction pathway contaminant
2 introduction pathway contaminant_timber
3 introduction pathway escape_pet
4 introduction pathway escape_research
5 introduction pathway nursery
6 introduction pathway release_conservation
7 introduction pathway release_landscape_improvement
8 introduction pathway release_other
9 introduction pathway stowaway_container
10 introduction pathway stowaway_other
11 pathway of introduction contaminant_plant
12 pathway of introduction corridor_water
13 pathway of introduction escape_aquaculture
14 pathway of introduction escape_pet
15 pathway of introduction release_fishery
16 pathway of introduction stowaway
17 pathway of introduction stowaway_ballast_water
18 pathway of introduction stowaway_hull_fouling
19 pathway of introduction stowaway_ship
While trying to update indicators and helping @timadriaens for making some graphs, I found the following:
description %>% distinct(type)
distinct: removed 12,232 rows (>99%), 12 rows remaining
# A tibble: 12 x 1
type
<chr>
1 pathway
2 degree of establishment
3 native range
4 Introduced species vector dispersal
5 Introduced species impact
6 Introduced species abundance
7 introduction pathway
8 Introduced species remark
9 Introduced species management
10 pathway of introduction
11 Introduced species population
12 Introduced species population trend
There are 3(!) types for encoding pathway information, where type `pathway` is the most used (and the correct one):
type_pathways <- c(
  "pathway",
  "introduction pathway",
  "pathway of introduction"
)
description %>%
  filter(type %in% type_pathways) %>%
  group_by(type) %>%
  count() %>%
  arrange(desc(n))
# A tibble: 3 x 2
# Groups: type [3]
type n
<chr> <int>
1 pathway 3283
2 introduction pathway 163
3 pathway of introduction 61
introduction pathway
All these data come from the Checklist of alien herpetofauna of Belgium (https://doi.org/10.15468/pnxu4c):
description %>%
  filter(type %in% "introduction pathway") %>%
  group_by(type) %>%
  distinct(source) %>%
  mutate(from_herpetofauna = str_detect(.data$source,
                                        pattern = "herpetofauna",
                                        negate = FALSE)) %>%
  group_by(from_herpetofauna) %>%
  count()
from_herpetofauna n
<lgl> <int>
1 TRUE 89
pathway of introduction
All these data come from Checklist of alien species in the Scheldt estuary in Flanders, Belgium (https://doi.org/10.15468/8zq9s4):
description %>%
  filter(type %in% "pathway of introduction") %>%
  group_by(type) %>%
  distinct(source) %>%
  mutate(from_scheldt_estuary = str_detect(.data$source,
                                           pattern = "Scheldt estuary",
                                           negate = FALSE)) %>%
  group_by(from_scheldt_estuary) %>%
  count()
from_scheldt_estuary n
<lgl> <int>
1 TRUE 54
I think these are issues to solve at checklist level. @peterdesmet: what do you think?
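While the source checklists are not yet fixed, a temporary downstream harmonisation could look like the base R sketch below. It recodes the two non-standard types to `pathway` and adds the `cbd_2014_pathway:` prefix where it is missing. The function name `standardize_pathways()` is hypothetical, and the sketch assumes that the unprefixed values are otherwise valid CBD pathway terms, which would need to be checked per checklist.

```r
# Temporary downstream fix (sketch): harmonise pathway types and prefix
standardize_pathways <- function(description) {
  nonstandard <- c("introduction pathway", "pathway of introduction")
  is_nonstandard <- description$type %in% nonstandard
  # Add the prefix only where it is missing
  needs_prefix <- is_nonstandard &
    !startsWith(description$description, "cbd_2014_pathway:")
  description$description[needs_prefix] <-
    paste0("cbd_2014_pathway:", description$description[needs_prefix])
  description$type[is_nonstandard] <- "pathway"
  description
}

# Made-up rows using values reported above
description <- data.frame(
  type = c("pathway", "introduction pathway", "pathway of introduction"),
  description = c("cbd_2014_pathway:escape_horticulture", "escape_pet",
                  "stowaway_ballast_water"),
  stringsAsFactors = FALSE
)
standardize_pathways(description)
```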
For the indicators (see inbo/reporting-rshiny-grofwildjacht#148 (comment)), but also for any user, it would be good if the values for degree of establishment were standardized. There are two questions to answer:
Currently we use:
- `blackburn_et_al_2011:B3` in the Mollusca checklist
- `released (blackburn_2011:B3)` in the birds checklist

See table 2 in https://doi.org/10.3897/biss.3.38084: the correct value to use would be the label, so just `released`
The quickest (and temporary) way to standardize is in the indicators, but it would be better to standardize here in the unified checklist, or in the source checklists. As far as I can tell, there are only two: mollusca and birds, so it seems best to update them there:
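As an illustration of the temporary route, the two values mentioned above could be mapped to the label `released` with a small lookup. This is a sketch only: the function name `standardize_degree()` is hypothetical, the lookup contains just the two values reported here, and any other value is passed through unchanged.

```r
# Map the two known non-standard values to the label; leave the rest as is
standardize_degree <- function(x) {
  lookup <- c(
    "blackburn_et_al_2011:B3" = "released",
    "released (blackburn_2011:B3)" = "released"
  )
  known <- x %in% names(lookup)
  x[known] <- unname(lookup[x[known]])
  x
}

standardize_degree(c(
  "blackburn_et_al_2011:B3",
  "released (blackburn_2011:B3)",
  "established"
))
```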
As discussed in inbo/natuurindicatoren#14 (comment) and for feeding the Harmonia website, it would be good to also unify distributions for the regions (when that information is available in the source checklist).
Mainly because:
`/docs` without the `(function(input_file, encoding) { rmarkdown::render(input_file, encoding = encoding, output_file = paste0("../docs/",sub(".Rmd", ".html", basename(input_file))))})` hack

While checking the birds extract from the unified checklist, there appear to be a number of species native to Belgium on the unified checklist:
Species | Argumentation |
---|---|
Perdix perdix (Linnaeus, 1758) | native red list species but indeed restocking occurs regularly for hunting |
Dryocopus martius (Linnaeus, 1758) | native black woodpecker |
Bubulcus ibis (Linnaeus, 1758) | vagrant, more and more seen, also kept in aviaries |
Nycticorax nycticorax (Linnaeus, 1758) | native breeding heron species, but also kept in aviaries |
Ciconia nigra (Linnaeus, 1758) | native black stork, rare breeder |
Emberiza hortulana Linnaeus, 1758 | probably also kept in aviaries but would take it out, rare vagrant |
Tarsiger cyanurus (Pallas, 1773) | rare vagrant |
Athene noctua (Scopoli, 1769) | native little owl |
Anser fabalis (Latham, 1787) | winter migrant, possibly also bred in waterfowl collections |
Anser anser (Linnaeus, 1758) | mixed population of wild and escaped birds but native to Belgium |
Milvus migrans (Boddaert, 1783) | native |
Bubo bubo (Linnaeus, 1758) | native, breeding, but kept widely in collections |
Branta leucopsis (Bechstein, 1803) | native, breeding, wintering, mixed population of wild and birds of escaped origin |
these would imo best be taken out of the unified for now @LienReyserhove .
The checklist currently contains:
taxonRank | taxa |
---|---|
GENUS | 11 |
SPECIES | 2399 |
SUBSPECIES | 101 |
VARIETY | 18 |
For a unified checklist, I think it makes sense to aggregate this information on SPECIES only, because:
This would only affect 5% of the taxa (i.e. the non-SPECIES):
I've created a spreadsheet of the taxa that are affected.
@timadriaens @SoVDH @qgroom would you be OK with this choice?
WRIMS distributions need a combination of:
"country": "BE",
"status": "PRESENT",
"establishmentMeans": "INTRODUCED",
... to be selected. Ideally they have temporal information:
"temporal": "2000",
I notice many distributions only have part of the information: e.g. year, but not status. Search for example for `"BE"` in https://api.gbif.org/v1/species/157131005/distributions: there are 8 distributions for Belgium, but none have all 3 properties and a year.
I wonder if we should drop the `status` field in our selection.
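The effect of dropping `status` can be explored with a filter that makes the status requirement optional. The sketch below uses made-up rows with the same field names as the JSON above; `select_distributions()` and its `require_status` argument are hypothetical names.

```r
# Made-up distributions; the second row lacks a status
distributions <- data.frame(
  country = c("BE", "BE", "NL"),
  status = c("PRESENT", NA, "PRESENT"),
  establishmentMeans = c("INTRODUCED", "INTRODUCED", "INTRODUCED"),
  temporal = c("2000", "1995", NA),
  stringsAsFactors = FALSE
)

# %in% treats NA as "no match", so incomplete rows are simply not selected
select_distributions <- function(distributions, require_status = TRUE) {
  keep <- distributions$country %in% "BE" &
    distributions$establishmentMeans %in% "INTRODUCED"
  if (require_status) {
    keep <- keep & distributions$status %in% "PRESENT"
  }
  distributions[keep, , drop = FALSE]
}

nrow(select_distributions(distributions))                          # strict
nrow(select_distributions(distributions, require_status = FALSE))  # relaxed
```

Comparing the two counts on the real data would show how many Belgian distributions are currently lost because `status` is missing.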
To select unique descriptions in the unified checklist, we apply the following code to select the descriptions across checklists (section 6.5 point 3):
# Group by type, description and verificationKey across checklists
group_by(
  type,
  description,
  verificationKey
) %>%
# Select first datasetKey, taxonKey and scientificName
summarize(
  datasetKey = first(datasetKey),
  taxonKey = first(taxonKey),
  scientificName = first(scientificName)
) %>%
By grouping by type, description and verificationKey together, we risk selecting what are effectively duplicate descriptions whenever checklists use different vocabularies. An example:
verificationKey | type | description | taxonKey |
---|---|---|---|
a | native range | Northern America | 1 |
a | native range | Southern America | 1 |
b | native range | North America | 1 |
Here, all descriptions for this species will be selected, due to the use of a different vocabulary.
To be considered....
A problem (already raised in #32) arising from including regional distributions in the unified checklist (see #45 and #43) is that eventually, species that have been introduced in one region but are native in another, are included in the checklist of alien species of Belgium. I think it would be best to not include those in the unified? Some examples are
@damianooldoni @peterdesmet We should decide what to do with such cases, because it is strange they appear on a Belgian alien species checklist.
Warning: `progress_estimated()` was deprecated in dplyr 1.0.0.
Progress bars are still shown though, but I assume a separate package should be loaded?
Tidylog reports no changes for this step:
unified-checklist/src/1_get_taxa.Rmd
Lines 118 to 125 in 9d309ec
Maybe this step can be removed?
@damianooldoni can you check if we can delete the branch `get-taxa-verify-synonyms`? Here is how it compares to master: master...get-taxa-verify-synonyms
But most changes are maybe included in the newer https://github.com/trias-project/unified-checklist/tree/get_taxa_populate_spreadsheet
It would be nice to have as output of unified checklist a list of accepted taxa with their synonyms.
Malva sylvestris (cultivar), often called mauritiana, is in the non-native plants dataset of waarnemingen.be (however with the same Specieskey as the native Malva sylvestris). This is a bit strange of course to find a native species key in the non-native plants dataset but understandable and relates to taxonomic issues/interpretation (it is not in MAP and MAP says that it hardly deserves a taxonomic status). However, the dutch do consider it see https://www.verspreidingsatlas.nl/6891 (if you click taxonomy you can directly look for it on gbif, handy!).
The thing is that now, we can't evaluate emerging status of this "seed mixture escape" since that is produced only for species on the unified. Just want to flag this, and discuss about potential solutions.
While strolling through the unified-checklist website, I'm writing down some suggestions. Just some things that pop up in my mind, feel free to integrate or not.
- `taxonKey can be used to verify manually on GBIF` (section 2.3): I would add the URL (i.e. www.gbif.org/species/taxonKey). Maybe also refer to the verbatim page, as the overview page for e.g. Pilosella x brachiata indicates that the species is present in Belgium, but not that the presence of this species is uncertain
- Section 3: I would add a preview of each of the resulting datasets
- What is the reason to use startYear as endYear when no endYear is provided in the unified checklist (in the first step)? I think we need to think about a general strategy for dates because:
  - sometimes, we have no date information at all. This should also be mentioned in the text somewhere (i.e. what about NAs)
- In the same section, I wonder if it would be possible to give more previews as well; sometimes it's hard to visualise the steps we undertake. A preview could be helpful here.
415 taxa are currently not verified. 421 are from the new run on 2023-09-15 (which includes the waarnemingen.be checklist). We should make an effort to verify the suggested synonymy where we can. See spreadsheet.
Upon updating the biodiversity indicators for Flanders with @SanderDevisscher, we noticed that "natural" has become a category at level 1. As a result, the indicator table on pathways gives a number of species for unaided and for natural dispersal separately, which is incorrect (they are synonymous). See the vocab table for pathways: line 52 is in fact redundant, because if level 1 is unaided then level 2 is always natural dispersal.
In fact, "unaided" (level 1) is the same as "natural dispersal" (level 2).
So probably something happens in the function which results in this.
Can be step 7 of pipeline. Template: Vietnam_GBIF_GRIIS_2018.xlsx
File `verification_file.csv` will be used by taxonomists for verifying taxa. At the moment, it contains the following columns in the following order:
scientificName
bb_scientificName
bb_taxonomicStatus
bb_acceptedName
bb_key
bb_acceptedKey
bb_kingdom
issues
verification_key
date_added
checklists
remarks
This structure is optimal for experts but too difficult to manage. The main source of bugs is the combination of the following properties:
`scientificName`
allowed

Below are the columns of `taxa.csv`. I checked the box beside the columns I think we need to include in `verification_file.csv`:
taxonKey
scientificName
taxonID
datasetKey
nameType
issues
validDistribution
bb_key
bb_scientificName
bb_species
bb_genus
bb_family
bb_order
bb_class
bb_phylum
bb_kingdom
bb_rank
bb_speciesKey
bb_taxonomicStatus
bb_acceptedKey
bb_acceptedName
In addition to these columns, the following ones are specific to `verification_file.csv` and should be present:
verificationKey
dateAdded
remarks
Based on what we decide in this issue, I will modify (= simplify) `trias::verify_taxa()`.
@peterdesmet What do you think?
Upon checking the country level status for the EASIN baseline distribution of Union Concern species for the 3rd batch species, I noticed Humulus scandens (aka Humulus japonicus) does not occur on the unified (whereas it should, as it is in the Manual of Alien Plants).
There is something strange going on. I can't find the species in the taxon.txt from the gbif download of the unified, also gbif does not state it for GRIIS Belgium (not under Humulus scandens (Lour.) Merr. nor under the synonym Humulus japonicus Sieb. & Zucc. , nor under Humulopsis scandens (Lour.) Grudz.).
However, GBIF does present description data on Humulus japonicus Sieb. & Zucc. coming from the Manual.
NATIVE RANGE
Asia
source: Manual of the Alien Plants of Belgium
INVASION STAGE
casual
source: Manual of the Alien Plants of Belgium
PATHWAY
cbd_2014_pathway:escape_horticulture
source: Manual of the Alien Plants of Belgium
How is this at all possible @peterdesmet @qgroom ? And how to add the species to the Belgian checklist? Does this have to do with verification again? This is problematic for Union List IAS for which we have to officially report. As good Belgians we should at least try to get our hops right :-)
Symphoricarpos albus (L.) S.F.Blake is in the Alien Plants of Belgium checklist, but is missing from the unified checklist.
Alien Plants of Belgium checklist
https://www.gbif.org/species/141267078
Perhaps this is something to do with the synonymy, but I can't see an obvious problem.
verify_synonyms <- function(
  taxa = NULL, # Dataframe with taxa to verify
  verified_synonyms = NULL # Dataframe with verified synonym info
)
- `taxa`: a dataframe with at least the following columns
- `verified_synonyms`: a dataframe with at least the following columns

`taxa`)
if in verified_synonyms:
  if taxa.backbone_scientificName != verified_synonyms.backbone_scientificName:
    update in verified_synonyms
    add to updated_scientificName
  if taxa.backbone_accepted != verified_synonyms.backbone_accepted:
    update in verified_synonyms
    add to updated_accepted
  else
    do nothing
else (not in verified_synonyms):
  add to verified_synonyms
  add to new_synonyms
if in verified_synonyms, but not in taxa:
  add to unused_synonyms

- `verified_synonyms`: same as input df, but now with updated info. Could be written to file outside the function.
- `new_synonyms`: a subset of `verified_synonyms` (same columns) with synonym relations that were added (found in taxa, but not in verified_synonyms)
- `unused_synonyms`: a subset of `verified_synonyms` (same columns) with unused synonym relations (found in verified_synonyms, but not in taxa)
- `updated_scientificName`: a df with `backbone_scientificName` + `updated_backbone_scientificName`
- `updated_accepted`: a df with `backbone_accepted` + `updated_backbone_accepted`
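The logic described above can be sketched in base R as follows. Several points are assumptions rather than part of the specification: the matching column `backbone_taxonKey` is hypothetical (the required columns are not listed above), `taxa` is assumed to contain every column of `verified_synonyms`, and keys are assumed to be unique within each data frame.

```r
verify_synonyms <- function(taxa, verified_synonyms) {
  key <- "backbone_taxonKey"  # hypothetical matching column
  idx <- match(taxa[[key]], verified_synonyms[[key]])
  known <- !is.na(idx)

  # In verified_synonyms, but not in taxa
  unused_synonyms <-
    verified_synonyms[!verified_synonyms[[key]] %in% taxa[[key]], , drop = FALSE]

  # In taxa, but not in verified_synonyms
  new_synonyms <- taxa[!known, names(verified_synonyms), drop = FALSE]
  rownames(new_synonyms) <- NULL

  # For known taxa: report changed values, then update verified_synonyms
  reports <- list()
  for (column in c("backbone_scientificName", "backbone_accepted")) {
    old <- verified_synonyms[[column]][idx[known]]
    new <- taxa[[column]][known]
    changed <- is.na(old) != is.na(new) |
      (!is.na(old) & !is.na(new) & old != new)
    report <- data.frame(old[changed], new[changed], stringsAsFactors = FALSE)
    names(report) <- c(column, paste0("updated_", column))
    reports[[column]] <- report
    verified_synonyms[[column]][idx[known][changed]] <- new[changed]
  }

  list(
    verified_synonyms = rbind(verified_synonyms, new_synonyms),
    new_synonyms = new_synonyms,
    unused_synonyms = unused_synonyms,
    updated_scientificName = reports[["backbone_scientificName"]],
    updated_accepted = reports[["backbone_accepted"]]
  )
}

# Made-up example: key 2 changed its name, key 4 is new, key 3 is unused
taxa <- data.frame(
  backbone_taxonKey = c(1, 2, 4),
  backbone_scientificName = c("A", "B2", "D"),
  backbone_accepted = c("a", "b", "d"),
  stringsAsFactors = FALSE
)
verified <- data.frame(
  backbone_taxonKey = c(1, 2, 3),
  backbone_scientificName = c("A", "B", "C"),
  backbone_accepted = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
result <- verify_synonyms(taxa, verified)
result$updated_scientificName
```

Returning all five data frames in one list leaves the decision of what to write to file to the caller, as suggested above.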
Explore if and how we can add isInvasive information, as requested by GRIIS.
Originally reported in trias-project/alien-birds-checklist#13 by @timadriaens:
Hi, scrolling through some emerging species products further down the pipeline and through the unified checklist itself, there are still a number of species that should not occur on the unified checklist. I use this issue to report them:
checklist_taxa.tsv
alien_backbone_taxa.tsv
Hi, there is a problem when downloading Orconectes limosus (our commonest North American crayfish) from GBIF. The taxonomy changed and the accepted name is now Faxonius limosus.
Both are currently accepted names on GBIF, but they are not linked. However, it is the same species (and it is on the Union list...). How to solve this? @peterdesmet suggested adding both accepted names to the checklist? Should we also update the tsv with the Union List species?
Running code block `unify_information-7` in `unify_information.Rmd` returns the following message:
Joining, by = "taxonKey"
no non-missing arguments, returning NA
no non-missing arguments
and this message is repeated many times over. Perhaps it is a good idea to suppress the message?
Examine what habitat information is requested by GRIIS:
For e.g. all Fungi, Viruses, Hemiptera are listed as 'Host'. For Marine and Fish we are using WoRMS and Fishbase to assign the habitat type.
Would you have any objection to post the list with the reassigned habitat. We can highlight the changes for you.
So that unified checklist can be used for Flemish indicators.
For some of the record-level terms, I'm not 100% sure what information to use:
- `license`: Taxa info comes from the backbone (CC-BY), the rest from datasets, which might not always be CC0. I would use the most limiting license, i.e. CC-BY
- `rightsHolder`:
  Organization who has the rights to the data and in the case of multiple rights holders, the organization who managed/made the decision to release those rights under CC0. Is often the same as publishing organization.
  This is a difficult one, I'm tempted to say that the owners of the checklists are the rightsHolders, but then we're ignoring the taxonomic information from the backbone...
- `institutionCode`: Is now populated with "INBO", but according to our guidelines, this should be the same as rightsHolder

The directories `data/interim` and `data/output` both contain a verification file + taxa after verification. This should be cleaned up.
- `taxa_after_verification.csv` makes sense in `data/interim`
- `verification_file.csv` should probably be moved to `data/raw`, as it is a start file.
Fields I found in GRIIS checklists are in bold.

- `id`: GBIF species key
- `modified`
- `language`
- `license`
- `rightsHolder`
- `accessRights`
- `bibliographicCitation`: citation of contributing checklist = GBIF backbone
- `datasetID`: DOI of unified checklist
- `institutionCode`
- `datasetName`: name of unified checklist
- `taxonID`: according to GBIF
- `scientificName`: according to GBIF
- `acceptedNameUsageID`
- `acceptedNameUsage`
- `kingdom`: according to GBIF
- `phylum`: according to GBIF
- `class`: according to GBIF
- `order`: according to GBIF
- `family`: according to GBIF
- `genus`: according to GBIF
- `taxonRank`: according to GBIF
- `nomenclaturalCode`
- `taxonomicStatus`
- `taxonRemarks`: remark for SYNONYMs and why they are kept no: sources that were considered

- `id`
- `locationID`: "ISO_3166-2:BE"
- `locality`: "Belgium"
- `countryCode`: "BE"
- `occurrenceStatus`: "present"
- `establishmentMeans`: "introduced"
- `eventDate`: widest range within used checklist
- `source`: citation of contributing checklist OR actual source within that checklist

- `id`: one species profile per taxon, from the most trustworthy checklist
- `isMarine`
- `isFreshwater`
- `isTerrestrial`
- `isInvasive`
- `habitat`: e.g. "terrestrial|freshwater", is the same information as the `is...` fields, so easy to repeat here

After reviewing the pipeline, the next (final) step will be standardization of vocabularies. While working on producing a unified checklist of alien species of Belgium, I noticed that native range takes values at a variety of levels (country level, continental level, climate level, origin level). While reading data from the following six checklists:
I get the following values:
`Africa`, `Africa (WGSRPD:2)`, `Arctic`, `Asia`, `Australasia (WGSRPD:5)`, `Australia`, `Australia (WGSRPD:50)`, `China`, `cultivated origin`, `East Asia`, `Eastern Europe`, `Europe (WGSRPD:1)`, `hybrid origin`, `Indo-Pacific`, `New Zealand`, `North Africa`, `Northeast Asia`, `Northern America`, `Northern America (WGSRPD:7)`, `pan-American`, `Pantropical`, `Ponto-Caspian`, `South America`, `Southeast Asia`, `Southern America (WGSRPD:8)`, `Southern Europe`, `Southern Hemisphere`, `temperate Asia (WGSRPD:3)`, `Tropical and warm seas`, `tropical Asia (WGSRPD:4)`, `United States`, `West Africa`, `Western Atlantic`.
@peterdesmet: WGSRPD has been conceived specifically for plant distribution. Are there good practice guidelines for the distribution of species belonging to other kingdoms? Does a unique controlled vocabulary exist for all kingdoms? Any other ideas about standardization? I don't immediately see a solution. Thanks in advance.
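As a first exploratory step, and not a proposal for the final vocabulary, values that differ only by a trailing `(WGSRPD:<code>)` suffix can be lined up by stripping that suffix. The sketch below uses a few of the values listed above; the regular expression assumes the suffix always has exactly this form.

```r
native_range <- c(
  "Africa", "Africa (WGSRPD:2)",
  "Northern America", "Northern America (WGSRPD:7)",
  "Ponto-Caspian"
)

# Strip a trailing "(WGSRPD:<digits>)" so variants of the same label line up
label <- sub("\\s*\\(WGSRPD:[0-9]+\\)$", "", native_range)

# Group the original values by their stripped label
split(native_range, label)
```

Labels that end up with more than one original value are candidates for harmonisation; this does not address the different levels (country, continent, climate, origin).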
I noticed that the `issues` we collect in get_taxa.Rmd are issues with the backbone taxon, not issues of the checklist taxa:
We get the ORIGINAL_NAME_DERIVED in our issues, which isn't useful for us. Unfortunately, the lookup function doesn't seem to return `issues` as a column:
alien_plants <- rgbif::name_lookup(
  datasetKey = "9ff7d317-609b-4c08-bd86-3bc404b77c42",
  origin = "source",
  limit = 99999,
  return = "data"
)
colnames(alien_plants)
[1] "key" "scientificName"
[3] "datasetKey" "nubKey"
[5] "parentKey" "parent"
[7] "kingdom" "family"
[9] "kingdomKey" "familyKey"
[11] "canonicalName" "nameType"
[13] "taxonomicStatus" "origin"
[15] "numDescendants" "numOccurrences"
[17] "taxonID" "habitats"
[19] "nomenclaturalStatus" "threatStatuses"
[21] "synonym" "species"
[23] "speciesKey" "rank"
[25] "genus" "genusKey"
Whilst scanning through the list of species selected for the MijnVismaat occurrences dataset, noticed a few native species in there. I suspect this has to do with using GRIIS Belgium as a selection filter and indeed, those species appear to be on GRIIS Belgium:
These species do not belong on the alien species checklist for Belgium. They originate from the Zieritz et al. checklist.