trias-project / alien-species-checklist
Proof of concept for a checklist of alien species in Belgium
License: MIT License
This can only be done once all the other issues have been resolved.
If yes, we can ignore the Annex B file in our combined checklist. @stijnvanhoey will do a join on scientificNames to see how many match.
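A minimal sketch of such a join, assuming both lists are available as plain lists of scientific names. The function name and the whitespace normalisation are illustrative assumptions, not the approach actually used in this repository:

```python
def count_name_matches(left_names, right_names):
    """Split left_names into those found in right_names and those that are not."""

    def normalise(name):
        # Collapse repeated whitespace so spacing differences do not hide a match.
        return " ".join(name.split())

    right_set = {normalise(name) for name in right_names}
    matched = [name for name in left_names if normalise(name) in right_set]
    unmatched = [name for name in left_names if normalise(name) not in right_set]
    return matched, unmatched
```

The length of `matched` relative to `left_names` then shows how many names overlap.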
We will add the following columns for species that occur in GRIIS and the concatenated file (based on identical `acceptedKey`):
Order of sources to use info from:
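As a rough sketch of how columns could be added by `acceptedKey` with a source priority, assuming each source is available as a lookup from `acceptedKey` to a row. The function name, the column names passed in, and the order of the lookups are all hypothetical here:

```python
def merge_by_accepted_key(checklist, sources, columns):
    """Add `columns` to each checklist row from the first source that has a value.

    checklist: list of dicts, each with an "acceptedKey".
    sources:   list of {acceptedKey: row} lookups, highest priority first.
    """
    merged = []
    for row in checklist:
        new_row = dict(row)
        for column in columns:
            for lookup in sources:
                value = lookup.get(row["acceptedKey"], {}).get(column)
                if value:
                    new_row[column] = value
                    break
            # Leave the column empty when no source provides a value.
            new_row.setdefault(column, "")
        merged.append(new_row)
    return merged
```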
Create a `scientificName` for all records in `concatenated` where there currently isn't one. Since none of those have a `subspecies`, this can be done quite easily:
cells["genus"].value + " " + cells["specificEpithet"].value
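Outside Refine, the same step could look roughly like the sketch below. It assumes each record is a dict with `genus`, `specificEpithet` and `scientificName` keys; unlike the expression above, it skips a missing part instead of producing a trailing space:

```python
def fill_scientific_name(record):
    """Return a copy of record with scientificName filled in when it is empty."""
    filled = dict(record)
    if not (filled.get("scientificName") or "").strip():
        parts = [filled.get("genus"), filled.get("specificEpithet")]
        # Join only the parts that are present and non-blank.
        filled["scientificName"] = " ".join(
            part.strip() for part in parts if part and part.strip()
        )
    return filled
```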
Make sure to escape quotes before importing the file, so we don't get this in `taxonID`:
plants "Populus x jackii Sargent ""Gileadensis"" (P. balsamifera x deltoides)" Salicaceae Nat.? Hort. D NAM X 2010 N?
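One way to check the quoting is a round trip through Python's `csv` module, sketched below under the assumption that the file is tab-separated with `"` as the quote character. Only three of the columns from the row above are used, and the helper names are illustrative:

```python
import csv
import io


def write_tsv(rows):
    """Serialise rows as tab-separated text; embedded quotes are doubled."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


def read_tsv(text):
    """Parse tab-separated text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))


name = 'Populus x jackii Sargent "Gileadensis" (P. balsamifera x deltoides)'
text = write_tsv([["plants", name, "Salicaceae"]])
rows = read_tsv(text)
```

If the quotes are escaped consistently, `rows` contains the name in a single field instead of spilling into neighbouring columns.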
In WRIMS, one can search on distributions: http://www.marinespecies.org/introduced/
There are 4 (sub)regions that might be applicable for a Belgian checklist:
I verified and `Belgium (Nation)` includes the 3 other regions completely, so a search on `Belgium (Nation)` is the one we should use.
Once we have #26:
presenceBE
Field
Using this file:
datasetName
with macroinvertebrates
data-with-common-terms.tsv
data-with-common-terms-refine.json
First scan & basic cleaning of the initial files
Dear Tim,
I would like you to help define the main column titles (Darwin Core terms) we need for the checklist.
I will try to work up to assignment #30. Peter, you can check the spreadsheet for comments.
For the following ones, I don't think I'm technically qualified, right?
If I have to do them, I need to be at INBO with one of you: Stijn, Peter or Dimitri.
This is possible by next Monday. Or has Stijn already completed them?
Anyway, I would like to know how these have been done.
Thanks
Using this file:
datasetName
with plants
data-with-common-terms.tsv
data-with-common-terms-refine.json
Once we have #26:
introductionPathways
This table gives an overview of the current GBIF name matching status:
match | fishes | harmonia | macroinvertebrates | plants | rinse | rinse-annex-b | wrims | sum |
---|---|---|---|---|---|---|---|---|
exact match with gbifapi_scientificName | 1 | 1294 | 36 | 1 | 1332 | |||
exact match with gbifapi_canonicalName | 22 | 130 | 66 | 3 | 6093 | 21 | 175 | 6510 |
EXACT 100% | 4 | 849 | 211 | 6 | 1070 | |||
EXACT < 100% | 72 | 20 | 4 | 96 | ||||
FUZZY | 1 | 2 | 15 | 73 | 4 | 95 | ||
HIGHERRANK | 3 | 4 | 177 | 224 | 7 | 25 | 440 | |
NO OR DOUBLE MATCH | 3 | 4 | 2 | 9 | ||||
sum | 23 | 140 | 73 | 2410 | 6661 | 45 | 200 | 9552 |
Some observations:
- `EXACT < 100%` matches will have to be examined case by case.
- `FUZZY` matches are probably typos, and are addressed in #41.
- `HIGHERRANK` matches are mostly plants and a lot of hybrids. Chances are we can only correct half of those to match in GBIF.

@timadriaens, how do you want to prioritize going forward?
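For reference, the categories in the table above could be derived from a GBIF species match response roughly as sketched here. The response is assumed to be a dict with the `matchType`, `confidence`, `scientificName` and `canonicalName` fields of the GBIF species match API; the order in which the checks are applied is an assumption, not necessarily how the table was produced:

```python
def classify_match(source_name, response):
    """Assign a source name to one of the match categories used in the table."""
    match_type = response.get("matchType", "NONE")
    if match_type == "NONE":
        return "NO OR DOUBLE MATCH"
    # Literal string comparison with the names returned by GBIF comes first.
    if source_name == response.get("scientificName"):
        return "exact match with gbifapi_scientificName"
    if source_name == response.get("canonicalName"):
        return "exact match with gbifapi_canonicalName"
    if match_type == "EXACT":
        if response.get("confidence") == 100:
            return "EXACT 100%"
        return "EXACT < 100%"
    # Remaining match types (FUZZY, HIGHERRANK) are reported as-is.
    return match_type
```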
In order to match the GRIIS dataset with GBIF, some transformation is needed. However, @timadriaens, is there any `taxonID` in the current set to use as an identifier?
We want to add presence data to the WRIMS dataset:
- `PlaceName`
- `Belgium` and add a column based on this column, named `presenceBE`
- `Belgian part of the North Sea` and add a column based on this column, named `presenceBPNS`
- `Belgian Exclusive Economic Zone` and add a column based on this column, named `presenceBEEZ`
- `Belgian Coast` and add a column based on this column, named `presenceBECoast`
Once this is done:
- Move `presenceBE`, `presenceBEEZ`, `presenceBPNS` and `presenceBECoast` to the correct position (see https://github.com/LifeWatchINBO/alien-species-checklist#process)
- `data-with-common-terms.tsv`
- `data-with-common-terms-refine.json`
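The derivation of the four presence columns from `PlaceName` can be sketched as follows. This assumes that `PlaceName` may hold several places separated by `;`, that capitalisation can vary, and that a boolean is an acceptable value for the new columns; none of this is confirmed by the WRIMS export itself:

```python
PLACE_TO_COLUMN = {
    "belgium": "presenceBE",
    "belgian part of the north sea": "presenceBPNS",
    "belgian exclusive economic zone": "presenceBEEZ",
    "belgian coast": "presenceBECoast",
}


def presence_columns(place_names):
    """Map a ';'-separated PlaceName value to the four presence columns."""
    # Compare case-insensitively and ignore whitespace around each place.
    places = {part.strip().lower() for part in place_names.split(";") if part.strip()}
    return {column: place in places for place, column in PLACE_TO_COLUMN.items()}
```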
When I open the concatenated file in Refine, I still have the same problem: 1256 records rather than the expected 9000 or more.
Because of a Refine issue in automatically interpreting data from Excel, years are written as `2015.0`. We need to remove those decimals.
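A small sketch of that clean-up, assuming the year values are strings and that only a four-digit year followed by `.0` should be changed:

```python
import re


def clean_year(value):
    """Turn '2015.0' into '2015'; other values are only stripped of outer whitespace."""
    return re.sub(r"^(\d{4})\.0+$", r"\1", value.strip())
```

Values that do not look like a year with a decimal, such as `N?`, pass through unchanged.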
matchType | confidence | status | records |
---|---|---|---|
EXACT | 100 | ACCEPTED | 20021 |
EXACT | 100 | SYNONYM | 1390 |
EXACT | <100 | ACCEPTED | 1775 |
EXACT | <100 | DOUBTFUL | 34 |
EXACT | <100 | SYNONYM | 73 |
FUZZY | <100 | ACCEPTED | 129 |
FUZZY | <100 | SYNONYM | 21 |
HIGHERRANK | 100 | ACCEPTED | 398 |
HIGHERRANK | 100 | DOUBTFUL | 2 |
HIGHERRANK | 100 | SYNONYM | 44 |
HIGHERRANK | <100 | ACCEPTED | 281 |
HIGHERRANK | <100 | SYNONYM | 22 |
blank | | | 990 |
Question is asked by @DimEvil to VLIZ. Ideally there is a public bulk dataset.
In what format should the GRIIS information be provided when it is returned? Is CSV appropriate, or should it be Excel?
We will add the following columns for species that occur in wn.be and the concatenated file (based on identical `acceptedKey`):
Order of sources to use info from:
Using this file:
datasetName
with fishes
data-with-common-terms.tsv
data-with-common-terms-refine.json
Once we have #26:
origin
Using this file:
`Belgian exclusive economic zone`; `Belgian Part of the North sea`; `Belgian coast`
datasetName
with wrims
data-with-common-terms.tsv
data-with-common-terms-refine.json
What is the meaning of the columns "D/N" and "V/I"?
Hello dear Tim,
Do we need to keep countries other than Belgium (e.g. GB, France, Netherlands)?
@timadriaens, I also asked you this by email, but I'm recording it here so we won't forget. What file do we use for the RINSE dataset?
My preference would be 2, as that is publicly available and easier to reference. I just need to know that all the core info is there.
Once we have #26:
habitat
Is the Harmonia dataset publicly available in bulk? If not, how can we obtain this?
Once we have #26:
status
23 records:
`exact match with gbifapi_canonicalName` which is `ACCEPTED`.

Similar to the concatenated and waarnemingen lists, a matching with GBIF is needed to make comparison possible.
For the macroinvertebrate data, we are currently relying on a final proof paper emailed by Tim: https://github.com/LifeWatchINBO/alien-species-checklist/blob/master/source-datasets/macroinvertebrates/AI15-039_Boets_etal_almost%20ready%207%20Dec%202015_Tim.doc
When Boets et al. is published, we should use the actual pdf paper as the source, as that is easier to reference.
140 records:
- `exact match with gbifapi_canonicalName` + `ACCEPTED`: 114 SPECIES and 5 GENUS. The genera got a lower confidence level, but are all correct.
- `EXACT 100%` + `ACCEPTED`. Those are nothotaxa and are also correctly matched.

Using this file:
datasetName
with harmonia
data-with-common-terms.tsv
data-with-common-terms-refine.json
Using this file:
datasetName
with rinse-annex-b
data-with-common-terms.tsv
data-with-common-terms-refine.json
Using this file:
datasetName
with rinse
data-with-common-terms.tsv
data-with-common-terms-refine.json
(17 fuzzy matches)