
alien-species-checklist's People

Contributors

dimevil, oscardore, peterdesmet, stijnvanhoey, timadriaens

Forkers

qgroom mohan7098

alien-species-checklist's Issues

GRIIS mapping

We will add the following columns for species that occur in GRIIS and the concatenated file (based on identical acceptedKey):

  • PresenceBE: one of our 7 columns
  • Status: standardized
  • Source: dataset name from which this info was derived
  • GBIF acceptedKey
  • Our taxonID ?

Order of sources to use info from:

  • Plants
  • Wrims
  • Fishes
  • Macroinvertebrates (all considered established)
  • Harmonia (beware, not all present in BE)
  • Rinse + annex: no status info

Matches

  • Records in both lists considered validated
  • Records in GRIIS only: probably errors in one of the two datasets
  • Records in concatenated only: we will submit this as an additional file: potentially missing from GRIIS.
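
The matching rules above could be sketched in Python as follows. This is only an illustration, not the project's actual workflow: the function names `classify_keys` and `pick_by_source_order`, the record layout, and the lowercase dataset names in `SOURCE_ORDER` are assumptions.

```python
# Hypothetical sketch: compare GRIIS and the concatenated file on acceptedKey.
# Dataset names and the record layout are assumptions for illustration.
SOURCE_ORDER = ["plants", "wrims", "fishes", "macroinvertebrates",
                "harmonia", "rinse", "rinse-annex-b"]


def classify_keys(griis_keys, concatenated_keys):
    """Split acceptedKeys into the three match groups listed under Matches."""
    griis, concat = set(griis_keys), set(concatenated_keys)
    return {
        "both": griis & concat,               # considered validated
        "griis_only": griis - concat,         # probably an error somewhere
        "concatenated_only": concat - griis,  # potentially missing from GRIIS
    }


def pick_by_source_order(records):
    """Return the record that comes from the highest-priority dataset."""
    return min(records, key=lambda r: SOURCE_ORDER.index(r["datasetName"]))
```

When several source datasets share an acceptedKey, `pick_by_source_order` would select the record whose status and presence information is copied into the GRIIS columns.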

Populate scientificName

Create a scientificName for all records in concatenated where there currently isn't one. Since none of those have a subspecies, this can be done quite easily:

  1. Filter on blank scientificName
  2. Edit cells > Transform
  3. value: cells["genus"].value + " " + cells["specificEpithet"].value
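
For anyone working outside Refine, the same transform could be sketched in Python. The function name `fill_scientific_name` and the dict-per-row layout are assumptions; the column names come from the expression above.

```python
def fill_scientific_name(row):
    """Build scientificName from genus + specificEpithet when it is blank."""
    current = (row.get("scientificName") or "").strip()
    if not current:
        # Same result as the Refine expression: genus, a space, then the epithet.
        row["scientificName"] = row["genus"] + " " + row["specificEpithet"]
    return row
```

Rows that already have a scientificName are returned unchanged, which mirrors the filter on blank values in step 1.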

Make sure to escape quotes before importing the file, so we don't get this in taxonID:

plants "Populus x jackii Sargent ""Gileadensis"" (P. balsamifera x deltoides)" Salicaceae Nat.? Hort. D NAM X 2010 N?

What filter to use for WRIMS?

In WRIMS, one can search on distributions: http://www.marinespecies.org/introduced/

There are 4 (sub)regions that might be applicable for a Belgian checklist:

  • Belgian coast (Coast): 13 records
  • Belgian Exclusive Economic Zone (EEZ): 55 records
  • Belgian part of the North Sea (Marine Region): 55 records
  • Belgium (Nation): 114 records

I verified that Belgium (Nation) completely includes the 3 other regions, so a search on Belgium (Nation) is the one we should use.

Create list of all "presence"

Once we have #26:

  1. Do a facet on presenceBE
  2. Copy all the values and the number of times they occur
  3. Paste this information in the relevant sheet in this spreadsheet
  4. In the spreadsheet, indicate the field name in the column Field
  5. Repeat this for the 6 other presence columns.
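
A facet is essentially a count per distinct value, so the numbers to paste could also be produced with a short Python sketch. The helper name `value_counts` and the dict-per-row layout are assumptions.

```python
from collections import Counter


def value_counts(rows, field):
    """Count how often each value occurs in one column, like a text facet."""
    return Counter(row.get(field, "") for row in rows)
```

Calling `value_counts(rows, "presenceBE")` and then repeating the call for the other presence columns would give the value/count pairs for each sheet.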

Standardize to common terms: macroinvertebrates

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the other common terms as columns
  • Populate datasetName with macroinvertebrates
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json
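
The column steps above are done in Refine, but the logic could be sketched in Python like this. The function name `standardize`, the `rename` mapping and the `common_terms` list are hypothetical; the real column names live in the Refine JSON.

```python
def standardize(rows, rename, common_terms, dataset_name):
    """Rename kept columns, drop the rest, add missing common terms,
    fill datasetName and order the columns alphabetically."""
    result = []
    for row in rows:
        new_row = {term: "" for term in common_terms}  # add all common terms
        for old_name, term in rename.items():          # keep only renamed columns
            new_row[term] = row.get(old_name, "")
        new_row["datasetName"] = dataset_name
        ordered = sorted(new_row.items(), key=lambda item: item[0].lower())
        result.append(dict(ordered))
    return result
```

The same function would apply to the other source datasets by passing a different `rename` mapping and `dataset_name`.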

quick clean google refine

First scan & basic cleaning of the initial files

  • Harmonia
  • Rinse
  • WRIMS
  • Plants
  • Macroinvertebrates
  • Neobiota

Belgian Alien Species Checklist

Dear Tim
I would like your help defining the main column titles (Darwin Core terms) we need for the checklist.

Assignments #31 to #40...

I will try to do the assignments up to #30. Peter, you can check the spreadsheet for comments.
I don't think I'm technically qualified for the ones after that, right?
If I have to do them, I need to be at INBO with one of you: Stijn, Peter or Dimitri.
That is possible next Monday. Or has Stijn already completed them?
Either way, I would like to know how these were done.

Thanks

Standardize to common terms: plants

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the other common terms as columns
  • Populate datasetName with plants
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json

GBIF match results for concatenated file

This table gives an overview of the current GBIF name matching status:

match                                    fishes  harmonia  macroinvertebrates  plants  rinse  rinse-annex-b  wrims   sum
exact match with gbifapi_scientificName       -         -                   1    1294     36              1      -  1332
exact match with gbifapi_canonicalName       22       130                  66       3   6093             21    175  6510
EXACT 100%                                    -         4                   -     849    211              6      -  1070
EXACT < 100%                                  -         -                   -      72     20              4      -    96
FUZZY                                         1         -                   2      15     73              4      -    95
HIGHERRANK                                    -         3                   4     177    224              7     25   440
NO OR DOUBLE MATCH                            -         3                   -       -      4              2      -     9
sum                                          23       140                  73    2410   6661             45    200  9552

Some observations:

  • The first 3 categories can be considered OK, which is 93.3% of the dataset! The only caveat is that we have to trust the accepted names GBIF gives for synonyms (745 records + 15 doubtful), which we don't always do: e.g. Tripolium pannonicum is not a synonym of A. salignus
  • The 96 EXACT < 100% matches will have to be examined case by case.
  • The 95 FUZZY matches are probably typos, and are addressed in #41
  • The 440 HIGHERRANK matches are mostly plants and a lot of hybrids. Chances are we can only correct half of those to match in GBIF.
  • And then there are 9 records with no match: those are some viruses and names in Harmonia that appear twice in GBIF (a bug that will be fixed in April).
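
As a quick check of the share quoted in the first observation, the row sums from the table can be added up in Python. The dict layout and variable names are only illustrative; the counts are the values in the sum column.

```python
# Row totals from the "sum" column of the match table.
counts = {
    "exact match with gbifapi_scientificName": 1332,
    "exact match with gbifapi_canonicalName": 6510,
    "EXACT 100%": 1070,
    "EXACT < 100%": 96,
    "FUZZY": 95,
    "HIGHERRANK": 440,
    "NO OR DOUBLE MATCH": 9,
}

total = sum(counts.values())                 # all records in the concatenated file
ok = sum(list(counts.values())[:3])          # the first 3 categories
share_ok = round(100 * ok / total, 1)        # percentage considered OK
print(total, ok, share_ok)                   # 9552 8912 93.3
```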

@timadriaens, how do you want to prioritize going forward?

Add presence to WRIMS dataset

We want to add presence data to the WRIMS dataset:

  1. Open this file in Open Refine: https://github.com/LifeWatchINBO/alien-species-checklist/blob/master/source-datasets/wrims/data.tsv
  2. Do a facet on PlaceName
  3. Filter on Belgium and add a column based on this column, named presenceBE
  4. Filter on Belgian part of the North Sea and add a column based on this column, named presenceBPNS
  5. Filter on Belgian Exclusive Economic Zone and add a column based on this column, named presenceBEEZ
  6. Filter on Belgian Coast and add a column based on this column, named presenceBECoast
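
The four filter-and-add-column steps could be sketched in Python as below. This assumes one row per distribution record and that each new column simply copies the PlaceName value for matching rows; the names `PLACE_TO_COLUMN` and `add_presence_columns` are hypothetical.

```python
# Mapping from PlaceName values to the presence columns named in the steps above.
PLACE_TO_COLUMN = {
    "Belgium": "presenceBE",
    "Belgian part of the North Sea": "presenceBPNS",
    "Belgian Exclusive Economic Zone": "presenceBEEZ",
    "Belgian Coast": "presenceBECoast",
}


def add_presence_columns(row):
    """Add the four presence columns; only the one matching PlaceName is filled."""
    place = row.get("PlaceName", "")
    for name, column in PLACE_TO_COLUMN.items():
        # Exact comparison, so "Belgium" does not also match "Belgian Coast".
        row[column] = place if place == name else ""
    return row
```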

Once this is done:

  1. Copy and apply https://github.com/LifeWatchINBO/alien-species-checklist/blob/master/source-datasets/wrims/data-with-common-terms-refine.json
  2. Move the columns presenceBE, presenceBEEZ, presenceBPNS and presenceBECoast to the correct position (see https://github.com/LifeWatchINBO/alien-species-checklist#process)
  3. Export the data as data-with-common-terms.tsv
  4. Export the Refine as data-with-common-terms-refine.json

concatenated tsv

When I open the concatenated file in Refine, I still have the same problem: 1256 records rather than about 9000 and more.

Remove decimals from years

Because of a Refine issue in automatically interpreting data from Excel, years are written as 2015.0. We need to remove those decimals.

  • Plants
  • WRIMS
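
A minimal Python sketch of the clean-up, assuming the years are stored as text and only values of the exact form `2015.0` should change (the function name `strip_year_decimal` is hypothetical):

```python
import re


def strip_year_decimal(value):
    """Turn '2015.0' into '2015'; return any other value unchanged."""
    match = re.fullmatch(r"(\d{4})\.0", value)
    return match.group(1) if match else value
```

Values such as `N?` or an already clean `2010` pass through untouched, so the function can be applied to a whole column.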

First summary of GBIF match of waarnemingen.be species

matchType   confidence  status    records
EXACT       100         ACCEPTED    20021
EXACT       100         SYNONYM      1390
EXACT       <100        ACCEPTED     1775
EXACT       <100        DOUBTFUL       34
EXACT       <100        SYNONYM        73
FUZZY       <100        ACCEPTED      129
FUZZY       <100        SYNONYM        21
HIGHERRANK  100         ACCEPTED      398
HIGHERRANK  100         DOUBTFUL        2
HIGHERRANK  100         SYNONYM        44
HIGHERRANK  <100        ACCEPTED      281
HIGHERRANK  <100        SYNONYM        22
blank       -           -             990

Return file format for GRIIS?

In what format should the GRIIS information be provided when returning it? Is CSV appropriate, or should it be Excel?

Waarnemingen.be mapping

We will add the following columns for species that occur in wn.be and the concatenated file (based on identical acceptedKey):

  • Status: standardized
  • GBIF acceptedKey
  • Our taxonID ?

Order of sources to use info from:

  • Plants
  • Wrims
  • Fishes
  • Macroinvertebrates (all considered established)
  • Harmonia (beware, not all present in BE)
  • Rinse + annex: no status info

Matches

  • Records in both lists ok: maybe check discrepancy in status wn.be and ours
  • Records in wn.be only: mostly natives, check those flagged by wn.be as exotic
  • Records in concatenated only: ok

Standardize to common terms: fishes

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the other common terms as columns
  • Populate datasetName with fishes
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json

Standardize to common terms: wrims

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the columns Belgian exclusive economic zone; Belgian Part of the North sea; Belgian coast
  • Add the other common terms as columns
  • Populate datasetName with wrims
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json

Which file to use for RINSE

@timadriaens, I also asked you this by email, but I'm recording it here so we won't forget. What file do we use for the RINSE dataset?

  1. File emailed by you: https://github.com/LifeWatchINBO/alien-species-checklist/blob/master/source-datasets/rinse/AnnexB%20RINSE%20Registry%20of%20NNS.xlsx
  2. File in supplementary material of Zieritz et al.: https://github.com/LifeWatchINBO/alien-species-checklist/blob/master/source-datasets/rinse/neobiota-023-065-s001.xlsx

My preference would be 2, as that is publicly available and easier to reference. I just need to know that all the core info is there.

Get Harmonia data

Is the Harmonia dataset publicly available in bulk? If not, how can we obtain this?

GBIF match with Harmonia

140 records:

Considered OK

  • 119 records are exact match with gbifapi_canonicalName + ACCEPTED: 114 SPECIES and 5 GENUS. The genera got a lower confidence level, but are all correct.
  • 2 records have EXACT 100% + ACCEPTED. Those are nothotaxa and are also correctly matched.

Synonyms

Other issues

Standardize to common terms: harmonia

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the other common terms as columns
  • Populate datasetName with harmonia
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json

Standardize to common terms: rinse-annex-b

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the other common terms as columns
  • Populate datasetName with rinse-annex-b
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json

Standardize to common terms: rinse

Using this file:

  • Rename the columns we want to keep to the common term names
  • Remove the columns we are not planning to use
  • Add the other common terms as columns
  • Populate datasetName with rinse
  • Order the columns alphabetically
  • Export the final file to the appropriate directory as data-with-common-terms.tsv
  • Store the refine steps as data-with-common-terms-refine.json

Correct typos in data (i.e. fuzzy matches)

  • Filter the concatenated.tsv file on the fuzzy matches
  • Compare the original scientificName with the scientificName_gbif
  • Double-check the names (eol.org, Google...)
  • Correct typos!

(17 fuzzy matches)
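
To get the list of names to review, the first two steps could be sketched in Python. The column name `matchType` and the function name `names_to_review` are assumptions; `scientificName` and `scientificName_gbif` are the columns mentioned above. Deciding which spelling is correct remains a manual check.

```python
def names_to_review(rows):
    """Return (original, GBIF) name pairs for fuzzy matches that differ."""
    return [
        (row["scientificName"], row["scientificName_gbif"])
        for row in rows
        if row.get("matchType") == "FUZZY"
        and row["scientificName"] != row["scientificName_gbif"]
    ]
```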
