domradar's People

Contributors

0x48piraj, arihantawasthi

domradar's Issues

Expand the species dataset

Uniprot

Uniprot maintains a controlled vocabulary of the common and scientific names of species, published as speclist.txt.

An example entry:

ACAER E  111511: N=Acanthodactylus erythrurus
                 C=Spanish fringe-toed lizard
                 S=Lacerta erythrura

In the example, N is the scientific binomial name (Acanthodactylus erythrurus) and C is the common name (Spanish fringe-toed lizard).

ACAER is the id code, 111511 is the code for the taxonomic node, E means it is a eukaryote, and S is a synonym of either name.
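
The entry layout described above can be read with a small parser. This is a minimal sketch that assumes every entry follows the shape of the example (a header line with code, kingdom letter, and taxonomic node, followed by indented continuation lines); the function name parse_speclist and the dictionary keys are illustrative choices, not part of Uniprot's file format.

```python
import re

# Header line: id code, kingdom letter, taxonomic node, then the first name field.
HEADER = re.compile(r"^(\S+)\s+([A-Z])\s+(\d+):\s+([NCS])=(.+)$")
# Continuation line: indented, carrying one further name field.
CONTINUATION = re.compile(r"^\s+([NCS])=(.+)$")


def parse_speclist(lines):
    """Return one dict per species entry (hypothetical structure)."""
    entries = []
    for raw in lines:
        line = raw.rstrip()
        header = HEADER.match(line)
        if header:
            code, kingdom, taxon, field, value = header.groups()
            entries.append({
                "code": code,
                "kingdom": kingdom,
                "taxon_id": int(taxon),
                "scientific": None,
                "common": None,
                "synonyms": [],
            })
        else:
            cont = CONTINUATION.match(line)
            if cont is None or not entries:
                continue  # skip preamble text and anything unrecognised
            field, value = cont.groups()
        entry = entries[-1]
        value = value.strip()
        if field == "N":
            entry["scientific"] = value
        elif field == "C":
            entry["common"] = value
        else:  # "S": a synonym
            entry["synonyms"].append(value)
    return entries


sample = [
    "ACAER E  111511: N=Acanthodactylus erythrurus",
    "                 C=Spanish fringe-toed lizard",
    "                 S=Lacerta erythrura",
]
print(parse_speclist(sample))
```

Lines that match neither pattern are skipped, so the explanatory text at the top of the real file should not need to be stripped first, though that has not been checked against the full file.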

The list currently contains 25336 scientific names, which falls short of the ~2.5m species in GBIF, let alone the tens or hundreds of millions that are estimated to exist. The Uniprot list does, however, cover every organism included in Uniprot, which is widely regarded as one of the most comprehensive protein databases available today.

GBIF

The Global Biodiversity Information Facility (GBIF) has an API (http://www.gbif.org/developer/species) from which you can extract data for species names. Their database includes common names (aka vernacular names) where available, often in several languages. Using this API, you can extract data and construct a name file for a particular taxon that you are interested in.

As an example, this is the list of the first 20 vernacular names found for Passer domesticus (House sparrow):

{
   "endOfRecords" : false,
   "results" : [
      {
         "language" : "",
         "sourceTaxonKey" : 100220560,
         "source" : "Global Invasive Species Database",
         "vernacularName" : "English sparrow"
      },
      {
         "language" : "",
         "sourceTaxonKey" : 100220560,
         "vernacularName" : "Europese huismuis",
         "source" : "Global Invasive Species Database"
      },
      {
         "vernacularName" : "Gorrion domestico",
         "source" : "Global Invasive Species Database",
         "language" : "",
         "sourceTaxonKey" : 100220560
      },
      {
         "source" : "Integrated Taxonomic Information System (ITIS)",
         "vernacularName" : "Gorrión casero",
         "language" : "spa",
         "sourceTaxonKey" : 102101640
      },
      {
         "vernacularName" : "Gorrión Común",
         "sourceTaxonKey" : 123213203,
         "language" : "spa"
      },
      {
         "language" : "spa",
         "sourceTaxonKey" : 101186844,
         "source" : "The European Nature Information System (EUNIS)",
         "vernacularName" : "Gorrión Común"
      },
      {
         "language" : "spa",
         "sourceTaxonKey" : 114130266,
         "source" : "Colaboraciones Americanas Sobre Aves",
         "vernacularName" : "Gorrión casero"
      },
      {
         "vernacularName" : "Gorrión casero",
         "source" : "Yanayacu Natural History Research Group",
         "sourceTaxonKey" : 119245200,
         "language" : "spa"
      },
      {
         "vernacularName" : "Gorrión casero",
         "source" : "Catalogue of Life",
         "sourceTaxonKey" : 119950016,
         "language" : "spa"
      },
      {
         "language" : "swe",
         "sourceTaxonKey" : 101186844,
         "vernacularName" : "Gråsparv",
         "source" : "The European Nature Information System (EUNIS)"
      },
      {
         "vernacularName" : "Gråspurv",
         "language" : "dan",
         "sourceTaxonKey" : 123213203
      },
      {
         "vernacularName" : "Gråspurv",
         "language" : "nob",
         "sourceTaxonKey" : 123213203
      },
      {
         "language" : "deu",
         "sourceTaxonKey" : 116795880,
         "vernacularName" : "Haussperling",
         "source" : "Taxon list of animals with German names (worldwide) compiled at the SMNS",
         "country" : "DE"
      },
      {
         "language" : "deu",
         "sourceTaxonKey" : 100483595,
         "source" : "Belgian Species List",
         "country" : "BE",
         "vernacularName" : "Haussperling"
      },
      {
         "language" : "deu",
         "sourceTaxonKey" : 123213203,
         "vernacularName" : "Haussperling"
      },
      {
         "sourceTaxonKey" : 101186844,
         "language" : "deu",
         "source" : "The European Nature Information System (EUNIS)",
         "vernacularName" : "Haussperling"
      },
      {
         "source" : "The Clements Checklist",
         "vernacularName" : "House Sparrow",
         "language" : "eng",
         "sourceTaxonKey" : 113987294
      },
      {
         "vernacularName" : "House Sparrow",
         "source" : "Taxonomy in Flux Checklist",
         "language" : "eng",
         "sourceTaxonKey" : 100159046
      },
      {
         "source" : "Colaboraciones Americanas Sobre Aves",
         "vernacularName" : "House Sparrow",
         "language" : "eng",
         "sourceTaxonKey" : 114130266
      },
      {
         "sourceTaxonKey" : 102101640,
         "language" : "eng",
         "vernacularName" : "House Sparrow",
         "source" : "Integrated Taxonomic Information System (ITIS)"
      }
   ],
   "limit" : 20,
   "offset" : 0
}
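
A response like the one above contains many repeated names from different sources, so a name file would probably want them deduplicated. The following is a rough sketch that groups distinct vernacular names by language code; the function name names_by_language and the "unknown" bucket for empty language fields are assumptions made for illustration.

```python
from collections import defaultdict


def names_by_language(response):
    """Group distinct vernacular names by language code, keeping first-seen order."""
    grouped = defaultdict(list)
    for record in response.get("results", []):
        name = record.get("vernacularName")
        if not name:
            continue
        # Records with an empty language field go into an "unknown" bucket (assumption).
        language = record.get("language") or "unknown"
        if name not in grouped[language]:
            grouped[language].append(name)
    return dict(grouped)


# A trimmed copy of the response shown above.
response = {
    "endOfRecords": False,
    "results": [
        {"language": "", "vernacularName": "English sparrow"},
        {"language": "spa", "vernacularName": "Gorrión casero"},
        {"language": "spa", "vernacularName": "Gorrión Común"},
        {"language": "spa", "vernacularName": "Gorrión casero"},
        {"language": "eng", "vernacularName": "House Sparrow"},
    ],
}
print(names_by_language(response))
```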

With a search such as api.gbif.org/v1/species?name=Passer%20domesticus, you can look up all the information for a particular species, starting from either a scientific name or a common name (in this example, Passer domesticus).
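
For scripting such searches, the query URL can be built from a species name. This sketch uses only the Python standard library and assumes the endpoint is reachable over https; the helper names species_search_url and fetch_species are hypothetical, and the shape of the returned JSON should be checked against the GBIF documentation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint taken from the search above; the https scheme is an assumption.
GBIF_SPECIES_URL = "https://api.gbif.org/v1/species"


def species_search_url(name):
    """Build the search URL, encoding spaces as %20 as in the example."""
    return GBIF_SPECIES_URL + "?" + urlencode({"name": name}, quote_via=quote)


def fetch_species(name, timeout=10):
    """Query the API and return the decoded JSON (needs network access)."""
    with urlopen(species_search_url(name), timeout=timeout) as reply:
        return json.load(reply)


print(species_search_url("Passer domesticus"))
```

Keeping the URL construction separate from the network call makes it possible to check the encoding without contacting the server.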

GBIF includes information on 1,643,948 species (and counting), but I don't know what proportion of them have common names, or for which species common names exist.

Marine

For marine species, the World Register of Marine Species is probably the best place to find this information.

The Ocean Biogeographic Information System also covers a very large number of marine species.

Generic

The website of Observado, an initiative to collect species observations worldwide, has global species lists in csv format that are as complete as possible. The plant list currently has 381,473 records. You can download local species names in more languages than you might have heard of, from English to Russian and from Frysk to Dzongkha.

Note that these lists are meant for observations in the field, and hence also contain multispecies, hybrids and synonyms, but these can be filtered out easily.
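
Such filtering might look like the sketch below. The column names ("name", "type") and the type labels are hypothetical, since the layout of the Observado csv files is not described here; they would need to be adjusted to match the real header row.

```python
import csv
import io

# Hypothetical labels for the rows to drop; check them against the real files.
UNWANTED_TYPES = {"multispecies", "hybrid", "synonym"}


def keep_plain_species(csv_text, type_column="type"):
    """Return the rows whose type column is not one of the unwanted labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get(type_column) or "").strip().lower() not in UNWANTED_TYPES
    ]


# Made-up rows in the assumed layout, for illustration only.
sample = "name,type\nExample species,species\nExample cross,hybrid\nOld name,synonym\n"
for row in keep_plain_species(sample):
    print(row["name"])
```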

Execution/Debug/Crash [info]

Description
Messages for a smooth command-line interface.

  • How many domains have been processed so far?
  • Which dataset is currently running?
  • The characters ;&(){}!¡?¿=.,_$@^*¨%\"<> are not permitted by the Domain Naming Convention for Internet User Applications. The formal document is RFC 819, available from the Internet Engineering Task Force (IETF).
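
The messages listed above could be produced along these lines. This is only a sketch: the function names and the message wording are made up for illustration, and the \" in the character list is assumed to be an escaped double quote rather than a backslash followed by a quote.

```python
# Characters from the list above; \" is read as an escaped double quote (assumption).
FORBIDDEN = set(';&(){}!¡?¿=.,_$@^*¨%"<>')


def forbidden_chars(label):
    """Return the forbidden characters found in a candidate label, sorted."""
    return sorted(set(label) & FORBIDDEN)


def progress_message(dataset, processed, total):
    """Format an [info] line with the running dataset and the domain count."""
    return f"[info] dataset={dataset} processed={processed}/{total} domains"


print(progress_message("uniprot", 3, 10))
bad = forbidden_chars("house_sparrow!")
if bad:
    print("[info] skipped: characters not permitted:", " ".join(bad))
```

Because "." is in the list, the check is meant for a single candidate label rather than a full dotted domain name.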
