Comments (12)

rdmpage commented on May 29, 2024

I have several projects that map names to literature, using nomenclators as a starting point, such as ION (used by BioNames), IPNI (mapping well underway), and Index Fungorum (just started). Happy to contribute literature links. I'm also doing work on clustering names within nomenclators to get around massive duplications (e.g., ION and IPNI).

from general.

rdmpage commented on May 29, 2024

Some additional name sources include:

World Spider Catalog: LSIDs with literature as strings; some overlap with ION, but better-quality literature citations for older names.
Species File projects: LSIDs (I guess @mjy can talk about these projects).
Nomenclator Zoologicus: I have a version of the uBio file with links to ION, BHL, etc.

Some databases are explicitly about nomenclature, many are also about taxonomy (or confound the two).

deepreef commented on May 29, 2024

This is EXCELLENT! And EXACTLY what is needed, in my opinion! @rdmpage -- have you and @dimus compared systems for clustering names? Massive duplications also exist among literature citations (including the dataset you gave me a few years ago). Is there any work within this group to do similar clustering of literature citations? I've been chipping away at this, starting with Journal names. @gsautter has worked on this through RefBank, and there is a parsing service available that seems to work pretty well.

As I've said many times before, reconciling names is relatively easy compared to reconciling literature citations (I would estimate that 80% of the effort to reconcile and import a batch of names into GNUB is spent on reconciling and importing the associated literature) -- probably why there are so many efforts to build lists of names, and so few that focus on linking those names to literature.

In any case, I hope CoLPlus remains committed to incorporating a "names-linked-to-literature" approach, rather than just another "names and associated concepts" approach. It requires a bit of extra work up-front, but the rewards are VASTLY greater.

dremsen commented on May 29, 2024

Don't forget Sherborn's Index Animalium, Rich. I think you would have the most up-to-date and parsed copy. If there is more parsing to do we might consider seeing if Dima is up for it, but most should be in good shape. For subsequent combinations, there is a reference to the original combination (I think it's just a reference to the original genus), so there are homotypic synonyms accessible there as well. Some taxonomic databases have parsed and separate nomenclature databases inherent to them. I can recall Thompson's Diptera, and there is an algal nomenclator. Index Fungorum, of course, etc.

rdmpage commented on May 29, 2024

@dremsen The whole Sherborn - ION - BHL mapping (doi:10.3897/zookeys.550.9673) should be opened up as well. AFAIK ION have it but haven't made it available to anyone not visiting their web site (e.g., I gather that BHL don't have it). I've made a start on trying to resurrect it via screen scraping, see https://github.com/rdmpage/ion-sherborn

I've also grabbed a copy of Index Animalium and put it in a repository https://github.com/rdmpage/index-animalium

deepreef commented on May 29, 2024

@dremsen and @rdmpage : Index Animalium represents the PERFECT example of what I'm talking about. There are 7,723 literature citations in the combined bibliography, and 429,829 TNUs (approximately 350K Protonyms). It's an absolute GOLD MINE of information (massive numbers of Protonyms, homotypic synonyms/combinations/spelling variants, etc.), ALL of which are anchored to literature. The records (both bibliography and TNUs) are almost completely parsed (just another week or so needed to finish parsing the microcitations connected to each TNU record).

So... what's the hold-up? The literature! The bibliography is highly abbreviated (e.g., no titles and highly abbreviated -- and inconsistent -- Journal names). Even though it's almost fully parsed, most of the records have scant field values. Suzanne Pilsk (lead author of the paper cited by Rod) had made it her mission to tie Sherborn bibliography records to proper citations, and as of the last cut I got from her, 4,477 of them had been fleshed out. The remaining 3,246 represent (almost by definition) the most difficult to pin down. I had been working on cleaning up just the Journals, with the hope of identifying full citations (e.g., from RefBank) via Journal+Volume+Startpage, but there are no page numbers (bummer), and there are still over 2,800 unique and highly abbreviated text strings from which Journals need to be derived.
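The Journal clean-up and the hoped-for Journal+Volume+Startpage lookup could be sketched roughly as below. This is only an illustration under stated assumptions, not the approach actually used for Sherborn: the function names (`normalize_journal`, `cluster_journals`, `citation_key`) are hypothetical, the sample strings are made up for the example, and the normalization is deliberately crude (it only merges variants that differ in case, punctuation and spacing, not differently abbreviated words).

```python
import re
from collections import defaultdict


def normalize_journal(abbrev: str) -> str:
    """Reduce an abbreviated journal string to a crude comparison key.

    Lowercases, replaces punctuation with spaces and collapses whitespace,
    so variants differing only in case, dots or spacing share one key.
    """
    cleaned = re.sub(r"[^\w\s]", " ", abbrev.lower())
    return " ".join(cleaned.split())


def cluster_journals(strings):
    """Group raw journal strings by their normalized key (hypothetical helper)."""
    clusters = defaultdict(list)
    for raw in strings:
        clusters[normalize_journal(raw)].append(raw)
    return dict(clusters)


def citation_key(journal: str, volume: str, start_page: str):
    """Build a Journal+Volume+Startpage tuple for matching against another source.

    Only usable where a start page is known, which the Sherborn
    bibliography records lack.
    """
    return (normalize_journal(journal), volume.strip(), start_page.strip())


# Illustrative input only; not taken from the Sherborn data.
sample = ["Ann. Mag. Nat. Hist.", "Ann Mag nat hist", "Proc. Zool. Soc."]
clusters = cluster_journals(sample)
```

Grouping on a normalized key first would shrink the list of unique strings that need manual review; anything smarter (abbreviation expansion, fuzzy matching) would sit on top of that.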

Once we do clean up & flesh out the literature (or decide that we're OK with dirty microcitations as our anchorpoints to the literature), the next hurdle will be to cross-link the microcitations in the TNU records (again, incomplete & inconsistent) to the corresponding bibliography record. That should be relatively straightforward -- maybe a week or two to complete. After that, the names are an absolute breeze (probably less than a day's work).

If we're OK with incomplete bibliographic citations (which won't connect us to BHL pages, but we can flesh them out later), I'm willing to dust that project off and bump it to the top of my "CFT" (Copious Free Time) priority list, if this group thinks it's a worthy investment of time (actually, after looking at the DB, I'm getting more excited about it myself).

Bottom line: Sherborn is not a "names" problem (we already have the names as Name-Strings, plus authors, combination authors, etc.). It's a literature problem -- which brings me back to my previous post on this.

mdoering commented on May 29, 2024

@deepreef I have created a new issue #23 to discuss how to deal with literature. Let's keep this issue for listing nomenclatural sources.

Making Index Animalium open and accessible would be a very good thing.
Is the version you have, Rich, the best there is to continue with? Or do the versions from Rod add anything not present in yours?

deepreef commented on May 29, 2024

Understood, and agreed! I just wanted to use @dremsen's suggestion of Sherborn to illustrate the point made earlier. Also, we already have zillions of sources of names. There is no shortage of those. What we need for CoL+ to actually get beyond what we already have is sources of names linked to literature, and Sherborn's Index Animalium is a big one! :-)

I'm happy to share what I have, but perhaps give me a week to clean up a few loose ends. I don't know the state of the others (mine is a more highly parsed version of what @dremsen provided to me years ago, which I believe was originally parsed by Pat Leary).

rdmpage commented on May 29, 2024

Oh, and let's not forget Wikispecies, which is mixed but has lots of literature. Unfortunately it's in a somewhat idiosyncratic format. I'll be at Wikicite 2017 next week working on a tool to parse Wikispecies citations. Apart from the literature, Wikispecies is a potential source of links to Wikidata and author identifiers, so it has an important role to play for those of us obsessed with linking stuff together.

mdoering commented on May 29, 2024

Also see official code lists #26

mdoering commented on May 29, 2024

Came across yet another Fungi nomenclator: http://www.cybertruffle.org.uk/cybernome/eng/index.htm

yroskov commented on May 29, 2024
