
Comments (10)

kermitt2 commented on May 30, 2024

I am doing a new iteration on this.

The "super" matching service would work as follows:

GET host:port/service/lookup?doi=DOI&atitle=ARTICLE_TITLE&firstAuthor=FIRST_AUTHOR_SURNAME&jtitle=JOURNAL_TITLE&volume=VOLUME&firstPage=FIRST_PAGE&biblio=BIBLIO_STRING[&postValidate=true][&parseReference=true]
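As an illustration, a client could assemble such a request as sketched below. The host, port and the helper name `build_lookup_url` are assumptions made for the example; only the path and the parameter names come from the request line above.

```python
from urllib.parse import urlencode


def build_lookup_url(base, **fields):
    """Build a lookup URL, dropping the fields that were not supplied."""
    params = {key: value for key, value in fields.items() if value is not None}
    # urlencode escapes spaces and special characters in titles and raw references.
    return f"{base}/service/lookup?{urlencode(params)}"


url = build_lookup_url(
    "http://localhost:8080",  # hypothetical host and port
    doi=None,                 # not known, so it is left out of the query
    atitle="Some article title",
    firstAuthor="Smith",
    postValidate="true",
)
print(url)
```

Leaving out unknown fields keeps the query short and lets the service decide which matching step applies.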

The logic would be as follows:

  1. Try to match with the DOI, if present. If successful, try the post-validation (if at least the author is present). If the validated DOI lookup fails, go to 2; otherwise go to 5.

  2. Try to match with the author and article title metadata, if present. If successful, try the post-validation. If the validated lookup fails, go to 3; otherwise go to 5.

  3. Try to match with the journal title, volume and first page, if present. If successful, try the post-validation (if at least the author is present). If the validated lookup fails, go to 4; otherwise go to 5.

  4. Try full string matching, if the string is present. If successful, try the post-validation if at least the author is present; if not present, call the GROBID citation parser to get at least the author (and the article title if possible) and post-validate with that. If this validated match fails too, go to 6; otherwise go to 5.

  5. return 200 with the matched DOI and extended metadata

  6. return 404

Validation takes place if postValidate is true (default). GROBID parsing takes place if parseReference is true (default).
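The cascade above can be sketched roughly as follows. This is not the biblio-glutton implementation: the matcher callables, `validate` and `parse_citation` are hypothetical stand-ins that the caller supplies. Two behaviours are assumptions of the sketch rather than statements from the proposal: in steps 1 to 3 a match is accepted unvalidated when no author is available, and in step 4 a match is rejected when no author can be obtained.

```python
from typing import Callable, Optional, Tuple

Record = dict
Matcher = Callable[[dict], Optional[Record]]


def lookup(query: dict,
           by_doi: Matcher, by_title_author: Matcher,
           by_journal: Matcher, by_biblio: Matcher,
           validate: Callable[[Record, str, Optional[str]], bool],
           parse_citation: Callable[[str], dict],
           post_validate: bool = True,
           parse_reference: bool = True) -> Tuple[int, Optional[Record]]:
    """Return (HTTP status, record) following steps 1 to 6."""
    author, title = query.get("firstAuthor"), query.get("atitle")

    def accepted(record: Record) -> bool:
        # Assumption: without an author there is nothing to validate against.
        if not post_validate or not author:
            return True
        return validate(record, author, title)

    # Steps 1-3: metadata-only matchers, tried in priority order.
    steps = [
        (("doi",), by_doi),
        (("atitle", "firstAuthor"), by_title_author),
        (("jtitle", "volume", "firstPage"), by_journal),
    ]
    for required, matcher in steps:
        if all(query.get(key) for key in required):
            record = matcher(query)
            if record is not None and accepted(record):
                return 200, record  # step 5

    # Step 4: full string matching, parsing the citation only if needed.
    biblio = query.get("biblio")
    if biblio:
        record = by_biblio(query)
        if record is not None:
            if post_validate and not author:
                if parse_reference:
                    parsed = parse_citation(biblio)
                    author = parsed.get("firstAuthor")
                    title = title or parsed.get("atitle")
                if not author:
                    return 404, None  # assumption: no author, no validated match
            if accepted(record):
                return 200, record  # step 5
    return 404, None  # step 6
```

Passing the matchers in as callables keeps the control flow separate from the index and parser back ends, so the ordering of the steps can be tested with simple stubs.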

Ideally it should work with PMID, PMCID or ISTEX_ID instead of DOI (any "strong" identifier).

from biblio-glutton.

lfoppiano commented on May 30, 2024

Point 4 is not clear: what should happen if the post-validation doesn't work? Why should the match be re-validated with the author/title extracted from the full string? Shouldn't we try to extract them with GROBID and then try to match as in (2)?

Second, are you sure you want to change postValidate and parseReference to true by default?


kermitt2 commented on May 30, 2024

About point 4), I think it is clear, but I can rephrase with more details:

  4. Try full string matching, if a full string is present.

     4.1 If full string matching is successful:
           - if at least the author is present -> try to post-validate the matching result with this author and, if available, the title
           - if author/title are not present -> call the GROBID citation parser to get at least the author (article title if possible) -> try to post-validate with that

     4.2 If the post-validation fails, or if no author/title is available after GROBID parsing, or if full string matching initially failed, go to 6; otherwise go to 5.

In 2) we use only the provided metadata; GROBID is not called. So it is a different setting, with higher priority because it does not involve costly GROBID parsing.


kermitt2 commented on May 30, 2024

Second, are you sure you want to change postValidate and parseReference to true by default?

Yes, because without post-validation, the false positives from full reference string matching will kill the accuracy. Parsing the reference allows this post-validation even when no metadata is provided, so it is something to exploit as much as possible.


lfoppiano commented on May 30, 2024

at least with author is present

What does that mean?

If the title is not present, it is ignored, and if it is present, it is considered for the post-validation; whereas if the author is not present, the post-validation fails? Is that correct?


kermitt2 commented on May 30, 2024

at least with author present

So this is only for step 4)

If the title is not provided as metadata but the author is, we can post-validate with the author alone. My observation is that this is enough in most cases, though it might require more tests. This is an acceptable trade-off (there is also a good chance that the title is not present in the raw bibliographical reference anyway). Of course, if both title and author are provided, we use both.

If the post-validation with author only passes, we are done, success.

If the post-validation with author only fails, we are also done, 404. There is no additional grobid reference parsing in this case (this is the trade-off).
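A minimal sketch of such an author-first validator is shown below. The thread does not say how the comparison is actually done, so everything here is an assumption for illustration: the record field names `firstAuthor` and `atitle`, the exact-match test on the normalized surname, the `difflib` similarity on the title, and the `TITLE_THRESHOLD` value.

```python
from difflib import SequenceMatcher
from typing import Optional

TITLE_THRESHOLD = 0.8  # hypothetical cut-off, not taken from biblio-glutton


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def post_validate(record: dict, author: str, title: Optional[str] = None) -> bool:
    """Check a candidate record against the author, and the title if given."""
    if normalize(record.get("firstAuthor", "")) != normalize(author):
        return False  # author mismatch: reject, no further parsing is attempted
    if title is None:
        return True   # author-only validation, the accepted trade-off
    ratio = SequenceMatcher(
        None, normalize(record.get("atitle", "")), normalize(title)
    ).ratio()
    return ratio >= TITLE_THRESHOLD


candidate = {"firstAuthor": "Smith", "atitle": "Deep learning for citations"}
print(post_validate(candidate, "smith"))
print(post_validate(candidate, "Jones"))
```

The title is only consulted when it is supplied, which mirrors the rule that author-only validation is enough to accept or reject a match.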


lfoppiano commented on May 30, 2024

Another question, back to point 1 (but valid everywhere):

If the author is not present and postValidate = true, it's a 404, right?

If so, this might be a problem for DOI lookup, because if post-validation is not disabled it will return 404.


kermitt2 commented on May 30, 2024

Hmm, if the author is not present, we don't post-validate in (1); we then basically go to (3), and if we reach (4) we parse the raw reference string to try to get an author. There is no possible "author is not present and postValidate = true" case in the whole process. The availability of at least one author name (provided or extracted) is a condition for post-validating.


kermitt2 commented on May 30, 2024

So done with PR #18


kermitt2 commented on May 30, 2024

Finally, the mixed matching approach was removed in version 0.2 because full matching (which is more accurate) has been made much faster (almost as fast as the previous mixed matching), which removes the interest of mixed matching.

