paperscited's People

Contributors

mkranj

paperscited's Issues

If the same source is cited using "and" and "i" (English/Croatian), it gets recorded as two sources

Description

When a work by multiple authors is cited, the "and" separator can appear in different languages, and the program treats the variants as separate citations. This can happen, for example, in the English abstract of a Croatian article.

Expected behavior

"A and B, 2000" and "A i B, 2000" should be logged as one citation, preferably as "A i B, 2000", since a Croatian match suggests the article itself is in Croatian.

Observed behavior

Both "A and B, 2000" and "A i B, 2000" get recorded as separate citations.
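
One possible way to merge the two forms is to compare citations on a key in which every separator has been unified. The sketch below is only an illustration: `canonical_key` and `merge_separators` are hypothetical names, not functions from the program, and it assumes citations arrive as plain strings.

```python
import re

# Hypothetical helpers, not taken from the program's source.
# Matches a standalone "and", "&" or "i" between two author names.
_SEPARATOR = re.compile(r"\s+(?:and|&|i)\s+")


def canonical_key(citation: str) -> str:
    """Return a key that is identical for 'A and B, 2000' and 'A i B, 2000'."""
    no_comma = citation.replace(",", " ")
    unified = _SEPARATOR.sub(" i ", no_comma)
    return " ".join(unified.split())  # collapse repeated whitespace


def merge_separators(citations):
    """Keep one citation per key, preferring the Croatian 'i' form."""
    merged = {}
    for citation in citations:
        key = canonical_key(citation)
        is_croatian = " i " in citation
        if key not in merged or is_croatian:
            merged[key] = citation
    return list(merged.values())
```

For example, `merge_separators(["A and B, 2000", "A i B, 2000"])` keeps only the Croatian form, regardless of the order in which the two were found.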

Authors with "ç" in the name don't get recognised properly

Description

Author names containing "ç" are not recorded as part of the citation.

Expected behavior:

The text includes the following sentence: "Mendonça i suradnici (2009) utvrdili su..." (in English: "Mendonça et al. (2009) found...")
The Excel file should contain "Mendonça i sur. 2009"

Actual behavior:

Excel contains "sur. 2009"

Replace possessive form with proper noun in citation

Description

In the sentence "Cohen's (1999) seminal paper...", an article by Cohen (1999) is cited and should be recorded as such. Currently, the citation is recorded as "Cohen's 1999" instead of "Cohen 1999".
This can also produce duplicate citations if both the possessive and the regular form occur in the text.
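
A minimal sketch of the clean-up step, assuming each citation is available as a string before it is written to Excel. `strip_possessive` is a hypothetical name; the pattern covers both the straight and the typographic apostrophe, but not possessives written with a bare apostrophe (such as "Jones'").

```python
import re

# Hypothetical helper: removes an English possessive "'s" that directly follows a word.
_POSSESSIVE = re.compile(r"(?<=\w)['’]s\b")


def strip_possessive(citation: str) -> str:
    """Turn "Cohen's 1999" into "Cohen 1999"; leave other citations unchanged."""
    return _POSSESSIVE.sub("", citation)
```

Applying this before duplicates are removed would also take care of the case where both forms occur in the same text.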

Serial numbers get recognised as citations

Description

Certain phrases of the form "word, long number" get recognised as citations, e.g. Washington, DC, 10694(000).

Expected behavior

The text above is not written to the Excel file.

Current behavior

The text above produces the citation "DC 1069"

Suggested solutions

  • Filter citations starting with "DC" as excluded
  • Alter the part that catches years so that it must include 4 digits, no more, no less.
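
The second suggestion could look roughly like the following. This is a sketch under assumptions: `has_valid_year` is a hypothetical name, and the program's actual year pattern is not shown in this issue. The lookarounds reject any run of digits that is longer or shorter than four.

```python
import re

# Exactly four digits, with no digit immediately before or after.
_YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")


def has_valid_year(candidate: str) -> bool:
    """Return True only if the candidate contains a standalone 4-digit number."""
    return _YEAR.search(candidate) is not None
```

With this check, "Washington, DC, 10694(000)" is rejected because "10694" has five digits and "000" has three, while "Cohen (1999)" still passes.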

Lowercase words should not be recognised as first authors

Description

Some texts contain many single words followed by a string of digits, which are falsely detected as authors.
Requiring the first author to start with an uppercase letter would avoid this issue. The rule applies to works with one, two, or three authors.

Expected behavior:

The text includes the following sentence: "Review of psychology (1234-5678-1234)..."
The Excel file should not contain anything from this sentence.

Actual behavior:

Excel contains "psychology 1234"

Potential issues

Misspelled author names (for example, written without an initial capital) will not be recognised.
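
The capitalisation rule could be sketched as a simple filter over the detected citations. `keep_capitalised` is a hypothetical name, and the sketch assumes each citation is a string that begins with the first author's surname. Note that it would also drop surnames that legitimately start with a lowercase particle, such as "de Vries" or "van Dijk".

```python
def keep_capitalised(citations):
    """Keep only citations whose first character is an uppercase letter.

    str.isupper() is Unicode-aware, so names such as "Šimić" are kept.
    """
    return [c for c in citations if c[:1].isupper()]
```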

Recognise different grammatical cases of the same citation

Description

Croatian

Authors can be referred to in different grammatical cases in Croatian (padeži), for example Cohen, Cohenu, Cohenom. Possessive forms also alter the word, as in Cohenov. These all refer to the same source.

English

Cohen's (1999) paper is the same as the one referenced by Cohen (1999).

Suggested feature

Recognise different word endings in citations. If two citations differ only in the last letter, keep only one. Same for citations ending in 's.

Example

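Superov (1999) članak potaknuo je daljnju diskusiju. (...) Na kraju, nitko nije imao bolju ideju od Supera (1999). (In English: "Super's (1999) article sparked further discussion. (...) In the end, nobody had a better idea than Super (1999).")
With case recognition, the Excel file would contain a single citation, Super 1999.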

Potential issues

  • Which ones to keep? Have a hierarchy (nominativ, genitiv...) and keep the highest one.
  • Could this potentially confuse authors whose surnames actually differ in the last letter?
    • Make this an optional feature.

Grammatical cases implemented:

  • recognise Croatian cases
  • recognise Croatian possessives
  • recognise English possessives
    • probably the easiest
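
One way to approach the Croatian endings is to strip a known ending only when at least two different forms from the same year agree on the resulting stem. The sketch below is an assumption-heavy illustration: `SUFFIXES` is a short, incomplete list chosen for the examples in this issue, the function names are hypothetical, and each citation is assumed to be a string of the form "Author Year". As noted under potential issues, it can wrongly merge two real surnames that differ only by such an ending, so it would fit best behind an optional switch.

```python
from collections import defaultdict

# Assumed, incomplete list of Croatian case and possessive endings.
SUFFIXES = ("ov", "om", "a", "u")


def candidate_stems(author):
    """Return the author as written plus every form with one ending removed."""
    stems = {author}
    for suffix in SUFFIXES:
        # Require a stem of at least three letters to limit false merges.
        if author.endswith(suffix) and len(author) > len(suffix) + 2:
            stems.add(author[: -len(suffix)])
    return stems


def merge_case_variants(citations):
    """Collapse case variants of the same author and year into one citation."""
    parsed = [c.rsplit(" ", 1) for c in citations]  # [author, year]
    support = defaultdict(set)
    for author, year in parsed:
        for stem in candidate_stems(author):
            support[(stem, year)].add(author)

    result = []
    for author, year in parsed:
        # Stems that are backed by at least two different written forms.
        shared = [s for s in candidate_stems(author) if len(support[(s, year)]) > 1]
        best = min(shared, key=len) if shared else author
        merged = f"{best} {year}"
        if merged not in result:
            result.append(merged)
    return result
```

For the example in this issue, `merge_case_variants(["Superov 1999", "Supera 1999"])` yields a single "Super 1999", while a lone "Costa 2001" is left untouched because no other form supports the stem "Cost".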

Automatically separate citations from same authors in multiple years

Description

Right now the program detects multiple sources when they are listed directly after one another. However, they are recorded in a single cell, and may duplicate other citations that reference just one of the listed years.
A function that splits such citations into individual ones would improve readability and reduce duplicated sources. The sources should be listed individually, as they are in the reference list.

Expected behavior:

The text includes the following sentence: "AuthorA (2000; 2002; 2003) thoroughly explored..."
The Excel file should list "AuthorA 2000", "AuthorA 2002" and "AuthorA 2003" as separate citations.

Actual behavior:

Excel contains a citation "AuthorA 2000 2002 2003"
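
A minimal sketch of such a splitting function, assuming the combined citation has the form shown above (authors followed by space-separated years). `split_years` is a hypothetical name; an optional lowercase letter after the year, as in "2000a", is allowed for as an assumption.

```python
import re

# Authors, then one or more years separated by whitespace, to the end of the string.
_MULTI_YEAR = re.compile(
    r"^(?P<authors>.*?)\s+(?P<years>\d{4}[a-z]?(?:\s+\d{4}[a-z]?)*)$"
)


def split_years(citation: str):
    """Split 'AuthorA 2000 2002 2003' into one citation per year."""
    match = _MULTI_YEAR.match(citation)
    if match is None:
        return [citation]  # not in the expected form; leave it unchanged
    return [f"{match['authors']} {year}" for year in match["years"].split()]
```

The individual citations can then go through the usual duplicate removal, so "AuthorA 2002" is recorded only once even if it is also cited on its own elsewhere.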

Support all Unicode characters

Current status

The characters that are "allowed" are listed individually in the program.
As a result, characters such as æ, ø, å... are omitted.

Suggestion

Import a list of Unicode characters (that doesn't include numbers, special symbols...) and use that for maximum coverage.
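
If the program is written in Python, an imported list may not be necessary: for `str` patterns the `re` module treats `\w` as Unicode-aware, so the class `[^\W\d_]` matches roughly any letter while excluding digits and the underscore. The sketch below assumes that setting; `find_simple_citations` is a hypothetical name and the pattern only covers the simplest "Author (Year)" form. Text is first normalised to NFC so that a letter stored as a base character plus a combining mark (for example "c" followed by a combining cedilla) is matched as a single letter.

```python
import re
import unicodedata

# "Word character that is not a digit or underscore", i.e. roughly any letter.
LETTER = r"[^\W\d_]"
# One (optionally hyphenated) name, then a 4-digit year, optionally in parentheses.
AUTHOR_YEAR = re.compile(rf"({LETTER}+(?:-{LETTER}+)*)\s*\(?(\d{{4}})(?!\d)")


def find_simple_citations(text: str):
    """Return 'Author Year' strings for single-author citations in the text."""
    normalised = unicodedata.normalize("NFC", text)
    return [f"{author} {year}" for author, year in AUTHOR_YEAR.findall(normalised)]
```

This handles names such as "Mendonça" and "Sørensen" without listing ç, ø, or any other character explicitly.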
