altmetrics's People

Contributors

fradeve, ja573, rowan08, simoncrowe, stuartjennings-up, terrasea2, yoannspace

altmetrics's Issues

Hypothes.is API not fetching annotations by DOI

The hypothesis plugin is failing to retrieve annotations when querying the Hypothes.is API, using the DOI of a book. A similar problem was encountered when trying to retrieve Hypothes.is annotations for books from the Crossref Event Data API.

Following the Hypothes.is guidelines (https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/), we have embedded the DOI of our books in the EPUB reader page using the Dublin Core meta tag. However, annotations from these pages do not show up when querying the Hypothes.is API by DOI, even though they can be fetched using the URL of the page. For example, annotations returned by the wildcard query https://hypothes.is/api/search?wildcard_uri=https://www.ubiquitypress.com/site/books/10.5334/baj/read/?loc=* are not found when querying purely by DOI: https://hypothes.is/api/search?uri=doi:10.5334/baj

We have contacted the Hypothes.is support team, and they are looking into this issue.

Until it is resolved, we will only be able to fetch book annotations by URL, rather than DOI.

CrossRef Event Data API plugin

  • As the service administrator, I want to have a plugin to gather data from the CrossRef Event Data API. If enabled, the plugin will be used to fetch Twitter, Hypothes.is and Wikipedia mentions by DOI.

  • When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.

Facebook plugin

  • As the service administrator, I want to have a plugin to gather data from Facebook, searching for URLs and DOIs.

  • When gathering results for a specific DOI or URL, the plugin will be automatically available in the list of enabled plugins in the service.

WordPress.com support

Add support for measuring shares of DOIs on WordPress.com using Crossref Event Data.

Bug Fix: Wikipedia client crashes when fetching some pages

When the wikipedia Python package tries to fetch a page, there is logic that prioritizes suggestions over actual results, which causes the client to crash if the suggestion is wrong. So we just need to update the client calls to work around this.
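A minimal sketch of one possible workaround, assuming the crash comes from the package's suggestion handling: the `wikipedia.page` function accepts an `auto_suggest` argument, and passing `False` asks it to look up the title exactly as given. The `fetch_page` helper and its `page_func` parameter are hypothetical names used only for illustration, not part of the existing client.

```python
def fetch_page(title, page_func=None):
    """Fetch a Wikipedia page by exact title, without suggestion rewriting."""
    if page_func is None:
        # Imported lazily so the helper can be exercised with a stub
        # when the third-party package is not installed.
        import wikipedia
        page_func = wikipedia.page
    # auto_suggest=False looks up the title as given instead of
    # replacing it with a search suggestion.
    return page_func(title, auto_suggest=False)
```

Injecting `page_func` also makes it easier to swap in a replacement library later, should one be found.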

On a related note, the wikipedia Python package we use (https://pypi.org/project/wikipedia/) is fairly outdated - it hasn't been updated since 2014. I have not yet been able to find a suitable replacement, but it may be worth looking into.

Twitter URL plugin

  • As the service administrator, I want to have a plugin to gather data from Twitter, searching for URLs related to a DOI.

  • When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.

Login with credentials - frontend

  • As a partner, I want to be able to log in with the provided username and password, which have previously been created by the administrator of the service.

  • Before the login, an introduction page is shown explaining the rationale of the project.

  • After the login, I am redirected to the list of DOIs I have uploaded.

Manage deleted events from Crossref Event Data

Crossref Event Data events do occasionally get marked as deleted (e.g. if a tweet is deleted from Twitter). When this happens, we will need to mark these events as deleted.

  • Periodically query the Crossref Event Data API for deleted events.
  • Mark any matching events on the Altmetrics system as deleted.
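
The second step could look roughly like the sketch below. It assumes the IDs of deleted events have already been fetched from the Crossref Event Data API, and it uses plain dictionaries with hypothetical `external_id` and `is_deleted` keys as a stand-in for the service's event model.

```python
def mark_deleted(local_events, deleted_ids):
    """Flag local events whose external ID appears in deleted_ids.

    local_events: list of dicts with 'external_id' and 'is_deleted' keys
    (a hypothetical stand-in for the service's event model).
    Returns the number of events newly marked as deleted.
    """
    deleted_ids = set(deleted_ids)
    changed = 0
    for event in local_events:
        if event["external_id"] in deleted_ids and not event["is_deleted"]:
            event["is_deleted"] = True
            changed += 1
    return changed
```

Events are flagged rather than removed, so the record of the original mention is kept.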

URL and DOI upload

  • As a user, I want to be able to upload a CSV containing DOIs and URLs, after logging in. The CSV will have URLs in the first column, and the related DOIs in the second column.

  • Once uploaded, the CSV will be registered on the S3 account of the service, and the DOIs and URLs saved to the database (by a Celery task).

  • At the database level, the URLs will have a foreign key referencing the related DOI.

  • If the loaded CSV contains URLs or DOIs already registered in the database, they will be skipped.

  • If the CSV contains DOIs already registered in the database, but with different URLs, the new URLs will be added to the database, referencing the related (existing) DOI.

  • Right after the upload, as a logged in user, I will be redirected to the page listing the uploaded CSVs.

  • Before uploading the CSV to S3, the file must be uniquely named.
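
The skip and merge rules above can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the CSV is assumed to have no header row, the database is represented by a dictionary mapping each known DOI to its URLs, and the names `unique_csv_name` and `plan_import` are hypothetical.

```python
import csv
import io
import uuid


def unique_csv_name(original_name):
    """Prefix the uploaded file name with a random UUID so it cannot collide."""
    return f"{uuid.uuid4().hex}_{original_name}"


def plan_import(csv_text, existing):
    """Work out which DOIs and URLs from an uploaded CSV are new.

    csv_text: CSV content with the URL in column 1 and the DOI in column 2.
    existing: dict mapping each known DOI to the set of URLs already stored.
    Returns (new_dois, new_urls), where new_urls is a list of (url, doi) pairs.
    """
    new_dois, new_urls = [], []
    seen = {doi: set(urls) for doi, urls in existing.items()}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        url, doi = row[0].strip(), row[1].strip()
        if doi not in seen:
            seen[doi] = set()
            new_dois.append(doi)
        if url not in seen[doi]:
            # New URL for a new or existing DOI; it references that DOI.
            seen[doi].add(url)
            new_urls.append((url, doi))
    return new_dois, new_urls
```

Rows whose DOI and URL are both already known produce no output, which matches the rule that existing entries are skipped.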

REST API

  • As a partner, I want to be able to access the data in the service using an API.

  • The endpoints must be compliant with the project's tech specifications (HIRMEOS WP6).

Add approval of user accounts

After registering for an account, users will need to be approved before they can use this service.

This should include:

  • A more detailed registration form to identify the user/user affiliation
  • Letting the user know that they will need to wait for their account to be approved
  • Emailing the project admin to let them know that a new user has registered
  • The ability to activate/deactivate user accounts on the Admin interface
  • Emailing the user when their account has been approved

Twitter client does not pick up URLs that are tweeted

Received a message querying a book tweet that was not saved by the Altmetrics service:
• "We found a tweet about one of our books but the API doesn’t show it: https://twitter.com/OpenEditionNews/status/1140910422431797250?s=20"

After some investigation, it turned out that this tweet was not picked up because it references the book in question by its URL (books.openedition.org/oep/8999), yet we only search for books based on DOI. After adding the URL to the Twitter search, the client was still unable to find the tweet. The following combinations were tested:

DOI and all URLs:
keywords = ['"https://books.openedition.org/oep/8999"', '"https://books.openedition.org/oep/pdf/8999"', '"https://books.openedition.org/oep/epub/8999"', '"10.4000/books.oep.8999"']

Only the relevant URL:
keywords = ['"https://books.openedition.org/oep/8999"']

Only the relevant URL without 'https://':
keywords = ['"books.openedition.org/oep/8999"']

None of these searches with the Twitter client returned anything.
Ideally, we should be able to search for tweets about a book that is mentioned by its URL, not just its DOI.

Wikipedia URL plugin

  • As the service administrator, I want to have a plugin to gather data from Wikipedia, searching for URLs related to a DOI.

  • When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.

Docker container

  • As a service administrator, I want to be able to run the service as a Docker container.

  • A Dockerfile must be added to the repository.

Check RawEvents for potential integrity errors before creating new entries

Background:
Because the Crossref Event Data API filters events by date, new events will show up twice if they happen before 12:00 (when metrics are pulled). E.g. if an event occurs at 09:00 on the 16th of Feb, and the last scrape was on the 15th of Feb, then this new event will be recorded. The next day, when the scrape happens again, looking for events that happened since the 16th of Feb, the same event will be returned again, which causes an integrity error when trying to save it in the database.

Solution:
The Crossref Event Data plugin needs to check for existing RawEvent entries before trying to create new ones, since entries are created with a bulk-create and filtering by 'from-collected-date' is not sufficient to avoid duplicates. This check will also allow for fresh re-scraping in future, if we ever want to do this, without causing problems.
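
A minimal sketch of such a check, assuming each fetched event is a dictionary carrying its identifier under an `id` key and that the identifiers of stored RawEvent entries can be loaded into a set. The function name `filter_new_events` is hypothetical.

```python
def filter_new_events(fetched_events, existing_ids):
    """Drop events that are already stored, or repeated within the batch."""
    seen = set(existing_ids)
    new_events = []
    for event in fetched_events:
        event_id = event["id"]
        if event_id in seen:
            continue  # already saved, or a duplicate in this batch
        seen.add(event_id)
        new_events.append(event)
    return new_events
```

Only the filtered list would then be passed to the bulk-create, so an event returned by two consecutive scrapes is saved once.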

Hypothes.is plugin: Query by DOI and URL

The Hypothes.is plugin currently tries to fetch annotations by DOI only. The linking of DOIs to Hypothes.is annotations does not yet seem to work reliably.

Need to:

  • Query all registered URLs for annotations
  • Include a wildcard search for annotations on epub readers
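
The queries involved could be assembled along these lines. This sketch only builds the search URLs, using the `uri` and `wildcard_uri` parameters that appear in the example queries of the related Hypothes.is issue. The function name and the `reader_urls` argument (URL prefixes of EPUB reader pages) are assumptions made for illustration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://hypothes.is/api/search"


def build_search_urls(doi, urls, reader_urls=()):
    """Return Hypothes.is search URLs for a DOI, its URLs, and EPUB reader pages."""
    queries = [{"uri": f"doi:{doi}"}]
    queries += [{"uri": url} for url in urls]
    # A trailing '*' matches any reader location under the given prefix.
    queries += [{"wildcard_uri": f"{prefix}*"} for prefix in reader_urls]
    # urlencode percent-encodes each value so it survives as one parameter.
    return [f"{SEARCH_ENDPOINT}?{urlencode(query)}" for query in queries]
```

For the book mentioned in the related issue, the reader prefix would be `https://www.ubiquitypress.com/site/books/10.5334/baj/read/?loc=`.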

Wikipedia Plugin: Make sure DOI is still referenced in the up-to-date version of the page

Some books that are referenced on a Wikipedia page are removed as references in later versions of the page. When this happens, the reference is no longer applicable for the book, so the following should be done:

  • Periodically go through Wikipedia events. For each, check the reference section of the Wikipedia page to make sure that the DOI of that event is still referenced on the page.
  • If the reference is no longer valid, mark it as a deleted event.
  • Periodically check deleted Wikipedia events to see if the book reference has been restored.
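
The per-event check could be sketched as below. It assumes the page's reference URLs have already been fetched as a list of strings, and that an event is a dictionary with hypothetical `doi` and `is_deleted` keys. Because the same function sets the flag in both directions, it covers restored references as well as removed ones.

```python
def doi_still_referenced(doi, reference_urls):
    """Return True if any reference URL on the page contains the DOI."""
    needle = doi.lower()  # DOIs are case-insensitive
    return any(needle in ref.lower() for ref in reference_urls)


def review_wikipedia_event(event, reference_urls):
    """Set event['is_deleted'] from the current state of the page references."""
    event["is_deleted"] = not doi_still_referenced(event["doi"], reference_urls)
    return event
```

This simple substring match would miss a DOI that appears percent-encoded in a reference URL, so a real check may need to decode the URLs first.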

Generic plugin

  • As a service administrator / developer, I want a generic plugin to be available in the system

  • The plugin is a class, and has all the methods available to the real plugins, but each method raises NotImplementedError when called; it serves the purpose of showing how a plugin should be implemented, and details the usage of the methods and class with the relevant docstrings.
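
A rough sketch of what such a template class could look like. The method names `is_enabled_for` and `fetch_events` are hypothetical, chosen only to illustrate the pattern; the actual plugin interface is not specified here.

```python
class GenericPlugin:
    """Template showing the interface a data-source plugin could provide.

    The method names below are hypothetical; real plugins would override them.
    """

    name = "generic"

    def is_enabled_for(self, doi):
        """Return True if this plugin should run for the given DOI."""
        raise NotImplementedError("is_enabled_for must be implemented by a plugin")

    def fetch_events(self, doi, urls):
        """Return the events found for a DOI and its related URLs."""
        raise NotImplementedError("fetch_events must be implemented by a plugin")
```

A real plugin would subclass `GenericPlugin` and override each method, so a forgotten method fails loudly instead of silently returning nothing.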

Login with credentials - API

  • As a partner, I have the ability to access the API using the API token available in my personal settings, once I am logged in.

  • Using the API without authentication will return a 401 error.
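
The expected behaviour can be illustrated with a small framework-independent check. The `Authorization: Token <value>` header format is an assumption (a common convention, not something this story specifies), and `check_token` is a hypothetical helper.

```python
def check_token(headers, valid_tokens):
    """Return an HTTP status code: 200 for a known token, 401 otherwise."""
    value = headers.get("Authorization", "")
    # Split "Token abc123" into the scheme and the token itself.
    scheme, _, token = value.partition(" ")
    if scheme == "Token" and token in valid_tokens:
        return 200
    return 401
```

A request with no header, an unknown token, or a different scheme all fall through to the 401 branch.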
