hirmeos / altmetrics


5.0 5.0 0.0 399 KB

Implementation of HIRMEOS WP6

License: MIT License

Languages: Python 87.67%, HTML 9.74%, Dockerfile 0.37%, Mako 0.24%, CSS 0.25%, JavaScript 0.83%, Shell 0.90%

altmetrics's Issues

Hypothes.is API not fetching annotations by DOI

The hypothesis plugin is failing to retrieve annotations when querying the Hypothes.is API, using the DOI of a book. A similar problem was encountered when trying to retrieve Hypothes.is annotations for books from the Crossref Event Data API.

Following the Hypothes.is guidelines (https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/), we have embedded the DOI of our books in the EPUB reader page using the Dublin Core meta tag. However, annotations on these pages do not show up when querying the Hypothes.is API by DOI, even though they can be fetched using the URL of the page. For example, annotations returned by the wildcard query https://hypothes.is/api/search?wildcard_uri=https://www.ubiquitypress.com/site/books/10.5334/baj/read/?loc=* are not found when querying purely by DOI: https://hypothes.is/api/search?uri=doi:10.5334/baj

We have contacted the Hypothes.is support team, and they are looking into this issue.

Until it is resolved, we will only be able to fetch book annotations by URL, rather than by DOI.

CrossRef Event Data API plugin

As the service administrator, I want to have a plugin to gather data from the CrossRef Event Data API. If enabled, the plugin will be used to fetch Twitter, Hypothes.is and Wikipedia mentions by DOI.
When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.
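A minimal sketch of how such a plugin might build its per-source queries, assuming the Event Data query endpoint `https://api.eventdata.crossref.org/v1/events` with the `obj-id`, `source` and `mailto` parameters. The endpoint, parameter names and the function `build_event_data_urls` are assumptions for illustration, not the service's actual code.

```python
from urllib.parse import urlencode

# Assumed base URL of the Crossref Event Data query API.
EVENT_DATA_URL = "https://api.eventdata.crossref.org/v1/events"

# Sources this plugin is meant to cover (Twitter, Hypothes.is, Wikipedia).
SOURCES = ("twitter", "hypothesis", "wikipedia")


def build_event_data_urls(doi, mailto, sources=SOURCES):
    """Return one query URL per source for the given DOI (hypothetical helper)."""
    urls = []
    for source in sources:
        params = {
            "obj-id": doi,     # the DOI whose mentions we want
            "source": source,  # restrict results to a single source
            "mailto": mailto,  # contact address (assumed to be requested by the API)
        }
        # urlencode percent-encodes the "/" in the DOI and the "@" in the address.
        urls.append(f"{EVENT_DATA_URL}?{urlencode(params)}")
    return urls
```

One query per source keeps each source's results separate, which makes it easy to enable or disable Twitter, Hypothes.is or Wikipedia individually.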

Facebook plugin

As the service administrator, I want to have a plugin to gather data from Facebook, searching for URLs and DOIs.
When gathering results for a specific DOI or URL, the plugin will be automatically available in the list of enabled plugins in the service.

Wordpress.com support

Add support for measuring shares of DOIs in Wordpress.com using Crossref Event Data.

Bug Fix: Wikipedia client crashes when fetching some pages

When the wikipedia Python package fetches a page, it prioritizes search suggestions over the exact title requested, which causes the client to crash if the suggestion is wrong. The client calls need to be updated to work around this.
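One possible workaround, sketched under assumptions: call `wikipedia.page` with `auto_suggest=False` so the package uses the exact title instead of its own suggestion, and log and skip pages that still fail rather than crashing the whole run. The wrapper `fetch_page` and its injectable `page_func` argument are hypothetical, added so the logic can be exercised without network access.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_page(title, page_func=None):
    """Fetch a Wikipedia page by its exact title, or return None on failure.

    page_func defaults to wikipedia.page from the `wikipedia` package and can
    be replaced with a stub in tests (hypothetical design choice).
    """
    if page_func is None:
        import wikipedia  # imported lazily so tests do not need the package
        page_func = wikipedia.page
    try:
        # auto_suggest=False: use the title as given, not the package's
        # search suggestion, which may point at an unrelated page.
        return page_func(title, auto_suggest=False)
    except Exception as exc:  # e.g. PageError or DisambiguationError
        logger.warning("Could not fetch Wikipedia page %r: %s", title, exc)
        return None
```

Catching the error per page means one bad title no longer stops the remaining pages from being processed.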

On a related note, the wikipedia Python package we use (https://pypi.org/project/wikipedia/) is fairly outdated - it hasn't been updated since 2014. I have not yet been able to find a suitable replacement, but it may be worth looking into.

Twitter URL plugin

As the service administrator, I want to have a plugin to gather data from Twitter, searching for URLs related to a DOI.
When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.

Login with credentials - frontend

As a partner, I want to be able to log in with the provided username and password, which has previously been created by the administrator of the service.
Before the login, an introduction page is shown explaining the rationale of the project.
After the login, I am redirected to the list of DOIs I have uploaded.

Manage deleted events from Crossref Event Data

Crossref Event Data events occasionally get marked as deleted (e.g. if a tweet is deleted from Twitter). When this happens, we need to mark the corresponding events in the Altmetrics system as deleted too.

Periodically query the Crossref Event Data API for deleted events.
Mark any matching events in the Altmetrics system as deleted.
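The two steps above can be sketched as follows. This assumes a deleted-events endpoint at `https://api.eventdata.crossref.org/v1/events/deleted` taking `from-updated-date`, and a response shaped like `{"message": {"events": [{"id": ...}]}}`; both should be checked against the current Event Data documentation. `RawEvent` here is a hypothetical in-memory stand-in for the stored model.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Assumed endpoint for deleted events; verify against the API documentation.
DELETED_EVENTS_URL = "https://api.eventdata.crossref.org/v1/events/deleted"


@dataclass
class RawEvent:
    """Hypothetical stand-in for a stored event."""
    event_id: str
    is_deleted: bool = False


def deleted_events_query(since, mailto):
    """URL for events deleted since the given date (YYYY-MM-DD)."""
    params = {"from-updated-date": since, "mailto": mailto}
    return f"{DELETED_EVENTS_URL}?{urlencode(params)}"


def mark_deleted(stored_events, response):
    """Flag stored events whose IDs appear in a deleted-events response.

    Returns the number of events newly marked as deleted.
    """
    deleted_ids = {event["id"] for event in response["message"]["events"]}
    marked = 0
    for event in stored_events:
        if event.event_id in deleted_ids and not event.is_deleted:
            event.is_deleted = True
            marked += 1
    return marked
```

IDs returned by the API that are not stored locally are simply ignored, so the periodic query can safely cover more events than the service tracks.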

URL and DOI upload

As a user, I want to be able to upload a CSV containing DOIs and URLs, after logging in. The CSV will have URLs in the first column, and the related DOIs in the second column.
Once uploaded, the CSV will be stored in the service's S3 account, and the DOIs and URLs saved to the database (by a Celery task).
At the database level, the URLs will have a foreign key referencing the related DOI.
If the loaded CSV contains URLs or DOIs already registered in the database, they will be skipped.
If the CSV contains DOIs already registered in the database, but with different URLs, the new URLs will be added to the database, referencing the related (existing) DOI.
Right after the upload, as a logged in user, I will be redirected to the page listing the uploaded CSVs.
Before uploading the CSV to S3, the file must be uniquely named.
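A minimal sketch of the parsing and naming rules above, assuming rows of the form `url,doi` with no header and an in-memory mapping standing in for the database. The functions `unique_filename` and `new_urls_by_doi` are hypothetical; the real Celery task and S3 upload are not shown.

```python
import csv
import io
import uuid


def unique_filename(original_name):
    """Prefix the uploaded file name with a random UUID so S3 keys do not collide."""
    return f"{uuid.uuid4().hex}_{original_name}"


def new_urls_by_doi(csv_text, existing):
    """Return {doi: [new urls]} for rows whose URL is not yet registered.

    `existing` maps each registered DOI to the set of its registered URLs
    (a stand-in for the database). A DOI missing from `existing` must be
    created; otherwise its new URLs reference the existing DOI.
    """
    to_add = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        url, doi = row[0].strip(), row[1].strip()
        if url in existing.get(doi, set()) or url in to_add.get(doi, []):
            continue  # URL already registered, or repeated in this file
        to_add.setdefault(doi, []).append(url)
    return to_add
```

Returning only the new rows keeps the database write a simple insert, which fits well inside a background task.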

REST API

As a partner, I want to be able to access the data in the service using an API.
The endpoints must be compliant with the project's tech specifications (HIRMEOS WP6).

Add approval of user accounts

After registering for an account, users will need to be approved before they can use this service.

This should include:

A more detailed registration form to identify the user/user affiliation
Letting the user know that they will need to wait for their account to be approved
Emailing the project admin to let them know that a new user has registered
The ability to activate/deactivate user accounts on the Admin interface
Emailing the user when their account has been approved

Twitter client does not pick up URLs that are tweeted

We received a message asking about a tweet mentioning a book that was not saved by the Altmetrics service:
• "We found a tweet about one of our books but the API doesn’t show it: https://twitter.com/OpenEditionNews/status/1140910422431797250?s=20"

After some investigation, we found that this tweet was not picked up because it references the book by its URL (books.openedition.org/oep/8999), whereas we only search for books by DOI. Even after adding the URL to the Twitter search, the client was still unable to find the tweet. The following combinations were tested:

DOI and all URLs:
keywords = ['"https://books.openedition.org/oep/8999"', '"https://books.openedition.org/oep/pdf/8999"', '"https://books.openedition.org/oep/epub/8999"', '"10.4000/books.oep.8999"']

Only the relevant URL:
keywords = ['"https://books.openedition.org/oep/8999"']

Only the relevant URL without 'https://':
keywords = ['"books.openedition.org/oep/8999"']

None of these searches with the Twitter client returned anything.
Ideally, we should be able to search for tweets about a book that is mentioned by its URL, not just its DOI.
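For reference, the keyword combinations tested above can be generated from a DOI and its URLs. This is a hypothetical helper that only reproduces the tested variants (quoted DOI, quoted URL with and without the scheme) and joins them with OR; it does not by itself fix the missing results.

```python
def tweet_keywords(doi, urls):
    """Quoted search phrases for a DOI and its URLs, with and without the scheme."""
    keywords = [f'"{doi}"']
    for url in urls:
        bare = url.split("://", 1)[-1]  # drop "https://" or "http://"
        for phrase in (url, bare):
            quoted = f'"{phrase}"'
            if quoted not in keywords:
                keywords.append(quoted)
    return keywords


def tweet_query(keywords):
    """Join the phrases into a single OR query string."""
    return " OR ".join(keywords)
```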

Wikipedia URL plugin

As the service administrator, I want to have a plugin to gather data from Wikipedia, searching for URLs related to a DOI.
When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.

Docker container

As a service administrator, I want to be able to run the service as a Docker container.
A Dockerfile must be added to the repository.

Check RawEvents for potential integrity errors before creating new entries

Background:
Because the Crossref Event Data API filters events by date, new events will show up twice if they happen before 12:00 (when metrics are pulled). E.g. if an event occurs at 09:00 on the 16th of Feb, and the last scrape was on the 15th of Feb, then this new event will be recorded. The next day, when the scrape runs again and looks for events that happened since the 16th of Feb, the same event is returned again, which causes an integrity error when trying to save it in the database.

Solution:
The Crossref Event Data plugin needs to check for existing RawEvent entries before trying to create them, since entries are saved with a bulk create and filtering with 'from-collected-date' alone is not sufficient to avoid duplicates. This will also allow for fresh re-scraping in future without causing problems, if we ever want to do this.
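A minimal sketch of that check, assuming each fetched event is a dict with an `"id"` key and that the IDs already stored as RawEvents have been loaded into a set beforehand. The function name `events_to_create` is hypothetical.

```python
def events_to_create(fetched, existing_ids):
    """Drop events that are already stored (or repeated in this batch).

    `fetched` is a list of event dicts from the Event Data API, each with an
    "id"; `existing_ids` is the set of IDs already saved as RawEvents.
    """
    seen = set(existing_ids)
    new_events = []
    for event in fetched:
        if event["id"] in seen:
            continue  # would otherwise raise an integrity error on insert
        seen.add(event["id"])
        new_events.append(event)
    return new_events
```

In practice `existing_ids` would come from a single database query for the IDs in the fetched batch, so the bulk create only ever receives new rows.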

Altmetrics documentation

Documentation for the usage of the Altmetrics API endpoints:

must be available at https://docs.metrics.ubiquity.press
must include Postman config JSON
must include docs on how to use JWT authentication

Hypothes.is plugin: Query by DOI and URL

The Hypothes.is plugin currently tries to fetch annotations by DOI only. The linking of DOIs to Hypothes.is annotations does not yet seem to work reliably.

Need to:

Query all registered URLs for annotations
Include a wildcard search for annotations on epub readers
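A sketch of the queries this implies, using the `uri` and `wildcard_uri` parameters of the Hypothes.is search endpoint shown in the examples earlier on this page. The helper `annotation_queries` and the idea of passing EPUB reader URL prefixes are assumptions for illustration.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://hypothes.is/api/search"


def annotation_queries(doi, urls, reader_prefixes=()):
    """Search URLs for a book's DOI, each registered URL and each EPUB reader prefix."""
    doi_uri = f"doi:{doi}"
    queries = [f"{SEARCH_URL}?{urlencode({'uri': doi_uri})}"]
    for url in urls:
        queries.append(f"{SEARCH_URL}?{urlencode({'uri': url})}")
    for prefix in reader_prefixes:
        # The trailing * matches any reader location, e.g. ...?loc=<position>
        queries.append(f"{SEARCH_URL}?{urlencode({'wildcard_uri': prefix + '*'})}")
    return queries
```

Keeping the DOI query alongside the URL queries means annotations will be picked up automatically once DOI linking starts working.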

Restructure the database to track event deletions within RawEvents

For practicality, we have decided to remove the DeletedEvent table, and simply mark an event as deleted or not. Any additional information about the event deletion should be stored in the RawEvent table for a given event.

Wikipedia Plugin: Make sure the DOI is still referenced in the up-to-date version of the page

Some books that are referenced on a Wikipedia page are removed as references in later versions of the page. When this happens, the reference is no longer applicable for the book, so the following should be done:

Periodically go through Wikipedia events. For each, check the reference section of the Wikipedia page to make sure that the DOI of that event is still referenced on the page.
If the reference is no longer valid, mark it as a deleted event.
Periodically check deleted Wikipedia events to see if the book reference has been restored.
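The periodic checks above can be sketched as one pass over the Wikipedia events. This assumes events are dicts with `"page"`, `"doi"` and `"deleted"` keys and that `fetch_references(page)` returns the current text of the page's reference section; both are hypothetical stand-ins. The DOI match is case-insensitive and rejects a longer DOI that merely starts with the same characters.

```python
import re


def doi_still_referenced(references_text, doi):
    """True if `doi` appears in the reference text (case-insensitive)."""
    # The lookahead stops "10.5334/baj" from matching inside "10.5334/baj2".
    pattern = re.escape(doi.strip()) + r"(?![0-9a-z])"
    return re.search(pattern, references_text, flags=re.IGNORECASE) is not None


def review_wikipedia_events(events, fetch_references):
    """Mark events deleted when their DOI has gone, and restore them if it returns."""
    for event in events:
        present = doi_still_referenced(fetch_references(event["page"]), event["doi"])
        event["deleted"] = not present
    return events
```

Handling both directions in one pass covers removed references as well as references that are later restored.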

Generic plugin

As a service administrator / developer, I want a generic plugin to be available in the system
The plugin is a class, and has all the methods available to the real plugins, but each method raises a NotImplementedError when called; it serves the purpose of showing how a plugin should be implemented, and documents the usage of the class and its methods with the relevant docstrings.
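A minimal sketch of such a template class. The attribute `name` and the methods `fetch` and `process` are hypothetical, chosen only to illustrate the pattern; the real plugin interface may differ.

```python
class GenericPlugin:
    """Template for metric plugins (hypothetical method names).

    Real plugins subclass this and override every method; each one here
    raises NotImplementedError so a missing override fails loudly.
    """

    #: Short identifier shown in the list of enabled plugins.
    name = "generic"

    def fetch(self, doi, urls):
        """Query the external source for mentions of a DOI and its URLs.

        Returns the raw results from the source.
        """
        raise NotImplementedError("fetch() must be implemented by the plugin")

    def process(self, raw_results):
        """Convert raw results into events to be stored."""
        raise NotImplementedError("process() must be implemented by the plugin")
```

Raising `NotImplementedError` (rather than returning the `NotImplemented` constant) makes an incomplete plugin fail as soon as the missing method is called.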

Login with credentials - API

As a partner, I have the ability to access the API using the API token available in my personal settings, once I am logged in.
Using the API without authentication will return a 401 error.
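A sketch of an authenticated client request using only the standard library. The base URL `https://metrics.example.org/api`, the `/events` path and the `Bearer` scheme in the `Authorization` header are all assumptions; the service's documentation defines the actual scheme for its JWT tokens.

```python
import urllib.request

API_ROOT = "https://metrics.example.org/api"  # hypothetical base URL


def authorised_request(path, token):
    """Build a GET request carrying the partner's API token.

    Assumes a bearer-style Authorization header; requests without it are
    expected to be rejected with a 401 error.
    """
    req = urllib.request.Request(f"{API_ROOT}{path}")
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

Passing the result to `urllib.request.urlopen` would send the request; a missing or invalid token should then surface as an `HTTPError` with code 401.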

Kubernetes configuration

Publish Helm chart for the service

Sync data with Metrics API

Add a Nameko class to send data to the Metrics API (https://github.com/hirmeos/metrics-api)
