hirmeos / altmetrics
Implementation of HIRMEOS WP6
License: MIT License
The hypothesis plugin is failing to retrieve annotations when querying the Hypothes.is API, using the DOI of a book. A similar problem was encountered when trying to retrieve Hypothes.is annotations for books from the Crossref Event Data API.
Following the Hypothes.is guidelines (https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/), we have embedded the DOI of our books in the EPUB reader page, using the Dublin Core meta tag. However, annotations from these pages do not show up when querying the Hypothes.is API by DOI, even though they can be fetched using the URL of the page. For example, annotations found with the wildcard query https://hypothes.is/api/search?wildcard_uri=https://www.ubiquitypress.com/site/books/10.5334/baj/read/?loc=* are not found when querying purely by DOI: https://hypothes.is/api/search?uri=doi:10.5334/baj
We have contacted the Hypothes.is support team, and they are looking into this issue.
Until it is resolved, we will only be able to fetch book annotations by URL, rather than DOI.
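The two query styles described above can be sketched as follows. This is only an illustration of how the request URLs are built: the endpoint and the `uri` / `wildcard_uri` parameters come from the URLs quoted in this issue, while the function names are hypothetical and not part of the plugin.

```python
from urllib.parse import urlencode

# Search endpoint taken from the example URLs quoted in this issue.
HYPOTHESIS_SEARCH = "https://hypothes.is/api/search"


def search_url_by_doi(doi):
    """Build the DOI-based query (the one that currently finds no book annotations)."""
    return f"{HYPOTHESIS_SEARCH}?{urlencode({'uri': f'doi:{doi}'})}"


def search_url_by_reader_page(reader_url):
    """Build the wildcard query matching every location under the reader URL."""
    return f"{HYPOTHESIS_SEARCH}?{urlencode({'wildcard_uri': reader_url + '*'})}"


if __name__ == "__main__":
    print(search_url_by_doi("10.5334/baj"))
    print(search_url_by_reader_page(
        "https://www.ubiquitypress.com/site/books/10.5334/baj/read/?loc="
    ))
```

Unlike the hand-written URLs above, `urlencode` percent-encodes the parameter values. The server decodes them back to the same strings.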
As the service administrator, I want to have a plugin to gather data from the CrossRef Event Data API. If enabled, the plugin will be used to fetch Twitter, Hypothes.is and Wikipedia mentions by DOI.
When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.
As the service administrator, I want to have a plugin to gather data from Facebook, searching for URLs and DOIs.
When gathering results for a specific DOI or URL, the plugin will be automatically available in the list of enabled plugins in the service.
Add support for measuring shares of DOIs on WordPress.com using Crossref Event Data.
When the `wikipedia` Python package tries to fetch a page, there is logic that prioritizes suggestions over actual results, which causes the client to crash if the suggestion is wrong. So we just need to update the client calls to work around this.
On a related note, the `wikipedia` Python package we use (https://pypi.org/project/wikipedia/) is fairly outdated: it hasn't been updated since 2014. I have not yet been able to find a suitable replacement, but it may be worth looking into.
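One possible workaround, sketched here under assumptions, is to call `wikipedia.page` with `auto_suggest=False` so the exact title is looked up instead of the suggestion. The `fetch_page` wrapper and its `page_func` parameter are hypothetical, added only so the call can be exercised without network access. They are not the service's actual client code.

```python
def fetch_page(title, page_func=None):
    """Fetch a Wikipedia page by exact title, bypassing auto-suggestion.

    `page_func` is a hypothetical injection point for testing; when omitted,
    `wikipedia.page` from the PyPI package is used.
    """
    if page_func is None:
        import wikipedia  # third-party dependency: pip install wikipedia
        page_func = wikipedia.page
    try:
        # auto_suggest=False asks for the title as given rather than
        # whatever the search suggestion happens to be.
        return page_func(title, auto_suggest=False)
    except Exception:
        # The package raises its own exceptions (e.g. for missing or
        # ambiguous pages); narrowing this handler to those would be better
        # in real code. Here any failure simply means "no page found".
        return None
```

Returning `None` instead of letting the exception propagate keeps one bad title from aborting a whole scrape. Whether that is the right trade-off depends on how the caller reports failures.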
As the service administrator, I want to have a plugin to gather data from Twitter, searching for URLs related to a DOI.
When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.
As a partner, I want to be able to log in with the provided username and password, which has previously been created by the administrator of the service.
Before the login, an introduction page is shown explaining the rationale of the project.
After the login, I am redirected to the list of DOIs I have uploaded.
Crossref Event Data events do occasionally get marked as deleted (e.g. if a tweet is deleted from Twitter). When this happens, we will need to mark these events as deleted.
As a user, I want to be able to upload a CSV containing DOIs and URLs, after logging in. The CSV will have URLs in the first column, and the related DOIs in the second column.
Once uploaded, the CSV will be registered on the S3 account of the service, and the DOIs and URLs saved to the database (by a Celery task).
At the database level, the URLs will have a foreign key referencing the related DOI.
If the loaded CSV contains URLs or DOIs already registered in the database, they will be skipped.
If the CSV contains DOIs already registered in the database, but with different URLs, the new URLs will be added to the database, referencing the related (existing) DOI.
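The skip-or-add rules above can be illustrated with a small in-memory sketch. It assumes the CSV layout described in this issue (URL in the first column, DOI in the second). A plain dictionary stands in for the database, and the `merge_csv` name is hypothetical. The real service would do this inside a Celery task against its own models.

```python
import csv
import io


def merge_csv(csv_text, registry):
    """Merge (URL, DOI) rows into `registry`, a dict mapping DOI -> set of URLs.

    Returns the (url, doi) pairs that were newly added, in file order.
    """
    added = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        url, doi = row[0].strip(), row[1].strip()
        if not url or not doi:
            continue
        # Unknown DOIs get a new entry; known DOIs reuse the existing one.
        urls = registry.setdefault(doi, set())
        if url in urls:
            continue  # URL already registered for this DOI: skip it
        urls.add(url)
        added.append((url, doi))
    return added


if __name__ == "__main__":
    registry = {"10.5334/baj": {"https://example.org/a"}}
    sample = (
        "https://example.org/a,10.5334/baj\n"
        "https://example.org/b,10.5334/baj\n"
    )
    print(merge_csv(sample, registry))
```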
Right after the upload, as a logged in user, I will be redirected to the page listing the uploaded CSVs.
Before uploading the CSV to S3, the file must be uniquely named.
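A minimal sketch of one way to generate a unique name before upload, assuming a random UUID suffix is acceptable. The `unique_csv_name` helper is hypothetical and the naming scheme is not specified in this issue.

```python
import uuid
from pathlib import PurePosixPath


def unique_csv_name(original_name):
    """Return a file name made unique by appending a random UUID to the stem."""
    path = PurePosixPath(original_name)
    suffix = path.suffix or ".csv"  # fall back to .csv if no extension given
    return f"{path.stem}-{uuid.uuid4().hex}{suffix}"


if __name__ == "__main__":
    print(unique_csv_name("dois.csv"))
```

Keeping the original stem in the name makes uploaded files easier to recognise when listing the bucket, while the UUID prevents two uploads of `dois.csv` from overwriting each other.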
As a partner, I want to be able to access the data in the service using an API.
The endpoints must be compliant with the project's tech specifications (HIRMEOS WP6).
After registering for an account, users will need to be approved before they can use this service.
This should include:
Received a message querying a book tweet that was not saved by the Altmetrics service:
• "We found a tweet about one of our books but the API doesn’t show it: https://twitter.com/OpenEditionNews/status/1140910422431797250?s=20"
After some investigation, this tweet was not picked up because it references the book in question by its URL (books.openedition.org/oep/8999), yet we only search for books based on DOI. After adding the URL to the Twitter search, it was still unable to find the tweet. The following combinations were tested:
DOI and all URLs:
keywords = ['"https://books.openedition.org/oep/8999"', '"https://books.openedition.org/oep/pdf/8999"', '"https://books.openedition.org/oep/epub/8999"', '"10.4000/books.oep.8999"']
Only the relevant URL:
keywords = ['"https://books.openedition.org/oep/8999"']
Only the relevant URL without 'https://':
keywords = ['"books.openedition.org/oep/8999"']
None of these searches with the Twitter client returned anything.
Ideally, we should be able to search for tweets about a book that is mentioned by its URL, not just its DOI.
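The keyword combinations tested above could be generated with a helper along these lines. This is only a sketch of how the quoted search terms are assembled. `build_keywords` is a hypothetical name, and nothing here calls the Twitter client or changes what it returns.

```python
def build_keywords(urls, doi=None, strip_scheme=False):
    """Return exact-match (double-quoted) search terms for URLs and an optional DOI."""
    terms = []
    for url in urls:
        if strip_scheme:
            # Drop a leading 'https://' or 'http://' if one is present.
            url = url.split("://", 1)[-1]
        terms.append(url)
    if doi:
        terms.append(doi)
    return [f'"{term}"' for term in terms]


if __name__ == "__main__":
    print(build_keywords(
        ["https://books.openedition.org/oep/8999"],
        doi="10.4000/books.oep.8999",
    ))
```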
As the service administrator, I want to have a plugin to gather data from Wikipedia, searching for URLs related to a DOI.
When gathering results for a specific DOI, the plugin will be automatically available in the list of enabled plugins in the service.
As a service administrator, I want to be able to run the service as a Docker container.
A `Dockerfile` must be added to the repository.
Background:
Because the Crossref Event Data API filters events by date, new events will show up twice if they happen before 12:00 (when metrics are pulled). E.g. if an event occurs at 09:00 on the 16th of Feb, and the last scrape was on the 15th of Feb, then this new event will be recorded. The next day, when the scrape happens again, looking for events that happened since the 16th of Feb, the same event will be returned again, which causes an integrity error when trying to save it in the database.
Solution:
The Crossref Event Data plugin needs to check for existing `RawEvent` entries before trying to create them, since creation is done with a bulk-create and filtering with 'from-collected-date' alone is not sufficient to avoid duplicates. This will also allow for fresh re-scraping in future without causing problems, if we ever want to do this.
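The check described in the solution can be sketched as a filtering step that runs before the bulk-create. This is an illustration only. It assumes each fetched event is a dictionary with a unique `"id"` key and that the identifiers of stored `RawEvent` rows can be loaded as a collection. The `filter_new_events` name is hypothetical.

```python
def filter_new_events(fetched, existing_ids):
    """Return only the fetched events whose id is not already stored.

    Also drops duplicates within the fetched batch itself, so the result is
    safe to pass to a single bulk-create.
    """
    seen = set(existing_ids)
    fresh = []
    for event in fetched:
        event_id = event["id"]  # assumed unique identifier of the event
        if event_id in seen:
            continue
        seen.add(event_id)
        fresh.append(event)
    return fresh


if __name__ == "__main__":
    stored = {"evt-1"}
    batch = [{"id": "evt-1"}, {"id": "evt-2"}, {"id": "evt-2"}]
    print(filter_new_events(batch, stored))
```

Filtering on identifiers rather than dates makes the scrape idempotent, which is what allows a full re-scrape without integrity errors.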
Documentation for the usage of the Altmetrics API endpoints:
The Hypothes.is plugin currently tries to fetch annotations by DOI only. The linking of DOIs to Hypothes.is annotations does not yet seem to work reliably.
Need to:
`epub` readers
For practicality, we have decided to remove the `DeletedEvent` table, and simply mark an event as deleted or not. Any additional information about the event deletion should be stored in the `RawEvent` table for a given event.
Some books that are referenced on a Wikipedia page are removed as references in later versions of the page. When this happens, the reference is no longer applicable for the book, so the following should be done:
As a service administrator / developer, I want a generic plugin to be available in the system.
The plugin is a class, and has all the methods available to the real plugins, but each method raises a `NotImplementedError` when called; it serves the purpose of showing how a plugin should be implemented, and details the usage of the methods and class with the relevant docstrings.
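A generic plugin of this kind might look like the sketch below, using Python's built-in `NotImplementedError` exception. The class name, the method names (`fetch`, `process`) and their signatures are hypothetical placeholders, since the real plugin interface is not described in this issue.

```python
class GenericPlugin:
    """Template showing the shape of a plugin (hypothetical interface).

    Real plugins would subclass this and override every method.
    """

    name = "generic"

    def fetch(self, doi):
        """Retrieve raw events for `doi` from the plugin's data source."""
        raise NotImplementedError("fetch() must be implemented by the plugin")

    def process(self, raw_events):
        """Convert raw events into the records the service stores."""
        raise NotImplementedError("process() must be implemented by the plugin")


class EchoPlugin(GenericPlugin):
    """Toy subclass used only to show how the template is filled in."""

    name = "echo"

    def fetch(self, doi):
        return [{"doi": doi}]

    def process(self, raw_events):
        return [event["doi"] for event in raw_events]
```

Calling `GenericPlugin().fetch("10.5334/baj")` raises `NotImplementedError`, while the subclass returns data.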
As a partner, I can access the API using the API token available in my personal settings once I am logged in.
Using the API without authentication will return a 401 error.
Publish Helm chart for the service
Add Nameko class to send data to the Metrics API (https://github.com/hirmeos/metrics-api)