Comments (5)
@chreman is there somewhere that ctree is properly defined?
from getpapers.
I am working on cproject and ctree and raise these issues for discussion. Some architecture is described in http://www.slideshare.net/petermurrayrust/architecture-of and was ported to Discourse http://discuss.contentmine.org/t/overall-architecture/142 with the idea we could address it there. But probably I should tackle it in https://github.com/peter/cmine and then give @Anusha a pull request to https://github.com/ContentMine/cmine (which doesn't yet exist).
OK. For the moment how about I follow what quickscrape does?
For each entry, create a directory with:
- bib.json: results in BibJSON format
- api_results.json: raw JSON results, e.g. epmc_results.json
- fulltext.{xml, pdf, html}: fulltext in whatever format
- images/*.*: images in whatever format
If bib.json already exists, getpapers will merge the new results into the existing ones, unless the user specifies --overwrite, in which case it will replace the whole file. If any of the fulltext.* files already exist, they will not be downloaded again unless --overwrite is specified.
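To make the proposed behaviour concrete, here is a minimal sketch in Python (not the getpapers code itself). The function names `write_bib` and `needs_download` are hypothetical, and the shallow merge in which new values win on key conflicts is an assumption, since the merge rule is not spelled out above.

```python
import json
from pathlib import Path


def write_bib(entry_dir: Path, new_bib: dict, overwrite: bool = False) -> dict:
    """Write bib.json for one entry and return what was written.

    Merges into an existing bib.json unless overwrite is set.
    """
    entry_dir.mkdir(parents=True, exist_ok=True)
    bib_path = entry_dir / "bib.json"
    merged = dict(new_bib)
    if bib_path.exists() and not overwrite:
        existing = json.loads(bib_path.read_text(encoding="utf-8"))
        # Assumption: shallow merge, new values win on key conflicts.
        merged = {**existing, **new_bib}
    bib_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged


def needs_download(entry_dir: Path, ext: str, overwrite: bool = False) -> bool:
    """Return True if fulltext.<ext> is missing or overwrite was requested."""
    return overwrite or not (entry_dir / f"fulltext.{ext}").exists()
```

With this sketch, calling `write_bib` twice on the same directory accumulates keys, while passing `overwrite=True` discards whatever was in bib.json before.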
Looks good. When I looked at eupmc_results.json the JSON seemed to have been created from a database with a lot of 1-element arrays. Presumably you can convert this to BibJSON ok?
Yeah that is pretty common in lots of databases! Easy to convert :)
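As a rough illustration of that conversion step, the sketch below unwraps 1-element arrays recursively. It is written in Python for brevity and is not taken from getpapers; the function name `unwrap_singletons` and the field names in the sample record are made up for the example.

```python
def unwrap_singletons(value):
    """Recursively replace 1-element lists with their single item."""
    if isinstance(value, dict):
        return {key: unwrap_singletons(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [unwrap_singletons(item) for item in value]
        return items[0] if len(items) == 1 else items
    return value


# Hypothetical record shaped like the raw results described above.
record = {"title": ["An example title"], "keywords": ["mining", "text"]}
flat = unwrap_singletons(record)
```

One caveat with this approach: a field that is a genuine list but happens to hold a single item would also be collapsed, so a real converter would probably want a per-field rule instead.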