Comments (6)
But this is correct. First, one .json file is created for each paper, and later they are aggregated into a single .jsonl file that contains one entry per line. The .json files are then deleted automatically, but if the code stops during scraping (like in your case), you will find only the .json files and not the .jsonl yet.
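To make the two-stage flow described above concrete, here is a minimal sketch of such an aggregation step. It is not paperscraper's actual code: the function name aggregate_json_to_jsonl, the directory layout, and the file names are assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def aggregate_json_to_jsonl(json_dir: Path, jsonl_path: Path) -> int:
    """Merge every *.json file in json_dir into one JSONL file, then delete them.

    Hypothetical helper for illustration; returns the number of merged files.
    """
    json_files = sorted(json_dir.glob("*.json"))
    with jsonl_path.open("w", encoding="utf-8") as out:
        for json_file in json_files:
            record = json.loads(json_file.read_text(encoding="utf-8"))
            # One JSON document per line is what makes the file JSONL
            out.write(json.dumps(record) + "\n")
    # Remove the per-paper files only after the dump is fully written
    for json_file in json_files:
        json_file.unlink()
    return len(json_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "paper_1.json").write_text(json.dumps({"title": "A"}))
        (folder / "paper_2.json").write_text(json.dumps({"title": "B"}))
        merged = aggregate_json_to_jsonl(folder, folder / "dump.jsonl")
        print(merged, "files merged")
```

If a run is interrupted before the aggregation step, only the individual .json files exist, which matches the behaviour reported in this thread.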
from paperscraper.
I manually added the fixed code yesterday and successfully ran chemrxiv(). Thanks for your help!
Hi @LioHong,
Thanks for your interest in paperscraper and for reporting this issue. It is caused by your OS being Windows: I should not have used os.path.join to join URLs.
Sorry for that, I will fix it soon in a PR.
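For readers wondering why os.path.join misbehaves here: on Windows it resolves to ntpath.join, which inserts a backslash between parts. The sketch below reproduces that on any OS by calling ntpath directly and shows one possible OS-independent alternative. The join_url helper and the example.org URL are hypothetical and are not taken from paperscraper or from the fix in the PR.

```python
import ntpath  # the implementation behind os.path on Windows


def join_url(base: str, *parts: str) -> str:
    """Join URL segments with forward slashes, regardless of the OS.

    Hypothetical helper for illustration only.
    """
    segments = [base.rstrip("/")] + [part.strip("/") for part in parts]
    return "/".join(segments)


# Hypothetical base URL, not a real endpoint
base = "https://api.example.org/v1"

# What os.path.join would produce on Windows: a backslash in the URL
broken = ntpath.join(base, "items")

# Plain string joining always yields forward slashes
fixed = join_url(base, "items")

if __name__ == "__main__":
    print(broken)
    print(fixed)
```

Note that urllib.parse.urljoin is not a drop-in replacement, because it replaces the last path segment of the base when the base lacks a trailing slash.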
Thank you for the prompt response!
Just to confirm, I'm assuming the expected output is a single chemrxiv JSONL dump rather than a bunch of JSONs.
@LioHong Please have a look at #29
Thanks. The new version (0.2.7) is now available via PyPI. Run
pip install --upgrade paperscraper
and your issue will be resolved.