
Comments (6)

jannisborn commented on June 10, 2024

But this is correct. First, one .json file is created for each paper; later, these files are aggregated into a single .jsonl file that contains one entry per line. The .json files are then deleted automatically. If the code stops during scraping (as in your case), you will find only the .json files and no .jsonl file yet.
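
For anyone left with only the per-paper .json files after an interrupted run, the aggregation step described above could be sketched roughly as follows. This is not paperscraper's actual code: the function name aggregate_json_to_jsonl, its parameters, and the assumption that each .json file holds exactly one paper record are all hypothetical and only illustrate the idea.

```python
import json
from pathlib import Path


def aggregate_json_to_jsonl(json_dir, jsonl_path, delete_parts=True):
    """Merge every per-paper .json file in json_dir into one .jsonl file.

    Hypothetical helper, not part of paperscraper. Assumes each .json
    file contains a single JSON record.
    """
    # "*.json" matches only names ending in .json, so an existing .jsonl is skipped.
    parts = sorted(Path(json_dir).glob("*.json"))

    with open(jsonl_path, "w", encoding="utf-8") as out:
        for part in parts:
            with open(part, encoding="utf-8") as f:
                record = json.load(f)
            # One record per line is what makes the output a JSONL file.
            out.write(json.dumps(record) + "\n")

    # Remove the intermediate files only after the dump was written completely.
    if delete_parts:
        for part in parts:
            part.unlink()

    return len(parts)
```

Deleting the intermediate files only after the .jsonl has been written means an interruption leaves the .json files in place, which matches the behavior described in this comment.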

from paperscraper.

LioHong commented on June 10, 2024

I manually added the fixed code yesterday and successfully ran chemrxiv(). Thanks for your help!


jannisborn commented on June 10, 2024

Hi @LioHong,

Thanks for your interest in paperscraper and for reporting this issue. It is caused by your OS being Windows: I should not have used os.path.join to join URLs, because on Windows it inserts backslashes as separators.

Sorry about that. I will fix it soon in a PR.
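
To illustrate the problem, here is a minimal sketch that reproduces the Windows behavior on any OS by calling ntpath.join, which is what os.path.join resolves to on Windows. The URL and the join_url helper are hypothetical and are not taken from paperscraper; the actual fix in the library may look different.

```python
import ntpath      # Windows flavour of os.path, importable on any OS
import posixpath   # POSIX flavour of os.path


def join_url(base, *parts):
    """Join URL segments with forward slashes, regardless of the OS.

    Hypothetical helper for illustration only.
    """
    segments = [base.rstrip("/")] + [p.strip("/") for p in parts]
    return "/".join(segments)


base = "https://api.example.org/v1"  # hypothetical URL

# What os.path.join produces on Windows: a backslash before "items".
windows_result = ntpath.join(base, "items")
# What os.path.join produces on Linux/macOS: works, but only by accident.
posix_result = posixpath.join(base, "items")
# OS-independent joining with explicit forward slashes.
safe_result = join_url(base, "items")

print(windows_result)
print(posix_result)
print(safe_result)
```

Because URLs always use forward slashes, joining them with plain string operations (or with urllib.parse from the standard library) avoids any dependence on the platform's path separator.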


LioHong commented on June 10, 2024

Thank you for the prompt response!

Just to confirm, I'm assuming the expected output is a single chemrxiv JSONL dump rather than a bunch of JSONs.


jannisborn commented on June 10, 2024

@LioHong Please have a look at #29.


jannisborn commented on June 10, 2024

Thanks. The new version (0.2.7) is now available via PyPI.
Run pip install --upgrade paperscraper and your issue will be resolved.
