targeted_literature_reviews_via_webscraping's Introduction

Targeted Literature Reviews using webscraping

Web scraping to get the articles matching a given query. It returns a spreadsheet with titles, abstracts and PMIDs.

It works on PubMed and is based on Biopython: https://biopython.org
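As a rough illustration of what a Biopython-based PubMed search can look like, here is a minimal sketch. It is not the code from this repository: the function names `search_pubmed` and `medline_to_row`, the `max_results` parameter and the row layout are assumptions made for the example. Only `Entrez.esearch`, `Entrez.efetch`, `Entrez.read` and `Medline.parse` are Biopython calls.

```python
def medline_to_row(record):
    """Turn one parsed MEDLINE record (dict-like) into a spreadsheet row."""
    return {
        "pmid": record.get("PMID", ""),
        "title": record.get("TI", ""),
        "abstract": record.get("AB", ""),  # some records have no abstract
    }


def search_pubmed(query, email, max_results=20):
    """Search PubMed for `query` and return one row per article found."""
    # Imported here so that medline_to_row stays usable without Biopython.
    from Bio import Entrez, Medline

    Entrez.email = email  # NCBI asks for a contact address

    # Step 1: get the PMIDs that match the query.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=max_results)
    pmids = Entrez.read(handle)["IdList"]
    handle.close()
    if not pmids:
        return []

    # Step 2: download the matching records in MEDLINE text format.
    handle = Entrez.efetch(
        db="pubmed", id=",".join(pmids), rettype="medline", retmode="text"
    )
    rows = [medline_to_row(record) for record in Medline.parse(handle)]
    handle.close()
    return rows
```

Calling `search_pubmed('"Radiomics"AND"CT"AND"Ovarian Cancer"', email="you@example.org")` would need network access; the resulting rows could then be written to an xlsx file, for example with `pandas.DataFrame(rows).to_excel("papers.xlsx")`.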

You can run it on Google Colab without downloading anything locally! :) https://research.google.com/colaboratory/faq.html

How does it work?

For a given query, you can get:

  1. an xlsx file with the titles and abstracts of the papers in your query
  2. a graph with the papers in your query and their references. This lets us find highly cited papers in a given field
  3. an xlsx file with the titles and abstracts of the references, together with their degree (i.e. the number of connections in the graph). The higher the degree, the more papers in your query cite that reference
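The degree in item 3 can be illustrated with a small sketch. It assumes the references of each paper have already been collected into a dictionary; the name `reference_degrees` and all identifiers in the sample data are hypothetical. It counts only incoming citations, which matches the degree of a reference as long as that reference is not itself one of the papers in the query.

```python
from collections import Counter


def reference_degrees(references_by_paper):
    """Count, for each reference, how many papers in the query cite it."""
    degrees = Counter()
    for paper, references in references_by_paper.items():
        # set() so a reference listed twice by the same paper counts once
        degrees.update(set(references))
    return degrees


# Hypothetical identifiers, for illustration only.
references_by_paper = {
    "paper_A": ["ref_1", "ref_2"],
    "paper_B": ["ref_1"],
    "paper_C": [],  # no reference list available, so no edges
}

degrees = reference_degrees(references_by_paper)
for ref, degree in degrees.most_common():
    print(ref, degree)
# prints:
# ref_1 2
# ref_2 1
```

Sorting by degree, as `most_common()` does here, puts the references cited by the most papers in the query at the top.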

For the example query "Radiomics"AND"CT"AND"Ovarian Cancer" we get:

[Figure: output for the example query]

Next steps:

  • At the moment it only works on PubMed. I'm working on making it work with arXiv and bioRxiv as well. Implementation for Google Scholar is complicated, but I am also trying to get my head around it.
  • I'm working on an implementation that requires no code whatsoever - via website or widgets.
  • It would be great to import the articles to Mendeley, so I'm also working on that!

If you have any suggestions to improve the code, please feel free to open an issue!

Questions:

What happens to articles behind a paywall?

You'll be able to get the abstract but unfortunately not the references, so those won't be added to the graph. Open science is the way to go!!
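One way this can show up in code is sketched below. It assumes references are looked up through the NCBI ELink service with the `pubmed_pubmed_refs` link name, which is an assumption and not something this README confirms. The function names `fetch_reference_ids` and `extract_reference_ids` are hypothetical. When no reference list is available, the result is simply an empty list, so the paper adds no edges to the graph.

```python
def extract_reference_ids(link_sets):
    """Collect the linked PMIDs from a parsed ELink result."""
    ids = []
    for link_set in link_sets:
        # "LinkSetDb" is empty when no reference list is available.
        for link_db in link_set.get("LinkSetDb", []):
            ids.extend(link["Id"] for link in link_db.get("Link", []))
    return ids


def fetch_reference_ids(pmid, email):
    """Return the PMIDs cited by `pmid`, or [] if none are available."""
    # Imported here so extract_reference_ids works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(
        dbfrom="pubmed", db="pubmed", id=pmid, linkname="pubmed_pubmed_refs"
    )
    link_sets = Entrez.read(handle)
    handle.close()
    return extract_reference_ids(link_sets)
```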

Contributors:

paulamartingonzalez, mfpfox
