
openneuro_reuse_2021's Introduction

How I generated the OpenNeuro reuse data (June 2021):

1) download_papers.sh: attempts to download all the papers returned by an 'openneuro' Google Scholar search. Adjust the upper bound of the for loop to match the number of results pages for that search. Afterward, check that each page directory contains 'bibtex.bib' and 'result.csv'.

2) manual_download.sh PAGE_DIR: prints each entry for which the download failed, prompts for a URL from which to download the paper, and names the downloaded file appropriately. Note that the URL can also be a local path, e.g. 'file:///Users/...'

3) validate_pdfs.sh: checks that all PDFs are fully downloaded and not corrupted.
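A check like the one in step 3 could be sketched in Python as follows. This is not the contents of validate_pdfs.sh; it is a minimal heuristic that assumes a complete PDF starts with the '%PDF-' header and has the '%%EOF' marker near its end. The function names are made up for illustration.

```python
from pathlib import Path


def looks_like_complete_pdf(path):
    """Heuristic: the file starts with '%PDF-' and has '%%EOF' in its last 1 KiB."""
    data = Path(path).read_bytes()
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def find_bad_pdfs(root):
    """Return sorted paths of all *.pdf files under root that fail the heuristic."""
    return sorted(
        str(p) for p in Path(root).rglob("*.pdf") if not looks_like_complete_pdf(p)
    )
```

Calling find_bad_pdfs('papers') would list files worth re-downloading with manual_download.sh. A truncated download usually lacks the trailer, while an HTML error page saved under a .pdf name lacks the header.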

4) Check that there are no duplicate papers. I imported the results into Google Sheets and highlighted rows with the same paper name or the same DOI.
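The duplicate check in step 4 was done by hand in a spreadsheet; a scripted equivalent might look like the sketch below. It assumes the rows have 'title' and 'doi' columns, which is a guess about the layout of 'result.csv', and it treats values as equal after lowercasing and collapsing whitespace.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def normalize(value):
    """Lowercase and collapse runs of whitespace so near-identical values match."""
    return " ".join(value.lower().split())


def find_duplicates(rows, keys=("title", "doi")):
    """Map (column, normalized value) to the row indices sharing that value.

    The column names in `keys` are assumptions, not the known CSV layout.
    Only values that occur in more than one row are returned.
    """
    seen = defaultdict(list)
    for i, row in enumerate(rows):
        for key in keys:
            value = normalize(row.get(key) or "")
            if value:
                seen[(key, value)].append(i)
    return {k: v for k, v in seen.items() if len(v) > 1}
```

Rows flagged this way still deserve a manual look, since a preprint and its published version can share a title but differ in DOI.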

5) find_openneuro.sh: searches for 'openneuro' in each paper and prints whether it appears. Papers without a match should be inspected using check_no_match.sh to make sure the file title matches the actual paper title.

6) create_all_result.sh: combines all the 'result.csv' files into a single 'papers/all_result.csv'
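For readers without the script at hand, the merge in step 6 can be approximated in Python. The sketch assumes the layout 'papers/<page directory>/result.csv' and that every 'result.csv' begins with the same header row; neither is confirmed by the text above, and the function name is made up.

```python
import csv
from pathlib import Path


def combine_results(papers_dir, out_name="all_result.csv"):
    """Concatenate papers_dir/*/result.csv into papers_dir/out_name.

    Assumes each input has an identical header row; the header is written once.
    Returns the number of data rows written.
    """
    papers_dir = Path(papers_dir)
    header = None
    n_rows = 0
    with open(papers_dir / out_name, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(papers_dir.glob("*/result.csv")):
            with open(path, newline="", encoding="utf-8") as f:
                rows = list(csv.reader(f))
            if not rows:
                continue  # skip empty files
            if header is None:
                header = rows[0]
                writer.writerow(header)
            elif rows[0] != header:
                raise ValueError(f"unexpected header in {path}")
            writer.writerows(rows[1:])
            n_rows += len(rows) - 1
    return n_rows
```

Using the csv module rather than plain line concatenation keeps quoted fields that contain commas or newlines intact.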

7) paper2txt.sh: converts every pdf in 'papers/' to a .txt file located in 'papers_txt/'

8) Create a file called 'openneuro_authors.tsv', which contains one column of dataset numbers and one column of author lists. This can easily be copied from metadata.openneuro.org. Also create a file called 'paper_list.tsv', which contains two columns: paper name and author list.

9) mine_papers.sh paper_list.tsv: searches each listed paper for dataset mentions and uses user input to create the resulting file 'mappings.tsv'.
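The text searches in steps 5 and 9 could be sketched in Python as below. This is not what find_openneuro.sh or mine_papers.sh actually do. It only assumes that dataset mentions appear as OpenNeuro accession numbers ('ds' followed by six digits, e.g. ds000001) and that the .txt files from step 7 are available. Matching on author names and the interactive coding step are left out.

```python
import re
from pathlib import Path

# OpenNeuro accession numbers: 'ds' followed by exactly six digits.
ACCESSION = re.compile(r"\bds\d{6}\b", re.IGNORECASE)


def mentions_openneuro(text):
    """True if the word 'openneuro' appears anywhere, ignoring case."""
    return "openneuro" in text.lower()


def dataset_mentions(text):
    """Return the sorted, lowercased, de-duplicated accession numbers in text."""
    return sorted({m.group(0).lower() for m in ACCESSION.finditer(text)})


def scan_papers(txt_dir):
    """Map each .txt file name in txt_dir to the accession numbers it mentions."""
    return {
        p.name: dataset_mentions(p.read_text(encoding="utf-8", errors="replace"))
        for p in sorted(Path(txt_dir).glob("*.txt"))
    }
```

A paper that mentions OpenNeuro but no accession number may still reuse a dataset (for example, by citing the dataset's descriptor paper), so an empty result is a prompt for manual inspection, not a verdict.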

10) Import the mappings into a Google Sheet or other spreadsheet, filtering out any mappings other than 'reuse'. A second tab can be made containing a list of papers with metadata.

11) get_doi.sh: I discovered many incorrect DOIs, so this script helps to manually verify and/or replace each DOI. I wait until this step to do so because I only take the time to get correct metadata for papers coded as 'reuse'.

12) pybliometrics can then be used with the DOIs to acquire 'senior author', 'senior author country', 'year' and 'journal' for many of the papers via the Scopus API. The Crossref API can be tried for those that fail. The rest will require manual acquisition.
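For the Crossref fallback in step 12, a minimal sketch using only the standard library is shown below. It queries the public Crossref REST endpoint 'https://api.crossref.org/works/<DOI>' and reads the 'author', 'container-title' and 'issued' fields of the response. Treating the last listed author as the senior author is an assumption, and the author's country is not extracted here. The function names are made up.

```python
import json
import urllib.parse
import urllib.request


def fetch_crossref(doi):
    """Fetch the Crossref work record for a DOI and return the decoded JSON."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(record):
    """Pull senior author, journal and year out of a Crossref work record.

    Assumption: the last entry in the author list is the senior author.
    Missing fields come back as None.
    """
    msg = record.get("message", {})
    authors = msg.get("author") or []
    last = authors[-1] if authors else {}
    name = " ".join(p for p in (last.get("given"), last.get("family")) if p)
    titles = msg.get("container-title") or []
    date_parts = (msg.get("issued") or {}).get("date-parts") or [[None]]
    return {
        "senior_author": name or None,
        "journal": titles[0] if titles else None,
        "year": date_parts[0][0] if date_parts[0] else None,
    }
```

Typical use would be summarize(fetch_crossref(doi)) for each DOI that Scopus could not resolve. Keeping the parsing separate from the network call makes it easy to check against saved responses.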

13) Finally, use the Python package 'scholarly' to get the number of citations for each paper, since Google Scholar seems to be the most reliable source for this. I had to change my VPN location about 10 times to get through all the papers, since Google will cut you off after too many queries.
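Because the citation lookups in step 13 get interrupted repeatedly, it helps to save progress after every paper so that a run can resume after switching VPN location. The sketch below shows one way to do that. The 'lookup' argument is a hypothetical stand-in for whatever function wraps the 'scholarly' query and returns a citation count; no 'scholarly' calls are shown, and the checkpoint file name is up to the caller.

```python
import json
from pathlib import Path


def collect_citations(titles, lookup, checkpoint):
    """Look up each title once, saving results to a JSON checkpoint as we go.

    `lookup` is a caller-supplied function (hypothetical here) mapping a
    title to a citation count. On the first failure the loop stops, so the
    run can be restarted later and will skip titles already in the checkpoint.
    """
    checkpoint = Path(checkpoint)
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for title in titles:
        if title in done:
            continue  # already fetched in an earlier run
        try:
            done[title] = lookup(title)
        except Exception as exc:  # e.g. blocked after too many queries
            print(f"stopped at {title!r}: {exc}")
            break
        checkpoint.write_text(json.dumps(done, indent=2))
    return done
```

Rerunning the same call after changing VPN location continues from the first paper that has no saved count.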
