openneuro_reuse_2021's Introduction
How I generated openneuro reuse data (June 2021): 1) download_papers.sh: attempts to download all the papers resulting from an 'openneuro' google scholar search. Make sure to adjust the number in the for loop according to the number of openneuro results pages. Afterward, double check to make sure each directory contains 'bibtex.bib' and 'result.csv'. 2) manual_download.sh PAGE_DIR: this script prints each line for which the download failed. it asks for user input to get the URL to download the paper and will name it appropriately. Note that URL can also be a local path i.e. 'file:///Users/...' 3) validate_pdfs.sh: check that all pdfs are fully downloaded and not corrupted 4) Check that there are no duplicate papers. I imported in google sheets and highlighted rows with same paper name or same doi. 5) find_openneuro.sh: searches for 'openneuro' in each paper and prints whether it appears. Papers without this should be inspected using check_no_match.sh to make sure file title matches the actual paper title. 6) create_all_result.sh: combines all the 'result.csv' files into a single 'papers/all_result.csv' 7) paper2txt.sh: converts every pdf in 'papers/' to a .txt file located in 'papers_txt/' 8) Create a file called 'openneuro_authors.tsv', which contains one column of dataset numbers and one of authors list. This can easily be copied from metadata.openneuro.org. And create a file called 'paper_list.tsv', which contains two columns: paper name and author list 9) mine_papers.sh paper_list.tsv: Searches the corresponding paper for dataset mentions, and uses user input to create a resulting file 'mappings.tsv'. 10) Import the mappings into a google sheet or spreadsheet, filtering out any mappings other than 'reuse'. A seconds tab can be made containg a list of papers with metdata. 11) get_doi.sh: I discovered many incorrect DOIs so this script helps to manually verify and/or replace each doi. I wait until now to do this as I will only take the time to get correct metadata on papers coded as 'reuse'. 12) pybliometrics can be then be used with with DOIs to acquire 'senior author', 'senior author country', 'year' and 'journal' for many of the papers via scopus api. Crossref api can be tried for those that failed. Others will require manual acquisition. 13) Finally, use python package 'scholarly' to get number of citations for each paper since google scholar seems to be the most reliable source for this. I had to change my VPN location about 10 times in order to go through all the papers since google will cut you off after too many queries.
openneuro_reuse_2021's People
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ๐๐๐
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google โค๏ธ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.