neurovault / pyneurovault

python wrapper for NeuroVault api (in dev)

Home Page: http://www.neurovault.org/api-docs

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

neurovault neuroimaging brainmaps brainmap api python

pyneurovault's Introduction

pyneurovault

python wrapper for NeuroVault api

Currently supports:

downloading all image and collections data into tables
counting unique cognitive atlas contrasts
downloading all resampled images
decoding with neurosynth terms
basic querying of results

Installation

pip install git+https://github.com/NeuroVault/pyneurovault

pyneurovault's People

chrisgorgo schwarty ljchang poldrack nagyistge kwayeke tubbz-alt saurabhr

pyneurovault's Issues

needs to be made compatible for python 3

Simplify package dependencies

Right now, this package depends on nipype and nilearn.

nipype is a large dependency for a single utility function; it seems better to copy that function out.
The nilearn dependency is only needed for an example. It seems odd to install it when installing pyneurovault, as the example isn't accessible.

Function to get article info

We would want to:

filter down to the subset of images of interest
for each image, get pmid/doi
[idea] could get complete info from pubmed
[idea] could get json data structure directly from brainspell

This will be integrated into the cogatlas API to annotate images!

Links of use:

collection based on DOI from Chris
brainspell with pmid

error in get_images_with_collections()

all_images = get_images_with_collections()

Extracting NeuroVault collections meta data...
http://neurovault.org/api/collections/?limit=100&format=json
Found 408 results.
Retrieving http://neurovault.org/api/collections/?format=json&limit=100&offset=100
Retrieving http://neurovault.org/api/collections/?format=json&limit=100&offset=200
Retrieving http://neurovault.org/api/collections/?format=json&limit=100&offset=300
Retrieving http://neurovault.org/api/collections/?format=json&limit=100&offset=400
Extracting NeuroVault images meta data...
http://neurovault.org/api/images/?limit=1000&format=json
Found 7578 results.
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=1000
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=2000
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=3000
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=4000
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=5000
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=6000
Retrieving http://neurovault.org/api/images/?format=json&limit=1000&offset=7000
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-80d2a85866da> in <module>()
----> 1 all_images = get_images_with_collections()

/Users/filo/anaconda/lib/python2.7/site-packages/pyneurovault/api.pyc in get_images_with_collections(collection_pks)
     94     collections_df = get_collections(pks=collection_pks)
     95     images_df = get_images(collection_pks=collection_pks)
---> 96     combined_df = pandas.merge(images_df, collections_df, how='left', on='collection_id',suffixes=('_image', '_collection'))
     97     return combined_df
     98 

/Users/filo/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     36                          right_index=right_index, sort=sort, suffixes=suffixes,
     37                          copy=copy)
---> 38     return op.get_result()
     39 if __debug__:
     40     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

/Users/filo/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self)
    184 
    185     def get_result(self):
--> 186         join_index, left_indexer, right_indexer = self._get_join_info()
    187 
    188         ldata, rdata = self.left._data, self.right._data

/Users/filo/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_join_info(self)
    271              right_indexer) = _get_join_indexers(self.left_join_keys,
    272                                                  self.right_join_keys,
--> 273                                                  sort=self.sort, how=self.how)
    274 
    275             if self.right_index:

/Users/filo/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_join_indexers(left_keys, right_keys, sort, how)
    459 
    460     # get left & right join labels and num. of levels at each location
--> 461     llab, rlab, shape = map(list, zip( * map(fkeys, left_keys, right_keys)))
    462 
    463     # get flat i8 keys from label lists

/Users/filo/anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _factorize_keys(lk, rk, sort)
    621     rizer = klass(max(len(lk), len(rk)))
    622 
--> 623     llab = rizer.factorize(lk)
    624     rlab = rizer.factorize(rk)
    625 

pandas/hashtable.pyx in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15733)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
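
A commonly reported cause of this pandas error is a frame that carries the join key as more than one column, so that selecting it yields a 2-D block instead of a 1-D series. Whether that is what happened here is an assumption; the traceback alone does not confirm it. A minimal sketch of a check that could be run on list(images_df.columns) and list(collections_df.columns) before merging (the helper name and the sample labels are hypothetical):

```python
from collections import Counter


def duplicated_columns(columns):
    """Return the column labels that occur more than once, in first-seen order."""
    counts = Counter(columns)
    seen = []
    for name in columns:
        if counts[name] > 1 and name not in seen:
            seen.append(name)
    return seen


# Stand-in for list(images_df.columns); these labels are illustrative only.
image_columns = ["image_id", "collection_id", "name", "collection_id"]
print(duplicated_columns(image_columns))
```

If the join key shows up in the result, dropping or renaming the extra column before the merge would be the next thing to try.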

Add atlas query functions

API to allow uploads

Would it be possible to expose an API to allow uploads, assuming a valid key pair and all that?

add sphinx documentation

Update download images / collections

Should be able to

check/exclude single subject
check/exclude missing MNI
check/exclude missing BOLD
check/exclude thresholding

Add filtering capabilities to collection fetching

According to API docs (http://neurovault.org/api-docs),

Returns a json file containing a list of dictionaries with information corresponding to each collection stored in NeuroVault. Results can be filtered by specifying the name, DOI or owner of the collection.

Parameters: name, DOI, owner

example: neurovault.org/api/collections/?DOI=10.1016/j.neurobiolaging.2012.11.002

This filtering should be available in the API.
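
A minimal sketch of how the wrapper could build a filtered request, assuming the three parameters quoted above are passed straight through as query-string arguments. The function name collections_url and the API_ROOT constant are hypothetical, not part of pyneurovault:

```python
from urllib.parse import urlencode

# Base URL as it appears in the logs on this page.
API_ROOT = "http://neurovault.org/api"


def collections_url(name=None, DOI=None, owner=None):
    """Build a collections URL, adding only the filters that were given."""
    params = {"name": name, "DOI": DOI, "owner": owner}
    query = {key: value for key, value in params.items() if value is not None}
    query["format"] = "json"
    # safe="/" keeps the slash in a DOI readable, as in the API docs example.
    return "%s/collections/?%s" % (API_ROOT, urlencode(query, safe="/"))


print(collections_url(DOI="10.1016/j.neurobiolaging.2012.11.002"))
```

Leaving all three arguments as None falls back to the unfiltered listing, so existing callers would not need to change.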

Non-ascii characters prevent export

Bug! Both data frames contain non-ASCII characters that fail to save to any kind of text file. We need to figure out how to handle them, and add an export function.
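
One possible shape for the export function, sketched for Python 3 with the standard library only: open the file with an explicit UTF-8 encoding so that non-ASCII text is written as-is. The name export_rows and the sample rows are hypothetical. For a pandas data frame, DataFrame.to_csv accepts an encoding argument that serves the same purpose.

```python
import csv
import io


def export_rows(path, header, rows):
    """Write a header and rows to a CSV file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Example call (illustrative values only):
# export_rows("collections.csv", ["collection_id", "name"], [[1, u"caf\u00e9 study"]])
```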

Retrieving collections by pk returns images, not collections

https://github.com/NeuroVault/pyneurovault/blob/master/pyneurovault/api.py#L99
def get_collections(self,pks=None):
calls
get_json_df("collections",pks,limit=1).

https://github.com/NeuroVault/pyneurovault/blob/master/pyneurovault/utils.py#L75
def get_json_df(data_type, pks=None, limit=1000):
calls
tmp = get_url("http://neurovault.org/api" "/images/%s/?format=json" % pk)
when pk is not None.

I believe images in that URL should be replaced by the variable data_type.
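
The proposed fix can be sketched as a small URL helper that interpolates data_type instead of the hard-coded images segment. The helper name item_url and the validation step are assumptions added for illustration:

```python
API_ROOT = "http://neurovault.org/api"


def item_url(data_type, pk):
    """Build the URL for a single record of the given data type."""
    if data_type not in ("images", "collections"):
        raise ValueError("unknown data_type: %r" % (data_type,))
    # data_type replaces the literal "images" path segment from the bug report.
    return "%s/%s/%s/?format=json" % (API_ROOT, data_type, pk)


print(item_url("collections", 503))
```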

Incorporate functionality from neurovault_analysis

https://github.com/NeuroVault/neurovault_analysis/blob/master/neurovault_datagrabber.py

Tag NeuroVault images with Cognitive Atlas

Larger goal: we should be able to use the contrast_definition to form a good hypothesis for what contrast_definition_cogatlas is, and then figure out a clever way to get it checked by the authors of the study (e.g., have a function that does the tagging and then sends an email to the authors of the paper saying "hey, this is what you meant, right?").

** Don't worry, I will not do anything without discussion first! We must honor user contributions and not spam. Just putting thoughts here.

data table export

add export of combined images and collections
"row numbers" are confusing; the first column should be the collection_id (this is likely because it is not set as the index in the tables).

Use programmatic paths in examples

Right now, examples don't work because they use hard-coded paths. Build paths programmatically (in the worst case, use os.getcwd()) so that they work out of the box.
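
A minimal sketch of what building paths programmatically could look like in an example script. The helper name example_path and the file names are hypothetical:

```python
import os


def example_path(*parts, base=None):
    """Join path parts under a base directory (the current directory by default)."""
    root = os.path.abspath(base if base is not None else os.getcwd())
    return os.path.join(root, *parts)


# Inside a script, the script's own folder is usually a better base than the
# working directory: os.path.dirname(os.path.abspath(__file__))
print(example_path("data", "resampled"))
```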

Mysterious characters not parsing from json

When using the API with urllib2 to read the json into a data structure, there seem to be some invalid characters that lead to an error:

nv = api.NeuroVault()
Extracting NeuroVault images meta data...
Traceback (most recent call last):
  File "", line 1, in
  File "/home/vsochat/.local/lib/python2.6/site-packages/pyneurovault-0.1.0-py2.6.egg/pyneurovault/api.py", line 55, in __init__
    self.images = self.get_images()
  File "/home/vsochat/.local/lib/python2.6/site-packages/pyneurovault-0.1.0-py2.6.egg/pyneurovault/api.py", line 73, in get_images
    images = DataJson("http://neurovault.org/api/images/?format=json")
  File "/home/vsochat/.local/lib/python2.6/site-packages/pyneurovault-0.1.0-py2.6.egg/pyneurovault/utils.py", line 46, in __init__
    self.json = self.get_json()
  File "/home/vsochat/.local/lib/python2.6/site-packages/pyneurovault-0.1.0-py2.6.egg/pyneurovault/utils.py", line 55, in get_json
    return urllib2.urlopen(self.url).read().encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 21126: ordinal not in range(128)

This needs to be addressed! We don't have much control over what users copy and paste or write in their text boxes, so it is probably best to deal with them here. We need a solution that is reasonably fast and handles such characters.
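
The failing line calls .encode("utf-8") on the result of read(). On Python 2 that result is already a byte string, so encode first decodes it implicitly as ASCII, which fails on byte 0xe3. A minimal sketch of the opposite direction, decoding the bytes as UTF-8 before parsing; that the API serves UTF-8 is an assumption, and parse_json_bytes is a hypothetical name:

```python
import json


def parse_json_bytes(raw):
    """Decode a UTF-8 HTTP body to text, then parse it as JSON."""
    return json.loads(raw.decode("utf-8"))


# A body containing a three-byte UTF-8 character whose first byte is 0xe3.
body = b'[{"name": "\xe3\x81\x82"}]'
records = parse_json_bytes(body)
print(len(records))
```

Decoding once at the boundary keeps everything downstream working with text, which avoids paying for per-field clean-up.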

pyneurovault needs tests

we need at least one test for each function, ideally testing parameters too

cannot import 'images_from_collections'

I am getting this import error...is it because I am using python 3.10?

ImportError: cannot import name 'images_from_collections' from 'pyneurovault.api' (/Users/saurabh.ranjan/opt/anaconda3/envs/brain/lib/python3.10/site-packages/pyneurovault/api.py)

pyneurovault fetches only first 100 images for each collection

In [50]:len(api.get_images(collection_pks=503))
    Extracting NeuroVault collections meta data...
    Retrieving collection 503...
Out[50]:
    100

but http://neurovault.org/collections/503/
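
A minimal sketch of following pagination until the listing is exhausted. It assumes each response is a dictionary with a "results" list and a "next" URL that is empty on the last page; that envelope is an assumption, not something confirmed on this page. The get_page argument is a hypothetical callable that fetches and parses one URL, which keeps the loop testable without network access:

```python
def fetch_all(first_url, get_page):
    """Collect results from every page, following "next" links until none remain."""
    results = []
    url = first_url
    while url:
        page = get_page(url)
        results.extend(page["results"])
        url = page.get("next")
    return results


# Offline demonstration with two fake pages (URLs and contents are made up).
fake_pages = {
    "page1": {"results": [1, 2], "next": "page2"},
    "page2": {"results": [3], "next": None},
}
print(len(fetch_all("page1", fake_pages.__getitem__)))
```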

Function to search data frame fields

Usage should be like:

Show me fields that I can search
Let me do search
Return subset of entire data frame (for downloading images, etc).
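
The three steps above can be sketched with plain dictionaries standing in for data frame rows; with pandas the same idea would be expressed as a boolean mask over a column. The function names and sample rows are hypothetical:

```python
def searchable_fields(rows):
    """List every field name that appears in the rows, in first-seen order."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    return fields


def search(rows, field, text):
    """Return the rows whose field contains text, ignoring case."""
    needle = text.lower()
    return [row for row in rows if needle in str(row.get(field, "")).lower()]


# Illustrative rows only; real field names come from the API response.
rows = [
    {"image_id": 1, "name": "Working memory map"},
    {"image_id": 2, "name": "Reward contrast"},
]
print(searchable_fields(rows))
print(search(rows, "name", "memory"))
```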

Downloading images from a specific collection does not work

nv.download_and_resample("/tmp/", "/Applications/fsl/data/standard/MNI152_T1_2mm_brain.nii.gz", collection_ids=[457])


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-5dd535d694c7> in <module>()
----> 1 nv.download_and_resample("/tmp/", "/Applications/fsl/data/standard/MNI152_T1_2mm_brain.nii.gz", collection_ids=[457])

/Users/filo/anaconda/lib/python2.7/site-packages/pyneurovault/api.pyc in download_and_resample(self, dest_dir, target, collection_ids, image_ids)
    162     resampled_path = os.path.join(dest_dir, "resampled")
    163     mkdir_p(resampled_path)
--> 164     combined_df = self.get_images_with_collections_df()
    165     # If the user has specified specific images
    166     if image_ids:

/Users/filo/anaconda/lib/python2.7/site-packages/pyneurovault/api.pyc in get_images_with_collections_df(self)
    117   def get_images_with_collections_df(self):
    118     """Downloads metadata about images/statistical maps stored in NeuroVault and enriches it with metadata of the corresponding collections. The result is returned as a pandas DataFrame"""
--> 119     collections_df = self.get_collections_df()
    120     images_df = self.get_images_df()
    121     combined_df = pd.merge(images_df, collections_df, how='left', on='collection_id',suffixes=('_image', '_collection'))

/Users/filo/anaconda/lib/python2.7/site-packages/pyneurovault/api.pyc in get_collections_df(self)
    109   def get_collections_df(self):
    110     """Return just collections data frame"""
--> 111     return self.collections.data
    112 
    113   def get_images_df(self):

/Users/filo/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name)
   1976                 return self[name]
   1977             raise AttributeError("'%s' object has no attribute '%s'" %
-> 1978                                  (type(self).__name__, name))
   1979 
   1980     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'data'

Lazy read of collections and images

Currently, api.NeuroVault fetches collections and images on object construction. This is very slow (~1 minute), and may not be relevant to the user.

    def __init__(self):
        self.collections = self.get_collections()      
        self.images = self.get_images()                
        print self

I suggest downloading these data only when needed. If you really want the current behavior available, I suggest toggling it with a constructor flag (perhaps preload=True or on_demand=False).
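
A minimal sketch of the suggested behavior, using properties that fetch on first access and a preload flag that restores eager loading. The class name LazyNeuroVault and the injected fetch callables are hypothetical; in the real class these would be the existing get_collections and get_images methods:

```python
class LazyNeuroVault(object):
    """Fetch collections and images on first access instead of at construction."""

    def __init__(self, fetch_collections, fetch_images, preload=False):
        self._fetch_collections = fetch_collections
        self._fetch_images = fetch_images
        self._collections = None
        self._images = None
        if preload:
            # Touch both properties to reproduce the eager behavior.
            self.collections
            self.images

    @property
    def collections(self):
        if self._collections is None:
            self._collections = self._fetch_collections()
        return self._collections

    @property
    def images(self):
        if self._images is None:
            self._images = self._fetch_images()
        return self._images


# Offline demonstration with stand-in fetchers.
nv = LazyNeuroVault(lambda: ["collection"], lambda: ["image"])
print(nv.images)
```

Each table is downloaded at most once per object, so repeated access costs nothing after the first call.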

Add ability to download without resampling

Download image metadata has key error

  images = api.get_images(collection_pks=collections.collection_id.tolist())

  KeyError                                  Traceback (most recent call last)
  <ipython-input-19-8a09080e4037> in <module>()
  ---> 94     images['collection'] = images['collection'].apply(lambda x: int(x.split("/")[-2]))
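
The traceback above is truncated, so the missing key is not shown. If the failure is that the collection field is absent from some records (an assumption), a guarded version of the id extraction on line 94 could look like this minimal sketch. Both function names are hypothetical:

```python
def collection_id_from_url(url):
    """Extract the trailing integer id from a collection URL, with or without a final slash."""
    return int(url.rstrip("/").split("/")[-1])


def collection_id(record):
    """Return the collection id for one image record, or None when the key is absent."""
    url = record.get("collection")
    return collection_id_from_url(url) if url else None


print(collection_id({"collection": "http://neurovault.org/api/collections/503/"}))
```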