scverse / scanpy Goto Github PK
View Code? Open in Web Editor NEWSingle-cell analysis in Python. Scales to >1M cells.
Home Page: https://scanpy.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
Single-cell analysis in Python. Scales to >1M cells.
Home Page: https://scanpy.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
e.g pp.filter_genes_dispersion
returns a dict with known keys. if you know the keys, this means you should return an object where you can use attribute access instead. usually the pythonic thing to do in this case is to return a namedtuple.
in this case though, we have
>>> {k: v.shape for k, v in filter_result.items()}
{'dispersions': (173351,),
'dispersions_norm': (173351,),
'gene_filter': (173351,),
'means': (173351,)}
which is a perfect case for a recarray.
hi, we want to converge our HDF5 representation with SingleCellExperiment’s.
SummarizedExperiment
already has a HDF5 repr
Hello,
it's me again, really thanks for your kindly reply before.
when I analyze my own data using sc.tl.dpt
with default n_branches
, it worked well, but when I set n_branches
more than 0, it occurred an error:
--> To enable computation of pseudotime, pass the index or expression vector
of a root cell. Either add
adata.add['iroot'] = root_cell_index
or (robust to subsampling)
adata.var['xroot'] = adata.X[root_cell_index, :]
where "root_cell_index" is the integer index of the root cell, or
adata.var['xroot'] = adata[root_cell_name, :].X
where "root_cell_name" is the name (a string) of the root cell.
perform Diffusion Pseudotime analysis
using "X_pca" for building graph
using stored data graph with n_neighbors = 30 and spectrum
[ 1. 0.9944264293 0.9934666753 0.9925051928 0.9899699688
0.9893597364 0.9855745435 0.9840251803 0.981688261 0.9806631804]
detect 1 branching
do not consider groups with less than 2742 points for splitting
branching 1: split group 0
WARNING: detected group with only [] cells
ValueError Traceback (most recent call last)
<ipython-input-3-b1749d943ac4> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'sc.tl.dpt(adata_corrected,n_jobs=48,n_pcs=30,allow_kendall_tau_shift=False,n_branchings=1)\nsc.logging.print_memory_usage()')
/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2113 magic_arg_s = self.var_expand(line, stack_depth)
2114 with self.builtin_trap:
-> 2115 result = fn(magic_arg_s, cell)
2116 return result
2117
<decorator-gen-59> in time(self, line, cell, local_ns)
/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1178 else:
1179 st = clock2()
-> 1180 exec(code, glob, local_ns)
1181 end = clock2()
1182 out = None
<timed exec> in <module>()
/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in dpt(adata, n_branchings, n_neighbors, knn, n_pcs, n_dcs, min_group_size, n_jobs, recompute_graph, recompute_pca, allow_kendall_tau_shift, flavor, copy)
127 adata.smp['dpt_pseudotime'] = dpt.pseudotime
128 # detect branchings and partition the data into segments
--> 129 dpt.branchings_segments()
130 # vector of length n_groups
131 adata.add['dpt_groups_order'] = [str(n) for n in dpt.segs_names_unique]
/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in branchings_segments(self)
188 for each segment.
189 """
--> 190 self.detect_branchings()
191 self.postprocess_segments()
192 self.set_segs_names()
/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in detect_branchings(self)
258 segs_connects,
259 segs_undecided,
--> 260 segs_adjacency, iseg, tips3)
261 # store as class members
262 self.segs = segs
/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in detect_branching(self, segs, segs_tips, segs_connects, segs_undecided, segs_adjacency, iseg, tips3)
464 # branching on the segment, return the list ssegs of segments that
465 # are defined by splitting this segment
--> 466 result = self._detect_branching(Dseg, tips3, seg)
467 ssegs, ssegs_tips, ssegs_adjacency, ssegs_connects, trunk = result
468 # map back to global indices
/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in _detect_branching(self, Dseg, tips, seg_reference)
632 if len(np.flatnonzero(newseg)) <= 1:
633 logg.warn('detected group with only {} cells'.format(np.flatnonzero(newseg)))
--> 634 secondtip = newseg[np.argmax(Dseg[tips[inewseg]][newseg])]
635 ssegs_tips.append([tips[inewseg], secondtip])
636 undecided_cells = np.arange(Dseg.shape[0], dtype=int)[nonunique]
/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/numpy/core/fromnumeric.py in argmax(a, axis, out)
971 except AttributeError:
972 return _wrapit(a, 'argmax', axis, out)
--> 973 return argmax(axis, out)
974
975
ValueError: attempt to get argmax of an empty sequence```
Does it mean that this data didn't contain any branches?
Howerer, I do see some small tips when I plot using `diffmap` result or `sc.tl.dpt` with default branch. Could you help me to figure it out?
Thanks,
jiping
I have a short script which reads a tab file and writes h5 using scanpy. I've found that unless I provide a full path to the write() function or at least a relative one via "./foo.h5" it fails. Simplified version:
adata = sc.read(args.input_file, ext='txt', first_column_names=True).transpose()
adata.write('./test.h5') # this works
adata.write('test2.h5') # this fails
Here's the stack:
WARNING: This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.
Traceback (most recent call last):
File "./convert_gear_group_single_cell_to_hdf5.py", line 47, in <module>
main()
File "./convert_gear_group_single_cell_to_hdf5.py", line 43, in main
adata.write('test2.h5')
File "/usr/local/lib/python3.5/dist-packages/anndata/base.py", line 1471, in write
compression=compression, compression_opts=compression_opts)
File "/usr/local/lib/python3.5/dist-packages/anndata/base.py", line 1513, in _write_h5ad
os.makedirs(os.path.dirname(filename))
File "/usr/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''
_____________________________
i found pointers about the “Edit on GitHub” button here, and this is the relevant template code.
maybe we can set meta['github_url']
like this, but this approach does nothing:
def process_docstring(app, what, name, obj, options, lines):
nm = name.replace('.', '/').replace('api/pp', 'preprocessing')
lines[:0] = [f':github_url: https://https://github.com/theislab/scanpy/tree/master/{nm}.py', '']
def setup(app):
app.connect('autodoc-process-docstring', process_docstring)
maybe we can get inspiration from the viewcode extension
Hi,
In the example notebook, seurat.ipynb
, the function sc.pp.normalize_per_cell()
is run before sc.pp.regress_out()
. Is it not better to regress out the effect of n_counts before normalization? I do not completely understand this and it would be great if the authors could explain this order of pre-processing. Also, is there certain order(s) of steps which should always be avoided?
Thank you.
Best,
Parashar
Hi, I was just trying to use the package but it seems that somebody is working on the master branch right now. Would it be possible to set up a development branch and maybe add a few tags for the working versions so that people could download a particular release instead of an in-progress master branch? I also noticed that the notebooks disappeared right after I cloned the repository. It seems like there are some big changes going on, so sorry if the timing for this issue is not right and you are just cleaning up the repository.
hi Alex
here my list :) thanks a lot! and I might expand it....
'needed'
'nice to have'
I'm running scanpy in a docker container with few privileges, e.g. the guest user.
Unfortunately, scanpy tries to create a folder in the local working directory:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/scanpy/__init__.py", line 14, in <module>
from . import settings
File "/usr/local/lib/python3.6/site-packages/scanpy/settings.py", line 13, in <module>
os.makedirs('.scanpy/')
File "/usr/local/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '.scanpy/'
Could you use a tempdir alternativley?
AnnData
1cec418#commitcomment-20744162AnnData
. maybe fields named var_*
in the HDF5 will be var metadata and so on?*apart from things still crashing, especially the group plotting should be fixed (should probably be transformed to one scatter
call with a list of all groups)
dont think to csv is actually coded
to xlsx raises error: pandas.core.common.PandasError: DataFrame constructor not properly called!
Hello,
I'm trying out the Graph abstraction and I get this error:
SetKeyError Traceback (most recent call last)
<ipython-input-12-928a85d4478e> in <module>()
----> 1 sc.tl.tsne(adata)
2 sc.tl.draw_graph(adata, random_state=5) # random_state just makes a cosmetic change
3 sc.write('krumsiek11_blobs', adata)
~/Downloads/scanpy/scanpy/tools/tsne.py in tsne(adata, n_pcs, perplexity, early_exaggeration, learning_rate, random_state, use_fast_tsne, recompute_pca, n_jobs, copy)
108 X_tsne = tsne.fit_transform(X)
109 # update AnnData instance
--> 110 adata.smp['X_tsne'] = X_tsne # annotate samples with tSNE coordinates
111 logg.info(' finished', t=True, end=' ')
112 logg.info('and added\n'
~/Downloads/scanpy/scanpy/data_structs/ann_data.py in __setitem__(self, keys, values)
382 # TODO: need to reallocate memory
383 # or allow storing objects, or use pd.dataframes
--> 384 raise SetKeyError(k, v.dtype, self.dtype[k])
385 super(BoundStructArray, self).__setitem__(k, v)
386
SetKeyError: Currently you cannot implicitly reallocate memory:
Setting the array for key X_tsne001of002 with dtype float64 requires too much memory, you should init AnnData with a large enough data type from the beginning.
Probably you try to assign a string of length 8 although the array can only store strings of length 4.
I'm using the latest git version of scanpy.
Any ideas?
Best wishes
rank_genes_groups
“returns” two recarrays, each with the shape #cells×#groups. one of them stores gene IDs, one the genes’ scores.
the problem with this is that recarrays store their column index (names) in the dtype, in a place where only strings are accepted. however users (and indeed both our wilcoxon example and the tests) may choose to use numeric group IDs.
genes with score 0 are unimportant anyway, so maybe we should return sparse data, in the form of a long-form recarray with something like this shape (with <group_by>
being the rank_genes_groups
parameter of the same name):
obs | var | <group_by> | score |
---|---|---|---|
0 | ENSGXXXX | 5 | 9.728 |
… | … | … | … |
This way the three IDs can have user-defined types, and the data is easier to process via e.g. pd.DataFrame.fromrecords(adata.obs['gene_ranking'])
The data should probably be sorted by descending z-scores by group, i.e. if it was a DataFrame: return gene_ranking.groupby(group_by).sort_values('score')
The task is to write the following preprocessing sequence using an AnnData instance adata.
meanFilter = 0.01
cvFilter = 2
nr_pcs = 50
ddata = adata.to_dict()
X = ddata['X']
# row normalize
X = row_norm(X, max_fraction=0.05, mult_with_mean=True)
# filter out genes with mean expression < 0.1 and coefficient of variance <
# cvFilter
X, gene_filter = filter_genes_cv(X, meanFilter, cvFilter)
# compute zscore of filtered matrix
Xz = zscore(X)
# PCA
Xpca = pca(Xz, nr_comps=nr_pcs)
# update dictionary
ddata['X'] = X
ddata['Xpca'] = Xpca
ddata['var_names'] = ddata['var_names'][gene_filter]
sett.m(0, 'Xpca has shape',
ddata['Xpca'].shape[0], 'x', ddata['Xpca'].shape[1])
from ..ann_data import AnnData
adata = AnnData(ddata)
print(adata.X)
While the previous snippet works just as expected, when I want to do the same without a ddata object, some uncontrolled behavior comes up. Indexing doesn't work as expected anymore. @flying-sheep: could you have a look at why adata['Xpca'] = Xpca
in the following throws an
>>> adata['Xpca'] = Xpca
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
in the following snippet
X = adata.X
# row normalize
X = row_norm(X, max_fraction=0.05, mult_with_mean=True)
# filter out genes with mean expression < 0.1 and coefficient of variance <
# cvFilter
X, gene_filter = filter_genes_cv(X, meanFilter, cvFilter)
# compute zscore of filtered matrix
Xz = zscore(X)
# PCA
Xpca = pca(Xz, nr_comps=nr_pcs)
# update adata
adata.X = X
adata = adata.var_names[gene_filter] # filter genes
adata['Xpca'] = Xpca
sett.m(0, 'Xpca has shape',
adata['Xpca'].shape[0], 'x', adata['Xpca'].shape[1])
print(adata.X)
I played around quite some bit, but the only solution that I got running then had the numerically incorrect result. It's quite to hard to keep this sequence of steps nicely organized.
PS: the snippet appears in scanpy/preprocess/advanced.py
and an example would be ./scanpy.py nestorowa16 diffmap -r pp
.
so idk about you, but i think CFFI is pretty painless for interacting with C/C++ code:
.so
file, use ffi.dlopen('….so')
, ffi.cdef('void myfunc(…)')
and are able to call it from python.we just have to think if we need to pass complex data structures, but i assume a few dense and sparse matrices is all you need. we just have to figure that out beforehand, and how to create a sparse matrix from raw memory (AFAIK everything under the sun supports at least column compressed layout, we might want to switch csr_matrix
→ csc_matrix
)
Hello,
When I call 'dpt_scatter' with the groups parameter I get the following error:
NameError: name 'names' is not defined
It looks like this is from line 230 in scanpy/plotting/ann_data.py and the 'names' variable just doesn't exist.
I'm assuming it should just be 'groups'?
Thanks,
Sarah
I am trying to toy with the krumsiek11 model, but the sc.tl.sim
call seems to ignore parameters and always uses the parameters from the krumsiek11_params.txt
file. In particular, running:
adam_krumsiek11 = sc.tl.sim('krumsiek11')
adam_krumsiek11_2 = sc.tl.sim('krumsiek11', nrRealizations = 1, seed = 1665487)
sc.pl.sim(adam_krumsiek11 )
sc.pl.sim(adam_krumsiek11_2)
produces two exactly identical figures with 4 realizations.
I also tried to set read_params_from_file = False
(this is not documented at http://scanpy.readthedocs.io/en/latest/api/scanpy.api.tl.sim.html but seemed relevant). However, running
adam_krumsiek11 = sc.tl.sim('krumsiek11', nrRealizations = 1, read_params_from_file = False)
sc.pl.sim(adam_krumsiek11)
results in IndexError
and running
adam_krumsiek11 = sc.tl.sim('krumsiek11', nrRealizations = 1, tmax = 800, read_params_from_file = False)
sc.pl.sim(adam_krumsiek11)
avoids the error, but gives the exact same figure as the first code segment.
Maybe I am not understanding correctly, how the function should work? (in which case this would be a documentation issue) Or is there really something wrong?
Thanks for any hints.
installing the master branch via pip3 install --user --upgrade git+https://github.com/theislab/scanpy.git
fails:
UPDATING build/lib.linux-x86_64-3.5/None
error: [Errno 2] No such file or directory: 'build/lib.linux-x86_64-3.5/None'
maybe _version.py
returns None
instead of a version string for the master branch.
installing the tag works: pip3 install --user --upgrade git+https://github.com/theislab/[email protected]
but the installed version will be 0.1
, not 0.0
as the tag says (i’m not sure 0.0 is a legal version anyway)
I want to split AnnData after tl.diffmap according to each cell's library. But it appears that row-slicing AnnData after diffmap, dpt, or louvain gives the error message AttributeError: 'AnnData' object has no attribute '_n_obs'
. But AnnData.X and AnnData.obs can be sliced. Could you please give me advice?
>>> adata = sc.read_10x_h5('filtered_gene_bc_matrices_h5.h5', 'mm10')
>>> scanpy.api.tl.diffmap(adata)
>>> adata_diffmap[:, 0]
View of AnnData object with n_obs × n_vars = 5000 × 1
>>> adata_diffmap[0, :]
AttributeError: 'AnnData' object has no attribute '_n_obs'
everyone agrees it’s objectively bad 😉
therefore the characters “jet” should only appear in the commit message “jettisoned bad colormap defaults”
i’ll do it if you want, and i’ll make sure contrast and distinction is preserved in all figures
%matplotlib inline
import scanpy
when importing scanpy from a jupyter notebook I get this warning, because apparently scanpy calls matplotlib.use()
. if at all, it should only do that after checking that no backend is already selected
/home/icb/philipp.angerer/.local/lib/python3.5/site-packages/matplotlib/__init__.py:1401: UserWarning: This call to matplotlib.use() has no effect
because the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
warnings.warn(_use_error_msg)
and this, which isn’t actually a Warning
, is printed to stdout (why?)
... WARNING: did not find DISPLAY variable needed for interactive plotting
--> try ssh with `-X` or `-Y`
setting `sett.savefigs = True`
in an interactive notebook or other shell, sett.savefigs
shouldn’t be automatically set to True
:class:~scanpy.api.AnnData
appears. :class:AnnData <http://anndata.readthedocs.io/en/latest/anndata.AnnData.html>
has the correct css style, but does not hyperlink. Probably a solution via http://www.sphinx-doc.org/en/1.5.1/ext/extlinks.html together with the definition of an :extclass:
role that inherits the :class:
properties would be the correct way to do it.__init__
method appears in AnnData
attributes
appear after methods
Looking forward to plot genes that are not used in pca/tsne analysis while exist in the raw data!
is there a way to systematically find the ideal k? There is a find_sigmas function in the R destiny package. Is the parameter k related to the parameter sigma? Thanks.
I would be nice to have a method that can merge several AnnData objects together.
Thanks for you work!
There is a feature that I would be nice to have. Once the DPT has been calculated, it would be nice to be able to "predict" (like in the R package destiny) new data into the calculate DifussionMap . It would be great to include a label that tells apart the new cells from the already existing ones.
Thanks
Hi,
May be I do not understand this clearly, but does the function sc.pp.normalize_per_cell
not supposed normalize the counts? However when I run this function, I see that the values do not change at all. Am I doing something wrong here?
raw_data.X.toarray().sum(axis=1)
Out[18]: array([ 23037., 18883., 20755., ..., 14785., 20996., 7604.], dtype=float32)
normed_data = sc.pp.normalize_per_cell(raw_data, copy=True)
normed_data.X.toarray().sum(axis=1)
Out[19]: array([ 23037., 18883., 20755., ..., 14785., 20996., 7604.], dtype=float32)
Thanks
This collects new features of version 0.3.1 while extending other issues such as #45.
I have some issues runnign tSNE with sc.tsne(adata)
. It seems to work on the moignard15
data set but running the same code with my data set results in the following error.
compute tSNE
preprocess using PCA with 50 PCs
--> avoid this by setting n_pcs = 0
0:00:02.013 - compute PCA with n_comps = 50
0:00:00.162 - finished
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-5-ea03cbb426c5> in <module>()
----> 1 sc.tsne(adata)
/opt/conda/lib/python3.6/site-packages/scanpy/tools/tsne.py in tsne(adata, random_state, n_pcs, perplexity)
59 sett.m(0, 'preprocess using PCA with', n_pcs, 'PCs')
60 sett.m(0, '--> avoid this by setting n_pcs = 0')
---> 61 X = pca(adata.X, random_state=random_state, n_comps=n_pcs)
62 adata['X_pca'] = X
63 else:
/opt/conda/lib/python3.6/site-packages/scanpy/tools/pca.py in pca(adata_or_X, n_comps, zero_center, svd_solver, random_state)
60 zero_center, svd_solver,
61 random_state=random_state)
---> 62 adata['X_pca'] = X_pca
63 if isadata:
64 return adata
UnboundLocalError: local variable 'adata' referenced before assignment
Hi, I hit this error when trying to filter genes.
A minimal working example is included below.
Any help appreciated.
def paul15_raw():
filename = 'data/paul15/paul15.h5'
backup_url = 'http://falexwolf.de/data/paul15.h5'
adata = sc.read(filename, 'data.debatched', backup_url=backup_url)
# each row has to correspond to a sample, therefore transpose
adata = adata.transpose() # cluster assocations identified by Paul et al.
clusters = sc.read(filename, 'cluster.id', return_dict=True)['X'].flatten()
# names reflecting the cell type identifications from the paper
cell_types = {i: 'Ery' for i in range(1, 7)}
cell_types[7] = 'MEP'
cell_types[8] = 'Mk'
cell_types[9] = 'GMP'
cell_types[10] = 'GMP'
cell_types[11] = 'DC'
cell_types[12] = 'Baso'
cell_types[13] = 'Baso'
cell_types[14] = 'Mo'
cell_types[15] = 'Mo'
cell_types[16] = 'Neu'
cell_types[17] = 'Neu'
cell_types[18] = 'Eos'
cell_types[19] = 'Other'
adata.smp['paul15_clusters'] = [str(i) + cell_types[i] for i in clusters.astype(int)]
infogenes_names = sc.read(filename, 'info.genes_strings', return_dict=True)['X']
# just keep the first of the two equivalent names per gene
adata.var_names = np.array([gn.split(';')[0] for gn in adata.var_names])
# remove 10 corrupted gene names
infogenes_names = np.intersect1d(infogenes_names, adata.var_names)
# restrict the data to the 3461 informative genes
adata = adata[:, infogenes_names]
adata.add['iroot'] = np.flatnonzero(adata.smp['paul15_clusters'] == '7MEP')[0]
return adata
adata = paul15_raw()
afilter = sc.pp.recipe_zheng17(adata, n_top_genes=1000, zero_center=True, plot=True, copy=True)
or
afilter = sc.pp.filter_genes_dispersion(adata, n_top_genes=1000)
both fail with
AttributeError: 'Series' object has no attribute 'is_dtype_equal'
when computing the dispersion norm (line 207, simple.py)
207 df['dispersion_norm'] = (df['dispersion'].values # use values here as index differs
--> 208 - disp_mean_bin[df['mean_bin']].values) \
209 / disp_std_bin[df['mean_bin']].values
Running Scanpy version 0.2.6 on 2017-08-23 14:35.
Is there any way to estimate the number of branching points? It seems that this number has to be explicitly given before running the algorithm, which might be difficult if it's not clear from simply looking at the dimensionally reduced data.
Hi,
when I use dpt_timeseries function with adata.n_var > 50, there is no output or any hint and just empty.
I saw the source code and there is a comment # only if number of genes is not too high
Does it mean this function could not deal with n_var > 50?
I think there would be a hint when n_var > 50, if not it will confuse others. Thank you~
I've been able to successfully use scanpy for MEX-formatted datasets, and the documentation here was great in the Jupyter notebooks.
Many of our other datasets are in a simple tab format where the first column is gene symbol and the rest are in GROUP--CELLID format, like this:
Gene_symbol lymph_1--cell_avg lymph_1--Cell_1 lymph_1--Cell_10
A1BG.AS1 13.9085855833156 0 54.3778851449283
A2M.AS1 10.2185780428145 0 0
A2MP1 0 0 0
AADACL2 0 0 0
AAGAB 136.889472532613 0 0
AAR2 76.3090843598131 0 0
AATF 360.127068564485 0 0
AATK 2.93980819712579 0 0
AATK.AS1 0 0 0
These contain up to 30,000 rows and (so far) 400,000 columns.
I'd love to see more use cases for how to absorb different formats of data into scanpy. I'd be happy to help write it, but so far have only got MEX data to work.
I'm trying to install scanpy through pip install scanpy
but I'm getting this weird error
$ pip install scanpy
Collecting scanpy
Downloading scanpy-0.2.9.1.tar.gz (208kB)
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-lbk_t73k/scanpy/setup.py", line 39, in <module>
readme = readme_f.read()
File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 296: ordinal not in range(128)
That's with pip version 9.0.1
and python 3.6. I'm getting similar errors for older versions of scanpy, including 0.2.1. Perhaps this is a bug in pip but I'm not sure, I installed a bunch of other unrelated packages without any issues.
Hi Guys,
Thanks for developing such a wonderful tool in python.
Do you mind to point me on how to install scanpy under anaconda environment? currently "conda search scanpy" doesn't find it?!
Thanks
Hashem
I tried running https://github.com/theislab/scanpy_usage/blob/master/170501_moignard15/moignard15.ipynb and got this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/networkx/classes/graph.py in neighbors(self, n)
1058 try:
-> 1059 return list(self.adj[n])
1060 except KeyError:
KeyError: None
During handling of the above exception, another exception occurred:
NetworkXError Traceback (most recent call last)
<ipython-input-11-f3d9663e2b3b> in <module>()
1 adata.add['dpt_groups_names'] = ['undecided/endothelial', 'endothelial', 'erythrocytes', 'trunk'] # optional
----> 2 sc.pl.dpt(adata, color=['dpt_pseudotime', 'dpt_groups', 'exp_groups'], legendloc='upper left')
~/Documents/scanpy/scanpy/plotting/__init__.py in dpt(adata, basis, color, names, comps, cont, layout, legendloc, cmap, pal, right_margin, size, titles, show)
385 if not isinstance(color, list): colors = color.split(',')
386 else: colors = color
--> 387 if 'dpt_groups' in colors: dpt_tree(adata, show=False)
388 dpt_timeseries(adata, cmap=cmap, show=show)
389
~/Documents/scanpy/scanpy/plotting/__init__.py in dpt_tree(adata, root, colors, names, show, fontsize)
463 if name in sett._ignore_categories: colors[iname] = 'grey'
464 G = nx.Graph(adata.add['dpt_groups_adjacency'])
--> 465 pos = utils.hierarchy_pos(G, root)
466 fig = pl.figure(figsize=(5, 5))
467 ax = pl.axes([0, 0, 1, 1], frameon=False)
~/Documents/scanpy/scanpy/plotting/utils.py in hierarchy_pos(G, root, levels, width, height)
455
456 if levels is None:
--> 457 levels = make_levels({})
458 else:
459 levels = {l: {TOTAL: levels[l], CURRENT: 0} for l in levels}
~/Documents/scanpy/scanpy/plotting/utils.py in make_levels(levels, node, currentLevel, parent)
434 levels[currentLevel] = {TOTAL: 0, CURRENT: 0}
435 levels[currentLevel][TOTAL] += 1
--> 436 neighbors = G.neighbors(node)
437 if parent is not None:
438 neighbors.remove(parent)
/usr/local/lib/python3.6/site-packages/networkx/classes/graph.py in neighbors(self, n)
1059 return list(self.adj[n])
1060 except KeyError:
-> 1061 raise NetworkXError("The node %s is not in the graph." % (n,))
1062
1063 def neighbors_iter(self, n):
NetworkXError: The node None is not in the graph.
Hi,
I have tried the Moignard early blood development example and your results can be reproduced. However, I have some questions as follows.
What does the dpt_order in the final result mean?
I thought it should be the order of cells according to their pseudotime. However, after checking the "smp.csv" in the generated data, it seems there is no correspondence between dpt_order and dpt_pseudotime. For instance, the order of 3 cells with the smallest dpt_pseudotime are 3308, 1813, 3301 respectively (instead of 0, 1, 2).
How to adjust the figure size of sc.pl.dpt?
I have tried to put plt.figure(figsize=(15,15))
, but it seems that it doesn't work. Maybe we have to plot each figure manually.
Can we get information about branch assignment?
In this example, there are two branches. How can we know which branch each cell belongs to if possible? Do we have to find this indirectly with the dpt_group information?
Thanks.
Hi, first thanks for sharing this analysis tool. I prefer Python much more to R, though most Bioinformatics tools are written in R. Here I want to ask a question about data processing before we feed it as adata into dpt for pseudotime ordering.
As the DPT algorithm can accept multiple types of data, such as the most commonly single-cell qPCR (Ct values) and RNA-Seq (FPKM/TPM) data, is the data processing procedure identical with each other? Since I have also checked the Monocle 2 algorithm, it seems much more complicated in Monocle 2. For instance, in the 4th page of its document link, it asks you to specify different expressionFamily, i.e., the proper distribution of the data, for different kinds of data. Then, how about the dpt function in scanpy? Does it take all kinds of data the same way?
According to my understanding,
Any help is appreciated.
struct_dict['k']
should return a simple 1D ndarray, not a StructDict
struct_dict[['a', 'b']]
should return a proper StructDict
including all additional fields like _keys
both problems are visible in this test: https://travis-ci.org/theislab/scanpy/jobs/227216146#L224
but a test should be added for the second point once the first point is fixed
i would have fixed it, but i don’t have the slightest idea what all the “multicolumn” fields are for. also i don’t believe in too many caches and private fields, they get out of sync too easily. recomputing tiny things is fast, e.g. self._keys
could simply be replaced with self.dtype.names
E.g. matplotlib is only necessary when plotting, and for e.g. Docker images, it would be useful to have a slim scanpy core.
An idea would be to do it like Jupyter:
scanpy-core
PyPI package with just the essentials.scanpy
metapackage, which depends on scanpy-core
and most (or all) of the optional dependencies.Users doing pip install scanpy
will get the full package, with no annoying runtime errors, and packagers needing flexibility get scanpy-core
and can slim everything down as needed.
cc @hensing
I got an error doing pip3 install -e .
:
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include -I/usr/local/opt/openssl/include -I/usr/local/opt/sqlite/include -I/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/include/python3.6m -c scanpy/cython/utils_cy.c -o build/temp.macosx-10.11-x86_64-3.6/scanpy/cython/utils_cy.o scanpy/cython/utils_cy.c:435:10: fatal error: 'numpy/arrayobject.h' file not found #include "numpy/arrayobject.h" ^ 1 error generated. error: command 'clang' failed with exit status 1 ----------------------------------------
Command "/usr/local/opt/python3/bin/python3.6 -c "import setuptools, tokenize;file='/Users/jyhung/Documents/scanpy/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" develop --no-deps" failed with error code 1
It worked after I installed cython: pip3 install cython
hi,
I installed the scanpy-master , but when I type scanpy --help
in bash the error occurred. I noticed that diffrank was replaced by rank_genes_groups, but I don't know how to fix it.
Traceback (most recent call last):
File "/public/bioapps/ana/anaconda3/envs/python35/bin/scanpy", line 11, in <module>
load_entry_point('scanpy==0+unknown', 'console_scripts', 'scanpy')()
File "/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/scanpy-0+unknown-py3.5-linux-x86_64.egg/scanpy/__main__.py", line 278, in main
init_main_parser().print_help()
File "/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/scanpy-0+unknown-py3.5-linux-x86_64.egg/scanpy/__main__.py", line 117, in init_main_parser
descr = 78*'-' + '\n' + getattr(tools, key).__doc__
AttributeError: module 'scanpy.api.tools' has no attribute 'diffrank'
In DropSeq experiments cell names are encoded by 12nt barcodes. It seems that no name check is performed when merging multiple datasets in ScanPy.
>>> from collections import Counter
>>> import scanpy.api as sc
>>> f = sc.read("data1.txt").transpose()
>>> g = sc.read("data2.txt").transpose()
>>> c = f.concatenate(g)
>>> len(c.obs_names)
7932
>>> len(set(c.obs_names))
7890
>>> cc = Counter(c.obs_names)
>>> cc.most_common(10)
[('AAAAAAAAAAAA', 2), ('TCCTGTCTCTTA', 2), ('CGCAAGGGAAAG', 2), ('ACCCGTCTATGT', 2), ('CTCCTGTCTCTT', 2), ('TTCCTGTCTCTT', 2), ('CCCTGTCTCTTA', 2), ('CCGCTGTCTCTT', 2), ('GACAAACCTACC', 2), ('ACACTGTCTCTT', 2)]
I was trying to reproduce the results in Example 1 on notebook
https://github.com/theislab/scanpy_usage/tree/master/170505_seurat
I'm getting two problems in the filtering steps in cell 9:
although genes seem to be filtered (there are 1838 genes left versus 13714 before), the plot does not show a different colour for 'highly variable' and 'other' genes. Both appear black (see attached figure). I've both tried it in a jupyter notebook and ipython. I'm running python in a conda environment with matplotlib 4.3.2.25.py35_0 and seaborn 0.8_py35
There's also the following warning message, that seems to complain of a divide by zero on the mean:
/anaconda/lib/python3.5/site-packages/scanpy/preprocessing/simple.py:193: RuntimeWarning: invalid value encountered in true_divide
dispersion = var / mean
Is
Thanks!
I just have scanpy 0.2.7 and am trying to produce bpmc3 results. BUT right at the beginning (sc.read()) the following error! I will appreciate your help.
thanks
`--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
2 filename_genes = '/ifs/projects/proj077/backup/public_data/scanpy_tutorials_data/PBMC3K/filtered_gene_bc_matrices/hg19/genes.tsv'
3 filename_barcodes = '/ifs/projects/proj077/backup/public_data/scanpy_tutorials_data/PBMC3K/filtered_gene_bc_matrices/hg19/barcodes.tsv'
----> 4 adata = sc.read(filename_data, cache=True).transpose()
5 adata.var_names = np.genfromtxt(filename_genes, dtype=str)[:, 1]
6 adata.smp_names = np.genfromtxt(filename_barcodes, dtype=str)
/ifs/devel/hashem/sw-v1/conda/lib/python3.6/site-packages/scanpy/readwrite.py in read(filename_or_filekey, sheet, ext, delimiter, first_column_names, backup_url, return_dict, cache)
73 if is_filename(filename_or_filekey):
74 data = read_file(filename_or_filekey, sheet, ext, delimiter,
---> 75 first_column_names, backup_url, cache)
76 if isinstance(data, dict):
77 return data if return_dict else AnnData(data)
/ifs/devel/hashem/sw-v1/conda/lib/python3.6/site-packages/scanpy/readwrite.py in read_file(filename, sheet, ext, delimiter, first_column_names, backup_url, cache)
364 os.makedirs(os.path.dirname(filename_cache))
365 # write for faster reading when calling the next time
--> 366 write_dict_to_file(filename_cache, ddata, sett.file_format_data)
367 return ddata
368
/ifs/devel/hashem/sw-v1/conda/lib/python3.6/site-packages/scanpy/readwrite.py in write_dict_to_file(filename, d, ext)
771 d_write[key] = value
772 # now open the file
--> 773 wait_until_file_unused(filename) # thread-safe writing
774 if ext == 'h5':
775 with h5py.File(filename, 'w') as f:
/ifs/devel/hashem/sw-v1/conda/lib/python3.6/site-packages/scanpy/readwrite.py in wait_until_file_unused(filename)
935
936 def wait_until_file_unused(filename):
--> 937 while (filename in get_used_files()):
938 time.sleep(1)
939
/ifs/devel/hashem/sw-v1/conda/lib/python3.6/site-packages/scanpy/readwrite.py in get_used_files()
919 def get_used_files():
920 """Get files used by processes with name scanpy."""
--> 921 loop_over_scanpy_processes = (proc for proc in psutil.process_iter()
922 if proc.name() == 'scanpy')
923 filenames = []
AttributeError: module 'psutil' has no attribute 'process_iter'
`
Hello,
I've tried setting up your package via pip, as instructed. This crashed out very quickly:
mib111492i:~ kp9$ pip install scanpy
Collecting scanpy
Downloading scanpy-0.2.9.1.tar.gz (208kB)
100% |################################| 215kB 2.8MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/tmp/pip-build-cx2i4lbu/scanpy/setup.py", line 39, in <module>
readme = readme_f.read()
File "/Users/kp9/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 296: ordinal not in range(128)
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-build-cx2i4lbu/scanpy/
The offender seems to be the stylised README file, so I downloaded the source code, got rid of it, and proceeded with the installation. I'm unsure how representative the following encountered issues are of an ideal pip installation, but I figured I'd bring them to your attention anyway just in case they're relevant:
brew install hdf5
, and it seems like the most likely of these issues to affect other users.Thanks!
adata_comp
is only defined in the if branch:
Recently I installed scanpy 0.4. However, with this new version I could not correctly load result files generated by an old version (v0.2.8). In particular, I could not load the old add_keys as uni_keys.
Any suggestions?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.