bioinfo-biols / sevtras



sEV-containing droplet identification in scRNA-seq data (SEVtras)

License: GNU Affero General Public License v3.0

Python 100.00%

sevtras's People

Forkers

bit-vs-it fly-fiancee filmchen dr-smectite

sevtras's Issues

error in SEVtras.ESAI_calculator when constructing adata_cell.raw

Dear developer:
Because SEVtras requires the data to be saved as adata.raw before any filtering and normalization steps (unlike the standard Scanpy workflow, which saves adata.raw after quality-control filtering and log transformation), I kept an identical, untreated copy of the object, adata2 (n_obs × n_vars = 298308 × 32285), when the object was created. After completing the standard Scanpy analysis on adata1 (n_obs × n_vars = 232884 × 25113) and obtaining the cell types, I ran adata.raw = adata2.copy() and got an adata.raw.shape of (232884, 32285). This appeared to filter the cells without filtering any genes, which is what I wanted. However, when I ran ESAI_calculator, I encountered the following error:

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/SEVtras/main.py:188, in ESAI_calculator(adata_ev_path, adata_cell_path, out_path, species, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW, plot_cmp, save_plot_prefix, OBSMumap, size)
186 adata_cell = read_adata(adata_cell_path, get_only=False)
187 from .functional import deconvolver, ESAI_celltype, plot_SEVumap, plot_ESAIumap
--> 188 celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, species, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
189 ##ESAI for sample
190 sample_ESAI = (adata_com[adata_com.obs[OBScelltype]==OBSev,].obs[OBSsample].value_counts() / adata_com[adata_com.obs[OBScelltype]!=OBSev,].obs[OBSsample].value_counts()).fillna(0)

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/SEVtras/functional.py:117, in deconvolver(adata_ev, adata_cell, species, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
115 def deconvolver(adata_ev, adata_cell, species, OBSsample='batch', OBScelltype='celltype', OBSev='sEV', OBSMpca='X_pca', cellN=10, Xraw = True, normalW=True):
--> 117 adata_combined = preprocess_source(adata_ev, adata_cell, OBScelltype=OBScelltype, OBSev=OBSev, Xraw = Xraw)
118 gsea_pval_dat = source_biogenesis(adata_cell, species, OBScelltype=OBScelltype, Xraw = Xraw, normalW=normalW)
119 near_neighbor_dat = near_neighbor(adata_combined, OBSsample=OBSsample, OBSev=OBSev, OBScelltype=OBScelltype, OBSMpca=OBSMpca, cellN=cellN)

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/SEVtras/functional.py:74, in preprocess_source(adata_ev, adata_cell, OBScelltype, OBSev, Xraw)
71 def preprocess_source(adata_ev, adata_cell, OBScelltype='celltype', OBSev='sEV', Xraw = True):
72 ## cell type
73 if Xraw:
---> 74 adata_cell_raw = copy.copy(adata_cell.raw.to_adata())
75 else:
76 adata_cell_raw = copy.copy(adata_cell)

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/anndata/_core/raw.py:159, in Raw.to_adata(self)
156 """Create full AnnData object."""
157 from anndata import AnnData
--> 159 return AnnData(
160 X=self.X.copy(),
161 var=self.var.copy(),
162 varm=None if self._varm is None else self._varm.copy(),
163 obs=self._adata.obs.copy(),
164 obsm=self._adata.obsm.copy(),
165 obsp=self._adata.obsp.copy(),
166 uns=self._adata.uns.copy(),
167 )

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/anndata/_core/anndata.py:271, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
269 self._init_as_view(X, oidx, vidx)
270 else:
--> 271 self._init_as_actual(
272 X=X,
273 obs=obs,
274 var=var,
275 uns=uns,
276 obsm=obsm,
277 varm=varm,
278 raw=raw,
279 layers=layers,
280 dtype=dtype,
281 shape=shape,
282 obsp=obsp,
283 varp=varp,
284 filename=filename,
285 filemode=filemode,
286 )

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/anndata/_core/anndata.py:453, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
450 source = "shape"
452 # annotations
--> 453 self._obs = _gen_dataframe(
454 obs, ["obs_names", "row_names"], source=source, attr="obs", length=n_obs
455 )
456 self._var = _gen_dataframe(
457 var, ["var_names", "col_names"], source=source, attr="var", length=n_vars
458 )
460 # now we can verify if indices match!

File ~/miniconda3/envs/scanpy/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
885 if not args:
886 raise TypeError(f'{funcname} requires at least '
887 '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/miniconda3/envs/scanpy/lib/python3.10/site-packages/anndata/_core/aligned_df.py:64, in _gen_dataframe_df(anno, index_names, source, attr, length)
54 @_gen_dataframe.register(pd.DataFrame)
55 def _gen_dataframe_df(
56 anno: pd.DataFrame,
(...)
61 length: int | None = None,
62 ):
63 if length is not None and length != len(anno):
---> 64 raise _mk_df_error(source, attr, length, len(anno))
65 anno = anno.copy(deep=False)
66 if not is_string_dtype(anno.index):

ValueError: Observations annot. obs must have as many rows as X has rows (298308), but has 232884 rows.

Perhaps my understanding of the Scanpy data structure is insufficient and I did not create adata.raw correctly. How should I obtain an adata_cell that is annotated with cell types? Judging from the error message, should I avoid filtering any cells or genes at all?
I look forward to your help in resolving this issue.

ValueError: Bin edges must be unique:

Run:
SEVtras.ESAI_calculator(adata_ev_path="./output/sEV_SEVtras_sample10.h5ad",
adata_cell_path='./scanpy_output/adata_gex_10sample.h5ad', out_path='./output/',
Xraw=False, OBSsample='sampleName', OBScelltype='celltype')

error:
/home/data/wangp_sc/.conda/envs/SEVtras_env/lib/python3.8/site-packages/anndata/_core/merge.py:942: UserWarning: Only some AnnData objects have `.raw` attribute, not concatenating `.raw` attributes.
warn(
/home/data/wangp_sc/.conda/envs/SEVtras_env/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
[AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/home/data/wangp_sc/.conda/envs/SEVtras_env/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
[AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/home/data/wangp_sc/.conda/envs/SEVtras_env/lib/python3.8/site-packages/scanpy/preprocessing/_normalization.py:197: UserWarning: Some cells have zero counts
warn(UserWarning('Some cells have zero counts'))
/home/data/wangp_sc/.conda/envs/SEVtras_env/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py:352: RuntimeWarning: invalid value encountered in log1p
np.log1p(X, out=X)

ValueError Traceback (most recent call last)
Cell In[24], line 1
----> 1 SEVtras.ESAI_calculator(adata_ev_path="./output/sEV_SEVtras_sample10.h5ad",
2 adata_cell_path='./scanpy_output/adata_gex_10sample.h5ad', out_path='./output/',
3 Xraw=False, OBSsample='sampleName', OBScelltype='celltype')

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/SEVtras/main.py:188, in ESAI_calculator(adata_ev_path, adata_cell_path, out_path, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW, plot_cmp, save_plot_prefix, OBSMumap, size)
186 adata_cell = read_adata(adata_cell_path, get_only=False)
187 from .functional import deconvolver, ESAI_celltype, plot_SEVumap, plot_ESAIumap
--> 188 celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
189 ##ESAI for sample
190 sample_ESAI = (adata_com[adata_com.obs[OBScelltype]==OBSev,].obs[OBSsample].value_counts() / adata_com[adata_com.obs[OBScelltype]!=OBSev,].obs[OBSsample].value_counts()).fillna(0)

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/SEVtras/functional.py:114, in deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
112 def deconvolver(adata_ev, adata_cell, OBSsample='batch', OBScelltype='celltype', OBSev='sEV', OBSMpca='X_pca', cellN=10, Xraw = True, normalW=True):
--> 114 adata_combined = preprocess_source(adata_ev, adata_cell, OBScelltype=OBScelltype, OBSev=OBSev, Xraw = Xraw)
115 gsea_pval_dat = source_biogenesis(adata_cell, OBScelltype=OBScelltype, Xraw = Xraw, normalW=normalW)
116 near_neighbor_dat = near_neighbor(adata_combined, OBSsample=OBSsample, OBSev=OBSev, OBScelltype=OBScelltype, OBSMpca=OBSMpca, cellN=cellN)

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/SEVtras/functional.py:88, in preprocess_source(adata_ev, adata_cell, OBScelltype, OBSev, Xraw)
86 sc.pp.normalize_total(adata_combined, target_sum=1e4)
87 sc.pp.log1p(adata_combined)
---> 88 sc.pp.highly_variable_genes(adata_combined, min_mean=0.0125, max_mean=3, min_disp=0.5)
89 # sc.pl.highly_variable_genes(Normal_combined)
90 adata_combined = adata_combined[:, adata_combined.var.highly_variable]#highly_variable

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py:440, in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key, check_values)
428 return _highly_variable_genes_seurat_v3(
429 adata,
430 layer=layer,
(...)
436 inplace=inplace,
437 )
439 if batch_key is None:
--> 440 df = _highly_variable_genes_single_batch(
441 adata,
442 layer=layer,
443 min_disp=min_disp,
444 max_disp=max_disp,
445 min_mean=min_mean,
446 max_mean=max_mean,
447 n_top_genes=n_top_genes,
448 n_bins=n_bins,
449 flavor=flavor,
450 )
451 else:
452 sanitize_anndata(adata)

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py:215, in _highly_variable_genes_single_batch(adata, layer, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor)
213 df['dispersions'] = dispersion
214 if flavor == 'seurat':
--> 215 df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
216 disp_grouped = df.groupby('mean_bin')['dispersions']
217 disp_mean_bin = disp_grouped.mean()

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/pandas/core/reshape/tile.py:293, in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
290 if (np.diff(bins.astype("float64")) < 0).any():
291 raise ValueError("bins must increase monotonically.")
--> 293 fac, bins = _bins_to_cuts(
294 x,
295 bins,
296 right=right,
297 labels=labels,
298 precision=precision,
299 include_lowest=include_lowest,
300 dtype=dtype,
301 duplicates=duplicates,
302 ordered=ordered,
303 )
305 return _postprocess_for_cut(fac, bins, retbins, dtype, original)

File ~/.conda/envs/SEVtras_env/lib/python3.8/site-packages/pandas/core/reshape/tile.py:420, in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
418 if len(unique_bins) < len(bins) and len(bins) != 2:
419 if duplicates == "raise":
--> 420 raise ValueError(
421 f"Bin edges must be unique: {repr(bins)}.\n"
422 f"You can drop duplicate edges by setting the 'duplicates' kwarg"
423 )
424 bins = unique_bins
426 side: Literal["left", "right"] = "left" if right else "right"

ValueError: Bin edges must be unique: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan]).
You can drop duplicate edges by setting the 'duplicates' kwarg
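The log1p "invalid value" warning just before this traceback suggests that the matrix read with Xraw=False held negative values, for example scaled data. log1p returns NaN for values below -1, and NaN gene means then make every pd.cut bin edge NaN. preprocess_source runs normalize_total and log1p itself on the matrix it reads (adata_cell.X with Xraw=False, adata_cell.raw.X otherwise), so that matrix presumably needs to contain raw counts. This is an inference from the log, not a confirmed diagnosis. The heuristic below checks whether a matrix is finite, non-negative and integer-valued. looks_like_raw_counts is a hypothetical helper.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_raw_counts(X) -> bool:
    """Return True if X holds only finite, non-negative, integer-valued entries.

    Heuristic only: it cannot prove that the data are unnormalized counts.
    """
    # For sparse matrices only the stored (non-zero) entries need checking.
    values = X.data if sp.issparse(X) else np.asarray(X).ravel()
    if values.size == 0:
        return True
    if not np.all(np.isfinite(values)):
        return False
    if np.any(values < 0):
        return False
    return bool(np.all(values == np.round(values)))
```

For example, looks_like_raw_counts(adata_cell.X) returning False for a scaled or log-normalized matrix would be consistent with the NaN bin edges above.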

Does SEVtras apply to data generated by STRT-seq or SMART-seq2?

Hello, Dr. He. Sorry to bother you again.
I already know that SEVtras works with single-cell RNA sequencing data produced by the droplet-based 10x Genomics Chromium, but I don't know whether it works with SMART-seq2 or STRT-seq.
From what I have learned so far, STRT-seq captures cells for single-cell RNA sequencing either in single tubes or 96 at a time on a C1 microfluidic chip. In this process, could sEVs not be captured and sequenced on their own, as they are in the droplets produced by the 10x Genomics Chromium?
I would really like to know.

UnboundLocalError: local variable 'ev_list' referenced before assignment

Dear Dr. He,
I am trying to run a task following the instructions in the "SEVtras" tutorial. Because I only have a single sample, I used the method suggested in the tutorial and made two duplicates as my input.

Does it only apply to humans and mice, and not to other non-model species?

UnboundLocalError: local variable 'ev_list' referenced before assignment.

ValueError: Length mismatch in deconvolver

Thank you very much for the tool you developed. However, an error occurred when I used it to calculate ESAI. I have ruled out many possibilities but still cannot solve the problem. Could you take a look at it when you have time?

Traceback (most recent call last):
File "./SEVtras/calculator.py", line 55, in <module>
SEVtras.ESAI_calculator(adata_ev_path='SEVtras/sEV_SEVtras_231219.h5ad', adata_cell_path='SEVtras/hm_Patient_hvg3000_PC30_res1_dist0.1_k20_S0_rawcount_231219.h5ad', out_path='SEVtras', Xraw=False, OBSsample='Sample', OBScelltype='subtype')
File "/Mn/conda/envs/sevtras/lib/python3.7/site-packages/SEVtras/main.py", line 187, in ESAI_calculator
celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
File "/Mn/conda/envs/sevtras/lib/python3.7/site-packages/SEVtras/functional.py", line 128, in deconvolver
near_neighbor_dat.index = adata_ev.obs.index
File "/Mn/conda/envs/sevtras/lib/python3.7/site-packages/pandas/core/generic.py", line 5500, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.set
File "/Mn/conda/envs/sevtras/lib/python3.7/site-packages/pandas/core/generic.py", line 766, in _set_axis
self._mgr.set_axis(axis, labels)
File "/Mn/conda/envs/sevtras/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
self._validate_set_axis(axis, new_labels)
File "/Mn/conda/envs/sevtras/lib/python3.7/site-packages/pandas/core/internals/base.py", line 58, in _validate_set_axis
f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 135224 elements, new values have 144325 elements
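One hypothesis (not confirmed by the traceback) is that duplicated barcodes, or barcodes shared between the sEV object and the cell object, change the number of sEV rows once the two objects are combined. The sketch below counts both situations. It assumes you can load both objects' obs_names. barcode_report is a hypothetical helper. If duplicates show up, anndata's AnnData.obs_names_make_unique() is one way to make them distinct.

```python
from typing import Dict, Iterable

import pandas as pd


def barcode_report(ev_names: Iterable[str], cell_names: Iterable[str]) -> Dict[str, int]:
    """Count duplicated barcodes within each object and barcodes shared between them."""
    ev = pd.Index(ev_names)
    cell = pd.Index(cell_names)
    return {
        "ev_total": len(ev),
        # duplicated() marks every repeat after the first occurrence.
        "ev_duplicated": int(ev.duplicated().sum()),
        "cell_duplicated": int(cell.duplicated().sum()),
        "shared_ev_cell": len(ev.unique().intersection(cell.unique())),
    }
```

Usage would be barcode_report(adata_ev.obs_names, adata_cell.obs_names), with adata_ev and adata_cell loaded from the two h5ad files passed to ESAI_calculator.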

ValueError: max() arg is an empty sequence

Hello! When I use this code:
SEVtras.sEV_recognizer(input_path='./', sample_file='./skin1/sample_file.txt', out_path='./skin1/outputs', species='Homo')
to analyze my raw 10x scRNA-seq files, it returns the error "ValueError: max() arg is an empty sequence" after running for a few minutes.
I tried another input file, but it still does not work.
I need some help, thank you!

Issues with importing data into SEVtras

Dear Author,

I am using this script for analysis:
import SEVtras
SEVtras.sEV_recognizer(input_path='/home/yeziyang/Sc', sample_file='/home/yeziyang/Sc/Sc1_LN', out_path='/home/yeziyang/Sc/outputs', species='Homo')

My 10x_mtx formatted data is stored in this directory: /home/yeziyang/Sc/Sc1_LN/outs/raw_feature_bc_matrix/matrix.mtx.gz. I have ensured it is extracted from the raw_feature_bc_matrix. However, I encountered the following error:
File "run_SEVtras.py", line 2, in <module>
SEVtras.sEV_recognizer(input_path='/home/yeziyang/Sc', sample_file='/home/yeziyang/Sc/Sc1_LN', out_path='/home/yeziyang/Sc/outputs', species='Homo')
File "/home/yeziyang/miniconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/main.py", line 79, in sEV_recognizer
sample_log = get_sample(sample_file)
File "/home/yeziyang/miniconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/utils.py", line 18, in get_sample
with open(sample_log, 'r') as f:
IsADirectoryError: [Errno 21] Is a directory: '/home/yeziyang/Sc/Sc1_LN'

I am not sure why this error occurs. Could you please give me some advice? Thank you very much!

Additionally, when I created the file with echo "Sc1_LN" > sample_file and ran the code above again with alpha set to 0.08, it raised "max() arg is an empty sequence". I am not sure whether this is caused by an error in my import process.
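The IsADirectoryError shows that get_sample opens sample_file with open(), so the argument has to point to a text file, not to the sample directory. Below is a minimal sketch of writing that file. It assumes, as in the command above but without confirmation from the documentation, that each line names a sample directory located directly under input_path. write_sample_file is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List


def write_sample_file(input_path: str, samples: Iterable[str], sample_file: str) -> List[str]:
    """Write one sample name per line after checking each one exists under input_path.

    Assumption (unconfirmed): each line names a sample directory inside input_path.
    """
    base = Path(input_path)
    names = [s.strip() for s in samples if s.strip()]
    missing = [s for s in names if not (base / s).is_dir()]
    if missing:
        raise FileNotFoundError(f"not found under {base}: {missing}")
    target = Path(sample_file)
    if target.is_dir():
        raise IsADirectoryError(f"{target} is a directory; sample_file must be a text file")
    target.write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

For the layout above this would be write_sample_file('/home/yeziyang/Sc', ['Sc1_LN'], '/home/yeziyang/Sc/sample_file'), and the resulting file path is what goes into sample_file.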

Input data type

Hi, I encountered an issue when preparing the input data. I have an integrated AnnData object with counts, but several samples lack the features.tsv.gz, barcodes.tsv.gz, and matrix.mtx.gz files. Can SEVtras accept anndata.layers['counts'] or an h5ad file? Or could I regenerate features.tsv.gz, barcodes.tsv.gz, and matrix.mtx.gz from the AnnData object for each sample? I am now trying the second approach but have not succeeded yet. Thanks
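Below is a minimal sketch of the second approach. It assumes one cells × genes integer count matrix per sample (for example adata[adata.obs['sample'] == s].layers['counts'], where the obs column name is hypothetical). It also assumes that SEVtras reads the gzipped layout Cell Ranger v3 writes: features as rows of matrix.mtx and a three-column features.tsv.gz. write_10x_mtx is a hypothetical helper. If the integrated object only holds filtered cells, the exported matrices will only hold those barcodes, whereas the other reports on this page start from raw_feature_bc_matrix.

```python
import gzip
from pathlib import Path
from typing import Sequence

import numpy as np
import scipy.io
import scipy.sparse as sp


def write_10x_mtx(counts, barcodes: Sequence[str], gene_ids: Sequence[str],
                  gene_names: Sequence[str], out_dir: str) -> Path:
    """Write a cells x genes count matrix as gzipped matrix/features/barcodes files.

    Assumption: mirrors the Cell Ranger v3 layout (genes are rows of matrix.mtx).
    """
    mat = sp.csr_matrix(counts)
    if mat.shape != (len(barcodes), len(gene_ids)) or len(gene_ids) != len(gene_names):
        raise ValueError("counts must be cells x genes and match the barcode/gene lists")
    if mat.nnz and not np.array_equal(mat.data, np.round(mat.data)):
        raise ValueError("expected integer counts, e.g. layers['counts'], not normalized data")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Transpose so that features are rows and barcodes are columns.
    with gzip.open(out / "matrix.mtx.gz", "wb") as fh:
        scipy.io.mmwrite(fh, mat.T.tocoo().astype(np.int64), symmetry="general")
    with gzip.open(out / "features.tsv.gz", "wt") as fh:
        for gid, name in zip(gene_ids, gene_names):
            fh.write(f"{gid}\t{name}\tGene Expression\n")
    with gzip.open(out / "barcodes.tsv.gz", "wt") as fh:
        for bc in barcodes:
            fh.write(f"{bc}\n")
    return out
```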

SEVtras scores were all NaN

I appreciate your work on this amazing package!
I am trying to apply the package to my own research, but I encountered a problem when I ran sEV_recognizer on the Metastatic Colorectal Cancer (mCRC) scRNA-seq dataset. It does not detect any sEVs in any of the samples, and the scores in raw_SEVtras.h5ad are all NaN. I tried using both the "raw_feature_bc_matrix" and the h5ad files that I generated for each sample, but neither worked. I also tried lowering the score_t parameter, but it did not help. The logs are attached below and the h5ad files are available here. Is this the expected behavior? I would appreciate your feedback. Thank you!

nohup.out.txt

Error reported in SEVtras.ESAI_calculator

Hello, I am trying to reproduce the SEVtras results using data from 15 normal tissues. When I ran the SEVtras.ESAI_calculator function, the following error occurred. I have tried for a long time but could not solve it, so I am asking for your help.

/home/dell/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/raw.py:146: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
uns=self._adata.uns.copy(),
/home/dell/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
[AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/home/dell/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/anndata/_core/anndata.py:798: UserWarning:
AnnData expects .var.index to contain strings, but got values like:
[]

Inferred to be: empty

value_idx = self._prep_dim_index(value.index, attr)

AttributeError Traceback (most recent call last)
/tmp/ipykernel_19366/2460094773.py in <module>
1 import SEVtras
----> 2 SEVtras.ESAI_calculator(adata_ev_path='./sEV_SEVtras.h5ad', adata_cell_path='../06.Seurat.15Tissue/pbmc.combined.v2.h5ad', out_path='./', Xraw=True, OBSsample='sample', OBScelltype='Stage2')

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/main.py in ESAI_calculator(adata_ev_path, adata_cell_path, out_path, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW, plot_cmp, save_plot_prefix, OBSMumap, size)
185 adata_cell = read_adata(adata_cell_path, get_only=False)
186 from .functional import deconvolver, ESAI_celltype, plot_SEVumap, plot_ESAIumap
--> 187 celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
188 ##ESAI for sample
189 sample_ESAI = (adata_com[adata_com.obs[OBScelltype]==OBSev,].obs[OBSsample].value_counts() / adata_com[adata_com.obs[OBScelltype]!=OBSev,].obs[OBSsample].value_counts()).fillna(0)

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/functional.py in deconvolver(adata_ev, adata_cell, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
112 def deconvolver(adata_ev, adata_cell, OBSsample='batch', OBScelltype='celltype', OBSev='sEV', OBSMpca='X_pca', cellN=10, Xraw = True, normalW=True):
113
--> 114 adata_combined = preprocess_source(adata_ev, adata_cell, OBScelltype=OBScelltype, OBSev=OBSev, Xraw = Xraw)
115 gsea_pval_dat = source_biogenesis(adata_cell, OBScelltype=OBScelltype, Xraw = Xraw, normalW=normalW)
116 near_neighbor_dat = near_neighbor(adata_combined, OBSsample=OBSsample, OBSev=OBSev, OBScelltype=OBScelltype, OBSMpca=OBSMpca, cellN=cellN)

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/functional.py in preprocess_source(adata_ev, adata_cell, OBScelltype, OBSev, Xraw)
79
80 adata_combined.obs[OBScelltype] = pd.Categorical(adata_combined.obs[OBScelltype],
---> 81 categories = np.append(adata_cell_raw.obs[OBScelltype].cat.categories.values, OBSev), ordered = False)
82
83 adata_combined.raw = adata_combined

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def setattr(self, name: str, value) -> None:

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
179 # we're accessing the attribute of the class, i.e., Dataset.geo
180 return self._accessor
--> 181 accessor_obj = self._accessor(obj)
182 # Replace the property with the accessor object. Inspired by:
183 # https://www.pydanny.com/cached-property.html

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in __init__(self, data)
2599
2600 def __init__(self, data):
-> 2601 self._validate(data)
2602 self._parent = data.values
2603 self._index = data.index

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in _validate(data)
2608 def _validate(data):
2609 if not is_categorical_dtype(data.dtype):
-> 2610 raise AttributeError("Can only use .cat accessor with a 'category' dtype")
2611
2612 def _delegate_property_get(self, name):

AttributeError: Can only use .cat accessor with a 'category' dtype

Below is my UMAP diagram

Looking forward to your reply.

Best wishes
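The last frames show preprocess_source reading adata_cell_raw.obs[OBScelltype].cat.categories. The cell-type column (here Stage2) therefore apparently has to use the pandas category dtype, and a column carried over from a Seurat conversion may arrive as plain strings instead. Below is a minimal sketch of converting the column before writing the h5ad. ensure_categorical is a hypothetical helper.

```python
import pandas as pd


def ensure_categorical(obs: pd.DataFrame, column: str) -> pd.DataFrame:
    """Convert obs[column] to the pandas 'category' dtype in place if it is not already."""
    if column not in obs.columns:
        raise KeyError(f"{column!r} not found in obs")
    if not isinstance(obs[column].dtype, pd.CategoricalDtype):
        obs[column] = obs[column].astype("category")
    return obs
```

Usage would be ensure_categorical(adata_cell.obs, "Stage2") followed by adata_cell.write_h5ad(...) to produce the file passed as adata_cell_path.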

How to extract information on which cell secreted a given vesicle?

Hello, I have found that the ESAI of macrophages differs between the case and control groups, so I want to see how the contents of the vesicles secreted by macrophages differ between the two groups. How can I extract information on which cell secreted a given vesicle?

Error in sEV_recognizer - output path file creation issue w/ multiple samples?

Hello! I appear to be running into issues with initializing the correct file paths/sample file. I am a relatively new Python coder, so my apologies if I am providing insufficient information. Please let me know and I will do my best to correct it.

For context, my directory is formatted as such:

raw_data
--sample1
----raw_feature_bc_matrix
------barcodes.tsv.gz
------features.tsv.gz
------matrix.tsv.gz
--sample2
----raw_feature_bc_matrix
------barcodes.tsv.gz
------features.tsv.gz
------matrix.tsv.gz
... --sample6

Within the raw_data folder, I have the sample_file.txt file, which contains the relative paths to my files (attached: sample_file.txt). I initially tried entering absolute file paths (as recommended by the documentation: 'Here, first parameter was the abosulte path of each sample row by row.'), but received the same error as below.

When I run the following code:
SEVtras.sEV_recognizer(input_path='./',sample_file='./raw_data/sample_file.txt', out_path='./sev_results', species='Homo',dir_origin=False,predefine_threads=30)

I receive the following output:

0 1
1 1

FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_90051/889495166.py in <module>
----> 1 SEVtras.sEV_recognizer(input_path='./',sample_file='./raw_data/sample_file.txt', out_path='./sev_results', species='Homo',dir_origin=False,predefine_threads=30)

~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/main.py in sEV_recognizer(sample_file, out_path, input_path, species, predefine_threads, get_only, score_t, search_UMI, alpha, dir_origin)
155 pass
156 else:
--> 157 os.mkdir(str(out_path) + '/tmp_out/' + sample)
158
159 adata.write(str(out_path) + '/tmp_out/' + sample + '/raw_' + sample + '.h5ad')

FileNotFoundError: [Errno 2] No such file or directory: './sev_results/tmp_out/raw_data/sample1/raw_feature_bc_matrix'

I am hoping for some guidance on how to tackle, or some increased clarity on the correct file naming/path procedures. Thank you!
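The failing line calls os.mkdir on out_path + '/tmp_out/' + sample, and here sample was the whole relative path raw_data/sample1/raw_feature_bc_matrix. os.mkdir creates only the last component of a path, so it fails when the intermediate directories do not exist yet. The snippet below reproduces that behaviour with the standard library only. It illustrates the mechanism and does not establish which sample_file format SEVtras expects. try_mkdir is a hypothetical helper.

```python
import os
import tempfile


def try_mkdir(path: str) -> bool:
    """Return True if os.mkdir succeeds, False if a parent directory is missing."""
    try:
        os.mkdir(path)
        return True
    except FileNotFoundError:
        return False


base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "tmp_out"))
# An entry without slashes maps to a single new directory under tmp_out.
flat = os.path.join(base, "tmp_out", "sample1")
# An entry with slashes needs tmp_out/raw_data/sample1/ to exist first.
nested = os.path.join(base, "tmp_out", "raw_data/sample1/raw_feature_bc_matrix")
print(try_mkdir(flat), try_mkdir(nested))  # prints: True False
```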

Is SEVtras suitable for 10x Genomics Spatial Transcriptomics?

Thank you for the great method!

I'd like to ask whether this algorithm is suitable for spatial transcriptomic datasets, such as 10x Genomics Spatial Transcriptomics. If so, do the analysis process and input file formats need to be adjusted?

Thanks for your help!
Best,
xiaoj

Other: ValueError: max() arg is an empty sequence

Hi, I'm having the same problem: max() arg is an empty sequence.
SEVtras.sEV_recognizer(sample_file='./sample_file', out_path='./outputs', species='Homo',alpha=0.09, score_t='10')
I've tried lowering the thresholds (alpha=0.09, score_t='10'; I've also tried score_t='1'), but I still get the same error. Additionally, I downloaded part of a mouse single-cell sequencing dataset from GEO, and after Cell Ranger processing (default parameters) I get the messages "no genes enriched..." and "sEV is hard to detect...", followed by KeyError: 'score'. There are also some warnings:
/home/fbio/anaconda3/envs/secret_ev/lib/python3.7/site-packages/SEVtras/sc_pp.py:811: ImplicitModificationWarning: Trying to modify attribute .obs of view, initializing view as actual.
adata.obs["n_genes"] = number
/home/fbio/anaconda3/envs/secret_ev/lib/python3.7/site-packages/scipy/stats/stats.py:4484: SpearmanRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.

application and visualization

Thank you for developing the algorithm!
I would like to ask, if it's not too much trouble, could you provide us with the visualization code? I am particularly interested in attempting to replicate the figures involving tumor samples that were presented in the article. Could you provide the visualization code for that purpose?

Some issues about output data analyze

Hello, I am new to single-cell analysis. I am familiar with R but not very familiar with Python, and I have run into many problems using your software to process my own data. I hope I can get some help; thank you very much!
I re-read your article and software instructions carefully and have some questions from the analysis:
1: When you ran the first step, sEV recognition, on the 15 tissue samples, did you input all samples from the different tissues together, or did you process samples from the same tissue together and samples from different tissues separately? I am currently analyzing the differences between 11 samples of different trait origins, and when I run them together the software outputs only one raw_SEVtras.h5ad and one sEV_SEVtras.h5ad. When I read these outputs, I cannot find the batch information.
2: What information does each output file contain? I input 11 samples, and in addition to raw_SEVtras.h5ad and sEV_SEVtras.h5ad, I get 11 folders under the tmp_out path, each apparently corresponding to one input sample, but each containing only two files, "itera_gene.txt" and "raw_file.h5ad". I'm not sure which files contain which information, and I'm confused.
3: When I try to convert the h5ad file to h5seurat format using the "Convert" function and read the file in R, I find that I can't read the meta-data. my analysis parameter settings are all the same as in your tutorial.
This is the error:
Adding cell-level metadata
Error: Missing required datasets 'levels' and 'values'.
Sorry for asking too many questions at the same time. But these are questions I didn't get answered from your published articles and tutorials, and maybe there are other people who are experiencing the same problem. Looking forward to your reply, thank you very much!

How can we understand "sEV-containing droplets"?

Thank you for the excellent work of your team.
How can we understand "sEV-containing droplets"? After tissue dissociation, sEVs and cells should be separate. How can we determine which cell an sEV originates from?

Error in SEVtras.sEV_recognizer

Thank you for developing the algorithm!
As I primarily work with R, I'm not as proficient in Python, which has led to some challenges while trying to utilize your software for processing my data. I'm reaching out in the hope of receiving some guidance on how to resolve an issue I encountered at the outset.
When I ran

import SEVtras
SEVtras.sEV_recognizer(sample_file='/opt/conda/Zhoubo/SRR13005718/SRR13005718/outs/raw_feature_bc_matrix/matrix.mtx.gz', out_path='/opt/conda/Zhoubo/SRR13005718', species='Mus')

Unfortunately, I met the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/main.py", line 79, in sEV_recognizer
    sample_log = get_sample(sample_file)
  File "/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/site-packages/SEVtras/utils.py", line 19, in get_sample
    for line in f.readlines():
  File "/opt/conda/miniconda3/envs/SEVtras/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I'm uncertain whether this issue stems from how I've set the file paths or if there's another underlying cause. I would greatly appreciate any insights or suggestions you might have on resolving this error. Thank you very much for your time and assistance.
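Byte 0x8b at position 1 is the second byte of the gzip signature (0x1f 0x8b). This suggests that get_sample, which reads sample_file with open() and readlines() in the traceback, was handed the gzip-compressed matrix.mtx.gz. Judging from the other reports on this page, sample_file should instead be a plain-text list of samples. The snippet below reproduces the same decode error with the standard library and shows a simple signature check. is_gzip is a hypothetical helper.

```python
import gzip
import os
import tempfile


def is_gzip(path: str) -> bool:
    """True if the file starts with the two-byte gzip signature 0x1f 0x8b."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


# Build a small gzip file standing in for matrix.mtx.gz.
path = os.path.join(tempfile.mkdtemp(), "matrix.mtx.gz")
with gzip.open(path, "wt") as fh:
    fh.write("%%MatrixMarket matrix coordinate integer general\n")

print(is_gzip(path))  # True

# Reading it as UTF-8 text, the way a plain-text sample list is read, fails.
try:
    with open(path, encoding="utf-8") as fh:
        fh.readlines()
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```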

Issues of batch in sEV_recognizer and sample in ESAI_calculator

Dear Developer,

Thank you for creating such a useful tool.

I encountered some issues while using the ESAI_calculator function. Initially, I performed sEV_recognizer with multiple samples using the following code:

SEVtras.sEV_recognizer(
    input_path='../input_sevtras',
    sample_file='../input_sevtras/sample_file',
    out_path='../sev_recognizing/27samples/500UMI',
    species='Mus',
    predefine_threads=-2,  
    get_only=False,  
    score_t=None,  
    search_UMI=500,  
    alpha=0.15,  
    dir_origin=True  
)

The resulting sEV files contain batch information.

I then used the ESAI_calculator function, and the output in SEVtras_sEVs.h5ad under obsm contains source information with both type and sample fields.

import SEVtras
SEVtras.ESAI_calculator(
    adata_ev_path='../sev_evaluating/seurat_to_h5/sev_evaluate/adata_ev.h5ad',
    adata_cell_path='../sev_evaluating/seurat_to_h5/sev_evaluate/cell_ready.h5ad',
    out_path='../sev_evaluating/seurat_to_h5/sev_evaluate_output',
    species='Mus',
    OBSsample='batch',
    OBScelltype='celltype',
    OBSev='sEV',
    OBSMpca='X_pca',
    cellN=10,
    Xraw=True,
    normalW=True,
    plot_cmp='SEV_builtin',
    save_plot_prefix='',
    OBSMumap='X_umap',
    size=10
)

However, the sample information does not correspond to the output from sEV_recognizer.

I am unclear about the reason for this discrepancy. Could you please clarify which result is correct?
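For checking the sample labels, it may help to know what the sample-level ESAI line in main.py (visible in the ESAI_calculator tracebacks on this page) computes. For each value of OBSsample, it divides the number of droplets labeled OBSev by the number of all other droplets, then fills missing ratios with 0. The toy sketch below recomputes that ratio with pandas on made-up labels, so the per-sample numbers can be compared against the obs table you pass in. sample_esai is a hypothetical helper, not a SEVtras function.

```python
import pandas as pd


def sample_esai(obs: pd.DataFrame, sample_col: str = "batch",
                celltype_col: str = "celltype", ev_label: str = "sEV") -> pd.Series:
    """Per-sample ratio of sEV droplets to cell droplets, mirroring the line in main.py."""
    is_ev = obs[celltype_col] == ev_label
    ev_counts = obs.loc[is_ev, sample_col].value_counts()
    cell_counts = obs.loc[~is_ev, sample_col].value_counts()
    # Samples missing from either count give NaN, which is replaced by 0.
    return (ev_counts / cell_counts).fillna(0)


toy = pd.DataFrame({
    "celltype": ["sEV", "T", "sEV", "B", "T", "T", "sEV", "B"],
    "batch": ["s1", "s1", "s2", "s2", "s2", "s1", "s1", "s3"],
})
print(sample_esai(toy))
```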

When I run larger 10X raw_feature_bc_matrix files, the process is always interrupted without any warning

Dear developer:
SEVtras is a great algorithm! Thank you for developing it. I want to use it in my research, but when I run larger files with SEVtras.sEV_recognizer, the process is always interrupted without any warning, and no files appear in out_path. My machine has 16 cores (32 threads) and 128 GB of memory; is that enough? I run the algorithm in JupyterLab on Linux, and the input files are the output of Cell Ranger. When I looked into it, I found that the algorithm spends a long time running multi_enrich, and then the process suddenly stops. Is there a time or hardware limitation?
Thank you very much!

All the scores of the sEVs are the same

First of all, thank you very much for your work!
I attempted to use sEV_recognizer to identify sEVs in scRNA-seq data, but all the identified sEVs have the same score. Is this normal? Below is an image showing the identification results for my data:

Additionally, I used the test2.h5ad provided in ./tests, and the scores of the identified sEVs are also all the same. Here are the identification results for test2.h5ad:

In sEV_recognizer, my predefine_threads=20, score_t=15.
I look forward to your reply!

The process was interrupted after running 7/16 samples

Hello! First of all, thank you very much for developing the algorithm!
Because of stability issues with the hard drive, my process was interrupted after 7 of 16 samples had run. Is there a way for the algorithm to read the tmp_out output of the 7 completed samples and continue from there?

In addition, I ran sEV_recognizer on a computer with 128 GB of memory (on a 64 GB computer it failed with an out-of-memory error), and it takes about 18 hours to process a single sample. Is this normal?

Thank you very much!
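This page does not document whether sEV_recognizer can resume from tmp_out. As a workaround sketch: the FileNotFoundError traceback above shows that sEV_recognizer creates out_path/tmp_out/<sample>/raw_<sample>.h5ad, and another report describes an itera_gene.txt file in each sample folder. The hypothetical helper below assumes, without verification, that a sample is finished once both files exist. It lists the samples from the original sample_file that would still need to run, so they could be written to a new sample file.

```python
from pathlib import Path
from typing import List


def pending_samples(sample_file: str, out_path: str) -> List[str]:
    """Return samples whose tmp_out folder lacks the assumed completion files.

    Assumption (not confirmed by SEVtras docs): a sample is finished when
    tmp_out/<sample>/ holds both raw_<sample>.h5ad and itera_gene.txt.
    """
    lines = Path(sample_file).read_text(encoding="utf-8").splitlines()
    samples = [line.strip() for line in lines if line.strip()]
    pending = []
    for sample in samples:
        folder = Path(out_path) / "tmp_out" / sample
        done = (folder / f"raw_{sample}.h5ad").is_file() and (folder / "itera_gene.txt").is_file()
        if not done:
            pending.append(sample)
    return pending
```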
