vitae's People

Contributors

jaydu1, jingshuw, minggao97, tianyucodings

vitae's Issues

Issue in model.pre_train when setting processed=True in model.preprocess_data

Hello,

Pretraining the autoencoder fails, but only when I pass an AnnData object that has already been preprocessed (it works if the adata object is not preprocessed beforehand):

  • Preprocess data step:
# fit in data
model.get_data(adata = data,                           # AnnData object holding the count or expression matrix
               labels = data.obs['cluster_label'],     # (optional) labels, which will be converted to string
               gene_names = data.var['features'],      # (optional) gene names, which will be converted to string
               cell_names = data.obs['sample_name']    # (optional) cell names, which will be converted to string
              )

# preprocess data
model.preprocess_data(gene_num = 2000,          # (optional) maximum number of influential genes to keep (the default is 2000)
                      data_type = 'Gaussian',   # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
                      npc = 64,                 # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)
                      processed = True)

  • Pretrain step:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-2da55840b803> in <module>
      3                 batch_size=256,              # (Optional) the batch size for pre-training (the default is 32).
      4                 alpha=0.10,                  # (Optional) the value of alpha in [0,1] to encourage covariate adjustment. Not used if there is no covariates.
----> 5                 num_epoch = 300,             # (Optional) the maximum number of epoches (the default is 300).
      6                 ) 

~/anaconda3/lib/python3.7/site-packages/VITAE/VITAE.py in pre_train(self, stratify, test_size, random_state, learning_rate, batch_size, L, alpha, num_epoch, num_step_per_epoch, early_stopping_patience, early_stopping_tolerance, path_to_weights)
    274                                                 batch_size,
    275                                                 self.X[id_train].astype(tf.keras.backend.floatx()),
--> 276                                                 self.scale_factor[id_train].astype(tf.keras.backend.floatx()))
    277         self.test_dataset = train.warp_dataset(self.X_normalized[id_test], 
    278                                                 None if self.c_score is None else self.c_score[id_test].astype(tf.keras.backend.floatx()),

TypeError: 'NoneType' object is not subscriptable
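
The traceback points at `self.scale_factor[id_train]`, so `self.scale_factor` appears to be `None` at that point when `processed=True` is used. The sketch below only illustrates that failure mode in plain Python, together with one way to fail earlier with a clearer message. `take_rows` and the toy values are hypothetical and not part of VITAE, and the reason the attribute is unset is an assumption that the traceback does not confirm.

```python
def take_rows(values, idx, name):
    """Select rows of `values` by index, raising a readable error if it is None."""
    if values is None:
        raise ValueError(f"{name} is None; it was never computed for this input")
    return [values[i] for i in idx]


X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]  # toy expression matrix
scale_factor = None   # assumption: left unset when the data is marked as processed
id_train = [0, 2]     # toy training indices

# Indexing None directly reproduces the error type from the traceback.
try:
    scale_factor[id_train]
except TypeError as err:
    raw_error = str(err)

# The guarded helper names the missing attribute instead.
try:
    take_rows(scale_factor, id_train, "scale_factor")
except ValueError as err:
    guarded_error = str(err)

train_X = take_rows(X, id_train, "X")
print(raw_error)
print(guarded_error)
print(train_X)
```

A check of this kind would make it easier to tell a missing preprocessing output from a genuine indexing bug.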

Thank you in advance.

Best regards.

Error in model.preprocess_data if an AnnData object is given as input in model.get_data

Hello,

Thanks for developing VITAE.

I tried to use VITAE, but model.preprocess_data fails when I pass an AnnData object to the model.get_data function.

The preprocessing should be done by scanpy, which is installed, but I get this error:

# fit in data
model.get_data(adata = data,                           # AnnData object holding the count or expression matrix
               labels = data.obs['cluster_label'],     # (optional) labels, which will be converted to string
               gene_names = data.var['features'],      # (optional) gene names, which will be converted to string
               cell_names = data.obs['sample_name']    # (optional) cell names, which will be converted to string
              )

# preprocess data
model.preprocess_data(gene_num = 2000,      # (optional) maximum number of influential genes to keep (the default is 2000)
                      data_type = 'UMI',    # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
                      npc = 64              # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)
                      )
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-aa66b286b1ac> in <module>()
     35 model.preprocess_data(gene_num = 2000,        # (optional) maximum number of influential genes to keep (the default is 2000)
     36                       data_type = 'UMI', # (optional) data_type can be 'UMI', 'non-UMI' or 'Gaussian' (the default is 'UMI')
---> 37                       npc = 64              # (optional) number of PCs to keep if data_type='Gaussian' (the default is 64)#)
     38                       )

2 frames
/usr/local/lib/python3.7/dist-packages/VITAE/preprocess.py in _recipe_seurat(adata, gene_num)
    238     This uses a particular preprocessing
    239     """
--> 240     cell_mask = sc.pp.filter_cells(adata, min_genes=200, inplace=False)[0]
    241     adata = adata[cell_mask,:]
    242     gene_mask = sc.pp.filter_genes(adata, min_cells=3, inplace=False)[0]

NameError: name 'sc' is not defined

I do not understand what the issue is, because you import scanpy as sc inside the function you define?
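
One possible explanation, stated here as an assumption since the VITAE source is not shown in full: in Python, an `import ... as sc` placed inside one function binds the name only in that function's local scope, so a different function such as `_recipe_seurat` cannot see it. The sketch below reproduces that behaviour with the standard-library `json` module standing in for scanpy; all function names are hypothetical.

```python
def load_tools():
    # The alias `js` is bound only inside load_tools, not at module level.
    import json as js
    return js.dumps({"loaded": True})


def use_tools():
    # `js` is looked up locally, then in module globals; it exists in neither.
    return js.dumps({"ok": True})


load_tools()  # calling this does not make `js` visible elsewhere
try:
    use_tools()
except NameError as err:
    message = str(err)
print(message)


def use_tools_fixed():
    # Importing in the scope that uses the name (or at module top) avoids the error.
    import json as js
    return js.dumps({"ok": True})


print(use_tools_fixed())
```

If the package instead wraps a module-level import in a `try`/`except ImportError`, a failed import would leave the name undefined in the same way, so checking that `import scanpy` succeeds in the same environment is a reasonable first step.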

Thank you in advance.

Best regards.

Implement in scvi-tools

Hello,

I found your manuscript interesting, and I'm wondering whether you have any interest in implementing a version that takes a pre-trained scvi-tools model (e.g., scVI) as input. I think this would get a lot of usage in our package!

Running model.init_inference in GPU version failed

Hello,
model.init_inference is very slow with the CPU version (though it does run), but I cannot get it to run at all with the GPU version.

I get the following error:

# initialize inference
model.init_inference(batch_size=128, 
                     L=150,            # L is the number of MC samples
                     dimred='umap',    # dimension reduction methods
                     #**kwargs         # extra key-value arguments for dimension reduction algorithms.    
                     random_state=seed
                    ) 
# after initialization, we can access some variables by model.pc_x, model.w, model.w_tilde, etc..

Computing posterior estimations over mini-batches.

---------------------------------------------------------------------------

ResourceExhaustedError                    Traceback (most recent call last)

<ipython-input-27-91c48b13b6e4> in <module>()
      4                      dimred='umap',    # dimension reduction methods
      5                      #**kwargs         # extra key-value arguments for dimension reduction algorithms.
----> 6                      random_state=seed
      7                     ) 
      8 # after initialization, we can access some variables by model.pc_x, model.w, model.w_tilde, etc..

10 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

ResourceExhaustedError:  OOM when allocating tensor with shape[128,150,1653,57] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node Tile_1 (defined at /usr/local/lib/python3.7/dist-packages/VITAE/model.py:367) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference__get_inference_4681280]

Function call stack:
_get_inference

I tried reducing the batch size to 64, 32, 16 and 8, but every attempt failed. I am not running out of memory.

This is due to the size of the input data: when I reduce the number of cells in my data, it works.
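
For reference, the size of the single tensor named in the error message can be worked out from its shape and dtype. The sketch below is plain arithmetic on the reported shape `[128,150,1653,57]` with 8-byte doubles. It assumes the first dimension is the batch size (128 in the call above), and it says nothing about other tensors allocated at the same time. The helper name `tensor_gib` is hypothetical.

```python
from math import prod


def tensor_gib(shape, bytes_per_element):
    """Size in GiB of a dense tensor with the given shape and element width."""
    return prod(shape) * bytes_per_element / 2**30


reported_shape = (128, 150, 1653, 57)  # shape taken from the OOM message

double_gib = tensor_gib(reported_shape, 8)  # dtype double, as reported
float_gib = tensor_gib(reported_shape, 4)   # same shape if float32 were used

print(f"double: {double_gib:.2f} GiB, float32: {float_gib:.2f} GiB")

# Assumption: only the first dimension changes with batch_size.
per_batch = {}
for batch_size in (128, 64, 32, 16, 8):
    shape = (batch_size,) + reported_shape[1:]
    per_batch[batch_size] = tensor_gib(shape, 8)
    print(f"batch_size={batch_size:3d}: {per_batch[batch_size]:.2f} GiB")
```

At the reported shape this one allocation is roughly 13.5 GiB in double precision, which is large for a single GPU, so lowering `L` or the floating-point precision may matter as much as the batch size.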

Thank you in advance.

Best regards
