adaptivemotorcontrollab / cebra
Learnable latent embeddings for joint behavioral and neural analysis - Official implementation of CEBRA
Home Page: https://cebra.ai
License: Other
Hello,
I am trying to compute the consistency score across different embeddings of hippocampal population activity, obtained using the 2D tracking position as the auxiliary variable.
To compute the consistency score I have tried using either the linearised 2D position or another discrete labelling as labels, but I get an error in cebra_sklearn_helpers.align_embeddings when quantising the embeddings with the new labels. I believe it might be due to the high number of bins (n_bins) used within the _coarse_to_fine() function. What do you think the issue may be?
operating system: Ubuntu 20.04
cebra version 0.2.0
gpu
Here is a snippet of the code:
# Between-datasets consistency, by aligning on the labels
import cebra

# embeddings as a list of np.ndarrays
embds = [cebra_w[e][m] for e in exps for m in MICE[:2]]
# labels as a list of 1D np.ndarrays with the linearised tracking position
labels = [lineariseTrack(track[e][m][:, 0], track[e][m][:, 1], binsize=30)
          for e in exps for m in MICE[:2]]
scores, pairs, datasets = cebra.sklearn.metrics.consistency_score(
    embeddings=embds, labels=labels, between="datasets")
ValueError Traceback (most recent call last)
Cell In[16], line 7
3 embds = [cebra_w[e][m] for e in exps for m in MICE[:2]]
4 labels = [lineariseTrack(track[e][m][:,0], track[e][m][:,1], binsize=30)\
5 for e in exps for m in MICE[:2]]
----> 7 scores, pairs, datasets = cebra.sklearn.metrics.consistency_score(embeddings=embds,
8 labels=labels,
9 between="datasets")
10 cebra.plot_consistency(scores, pairs=pairs, datasets=subjects, colorbar_label=None)
File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/metrics.py:362, in consistency_score(embeddings, between, labels, dataset_ids)
359 scores, pairs, datasets = _consistency_runs(embeddings=embeddings,
360 dataset_ids=dataset_ids)
361 elif between == "datasets":
--> 362 scores, pairs, datasets = _consistency_datasets(embeddings=embeddings,
363 dataset_ids=dataset_ids,
364 labels=labels)
365 else:
366 raise NotImplementedError(
367 f"Invalid comparison, got between={between}, expects either datasets or runs."
368 )
File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/metrics.py:205, in _consistency_datasets(embeddings, dataset_ids, labels)
200 raise ValueError(
201 "Invalid number of dataset_ids, expect more than one dataset to perform the comparison, "
202 f"got {len(datasets)}")
204 # NOTE(celia): with default values normalized=True and n_bins = 100
--> 205 aligned_embeddings = cebra_sklearn_helpers.align_embeddings(
206 embeddings, labels)
207 scores, pairs = _consistency_scores(aligned_embeddings,
208 datasets=dataset_ids)
209 between_dataset = [p[0] != p[1] for p in pairs]
File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/helpers.py:138, in align_embeddings(embeddings, labels, normalize, n_bins)
133 digitized_labels = np.digitize(
134 valid_labels, np.linspace(min_labels_value, max_labels_value,
135 n_bins))
137 # quantize embedding based on the new labels
--> 138 quantized_embedding = [
139 _coarse_to_fine(valid_embedding, digitized_labels, bin_idx)
140 for bin_idx in range(n_bins)[1:]
141 ]
143 if normalize: # normalize across dimensions
144 quantized_embedding_norm = [
145 quantized_sample / np.linalg.norm(quantized_sample, axis=0)
146 for quantized_sample in quantized_embedding
147 ]
File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/helpers.py:139, in <listcomp>(.0)
133 digitized_labels = np.digitize(
134 valid_labels, np.linspace(min_labels_value, max_labels_value,
135 n_bins))
137 # quantize embedding based on the new labels
138 quantized_embedding = [
--> 139 _coarse_to_fine(valid_embedding, digitized_labels, bin_idx)
140 for bin_idx in range(n_bins)[1:]
141 ]
143 if normalize: # normalize across dimensions
144 quantized_embedding_norm = [
145 quantized_sample / np.linalg.norm(quantized_sample, axis=0)
146 for quantized_sample in quantized_embedding
147 ]
File /data/phar0731/anaconda3/envs/py38/lib/python3.8/site-packages/cebra/integrations/sklearn/helpers.py:78, in _coarse_to_fine(data, digitized_labels, bin_idx)
76 if quantized_data is not None:
77 return quantized_data
---> 78 raise ValueError(
79 f"Digitalized labels does not have elements close enough to bin index {bin_idx}. "
80 f"The bin index should be in the range of the labels values.")
ValueError: Digitalized labels does not have elements close enough to bin index 95. The bin index should be in the range of the labels values.
The problematic bin_idx varies depending on the discretisation of the position / labels.
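For context, the failure is easy to reproduce in isolation: if the labels take fewer distinct values than the number of bins, np.digitize leaves many bins empty, which (as I understand the helper) is what trips _coarse_to_fine. A minimal sketch, assuming the default n_bins = 100:

import numpy as np

# hypothetical discretised position labels with only 30 distinct values
labels = np.repeat(np.arange(30), 20).astype(float)
n_bins = 100  # default used by align_embeddings

digitized = np.digitize(labels, np.linspace(labels.min(), labels.max(), n_bins))
print(len(np.unique(digitized)), "of", n_bins, "bins are occupied")  # most stay empty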
Hello! I'm a student at MIT majoring in Computer Science and Neuroscience, and I'm currently working on a project at the Wilson Lab at the McGovern Institute using the CEBRA model to build embeddings for hippocampal data. We have some spike rate data that we normalized in different ways (applied different scaling factors) and trained embeddings for. Given that the information within each dataset is the same, we were expecting the embeddings to look roughly the same as well. However, they turned out quite different (I've attached an image of the plotted embeddings for reference).
Somewhat relatedly, I was wondering if there are any proposed methods of evaluating the "goodness" of an embedding, or the degree of similarity between embeddings?
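For the similarity part of the question, the closest tool I found in the package is the consistency score, which aligns embeddings on shared labels before comparing them. A minimal sketch of how I imagine comparing the differently scaled runs (embedding_a, embedding_b, and shared_labels are placeholder names for our data):

import cebra

# two embeddings trained on differently scaled versions of the same recording
scores, pairs, ids = cebra.sklearn.metrics.consistency_score(
    embeddings=[embedding_a, embedding_b],
    labels=[shared_labels, shared_labels],  # same auxiliary variable for both
    between="datasets",
)
cebra.plot_consistency(scores, pairs=pairs, datasets=ids)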
Mac
0.4.0
V100
https://drive.google.com/file/d/1Ys4Lp4m9lxM_XR_BaloYRk07vrmL_DfT/view?usp=sharing
Hey :) I have installed CEBRA using the supplied cebra_paper_m1.yml file to run it on an Apple M1, macOS Ventura 13.0. Once the installation was complete, I tried to run training with device='mps', but the device was not recognized; only CUDA or CPU is available. I followed the instructions from the CEBRA documentation. When I install CEBRA using pip, it works perfectly. If you need me to provide more details or information, let me know!
operating system macOS Ventura 13.0.
gpu = mps
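In case it helps with triage, here is how I verified that MPS itself is visible to PyTorch in this environment (standard PyTorch calls, nothing CEBRA-specific):

import torch

print(torch.backends.mps.is_available())  # True on a working M1 setup
print(torch.backends.mps.is_built())      # True if this build includes MPS support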
Congratulations on this great project and publication!
I was browsing the code and noticed a potential issue with cebra.models.criterions.infonce. I assume that c in the function is just there for the numerical stability of logsumexp, and that the function is supposed to return
$L = \mathbb{E}_x [-\phi(x_i, y^{+}_i) + \log \sum_{j=1}^{n} e^{\phi(x_i, y^{-}_{ij})}]$?
If so, then I think there might be an error in how c is broadcast with neg_dist, which makes the function return incorrect values.
Ubuntu 18.04
0.2.0
Core i9 / RTX 3090
#! pip3 install cebra==0.2.0
from cebra.models.criterions import infonce
from torch import randn, allclose, logsumexp

# random positive and negative measures
n = 10
pos = randn(n)
neg = randn(n, n)

# InfoNCE loss
# :math:`L = \mathbb{E}_x [-\phi(x_i, y^{+}_i) + \log \sum_{j=1}^{n} e^{\phi(x_i, y^{-}_{ij})}]`
L = neg.logsumexp(dim=1).mean() - pos.mean()

# CEBRA InfoNCE implementation
cebra_infonce, _, _ = infonce(pos, neg)
cebra_close = allclose(L, cebra_infonce)
print("Cebra InfoNCE --", cebra_close)

# Corrected InfoNCE implementation
def corrected_infonce(pos, neg):
    c = neg.detach().max(dim=1, keepdim=True).values  # stabilizer for logsumexp
    pos = pos - c.squeeze(dim=1)
    neg = neg - c
    align = -pos.mean()
    uniform = logsumexp(neg, dim=1).mean()
    return align + uniform, align, uniform

corrected_loss, _, _ = corrected_infonce(pos, neg)
corrected_close = allclose(L, corrected_loss)
print("Corrected InfoNCE --", corrected_close)
Output:
Cebra InfoNCE -- False
Corrected InfoNCE -- True
Hi,
thanks a lot for sharing this nice work!
I happened to use numpy.uint8 as the discrete label type, and the resulting CEBRA model ended up performing time-contrastive learning (instead of supervised learning using the discrete labels).
It seems that the labels must have numpy.int32 or numpy.int64 data types in order to be considered discrete: https://github.com/AdaptiveMotorControlLab/CEBRA/blob/main/cebra/integrations/sklearn/dataset.py#L142
Would it be possible to also support the other integer types for discrete labels?
I found this StackOverflow post: https://stackoverflow.com/questions/37726830/how-to-determine-if-a-number-is-any-type-of-int-core-or-numpy-signed-or-not
So a substitution like the one below should theoretically work:
- elif y.dtype in (np.int32, np.int64):
+ elif np.issubdtype(y.dtype, np.integer):
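For reference, np.issubdtype treats all NumPy integer types, signed and unsigned, as subtypes of np.integer, so uint8 labels would pass the proposed check:

import numpy as np

for dt in (np.uint8, np.int16, np.int32, np.int64):
    print(dt.__name__, np.issubdtype(dt, np.integer))  # all print True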
Thank you very much in advance!
cebra version 0.2.0
Core i9 / RTX 3090
Something like the snippet below should reproduce this issue (unfortunately I am writing this on another computer, so please forgive any typos):
import numpy as np
import cebra

N = 1000
rng = np.random.default_rng()
X = np.concatenate([
    rng.multivariate_normal(mean=[0, 0], cov=[[0.2, 0], [0, 0.2]], size=N),
    rng.multivariate_normal(mean=[1, 0], cov=[[0.2, 0.1], [0.1, 0.2]], size=N),
], axis=0)
y = np.concatenate([np.zeros(N), np.ones(N)])

# casting the label to uint8 (using int32 instead works)
y = y.astype(np.uint8)

_, _, loader, _ = cebra.CEBRA(batch_size=N // 2)._prepare_fit(X, y)
assert loader.dataset.discrete_index is not None  # fails for uint8 labels
Hey,
I noticed that the function cebra.sklearn.metrics.consistency_score sorts the dataset_ids that are passed to it, creating a mismatch between the pairs and datasets labels (see snippet). These are then passed on to the plot_consistency function -- does that make sense?
import cebra

mice_ = MICE[:2]
lbl = ['a', 'd', 'b', 'c']
embds = [cebra_w[e][m] for m in mice_ for e in exps]
labels = [lineariseTrack(track[e][m][:, 0], track[e][m][:, 1], binsize=30)
          for m in mice_ for e in exps]
scores, pairs, datasets = cebra.sklearn.metrics.consistency_score(
    embeddings=embds,
    labels=labels,
    between="datasets",
    dataset_ids=lbl,
    num_discretization_bins=20)
print(f'original labels: {lbl}')
print(f'consistency labels: {datasets}')
print(f'consistency pairs: {pairs}')
OUTPUT:
consistency labels: ['a' 'b' 'c' 'd']
consistency pairs: [['a' 'd']
['a' 'b']
['a' 'c']
['d' 'a']
['d' 'b']
['d' 'c']
['b' 'a']
['b' 'd']
['b' 'c']
['c' 'a']
['c' 'd']
['c' 'b']]
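The sorted order in the output is consistent with the ids going through np.unique somewhere along the way, since np.unique always returns a sorted array (my guess at the cause, I haven't traced it fully):

import numpy as np

print(np.unique(['a', 'd', 'b', 'c']))  # ['a' 'b' 'c' 'd'] -- original order lost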
operating system Ubuntu
cebra version 0.2.0
gpu
Hey everyone, thanks for releasing this awesome tool.
I get an error when calling cebra_model.fit(..., adapt=True) to fine-tune a model on new data.
This only happens if the model was trained on multiple sessions, not when adapting a model trained on a single session.
The actual usage of CEBRA is within a larger code base, so I don't have an MWE, but these are the steps I take:
1: define the CEBRA model with hybrid=False.
self.model = CEBRA(
    model_architecture=model_architecture,
    batch_size=batch_size,
    temperature=temperature,
    temperature_mode=temperature_mode,
    learning_rate=learning_rate,
    max_iterations=max_iterations,
    time_offsets=time_offsets,
    output_dimension=embedding_size,
    device="cuda",
    conditional="time_delta",  # use behavioral data
    distance=distance,
    verbose=verbose,
    hybrid=hybrid,
    max_adapt_iterations=500,
)
2: fit self.model on multiple sessions with:
self.model.fit(
    Xs,
    Ys,
    adapt=False,
)
where Xs and Ys are lists of np.ndarray with neural and behavioral (continuous) data from multiple experimental sessions.
3: adapt the model to new data:
self.model.fit(  # single session
    X_new,
    Y_new,
    adapt=True,
)
with X_new, Y_new being arrays with the data for a single new session.
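Although I cannot share the real code, a self-contained sketch along these lines should show the same behaviour (synthetic data and hypothetical shapes, untested outside our code base):

import numpy as np
from cebra import CEBRA

# four training sessions with different neuron counts, shared 2D behavior
Xs = [np.random.uniform(size=(500, 20 + i)).astype("float32") for i in range(4)]
Ys = [np.random.uniform(size=(500, 2)).astype("float32") for _ in range(4)]

model = CEBRA(model_architecture="offset10-model", batch_size=128,
              max_iterations=10, max_adapt_iterations=10, conditional="time_delta")
model.fit(Xs, Ys, adapt=False)  # multi-session training works

# adapting to a single new session raises KeyError: '0.net.0.weight'
X_new = np.random.uniform(size=(500, 25)).astype("float32")
Y_new = np.random.uniform(size=(500, 2)).astype("float32")
model.fit(X_new, Y_new, adapt=True)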
I get KeyError: '0.net.0.weight' - this is the CEBRA part of the error stack:
I did some digging, and the problem is that the adapt_model created here has different keys in .state_dict() compared to self.model_.
adapt_model keys:
odict_keys(['net.0.weight', 'net.0.bias', 'net.2.module.0.weight', 'net.2.module.0.bias',
'net.3.module.0.weight', 'net.3.module.0.bias', 'net.4.module.0.weight', 'net.4.module.0.bias',
'net.5.weight', 'net.5.bias'])
self.model_ keys:
['0.net.0.weight', '0.net.0.bias', '0.net.2.module.0.weight',
'0.net.2.module.0.bias', '0.net.3.module.0.weight', '0.net.3.module.0.bias', '0.net.4.module.0.weight',
'0.net.4.module.0.bias', '0.net.5.weight', '0.net.5.bias', '1.net.0.weight', '1.net.0.bias',
'1.net.2.module.0.weight', '1.net.2.module.0.bias', '1.net.3.module.0.weight', '1.net.3.module.0.bias',
'1.net.4.module.0.weight', '1.net.4.module.0.bias', '1.net.5.weight', '1.net.5.bias', '2.net.0.weight',
'2.net.0.bias', '2.net.2.module.0.weight', '2.net.2.module.0.bias', '2.net.3.module.0.weight',
'2.net.3.module.0.bias', '2.net.4.module.0.weight', '2.net.4.module.0.bias', '2.net.5.weight',
'2.net.5.bias', '3.net.0.weight', '3.net.0.bias', '3.net.2.module.0.weight', '3.net.2.module.0.bias',
'3.net.3.module.0.weight', '3.net.3.module.0.bias', '3.net.4.module.0.weight', '3.net.4.module.0.bias',
'3.net.5.weight', '3.net.5.bias']
When I first train CEBRA on a single session, self.model_'s keys match those of adapt_model.
Is training on multiple sessions + fine-tuning on a new one not allowed? Am I doing something wrong in using CEBRA?
Thanks,
Federico
Windows
cebra version: '0.2.0'
gpu: NVIDIA GeForce RTX 3080
Hi, I am new to CEBRA. When I reproduced the training and test procedures for the Figure 3h decoding results in your Nature paper "Learnable latent embeddings for joint behavioral and neural analysis", based on the decoding demo, the Direction (active) and Active/passive subplots look strange.
Specifically, I used 'area2-bump-target-active' and 'area2-bump-posdir-active-passive' as the training datasets. Both labels are their discrete_index, and the final metric for the KNN decoder is sklearn.metrics.accuracy_score, since I noticed the y-axis is Acc. (%) rather than R2. Am I doing something wrong?
When running the following code, an error is reported at line 30:
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
mainly because an error occurs when cebra.load_data() is used on the .h5 file. I do not know how to solve it, and I hope to get the authors' help.
---------------------------------------------------------------------------------------------------------------------------------------------
test.py
---------------------------------------------------------------------------------------------------------------------------------------------
# Create a .h5 file, containing a pd.DataFrame
import pandas as pd
import numpy as np

X_continuous = np.random.normal(0, 1, (100, 3))
X_discrete = np.random.randint(0, 10, (100,))
df = pd.DataFrame(np.array(X_continuous), columns=["continuous1", "continuous2", "continuous3"])
df["discrete"] = X_discrete
df.to_hdf("auxiliary_behavior_data.h5", key="auxiliary_variables")

import cebra
from sklearn.model_selection import train_test_split

# 1. Define a CEBRA model
cebra_model = cebra.CEBRA(
    model_architecture="offset10-model",
    batch_size=512,
    learning_rate=1e-4,
    max_iterations=10,  # TODO(user): to change to at least 10'000
    max_adapt_iterations=10,  # TODO(user): to change to ~100-500
    time_offsets=10,
    output_dimension=8,
    verbose=False,
)

# 2. Load example data
neural_data = cebra.load_data(file="neural_data.npz", key="neural")
new_neural_data = cebra.load_data(file="neural_data.npz", key="new_neural")
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables",
                                   columns=["continuous1", "continuous2", "continuous3"])
discrete_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables",
                                 columns=["discrete"]).flatten()

assert neural_data.shape == (100, 3)
assert new_neural_data.shape == (100, 4)
assert discrete_label.shape == (100,)
assert continuous_label.shape == (100, 3)

# 3. Split data and labels
(
    train_data,
    valid_data,
    train_discrete_label,
    valid_discrete_label,
    train_continuous_label,
    valid_continuous_label,
) = train_test_split(neural_data,
                     discrete_label,
                     continuous_label,
                     test_size=0.3)

# 4. Fit the model
# time contrastive learning
cebra_model.fit(train_data)
# discrete behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label)
# continuous behavior contrastive learning
cebra_model.fit(train_data, train_continuous_label)
# mixed behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label, train_continuous_label)

# 5. Save the model
cebra_model.save('/tmp/foo.pt')

# 6. Load the model and compute an embedding
cebra_model = cebra.CEBRA.load('/tmp/foo.pt')
train_embedding = cebra_model.transform(train_data)
valid_embedding = cebra_model.transform(valid_data)
assert train_embedding.shape == (70, 8)
assert valid_embedding.shape == (30, 8)

# 7. Evaluate the model performance
goodness_of_fit = cebra.sklearn.metrics.infonce_loss(cebra_model,
                                                     valid_data,
                                                     valid_discrete_label,
                                                     valid_continuous_label,
                                                     num_batches=5)

# 8. Adapt the model to a new session
cebra_model.fit(new_neural_data, adapt=True)

# 9. Decode discrete labels from the embedding
decoder = cebra.KNNDecoder()
decoder.fit(train_embedding, train_discrete_label)
prediction = decoder.predict(valid_embedding)
assert prediction.shape == (30,)
Windows 10
cebra version 0.2.0
gpu
Traceback (most recent call last):
File "E:\crop\injuryrun4\test.py", line 30, in <module>
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 661, in load
data = loader.load(file, key=key, columns=columns)
File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 211, in load
raise ModuleNotFoundError()
ModuleNotFoundError
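Judging from the traceback, load.py raises a bare ModuleNotFoundError when an optional loader dependency is missing. Since the file is HDF5 created via pandas.DataFrame.to_hdf, a plausible fix (an assumption on my part, not confirmed by the authors) is to install the HDF5 support packages in the same environment:

#! pip3 install h5py tables  # assumed missing optional dependencies for .h5 loading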
Bug description
I have discrete auxiliary variables that I am using to color my embeddings. The coloring works fine, but I do not know which color corresponds to which label. I tried adding a legend like I would in plotly.graph_objects.Scatter; this did not produce an error, but it also does not produce a legend. How can I get a label associated with the embedding_label colors? Below is my code for generating the plot. crop_bx is a 1xN vector.
cebra_model = cebra.CEBRA(
    model_architecture="offset10-model-mse",
    batch_size=512,
    temperature_mode="auto",
    learning_rate=0.0001,
    max_iterations=1000,
    time_offsets=10,
    output_dimension=3,
    conditional='time_delta',
    device='cuda_if_available',
    distance='euclidean',
    verbose=True,
)
cebra_model.fit(crop_neural, crop_bx)
embedding = cebra_model.transform(crop_neural)
fig = cebra.integrations.plotly.plot_embedding_interactive(
    embedding, embedding_labels=crop_bx, title="CEBRA-Behavior", cmap='tab20',
    legend=np.unique(crop_bx).astype(str), showlegend=True)
fig.show()
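As a workaround sketch using plain Plotly rather than the CEBRA helper (assuming embedding is an (N, 3) array and crop_bx a length-N label vector), one trace per discrete label makes Plotly generate the legend itself:

import numpy as np
import plotly.graph_objects as go

fig = go.Figure()
for lab in np.unique(crop_bx):
    m = crop_bx == lab
    fig.add_trace(go.Scatter3d(
        x=embedding[m, 0], y=embedding[m, 1], z=embedding[m, 2],
        mode="markers", marker=dict(size=2), name=str(lab)))  # name appears in legend
fig.show()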
Windows 10
cebra version: '0.3.1'
cpu
Hi,
I'm trying to replicate the synthetic neural benchmarking notebook (https://cebra.ai/docs/demo_notebooks/Demo_synthetic_exp.html) for our group's lab-meeting presentation/discussion using Google Colab. What functions should I call to replicate the synthetic data loaded in line 1 of the "Let's load the data" cell? Currently, I'm getting a FileNotFoundError, suggesting I'm missing some critical step that downloads or generates the synthetic data:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-812dd7e79322> in <cell line: 1>()
----> 1 data = jl.load(get_datapath('synthetic/continuous_label_poisson.jl'))
2 plt.scatter(data['z'][:, 0], data['z'][:, 1], c=data['u'], s=1, cmap='cool')
3 plt.axis('off')
/usr/local/lib/python3.10/dist-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
648 obj = _unpickle(fobj)
649 else:
--> 650 with open(filename, 'rb') as f:
651 with _read_fileobject(f, filename, mmap_mode) as fobj:
652 if isinstance(fobj, str):
FileNotFoundError: [Errno 2] No such file or directory: 'data/synthetic/continuous_label_poisson.jl'
I'm also wondering how pivae is imported in this notebook, as the structure of the online repository (https://github.com/lukasadam/piVAE) seems different from the import structure in the notebook. Is the notebook just importing the pivae.py file (https://github.com/lukasadam/piVAE/blob/master/pivae.py)?
Thanks in advance.
Is CEBRA suitable for Windows systems?
Can CEBRA use a 2D or 3D position as the label? If I change the label to a 2D delta, an error is thrown:
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1
I have a question regarding the MixedDataLoader.
The current docstring specifies that:
"Sampling can be configured in different modes: 1. Positive pairs always share their discrete variable. 2. Positive pairs are drawn only based on their conditional, not discrete variable."
I am not sure how that is currently possible? Maybe an option would be to add a keyword positive_sampling set to either conditional or discrete_variable, for example as sketched below.
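A hypothetical shape of that API (the positive_sampling keyword is my proposal, not an existing CEBRA parameter):

loader = cebra.data.single_session.MixedDataLoader(
    dataset=dataset, num_steps=1000, batch_size=512,
    positive_sampling="discrete_variable",  # proposed; or "conditional"
)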
In addition, as already mentioned in the docstring of the get_indices function, the prior of the discrete distribution could be made configurable:
CEBRA/cebra/data/single_session.py, line 305 in 0378db0
I will reference a PR here as well.
Hello,
Thank you so much for your work! I am definitely having fun with CEBRA :-)
I just wanted to verify that I understood the sampling scheme for shaping embeddings using an auxiliary variable. I read the supplement to the paper and looked at the code. Do I understand correctly that if I specify a continuous auxiliary variable, the model builds a distribution of the deltas between the variable's values at neighboring time points, and then uses this distribution to pick, as the positive sample, the sample whose value is closest to the reference sample's value plus such a delta? (The model still only learns from the main data, i.e. the data I would apply the time-only algorithm to; the auxiliary variable is only used to pick the positive samples.) Are the negative samples still sampled uniformly, as in time-only learning? I sketched my understanding in pseudocode below.
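In pseudocode, this is the scheme I think is being described (my paraphrase, not the actual CEBRA implementation):

import numpy as np

def sample_positive(labels, ref_idx, rng=np.random.default_rng()):
    # empirical distribution of deltas between neighboring time points
    deltas = labels[1:] - labels[:-1]
    delta = rng.choice(deltas)
    # positive sample: the index whose label is closest to the reference label + delta
    return np.argmin(np.abs(labels - (labels[ref_idx] + delta)))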
Thanks again!
Dear CEBRA team,
first of all, amazing tool and brilliant paper! I've already used it in different human invasive recording applications, and it just works really great!
I have now tried to use an individually defined model, based on the tutorial provided here: https://cebra.ai/docs/usage.html#model-architecture, and basically copied the offset-10 architecture
(line 249 in 00601fb).
Defining this model works, but changing the offset from cebra.data.Offset(5, 5) to any bigger value, e.g. cebra.data.Offset(10, 10), gives a dimension error.
The ref, pos, and neg torch tensors suddenly become three-dimensional (given the provided example they then have shape [100, 3, 11], whereas with Offset(5, 5) they are 2D: [100, 3]).
I tried to adapt the other parameters, since maybe it's also related to those, but so far I am not able to initialize my own model.
Windows 11
0.2.0
GPU GeForce RTX2070 SUPER
Here is a minimal example reproducing the error:
from cebra import CEBRA
import cebra
import numpy as np
from torch import nn
import cebra.models
import cebra.data
import cebra.models.layers as cebra_layers
from cebra.models.model import _OffsetModel, ConvolutionalModelMixin


@cebra.models.register("my-model")  # --> add this line to register the model!
class Offset10Model(_OffsetModel, ConvolutionalModelMixin):
    """CEBRA model with a 10 sample receptive field."""

    def __init__(self, num_neurons, num_units, num_output, normalize=True):
        if num_units < 1:
            raise ValueError(
                f"Hidden dimension needs to be at least 1, but got {num_units}."
            )
        super().__init__(
            nn.Conv1d(num_neurons, num_units, 2),
            nn.GELU(),
            cebra_layers._Skip(nn.Conv1d(num_units, num_units, 3), nn.GELU()),
            cebra_layers._Skip(nn.Conv1d(num_units, num_units, 3), nn.GELU()),
            cebra_layers._Skip(nn.Conv1d(num_units, num_units, 3), nn.GELU()),
            nn.Conv1d(num_units, num_output, 3),
            num_input=num_neurons,
            num_output=num_output,
            normalize=normalize,
        )

    def get_offset(self) -> cebra.data.datatypes.Offset:
        """See :py:meth:`~.Model.get_offset`"""
        return cebra.data.Offset(10, 10)


cebra_model = CEBRA(
    model_architecture="my-model",
    batch_size=100,
    temperature_mode="auto",
    learning_rate=0.005,
    max_iterations=1000,
    output_dimension=3,
    device="cuda",
    conditional="time_delta",
    hybrid=True,
    verbose=True,
)

X = np.random.random([1000, 100])
y = np.random.random([1000])
cebra_model.fit(X, y)
> Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
> The following operation failed in the TorchScript interpreter.
> Traceback of TorchScript (most recent call last):
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\lib\site-packages\cebra\models\criterions.py", line 46, in dot_similarity
> the similarities between reference samples and negative samples of shape `(n, n)`.
> """
> pos_dist = torch.einsum("ni,ni->n", ref, pos)
> ~~~~~~~~~~~~ <--- HERE
> neg_dist = torch.einsum("ni,mi->nm", ref, neg)
> return pos_dist, neg_dist
> RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (3) for operand 0 and no ellipsis was given
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\models\criterions.py", line 268, in _distance
> pos, neg = dot_similarity(ref, pos, neg)
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\models\criterions.py", line 159, in forward
> pos_dist, neg_dist = self._distance(ref, pos, neg)
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
> return forward_call(*args, **kwargs)
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\solver\base.py", line 430, in step
> behavior_loss, behavior_align, behavior_uniform = self.criterion(
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\solver\base.py", line 184, in fit
> stats = self.step(batch)
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\integrations\sklearn\cebra.py", line 933, in _partial_fit
> solver.fit(
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\integrations\sklearn\cebra.py", line 996, in partial_fit
> self._partial_fit(*self.state_,
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\site-packages\cebra\integrations\sklearn\cebra.py", line 1086, in fit
> self.partial_fit(X,
> File "C:\Users\ICN_admin\Documents\Cebra_RatStatesWenger\report_cebra_bug.py", line 53, in <module>
> cebra_model.fit(X, y)
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\runpy.py", line 86, in _run_code
> exec(code, run_globals)
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\Lib\runpy.py", line 196, in _run_module_as_main (Current frame)
> return _run_code(code, main_globals, None,
> RuntimeError: The following operation failed in the TorchScript interpreter.
> Traceback of TorchScript (most recent call last):
> File "C:\Users\ICN_admin\Anaconda3\envs\pn_env\lib\site-packages\cebra\models\criterions.py", line 46, in dot_similarity
> the similarities between reference samples and negative samples of shape `(n, n)`.
> """
> pos_dist = torch.einsum("ni,ni->n", ref, pos)
> ~~~~~~~~~~~~ <--- HERE
> neg_dist = torch.einsum("ni,mi->nm", ref, neg)
> return pos_dist, neg_dist
> RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (3) for operand 0 and no ellipsis was given
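A note from my own debugging, in case it helps (this is my reading of the shapes, not a confirmed diagnosis): the convolution stack above has a receptive field of 10 samples (kernel sizes 2 + 3 + 3 + 3 + 3, i.e. 2 + 4 x 2 = 10), which matches Offset(5, 5). With Offset(10, 10) the model receives windows of 20 samples, so the output keeps a leftover time dimension of 20 - 10 + 1 = 11 -- exactly the [100, 3, 11] shape above, which is what makes the 2D einsum in dot_similarity fail. Presumably the network has to be widened to match the new offset, e.g. by adding more skip blocks (each kernel-3 convolution adds 2 samples to the receptive field):

# sketch: a 20-sample receptive field to match cebra.data.Offset(10, 10)
super().__init__(
    nn.Conv1d(num_neurons, num_units, 2),
    nn.GELU(),
    # 8 skip blocks + final conv: 2 + 9 * 2 = 20 samples receptive field
    *[cebra_layers._Skip(nn.Conv1d(num_units, num_units, 3), nn.GELU())
      for _ in range(8)],
    nn.Conv1d(num_units, num_output, 3),
    num_input=num_neurons,
    num_output=num_output,
    normalize=normalize,
)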
The CEBRA documentation is very comprehensive and presents the parameterization in a lot of detail.
In its current form, however, the focus seems to be on explaining the scikit-learn API, and there is no example script for using the PyTorch API: https://cebra.ai/docs/usage.html
For many options I am unsure how to set them through the scikit-learn API. For example, when using discrete behavioral data, it's currently not possible to specify empirical or discrete sampling:
CEBRA/cebra/data/single_session.py, line 89 in 0378db0
I assume this is intended, so as not to overload the cebra.CEBRA initialization or the model.fit() function with too many parameters?
I therefore thought that a minimal example in usage.rst showing how a data loader with "non-scikit-learn API" parameters can be used directly from PyTorch might help:
import numpy as np
import torch

import cebra
import cebra.data
import cebra.datasets
from cebra import plot_embedding

neural_data = cebra.load_data(file="neural_data.npz", key="neural")
# continuous_label = cebra.load_data(
#     file="auxiliary_behavior_data.h5",
#     key="auxiliary_variables",
#     columns=["continuous1", "continuous2", "continuous3"],
# )
discrete_label = cebra.load_data(
    file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["discrete"],
)

# 1. Define a CEBRA dataset
input_data = cebra.data.TensorDataset(
    torch.from_numpy(neural_data).type(torch.FloatTensor),
    # continuous=torch.from_numpy(np.array(continuous_label)).type(torch.FloatTensor),
    discrete=torch.from_numpy(np.array(discrete_label[:, 0])).type(torch.LongTensor),
).to("cpu")

# 2. Define a CEBRA model
neural_model = cebra.models.init(
    name="offset10-model",
    num_neurons=input_data.input_dimension,
    num_units=32,
    num_output=2,
).to("cpu")
input_data.configure_for(neural_model)

# 3. Define the loss function (criterion) and optimizer
crit = cebra.models.criterions.LearnableCosineInfoNCE(
    # temperature=0.001,
    # min_temperature=0.0001,
).to("cpu")
opt = torch.optim.Adam(
    list(neural_model.parameters()) + list(crit.parameters()),
    # lr=0.001,
    weight_decay=0,
)

# 4. Initialize the CEBRA solver
solver = cebra.solver.init(
    name="single-session",
    model=neural_model,
    criterion=crit,
    optimizer=opt,
    tqdm_on=True,
).to("cpu")

# 5. Define the data loader
# loader = cebra.data.single_session.ContinuousDataLoader(
#     dataset=input_data, num_steps=1000, batch_size=200
# )
loader = cebra.data.single_session.DiscreteDataLoader(
    dataset=input_data, num_steps=1000, batch_size=200, prior="uniform"
)

# 6. Fit the model
solver.fit(loader=loader)

# 7. Transform the embedding
train_batches = np.lib.stride_tricks.sliding_window_view(
    neural_data, len(neural_model.get_offset()), axis=0
)
x_train_emb = solver.transform(
    torch.from_numpy(train_batches[:]).type(torch.FloatTensor).to("cpu")
).to("cpu")

# 8. Plot the embedding
plot_embedding(
    x_train_emb,
    discrete_label[len(neural_model.get_offset()) - 1:, 0],
    markersize=10,
)
Fitting a 'CEBRA-behaviour' model with the sampling strategy 'delta' (i.e. sampling positive examples around the values of the auxiliary variables of the reference examples) works when the dimensionality of the context variable is 1, but not when it is larger.
Linux
0.2.0
gpu
Defining and fitting a model like so, with continuous_label.shape = [n_time_points, 2]:
cebra_model = CEBRA(
    model_architecture="offset10-model",
    batch_size=1024,
    temperature_mode="auto",
    learning_rate=0.001,
    max_iterations=2000,
    time_offsets=10,
    output_dimension=3,
    device="cuda_if_available",
    conditional='delta',
    delta=0.1,
    verbose=True,
)
cebra_model.fit(neural_data, continuous_label)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 cebra_model.fit(neural_data, continuous_label)
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\integrations\sklearn\cebra.py:1086, in CEBRA.fit(self, X, adapt, callback, callback_frequency, *y)
1081 self._adapt_fit(X,
1082 *y,
1083 callback=callback,
1084 callback_frequency=callback_frequency)
1085 else:
-> 1086 self.partial_fit(X,
1087 *y,
1088 callback=callback,
1089 callback_frequency=callback_frequency)
1090 del self.state_
1092 return self
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\integrations\sklearn\cebra.py:996, in CEBRA.partial_fit(self, X, callback, callback_frequency, *y)
994 if not hasattr(self, "state_") or self.state_ is None:
995 self.state_ = self._prepare_fit(X, *y)
--> 996 self._partial_fit(*self.state_,
997 callback=callback,
998 callback_frequency=callback_frequency)
999 return self
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\integrations\sklearn\cebra.py:933, in CEBRA._partial_fit(self, solver, model, loader, is_multisession, callback, callback_frequency)
928 raise ValueError(
929 "callback_frequency requires to specify a callback.")
931 model.train()
--> 933 solver.fit(
934 loader,
935 valid_loader=None,
936 save_frequency=callback_frequency,
937 valid_frequency=None,
938 decode=False,
939 logdir=None,
940 save_hook=callback,
941 )
943 # Save variables of interest as semi-private attributes
944 self.model_ = model
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\solver\base.py:183, in Solver.fit(self, loader, valid_loader, save_frequency, valid_frequency, decode, logdir, save_hook)
181 iterator = self._get_loader(loader)
182 self.model.train()
--> 183 for num_steps, batch in iterator:
184 stats = self.step(batch)
185 iterator.set_description(stats)
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\solver\util.py:85, in ProgressBar.__iter__(self)
83 if self.use_tqdm:
84 self.iterator = tqdm.tqdm(self.iterator)
---> 85 for num_batch, batch in enumerate(self.iterator):
86 yield num_batch, batch
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\tqdm\std.py:1180, in tqdm.__iter__(self)
1177 time = self._time
1179 try:
-> 1180 for obj in iterable:
1181 yield obj
1182 # Update and possibly print the progressbar.
1183 # Note: does not call self.update(1) for speed optimisation.
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\data\base.py:217, in Loader.__iter__(self)
215 def __iter__(self) -> Batch:
216 for _ in range(len(self)):
--> 217 index = self.get_indices(num_samples=self.batch_size)
218 yield self.dataset.load_batch(index)
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\data\single_session.py:238, in ContinuousDataLoader.get_indices(self, num_samples)
236 negative_idx = reference_idx[num_samples:]
237 reference_idx = reference_idx[:num_samples]
--> 238 positive_idx = self.distribution.sample_conditional(reference_idx)
239 return BatchIndex(reference=reference_idx,
240 positive=positive_idx,
241 negative=negative_idx)
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\cebra\distributions\continuous.py:281, in DeltaDistribution.sample_conditional(self, reference_idx)
275 raise ValueError(
276 f"Reference indices have wrong shape: {reference_idx.shape}. "
277 "Pass a 1D array of indices of reference samples.")
279 # TODO(stes): Set seed
--> 281 query = torch.distributions.Normal(
282 self.data[reference_idx].squeeze(),
283 torch.ones_like(reference_idx, device=self.device) * self.std,
284 ).sample()
286 return self.index.search(query.unsqueeze(-1))
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\torch\distributions\normal.py:51, in Normal.__init__(self, loc, scale, validate_args)
50 def __init__(self, loc, scale, validate_args=None):
---> 51 self.loc, self.scale = broadcast_all(loc, scale)
52 if isinstance(loc, Number) and isinstance(scale, Number):
53 batch_shape = torch.Size()
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\torch\distributions\utils.py:42, in broadcast_all(*values)
39 new_values = [v if is_tensor_like(v) else torch.tensor(v, **options)
40 for v in values]
41 return torch.broadcast_tensors(*new_values)
---> 42 return torch.broadcast_tensors(*values)
File c:\Users\azylb\miniconda3\envs\cebra\lib\site-packages\torch\functional.py:74, in broadcast_tensors(*tensors)
72 if has_torch_function(tensors):
73 return handle_torch_function(broadcast_tensors, tensors, *tensors)
---> 74 return _VF.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (2) must match the size of tensor b (1024) at non-singleton dimension 1
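If I read the bottom of the traceback correctly, DeltaDistribution builds the Normal with a (batch, 2) mean taken from the 2D labels but a (batch,) scale built from the 1D reference indices, and those shapes cannot broadcast. A standalone sketch of the same failure (shapes taken from the trace):

import torch

loc = torch.zeros(1024, 2)      # labels at the reference indices, 2D context
scale = torch.ones(1024) * 0.1  # std built from the 1D index tensor
# raises: The size of tensor a (2) must match the size of tensor b (1024) ...
torch.distributions.Normal(loc, scale)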
I was trying to run the Colab examples, but I am getting this error:
FileNotFoundError: [Errno 2] No such file or directory: 'data\rat_hippocampus/buddy.jl'
I get the same error on my local installation.
Colab
0.2.0
gpu
Run the Colab code sequentially up to the data loading.
FileNotFoundError: [Errno 2] No such file or directory: 'data\\rat_hippocampus/buddy.jl'
Hello, I have some task-irrelevant variables to control for in the embedding process. This is a crucial aspect of your paper, yet I am unable to find a parameter to specify them, and I have thoroughly searched all the CEBRA tutorials. It appears that the PyTorch API (Distributions) may be able to handle this, but I have not yet come across a clear example or parameter for doing so.
Perhaps I have simply not found it. Would you mind helping me?
I sincerely seek your assistance.
Ubuntu 20.04.2 LTS
'0.4.0'
gpu
Improve the ModuleNotFound error messages and make them more actionable (e.g., directly suggest installing a particular set of extra deps, like datasets).

"This solved it for me when I hit this error while working through the code in the Usage page.
A couple of notes (feel free to ignore 😄) -- I found the ModuleNotFound message a bit hard to interpret, as it didn't say which module was missing, so I wasn't sure how to proceed. Also, the installation page says that the datasets optional dependency is for working with the datasets at Figshare. Hence, when I got the error on the Usage page while trying to do stuff with synthetic data, I didn't consider the correct solution.
Anyway, minor wrinkles -- congrats on the cool package, I'm having fun with it so far!"
Originally posted by @EricThomson in #57 (comment)
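For anyone landing here, the fix being referred to is installing the optional dependency set; something like the line below (the datasets extra name is taken from the installation page mentioned above):

#! pip3 install 'cebra[datasets]'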
Thanks so much for providing such a thorough documentation of this amazing tool!
I am trying to apply the decoding of natural video features procedure (Fig. 5 in the Nature paper) to a separate dataset with a different movie. I am hoping you can provide some additional guidance on how you extracted the visual features using DINO. I have already downloaded the pre-trained DINO network from PyTorch Hub, but it is not immediately obvious from your documentation how to extract the Ntime x Nfeature matrix from this model (see my attempt sketched below). Thank you very much for your time and effort. -Ronan O'Shea
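For concreteness, this is how I currently imagine the extraction would go, based on the 'vit_base/8' naming of the released feature files; please treat it as my guess rather than the paper's actual pipeline (frame preprocessing details are assumptions):

import torch

# DINO ViT-B/8 from PyTorch Hub; its forward() returns the 768-d CLS embedding
model = torch.hub.load('facebookresearch/dino:main', 'dino_vitb8')
model.eval()

# frames: (n_time, 3, 224, 224) tensor of normalized movie frames (assumed)
with torch.no_grad():
    features = model(frames)  # -> (n_time, 768), the Ntime x Nfeature matrix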
As discussed in #106 by @FrancescaGuo, when passing a discrete index for the consistency calculation, the default n_bins = 100 raises an (expected) error message. The current way to circumvent this error is to set n_bins to the number of passed labels. However, this could be improved directly in the code: whenever discrete labels are passed, the binning process required for continuous data could be replaced or adapted.
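For reference, the workaround reads roughly like this through the scikit-learn metrics API, where num_discretization_bins is the user-facing name of n_bins (embeddings and labels are placeholders):

import numpy as np
import cebra

scores, pairs, ids = cebra.sklearn.metrics.consistency_score(
    embeddings=embeddings,
    labels=labels,
    between="datasets",
    # workaround: one bin per distinct discrete label value
    num_discretization_bins=len(np.unique(np.concatenate(labels))),
)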
In the early steps of Demo_Allen.ipynb, trying to load ca_train with cebra.datasets.init(), I get an error.
Windows 10
0.2.0
cpu 11th gen intel i7
gpu nvidia geforce rtx 3050ti
Code:
cortex = 'VISp'
seed=333
num_neurons = 800
ca_train = cebra.datasets.init(f'allen-movie-one-ca-{cortex}-{num_neurons}-train-10-{seed}')
Stack trace:
File ~\miniconda3\envs\cebra\lib\site-packages\cebra\datasets\allen\ca_movie_decoding.py:176, in <listcomp>(.0)
162 """Construct pseudomouse neural dataset.
163
164 Stack the excitatory neurons from the multiple mice and construct a psuedomouse neural dataset of the specified visual cortical area.
(...)
169
170 """
172 list_mice = glob.glob(
173 get_datapath(
174 f"allen/visual_drift/data/calcium_excitatory/{area}/*"))
175 exp_containers = [
--> 176 int(mice.split(f"{area}/")[1].replace(".mat", ""))
177 for mice in list_mice
178 ]
179 ## Load summary file
180 summary = pd.read_csv(get_datapath("allen/data_summary.csv"))
IndexError: list index out of range
I went into the debugger and checked the mice string:
data\\allen/visual_drift/data/calcium_excitatory/VISp\\511498742.mat
It seems split() is getting hung up on Windows path weirdness. I have a Linux machine I can switch to for running CEBRA, but I thought I'd let you know about this just in case.
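A platform-independent sketch of the parsing step that fails; pathlib strips both the directory part and the suffix regardless of the separator style Windows produces (a suggestion, not a tested patch):

from pathlib import Path

mice = "data\\allen/visual_drift/data/calcium_excitatory/VISp\\511498742.mat"
exp_container = int(Path(mice).stem)  # -> 511498742 on Windows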
This is very interesting.
What devices can be used to record these neuronal signals?
Thank you very much.
When trying to run the Demo_Allen.ipynb shown on the cebra.ai/Demos page, I chose to open it in Colab for convenience. I downloaded the datasets from Figshare and uploaded them to the associated Google Drive. I saw that the demo requires the datasets to be put under the same directory, but that cannot be done with Colab, since uploaded datasets always live under "/content/drive/MyDrive/Colab Notebooks/data/". I tried to override the path with
import os
os.environ['CEBRA_DATADIR'] = "/content/drive/MyDrive/Colab Notebooks/data"
but it doesn't work; I still get the error
FileNotFoundError: [Errno 2] No such file or directory: 'data/allen/features/allen_movies/vit_base/8/movie_one_image_stack.npz/testfeat.pth'
What should I do so that Colab reads the datasets correctly?
Google Colab
gpu