ennyvanbeest / unitmatch

Ephys toolbox to match single units (electrophysiology) either within the same day (over splits) or across two days

License: Other

MATLAB 69.87% Python 11.10% Makefile 0.02% C 0.19% Fortran 0.24% Jupyter Notebook 3.16% Tcl 15.42%

unitmatch's Introduction

UnitMatch

Ephys toolbox to match single units (electrophysiology) either within the same recording (oversplits) or across recordings, using spatial position and waveform-based parameters, including both metrics that are standard in the field and non-standard ones.

This toolbox was initially created between August 2022 and January 2023 by Enny H. van Beest as a tool to match units across multiple recordings (days), and/or merge oversplit units within the same day. The toolbox was further optimized between February and October 2023 in collaboration with Célian Bimbard. A Python version was created between November 2023 and January 2024 by Sam Dodgson. The Python version only contains the core UnitMatch code. Other functionality, such as computing functional scores, is for now only available in the Matlab version.

We thank Julie Fabre, who implemented some changes to Bombcell - a toolbox for quality metrics and automated curation. We also thank Anna Lebedeva, Flora Takacs, Pip Coen, Kenneth Harris, and Matteo Carandini, as well as the rest of the Carandini-Harris laboratory for their feedback and contributions.

For this work, van Beest was supported by a Marie Sklodowska-Curie fellowship (101022757), Bimbard by a European Molecular Biology Organization fellowship (ALTF 740-2019), and Carandini and Harris by a Wellcome Investigator Award.

Preprint: https://www.biorxiv.org/content/10.1101/2023.10.12.562040v1

Video: https://www.youtube.com/watch?v=4c_dgTZcBaQ&list=PLfhWmWntvjl7kljKozClpjS29DoY8V5pB&index=23&t=21s

The instructions below are for the Matlab version. Please see the python folder for more information on the Python version.

Dependencies on other toolboxes/repositories

Toolboxes used for matching part:

https://github.com/kwikteam/npy-matlab

Toolboxes that are very useful and integrated with UnitMatch:

https://github.com/Julie-Fabre/bombcell
https://github.com/EnnyvanBeest/spikes (forked from https://github.com/cortex-lab/spikes, but also tested with the original spikes)

Toolboxes that are useful for other parts of analysis pipeline (but not necessary for just UnitMatch):

Usage

We included a DEMO_UNITMATCH.m script, which hopefully clarifies how to use UnitMatch and smoothly integrate it into your existing analysis pipeline. UnitMatch requires at minimum a cell array with path names, pointing to the different recordings for which you want to try and track units. Each of these paths (typically a Kilosort output folder) should contain a subfolder called 'RawWaveforms' in which, for every unit (/cluster), there is an NPY file containing two average waveforms per recording site (spikewidth X nRecordingSites X 2). The two average waveforms are preferably from the first versus second half of a recording. Additionally, UnitMatch needs a 'clusinfo' struct with the fields cluster_id, Good_ID (whether to include the unit or not), RecSesID (which recording session), and Probe (can be a vector of ones if just one probe was used). It also needs some parameters, which can be filled in using DefaultParametersUnitMatch.m.
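For users who build the 'RawWaveforms' folder themselves, the layout described above can be sketched in Python as follows. This is only an illustration, not the toolbox's own extraction code: the helper name save_raw_waveforms, the input dictionary, and the file name pattern Unit<id>_RawSpikes.npy are assumptions made for this example (check the demo scripts for the names UnitMatch actually expects). It also assumes the spike snippets are in chronological order, so that splitting them in two approximates the first versus second half of the recording.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_raw_waveforms(ks_dir, snippets_per_unit):
    """Write one NPY file per unit into <ks_dir>/RawWaveforms.

    snippets_per_unit maps cluster_id -> array of shape
    (n_spikes, spike_width, n_sites), assumed chronologically ordered.
    """
    out_dir = Path(ks_dir) / "RawWaveforms"
    out_dir.mkdir(parents=True, exist_ok=True)
    for cluster_id, snippets in snippets_per_unit.items():
        half = snippets.shape[0] // 2
        # Average the first and second half of the spikes separately,
        # giving an array of shape (spike_width, n_sites, 2).
        avg = np.stack(
            [snippets[:half].mean(axis=0), snippets[half:].mean(axis=0)],
            axis=-1,
        )
        # Hypothetical file name, chosen for this sketch only.
        np.save(out_dir / f"Unit{cluster_id}_RawSpikes.npy", avg)
    return out_dir


# Toy data: one unit (cluster 7), 20 spikes, 82 samples, 16 recording sites.
rng = np.random.default_rng(0)
demo = {7: rng.normal(size=(20, 82, 16))}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = save_raw_waveforms(tmp_dir, demo)
    loaded = np.load(out_dir / "Unit7_RawSpikes.npy")

print(loaded.shape)
```

Keeping the two half-averages in the last dimension matches the (spikewidth X nRecordingSites X 2) layout described above.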

Also using SpikeGLX and Neuropixels, and spike sorting your data with (some form of) Kilosort? You are in luck! We included two example pipelines (see the folder 'ExampleAnalysisPipelines') to show how we go from raw data (SpikeGLX + Neuropixels) to Kilosorted data ((Py)Kilosort) to curated single units (Bombcell), from where we can smoothly run UnitMatch.

You can also find some example data here.

Output

UnitMatch has two main outputs:

A matching table. This contains for every included pair of units each of the similarity scores, the probability of being a match, and some extra information (e.g. functional scores, if using our functional score validation analysis).
A UniqueIDConversion, which is a struct containing the original cluster identities (as defined in clusinfo.cluster_id) and the 'UniqueID', which (if using the 'AssignUniqueID' function) gives matching clusters across recording days the same UniqueID. Useful for later analysis! It also contains extra information that might be useful, such as whether a unit was considered a 'good unit' and which recording session it was found in.

Modules

After the initial UnitMatch algorithm, you can use different 'Modules' to check and validate UnitMatch's output. Each module can be used simply by entering the path to the UnitMatch output. N.B. for some of the modules you need functional data in the format of (Py)Kilosort output, or need to have used Bombcell.

AssignUniqueID (see Output.2)
EvaluatingUnitMatch: Within session cross-validation, or comparison to using Kilosort in the stitched/concatenated way
QualityMetricsROCs: Check for every quality metric from Bombcell whether this affected a unit's chance of being a match with another unit
ComputeFunctionalScores: Computes functional similarity scores (autocorrelograms, reference population correlations, firing rate differences, and, if available, natural image responses) for each pair of units. These are added to the matching table output.
DrawPairsUnitMatch: Draws and saves a figure for every matching pair that was found. It will also include some 'doubt cases', such as pairs that had very high functional scores, yet were not found to be a match.
FigureFlick: Useful for manual curation. A column will be added to the Matching table with the user's judgment for pairs of units. Only works after running the DrawPairsUnitMatch module.
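To illustrate what "giving matching clusters the same UniqueID" means, here is a minimal Python sketch. It is not the AssignUniqueID implementation: the function assign_unique_ids, its inputs, and the choice to group matches transitively (if A matches B and B matches C, all three share an ID) are assumptions made for this example, and the real module may apply stricter criteria.

```python
def assign_unique_ids(n_units, matched_pairs):
    """Give every group of mutually linked units one shared ID.

    n_units: total number of units, indexed 0..n_units-1.
    matched_pairs: iterable of (i, j) index pairs judged to be matches.
    Returns a list where entry i is the unique ID of unit i
    (here: the smallest unit index in its group).
    """
    parent = list(range(n_units))

    def find(i):
        # Follow parent links to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root so IDs are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return [find(i) for i in range(n_units)]


# Five units; unit 0 matches unit 3, and unit 3 matches unit 4.
unique_ids = assign_unique_ids(5, [(0, 3), (3, 4)])
print(unique_ids)  # [0, 1, 2, 0, 0]
```

Units 1 and 2 have no match and keep their own IDs, while units 0, 3, and 4 end up sharing one.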

Phy plugin

custom_unitmatch_probability_phy.py swaps the built-in similarity measure for UnitMatch's matching probability computed within recordings. It can be useful to find clusters to merge! The plugin loads a file named probability_templates.npy which contains the within-recording probability matrix. This file can be generated using SaveWithinSessionProba_forPhy.m. Phy plugin installation instructions can be found here.

Examples

Two recording sessions with the same IMRO table. In the first recording this unit was oversplit, and UnitMatch will merge it. Additionally, it found its match in the second recording.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

For commercial use please contact [email protected]

unitmatch's Issues

Kilosort4 Related Minor Error in ExtractKilosortData.m

Thank you so much for this wonderful pipeline. It is THE feature that I'd been looking for.
I used Kilosort4 with UnitMatch and found a small error in the ExtractKilosortData.m code.

Now KS4 automatically saves the output in the '\kilosort4' folder,
so line 323 needs a modification to detect both 'KS4' and 'kilosort4'.
It works well after this.

Should I use preprocessed data to extract waveform?

Hi, please forgive me if I'm asking a stupid question. The code worked in my case (64-channel probe, Phy export file). However, I am now extracting the waveforms from the original data. Should I extract the waveforms from the processed data (which is bandpass filtered, common referenced, and centered)?

Any demo data or toy data?

Hi,

I was trying to use UnitMatch with probes other than Neuropixels, and I am wondering if you could provide me with a sample dataset for reference. It would help me a lot.

Thanks,

using UnitMatch with data curated with Phy

Hi,

I am interested in trying your code on my data. I have chronic recordings with 64-channel silicon probes, which I sorted with Kilosort and then manually curated with Phy. I was wondering if you could help me use your code so that I can skip the curation steps you propose and run UnitMatch directly on Phy-curated data.

Thanks,
Nadin

Removing empty clusters is not doing what it is supposed to

Hi,

This line in the ExtractKilosortData messes up the data:

[clusinfo, sp, emptyclus] = RemovingEmptyClusters(clusinfo, sp);

There are no empty clusters in the TSV file. All sorts of other non-empty clusters are being removed instead.

ModuleNotFoundError: No module named 'params'

Hi, I tried the python version of the code, but I got this error.

ModuleNotFoundError Traceback (most recent call last)
Cell In[13], line 7
4 param = {'SpikeWidth': 90, 'waveidx': np.arange(20,50), 'PeakLoc': 35}
5 param = util.get_default_param(param)
----> 7 WavePaths, UnitLabelPaths, ChannelPos = util.paths_fromKS(AllSessionPaths)

File Q:\sachuriga\Sachuriga_Python\UnitMatch-main\UnitMatchPy\UnitMatchPy\utils.py:369, in paths_fromKS(KSdirs)
367 pathtmp = os.path.join(KSdirs[0], 'params.py')#
368 os.chdir(KSdirs[0])
--> 369 from params import n_channels_dat
370 nChannels.append(n_channels_dat - 1) #subtract the sync channel
371 os.chdir(tmp)

ModuleNotFoundError: No module named 'params'
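One possible workaround, sketched here as an assumption rather than as the project's fix: "from params import ..." only works if the Kilosort folder is on Python's import path, which changing the working directory does not guarantee in every environment (a notebook, for example). Reading n_channels_dat by parsing params.py as text avoids the import altogether. The helper name read_n_channels_dat and the demo file contents below are made up for illustration.

```python
import ast
import os
import tempfile


def read_n_channels_dat(ks_dir):
    """Return n_channels_dat from <ks_dir>/params.py without importing it."""
    params_path = os.path.join(ks_dir, "params.py")
    with open(params_path) as fh:
        for line in fh:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "n_channels_dat":
                # literal_eval safely turns the text "385" into the int 385.
                return ast.literal_eval(value.strip())
    raise KeyError(f"n_channels_dat not found in {params_path}")


# Demo with a throwaway params.py that mimics a Kilosort output folder.
with tempfile.TemporaryDirectory() as ks_dir:
    with open(os.path.join(ks_dir, "params.py"), "w") as fh:
        fh.write("dat_path = 'temp_wh.dat'\n")
        fh.write("n_channels_dat = 385\n")
        fh.write("dtype = 'int16'\n")
    n_channels_dat = read_n_channels_dat(ks_dir)

print(n_channels_dat)  # 385
```

Parsing the file also sidesteps Python's module cache, which would otherwise return the first session's params module for every later session.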

Merging Units - Any function?

Hi,
Thanks for the helpful toolbox!

My pipeline uses spikeinterface to extract waveforms etc. I've successfully generated RawWaveforms by using sparse waveforms from spikeinterface. The UnitMatch() runs smoothly on it.

Going through the methodology of UnitMatch, the pipeline also looks very helpful for merging oversplit units in the same session and for comparing the outputs of different sorters.

I'm trying to use UnitMatch to do so.

The example figure at the end of the readme.md mentions merging units, but I couldn't find any function that does it. Am I missing something?

In addition, is there any guide on how to use the GUI for curation? It's a very nice way to summarize the comparison, but I don't know exactly how to curate with it.

Issue with Gaussian filter in Python version

Hey Enny / Sam,

Thanks for this package. Looking forward to using it. I've come across an issue within Extract_A_unit within Extract_raw_data.py:

#tmp = gaussian_filter(tmp, 1, radius = 2, axes = 0)

I had to change this to:

tmp = gaussian_filter1d(tmp, sigma = 1, radius = 2, axis = 0)

Because the first func didn't have axes as an argument. See help doc - Perhaps I've done something wrong so let me know - https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html
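For context, and as far as the SciPy release notes indicate, the radius argument was added to the Gaussian filters in SciPy 1.10 and the axes argument of gaussian_filter in SciPy 1.11, so the original call fails on older installations. The sketch below shows a variant that should behave the same without relying on either argument: with sigma = 1, truncate = 2.0 gives a kernel half-width of int(2.0 * 1 + 0.5) = 2, the same as radius = 2. The array tmp and its shape are placeholders for this example, not the toolbox's data.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Placeholder waveform array: 82 time samples x 384 channels.
tmp = np.random.default_rng(0).normal(size=(82, 384))

# Smooth along time (axis 0) only. truncate=2.0 with sigma=1 yields a
# 5-tap kernel (half-width 2), matching radius=2 on newer SciPy versions.
smoothed = gaussian_filter1d(tmp, sigma=1, axis=0, truncate=2.0)

print(smoothed.shape)  # (82, 384)
```

Each channel is filtered independently, so the result keeps the input's shape.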

Error in UnitMatch after extracting waveform from openephys

Hello,

I encounter an error when trying to use your pipeline. I have open ephys recording and I spikesorted my data using Kilosort4.
I have 1 recording. First question: is UnitMatch capable of matching units within a single recording?
I extracted the waveform from my openephys data using your demo Matlab file for openEphys.
However I came across this error and I don't know how to solve it.

Unable to perform assignment because the size of the left side is 61-by-1 and the size of the right
side is 82-by-1.

Extracting raw waveforms. Progress: 100%
Extracting raw waveforms took 27 minutes for 501 units
Extracting waveform information. Progress:   0%Unable to perform assignment because the size of the left side is 61-by-1 and the size of the right
side is 82-by-1.
Error in ExtractParameters (line 266)
        ProjectedWaveform(:,uid,cv) =
        nansum(spikeMap(:,ChanIdx,cv).*repmat(weight,1,size(spikeMap,1))',2)./sum(weight);

Error in UnitMatch (line 104)
[AllWVBParameters,param] = ExtractParameters(Path4UnitNPY,clusinfo,param);

Error in DEMO_UNITMATCH_OpenEphys (line 57)
[UniqueIDConversion, MatchTable, WaveformInfo, UMparam] = UnitMatch(clusinfo, UMparam);

AssignUniqueID doesn't work

1 As shown in the figure, it seems like two units matched, but they are not being assigned the same ID. This issue occurred not only with the provided data but also with my own data.

2 I am not a Neuropixel/SpikeGLX user, so I have been extracting waveforms manually. However, the preprocessing performed on the original data and the waveform extraction process are unclear. Could you please share details about the processes performed on the waveform?

3 In the parameter file, GoodUnitsOnly was set to 1, but it is still including all the units.

RunUnitMatch not enough input arguments

Error in ExtractParameters: referencing channel 384 when there are only 383 channels

I am getting the following error:

Index in position 2 exceeds array bounds. Index must not exceed 383.

Error in ExtractParameters (line 153)
        [~,MaxChannel(uid,cv)] = nanmax(nanmax(abs(spikeMap(waveidx,ChanIdx,cv)),[],1)); %Only over relevant channels, in case there's other spikes happening elsewhere simultaneously

Error in UnitMatch (line 107)
[AllWVBParameters,param] = ExtractParameters(Path4UnitNPY,clusinfo,param);

Error in DEMO_UNITMATCH (line 106)
[UniqueIDConversion, MatchTable, WaveformInfo, UMparam] = UnitMatch(clusinfo, UMparam);

I assume this might be caused by me setting the following parameters like this:

UMparam.nSavedChans = 384;
UMparam.nSyncChans = 0;
UMparam.nChannels = 384;

rather than

UMparam.nSavedChans = 385;
UMparam.nSyncChans = 1;
UMparam.nChannels = 385;

because I don't have an extra sync channel.

Could you please advise me on how I should fix this issue in my case when there is no sync channel?

'RawWaveforms' subfolder?

Hi, I've recently started using UnitMatch but encountered an issue with the 'RawWaveforms' subfolder mentioned in the instructions below.

"Each of these paths (typically a Kilosort output folder) should contain a subfolder called 'RawWaveforms' in which for every unit (/cluster) there is a NPY file containing two average waveforms per recording site (spikewidth X nRecordingSites X 2). The two average waveforms are preferably from the first versus second half of a recording."

I successfully ran bombcell initially, but what I got regarding waveforms was 'templates._bc_rawWaveforms.npy,' which seems to include waveform information for all units. It didn't create a subfolder 'RawWaveforms,' and there are no individual NPY files for each unit. I'm wondering if there is an option to easily obtain this information to smoothly run unitmatch.

Thanks in advance!

Error in filling channel position on python version

Hey!

I'm getting an error with channel position:

Which (I assume) is leading to errors down the line

Any advice?

ReadMeta2 returns a nonsensical output

Hi Enny,
Thank you for this cool toolbox! Looks very promising :):)
I'm trying to run the DEMO_UNITMATCH.m and I get the following error:

Invalid field name: 'M . # ) B ) • ‰ 3 Êÿ‰ � � ðÿP i g þÿ� � ) � g ´ÿ(
u ! � ‹ � Äÿ¬ÿ# � 5 ÷ÿI � g q � &'.

Error in setfield (line 35)
s.(deblank(strField)) = varargin{end};

Error in ReadMeta2 (line 35)
meta = setfield(meta, tag, C{2}{i});

Error in ExtractKilosortData (line 374)
[Imecmeta] = ReadMeta2(fullfile(rawD(id).folder, strrep(rawD(id).name, 'cbin', 'meta')), 'ap');

Error in DEMO_UNITMATCH (line 51)
UMparam = ExtractKilosortData(UMparam.KSDir, UMparam); % Extract KS data and do some noise removal, optionally decompresses cbin to bin data and uses BOMBCELL quality metric to define good single units

The problem seems to be in ReadMeta2. In particular, line 21 (textscan functions), returns a nonsensical output.

Please help?
All the best,
Valerio
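One possible explanation, which is a guess and not confirmed in this thread: the meta file name is derived by replacing the substring 'cbin' with 'meta', so for an uncompressed '.bin' file the name would stay unchanged and the binary recording itself would be parsed as if it were a meta file. A SpikeGLX .meta file is plain text with one key=value pair per line. The Python sketch below shows that format being read, with the meta path derived by swapping the file extension instead of a substring. The function name read_spikeglx_meta and the demo file contents are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def read_spikeglx_meta(bin_path):
    """Parse the .meta file that sits next to a .bin or .cbin recording."""
    # Swap only the final extension, so both .bin and .cbin map to .meta.
    meta_path = Path(bin_path).with_suffix(".meta")
    meta = {}
    # Opening as UTF-8 text raises UnicodeDecodeError on most binary files,
    # instead of silently producing garbage keys.
    with open(meta_path, encoding="utf-8") as fh:
        for line in fh:
            key, sep, value = line.strip().partition("=")
            if sep:
                # Some SpikeGLX keys start with '~'; drop that prefix.
                meta[key.lstrip("~")] = value
    return meta


# Demo with a throwaway meta file next to a (non-existent) .ap.bin file.
with tempfile.TemporaryDirectory() as tmp_dir:
    bin_path = Path(tmp_dir) / "rec_g0_t0.imec0.ap.bin"
    bin_path.with_suffix(".meta").write_text(
        "imSampRate=30000\nnSavedChans=385\n~snsChanMap=(384,384,1)\n",
        encoding="utf-8",
    )
    meta = read_spikeglx_meta(bin_path)

print(meta["nSavedChans"])  # 385
```

If the meta file next to the recording opens as readable key=value text, the file itself is fine and the problem lies in which path gets passed to the reader.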