whitews / flowkit
A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces
Home Page: https://flowkit.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
Add option to Sample class for exporting events as a Pandas DataFrame with options to have column headers as either PnN labels or both PnN/PnS labels. Other options will be the 'source' and 'subsample' args like in other Sample methods.
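One way the two header options could look is sketched below with plain pandas. This is not FlowKit code: the function name `events_to_dataframe` and its parameters are assumptions made for illustration, and the 'source' and 'subsample' options are left out.

```python
import numpy as np
import pandas as pd


def events_to_dataframe(events, pnn_labels, pns_labels=None):
    """Hypothetical helper: build a DataFrame from an event array.

    With only PnN labels the columns are a flat index; with both PnN and
    PnS labels the columns become a two-level MultiIndex.
    """
    if pns_labels is None:
        columns = pd.Index(pnn_labels, name='pnn')
    else:
        columns = pd.MultiIndex.from_arrays(
            [pnn_labels, pns_labels], names=['pnn', 'pns']
        )
    return pd.DataFrame(np.asarray(events), columns=columns)


# Made-up events and labels, for illustration only
events = [[1.0, 2.0], [3.0, 4.0]]
df_both = events_to_dataframe(events, ['FITC-A', 'PE-A'], ['CD3', 'CD4'])
df_pnn = events_to_dataframe(events, ['FITC-A', 'PE-A'])
```

With the two-level header, a single channel is selected with a tuple such as `df_both[('FITC-A', 'CD3')]`.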
Sample transform methods should match the gating ones, both in implementation and in which ones are available.
Currently, only 2-dimensional ellipsoids are supported, yet the GatingML specification allows for n-dimensional ellipsoids. This needs to be implemented to fully support GatingML.
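As a rough illustration of the n-dimensional case, membership in an ellipsoid can be tested with a squared Mahalanobis distance. The sketch below is not FlowKit's implementation; the function name `points_in_ellipsoid` and its parameter names are assumptions made for this example.

```python
import numpy as np


def points_in_ellipsoid(events, mean, covariance, distance_square):
    """Return a boolean array marking events inside an n-dimensional ellipsoid.

    Hypothetical helper: an event x counts as inside when
    (x - mean)^T * inv(covariance) * (x - mean) <= distance_square.
    """
    events = np.asarray(events, dtype=float)
    diff = events - np.asarray(mean, dtype=float)
    # Solve the linear system rather than explicitly inverting the covariance
    solved = np.linalg.solve(np.asarray(covariance, dtype=float), diff.T).T
    # Row-wise dot product gives the squared Mahalanobis distance per event
    mahalanobis_sq = np.einsum('ij,ij->i', diff, solved)
    return mahalanobis_sq <= distance_square


# Three events in 3 dimensions, identity covariance, squared distance of 1
events = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
inside = points_in_ellipsoid(events, [0.0, 0.0, 0.0], np.eye(3), 1.0)
```

Nothing in the function depends on the number of columns, so the same code covers the 2-dimensional case.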
Currently, the 'parent' attribute of a gate is just a string for the parent's gate ID. This can be looked up internally to get the parent Gate instance, but the user cannot easily get the full reference. Maybe add a property or method to get the full parent reference.
Need an example of the FlowJo workspace XML for an ellipse gate element. The gate coordinates for polygon gates are saved in the non-transformed space in FJ workspaces, so it is likely the same for ellipse gates. However, it might not be as straightforward to convert, depending on how the covariance is stored by FJ.
Lots of redundant code in the various export methods. Create a single export method that encapsulates all the current export functionality.
Consider the following gating strategy:
root
╰── Time
╰── Singlets
╰── aAmine-
╰── CD3+
├── CD4+
│ ├── CD107a+
│ ├── IFNg+
│ ├── IL2+
│ ╰── TNFa+
╰── CD8+
├── CD107a+
├── IFNg+
├── IL2+
╰── TNFa+
Here we would have duplicate gate IDs (e.g. IL2+) under different branches (CD4+ and CD8+). Currently this isn't allowed, but should be supported as it is commonly found in real-world gating strategies. FlowKit should allow re-using gate IDs as long as the parent gate paths are distinct.
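A minimal sketch of the idea, assuming gates are stored under a key made of the gate ID plus the full parent path. The class `GateRegistry` and its methods are hypothetical and only illustrate the lookup rules; they are not part of FlowKit.

```python
class GateRegistry:
    """Hypothetical container keyed by (gate ID, parent gate path)."""

    def __init__(self):
        self._gates = {}

    def add_gate(self, gate_id, parent_path):
        key = (gate_id, tuple(parent_path))
        if key in self._gates:
            # Same ID under the same parent path is still a conflict
            raise KeyError(f"Gate {gate_id!r} already exists under this path")
        self._gates[key] = {'id': gate_id, 'parent_path': tuple(parent_path)}

    def find(self, gate_id, parent_path=None):
        matches = [key for key in self._gates if key[0] == gate_id]
        if parent_path is not None:
            matches = [key for key in matches if key[1] == tuple(parent_path)]
        if not matches:
            raise KeyError(f"No gate found for {gate_id!r}")
        if len(matches) > 1:
            # The ID alone is ambiguous, so the caller must give the path
            raise ValueError(f"Gate ID {gate_id!r} is ambiguous; pass parent_path")
        return self._gates[matches[0]]


registry = GateRegistry()
base_path = ['root', 'Time', 'Singlets', 'aAmine-', 'CD3+']
registry.add_gate('IL2+', base_path + ['CD4+'])
registry.add_gate('IL2+', base_path + ['CD8+'])
cd8_il2 = registry.find('IL2+', base_path + ['CD8+'])
```

Looking up `'IL2+'` without a path raises an error here, which keeps existing single-ID lookups unambiguous.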
Expand and split tutorial notebook into multiple parts.
Pretty self-explanatory...currently, the method takes either a single gate ID string or None (for all the gates). If a user wants results for more than one gate but not all of them, they have to loop over the gates of interest. Supporting a list of gate IDs would avoid this.
Hi @whitews, of all the cytometry libs available, I appreciate how straightforward and friendly yours is.
Do you support handling multiple datasets in FlowKit? I know that other projects (fcs/CytoFlow) allow this, but their script truncates the last dataset when reading.
Thanks!
I load this gatingML file from cytobank: CytExp_231400_Gates_v1.zip
I import it by
g_strat = fk.GatingStrategy('./cytobank/Gating.xml')
but when I check the gate names, they are not correct, and the hierarchy also seems to be wrong.
This is the gating strategy diagram
Could you please help me on this problem? Thank you
Good morning, I have the FCS files from FlowCAP-II AML: https://flowrepository.org/id/FR-FCM-ZZYA
I wonder how I can perform automatic compensation using FlowKit. I have tried the flowutils compensate method, but it didn't work.
With the new functionality to automatically create a compensation matrix from bead files, there is still no mechanism to apply a compensation matrix to the samples loaded in a Session instance. This relates to #11, as both issues deal with programmatically analyzing FCS files. The GatingStrategy class handles compensation, transformation, and gating. So, the capability to programmatically add gates or apply a compensation matrix should probably be added to the GatingStrategy class with convenient API calls in the Session class.
Though unusual, the GatingML specification does allow for a gate with dimensions that reference 2 (or more) compensation matrices. Currently, FlowKit does not support this scenario and requires that a gate's dimension reference the same compensation matrix. However, for full GatingML support, this should be implemented.
Currently, the GatingStrategy method gate_sample returns a dictionary with ID keys and boolean arrays indicating which indices are inside a gate. While this is helpful for extracting those events, it is more typical in flow analysis to get absolute & relative frequencies. A new class for gating results could provide convenient ways to provide these and other useful statistics, and perhaps plotting functionality as well.
Not sure if this should be done in Sample or GatingStrategy class. Maybe in some new GatingResults class?
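The kind of report described above can be sketched from the dictionary of boolean arrays. The function `summarize_gate_results` and the `parents` mapping are assumptions for this example, and each boolean array is assumed to already reflect membership in all ancestor gates.

```python
import numpy as np
import pandas as pd


def summarize_gate_results(results, parents=None):
    """Hypothetical helper: counts and percentages per gate.

    results: dict of gate ID -> boolean array over all events (non-empty)
    parents: optional dict of gate ID -> parent gate ID
    """
    parents = parents or {}
    rows = []
    for gate_id, mask in results.items():
        mask = np.asarray(mask, dtype=bool)
        count = int(mask.sum())
        parent_id = parents.get(gate_id)
        # Without a parent, the relative frequency is taken against all events
        if parent_id is None:
            parent_count = mask.size
        else:
            parent_count = int(np.sum(results[parent_id]))
        rows.append({
            'gate_id': gate_id,
            'parent': parent_id,
            'count': count,
            'absolute_percent': 100.0 * count / mask.size,
            'relative_percent': (
                100.0 * count / parent_count if parent_count else float('nan')
            ),
        })
    return pd.DataFrame(rows).set_index('gate_id')


# Made-up results for four events
results = {
    'Singlets': np.array([True, True, True, False]),
    'CD3+': np.array([True, True, False, False]),
}
report = summarize_gate_results(results, parents={'CD3+': 'Singlets'})
```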
Hi,
Is there an easy way to add a channel to an FCS file? I'm doing clustering by cell type on FCS data and I would like to be able to add a channel called 'cluster' to the FCS file and then export the file. A user might then load an FCS file, perform the clustering and export a copy of the FCS file. Then they can take their new FCS file to, for example, Cytobank, draw a tSNE map of their data and use the cluster channel to color their cells.
Currently I'm hacking it by taking _raw_events and adding a column to it like so:
import numpy as np
import flowkit as fk

file = 'somepath'
fksample = fk.Sample(file)
# arbitrary cluster assignment: one extra column of zeros
clusters = np.zeros((fksample.get_raw_events().shape[0], 1))
new_events = np.hstack((fksample.get_raw_events(), clusters))
# channel keys are numeric strings, so sort them numerically
sorted_channel_keys = sorted([int(i) for i in fksample.channels.keys()])
sorted_channel_keys = [str(i) for i in sorted_channel_keys]
header = [fksample.channels[x]['PnN'] for x in sorted_channel_keys] + ['cluster']
# build a new Sample from the extended array and export it
new_sample = fk.Sample(fcs_path_or_data=new_events, channel_labels=header)
new_sample.export_csv(filename='aml.csv', source='raw')
new_sample.export_fcs(filename='aml.fcs', source='raw')
This method seems to lose all metadata in the new fcs.
The GatingStrategy class (or some new version of it) should provide functionality to export a valid GatingML document. This will be especially useful in conjunction with the capability to programmatically create gates in Python.
Since a Session wraps up multiple Sample instances and a GatingStrategy, perhaps it would be useful to have a convenient way to save a Session and restore it. This would allow users to easily continue where they left off for complex operations, and to archive analysis pipelines for reproducibility and re-use. Would be easily achieved through pickling the Session instance.
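A minimal sketch of the save and restore round trip with the standard `pickle` module. It pickles a plain dictionary standing in for session state; whether an actual Session instance pickles cleanly depends on every attribute it holds and is not verified here. The names `save_state`, `load_state`, and the dictionary keys are assumptions.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(state, path):
    """Write the state object to disk with the newest pickle protocol."""
    with open(path, 'wb') as handle:
        pickle.dump(state, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path):
    """Read a state object back. Only unpickle files from a trusted source."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)


# Hypothetical stand-in for what a session might need to remember
session_state = {
    'sample_paths': ['a.fcs', 'b.fcs'],
    'gate_ids': ['Singlets', 'CD3+'],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / 'session.pkl'
    save_state(session_state, state_path)
    restored_state = load_state(state_path)
```

Pickle files are tied to the class definitions available at load time, so archives meant for long-term reproducibility may need a version-stable format instead.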
It's all the rage nowadays as cytometers add more and more detectors. Could be part of a series of features related to dimension reduction.
Currently, the "raw" events have been pre-processed according to the specified channel gain and lin/log display found in the channel metadata. However, some users may want the original events as they are saved in the FCS file.
Related to issue #19
Move all the logic for parsing compensation matrix formats to the Matrix class. This will make the API clearer and more consistent. Any functionality using or relying on a comp matrix will take a Matrix instance, and the user will not have to worry about the spill format that any function or method takes.
Add an attribute/method for getting a human readable display of the gating hierarchy used in a GatingStrategy instance. Don't know if this should be text, an image, or something else...
Add separate argument for specifying bead FCS files to the Session class. Maybe these are in a separate directory or maybe they are in the main and need to be discovered using some sub-string matching of file names? Regardless of how they are identified, add a Session method to calculate the compensation using linear regression.
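The regression step could look roughly like the following. This sketch assumes one single-stained bead file per fluorochrome and estimates one row of a spillover matrix as the slope of every channel regressed on the stained channel. The function `estimate_spillover_row` is hypothetical, and file discovery is not covered.

```python
import numpy as np


def estimate_spillover_row(bead_events, stained_index):
    """Hypothetical helper: least-squares slopes against the stained channel.

    bead_events: (n_events, n_channels) array from a single-stained control
    stained_index: column index of the channel the beads are stained for
    """
    bead_events = np.asarray(bead_events, dtype=float)
    primary = bead_events[:, stained_index]
    # Design matrix with an intercept so autofluorescence does not bias slopes
    design = np.column_stack([primary, np.ones_like(primary)])
    coefficients, *_ = np.linalg.lstsq(design, bead_events, rcond=None)
    # Row 0 holds the slopes; the stained channel's own slope is 1
    return coefficients[0]


# Synthetic control: channel 1 receives 10% spill plus an offset, channel 2 gets 2%
primary = np.array([100.0, 200.0, 300.0, 400.0])
bead_events = np.column_stack([primary, 0.1 * primary + 5.0, 0.02 * primary])
spill_row = estimate_spillover_row(bead_events, stained_index=0)
```

Stacking one such row per bead file gives a square spillover matrix, which can then be inverted to compensate sample events.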
Currently, polygon, ellipse, rectangle, and range gates are able to be plotted from the Session class. Quadrant gates are more complex, and will be added to a future 0.4.x release.
With the new codecov integration, we can monitor test coverage. Get coverage to 70% for the 0.4.0 release.
A significant amount of time can be spent in the points_in_polygon function. See what optimizations can be made there, especially vectorizing the routine. If nothing significant can be done, convert to C and wrap in Python.
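One possible vectorization, sketched with NumPy: loop over the polygon edges (usually few) and test all points at once for each edge using even-odd ray casting. This is not the existing points_in_polygon code; the function name and argument order are assumptions, and points lying exactly on an edge are not handled specially.

```python
import numpy as np


def points_in_polygon_vectorized(vertices, points):
    """Even-odd ray casting, vectorized over points.

    vertices: (n_vertices, 2) polygon corners in order
    points: (n_points, 2) coordinates to test
    """
    vertices = np.asarray(vertices, dtype=float)
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    n_vertices = len(vertices)
    for i in range(n_vertices):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n_vertices]
        # True where the edge straddles the horizontal line through the point
        straddles = (y1 > y) != (y2 > y)
        # x position where the edge crosses that line; horizontal edges give
        # inf or nan here, but they never straddle, so the value is unused
        with np.errstate(divide='ignore', invalid='ignore'):
            x_cross = (x2 - x1) * (y - y1) / (y2 - y1) + x1
        # Each crossing to the right of the point toggles inside/outside
        inside ^= straddles & (x < x_cross)
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
test_points = [(2, 2), (5, 2), (-1, 2), (2, 5)]
in_square = points_in_polygon_vectorized(square, test_points)
```

The cost is one pass over all points per edge, so the Python-level loop scales with the number of vertices rather than the number of events.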
Add support for new CytoML document format, an extension of the GatingML 2.0 specification. The CytoML format had its roots in Cytobank and is now defined and supported by the R package CytoML. The R package supports reading and writing FlowJo WSP files, thus providing a path to round-trip manual analysis from FlowJo -> CytoML -> FlowKit, and back again.
Currently the Sample class caches the transformed events but not the type of transform or the parameters used. It would be helpful if the last transform type & params were saved as well.
Just realized the unit_test directory is being copied to the root of site_packages when installing. I think this directory needs to be within the main flowkit source directory.
Hi Scott,
I've been following your tutorial and came across an issue with getting the gating results (input 48 onward).
The method gate_sample of your GatingStrategy class is returning a dictionary, rather than an object from the GatingResults class.
> gs_results = g_strat.gate_sample(sample)
> type(gs_results)
dict
> gs_results
{'Range1': array([False, False, False, ..., False, False, False]),
'Rectangle1': array([False, False, False, ..., False, False, False]),
'Rectangle2': array([False, False, False, ..., False, False, False]),
'Range2': array([False, False, False, ..., False, False, False]),
'RatRange1': array([ True, False, True, ..., False, True, False]),
'RatRange2': array([False, False, True, ..., False, False, False]),
'RatRange1a': array([False, False, True, ..., False, False, False]),
[...]
I've tried calling GatingResults using the gs_results output (I copy-pasted your source as I was unable to import it), but I haven't succeeded:
GatingResults(gs_results, sample_id=sample.original_filename)
AttributeError Traceback (most recent call last)
<ipython-input-17-e88b290ac31f> in <module>
----> 1 GatingResults(gs_results_dict, sample.original_filename)
<ipython-input-3-fd1b25407c76> in __init__(self, results_dict, sample_id)
4 self.report = None
5 self.sample_id = sample_id
----> 6 self._process_results()
7
8 @staticmethod
<ipython-input-3-fd1b25407c76> in _process_results(self)
25 if 'events' not in res:
26 # it's a quad gate with sub-gates
---> 27 for sub_g_id, sub_res in res.items():
28 pd_dict = self._get_pd_result_dict(sub_res, sub_g_id)
29 pd_dict['quadrant_parent'] = g_id
AttributeError: 'numpy.ndarray' object has no attribute 'items'
I'm on Ubuntu bionic (18.04) and Python 3.6.8
Even after installing the full MS Visual Studio suite, I get the following error when I try to install fcs. I'm not sure how to proceed.
ERROR: Failed building wheel for flowutils
Running setup.py clean for flowutils
Failed to build flowutils
Installing collected packages: flowutils
Running setup.py install for flowutils ... error
ERROR: Command errored out with exit status 1:
command: 'c:\programdata\anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\pauln\AppData\Local\Temp\pip-install-kg5n55n5\flowutils\setup.py'"'"'; file='"'"'C:\Users\pauln\AppData\Local\Temp\pip-install-kg
5n55n5\flowutils\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\pauln\AppData\Local\Temp\pip-record-jgnsrohf\in
stall-record.txt' --single-version-externally-managed --compile
cwd: C:\Users\pauln\AppData\Local\Temp\pip-install-kg5n55n5\flowutils
Complete output (29 lines):
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\flowutils
copying flowutils\compensate.py -> build\lib.win-amd64-3.7\flowutils
copying flowutils\transforms.py -> build\lib.win-amd64-3.7\flowutils
copying flowutils\__init__.py -> build\lib.win-amd64-3.7\flowutils
running build_ext
building 'flowutils.logicle_c' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\flowutils
creating build\temp.win-amd64-3.7\Release\flowutils\logicle_c_ext
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Ic:\programdata\anaconda3\lib\site-packages\numpy\core\include -Iflowutils/logicle_c_ext -Ic:\programdata\anaconda3
include -Ic:\programdata\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\include" "-IC:\Program File
s (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program
Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /Tcflowutils/logicle_c_ext/_logicle.c /Fobuild\temp.win-amd64-3.7\Release\flowutils/logicle_c_ext/_logicle.obj -std=c99
cl : Command line warning D9002 : ignoring unknown option '-std=c99'
_logicle.c
c:\programdata\anaconda3\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Ic:\programdata\anaconda3\lib\site-packages\numpy\core\include -Iflowutils/logicle_c_ext -Ic:\programdata\anaconda3
include -Ic:\programdata\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\include" "-IC:\Program File
s (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program
Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /Tcflowutils/logicle_c_ext/logicle.c /Fobuild\temp.win-amd64-3.7\Release\flowutils/logicle_c_ext/logicle.obj -std=c99
cl : Command line warning D9002 : ignoring unknown option '-std=c99'
logicle.c
flowutils/logicle_c_ext/logicle.c(213): error C2057: expected constant expression
flowutils/logicle_c_ext/logicle.c(213): error C2466: cannot allocate an array of constant size 0
flowutils/logicle_c_ext/logicle.c(213): error C2133: 'tmp_taylor': unknown size
flowutils/logicle_c_ext/logicle.c(321): error C2057: expected constant expression
flowutils/logicle_c_ext/logicle.c(321): error C2466: cannot allocate an array of constant size 0
flowutils/logicle_c_ext/logicle.c(321): error C2133: 'tmp_taylor': unknown size
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.22.27905\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\programdata\anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\pauln\AppData\Local\Temp\pip-install-kg5n55n5\flowutils\setup.py'"'"'; file='"'"'C:\Users\pauln\App
Data\Local\Temp\pip-install-kg5n55n5\flowutils\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\pauln\AppData
Local\Temp\pip-record-jgnsrohf\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.
Create example of a FlowJo workspace with a quadrant gate and add xml utils code to parse it. Maybe use the diamond sample?
Great work! I'm looking to apply a FlowJo 10 .wsp gating strategy to a Sample() object, as I am unable to find a way to export a suitable FlowJo XML strategy (the FlowJo 9 XML format is not readable, and FlowJo 10 doesn't seem to offer me an option to export a gating strategy as XML).
Approaches tried so far
Desired solution
Preferably, to use an XML gating strategy from FlowJo 9 (I appreciate that you will likely not support that, though).
Otherwise, another tutorial notebook to show how to use a FlowJo gating strategy from FlowJo 10 (either a .wsp or assistance on what gating strategy exports from FlowJo 10 are suitable to use with FlowKit).
Thanks in advance, great work on the package. Your visualisations using bokeh look great.
The current QuadrantGate class is clunky to instantiate, requiring the construction of an undocumented dictionary of quadrants. Further, the QuadrantGate constructor takes QuadrantDivider instances instead of Dimension instances. It would be nice to figure out a way to not require both the dividers and the quadrants, but we need the quadrant labels specified by the user, so this may not be possible.
This would remove both the matplotlib and seaborn dependencies. Bokeh is also faster and allows for interactivity.
Currently, all the events (sub-sampled) in a sample are displayed when plotting gates in the Session class. The events need to be gated with any parent gates so that only the proper subset of events is displayed for the gate.
Is your feature request related to a problem? Please describe.
I would like to cite flowkit in a publication in order to fully credit the author.
Describe the solution you'd like
A citation with a digital object identifier (DOI).
Additional context
If no publication is/will be available shortly, uploading a new release to Zenodo could be a good alternative.
Quadrant gates are a bit of an odd case in that they are not gates themselves, rather a collection of gates that divide a pair of dimensions into rectangular spaces, each with its own gating ID. Because of this they didn't fit into the pattern of other Gate sub-classes. Still, this needs to be implemented to fully support GatingML.
And support for the scenario where a gate has a parent quadrant reference of a QuadrantGate. Note, the QuadrantGate itself cannot be a parent, only its component Quadrants can be a parent gate.
For 0.4.x, increase test coverage to 75%, with all non-plot files at 70% or higher.
Currently all gates are created via GatingML gating elements. If someone wanted to create a gating strategy via Python code, the only way to do this would be to create these text elements in code which would be rather tedious. Perhaps a set of gate classes could be made that mirrors the current set, maybe the current ones get renamed to GML-gates (i.e. GMLRectangleGate).
Any reference to a gate class gives the user little to no information other than the class name and the memory address. Try to create more informative string representations of the class instances.
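For illustration, a `__repr__` along these lines would surface the gate ID, parent, and dimensions. The class `RectangleGateSketch` and its attributes are hypothetical and do not mirror the actual gate classes.

```python
class RectangleGateSketch:
    """Hypothetical gate class used only to show an informative repr."""

    def __init__(self, gate_id, parent_id, dimension_labels):
        self.gate_id = gate_id
        self.parent_id = parent_id
        self.dimension_labels = list(dimension_labels)

    def __repr__(self):
        # Show the identifying fields instead of the default memory address
        return (
            f"{type(self).__name__}(id={self.gate_id!r}, "
            f"parent={self.parent_id!r}, dims={self.dimension_labels!r})"
        )


gate = RectangleGateSketch('Singlets', 'Time', ['FSC-A', 'FSC-H'])
gate_text = repr(gate)
```

Using `type(self).__name__` keeps the output correct for subclasses without repeating the method.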
A gating strategy usually references only a single comp matrix for a hierarchy of gates. However, the code currently applies compensation & transforms in each gate, meaning the compensation procedure is duplicated over and over. Caching this array will provide significant performance improvements, especially for large gating strategies.
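The caching idea can be sketched as a dictionary keyed by sample and compensation matrix, so the compensated array is computed once and reused by every gate. `CompensationCache` is a hypothetical class, and compensation is assumed here to be multiplication by the inverse of the spillover matrix.

```python
import numpy as np


class CompensationCache:
    """Hypothetical cache of compensated events per (sample, matrix) pair."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # counts how often compensation actually ran

    def get_compensated(self, sample_id, matrix_id, events, spill_matrix):
        key = (sample_id, matrix_id)
        if key not in self._cache:
            # Assumed compensation: events multiplied by the inverse spillover
            inverse = np.linalg.inv(np.asarray(spill_matrix, dtype=float))
            self._cache[key] = np.asarray(events, dtype=float) @ inverse
            self.computations += 1
        return self._cache[key]


cache = CompensationCache()
events = np.array([[110.0, 60.0], [220.0, 120.0]])
spill = np.array([[1.0, 0.1], [0.0, 1.0]])
first = cache.get_compensated('sample1', 'comp1', events, spill)
second = cache.get_compensated('sample1', 'comp1', events, spill)
```

The second call returns the stored array without recomputing. Callers should treat the cached array as read-only, since every gate shares it.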
To solve plotting of gated events and the issue of creating a gating strategy programmatically, create a new Session class. A FlowKit Session will serve as the main user interface for the package, allowing a single line of code to get started analyzing a collection of FCS files using either an existing GatingML document or a blank GatingStrategy canvas.
Event data can be nicely extracted using get_events_as_data_frame(), but cannot easily be used for downstream analysis as I can't relate individual events back to the populations they belong to.
Desired solution
As events may fall into more than one population, it is not simply a case of adding an additional column to the pandas.DataFrame object returned from get_events_as_data_frame(). Perhaps a nice solution might be to provide an additional method that returns the mapping of population names to the indexes of the pandas.DataFrame, e.g.
{
'lymphocytes': [0,1,2,3],
'CD8+Tcells': [2,3,4]
}
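Such a mapping could be derived from per-gate boolean arrays, as in the sketch below. The function `population_indices` is an assumption for this example, and the masks are made up to reproduce the mapping shown above.

```python
import numpy as np
import pandas as pd


def population_indices(gate_results):
    """Hypothetical helper: gate ID -> positions of the events in that gate."""
    return {
        gate_id: np.flatnonzero(np.asarray(mask, dtype=bool))
        for gate_id, mask in gate_results.items()
    }


# Made-up membership masks for five events
gate_results = {
    'lymphocytes': [True, True, True, True, False],
    'CD8+Tcells': [False, False, True, True, True],
}
indices = population_indices(gate_results)

# The positions can then select rows from an events DataFrame
events_df = pd.DataFrame({'FSC-A': [10.0, 20.0, 30.0, 40.0, 50.0]})
cd8_events = events_df.iloc[indices['CD8+Tcells']]
```

Because each population has its own index array, an event that belongs to several populations simply appears in several arrays.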
FlowKit can now import FJ workspaces. Being able to export a FlowKit Session to a new FJ workspace would allow round-tripping manual analysis in FJ to automated post-processing in FlowKit and then back to FJ for analysts.
import flowkit
Traceback (most recent call last):
File "", line 1, in <module>
File "D:\Anaconda3\lib\site-packages\flowkit\__init__.py", line 2, in <module>
from .models.gate import GatingStrategy
File "D:\Anaconda3\lib\site-packages\flowkit\models\gate.py", line 9, in <module>
from flowkit.resources import gml_schema
File "D:\Anaconda3\lib\site-packages\flowkit\resources\__init__.py", line 18, in <module>
gml_tree
File "src/lxml/xmlschema.pxi", line 86, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}group', attribute 'ref': The QName value '{http://www.isac-net.org/std/Gating-ML/v2.0/transformations}Transformation_Group' does not resolve to a(n) model group definition., line 51
My Environment:
python 3.7
FlowIO 0.9.5
FlowKit 0.1.0
flowutils 0.7.0
lxml 4.4.1
Windows 10 1903
FlowKit fails to parse an FCS $SPILL value as a compensation matrix, raising the error:
ValueError: Matrix labels do not match fluorescent labels in FCS file
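For context, a $SPILL value is a comma-separated string: the number of channels n, then n channel labels, then n × n matrix values in row order. The sketch below parses that layout and reports labels missing from the file's PnN labels. The helper names are assumptions, and this is not FlowKit's parser.

```python
import numpy as np


def parse_spill(spill_text):
    """Hypothetical parser for a comma-separated $SPILL value."""
    fields = [field.strip() for field in spill_text.split(',')]
    n_channels = int(fields[0])
    labels = fields[1:n_channels + 1]
    values = [float(value) for value in fields[n_channels + 1:]]
    if len(values) != n_channels * n_channels:
        raise ValueError(
            f"Expected {n_channels * n_channels} matrix values, got {len(values)}"
        )
    matrix = np.array(values).reshape(n_channels, n_channels)
    return labels, matrix


def missing_labels(spill_labels, pnn_labels):
    """Return the spill labels that have no exact match among the PnN labels."""
    known = set(pnn_labels)
    return [label for label in spill_labels if label not in known]


# Made-up example values
labels, matrix = parse_spill('2,FITC-A,PE-A,1,0.1,0.05,1')
unmatched = missing_labels(labels, ['FSC-A', 'FITC-A', 'PE-A'])
```

Listing the unmatched labels in the error message would make a mismatch like the one above easier to diagnose.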
Since the scatter plot matrix can take some time to create and display, it makes sense to allow specifying just the subset of channels to display.