spacetx / starfish
starfish: unified pipelines for image-based transcriptomics
Home Page: https://spacetx-starfish.readthedocs.io/en/latest/
License: MIT License
Upload worked example of processed in-situ sequencing data (RCA w/ Padlock probes) in human breast tissue
Right now, the API for displaying an image stack looks like:
from starfish.io import Stack
s = Stack()
s.read(in_json)
tile(s.squeeze());
It would be nice if the last line in the above snippet looked like:
s.show()
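A minimal sketch of what that convenience method could look like, assuming tile() stays the underlying display helper (the stub here stands in for the real one, and the class is a toy, not the actual starfish.io.Stack):

```python
def tile(array, **kwargs):
    """Stand-in for starfish's tile() display helper."""
    return array  # the real helper renders the tiles with matplotlib

class Stack:
    """Toy sketch of the proposed Stack.show() convenience method."""
    def __init__(self, data):
        self.data = data

    def squeeze(self):
        # placeholder for the real squeeze(), which drops singleton axes
        return self.data

    def show(self, **kwargs):
        # s.show() collapses tile(s.squeeze()) into one call
        return tile(self.squeeze(), **kwargs)
```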
Upload worked example of processed DARTFISH data
Upload worked example of processed MERFISH data
Filter methods should have an option to dump PNGs for easy feedback and viewing.
Watershed needs a stain image. Previously this was produced in the filter pipeline stage. The new API should produce the stain as a separate ImageSet.

Many collaborators are on Windows machines. We should make sure pip installation works for them with no usability issues.
Dependencies: #834
For now, we've been using the Python API to process 1 FOV at a time to ensure the API makes sense and works as expected. Work on this will continue through #54 #55 #56 #57 etc. However, what does it look like to use the CLI to process an entire dataset comprising multiple FOVs? How fast is the process? Did it work? How good is our WDL approach?
We have an early demo (https://github.com/chanzuckerberg/starfish/blob/master/starfish.wdl) of how Starfish can be run on 'green box' architecture for the HCA DCP. We should ensure that this demo stays up to date and plays well with changes to the CLI.
Each pipeline component should inherit from a single AlgorithmBase object, instead of each separate pipeline component implementing a new one.
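One possible shape for the shared base, with FourierShiftRegistration as an illustrative subclass (the lookup-by-name via __subclasses__ is an assumption for the sketch, not the current implementation):

```python
class AlgorithmBase:
    """Single shared base class for all pipeline-component algorithms."""

    @classmethod
    def get_algorithm_name(cls):
        return cls.__name__

    @classmethod
    def find(cls, name):
        # look up a concrete algorithm by name among direct subclasses
        for sub in cls.__subclasses__():
            if sub.get_algorithm_name() == name:
                return sub
        raise ValueError(f"no algorithm named {name!r}")

class FourierShiftRegistration(AlgorithmBase):
    def __init__(self, upsample=1):
        self.upsample = upsample
```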
For the examples below, it should be easy for users of the API and the CLI to quickly get help through the docstrings and argparser respectively
s_reg = Registration.fourier_shift(s, upsample)
s_reg_filt = Filter.gaussian_low_pass(s_reg, sigma)
s_reg_filt2 = Filter.gaussian_high_pass(s_reg_filt, sigma_2)
A good example can be found in the Atom project:
https://github.com/atom/atom/blob/master/CONTRIBUTING.md
And the Jupyter project:
https://github.com/jupyter/notebook/blob/master/CONTRIBUTING.rst
This can make the .ipynb portable (or .py if we have a better tool for converting ipynb <-> py).
Currently, starfish defines both API and CLI parameters, both of which can declare default parameters and help strings.
We'd like to harmonize this, ideally by programming the information into the API and automatically generating the CLI.
Tony thought it might be possible to do this with a decorator, but wants to experiment with how that could affect usability.
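One way such a decorator could work, sketched with inspect and argparse (the names and structure here are assumptions for illustration, not Tony's design): required API parameters become positional CLI arguments, and parameters with defaults become flags whose type and default are read from the signature.

```python
import argparse
import inspect

def cli_command(subparsers):
    """Derive a CLI subcommand from an API function's signature, so
    defaults and help strings are declared once, in the API."""
    def decorator(func):
        sub = subparsers.add_parser(func.__name__, help=func.__doc__)
        for name, param in inspect.signature(func).parameters.items():
            if param.default is inspect.Parameter.empty:
                sub.add_argument(name)  # required -> positional
            else:
                sub.add_argument(f"--{name}", default=param.default,
                                 type=type(param.default))
        sub.set_defaults(_func=func)
        return func
    return decorator

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()

@cli_command(subparsers)
def gaussian_low_pass(in_json, sigma=1.0):
    """Apply a Gaussian low-pass filter."""
    return in_json, sigma
```

With that in place, parsing ["gaussian_low_pass", "org.json", "--sigma", "2.0"] yields a namespace carrying both the converted arguments and the API function itself.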
There currently exists a prototype interactive visualization of processed Starfish data. The code lives here: https://github.com/chanzuckerberg/starfish/tree/master/viz
There are several interesting directions to take this in:
There are two options:
1. Decode each pixel into a corresponding gene, then use a connected-components labeler to call spots as contiguous same-gene regions of a certain size.
2. Find spots, pool the intensity, then decode each spot into a gene.
It should be possible, through simulations, to determine under which SNR noise regimes one method is better than another.
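Option 1 can be sketched with a connected-components labeler on toy data (scipy.ndimage.label stands in here for whatever labeler starfish would actually use; the array and size threshold are made up):

```python
import numpy as np
from scipy import ndimage

# each pixel already decoded to a gene id (0 = background)
decoded = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
])

min_size = 2  # drop single-pixel "spots"
spots = []
for gene in np.unique(decoded[decoded > 0]):
    # label contiguous regions of this gene (4-connectivity by default)
    labels, n_components = ndimage.label(decoded == gene)
    for component in range(1, n_components + 1):
        if (labels == component).sum() >= min_size:
            spots.append(int(gene))
```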
I have been working on a similar application, Skopy. It can be used for feature extraction (i.e. similar to CellProfiler's measurement modules), but I plan on adding image segmentation and object detection in the near future. It has nearly the same mission, providing a ready-made pipeline for image analysis, but it has a simple, portable, and lightweight architecture, and people have already started using it. Would you be interested in contributing? It could be fun!
Hi. I am having difficulty passing this step in the notebook (Reproducing Marco's results with Starfish):
s.read('ISS/fov_001/org.json')
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python2.7/site-packages/starfish/io.py", line 48, in read
self._read_stack()
File "/usr/lib/python2.7/site-packages/starfish/io.py", line 77, in _read_stack
im = self.read_fn(os.path.join(self.path, fname))
File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 431, in load
"Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file u'/newdata/data2/homes/joshr/ISS/fov_001/1_1st_Cy3 5.TIF' as a pickle
We generate a bunch of "dots" images that represent bright spots in the image. These are typically max projections over some subset of the data. Brian Long suggested we might want to store some information on how such projections were created (e.g. max projection over z? over z, h, and c?).
Determine how to store this data in starfish and the starfish spec.
This allows us to take advantage of refactoring tools and keep the notebooks up to date.
This may be useful to have as part of the registration module, as opposed to what's currently there, which is simply a Fourier-based translation algorithm.
The current way to update a data stack is:
# this is a vector that needs to be a tensor; s knows what the shape is
s.set_stack(s.un_squeeze(stack_filt))
It would be preferable to have an API for s.set_stack() that takes an arbitrary list and implicitly knows the correct shape. Alternatively, s.set_stack() could be a classmethod that generates a new stack, thus removing side-effecting code from the code-base.
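A sketch of the classmethod variant, which returns a fresh stack instead of mutating one in place (from_flat is a hypothetical name, and the class is a toy, not the real Stack):

```python
import numpy as np

class Stack:
    """Toy sketch: the stack knows its tensor shape, so callers can hand
    it a flat vector and let it do the reshaping."""
    def __init__(self, data):
        self.data = data

    @classmethod
    def from_flat(cls, flat, shape):
        # classmethod construction: no side effects on an existing stack
        return cls(np.asarray(list(flat)).reshape(shape))
```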
Currently, the segmentation module in Starfish implements a single seeded watershed algorithm. While this is a good start, several labs have also used simple Voronoi tessellation. We should implement this as an option for pipeline research.
We want Starfish to natively and seamlessly support 3D images. There are several TODOs in the code where this transition has not been made yet.
To the uninitiated (aka me), it can take a while to understand whether the x-axis is the hybridization rounds and the y-axis is the channels, or vice versa.
Manually adding small, unobtrusive text to the figure would greatly help:
import matplotlib.pyplot as plt
tile(s.squeeze(), size=20);
fig = plt.gcf()
# fig.text(x, y, text)
# x (y) = number from 0 to 1, where 0 is the left (bottom) of the plot and 1 is the right (top) of the plot.
# The numbers were found by manually playing around
fig.text(.5, .8, "Channels")
fig.text(.11, .53, "Hybridization rounds", rotation=90)
In the contributing guidelines, it is suggested to use nbencdec to save Jupyter notebooks as Python files side-by-side. Can you help me understand the advantages of this package over the internal nbconvert or the nbdime package? Or over a git hook, e.g. *.ipynb filter=ipython nbconvert --to python?
This project needs some documentation style guidelines, preliminary documentation, auto-generation of documentation, and a statement in the contribution.md about how to build new documentation.
An example of a project that does this well: https://dask.pydata.org/en/latest/
We want to know when things crash and why. We want to know what's slow and why. We need a sensible logging infrastructure to keep track of these issues. This is particularly important for the CLI but probably not the API.
The gather step of munge.py is giving a ValueError: setting an array element with a sequence.
The error arises during the pd.melt call in the function below. I haven't tried to chase it down further; I just wanted to flag it.
def gather(df, key, value, cols):
    # melt the wide-format columns in `cols` into long (key, value) pairs
    id_vars = [col for col in df.columns if col not in cols]
    return pd.melt(df, id_vars=id_vars, value_vars=cols,
                   var_name=key, value_name=value)
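For reference, a minimal melt call with explicit keyword arguments (synthetic data, not the failing input) behaves like this:

```python
import pandas as pd

# toy wide-format table: one column per imaging round
df = pd.DataFrame({"gene": ["a", "b"], "r1": [1, 2], "r2": [3, 4]})

# melt the round columns into long format: one row per (gene, round) pair
long_df = pd.melt(df, id_vars=["gene"], value_vars=["r1", "r2"],
                  var_name="round", value_name="intensity")
```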
This is no longer needed since we entirely use skimage. OpenCV should also be removed from setup.py and requirements.txt.
(argparse.Namespace, argparse.ArgumentParser)

Some parameters should be required, optional, or positional. We should make sure we get this right. For example, org_json and output_dir should be required parameters, not positional parameters.
Currently, Starfish processes one field of view at a time. At some point, these processed (and potentially unprocessed) images/results need to be stitched together for visualization. This can either be done implicitly, e.g., for each FOV we record an offset and position in an overall grid with necessary overlap information, or explicitly, e.g., the former information is used to create one large image / table representing results.
What are the right algorithms for stitching, handling boundary artifacts, and de-duping tabular results data?
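The implicit option amounts to simple bookkeeping: each FOV records the offset of its corner in a global grid, and per-FOV coordinates are translated into the full-image frame on demand. A toy sketch (the names, tile layout, and overlap values are made up):

```python
# offset of each field of view's top-left corner in the global image,
# in (row, col) pixels; assumes 1000-px tiles with 100 px of overlap
fov_offsets = {
    "fov_000": (0, 0),
    "fov_001": (0, 900),
    "fov_002": (900, 0),
}

def to_global(fov, row, col):
    """Map a coordinate within one FOV into the stitched image frame."""
    dr, dc = fov_offsets[fov]
    return (row + dr, col + dc)
```

De-duping tabular results would then reduce to detecting spots whose global coordinates fall inside another FOV's overlap region.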
Positional parameters make it too inflexible.
The ISS notebook currently does not work because the registration module has moved.
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-8-7b1a2d078906> in <module>()
----> 1 from starfish.registration._fourier_shift import FourierShiftRegistration
2 from starfish.registration import Registration
3
4 upsample = 1000
5 s = Registration.run("FourierShiftRegistration", s, upsample)
ModuleNotFoundError: No module named 'starfish.registration'
When I fix that, the pipeline appears to have changed the run API, so either @ttung can take a stab at it or I'll review the commits that changed this when I've had more sleep.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-19c5ac120260> in <module>()
3
4 upsample = 1000
----> 5 s = Registration.run("FourierShiftRegistration", s, upsample)
/usr/local/lib/python3.6/site-packages/starfish/pipeline/pipelinecomponent.py in run(cls, algorithm_name, stack, *args, **kwargs)
20 def run(cls, algorithm_name, stack, *args, **kwargs):
21 """Runs the registration component using the algorithm name, stack, and arguments for the specific algorithm."""
---> 22 algorithm_cls = cls._class_for_algorithm(algorithm_name)
23 instance = algorithm_cls(*args, **kwargs)
24 return instance.register(stack)
AttributeError: type object 'Registration' has no attribute '_class_for_algorithm'
How do we know our results of processed data are accurate? Is there a set of QC metrics that one can compute, that generalize across assays, that can answer this question? What QC metrics do methods developers want? What QC metrics do computational biologists want? What QC metrics do software engineers want?
The current Stack abstraction in io.stack.py conflates 'hybridization' images and 'auxiliary' images. It's probably a good idea to separate these two under a common base class. This will:
Eliminate having to load (and potentially write) all 'hybridization' and 'auxiliary' images when you only want to operate on, say, a single auxiliary image
Eliminate the propagation of an 'org.json' file at each step of computation
Make the CLI more flexible and less tied to specific intermediary file formats
deep says this issue sucks. badly documented.
Right now starfish loads float. We should do one of: load the data as uint16, convert it to uint16, or complain if we can't deduce how to do this from the data.

This link in the main readme is broken: https://github.com/spacetx/starfish/blob/master/notebooks/Starfish%20Simple%20ISS%20tutorial%20%7C%20Mouse%20vs.%20Human%20Fibroblasts.ipynb
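A sketch of the uint16 loading policy described above (the function name and the [0, 1]-range rescaling rule are assumptions):

```python
import numpy as np

def coerce_to_uint16(array):
    """Keep uint16 as-is, rescale unit-range floats, otherwise complain."""
    if array.dtype == np.uint16:
        return array
    if (np.issubdtype(array.dtype, np.floating)
            and array.min() >= 0 and array.max() <= 1):
        # scale [0, 1] floats onto the full uint16 range
        return (array * np.iinfo(np.uint16).max).astype(np.uint16)
    raise TypeError(f"cannot deduce a uint16 conversion for dtype {array.dtype}")
```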
How do I see a worked example? I'm interested to see examples of input and output.
Thanks
Matt
See #97 as an example.
The notebook repository will contain several examples of how to use Starfish to analyze data from several assays, e.g., MERFISH, DARTFISH, Padlock Probes, sequential smFISH, etc.
While these notebooks provide direct examples of how to use the Starfish API, they can also be refactored into unit tests so that developers can make sure they're not making breaking changes.
Developers will also need to make sure that the Jupyter notebooks are sufficiently updated if the corresponding unit tests need to be updated.