starfish's People

Contributors

ambrosejcarr, atarkowska, berl, chrisroat, csoneson, dany-fu, dependabot[bot], dganguli, freeman-lab, gokceneraslan, iimog, imagejan, joshmoore, kevinyamauchi, kne42, mattcai, mckinsel, neuromusic, nickeener, njmei, olgabot, sgratiy, sofroniewn, ttung, xchang1

starfish's Issues

Add show method to io.Stack object

Right now, the API for displaying an image stack looks like:

from starfish.io import Stack
s = Stack()
s.read(in_json)
tile(s.squeeze());

It would be nice if the last line in the above snippet looked like:

s.show()
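One way `show` could work is as a thin wrapper around the existing `tile(s.squeeze())` call. The sketch below is hypothetical. It uses a stand-in `Stack` that wraps a NumPy array and assumes `squeeze` flattens all leading axes into one tile axis, although the real method may differ. It also assumes `tile` is `showit.tile`. The plotting function can be injected, so the method can be tested without a display.

```python
from typing import Any, Callable, Optional

import numpy as np


class Stack:
    """Hypothetical stand-in for starfish.io.Stack; holds a (..., y, x) array."""

    def __init__(self, data: np.ndarray) -> None:
        self._data = np.asarray(data)

    def squeeze(self) -> np.ndarray:
        # Assumption: collapse every leading axis (e.g. hyb, channel) into one,
        # giving an (n_tiles, y, x) array suitable for a tiled display.
        return self._data.reshape(-1, *self._data.shape[-2:])

    def show(self, tile_fn: Optional[Callable[..., Any]] = None, **kwargs: Any) -> Any:
        """Display every tile of the stack; extra kwargs go to the tile function."""
        if tile_fn is None:
            # Assumption: `tile` is showit.tile, imported lazily so that
            # plotting stays an optional dependency.
            from showit import tile as tile_fn
        return tile_fn(self.squeeze(), **kwargs)
```

With this in place, the snippet above reduces to `s.show()`. Keyword arguments such as `size=20` pass straight through to `tile`.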

Watershed Stain API

Watershed needs a stain image. Previously this was produced in the filter pipeline stage. The new API should be:

  • Watershed should take a stain
  • User should be able to provide this stain as a separate image
  • If not provided, Watershed should have a parameter that takes string names that indicate how the stain should be produced from a provided ImageSet.
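The following is a minimal sketch of how the stain could be resolved before watershed runs. It assumes the image set is an `(n_images, y, x)` NumPy array. `resolve_stain`, `stain_method`, and the recipe names in `STAIN_RECIPES` are hypothetical. They only illustrate the rule "use the explicit stain if given, otherwise use a named recipe".

```python
from typing import Callable, Dict, Optional

import numpy as np

# Hypothetical recipes for deriving a stain from an (n_images, y, x) image set.
STAIN_RECIPES: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "max_projection": lambda images: images.max(axis=0),
    "mean_projection": lambda images: images.mean(axis=0),
}


def resolve_stain(
    stain: Optional[np.ndarray] = None,
    images: Optional[np.ndarray] = None,
    stain_method: str = "max_projection",
) -> np.ndarray:
    """Return the stain image to seed watershed with.

    A user-supplied stain wins; otherwise the stain is derived from `images`
    using the recipe named by `stain_method`.
    """
    if stain is not None:
        return np.asarray(stain)
    if images is None:
        raise ValueError("provide either a stain image or an image set to derive one from")
    try:
        recipe = STAIN_RECIPES[stain_method]
    except KeyError:
        raise ValueError(
            f"unknown stain_method {stain_method!r}; choose from {sorted(STAIN_RECIPES)}"
        ) from None
    return recipe(np.asarray(images))
```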

Process entire dataset, not just 1 FOV

So far, we've been using the Python API to process one FOV at a time, to make sure the API makes sense and works as expected. Work on this will continue through #54 #55 #56 #57, etc. However, what does it look like to use the CLI to process an entire dataset composed of multiple FOVs? How fast is it? Does it work? How good is our WDL approach?

Work here will also inform #51 #58 #59 #61

AlgorithmBase -> single object

Each pipeline component should inherit from a single AlgorithmBase object, instead of each pipeline component implementing its own.
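Below is a sketch of a single shared base class. It assumes each algorithm exposes one `run(stack)` method. `AlgorithmBase`, `run`, and `by_name` are illustrative names, not starfish's actual API, and the subclass body is a placeholder. Registering subclasses in `__init_subclass__` gives every pipeline component name-based lookup for free. No component then has to maintain its own table.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class AlgorithmBase(ABC):
    """Hypothetical single base class shared by every pipeline component's algorithms."""

    # Maps algorithm class names to classes; filled in automatically by subclasses.
    _registry: Dict[str, Type["AlgorithmBase"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        AlgorithmBase._registry[cls.__name__] = cls

    @abstractmethod
    def run(self, stack: Any) -> Any:
        """Apply the algorithm to a stack and return the result."""

    @classmethod
    def by_name(cls, name: str) -> Type["AlgorithmBase"]:
        """Look up a registered algorithm class by its class name."""
        try:
            return AlgorithmBase._registry[name]
        except KeyError:
            raise ValueError(f"no algorithm named {name!r}") from None


class FourierShiftRegistration(AlgorithmBase):
    """Illustrative subclass; the body is a placeholder, not the real algorithm."""

    def __init__(self, upsample: int) -> None:
        self.upsample = upsample

    def run(self, stack: Any) -> Any:
        return stack  # placeholder: a real implementation would register the stack
```

A component could then dispatch with `AlgorithmBase.by_name("FourierShiftRegistration")(upsample=1000).run(stack)`.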

Algorithms should be exposed as top-level functions inside package with help

For the examples below, it should be easy for users of the API and the CLI to get help quickly, through docstrings and the argument parser respectively:

s_reg = Registration.fourier_shift(s, upsample)
s_reg_filt = Filter.gaussian_low_pass(s_reg, sigma)
s_reg_filt2 = Filter.gaussian_high_pass(s_reg_filt, sigma_2)

Research: CLI and API parameters are defined once and used for both calls.

Currently, starfish defines API and CLI parameters separately, and both can declare default parameters and help strings.

We'd like to harmonize this, ideally by programming the information into the API and automatically generating the CLI.

Tony thought it might be possible to do this with a decorator, but wants to experiment with how that could affect usability.
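The decorator idea can be sketched with only the standard library. The function's signature and docstring are the single source of truth, and `inspect` turns them into an `argparse` parser. `cli_command`, `build_parser`, `main`, and the placeholder `gaussian_low_pass` are hypothetical names for illustration.

```python
import argparse
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical registry of functions exposed on the command line.
COMMANDS: Dict[str, Callable[..., Any]] = {}


def cli_command(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn as a CLI subcommand without changing how the API calls it."""
    COMMANDS[fn.__name__] = fn
    return fn


def build_parser(fn: Callable[..., Any]) -> argparse.ArgumentParser:
    """Derive an argparse parser from fn's signature and docstring."""
    parser = argparse.ArgumentParser(prog=fn.__name__, description=inspect.getdoc(fn))
    for name, param in inspect.signature(fn).parameters.items():
        annotation = param.annotation
        arg_type = annotation if annotation is not inspect.Parameter.empty else str
        if param.default is inspect.Parameter.empty:
            # Parameters without defaults become required positional arguments.
            parser.add_argument(name, type=arg_type)
        else:
            # Parameters with defaults become optional flags sharing the API default.
            parser.add_argument(f"--{name}", type=arg_type, default=param.default,
                                help=f"default: {param.default}")
    return parser


def main(argv: List[str]) -> Any:
    """Dispatch argv = [command, *args] to the registered function."""
    command, *rest = argv
    fn = COMMANDS[command]
    return fn(**vars(build_parser(fn).parse_args(rest)))


@cli_command
def gaussian_low_pass(stack_path: str, sigma: float = 1.0) -> Dict[str, Any]:
    """Low-pass filter the stack at stack_path with a Gaussian of width sigma."""
    # Placeholder body: echoes its inputs instead of filtering real data.
    return {"stack_path": stack_path, "sigma": sigma}
```

Calling `gaussian_low_pass("in.json", sigma=2.0)` from Python and running `main(["gaussian_low_pass", "in.json", "--sigma", "2.0"])` from the CLI share the same default and help text. One usability catch is worth testing. Annotations are used directly as argparse `type=` converters. That works for `int`, `float`, and `str`, but not for `bool`, because `bool("False")` is `True`, so boolean flags would need special handling.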

Interactive visualization of results

There currently exists a prototype interactive visualization of processed Starfish data. The code lives here: https://github.com/chanzuckerberg/starfish/tree/master/viz

There are several interesting directions to take this in:

  1. Split this into its own repo
  2. Scope out possible UI elements, e.g., right now the visualizer allows one to look at spots, segmentation results, turn on and off genes, and pan/zoom. What are we missing? What could be improved?
  3. Have this visualizer connected to the CLI, such that after a CLI user finishes running a pipeline, they can visually inspect results
  4. Have this visualizer connected to the API, potentially so that it can be run from a Jupyter notebook
  5. Work on ensuring that our standardized file formats work well with the visualizer.

What kind of spot detection is best?

There are two options:

  1. Decode each pixel into a corresponding gene, then use a connected-components labeler to call spots as contiguous regions of same-gene pixels above a certain size.
  2. Find spots, pool their intensity, then decode each spot into a gene.

It should be possible, through simulations, to determine in which signal-to-noise (SNR) regimes one method outperforms the other.
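Here is a toy version of both orderings, which could serve as a starting point for such simulations. It assumes the image is a `(k, y, x)` array where `k` indexes every (round, channel) pair, a binary `(genes, k)` codebook, and decoding by highest cosine similarity. A small pure-NumPy breadth-first labeler stands in for a library such as `scipy.ndimage.label`. None of these choices come from starfish itself.

```python
from collections import deque
from typing import List, Tuple

import numpy as np


def decode(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each (n, k) intensity vector to the (g, k) codeword with highest cosine similarity."""
    v = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-12)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    return np.argmax(v @ c.T, axis=1)


def label_components(mask: np.ndarray) -> Tuple[np.ndarray, int]:
    """4-connected component labeling of a 2D boolean mask via breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        count += 1
        labels[y, x] = count
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count


def decode_then_label(image: np.ndarray, codebook: np.ndarray,
                      threshold: float, min_size: int) -> List[Tuple[int, int]]:
    """Option 1: decode every pixel, then call spots as bright same-gene components."""
    k, h, w = image.shape
    genes = decode(image.reshape(k, -1).T, codebook).reshape(h, w)
    bright = image.sum(axis=0) > threshold
    spots = []
    for gene in range(codebook.shape[0]):
        labels, count = label_components(bright & (genes == gene))
        for i in range(1, count + 1):
            size = int((labels == i).sum())
            if size >= min_size:
                spots.append((gene, size))  # (gene index, spot area in pixels)
    return spots


def label_then_decode(image: np.ndarray, codebook: np.ndarray,
                      threshold: float, min_size: int) -> List[Tuple[int, int]]:
    """Option 2: find bright spots, pool each spot's intensity, then decode the pooled vector."""
    labels, count = label_components(image.sum(axis=0) > threshold)
    spots = []
    for i in range(1, count + 1):
        region = labels == i
        size = int(region.sum())
        if size >= min_size:
            pooled = image[:, region].sum(axis=1)
            spots.append((int(decode(pooled[None, :], codebook)[0]), size))
    return spots
```

One way to map out the SNR regimes would be to add noise of varying strength to a synthetic image and compare each function's calls against the known spots.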

Interested in contributing to Skopy?

I have been working on a similar application Skopy. It can be used for feature extraction (i.e. similar to CellProfiler’s measurement modules), but I plan on adding image segmentation and object detection in the near future. It has nearly the same mission, providing a ready-made pipeline for image analysis, but it has a simple, portable, and lightweight architecture and people have already started using it. Would you be interested in contributing? It could be fun!

Problem with Reproducing Marco's results with Starfish

Hi. I am having difficulty passing this step in the notebook (Reproducing Marco's results with Starfish):

s.read('ISS/fov_001/org.json')

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.7/site-packages/starfish/io.py", line 48, in read
    self._read_stack()
  File "/usr/lib/python2.7/site-packages/starfish/io.py", line 77, in _read_stack
    im = self.read_fn(os.path.join(self.path, fname))
  File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 431, in load
    "Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file u'/newdata/data2/homes/joshr/ISS/fov_001/1_1st_Cy3 5.TIF' as a pickle

Investigate "dots" image vocabulary

We generate a number of "dots" images that represent bright spots in the image. These are typically max projections over some subset of the data. Brian Long suggested we might want to store some information on how such projections were created (e.g., was it a max projection over z, or over z, h, and c?).

Determine how to store this data in starfish and the starfish spec.
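One possible shape for that provenance assumes stacks are stored with named axes `(h, c, z, y, x)`. The projection then returns both the image and a small JSON-serializable record of which axes were collapsed. The function name and the record fields are hypothetical and are not part of the current starfish spec.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np

# Assumption: stacks are stored with named axes in this order.
AXES = ("h", "c", "z", "y", "x")


def max_project(stack: np.ndarray, over: Sequence[str]) -> Tuple[np.ndarray, Dict[str, object]]:
    """Max-project `stack` over the named axes; return the image plus a provenance record."""
    unknown = set(over) - set(AXES)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    axis_indices = tuple(AXES.index(name) for name in over)
    image = stack.max(axis=axis_indices)
    projected: List[str] = [name for name in AXES if name in over]
    provenance: Dict[str, object] = {
        "method": "max_projection",
        "projected_axes": projected,
        "remaining_axes": [name for name in AXES if name not in over],
    }
    return image, provenance
```

The record could be stored next to the dots image in the stack's JSON metadata. The field names here are placeholders.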

Non-rigid registration algorithm

This may be useful to have as part of the registration module, as opposed to what's currently there, which is simply a Fourier-based translation algorithm.
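A full non-rigid method, such as a B-spline or optical-flow approach, does not fit in a short example. A common stepping stone is piecewise-rigid registration. The image is split into tiles and one translation is estimated per tile, which yields a coarse deformation field. That field could later be interpolated into a dense warp (not shown). The sketch below uses integer-pixel phase correlation in plain NumPy. It illustrates that technique and is not the algorithm in starfish's registration module. The function names are hypothetical.

```python
from typing import Tuple

import numpy as np


def phase_correlation_shift(reference: np.ndarray, moving: np.ndarray) -> Tuple[int, int]:
    """Integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) best matches reference."""
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12  # keep only the phase information
    correlation = np.abs(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (the FFT wraps around).
    dy, dx = (int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, correlation.shape))
    return dy, dx


def piecewise_rigid_shifts(reference: np.ndarray, moving: np.ndarray, block: int) -> np.ndarray:
    """Estimate one translation per block x block tile, giving a coarse deformation field."""
    rows, cols = reference.shape[0] // block, reference.shape[1] // block
    field = np.zeros((rows, cols, 2), dtype=int)
    for i in range(rows):
        for j in range(cols):
            window = (slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block))
            field[i, j] = phase_correlation_shift(reference[window], moving[window])
    return field
```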

set stack API

The current way to update a data stack is:

# this is a vector that needs to be a tensor; s knows what the shape is
s.set_stack(s.un_squeeze(stack_filt))  

It would be preferable for s.set_stack() to accept an arbitrary list of tiles and restore the correct shape itself, since the stack already knows what that shape is.

Alternatively, s.set_stack() could be a classmethod that generates a new stack, thus removing side-effecting code from the code-base.
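Below is a sketch of the side-effect-free variant. It uses a hypothetical immutable `Stack` whose `with_data` method reshapes a flat list of tiles back to the stack's own shape and returns a new object. The class, `with_data`, and the `(hyb, ch, z, y, x)` layout are assumptions for illustration.

```python
from typing import Tuple

import numpy as np


class Stack:
    """Hypothetical immutable stack that remembers its own (hyb, ch, z, y, x) shape."""

    def __init__(self, data) -> None:
        self._data = np.array(data, copy=True)
        self._data.flags.writeable = False  # discourage in-place side effects

    @property
    def shape(self) -> Tuple[int, ...]:
        return self._data.shape

    @property
    def data(self) -> np.ndarray:
        return self._data

    def squeeze(self) -> np.ndarray:
        # Assumption: flatten the leading axes into one tile axis.
        return self._data.reshape(-1, *self.shape[-2:])

    def with_data(self, data) -> "Stack":
        """Return a new Stack holding `data` reshaped to this stack's shape; self is unchanged."""
        array = np.asarray(data)
        if array.size != self._data.size:
            raise ValueError(f"cannot reshape {array.shape} into {self.shape}")
        return Stack(array.reshape(self.shape))
```

A filter step would then read `s = s.with_data([f(tile) for tile in s.squeeze()])`, which leaves the original stack untouched.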

Segmentation -- add a Voronoi tessellation algorithm

Currently, the segmentation module in Starfish implements a single algorithm, seeded watershed. While this is a good start, several labs have also used simple Voronoi tessellation. We should implement this as an option for pipeline research.
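Seeded Voronoi tessellation assigns every pixel to its nearest seed, such as a nucleus centroid. The minimal NumPy sketch below assumes seeds are given as `(row, col)` coordinates and that an optional foreground mask marks cell area. `voronoi_segment` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def voronoi_segment(shape: Tuple[int, int], seeds: Sequence[Tuple[float, float]],
                    foreground: Optional[np.ndarray] = None) -> np.ndarray:
    """Label each pixel with the 1-based index of its nearest seed (discrete Voronoi tessellation).

    Pixels outside the optional boolean `foreground` mask are labeled 0 (background).
    """
    seeds_arr = np.asarray(seeds, dtype=float)
    rows, cols = np.indices(shape)
    # Squared distance from every pixel to every seed: shape (n_seeds, *shape).
    d2 = ((rows[None] - seeds_arr[:, 0, None, None]) ** 2
          + (cols[None] - seeds_arr[:, 1, None, None]) ** 2)
    labels = np.argmin(d2, axis=0) + 1
    if foreground is not None:
        labels = np.where(foreground, labels, 0)
    return labels
```

This computes a distance from every pixel to every seed, so memory grows with the number of seeds times the number of pixels. A KD-tree or a distance transform would scale better on full fields of view.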

Improve 3D support

We want Starfish to natively and seamlessly support 3D images. There are several TODOs in the code where this transition has not been made yet.

Explain how to read `tile` output

To the uninitiated (aka me), it can take a while to understand whether the x-axis is the hybridization rounds and the y-axis is the channels, or vice versa.

Manually adding small, unobtrusive text to the figure would greatly help:

import matplotlib.pyplot as plt

tile(s.squeeze(), size=20);
fig = plt.gcf()  # grab the figure that tile() just drew

# fig.text(x, y, text) places text in figure coordinates:
# x (y) runs from 0 to 1, where 0 is the left (bottom) of the figure and 1 is the right (top).
# These positions were found by trial and error.
fig.text(.5, .8, "Channels")
fig.text(.11, .53, "Hybridization rounds", rotation=90)

Documentation

This project needs documentation style guidelines, preliminary documentation, auto-generated documentation, and a statement in contribution.md about how to build the documentation.

An example of a project that does this well: https://dask.pydata.org/en/latest/

Crash logging and performance analytics

We want to know when things crash and why. We want to know what's slow and why. We need a sensible logging infrastructure to keep track of these issues. This is particularly important for the CLI, and probably less so for the API.
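Here is a minimal sketch using the standard `logging` module. A decorator records how long each wrapped step takes and logs the full traceback when the step raises. The logger name `starfish.cli` and the `logged` decorator are hypothetical.

```python
import functools
import logging
import time
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

logger = logging.getLogger("starfish.cli")  # hypothetical logger name


def logged(fn: F) -> F:
    """Log how long fn takes, and log the full traceback if it raises."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            logger.exception("%s crashed", fn.__qualname__)
            raise
        finally:
            logger.info("%s took %.3f s", fn.__qualname__, time.perf_counter() - start)
    return wrapper  # type: ignore[return-value]
```

A CLI entry point could then call `logging.basicConfig(filename="starfish.log", level=logging.INFO)`, with an arbitrary file name. Timings and crashes would land in a file that users can attach to bug reports.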

error running detect_spots

The gather step of munge.py raises ValueError: setting an array element with a sequence.

The error arises during the pd.melt call in the function below. I haven't chased it down further yet; just wanted to flag it.

import pandas as pd

def gather(df, key, value, cols):
    # Wide-to-long reshape: melt `cols` into a `key` column (old column name) and a `value` column.
    id_vars = [col for col in df.columns if col not in cols]
    return pd.melt(df, id_vars=id_vars, value_vars=cols, var_name=key, value_name=value)

Add type hinting

  • include argparse types (argparse.Namespace, argparse.ArgumentParser)
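The sketch below shows what annotated CLI plumbing could look like. The argparse types are spelled out so that a checker such as mypy can verify them. `add_arguments`, `run`, `main`, and the `--upsample` flag are illustrative and are not starfish's actual CLI.

```python
import argparse
from typing import List, Optional


def add_arguments(parser: argparse.ArgumentParser) -> None:
    """Declare this component's CLI arguments (hypothetical example)."""
    parser.add_argument("input", type=str, help="path to the input stack JSON")
    parser.add_argument("--upsample", type=int, default=1000, help="upsampling factor")


def run(args: argparse.Namespace) -> int:
    """Entry point called with parsed arguments; returns a process exit code."""
    print(f"registering {args.input} with upsample={args.upsample}")
    return 0


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="registration")
    add_arguments(parser)
    return run(parser.parse_args(argv))
```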

Stitching multiple fields of views

Currently, Starfish processes one field of view at a time. At some point, these processed (and potentially unprocessed) images/results need to be stitched together for visualization. This can be done either implicitly, e.g., for each FOV we record an offset and position in an overall grid along with the necessary overlap information, or explicitly, e.g., that information is used to create one large image / table representing the results.

What are the right algorithms for stitching, handling boundary artifacts, and de-duping tabular results data?
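One simple implicit approach for the tabular side rests on three assumptions. Each FOV has a known `(y, x)` offset in a global grid. Spots are `(y, x, gene)` tuples in FOV-local coordinates. A spot counts as a duplicate if a kept spot of the same gene lies within a small radius. The functions and the greedy radius rule are hypothetical. They illustrate the de-duplication step and do not settle which algorithm is right.

```python
import math
from typing import Dict, List, Tuple

Spot = Tuple[float, float, str]  # (y, x, gene)


def to_global(spots_by_fov: Dict[str, List[Spot]],
              offsets: Dict[str, Tuple[float, float]]) -> List[Spot]:
    """Shift each FOV's spots by that FOV's (y, x) offset in the overall grid."""
    merged: List[Spot] = []
    for fov, spots in spots_by_fov.items():
        oy, ox = offsets[fov]
        merged.extend((y + oy, x + ox, gene) for y, x, gene in spots)
    return merged


def dedupe(spots: List[Spot], radius: float) -> List[Spot]:
    """Greedily drop a spot if an already-kept spot of the same gene lies within `radius`.

    Intended for spots detected twice in the overlap between adjacent FOVs.
    """
    kept: List[Spot] = []
    for y, x, gene in spots:
        duplicate = any(
            g == gene and math.hypot(y - ky, x - kx) <= radius for ky, kx, g in kept
        )
        if not duplicate:
            kept.append((y, x, gene))
    return kept
```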

Fix ISS notebook registration API

  • Putting this here so I remember.

The ISS notebook currently does not work because the registration module has moved.

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-8-7b1a2d078906> in <module>()
----> 1 from starfish.registration._fourier_shift import FourierShiftRegistration
      2 from starfish.registration import Registration
      3 
      4 upsample = 1000
      5 s = Registration.run("FourierShiftRegistration", s, upsample)

ModuleNotFoundError: No module named 'starfish.registration'

When I fix that, the pipeline appears to have changed the run API, so either @ttung can take a stab at it or I'll review the commits that changed this when I've had more sleep.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-19c5ac120260> in <module>()
      3 
      4 upsample = 1000
----> 5 s = Registration.run("FourierShiftRegistration", s, upsample)

/usr/local/lib/python3.6/site-packages/starfish/pipeline/pipelinecomponent.py in run(cls, algorithm_name, stack, *args, **kwargs)
     20     def run(cls, algorithm_name, stack, *args, **kwargs):
     21         """Runs the registration component using the algorithm name, stack, and arguments for the specific algorithm."""
---> 22         algorithm_cls = cls._class_for_algorithm(algorithm_name)
     23         instance = algorithm_cls(*args, **kwargs)
     24         return instance.register(stack)

AttributeError: type object 'Registration' has no attribute '_class_for_algorithm'

Quality control metrics

How do we know our results of processed data are accurate? Is there a set of QC metrics that one can compute, that generalize across assays, that can answer this question? What QC metrics do methods developers want? What QC metrics do computational biologists want? What QC metrics do software engineers want?

Work here will also inform #60 and #58

Overhaul stack API

The current Stack abstraction in io.stack.py conflates 'hybridization' images and 'auxiliary' images. It's probably a good idea to separate these two under a common base class. This will:

  1. Eliminate having to load (and potentially write) all 'hybridization' and 'auxiliary' images when you only want to operate on, say, a single auxiliary image

  2. Eliminate the propagation of an 'org.json' file at each step of computation

  3. Make the CLI more flexible and less tied to specific intermediary file formats
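Below is a sketch of the split. It assumes images are produced by zero-argument loader callables, so nothing is read until it is requested. The class names and methods are hypothetical.

```python
from typing import Callable, Dict, List

import numpy as np

Loader = Callable[[], np.ndarray]


class ImageCollection:
    """Hypothetical common base: holds named images, loaded lazily on first access."""

    def __init__(self, loaders: Dict[str, Loader]) -> None:
        self._loaders = dict(loaders)
        self._cache: Dict[str, np.ndarray] = {}

    def names(self) -> List[str]:
        return list(self._loaders)

    def get(self, name: str) -> np.ndarray:
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()  # load only on demand
        return self._cache[name]

    def loaded(self) -> List[str]:
        return list(self._cache)


class HybridizationImages(ImageCollection):
    """Coded images indexed by (hyb, channel); assay-specific methods would live here."""


class AuxiliaryImages(ImageCollection):
    """Nuclei, dots, and other auxiliary images, each usable on its own."""
```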

Enforce uint16 input type

Right now starfish loads float images; we should do one of the following:

  • only accept uint16
  • convert to uint16 or complain if we can't deduce how to do this from the data.
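Here is a sketch of the second option. The conversion rules in the docstring are assumptions about a reasonable policy. This issue does not record a decision on them.

```python
import numpy as np

UINT16_MAX = int(np.iinfo(np.uint16).max)  # 65535


def to_uint16(image) -> np.ndarray:
    """Convert an image to uint16, or raise if the conversion is ambiguous.

    Assumed rules (one possible policy):
    - uint16 passes through unchanged;
    - integer images whose values fit in [0, 65535] are cast losslessly;
    - float images in [0, 1] are treated as normalized and scaled to the full range;
    - anything else raises instead of guessing.
    """
    image = np.asarray(image)
    if image.dtype == np.uint16:
        return image
    if np.issubdtype(image.dtype, np.integer):
        if image.size and (image.min() < 0 or image.max() > UINT16_MAX):
            raise ValueError("integer image has values outside the uint16 range")
        return image.astype(np.uint16)
    if np.issubdtype(image.dtype, np.floating):
        if image.size and (np.isnan(image).any() or image.min() < 0 or image.max() > 1):
            raise ValueError("float image is not normalized to [0, 1]; cannot infer scaling")
        return np.round(image * UINT16_MAX).astype(np.uint16)
    raise TypeError(f"unsupported dtype {image.dtype}")
```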

Convert existing jupyter notebooks into unit tests

The notebook repository will contain several examples of how to use Starfish to analyze data from several assays, e.g., MERFISH, DARTFISH, Padlock Probes, sequential smFISH, etc.

While these notebooks provide direct examples of how to use the Starfish API, they can also be re-factored into unit tests such that developers can make sure they're not making breaking changes.

Developers will also need to make sure that the Jupyter notebooks are sufficiently updated if the corresponding unit tests need to be updated.
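The following is a dependency-free sketch of turning a notebook into a smoke test. It reads the `.ipynb` JSON, concatenates the code cells, skips IPython magics and shell escapes, and executes the result, so any exception fails the test. The function names are hypothetical. `nbconvert`'s `ExecutePreprocessor` is a more faithful alternative that runs notebooks in a real kernel.

```python
import json
from pathlib import Path
from typing import List


def notebook_code(path: Path) -> str:
    """Concatenate a notebook's code cells, skipping IPython magics (%) and shell escapes (!)."""
    notebook = json.loads(Path(path).read_text())
    lines: List[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        lines.extend(
            line for line in text.splitlines() if not line.lstrip().startswith(("%", "!"))
        )
    return "\n".join(lines)


def run_notebook(path: Path) -> dict:
    """Execute a notebook's code top to bottom; any exception propagates to the calling test."""
    namespace: dict = {"__name__": "__main__"}
    exec(compile(notebook_code(path), str(path), "exec"), namespace)
    return namespace
```

A pytest test could then call `run_notebook` once for each file matched by `Path("notebooks").glob("*.ipynb")`, where the directory name is an assumption. Dropping magic lines can break code that depends on them, so notebooks used this way should limit magics to plotting setup.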
