vanvalenlab / deepcell-label



Cloud-based data annotation tools for biological images

Home Page: https://label.deepcell.org

License: Other

Python 11.94% CSS 0.12% JavaScript 79.00% HTML 0.25% Dockerfile 0.10% TypeScript 8.59%

annotate-images biology image-segmentation javascript react xstate flask python

deepcell-label's Introduction

DeepCell Label: Cloud-Based Labeling for Single-Cell Analysis

DeepCell Label is a web-based tool to visualize and label biological images. It can segment an image, assign cells across a time-lapse, and track divisions in multiplexed images, 3D image stacks, and time-lapse movies.

Because it runs in a browser, DeepCell Label can be used to crowdsource data labeling, or by a domain expert to review, correct, and curate labels.

The site is built with React, XState, and Flask and runs locally or on the cloud.

Visit label.deepcell.org to create a project from an example file or your own .tiff, .png, or .npz. Dropdown instructions are available while working on a project in DeepCell Label.

deepcell-label's People


novayang1112 goodboywhite ccamacho89 saeedseyyedi wong-lab naterenegar johnypark rossbar ykevu jgu13 mimithecoconut drtysonlab yshokrollahi

deepcell-label's Issues

edit mode visual improvements

Miscellaneous small improvements would give the pixel-editing mode a nicer feel. These include:

color of brush preview should match color of cell that is being annotated
brush should be somewhat transparent
annotations are currently composited on top of the image, making them difficult to see against a dim background. Maybe some combination of compositing and adjusting alpha?
(bonus) It would be great if users could adjust the transparency of the brush and annotations. The simplest version would toggle the overlay on/off (as implemented in desktop Caliban); a nicer version would include a slider that sets the alpha value used when drawing the overlays.
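The adjustable transparency described above amounts to standard alpha compositing: each labeled pixel is drawn as alpha * label_color + (1 - alpha) * image_pixel, so alpha = 0 shows only the image and alpha = 1 only the label color. A minimal sketch, assuming pixels are RGB tuples in nested lists and label 0 is background; `blend_pixel` and `blend_overlay` are hypothetical helpers, not DeepCell Label functions.

```python
def blend_pixel(image_rgb, label_rgb, alpha):
    """Return alpha * label + (1 - alpha) * image for one RGB pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(
        round(alpha * lab + (1.0 - alpha) * img)
        for img, lab in zip(image_rgb, label_rgb)
    )


def blend_overlay(image, labels, colors, alpha):
    """Composite a label mask over an image.

    image:  2D list of (r, g, b) tuples
    labels: 2D list of int label ids (0 = background, left untouched)
    colors: dict mapping label id -> (r, g, b)
    """
    out = []
    for img_row, lab_row in zip(image, labels):
        out_row = []
        for img_px, lab in zip(img_row, lab_row):
            if lab == 0:
                out_row.append(img_px)  # background shows the raw image
            else:
                out_row.append(blend_pixel(img_px, colors[lab], alpha))
        out.append(out_row)
    return out
```

A user-facing slider would only need to change `alpha`; toggling the overlay off is the special case alpha = 0.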

eb deploy fails for browser caliban

Amazon's Elastic Beanstalk (eb) command-line interface can create new caliban apps, but "deploy" does not update existing apps with modified code. Without a working deploy, updating the eb caliban app requires creating a new app, modifying its configuration settings, and then reconfiguring caliban.deepcell.org to use the newest eb site. If there is something we can do on our end to get deploy working, updating the app with code changes would be much easier. (It would also help prevent configuration errors, which are more likely when a person must remember to change each setting every time this happens.)

Update readme with corrected docker instructions

Volume mounted should be $PWD/desktop not $PWD/caliban/desktop.

The readme could also cover some of the pitfalls we've encountered so far, e.g., what to do if port 5900 isn't free, some of the Windows troubleshooting, etc.

refine parent assignment in trk files

Currently, the "parent" action for trk files will prevent duplicate daughters from being added to the daughters list of a parent cell (implemented in desktop in #36 and in browser in #78), but additional changes to the action could help it perform correctly/avoid bugs in a wider variety of cases.

parent/daughter relationship should not be assigned if parent label ever appears in same frame as daughter label, or if parent label ever appears in movie again once daughter label has appeared (implemented for browser in #78)
label should not be assigned as parent to itself (implemented for browser in #78)
frame_div should be set to earliest frame that any daughter appears (for edge divisions, sometimes the second daughter appears in frame after the first daughter does, which can confuse frame_div assignment) (implemented for browser in #78)
if a parent/daughter relationship is assigned incorrectly, and a new parent is assigned to the daughter cell(s), the daughter label should be cleared from the old, incorrect parent's lineage info
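The rules listed above can be expressed as a validity check plus an update step. A minimal sketch, assuming a hypothetical lineage dict that maps each label to {"frames", "daughters", "parent", "frame_div"}; this mirrors the fields discussed in the issue but is not the caliban.py implementation.

```python
def can_assign_parent(tracks, parent, daughter):
    """Check the lineage rules before linking parent -> daughter."""
    if parent == daughter:
        return False  # a label cannot be its own parent
    daughter_frames = tracks[daughter]["frames"]
    if not daughter_frames:
        return False  # nothing to link
    first_daughter_frame = min(daughter_frames)
    # One check covers two rules: the parent may not share a frame with the
    # daughter, nor reappear once the daughter has appeared.
    return all(f < first_daughter_frame for f in tracks[parent]["frames"])


def assign_parent(tracks, parent, daughter):
    """Link daughter to parent; return False if the rules forbid it."""
    if not can_assign_parent(tracks, parent, daughter):
        return False
    old_parent = tracks[daughter].get("parent")
    if old_parent is not None and old_parent != parent:
        # Clear the daughter from the old, incorrect parent's lineage info.
        old = tracks[old_parent]
        old["daughters"] = [d for d in old["daughters"] if d != daughter]
        old["frame_div"] = (
            min(min(tracks[d]["frames"]) for d in old["daughters"])
            if old["daughters"] else None
        )
    tracks[daughter]["parent"] = parent
    if daughter not in tracks[parent]["daughters"]:  # no duplicate daughters
        tracks[parent]["daughters"].append(daughter)
    # frame_div is the earliest frame in which any daughter appears.
    tracks[parent]["frame_div"] = min(
        min(tracks[d]["frames"]) for d in tracks[parent]["daughters"]
    )
    return True
```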

implement highlight mode for npz mode

Currently, highlight mode is only implemented for trk mode. It would be useful in npz mode as well, to enable cycling through the cells that are present.
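Cycling through present cells only needs the sorted set of nonzero labels in the current frame and wraparound indexing. A minimal sketch, assuming a frame is a 2D list of label ids with 0 as background; `next_highlight` is a hypothetical helper.

```python
def labels_in_frame(frame):
    """Sorted nonzero label ids present in a 2D frame."""
    return sorted({v for row in frame for v in row if v != 0})


def next_highlight(frame, current, step=1):
    """Return the label to highlight after `current` (step=-1 goes back)."""
    labels = labels_in_frame(frame)
    if not labels:
        return None  # nothing to highlight in an empty frame
    if current not in labels:
        return labels[0] if step > 0 else labels[-1]
    i = labels.index(current)
    return labels[(i + step) % len(labels)]  # wrap around at either end
```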

submit button should look more like a button

The button to submit a file just looks like text. This should be visually updated to more obviously be the submit button.

Also, we should consider adding a confirmation dialog to the button in case of misclick.

Readme should have more information for users

The effects of different commands should be explained, and the list of commands should be expanded as functionality grows. The readme will also need a significant update for other Caliban use modes (e.g., z-stack editing).

Text Editing

Allow for the deletion/modification of lineages (parent/children/capped/etc) manually via text

rendering problems with large images

The interface gets buggy with large images. Nothing appears unless the window is resized, at which point the labels appear, and this only seems to work sometimes. In the short term, perhaps enforce a maximum image size. In the long term, we can decide whether it's worth the effort to support 1024x1024 images.

DeepCell Label should be tested for accessibility

Not sure what else this might cover but colorblindness is an obvious case. This may not be an immediate issue (for launching some Caliban jobs to annotators) but definitely should be explored.

add single-frame versions of actions

Many of the actions in Caliban were implemented with trk editing in mind, where changing only one frame was not the desired behavior. However, in npz corrections, we often want to only modify the current frame with actions such as create or replace. Desktop caliban (npz) has added single-frame versions of many of these actions. These would be useful to have in browser caliban because they enable easier (and less mistake-prone) fixes of certain annotation errors.

Each single-frame action should:

be added to caliban.py as a function
be added to the javascript as an action
have display text when action is selected (ie, instead of showing "space/esc" in side info panel, it should be "space/s/esc")
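The difference between the existing trk-style actions and the proposed single-frame versions comes down to which frames an action touches. A minimal sketch, assuming a movie is a list of 2D label frames; `replace_label` and its `frame` argument are hypothetical, and lineage updates are left out.

```python
def replace_label(movie, old, new, frame=None):
    """Replace label `old` with `new` in place.

    frame=None: every frame (trk-style behavior).
    frame=k:    only frame k (the proposed single-frame version).
    Returns the number of pixels changed.
    """
    frames = range(len(movie)) if frame is None else [frame]
    changed = 0
    for f in frames:
        for row in movie[f]:
            for x, v in enumerate(row):
                if v == old:
                    row[x] = new
                    changed += 1
    return changed
```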

Fill holes in annotation mask

Poor annotations can have holes in the middle of a cell. Fixing these does not require updating lineage information, just updating the annotation. skimage.morphology has some functions that may be helpful here: https://scikit-image.org/docs/dev/api/skimage.morphology.html (notably, remove_small_holes and flood_fill).
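The idea behind hole filling is that a hole is a connected region of non-cell pixels that never reaches the image border. The sketch below implements that with a plain breadth-first search rather than skimage, under stated assumptions: the frame is a 2D list, background is 0, background regions are 4-connected, and only regions made entirely of background are filled (so a cell enclosed by another cell is left alone). `fill_holes` and `max_hole_size` are hypothetical names; `max_hole_size` plays a role similar to the area threshold of skimage's remove_small_holes.

```python
from collections import deque


def fill_holes(mask, label, max_hole_size=None):
    """Fill background holes fully enclosed by `label`; return pixels filled."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    filled = 0
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or mask[sy][sx] == label:
                continue
            # Collect one 4-connected region of pixels that are not `label`.
            region, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and mask[ny][nx] != label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            touches_border = any(
                y in (0, h - 1) or x in (0, w - 1) for y, x in region
            )
            all_background = all(mask[y][x] == 0 for y, x in region)
            small_enough = max_hole_size is None or len(region) <= max_hole_size
            if not touches_border and all_background and small_enough:
                for y, x in region:
                    mask[y][x] = label
                filled += len(region)
    return filled
```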

Caliban should never make an empty track

Before saving, the program should check to see if there are any tracks that do not contain any frames and delete them.
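The pre-save check could be as small as filtering the lineage dict. A minimal sketch, assuming a hypothetical lineage dict keyed by label whose entries carry a "frames" list; references to a removed id in other tracks (e.g., in "daughters") are not cleaned up here.

```python
def drop_empty_tracks(tracks):
    """Delete tracks whose frame list is empty; return the removed ids."""
    empty = [label for label, info in tracks.items() if not info.get("frames")]
    for label in empty:
        del tracks[label]
    return empty
```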

cell_display info crashes when cell 1 not present

https://github.com/vanvalenlab/caliban/blob/2a9af44325324ff678845e838b5cc9d3537b6f1d/desktop/caliban.py#L629

The line of code above looks in the dictionary associated with cell 1 to get the parameter names. If the mask is missing cell 1, the program crashes. This could be changed to take either the first cell id present or a random cell id.
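Taking the first cell id present avoids hard-coding cell 1. A minimal sketch, assuming the per-cell info is a dict of dicts keyed by cell id; `cell_info_keys` is a hypothetical helper, not the code at the linked line.

```python
def cell_info_keys(cell_info):
    """Return parameter names from the first cell present, not from cell 1."""
    if not cell_info:
        return []  # no cells at all: nothing to display
    first_id = min(cell_info)  # smallest cell id actually present
    return list(cell_info[first_id].keys())
```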

Caliban should be able to open files with no annotations

It is possible that we will deal with npz (or trk?) files where no annotations exist. This could be because there are no objects that need to be annotated, or because objects that need annotations do not have corresponding annotations. Currently, the reshape npz (preprocessing) function in deepcell-toolbox addresses this by not saving npzs with empty annotation files. However, we may want to run "empty" annotation files through fig8 in the future, or even just check through files with caliban.

Image scaling option and/or larger display

add timestamps to db rows

Would help to determine which rows in the database can be deleted without causing problems (eg, if a row hasn't been updated in a week). Not urgent but may be useful as browser caliban sees more traffic, as files won't necessarily be submitted each time they are opened (eg, demos, debugging, checking fig8 results), which will lead to rows that never get deleted from db.
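Timestamped rows make the cleanup rule ("not updated in a week") a single query. A minimal sketch using Python's built-in sqlite3 module for illustration; the `projects` table, its columns, and the helper names are hypothetical, and browser Caliban's actual database setup may differ. Timestamps are stored as seconds since the epoch so they compare numerically.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id INTEGER PRIMARY KEY,
    state BLOB,
    created_at REAL NOT NULL,  -- seconds since the epoch
    updated_at REAL NOT NULL
)
"""


def insert_project(conn, state, now=None):
    now = time.time() if now is None else now
    cur = conn.execute(
        "INSERT INTO projects (state, created_at, updated_at) VALUES (?, ?, ?)",
        (state, now, now),
    )
    return cur.lastrowid


def update_project(conn, project_id, state, now=None):
    now = time.time() if now is None else now
    conn.execute(
        "UPDATE projects SET state = ?, updated_at = ? WHERE id = ?",
        (state, now, project_id),
    )


def delete_stale(conn, max_age_seconds=7 * 24 * 3600, now=None):
    """Delete rows not updated within max_age_seconds; return count deleted."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM projects WHERE updated_at < ?", (now - max_age_seconds,)
    )
    return cur.rowcount
```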

Correct for single pixel segmentation errors

Allow for label deletion in a frame (in case of 1 pixel segmentation). Should also allow for expansion of labels (in case of missing pixels)
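The expansion half of this request can be sketched as a one-pixel dilation restricted to background, so neighboring cells are never overwritten. A minimal sketch, assuming a 2D list frame with 0 as background and 4-connectivity; `grow_label` is a hypothetical helper.

```python
def grow_label(frame, label, iterations=1):
    """Grow `label` into 4-adjacent background pixels; return pixels added."""
    h, w = len(frame), len(frame[0])
    added = 0
    for _ in range(iterations):
        # Find all growth pixels first so each iteration grows by one pixel.
        border = set()
        for y in range(h):
            for x in range(w):
                if frame[y][x] != 0:
                    continue  # only background can be claimed
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and frame[ny][nx] == label:
                        border.add((y, x))
                        break
        for y, x in border:
            frame[y][x] = label
        added += len(border)
    return added
```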

Display color overlays additional channel

Currently, each channel is treated as greyscale image that is given color by intensity scaling. A nice potential improvement would be the ability to display color overlays in the form of RGB images that are pre-defined by the user and then loaded into the npz file. This would require 1) the ability to render RGB images, and 2) an additional dimension to separate out multiple RGB and greyscale channels from each other, so that they could be scrolled through as is currently implemented.

save checkpoint or undo feature for browser users

In desktop Caliban, users can save frequently so that if they make a mistake, they can go back to a previous save instead of starting annotation over from scratch. We don't have a way to allow this for browser users, since the file is saved only when it is submitted. Pixel-editing is a little more robust to mistakes since annotators can erase or re-draw annotations, but bulk label editing mistakes can take more work to undo.

Ideally, we would have an "undo" command that undoes the most recent command. (I'm not sure how we would implement that, or if we'd be able to extend it to being able to undo/redo multiple commands.) What may be easier to implement is a save/load checkpoint feature. One checkpoint could be stored at a time that annotators could use to reload their file from. Since this relies on annotators remembering to checkpoint their file at appropriate intervals, this is less ideal than "undo" but could still work as a compromise until undo is implemented.
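One way to get multi-step undo/redo without per-action inverse logic is to snapshot the state before every action. A minimal sketch, assuming the annotation state can be deep-copied; the `History` class is hypothetical. With `max_depth=1` it behaves like the single stored checkpoint described above.

```python
import copy


class History:
    """Snapshot-based undo/redo for an annotation state."""

    def __init__(self, state, max_depth=20):
        self.state = state
        self.max_depth = max_depth
        self._undo = []
        self._redo = []

    def apply(self, action):
        """Save a snapshot, then run action(state), which edits in place."""
        self._undo.append(copy.deepcopy(self.state))
        if len(self._undo) > self.max_depth:
            self._undo.pop(0)  # drop the oldest snapshot to bound memory
        self._redo.clear()  # a new action invalidates the redo history
        action(self.state)

    def undo(self):
        if not self._undo:
            return False
        self._redo.append(self.state)
        self.state = self._undo.pop()
        return True

    def redo(self):
        if not self._redo:
            return False
        self._undo.append(self.state)
        self.state = self._redo.pop()
        return True
```

Full snapshots are simple but memory-hungry for large movies; storing only the changed pixels of each action would scale better at the cost of more bookkeeping.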

we should have more files in our test folder

We should have several files in the test folder so we can check browser caliban functionality over a range of use cases. Some variables I'd like to make sure get included are:

a range of image sizes
different data types (trk, untracked movie as npz, zstack, single frame npz) with both nuclear and cytoplasmic images
easy and difficult tasks
files at various stages of completion (ie, things that would get fixed with different sets of tools, such as the bulk mode operations vs various pixel-level tools that we haven't implemented yet)

Each file in the test folder should be included in the dropdown list on the caliban website. We may want to include a way to say which features each file has (ie, test2.npz is multichannel cytoplasm zstack, uncorrected, with shape of (slices, y, x, channels)). Including comments along those lines in the code or the browser caliban readme may also be helpful. Files will likely get added to the test folder as we find them in the course of annotating data, and the list of useful test examples may change as time goes on.

scale image display to fill available space

The html canvas element is likely to remain the same size across different jobs, but npzs and trks might have a range of different sizes. Currently, browser caliban scales these files by 2 to display them. Preferably, the javascript load_file function would include the available canvas size as an arg so that upon initialization, the python object (ZStackReview or TrackReview) would scale to fit that size. Scaling is currently the only way we have to magnify the image (until a zoom feature can be implemented), so we should use it to the fullest extent we can.
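The scale factor that fits an image into the canvas is the smaller of the two axis ratios. A minimal sketch of only that computation, assuming (height, width) tuples; `fit_scale` and its `integer` option are hypothetical and do not describe the actual load_file arguments.

```python
def fit_scale(image_hw, canvas_hw, integer=False):
    """Largest scale at which an image of image_hw fits inside canvas_hw."""
    img_h, img_w = image_hw
    can_h, can_w = canvas_hw
    scale = min(can_h / img_h, can_w / img_w)
    if integer and scale >= 1:
        # Whole-number magnification keeps every image pixel the same size.
        scale = int(scale)
    return scale
```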

display channel names

Currently, channels are identified only by their index. Being able to provide a text label for each channel and display it would help annotators know which channel is being shown.

add keybind to cycle backwards through channels

Currently, "c" advances the channel being viewed. It would be nice to add another keybind, perhaps shift+c, to cycle through channels in the other direction. That way, with npzs that contain >2 channels, contributors don't need to cycle through all of the channels to get back to a previous channel.
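Both directions reduce to modular arithmetic, which wraps around at either end. A minimal sketch, assuming "C" represents shift+c (a hypothetical binding, as proposed above); `handle_channel_key` is a hypothetical helper.

```python
def handle_channel_key(key, current, num_channels):
    """'c' advances the channel, 'C' (shift+c) goes back; both wrap around."""
    if key == "c":
        return (current + 1) % num_channels
    if key == "C":
        return (current - 1) % num_channels  # Python's % keeps this >= 0
    return current
```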

favicon for browser caliban

Add a favicon to the flask app. Can use the deepcell.org favicon or a custom favicon. Minor cosmetic detail but will also put an end to "favicon not found" errors.

Swap cell masks in just one frame

Occasionally, cell tracking messes up in just one frame, such that cell 1 is misidentified as cell 2 only in that frame and vice versa. The current swap feature can't fix this because it swaps the track information between the two cells for all frames of the movie. This is an uncommon error but does happen occasionally. See cells 9 and 10 (erroneously swapped in frame 13) in attached .trk file for example.
HEK293_S0P1_Batch44.zip
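Swapping two labels in a single frame has to exchange both labels in one pass; replacing a with b and then b with a would merge them. A minimal sketch, assuming a movie is a list of 2D label frames; `swap_labels_in_frame` is hypothetical. Because both cells are still present in that frame afterward, their lists of frames in the lineage data stay valid.

```python
def swap_labels_in_frame(movie, frame, a, b):
    """Exchange labels a and b in one frame only; return pixels changed."""
    count = 0
    for row in movie[frame]:
        for x, v in enumerate(row):
            if v == a:
                row[x] = b
                count += 1
            elif v == b:
                row[x] = a
                count += 1
    return count
```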

watershed only works when multiple cells selected

https://github.com/vanvalenlab/caliban/blob/2a9af44325324ff678845e838b5cc9d3537b6f1d/desktop/caliban.py#L233

Is there a reason that watershed only works with multiple cells selected? This seems like it would be more likely to cause problems than requiring only one cell be selected

Separate non-adjacent cell masks

Watershed can be used when cells are touching, but sometimes a cell mask appears on opposite sides of the movie (ie, the real cell 5 is near the left edge, but a few pixels called "cell 5" are on the right edge). Currently there is no way to separate those stray pixels from the real cell using Caliban.
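Stray pixels can be found by splitting a label into connected components and keeping the largest one as the real cell. A minimal sketch, assuming 4-connectivity, a 2D list frame, and that all stray components should move to one new label (passing 0 would delete them instead); `split_disconnected` is hypothetical.

```python
from collections import deque


def split_disconnected(frame, label, new_label):
    """Keep the largest component of `label`; relabel the rest to new_label."""
    h, w = len(frame), len(frame[0])
    seen = set()
    components = []
    for y in range(h):
        for x in range(w):
            if frame[y][x] != label or (y, x) in seen:
                continue
            comp, queue = [], deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen
                            and frame[ny][nx] == label):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(comp)
    if len(components) < 2:
        return 0  # the label is already a single connected mask
    components.sort(key=len, reverse=True)
    moved = 0
    for comp in components[1:]:
        for py, px in comp:
            frame[py][px] = new_label
        moved += len(comp)
    return moved
```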

add updated color map system to TrackReview

ZStackReview has been recently updated to allow for a new color map system. Now, when viewing .trk files, we get the following error:

'TrackReview' object has no attribute 'get_array'

To fix this, add updated color map system to TrackReview.

zoom in and out

It would be useful to add ability to zoom in and out while annotating

Watershed struggles to separate labels with low contrast raw

When an image has two cells in close proximity that are given the same label and the corresponding raw image of these cells is very low contrast, Watershed sometimes fails to correctly split the label.

Investigate Containerization

Use deepcell-tf as roadmap to investigate containerization. It is likely a graphics port will need to be exposed.

Faster scroll through images option

add keybind to set brush color to unused value

We should have a keybind in edit mode (perhaps "n") to set the brush value to an unused value. (Perhaps setting the brush preview to show the same color as highlighted cells?) This would make it easier for annotators to draw in new cells without accidentally duplicating labels.
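An unused value can be taken as one more than the largest label anywhere in the movie. A minimal sketch, assuming a list of 2D label frames and, optionally, a lineage dict whose keys are label ids (so ids that exist only in lineage data are not reused); `next_unused_label` is hypothetical.

```python
def next_unused_label(movie, tracks=None):
    """Return max label across all frames (and lineage ids) + 1."""
    pixel_max = max((v for frame in movie for row in frame for v in row), default=0)
    track_max = max(tracks, default=0) if tracks else 0
    return max(pixel_max, track_max) + 1
```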

Browser Caliban should be able to load files from bucket even if they are not in a subfolder

I'm having trouble loading the npz file we use for testing (caliban-input/test.npz, no subfolders). This may be because the landing page for Caliban has not been updated to reflect the new way of accessing files (where input_bucket, output_bucket, and folder structure are encoded in the url). The caliban website should be updated so that the dropdown list of files leads to working caliban sessions. This may not be the issue, but we should also check that caliban is able to access files even if they aren't in subfolders in the s3 bucket.

Watershed clears nearby cell masks

When using watershed to separate one mask into two (eg cell 1 -> cell 1 and cell 2), nearby cells will have portions of their segmentation masks overwritten (ie, a chunk of cell 3 that is near cell 1 goes missing). These masks should be left unmodified by watershed.
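One way to guarantee that nearby masks are untouched is to copy the watershed output back only where the frame originally held the label being split (skimage.segmentation.watershed also accepts a mask argument for restricting the result). A minimal sketch, assuming `ws_result` is a full-frame 2D label array produced by some watershed call; `merge_split_result` is hypothetical.

```python
def merge_split_result(frame, ws_result, label):
    """Write watershed labels back only inside the original `label` mask."""
    for y, row in enumerate(frame):
        for x, v in enumerate(row):
            if v == label:
                row[x] = ws_result[y][x]
    return frame
```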

actions should use add and del cell info helper functions

Browser caliban.py should use the cell info helper functions in action functions (eg, watershed, replace, create, etc). This will clean up the code and keep behavior consistent between these functions. Consistently using the helper functions will also make it easier to add other actions, such as the single-cell versions of several actions (implemented in the npz class in desktop/caliban.py).

Scroll wheel also changes LUT of annotations

Corrupted .npz file after sending to S3 bucket

The .npz file isn't being properly sent to the S3 bucket. The same issue occurred before when sending .trk files, but that has been fixed. Look into the load()/loadtrk() methods of the TrackReview class for solutions.

3D features should be ported to track side

All the improved functionality of the 3D edit mode should be duplicated in track mode.

Add new mask to annotation

Useful in cases where annotation is incorrect (missing pixels) or user has erroneously deleted a cell mask in a frame.

This option should:

require user input to determine where annotation should be created
- two clicks to determine corners of bounding box that contains cell to be annotated
- third click to determine seed location for watershed transform
create new mask in that frame using largest cell value in movie + 1 (new unique mask)
create new lineage information corresponding to cell mask
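The three-click input above needs the two corner clicks normalized, since the user may click corners in any order, and the seed click checked against the box. A minimal sketch, assuming clicks are (x, y) tuples and the box is inclusive; `bbox_from_clicks` and `seed_inside` are hypothetical helpers.

```python
def bbox_from_clicks(click1, click2):
    """Normalize two (x, y) corner clicks into (x0, y0, x1, y1), inclusive."""
    (xa, ya), (xb, yb) = click1, click2
    return min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)


def seed_inside(bbox, seed):
    """True if the third (seed) click falls within the bounding box."""
    x0, y0, x1, y1 = bbox
    x, y = seed
    return x0 <= x <= x1 and y0 <= y <= y1
```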

"c- relabel selected cell with an unused label" doesn't work in npz mode

https://github.com/vanvalenlab/caliban/blob/23e8aa39cafa4fdb3909c5e7c2945882edb3c96d/desktop/caliban.py#L352

When relabeling a cell in npz mode, all cells that have that label get moved to a new label, rather than just one of those cells.

Perhaps this isn't how we're supposed to be using c?
As it currently stands, when a single cell needs to be split into two, we erase half of the cell, pick an arbitrary label for the new second half, and then select the new cell and use c to relabel it. Both that cell and whatever other cell happens to share that label get moved to a new, unused label.

Let me know if there's a better way to be doing this!

Highlight cell

Sometimes it is difficult to identify mislabeled pixels in an image. This can be because of low contrast between cell masks, small mistakes (eg, a single pixel annotated incorrectly in the corner of the image), or even both. Movies that are annotated incorrectly can lead to noticeable errors in tracking, most often due to the center of the "cell" shifting drastically between two frames. These errors can be time consuming to fix because they require locating the incorrect pixels.

Distinguishing cells can also be difficult in trk files with many cells. Normally, training data is made with fairly small (~30 cell) .trk files, but for benchmarking/challenges/unforeseen use cases trk files may have many (200+) cells per frame. This may also be the case for small field of view but long timescale tracking movies that have many divisions or cells crossing in/out of the movie. In these cases, the contrast between masks may be low (even with enhancements such as adjusting contrast with the scroll wheel).

A highlight cell option would help make difficult pixels more visible. Such an option would display a cell mask with a different color. Some ways this might be implemented:

select a single cell, use "h" to toggle highlighting of that cell
with no cells selected, use "h" to prompt input. type in the cell id of the cell to be highlighted. (it is rare but possible to have a few pixels labeled but to only know by inference, eg a gap in the numbering of the cells you can find easily)
toggle highlighting visible/invisible with h; one cell is always selected for highlighting and can be cycled through with other keys (eg, m/n). Ie, when highlighting is toggled invisible, display is the same as always and cycling through highlighted cell does nothing. when highlighting toggled visible, can use cycling keys to display one cell at a time as highlighted

Preferably, cell would be highlighted with a color distinct from the default cmap. (Bright red?)
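Highlighting can be done at render time by overriding one label's color. A minimal sketch, assuming a label-to-RGB dict for the default colors that never contains pure red; `colorize` and `HIGHLIGHT_RGB` are hypothetical.

```python
HIGHLIGHT_RGB = (255, 0, 0)  # bright red, as suggested above


def colorize(frame, colors, highlight=None):
    """Map a 2D label frame to RGB; the highlighted label is drawn red."""
    out = []
    for row in frame:
        out_row = []
        for v in row:
            if v == 0:
                out_row.append((0, 0, 0))  # background stays black
            elif v == highlight:
                out_row.append(HIGHLIGHT_RGB)
            else:
                out_row.append(colors[v])
        out.append(out_row)
    return out
```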

change watershed to work on currently selected channel

https://github.com/vanvalenlab/caliban/blob/2a9af44325324ff678845e838b5cc9d3537b6f1d/desktop/caliban.py#L427

Currently watershed operates over a tensor of all features/channels combined. I think the behavior for multi-channel data would be more straightforward if instead it was changed to watershed over the currently selected channel. We could index into the img_array with the current feature; does this make sense?

display channels and labels simultaneously

Currently either channels or labels can be displayed, but not both. It would be great if you could draw your labels directly over the channels data, rather than over the grey background.

It looks like some sort of transformed version of the current channel can be viewed, but not the actual image itself

colormaps should be robust to different ranges of labels

Eg, an image that has labels between 100 and 120 should be just as easy to look at as an image with labels between 0 and 20. Ideally, even labels that span a wide range of values (eg, if the labels in an image were [1, 5, 10, 50, 100, 500]) should be easy to tell apart.

I'd like to avoid having user-adjustable label colors in browser caliban as they exist in desktop caliban. Since browser caliban.py masks background with black, setting vmin = min(self.tracks) might be an easy step towards a solution. There may be additional ways we can improve the quality of the colormapping.
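Setting vmin = min(self.tracks) handles an offset range, but widely spread labels such as [1, 5, 10, 50, 100, 500] would still crowd together near one end of the colormap. A different technique, rank-based normalization, spreads colors by each label's position in the sorted list rather than its raw value. A minimal sketch; `normalize_labels` is hypothetical and returns values in (0, 1] that could be fed to a colormap.

```python
def normalize_labels(labels):
    """Map each distinct nonzero label to an evenly spaced value in (0, 1].

    Colors depend on a label's rank, not its raw value, so labels
    [1, 5, 10, 50, 100, 500] spread out as much as [1, 2, 3, 4, 5, 6].
    """
    unique = sorted(set(labels) - {0})
    n = len(unique)
    return {label: (i + 1) / n for i, label in enumerate(unique)}
```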

Delete annotation option

Option to delete extraneous cell masks by selecting and deleting them (perhaps the x key). Delete should only remove mask from one frame at a time. (If it is useful to delete a whole track at a time, this should be a separate command.) Delete should change all of the pixels of selected value to zero in the selected frame, and remove that frame from the list of frames for that cell id in the lineage data. If that is the only frame the cell appears in, the rest of the lineage data should be deleted.
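The delete behavior described above touches both the pixels and the lineage data. A minimal sketch, assuming a list of 2D label frames and a hypothetical lineage dict keyed by label with a "frames" list; `delete_annotation` is hypothetical.

```python
def delete_annotation(movie, tracks, frame, label):
    """Zero out `label` in one frame and update its lineage entry."""
    count = 0
    for row in movie[frame]:
        for x, v in enumerate(row):
            if v == label:
                row[x] = 0
                count += 1
    info = tracks.get(label)
    if info is not None:
        info["frames"] = [f for f in info["frames"] if f != frame]
        if not info["frames"]:
            del tracks[label]  # that was the cell's only frame
    return count
```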

Tracks created by watershed not the same as tracks created with c

When watershed is used to separate an annotation into two, the new annotation should be associated with a new entry in the lineage data. Currently, cells created by watershed have different behavior than cells created with "create new track". Watershed-created cells behave normally if they are replaced by an existing track.

For an example of behavior differences:
Create two new tracks in a .trk file. The cell ids should be different. Separate a cell with watershed. Without replacing the "new cell" created by watershed, create a new track with c. The cell ids of these cells will be the same. This is a problem in cases where the appropriate track to replace the watershed-created cell does not already exist.

reduce lag for browser caliban

try timing functions to identify slowest parts
look into reducing how often object needs to be pickled/unpickled
do connections need to be closed each time, or can we maintain a persistent connection to the db?
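For the first item, a timing decorator can collect per-function wall-clock times across requests (Python's built-in cProfile is an alternative for a full call profile). A minimal sketch using only the standard library; `timed`, `TIMINGS`, and `report` are hypothetical names.

```python
import functools
import time
from collections import defaultdict

TIMINGS = defaultdict(list)  # function name -> list of durations in seconds


def timed(func):
    """Record the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


def report():
    """Return (name, calls, total seconds), slowest total first."""
    rows = [(name, len(t), sum(t)) for name, t in TIMINGS.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)
```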

Check that all labels are sequential

Some cmap handling relies on max(track label); we should either verify that all labels are sequential or switch to len(unique labels).
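Labels are sequential exactly when the distinct nonzero labels are 1..N, in which case max(label) equals len(unique labels). A minimal sketch of the check and of a relabeling fallback, assuming a flat list of label values; both helpers are hypothetical, though skimage.segmentation.relabel_sequential offers the same relabeling for arrays.

```python
def is_sequential(labels):
    """True if the nonzero labels are exactly 1..N with no gaps."""
    unique = set(labels) - {0}
    return unique == set(range(1, len(unique) + 1))


def relabel_sequential(labels):
    """Map nonzero labels onto 1..N in sorted order; 0 stays background."""
    mapping = {old: new for new, old in enumerate(sorted(set(labels) - {0}), start=1)}
    mapping[0] = 0
    return [mapping[v] for v in labels], mapping
```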