see-insight / see-segment

Simple Evolutionary Exploration - Image Segmentation

License: MIT License

Jupyter Notebook 95.51% Python 4.27% Makefile 0.02% TeX 0.20%

see-segment's Issues

Add image reference page for examples included in project.

Typically I have individuals joining the team create an example input and ground truth image as their first pull request project. I would like a markdown file in the examples folder with a description of each team member, the pictures they included, and a brief explanation of why those pictures were chosen.

SEE-points workflow - Anchor point selection

I would like to generalize the GeneticSearch library to work with workflows beyond segmentation. I think an easy option would be to make an AnchorPoint module similar to our previous work (link may be password protected, see Dirk for access):

https://gitlab.msu.edu/imageinformatics/chamview

Population Animation

I would like to create a function (maybe part of JupyterGUI) that takes a population and an image as input and "animates" the segmentation solutions from lowest fitness to highest. I think this may help visualize the results.

Create group website

Right now we are using pdoc3 to generate a github.io site. I also want to add a landing page for the team and link to the software documentation.

Try to fix binary trivial solution

In the binary case, there is a solution that is treated as trivial even though it is not. This can be reproduced with the following example:

%matplotlib inline
import matplotlib.pyplot as plt
import imageio

from see import Segmentors

# Load the example image and its ground-truth mask
img = imageio.imread('Image_data/CO-SKEL_v1.1/images/frog/0007.jpg')
gmask = imageio.imread('Image_data/CO-SKEL_v1.1/GT_masks/frog/0007.png')

# Parameter vector that reproduces the problem
params = ['CT', 7563, 0.13, 2060, 0.01, 4342, 850, 10, 0.57, 1863, 1543, 1, 3, 1, 0.35, (1, 1), 8.1, 'checkerboard', 'checkerboard', 3, 7625, -35, 0.0, 0.0, 0.0]

seg = Segmentors.algoFromParams(params)
mask = seg.evaluate(img)
plt.imshow(mask)

# FitnessFunction returns a sequence; its first element is the fitness value
fitness = Segmentors.FitnessFunction(mask, gmask)
print(fitness[0])

This occurs because both regions map to the background. Although this is not a good solution, it should not be discounted, because it may lead to a better solution.

We need to find an update to the fitness function that takes this case into consideration.
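One possible direction, sketched below under stated assumptions: score a binary prediction by assigning each predicted region to foreground or background by majority vote, then measuring pixel error. A mask whose regions all map to background then gets a graded score (the fraction of ground-truth foreground it misses) instead of being discounted as trivial. The name graded_binary_fitness is hypothetical, and the sketch assumes lower fitness is better and that both masks are 2D arrays of the same shape; it is not the project's existing FF_ML2DHD_V2.

```python
import numpy as np

def graded_binary_fitness(inferred, ground_truth):
    """Hypothetical fitness in [0, 1]; lower is better.

    Every label in `inferred` is assigned to foreground or background by
    majority vote against `ground_truth`. A segmentation whose regions all
    end up as background still gets a finite, graded score (the fraction of
    pixels it gets wrong) instead of being discarded as trivial.
    """
    labels = np.asarray(inferred)
    truth = np.asarray(ground_truth) > 0
    predicted = np.zeros(truth.shape, dtype=bool)
    for label in np.unique(labels):
        region = labels == label
        # A region counts as foreground if most of its pixels are foreground
        if truth[region].mean() > 0.5:
            predicted[region] = True
    return float(np.mean(predicted != truth))
```

With a 10x10 ground truth whose top-left 5x5 block is foreground, an all-background prediction scores 0.25 rather than the worst possible value, so the search can still improve on it.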

Fix Single Channel Images

As written, single-channel (grayscale) images will not work, because some of the functions are hard-coded to accept 3D arrays. Converting single-channel images to 3-channel images is one option. Maybe we can think of something more flexible.
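The conversion option can be sketched as a small normalization step applied before any algorithm runs. This is a minimal sketch; the helper name to_three_channels is hypothetical, and it only covers grayscale and RGB inputs.

```python
import numpy as np

def to_three_channels(image):
    """Return an (H, W, 3) array for a grayscale or RGB image.

    (H, W) and (H, W, 1) inputs are replicated into three identical
    channels; (H, W, 3) inputs are returned unchanged.
    """
    image = np.asarray(image)
    if image.ndim == 2:
        return np.stack([image, image, image], axis=-1)
    if image.ndim == 3 and image.shape[-1] == 1:
        return np.repeat(image, 3, axis=-1)
    if image.ndim == 3 and image.shape[-1] == 3:
        return image
    raise ValueError(f"Unsupported image shape: {image.shape}")
```

The trade-off is three times the memory for grayscale inputs; a more flexible design might let each algorithm declare whether it actually needs color.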

Transfer learning

Use previous populations to inform new problems.

New Jupyter GUI for painting images

Hey gang, I just heard about Gradio on the Python Bytes podcast. We should check it out and see if it would work for a simple painting GUI!

Fix search if there is no good solution

Currently the top solution gets 50% of the next generation. We need a quick if statement to check if the best is above a threshold. If it is then randomly select the entire next generation.
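The proposed if statement can be sketched as a wrapper around the existing breeding step. This assumes lower fitness is better; breed and new_random_individual are hypothetical hooks standing in for the current strategy and for random individual generation, not existing see-segment functions.

```python
def next_generation(population, fitnesses, threshold, breed, new_random_individual):
    """Sketch of the proposed check; assumes lower fitness is better.

    `breed(population, fitnesses)` stands for the existing strategy (where
    the top solution gets 50% of the next generation) and
    `new_random_individual()` draws one random individual.
    """
    if min(fitnesses) > threshold:
        # No individual is good enough yet: restart with a fully random generation
        return [new_random_individual() for _ in population]
    return breed(population, fitnesses)
```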

Alternative fitness functions

Different fitness functions have different types of bias. It would be helpful to include a few different ones in the search and allow selection of the best fitness function for a problem based on researcher input. Sort of meta-meta-level learning.

Review/update JOSS submission

Add New grammar instances pre-processing

Add a new grammar instance to pre-process the image after color selection, with operations such as Gaussian blurring and sharpening. This is a searchable parameter space.

Selection Operator

SEE-Segment doesn't appear to be using a selection operator. It seems to just select the HoF and apply mutation and crossover to those individuals. Using a selection operator such as tournament selection could help maintain diversity, because it would allow individuals other than the top few to keep contributing their genetic information to the population. I also think the ordering of the models should be shuffled before applying crossover and mutation, so that HoF[0] isn't always paired with HoF[1], and so on. It should be possible for a model with high fitness to be paired with a model of weak fitness.
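Tournament selection and the pre-pairing shuffle can be sketched in a few lines. This is a minimal illustration assuming lower fitness is better and parallel lists of individuals and fitness values; DEAP also ships a tournament operator (deap.tools.selTournament) that could be used instead.

```python
import random

def tournament_select(population, fitnesses, k, tournament_size=3, rng=random):
    """Select k individuals by tournament selection (lower fitness is better).

    Each pick is the best of `tournament_size` randomly drawn contestants, so
    individuals outside the top few can still win and pass on their genes.
    """
    indices = range(len(population))
    chosen = []
    for _ in range(k):
        contestants = rng.sample(indices, tournament_size)
        winner = min(contestants, key=lambda i: fitnesses[i])
        chosen.append(population[winner])
    return chosen

def shuffled_pairs(parents, rng=random):
    """Shuffle parents before pairing, so HoF[0] is not always mated with HoF[1]."""
    parents = list(parents)
    rng.shuffle(parents)
    return list(zip(parents[0::2], parents[1::2]))
```

Smaller tournament sizes give weaker individuals more chances to win, which trades convergence speed for diversity.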

Population Distribution Animation

I would like to create an animation that takes a sequence of populations, generates a histogram at each step in the sequence, and animates how this histogram changes. It would be nice to also highlight the bar (algorithm class) that contains the individual with the current best fitness.

SEE-learn workflow - Machine Learning Workflow

I would like to generalize the GeneticSearch library to work with workflows beyond segmentation. I think an easy option would be to make a machine learning module by leveraging scikit-learn, similar to how we leveraged scikit-image.

Refactor GeneticSearch to maybe get rid of Deap

The current GeneticSearch is getting sloppy. We need to refactor the code. Some things to consider:

Maybe get rid of the DEAP dependency. Maybe we can still use DEAP, but not for the core search. Our solution is so slow that different types of parallelization may be useful.
Record output after each individual is processed. Since the search is so slow anyway, this should not add too much to the run time.
Give us more control over the population.

Lots of other stuff. (see todo and other issues).

Port SEE-Segment Tool to XSEDE

Manual Image Segmentation GUI

I would like to build a prototype manual image segmentation GUI that can "paint" an image and generate a segmented ground truth. Conceptually, this GUI will be independent of SEE-segment and will just call SEE-segment when we are done. However, it would also be nice to include a GUI with see-segment so that a user doesn't have to use GIMP or some other tool. The long-term plan is to build a robust GUI.

Generating new population is slow

While running tests, we noticed that the evolver is slow at some point after the mutation step and before the next iteration starts. We should check to see why.
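One way to find the slow step is to run a single generation under Python's built-in profiler. The helper below is a hypothetical convenience wrapper around cProfile and pstats; a call such as profile_call(my_evolver.run, ngen=1, population=population) mirrors the call shown in the RunSearch.py traceback elsewhere on this page, but whether it isolates the slow step is untested.

```python
import cProfile
import io
import pstats

def profile_call(func, *args, top=20, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries sorted
    by cumulative time, which points at the slowest call chains.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```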

DataDownload Module needs cleaning and testing

We need to go through the DataDownload module and update DataDownload.ipynb.

Move print_best_algorithm_code function to segmentors

For some reason, the print_best_algorithm_code function is in the genetic search and not in the segmentor. For object-oriented reasons, I think it makes more sense to put it in the segmentor, since it is related to the segmentor and has nothing to do with the search itself.

Add New grammar instances post-processing

Add a new grammar instance to post-process the labeled array. This should use things like image morphology (dilation and erosion).

Finish Fitness Function paper.

This has grown stagnant

Update Chan Vese and Morphological Chan Vese to use the Channel parameter instead of only the grayscale image.

Probably the best way to do this is to add a function for channel selection. This function can be based on the code used in the color threshold algorithm. Basically, use the Channel parameter to select one of the R, G, B, H, S, V channels (or others as we progress). There is a lot we can do to clean up this part of the code.
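A channel-selection helper along these lines might look like the sketch below. The function name and the mapping from an integer Channel value to R, G, B, H, S, V (index 0 to 5) are assumptions. HSV is computed with the standard-library colorsys module for simplicity; skimage.color.rgb2hsv would be the faster vectorized choice.

```python
import colorsys
import numpy as np

CHANNELS = ("R", "G", "B", "H", "S", "V")

def select_channel(rgb, channel):
    """Return one channel of an RGB image as a 2D float array in [0, 1].

    `rgb` is an (H, W, 3) array with values in [0, 255]. `channel` is a name
    from CHANNELS or its index 0-5 (this index mapping is an assumption).
    """
    if isinstance(channel, int):
        channel = CHANNELS[channel]
    if channel not in CHANNELS:
        raise ValueError(f"Unknown channel: {channel!r}")
    rgb = np.asarray(rgb, dtype=float) / 255.0
    if channel in ("R", "G", "B"):
        return rgb[..., "RGB".index(channel)]
    # colorsys converts one pixel at a time; np.vectorize applies it to
    # every pixel (simple but slow)
    h, s, v = np.vectorize(colorsys.rgb_to_hsv)(rgb[..., 0], rgb[..., 1], rgb[..., 2])
    return {"H": h, "S": s, "V": v}[channel]
```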

Clean up DataDownload filenames to use a consistent format for all the datasets

Also create a function that takes the dataset name as input. For example:

Download('SKY')

Systematic Refinement of Parameter Space

We are still using the same parameter space from the research conducted in the summer of 2019. This was always intended to be a temporary, naive solution. Katrina made some effort to improve the space, but bugs in other code prevented us from making progress. I would like to modify the parameter space as follows:

When possible, normalize the parameters to specific ranges so they can generalize.
Reuse as many parameters as possible to try to reduce the search space.
When possible, group parameters across segmentation algorithms that have "similar" functionality.

Once these changes are made we need to compare results from older parameter space to the new one.
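Normalizing and grouping could work by storing every searchable value as a number in [0, 1] and mapping it to each algorithm's native range only when the algorithm runs. The sketch below shows linear and log mappings; the pairing of Felzenszwalb scale with QuickShift kernel size and the ranges used are placeholders for illustration, not validated choices.

```python
import math

def scale_linear(u, low, high):
    """Map a normalized value u in [0, 1] onto [low, high]."""
    return low + u * (high - low)

def scale_log(u, low, high):
    """Map u in [0, 1] onto [low, high] on a log scale (requires 0 < low < high)."""
    return math.exp(math.log(low) + u * (math.log(high) - math.log(low)))

# Hypothetical grouping: one normalized "scale" gene drives the size-like
# parameter of two algorithms whose native ranges differ. The ranges are
# placeholders.
scale_gene = 0.5
felzenszwalb_scale = scale_log(scale_gene, 1, 10000)
quickshift_kernel_size = scale_linear(scale_gene, 1, 20)
```

A log mapping suits parameters that matter across orders of magnitude, so that equal steps in the gene give comparable relative changes.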

Get parameter file I/O working again

With the new grammar the parameter space file I/O is broken-ish. This needs to be fixed so that transfer learning (and other research projects) will work.
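If workflow parameters are kept as a name-to-value dictionary (as in the workflow printouts on this page), plain JSON is one simple, human-readable file format. This is a sketch of that assumption, not the project's existing I/O code; note that JSON turns tuples such as (1, 1) into lists on reload.

```python
import json
from pathlib import Path

def save_params(params, path):
    """Write a parameter dictionary to a human-readable JSON file."""
    Path(path).write_text(json.dumps(params, indent=2, sort_keys=True))

def load_params(path):
    """Read back a parameter dictionary written by save_params."""
    return json.loads(Path(path).read_text())

# A subset of the values from the QuickShift workflow printout on this page
example = {
    "colorspace": "RGB", "multichannel": True, "channel": 0,
    "algorithm": "QuickShift", "alpha1": 0.69140625, "alpha2": 0.3828125,
    "n_segments": 4, "max_num_iter": 11,
}
```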

Quickshift is not so quick

Although it is sometimes fine, quickshift is often not so quick. My guess is that one of the hyperparameter settings really slows things down. We need someone to dig into the algorithm and figure out which parameter is causing the problem. We can tighten the parameter space to avoid the slow values; we just need to know which parameters to tighten. We may also need to do a parameter sweep on all of the algorithms to make sure there are no others with similar problems.

Fix print_best_algorithm_code function to work with all segmentors

While giving a demo, I noticed that some of the segmentors do not work when we run print_best_algorithm_code. I do not have an example. We should unit test all of the segmentors and find and fix the ones that do not work.

Avoid Search Repeats

As the code is written, there is a non-zero chance that the same algorithm will be evaluated multiple times. We should try to measure how often this repeated work happens and decide whether we need to eliminate it. I have two basic ideas:

Make a parameter-space hash table (dictionary) and store everything we have tested. If there is a repeat, just return the stored fitness value. Pro: this should be easy to implement. Con: it may take up a lot of memory.
Calculate a unique number for each algorithm (officially the search space is finite, so this should be possible), then use that value in a lookup table. Pro: this should take up much less memory. Con: the calculation, although conceptually trivial, may be hard to get right and may add non-trivial time to the calculations.

I would probably start with option 1 and then, if we determine that repeats happen often, we may need to do a timing study for option 2.
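Both ideas can be sketched briefly. The cache wrapper is option 1 and also counts hits, which gives the repeat frequency we want to measure. The mixed-radix encoding is one way to do option 2, assuming every parameter has been discretized into a finite number of options. Both function names are hypothetical.

```python
def make_cached_fitness(evaluate):
    """Option 1: wrap evaluate(params) so repeated parameter sets are not re-run.

    `params` is assumed to be a dict of hashable values. The hit/miss counts
    measure how often repeats actually happen.
    """
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def cached(params):
        key = tuple(sorted(params.items()))
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = evaluate(params)
        return cache[key]

    cached.stats = stats
    return cached

def algorithm_index(choices, option_counts):
    """Option 2: encode discrete choices as one integer (mixed-radix number).

    choices[i] must lie in range(option_counts[i]); distinct choice vectors
    always map to distinct integers.
    """
    index = 0
    for choice, count in zip(choices, option_counts):
        if not 0 <= choice < count:
            raise ValueError(f"choice {choice} out of range for {count} options")
        index = index * count + choice
    return index
```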

Develop an algorithm distance function

Develop a function that takes two algorithm vectors and returns a "similarity" value.

We could also look at solutions that are good but very different than other good solutions.
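One simple choice is a Gower-style distance: categorical parameters (such as the algorithm name) contribute 0 or 1, and numeric parameters contribute their absolute difference divided by the parameter's range. This is a sketch assuming parameter dictionaries with identical keys; similarity is then 1 minus the distance.

```python
def algorithm_distance(a, b, numeric_ranges):
    """Gower-style distance between two algorithm parameter dicts.

    Parameters listed in `numeric_ranges` (name -> (low, high)) contribute
    |a - b| / (high - low); all others are treated as categorical and
    contribute 0 if equal and 1 otherwise. The mean over parameters lies in
    [0, 1]; a similarity score is 1 - distance.
    """
    if a.keys() != b.keys():
        raise ValueError("both algorithms must have the same parameters")
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            low, high = numeric_ranges[name]
            total += abs(a[name] - b[name]) / (high - low)
        else:
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)
```

To find good solutions that are unlike other good ones, one could rank the top individuals by their minimum distance to the rest of the top set.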

Add Algorithm Dependent Local Search

We should add a new function to the segmentor class (maybe call it local_offspring) that returns a population of offspring by modifying only the parameters associated with the specific segmentor. We can then use this function at each iteration to make it more likely that the population will explore variations on the current best individuals.

As it is written now the search is entirely random so we get the same basic solution over many generations (even when it is mutating).
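A local_offspring function could look roughly like this. It assumes parameters are stored in a dict with numeric values normalized to [0, 1], and that a hypothetical algorithm_params table lists which parameter names each algorithm uses; the Gaussian step size is an arbitrary placeholder.

```python
import random

def local_offspring(best, algorithm_params, n, sigma=0.05, rng=random):
    """Return n variations of `best` that only perturb its own algorithm's parameters.

    `best` is a dict of parameters with numeric values normalized to [0, 1];
    `algorithm_params` maps an algorithm name to the parameter names it uses.
    Both structures are assumptions for this sketch.
    """
    relevant = algorithm_params[best["algorithm"]]
    children = []
    for _ in range(n):
        child = dict(best)
        for name in relevant:
            # Small Gaussian step, clipped back into [0, 1]
            child[name] = min(1.0, max(0.0, child[name] + rng.gauss(0.0, sigma)))
        children.append(child)
    return children
```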

Allow for 4-channel images

The Headphones.jpeg file is a 4-channel image, which causes problems with a few algorithms; specifically, it raises the ValueError shown at the end of this message.

We need to update all algorithms to handle 4-channel images.

<class 'see.Workflow.workflow'> parameters:
colorspace = RGB
multichannel = True
channel = 0
algorithm = QuickShift
alpha1 = 0.69140625
alpha2 = 0.3828125
beta1 = 0.68359375
beta2 = 0.80078125
gamma1 = 0.78125
gamma2 = 0.59375
n_segments = 4
max_num_iter = 11

Traceback (most recent call last):
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/./see-segment/see/RunSearch.py", line 144, in <module>
    geneticsearch_commandline()
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/./see-segment/see/RunSearch.py", line 141, in geneticsearch_commandline
    continuous_search(args.input_file, args.input_mask,pop_size=args.pop_size,num_iter=args.num_iter);
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/./see-segment/see/RunSearch.py", line 99, in continuous_search
    population = my_evolver.run(ngen=1,population=population)
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/GeneticSearch.py", line 472, in run
    _, population = self.popfitness(population)
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/GeneticSearch.py", line 324, in popfitness
    for ind, data in zip(tpop, outdata):
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/base_classes.py", line 162, in runAlgo
    data = self.pipe(data)
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Workflow.py", line 43, in pipe
    data = algo.pipe(data)
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 543, in pipe
    data.fitness = self.evaluate(data.mask, data.gmask)[0]
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 539, in evaluate
    return FitnessFunction(mask, gmask)
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 510, in FitnessFunction
    return FF_ML2DHD_V2(inferred, ground_truth)
  File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 469, in FF_ML2DHD_V2
    ground_truth = color.rgb2gray(ground_truth) # comment out
  File "/miniconda3/lib/python3.9/site-packages/skimage/_shared/utils.py", line 394, in fixed_func
    return func(*args, **kwargs)
  File "/miniconda3/lib/python3.9/site-packages/skimage/color/colorconv.py", line 875, in rgb2gray
    rgb = _prepare_colorarray(rgb)
  File "/miniconda3/lib/python3.9/site-packages/skimage/color/colorconv.py", line 140, in _prepare_colorarray
    raise ValueError(msg)
ValueError: the input array must have size 3 along channel_axis, got (1920, 1080, 4)
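The traceback shows the failure inside FF_ML2DHD_V2, where color.rgb2gray receives a (1920, 1080, 4) ground truth. One way to make the fitness step tolerant of extra channels is to reduce the ground truth to a 2D foreground mask first. This sketch assumes any non-zero color channel marks foreground and ignores alpha; the helper name is hypothetical. For input images (rather than masks), skimage.color.rgba2rgb is an existing alternative that blends the alpha channel onto a background.

```python
import numpy as np

def ground_truth_to_mask(gt):
    """Reduce a ground-truth image to a 2D boolean foreground mask.

    Handles (H, W), (H, W, 1), RGB (H, W, 3), and RGBA (H, W, 4) inputs.
    Assumes any non-zero color channel marks foreground; the alpha channel
    of RGBA images is ignored.
    """
    gt = np.asarray(gt)
    if gt.ndim == 2:
        return gt > 0
    if gt.ndim == 3 and gt.shape[-1] in (1, 3, 4):
        color = gt[..., :3] if gt.shape[-1] == 4 else gt
        return np.any(color > 0, axis=-1)
    raise ValueError(f"Unsupported ground-truth shape: {gt.shape}")
```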

New pig Ground Truth examples not same size as RGB images.

Not sure how we missed this; maybe an off-by-one error. Someone needs to check the image sizes in Python and fix them.

Bus Error when running some Felzenszwalb segmentation

Sometimes the Felzenszwalb function crashes with a bus error. My guess is that one of the input parameters is causing the problem. It could be in the selected colorspace or in the Felzenszwalb function itself. Here is an example of a set of parameters that fails:

<class 'see.Workflow.workflow'> parameters:
colorspace = RGB CIE
multichannel = True
channel = 1
algorithm = Felzenszwalb
alpha1 = 0.98828125
alpha2 = 0.8984375
beta1 = 0.94140625
beta2 = 0.7578125
gamma1 = 0.41796875
gamma2 = 0.1875
n_segments = 7
max_iter = 1

We need to track down the source of this failure and see whether we can come up with a solution: 1) limit the parameter space to only use values that do not result in errors, 2) find and fix the bug in the segmentation code (and report it as a pull request), 3) use input tests to pick out inputs that will cause the problem and return an error or a default value, or 4) try to catch this inside the code.

The first step is to build a test that can reproduce the problem and then see what can be changed in the test to fix the bug.
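A bus error kills the Python interpreter with the SIGBUS signal, so it cannot be caught with try/except inside the search. A reproducing test can instead run the failing call in a child process and check how that process died. The helper below is a POSIX-only sketch; the code string passed to it would be a short script that builds the workflow with the parameters above and runs it on the test image, which is not shown here.

```python
import signal
import subprocess
import sys

def crashes_with_signal(code, sig=signal.SIGBUS, timeout=600):
    """Run `code` in a fresh Python process; return True if it died from `sig`.

    POSIX only: a negative return code -N means the child was killed by
    signal N, which lets a test detect the crash without crashing itself.
    """
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, timeout=timeout)
    return proc.returncode == -sig
```

Once the crash reproduces reliably, the parameters in the code string can be varied one at a time to see which change makes the test pass.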

Clean up master repository and submit to PyPI.

We need to go through and clean up the master repository. I want all developers to move to branches so we can submit the master branch to PyPI and then submit the paper to JOSS.

Explore Storing Algorithm Params inside Exif image data for each mask.

We can try using a setExif function to add metadata to the output image. For example (this snippet uses a jpeg module rather than PIL):

import jpeg
jpeg.setExif(jpeg.getExif('foo.jpg'), 'foo-resized.jpg')

Investigate swapping out our search algorithm with the TPOT core.

https://github.com/EpistasisLab/tpot

Parameter Sweep / Robustness Checking

I would like to make a new module that takes a param as an input variable and conducts a parameter sweep for a particular algorithm with local values. The goal of this is to generate surface plots showing how changing parameters change the fitness values. This application could be parallelized as well.
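The core of such a module is a grid evaluation around a known parameter set. In this sketch, local_values picks points around a normalized parameter value and parameter_sweep builds a fitness surface over two parameters; fitness is a stand-in for running one segmentation and scoring it, and both helper names are hypothetical. Every grid point is independent, so the inner loop is the natural place to parallelize.

```python
def local_values(center, width=0.1, n=5):
    """n evenly spaced values around `center`, clipped to the normalized range [0, 1]."""
    step = 2 * width / (n - 1)
    return [min(1.0, max(0.0, center - width + i * step)) for i in range(n)]

def parameter_sweep(fitness, base_params, name_x, values_x, name_y, values_y):
    """Evaluate fitness over a 2D grid, holding all other parameters fixed.

    Returns a list of rows (one per value of name_y, one column per value of
    name_x), ready for a surface or contour plot. `fitness` is any callable
    taking a parameter dict; here it stands in for running one segmentation.
    """
    surface = []
    for y in values_y:
        row = []
        for x in values_x:
            params = {**base_params, name_x: x, name_y: y}
            row.append(fitness(params))
        surface.append(row)
    return surface
```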

Cleanup and fix parameter names in search space.

The search space and parameter names have lost their meaning. We need to go through and use more generic names, and maybe update the search space to use fewer parameters (pair up more parameters).

Command Line argument parser

As part of the DataDownload module, we should have a common argument parser. The same basic code is in DataDownload.py and BatchRun.py. Let's keep it all in one place.
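argparse supports this pattern directly through the parents argument: one function defines the shared arguments, and each script builds its own parser on top. The argument names below are borrowed from the args.input_file, args.input_mask, args.pop_size, and args.num_iter attributes in the RunSearch.py traceback; whether they are positional, their defaults, and the --outdir option are assumptions for illustration.

```python
import argparse

def common_parser():
    """Arguments shared across the command-line scripts (assumed set)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("input_file", help="input image")
    parser.add_argument("input_mask", help="ground-truth mask")
    parser.add_argument("--pop_size", type=int, default=10)
    parser.add_argument("--num_iter", type=int, default=10)
    return parser

def batchrun_parser():
    """Example script parser that reuses the shared arguments via `parents`."""
    parser = argparse.ArgumentParser(description="Batch run", parents=[common_parser()])
    parser.add_argument("--outdir", default=".")
    return parser
```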

Build New Manager/Worker parallel message system

Make things go faster on HPC or cloud systems.
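A single-node version of the manager/worker pattern can be sketched with concurrent.futures: the manager submits one task per individual and collects results as workers finish, which also allows recording output after each individual. This is a sketch, not the project's design; for multi-node HPC runs, mpi4py's MPIPoolExecutor offers the same executor interface.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def evaluate_population(evaluate, population, on_result=None,
                        max_workers=None, executor_cls=ProcessPoolExecutor):
    """Manager side: farm individuals out to workers and collect fitness values.

    `evaluate` must be a picklable top-level function when processes are used.
    `on_result(index, fitness)` is called as soon as each individual finishes,
    so results can be recorded incrementally.
    """
    fitnesses = [None] * len(population)
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(evaluate, ind): i for i, ind in enumerate(population)}
        for future in as_completed(futures):
            i = futures[future]
            fitnesses[i] = future.result()
            if on_result is not None:
                on_result(i, fitnesses[i])
    return fitnesses
```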

Try out OSG and conda-pack

http://chtc.cs.wisc.edu/conda-installation.shtml

Test to see if any of the algorithms in the search space can take advantage of GPUs

I am not sure if any can but this may be a way to speed things up.

Add automatic thresholding to color based segmentation option.

The six-parameter threshold is a little much. Maybe we can use Otsu's method and a few other algorithms to narrow down the search and provide a more robust algorithm. I'm thinking of single-channel learning with one threshold determined by a variety of thresholding algorithms (scikit-image provides several in skimage.filters).
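To show what a single automatic threshold replaces, here is a NumPy sketch of Otsu's method: it picks the histogram split that maximizes the between-class variance. In practice skimage.filters.threshold_otsu (and its siblings such as threshold_li and threshold_yen) would be used; this standalone version only illustrates the technique.

```python
import numpy as np

def otsu_threshold(channel, nbins=256):
    """Otsu's threshold for one image channel (any shape, numeric values)."""
    values = np.asarray(channel, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    # Pixel counts below/at and at/above each bin
    w0 = np.cumsum(hist)
    w1 = np.cumsum(hist[::-1])[::-1]
    # Class means on each side of every candidate split
    m0 = np.cumsum(hist * centers) / np.maximum(w0, 1)
    m1 = (np.cumsum((hist * centers)[::-1]) / np.maximum(w1[::-1], 1))[::-1]
    # Between-class variance for a split between bin i and bin i + 1
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(between)]
```

The mask is then channel > otsu_threshold(channel), so the only searchable choices left are the channel and the thresholding method.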

The DEAP Library is included but not used

Last summer (2019) we had the DEAP library working, but we stripped it out during the overhaul over the Fall of 2019 and Spring of 2020. We either need to remove all references to DEAP or (better yet) get it working again in the new format.

Add timers to all searches (return the best so far when the timer is exceeded)
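A time-budgeted search loop might look like the sketch below. It assumes a hypothetical step(population) callable that runs one generation and returns (new_population, candidate, candidate_fitness), with lower fitness being better. The budget is checked between generations, so one slow generation can overrun it.

```python
import time

def search_with_time_limit(step, population, time_limit_s, max_generations=None,
                           clock=time.monotonic):
    """Run generations until the time budget (or generation cap) is used up.

    Returns (best, best_fitness): the best candidate seen so far, or
    (None, inf) if no generation finished within the budget.
    """
    deadline = clock() + time_limit_s
    best, best_fitness = None, float("inf")
    generation = 0
    while clock() < deadline:
        if max_generations is not None and generation >= max_generations:
            break
        population, candidate, fitness = step(population)
        if fitness < best_fitness:
            best, best_fitness = candidate, fitness
        generation += 1
    return best, best_fitness
```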

Multi-Image learning

We need to start thinking/researching ways to combine multiple images in the fitness function.

Fix algorithm specific mutation

Currently, mutation is weighted evenly across all the parameters. We need to fix this so that it only changes parameters relevant to the individual's algorithm.

Verify Population generation is not redundant.

While writing the paper draft, I realized that lines 313-321 of GeneticSearch.py may produce redundant results. I think we may be evaluating the same individuals repeatedly and wasting 10% of our compute time. This code needs reviewing to ensure that only new algorithms are used (at least compared to the previous iteration) and that we still store the best so far in the hall of fame variable (HOF).
