see-segment's People

Contributors

bhardw41, chenqili2020, colbrydi, dalpm, emanihunter, emmaline11235, genster6, grabilln, hoolagans, hurleyc6, kai-pinckard, katiereagan, lindavin, nabri50, padmapraba

see-segment's Issues

Add image reference page for examples included in project.

Typically I have individuals joining the team create an example input and ground truth image as their first pull request project. I would like a markdown file in the examples folder with a description of each team member, the pictures they included and a brief description of why it was chosen.

Population Animation

I would like to create a function (Maybe part of JupyterGUI) which takes a population and image as an input and "animates" the segmentation solutions from lowest fitness to highest. I think this may help visualize the results.

Create group website

Right now we are using pdoc3 to generate a github.io site. I also want to add a landing page for the team and link to the software documentation.

Try to fix binary trivial solution

In the binary case there is a trivial solution that is not trivial. This can be reproduced with the following example:

%matplotlib inline
import matplotlib.pylab as plt
import imageio
from skimage import color

from see import Segmentors

img = imageio.imread('Image_data/CO-SKEL_v1.1/images/frog/0007.jpg')
gmask = imageio.imread('Image_data/CO-SKEL_v1.1/GT_masks/frog/0007.png')

params = ['CT', 7563, 0.13, 2060, 0.01, 4342, 850, 10, 0.57, 1863, 1543, 1, 3, 1, 0.35, (1, 1), 8.1, 'checkerboard', 'checkerboard', 3, 7625, -35, 0.0, 0.0, 0.0]

seg = Segmentors.algoFromParams(params)
mask = seg.evaluate(img)
plt.imshow(mask)


fitness = Segmentors.FitnessFunction(mask,gmask)
print(f"{fitness[0]}")

This is occurring because both regions are mapping to the background. Although not a good solution it should not be discounted as it may lead to a better solution.

We need to find an update to the fitness function that takes this case into consideration.

Fix Single Channel Images

As written, single-channel (grayscale) images will not work because some of the functions are hard coded to accept 3D arrays. Converting single-channel images to 3-channel images is one option. Maybe we can think of something that is more flexible.

Fix search if there is no good solution

Currently the top solution gets 50% of the next generation. We need a quick if statement to check if the best is above a threshold. If it is then randomly select the entire next generation.
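A minimal sketch of that check, not the project's actual code. It assumes lower fitness is better, that the population is a list of (fitness, individual) pairs, and that `make_random` and `make_offspring` are hypothetical callables supplied by the search.

```python
import random

def next_generation(scored, make_random, make_offspring, threshold=0.9, rng=random):
    """Build the next generation from `scored`, a list of (fitness, individual).

    Assumption: lower fitness is better, so a best fitness above `threshold`
    means no individual in this generation is worth keeping.
    """
    size = len(scored)
    best_fitness, best = min(scored, key=lambda pair: pair[0])
    if best_fitness > threshold:
        # Nothing good enough: draw the entire next generation at random.
        return [make_random(rng) for _ in range(size)]
    # Otherwise keep the current policy: half the slots derive from the best.
    half = size // 2
    children = [make_offspring(best, rng) for _ in range(half)]
    return children + [make_random(rng) for _ in range(size - half)]
```

The threshold value of 0.9 is a placeholder; a sensible value depends on the fitness function in use.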

Alternative fitness functions

Different fitness functions have different types of bias. It would be nice/helpful to include a few different ones in the search and allow selection of the best fitness function for a problem based on researcher input. Sort of meta-meta level learning.

Add New grammar instances pre-processing

Add a new grammar instance to pre-process the image after color selection. Things like gaussian blurring, sharpening, etc. This is a searchable parameter space.

Selection Operator

SEE-Segment doesn't appear to be using a selection operator. It seems to just select the HoF and apply mutation and crossover to those. It could help improve the diversity maintenance to use a selection operator such as tournament selection which would allow individuals other than just the top few to continue to contribute their genetic information to the population. I would also think that the ordering of the models should also be shuffled prior to applying crossover and mutation so that HoF[0] isn't always paired with HoF[1], etc. It should be possible that a model with high fitness be paired with a model of weak fitness.
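A rough sketch of what tournament selection plus shuffled pairing could look like. It is written against plain lists and is not tied to the actual GeneticSearch code; lower fitness winning is an assumption, and both function names are made up for illustration.

```python
import random

def tournament_select(population, fitnesses, k, tournament_size=3, rng=random):
    """Pick k parents; each is the best of `tournament_size` random entrants.

    Assumption: lower fitness wins.
    """
    selected = []
    indices = range(len(population))
    entrants_per_round = min(tournament_size, len(population))
    for _ in range(k):
        entrants = rng.sample(indices, entrants_per_round)
        winner = min(entrants, key=lambda i: fitnesses[i])
        selected.append(population[winner])
    return selected

def shuffled_pairs(parents, rng=random):
    """Shuffle before pairing so the same neighbors are not always crossed."""
    order = list(parents)
    rng.shuffle(order)
    return list(zip(order[::2], order[1::2]))
```

A small tournament size keeps selection pressure low, so weaker individuals still win some rounds and can be paired with strong ones.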

Population Distribution Animation

I would like to create an animation that takes a sequence of populations, generates a histogram at each step in the sequence and animates how this histogram changes. It would be nice to also highlight the bar (algorithm) class with the current best fitness function.
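The data side of this could be sketched as below, assuming each individual can be reduced to an (algorithm name, fitness) pair and that lower fitness is better. The function name and the frame layout are illustrative only; the plotting itself is left out.

```python
from collections import Counter

def algorithm_histograms(generations):
    """Turn a sequence of populations into one histogram frame per generation.

    generations: list of populations; each individual is (algorithm_name, fitness).
    Each frame records how many individuals use each algorithm and which
    algorithm currently holds the best (lowest, by assumption) fitness.
    """
    frames = []
    for population in generations:
        counts = Counter(name for name, _ in population)
        best_name, _ = min(population, key=lambda ind: ind[1])
        frames.append({"counts": dict(counts), "best": best_name})
    return frames
```

Each frame could then be drawn as a bar chart, with the bar named in "best" given a different color.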

SEE-learn workflow - Machine Learning Workflow

I would like to generalize the GeneticSearch library to work with workflows beyond segmentation. I think an easy option would be to make a machine learning module by leveraging scikit-learn, similar to how we leveraged scikit-image.

Refactor GeneticSearch to maybe get rid of Deap

The current GeneticSearch is getting sloppy. We need to refactor the code. Some things to consider:

  • Maybe get rid of deap dependency. Maybe we can still use deap but not for the core search. Our solution is so slow that different types of parallelization may be useful.
  • Record output after each individual is processed. Since this is so slow anyway it should not add too much to the search.
  • Give us more control over the population.

Lots of other stuff. (see todo and other issues).
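For the "record output after each individual" item, one possible approach is an append-only JSON Lines log. This is a sketch with hypothetical function names and a made-up record layout, not what GeneticSearch currently does.

```python
import json

def record_individual(log_path, generation, params, fitness):
    """Append one evaluated individual to a JSON Lines log file."""
    entry = {"generation": generation, "params": list(params), "fitness": fitness}
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")

def read_log(log_path):
    """Load every recorded individual back as a list of dictionaries."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each line is written as soon as an individual finishes, a crashed or cancelled search still leaves a usable record. Note that JSON stores tuples as lists, so a parameter such as (1, 1) comes back as [1, 1].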

Manual Image Segmentation GUI

I would like to build a prototype manual image segmentation GUI that can "paint" an image and generate a segmented ground truth. Conceptually this GUI will be independent from SEE-segment and just call SEE-segment when we are done. However, it would also be nice to include a GUI with see-segment so a user doesn't have to use GIMP or some other tool. The long-term plan is to build a robust GUI.

Generating new population is slow

While running tests we noticed that the evolver is slow at some point after the mutate step and before the next iteration starts. We should check to see why.

Move print_best_algorithm_code function to segmentors

For some reason the print_best_algorithm_code function is in the genetic search and not in segmentor. For object-oriented reasons I think it makes more sense to put this in segmentor, since it is related to the segmentor and has nothing to do with the search itself.

Systematic Refinement of Parameter Space

We are still using the same parameter space from the research conducted in Summer of 2019. This was always intended to be a temporary naive solution. Katrina made some effort to improve the space, but bugs in other code prevented us from making progress. I would like to modify the parameter space as follows:

  1. When possible normalize the parameters to specific ranges so they can generalize.
  2. Reuse as many parameters as possible to try and reduce the search space.
  3. When possible group parameters across segmentation algorithms that have "similar" functionality.

Once these changes are made we need to compare results from older parameter space to the new one.
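One way to picture items 1 and 2 together: the search only sees values in [0, 1], and each algorithm maps a shared gene onto its own range. The ranges and the `RANGES` table below are invented for illustration and are not the project's real parameter space.

```python
def denormalize(unit_value, low, high, as_int=False):
    """Map a search-space value in [0, 1] onto an algorithm-specific range."""
    if not 0.0 <= unit_value <= 1.0:
        raise ValueError("unit_value must lie in [0, 1]")
    value = low + unit_value * (high - low)
    return int(round(value)) if as_int else value

# Hypothetical ranges: two algorithms reuse the same normalized "alpha1" gene.
RANGES = {
    "Felzenszwalb": {"alpha1": (0.0, 1000.0, False)},  # a scale-like parameter
    "QuickShift": {"alpha1": (1.0, 20.0, True)},       # a kernel-size-like parameter
}

def resolve(algorithm, name, unit_value):
    """Look up the range for (algorithm, name) and convert the normalized value."""
    low, high, as_int = RANGES[algorithm][name]
    return denormalize(unit_value, low, high, as_int)
```

With this layout, tightening a range only means editing one table entry; the search vector itself does not change.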

Get parameter file I/O working again

With the new grammar the parameter space file I/O is broken-ish. This needs to be fixed so that transfer learning (and other research projects) will work.

Quickshift is not so quick

Although sometimes it is fine, quickshift is not always so quick. My guess is that one of the hyperparameter settings really slows things down. We need someone to dig into the algorithm and see if they can figure out which parameter is causing the problem. We can tighten the parameter space to avoid the slow values; we just need to know which parameters to tighten. We may also need to do a parameter sweep on all of the algorithms to make sure there are not others with similar problems.

Avoid Search Repeats

As the code is written there is a non-zero chance that the same algorithm will be evaluated multiple times. We should try to measure the frequency of this repeat work and decide if we need to eliminate it. I have two basic ideas:

  1. Make a parameter-space hash table (dictionary) and store everything we have tested. If there is a repeat, just return the fitness value. Pro - this should be easy to implement. Con - this may take up a lot of memory.

  2. Calculate a unique number for each algorithm (Officially the search space is finite so this should be possible). Then use that value in a lookup table. Pro - This should take up much less memory. Con - the calculation, although conceptually trivial, may be hard to get right and may add a non-trivial time to calculations.

I probably would start with 1 and then, if we determine that repeats are happening often we may need to do a timing study for 2.
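Both ideas can be sketched in a few lines. The wrapper below covers idea 1 and also counts hits, which gives the repeat frequency for free; `algorithm_index` illustrates idea 2 as a mixed-radix encoding, assuming every parameter can be expressed as an index into a finite list of choices. The names are hypothetical.

```python
def make_cached_fitness(evaluate):
    """Idea 1: wrap `evaluate(params) -> fitness` with a dictionary cache."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def cached(params):
        # Lists are unhashable, so key on a tuple; nested values must be hashable too.
        key = tuple(params)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = evaluate(params)
        return cache[key]

    return cached, stats

def algorithm_index(choices, sizes):
    """Idea 2: encode per-parameter choice indices as one unique integer.

    choices[i] is the index chosen for parameter i, out of sizes[i] options.
    """
    index = 0
    for choice, size in zip(choices, sizes):
        if not 0 <= choice < size:
            raise ValueError("choice out of range")
        index = index * size + choice
    return index
```

After a run, hits divided by (hits + misses) is the fraction of evaluations that were repeats, which is the number needed to decide whether idea 2 is worth a timing study.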

Develop an algorithm distance function

Develop a function that takes two algorithm vectors and returns a "similarity" value.

We could also look at solutions that are good but very different than other good solutions.
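As a starting point, here is a sketch of one possible distance, in the spirit of Gower's distance for mixed data: categorical entries contribute 0 or 1, numeric entries contribute their range-normalized difference. The `ranges` argument is an assumption; the real parameter space would have to supply it.

```python
def algorithm_distance(a, b, ranges):
    """Mean per-parameter distance in [0, 1] between two algorithm vectors.

    ranges[i] is (low, high) for a numeric parameter, or None for a
    categorical one (such as the algorithm name).
    """
    if not (len(a) == len(b) == len(ranges)):
        raise ValueError("vectors and ranges must have the same length")
    total = 0.0
    for x, y, bounds in zip(a, b, ranges):
        if bounds is None:
            total += 0.0 if x == y else 1.0
        else:
            low, high = bounds
            total += abs(x - y) / (high - low)
    return total / len(a)
```

A "similarity" value is then simply 1 minus the distance. This treats every parameter equally, including ones the selected algorithm never reads, so a refined version would probably weight or mask parameters per algorithm.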

Add Algorithm Dependent Local Search

We should add a new function to the segmentor class that returns a population of offspring (maybe call it local_offspring) that only modifies the parameters associated with the specific segmentor. We can then use this function at each iteration to make it more likely that the population will make variations on the current best individuals.

As it is written now the search is entirely random so we get the same basic solution over many generations (even when it is mutating).
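A sketch of what local_offspring might do, written as a free function for brevity. It assumes parameters are normalized to [0, 1], that position 0 holds the algorithm name, and that a table like `RELEVANT` (invented here) says which positions each algorithm actually reads.

```python
import random

# Hypothetical map from algorithm name to the vector positions it actually reads.
RELEVANT = {"Felzenszwalb": [1, 2], "QuickShift": [1, 3, 4]}

def local_offspring(params, count=5, step=0.05, rng=random):
    """Return `count` children, each a copy of `params` with one relevant
    parameter nudged by at most `step`."""
    positions = RELEVANT[params[0]]
    children = []
    for _ in range(count):
        child = list(params)
        i = rng.choice(positions)
        nudged = child[i] + rng.uniform(-step, step)
        child[i] = min(1.0, max(0.0, nudged))  # stay inside the normalized range
        children.append(child)
    return children
```

Because only parameters the algorithm uses are touched, every child is a genuinely different configuration rather than a mutation that has no effect on the output.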

Allow for 4-channel images

The Headphones.jpeg file is a 4 channel image. This causes problems with a few algorithms. Specifically a value error shown at the end of this message.

We need to update all algorithms to account for 4 channel images.

<class 'see.Workflow.workflow'> parameters:
colorspace = RGB
multichannel = True
channel = 0
algorithm = QuickShift
alpha1 = 0.69140625
alpha2 = 0.3828125
beta1 = 0.68359375
beta2 = 0.80078125
gamma1 = 0.78125
gamma2 = 0.59375
n_segments = 4
max_num_iter = 11

Traceback (most recent call last):
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/./see-segment/see/RunSearch.py", line 144, in <module>
geneticsearch_commandline()
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/./see-segment/see/RunSearch.py", line 141, in geneticsearch_commandline
continuous_search(args.input_file, args.input_mask,pop_size=args.pop_size,num_iter=args.num_iter);
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/./see-segment/see/RunSearch.py", line 99, in continuous_search
population = my_evolver.run(ngen=1,population=population)
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/GeneticSearch.py", line 472, in run
_, population = self.popfitness(population)
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/GeneticSearch.py", line 324, in popfitness
for ind, data in zip(tpop, outdata):
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/base_classes.py", line 162, in runAlgo
data = self.pipe(data)
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Workflow.py", line 43, in pipe
data = algo.pipe(data)
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 543, in pipe
data.fitness = self.evaluate(data.mask, data.gmask)[0]
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 539, in evaluate
return FitnessFunction(mask, gmask)
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 510, in FitnessFunction
return FF_ML2DHD_V2(inferred, ground_truth)
File "/mnt/home/colbrydi/UserCode/colbrydi/see-benchmark/see-segment/see/Segment_Fitness.py", line 469, in FF_ML2DHD_V2
ground_truth = color.rgb2gray(ground_truth) # comment out
File "/miniconda3/lib/python3.9/site-packages/skimage/_shared/utils.py", line 394, in fixed_func
return func(*args, **kwargs)
File "/miniconda3/lib/python3.9/site-packages/skimage/color/colorconv.py", line 875, in rgb2gray
rgb = _prepare_colorarray(rgb)
File "/miniconda3/lib/python3.9/site-packages/skimage/color/colorconv.py", line 140, in _prepare_colorarray
raise ValueError(msg)
ValueError: the input array must have size 3 along channel_axis, got (1920, 1080, 4)

Bus Error when running some Felzenszwalb segmentation

Sometimes the Felzenszwalb function crashes with a bus error. My guess is that one of the input parameters is causing a problem. It could be in the selected colorspace or the Felzenszwalb function itself. Here is an example of a set of parameters that fails.

<class 'see.Workflow.workflow'> parameters:
colorspace = RGB CIE
multichannel = True
channel = 1
algorithm = Felzenszwalb
alpha1 = 0.98828125
alpha2 = 0.8984375
beta1 = 0.94140625
beta2 = 0.7578125
gamma1 = 0.41796875
gamma2 = 0.1875
n_segments = 7
max_iter = 1

We need to track down the source of this failure and see if we can come up with a solution to either 1) limit the parameter space to only use values that do not result in errors, 2) find and fix the bug in the segmentation code (report as a pull request), 3) use input tests to pick out input that will cause the problem and return an error or a default value, or 4) try to catch this inside the code.

The first step is to build a test that can reproduce the problem and then see what can be changed in the test to fix the bug.
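Since a bus error kills the interpreter rather than raising an exception, a reproducing test probably has to run the failing parameters in a child process. A minimal harness, assuming nothing about see-segment itself (the snippet passed in would be whatever code triggers the crash):

```python
import subprocess
import sys

def run_isolated(snippet, timeout=60):
    """Run Python source in a child interpreter and return (ok, returncode).

    On POSIX a negative returncode means the child was killed by that signal
    (a bus error, for example); the parent process survives either way.
    A timeout is reported as (False, None).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return result.returncode == 0, result.returncode
```

With this in place, the test can vary one parameter at a time and record which settings make `ok` flip from False to True.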

Parameter Sweep / Robustness Checking

I would like to make a new module that takes a param as an input variable and conducts a parameter sweep for a particular algorithm with local values. The goal of this is to generate surface plots showing how changing parameters change the fitness values. This application could be parallelized as well.
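The core loop of such a module could look like the sketch below. `evaluate` stands in for whatever runs the segmentation and returns a fitness value; `sweep` and the function name are illustrative only.

```python
import itertools

def local_sweep(params, sweep, evaluate):
    """Evaluate every combination of the swept values around `params`.

    sweep maps a position in `params` to the list of values to try there.
    Returns a list of (values_tried, fitness), one entry per grid point,
    with values_tried ordered by ascending position.
    """
    positions = sorted(sweep)
    results = []
    for combo in itertools.product(*(sweep[p] for p in positions)):
        candidate = list(params)
        for position, value in zip(positions, combo):
            candidate[position] = value
        results.append((combo, evaluate(candidate)))
    return results
```

Sweeping two positions gives exactly the grid needed for a surface plot. Each grid point is independent of the others, so the loop body is a natural unit to hand to a process pool.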

Cleanup and fix parameter names in search space.

The search space and parameter names have lost their meanings. We need to go through and use more generic names and maybe update to use fewer parameters in the search space (pair up more parameters).

Command Line argument parser

As part of the DataDownload module we should have a common argument parser. The same basic code is in DataDownload.py and BatchRun.py. Let's keep it all in one place.
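argparse supports this directly through parent parsers. In the sketch below the flag spellings, defaults, and `--num-trials` are assumptions; only the attribute names input_file, input_mask, pop_size, and num_iter appear elsewhere in this project.

```python
import argparse

def common_parser():
    """Options shared by every script. add_help=False lets it act as a parent."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--input-file", help="path to the input image")
    parser.add_argument("--input-mask", help="path to the ground-truth mask")
    parser.add_argument("--pop-size", type=int, default=10)
    parser.add_argument("--num-iter", type=int, default=5)
    return parser

def batch_parser():
    """A script-specific parser that inherits the shared options."""
    parser = argparse.ArgumentParser(description="batch run", parents=[common_parser()])
    parser.add_argument("--num-trials", type=int, default=1)  # hypothetical extra flag
    return parser
```

Each script then only declares the options unique to it, and a change to a shared flag happens in one place.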

Add automatic thresholding to color based segmentation option.

The six-parameter threshold is a little much. Maybe we can use Otsu's method and a few other algorithms to narrow down the search and provide more robust algorithms. I'm thinking of single-channel learning with one threshold determined by a variety of thresholding algorithms (there should be a list in scikit-image).
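For reference, Otsu's method picks the threshold that maximizes the between-class variance of the two resulting pixel groups. The pure-Python version below is only meant to show the idea on a flat list of integer intensities; in practice a library routine such as threshold_otsu in skimage.filters would be the natural choice.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    pixels: iterable of integer intensities in [0, levels).
    Pixels with value <= t form one class, the rest form the other.
    """
    pixels = list(pixels)
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t
```

With the threshold computed from the data, the search would only need to choose the channel and the thresholding method instead of six free numbers.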

The DEAP Library is included but not used

Last summer (2019) we had the DEAP library working but we stripped it out during the overhaul over the Fall of 2019 and Spring of 2020. We either need to remove all reference to DEAP or (better yet) get it working again in the new format.

Multi-Image learning

We need to start thinking/researching ways to combine multiple images in the fitness function.
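The simplest version to start from is probably to score one candidate on every image and combine the scores. The sketch below assumes lower fitness is better and takes the segmentation and fitness functions as arguments; all names are placeholders.

```python
from statistics import mean

def multi_image_fitness(segment, fitness, pairs, combine=mean):
    """Score one candidate algorithm across several images.

    segment(image) -> mask, fitness(mask, truth) -> float,
    pairs is a list of (image, truth). Returns (combined, per_image_scores).
    """
    scores = [fitness(segment(image), truth) for image, truth in pairs]
    return combine(scores), scores
```

Passing combine=max instead of the mean scores a candidate by its worst image (when lower is better), which favors algorithms that generalize over ones that excel on a single image.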

Fix algorithm specific mutation

Currently mutation is evenly weighted based on all the parameters. We need to fix this to only change parameters relative to the individual algorithm.

Verify Population generation is not redundant.

While I am writing the paper draft I realized that lines 313-321 of the GeneticSearch.py may produce redundant results. I think we may be evaluating the same individuals repeatedly and wasting 10% of our compute time. This code needs reviewing to ensure only new algorithms are used (at least compared to the previous iteration) and we still store the best so far in the hall of fame variable (HOF).
