decotools

Build Status Python 2.7 Python 3.6

A set of tools to help make DECO life easier ✨

Installation

The decotools Python package can be installed directly from GitHub. decotools is built on Google's TensorFlow, which must be installed to use decotools. If tensorflow is already installed, then decotools can be installed via

$ pip install git+https://github.com/WIPACrepo/decotools#egg=decotools

Alternatively, if tensorflow is not installed, then the following commands can be used to install tensorflow along with decotools.

For installing the CPU version of tensorflow:

$ pip install git+https://github.com/WIPACrepo/decotools#egg=decotools[tf]

For installing the GPU version of tensorflow:

$ pip install git+https://github.com/WIPACrepo/decotools#egg=decotools[tf-gpu]

Documentation

The documentation for decotools is available at https://wipacrepo.github.io/decotools/.

Contributing

Contributions to decotools are welcome! Please see the contributing guide for more information.

decotools's Issues

Add image metrics tool to help with blob extraction

It would be nice to have a function that takes an image path and returns several metrics to help us determine if it is a "good" image that we should pass to decotools.extract_blobs.

Metrics I can think of:

  1. Average pixel intensity (over the entire image); this can help filter out very noisy images
  2. Maximum pixel intensity (over the entire image)
  3. Number of pixels above a specified intensity threshold.

@mattmeehan @cschneider6 @milesjwinter any other potential metrics you can think of?
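A minimal sketch of the three metrics listed above, assuming the image has already been loaded into a NumPy array (the issue asks for a function that takes an image path; loading is left out here). The name `get_image_metrics`, the `threshold` default, and the returned keys are hypothetical and not part of decotools.

```python
import numpy as np


def get_image_metrics(image, threshold=20):
    """Return simple intensity metrics for an image array (hypothetical helper)."""
    image = np.asarray(image, dtype=float)
    # Assumption: colour channels, if present, are summed into one intensity per pixel
    intensity = image.sum(axis=-1) if image.ndim == 3 else image
    return {
        'mean': float(intensity.mean()),
        'max': float(intensity.max()),
        'n_above_threshold': int((intensity > threshold).sum()),
    }
```

Returning a dictionary would make it easy to add further metrics later without changing the call signature.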

Another bug in get_iOS_files()

I tried calling get_iOS_files(start_date='2017-07-26',device_id='D8D8E48D-7D3F-4693-A927-A402CF127D25') and I got this error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Tried testing with a few different device IDs and it's the same problem. Basically, any start_date before 2017-07-27 seemed to be causing errors. And when start_date='2017-07-28' (today's date), no error would occur but I would get back an empty list (even when I manually check and the files do exist).
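For reference, this `ValueError` is what NumPy raises whenever an array with more than one element is used where a single boolean is expected. The snippet below only illustrates that general mechanism with made-up dates; it is not taken from `get_iOS_files`, and whether a comparison like this is the actual cause there is an assumption.

```python
import numpy as np

# Made-up stand-in for a column of per-file dates
dates = np.array(['2017-07-25', '2017-07-26', '2017-07-27'], dtype='datetime64[D]')
start = np.datetime64('2017-07-26')

mask = dates >= start  # element-wise comparison gives a boolean array

try:
    if mask:  # ambiguous: the array holds three booleans, not one
        pass
except ValueError as err:
    message = str(err)

# Reducing explicitly, or indexing with the mask, avoids the ambiguity
any_match = bool(mask.any())
selected = dates[mask]
```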

get_android_files returns image file paths that don't exist

So it appears that there are some entries in the database that don't have a corresponding image file in the file system (specifically, this happens for 30 entries in the DB). Given that get_android_files constructs the image file path from the "path" column in the DB, it makes sense that get_android_files should also check if the image file exists. If the file doesn't exist, then it can be dropped from the output of get_android_files.
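The existence check could look roughly like the sketch below. `drop_missing_files` is a hypothetical helper name; how the check would be wired into `get_android_files` is not shown in this issue.

```python
import os


def drop_missing_files(image_files):
    """Split paths into those that exist on disk and those that do not."""
    existing, missing = [], []
    for path in image_files:
        # os.path.isfile is False for missing paths and for directories
        (existing if os.path.isfile(path) else missing).append(path)
    return existing, missing
```

Returning the missing paths as well would make it possible to report which database entries have no image file.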

Add documentation page

It would be really great if we could add a documentation page hosted on GitHub pages. I'll work on getting an MkDocs directory together.

io.py causes an error in IPython

When running ipython within the decotools directory, the following error occurs:

  File "/home/mrmeehan/.local/lib/python2.7/site-packages/IPython/utils/openpy.py", line 9, in <module>
    import io
  File "/home/mrmeehan/software/decotools/decotools/io.py", line 7, in <module>
    import pandas as pd

Add testing utility to generate test image arrays

Currently, when we want to create test (fake) image files in the decotools tests, we save a random numpy array to a temporary file. For example,

import numpy as np
from skimage.io import imsave

tmpfile_1 = tmpdir.join('temp_image_1.png')
imsave(str(tmpfile_1), np.random.random((5, 5, 4)))
tmpfile_2 = tmpdir.join('temp_image_2.png')
imsave(str(tmpfile_2), np.random.random((200, 100, 4)))

files = [str(tmpfile_1), str(tmpfile_2)]

I'd prefer to have a function that generates test images, so we don't have to repeat code.
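One possible shape for such a utility, as a sketch only: `make_test_image` and its `seed` parameter are hypothetical names, and the function just wraps the random-array step from the snippet above.

```python
import numpy as np


def make_test_image(shape=(5, 5, 4), seed=None):
    """Return a random float image array with values in [0, 1) (hypothetical helper)."""
    # A seeded generator makes the fake image reproducible between test runs
    rng = np.random.RandomState(seed)
    return rng.random_sample(shape)
```

A test would then call something like `imsave(str(tmpfile_1), make_test_image((5, 5, 4)))` instead of building the array inline.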

get_iOS_files by deviceID

In the get_iOS_files function, I think it would be useful to include a parameter to select a specific device ID. This way we could easily separate images taken by each device. Also, it might be useful to be able to exclude a specific device ID. There is one device that has ~15,000 events, but I think this was because the camera wasn't properly covered, so these events wouldn't be useful.

Add parallelized histogramming of pixel intensities

We often histogram pixel intensities when analyzing DECO data. While this is a simple task on its own, it can be time-consuming to histogram many images in serial. I think it would be useful to have a function that parallelizes this using dask. The function would just take a list of image files as input and return the corresponding histograms, e.g.:

def get_intensity_histograms(files, rgb_sum=False, cumulative=False, n_jobs=1):
    ...

We'd have parameters to determine the rgb conversion, cumulative vs differential distribution, and number of processes for parallelization.

@jrbourbeau What do you think? And any other parameters or features that would be useful?
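A rough sketch of the idea, with several assumptions stated up front: it uses the standard library's `concurrent.futures` thread pool as a stand-in for dask, it takes already-loaded image arrays rather than file paths, the `bins` parameter is an addition, and `rgb_sum=False` is taken to mean "average the channels".

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _intensity_histogram(image, bins, rgb_sum, cumulative):
    image = np.asarray(image, dtype=float)
    if image.ndim == 3:
        # Assumption: rgb_sum=True sums the channels, otherwise they are averaged
        image = image.sum(axis=-1) if rgb_sum else image.mean(axis=-1)
    counts, _ = np.histogram(image, bins=bins)
    return np.cumsum(counts) if cumulative else counts


def get_intensity_histograms(images, bins, rgb_sum=False, cumulative=False, n_jobs=1):
    """Histogram each image array, using n_jobs worker threads."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(_intensity_histogram, img, bins, rgb_sum, cumulative)
                   for img in images]
        # Collect in submission order so histograms line up with the inputs
        return [f.result() for f in futures]
```

With dask, the per-image helper could stay the same and only the scheduling part would change.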

Move CV validation into fit method

Currently the CNN class has both a fit and fit_with_kfold method. The content of these methods is similar. fit_with_kfold is just fit with sklearn.model_selection.StratifiedKFold built into it.

I'd like to unify these two methods by adding a cv parameter to the fit method.

@mattmeehan @milesjwinter any objections?

Add image as a BlobGroup class attribute

Right now several BlobGroup class methods are defined like below:

def get_sub_image(self, image):
def get_raw_moment(self, image, p, q):
def get_max_intensity(self, image):

I think that this image input should be upgraded to a BlobGroup class attribute. This makes sense logically because each blob group is only defined for a given image.

@mattmeehan can you foresee any issue with this?
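To illustrate the proposal, here is a stripped-down, hypothetical stand-in for `BlobGroup`. The constructor arguments and the `xs`/`ys` pixel layout are assumptions made for the sketch; the real class is not shown in this issue.

```python
import numpy as np


class BlobGroup(object):
    """Hypothetical, stripped-down stand-in for the real BlobGroup class."""

    def __init__(self, image, xs, ys):
        self.image = np.asarray(image)  # image stored once, at construction
        self.xs = np.asarray(xs)        # column indices of the blob's pixels (assumed layout)
        self.ys = np.asarray(ys)        # row indices of the blob's pixels (assumed layout)

    def get_max_intensity(self):
        return self.image[self.ys, self.xs].max()

    def get_raw_moment(self, p, q):
        # Raw image moment M_pq = sum over blob pixels of x**p * y**q * I(x, y)
        return np.sum(self.xs**p * self.ys**q * self.image[self.ys, self.xs])
```

With the image held on the instance, the methods no longer need an `image` argument, and a blob group cannot accidentally be evaluated against the wrong image.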

Addition of image collection statistics

I think it would be useful to have some built-in image collection statistics. Not exactly sure what all we would want to include. But things along the lines of (for a given date range)

  • Number of images taken
  • Number of images per day
  • Number of images that pass some specified cuts
  • etc.

We might even consider adding some interactive plotting capabilities.
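As a starting point, the per-day count could be sketched as below. `images_per_day` is a hypothetical name, and the sketch assumes each image's date is already available as a `datetime.date`; how decotools would extract those dates is not covered here.

```python
from collections import Counter
from datetime import date


def images_per_day(image_dates, start_date, end_date):
    """Count images per day for dates within [start_date, end_date], inclusive."""
    in_range = [d for d in image_dates if start_date <= d <= end_date]
    return Counter(in_range)
```

The total number of images in the range is then `sum(counts.values())`, and applying cuts first would just mean filtering the input list.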

Add flake8 support

I'd like to add flake8 support to enforce a consistent coding style.

TODO:

  • Edit decotools to be flake8 compliant
  • Add flake8 . to .travis.yml so that it is always run automatically

@zdgriffith @mattmeehan anything else you guys can think of related to coding style?

Switch to tensorflow for keras backend

Currently, decotools uses Theano as the backend for keras. However, Theano will no longer be supported; its final release, Theano 1.0.0, was recently released. I'd like to switch to using tensorflow as the keras backend.

Grayscale conversion options

Currently, in blob_extraction.py images are converted to grayscale using a simple RGB sum. However, we have older code that converts to grayscale using a weighted RGB sum in PIL. It would be nice to have an option to use either of these conventions for backwards compatibility.
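A sketch of what the option could look like, operating on a NumPy array. `to_grayscale` and its `method` argument are hypothetical names. The weights are the ITU-R 601-2 luma coefficients that the PIL documentation gives for converting to mode "L"; PIL works on integer pixel values, so its output may differ from this float version by rounding.

```python
import numpy as np


def to_grayscale(image, method='sum'):
    """Convert an (n_rows, n_cols, 3+) RGB(A) array to a 2D grayscale array."""
    rgb = np.asarray(image, dtype=float)[..., :3]  # ignore any alpha channel
    if method == 'sum':
        return rgb.sum(axis=-1)
    if method == 'weighted':
        # ITU-R 601-2 luma weights: L = 0.299 R + 0.587 G + 0.114 B
        return rgb.dot([0.299, 0.587, 0.114])
    raise ValueError("method must be 'sum' or 'weighted'")
```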

Bug in get_iOS_files

I tried running get_iOS_files to see if there were any new devices in the data set. It worked just fine with the default settings, but then I tried setting include_min_bias=True in case any new phones haven't seen events yet, and I'm getting a weird error when I do this:

Traceback (most recent call last):
  File "getAllDevices.py", line 6, in <module>
    all_files = dt.get_iOS_files(include_min_bias=True)
  File "/home/cschneider/.virtualenvs/deco/lib/python2.7/site-packages/decotools/fileio.py", line 193, in get_iOS_files
    df = get_metadata_dataframe(file_list)
  File "/home/cschneider/.virtualenvs/deco/lib/python2.7/site-packages/decotools/fileio.py", line 57, in get_metadata_dataframe
    xml_dict = xml_to_dict(xml_file)
  File "/home/cschneider/.virtualenvs/deco/lib/python2.7/site-packages/decotools/fileio.py", line 19, in xml_to_dict
    tree = ET.parse(xmlfile) # Initiates the tree Ex:
  File "/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib64/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib64/python2.7/xml/etree/ElementTree.py", line 657, in parse
    self._root = parser.close()
  File "/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib64/python2.7/xml/etree/ElementTree.py", line 1665, in close
    self._raiseerror(v)
  File "/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib64/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

Any ideas on how to fix this?
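One reading of this traceback, offered as an assumption rather than a confirmed diagnosis: `no element found: line 1, column 0` is the error ElementTree reports for an empty document, so at least one of the min-bias metadata files may be empty. A defensive wrapper around the parse step could look like the sketch below; `parse_metadata_or_none` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_metadata_or_none(xml_file):
    """Parse an XML file, returning None if it is empty or malformed."""
    try:
        return ET.parse(xml_file)
    except ET.ParseError:
        # Raised for empty files as well as for invalid XML
        return None
```

The caller would then skip files for which `None` comes back, ideally reporting how many were skipped.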

Add blob extraction unit tests

I'd like to have some unit tests for decotools.extract_blobs. We could use a publicly available Android test image to check that things like the blob area, length, etc. are the expected values.

Add tests

Adding tests for decotools would be a nice addition.

Switch to sphinx docs

I'd like to switch to using sphinx instead of mkdocs for the decotools documentation. Mkdocs doesn't auto-generate API documentation.

String format error in get_iOS_files()

I tried running decotools.get_iOS_files(device_id='F216114B-8710-4790-A05D-D645C9C79C27',end_date='2017-07-20',n_jobs=20) and this returns an error message that reads "ValueError: Unknown string format."

I've run this same format of code before so I'm not sure what's been changed to now cause this error.

Problem with missing metadata files in fileio.py

I think the KeyError issue when specifying a device ID or phone model is coming from the fact that there are actually no .xml files in the 2017.07.29, 2017.07.30, or 2017.07.31 directories in /net/deco. I manually did a check for anything ending in .xml in these directories and nothing is being returned/listed. So adding the if-else statement to look for files starting with 'metadata-' vs. files that just start with the device ID does not fix this error. It must be a problem when the program attempts to open the non-existent file. I can take a look at this some more and see if I can fix it.

Bug in fileio.py

When I try calling get_iOS_files() on a specific phone model or device ID, I get this long error message that I think ultimately has to do with the pandas.concat function. The final line of the error message says:

  File "/home/cschneider/.virtualenvs/deco/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 239, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

Add blob finding support

I would like to add some blob-finding support to decotools. Something along the lines of

blob_list = decotools.get_image_features(image_file)

Add bounding box option to metrics module

It would be useful to be able to calculate local image metrics in addition to global ones. We could include a bounding box keyword argument in get_intensity_metrics and get_rgb_hists, which would allow the user to evaluate those functions on a local box of pixels surrounding a blob, rather than the entire image. This way we can reject noise on both a global and local level.
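A sketch of how a bounding box argument might work. The function name `intensity_metrics_in_box`, the `bbox` layout `(row0, row1, col0, col1)`, and the returned keys are all assumptions; the actual signatures of `get_intensity_metrics` and `get_rgb_hists` are not shown in this issue.

```python
import numpy as np


def intensity_metrics_in_box(image, bbox=None):
    """Mean and max intensity, optionally restricted to bbox = (row0, row1, col0, col1)."""
    image = np.asarray(image, dtype=float)
    if bbox is not None:
        row0, row1, col0, col1 = bbox
        image = image[row0:row1, col0:col1]  # half-open ranges, like normal slicing
    return {'mean': float(image.mean()), 'max': float(image.max())}
```

Leaving `bbox=None` as the default would keep the existing global behaviour unchanged for current callers.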

convnet.convert_images raises ValueError if edge check fails

convnet.convert_images raises a ValueError if any of the images passed to it failed the edge check. Here's some example code with the full error message:

import decotools as dt
# Get some image files
image_files = dt.get_android_files(start_date='2016.10.30', end_date='2016.11.07')
# process_image_files calls the convert_images function
images = dt.process_image_files(image_files)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-59e1aa4ad428> in <module>()
----> 1 images = dt.process_image_files(image_files)

/home/mrmeehan/software/decotools/decotools/convnet.py in process_image_files(image_files, size)
     74             images.append(None)
     75
---> 76     scaled_images = convert_images(images)
     77
     78     return scaled_images

/home/mrmeehan/software/decotools/decotools/convnet.py in convert_images(images)
     54         shape:(n_images, n_rows, n_cols, 1)
     55     """
---> 56     images = np.array(images, dtype='float32')
     57     images = np.mean(images/255., axis=-1, keepdims=True)
     58     if len(images.shape) == 3:

ValueError: setting an array element with a sequence.

This happens because images that fail the edge check in process_image_files are still added to the image list as None:

if pass_edge_check(maxX, maxY, image.size, crop_size=2*size):
    x0, x1, y0, y1 = get_crop_range(maxX, maxY, size=size)
    cropped_img = image.crop((x0, y0, x1, y1))
    cropped_img = np.asarray(cropped_img)
    images.append(cropped_img)
else:
    images.append(None)

As a result, the images list is no longer regularly shaped (it mixes arrays and None), so numpy can't convert it to a single array. An easy fix would be to ignore images that fail the edge check altogether, i.e. don't add anything to the image list. It might be good to have some sort of logging or print message to let the user know which, or at least how many, images failed the edge check. Many images end up failing, so it might be easier to just stick with the number.
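The suggested fix could be sketched as follows. This is a self-contained stand-in, not the decotools code: it works on 2D NumPy arrays instead of PIL images, and the inline bounds test replaces `pass_edge_check` and `get_crop_range`, whose definitions are not shown here. `crop_passing_images` is a hypothetical name.

```python
import numpy as np


def crop_passing_images(images, size):
    """Crop a (2*size, 2*size) window around each image's brightest pixel.

    Images whose window would cross the image edge are skipped and counted.
    """
    cropped, n_failed = [], 0
    for image in images:
        image = np.asarray(image)
        max_y, max_x = np.unravel_index(np.argmax(image), image.shape)
        y0, y1, x0, x1 = max_y - size, max_y + size, max_x - size, max_x + size
        if y0 < 0 or x0 < 0 or y1 > image.shape[0] or x1 > image.shape[1]:
            n_failed += 1  # fails the edge check: append nothing
            continue
        cropped.append(image[y0:y1, x0:x1])
    return cropped, n_failed
```

Because every entry in `cropped` has the same shape, `np.array(cropped, dtype='float32')` succeeds, and `n_failed` can be logged or printed for the user.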
