eta's Issues

Developer samples and "User" samples

We now have a samples directory containing example code that seems to be intended for developers.

We also need to create samples for every module. Right now, I am putting such examples in the same place, but it is not clear to me that this is the right thing to do. Having examples running pipelines will make using and extending eta much easier.

Also: I do not like the word samples here. These are examples.

new requirement dill

eta.core.serial now imports dill. It should be added to requirements.txt, e.g.:

dill==0.2.7.1

Add an `eta.core.objects.BaseFrame` class to encapsulate Frame implementation

There are many types of objects that we will want to store in Frame-like classes. We should have a BaseFrame class that defines all the common functionality and then subclasses like DetectedFrame, EmbeddedFrame, TrackedFrame, etc. that are thin-wrappers over BaseFrame that specify what type of objects are in the list.
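A minimal sketch of the proposed hierarchy, with illustrative class and attribute names (not the actual eta API):

```python
class BaseFrame:
    """Container for a list of objects associated with a video frame."""

    def __init__(self, frame_number, objects=None):
        self.frame_number = frame_number
        self.objects = list(objects or [])

    def add(self, obj):
        """Add an object to this frame."""
        self.objects.append(obj)

    def count(self):
        """Return the number of objects in this frame."""
        return len(self.objects)


class DetectedFrame(BaseFrame):
    """Thin wrapper whose objects are detections."""


class TrackedFrame(BaseFrame):
    """Thin wrapper whose objects are tracked objects."""
```

The subclasses carry no logic of their own; they exist only to declare what kind of objects live in the list.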

Support `eta run --dry-run` flag

Request to add pipeline support for a dry-run mode that removes all configs and output files, for the case where the user only wants to see the stdout.

Make eta.core.utils.parse_dir_* methods into builder methods of DataFileSequence

Methods like eta.core.utils.parse_dir_pattern and eta.core.utils.parse_bounds_from_dir_pattern should be converted into builder methods of eta.core.data.DataFileSequence, which should be our one-stop shop for all file-sequence-related operations.

(I like eta.core.data.DataFileSequence --- this idea has been sorely missing)
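One possible shape for the builder interface, sketched with illustrative names; the real eta.core.data.DataFileSequence internals are assumed here, not copied:

```python
import re


class DataFileSequence:
    """Illustrative stand-in for eta.core.data.DataFileSequence."""

    def __init__(self, pattern, bounds):
        self.pattern = pattern  # printf-style, e.g. "frame-%05d.png"
        self.bounds = bounds    # (first, last) inclusive indices

    @classmethod
    def from_files(cls, paths):
        """Builder: infer a zero-padded numeric pattern and bounds
        from a list of filenames (the parse_dir_* role)."""
        indices = []
        template = None
        for p in paths:
            m = re.search(r"(\d+)", p)
            indices.append(int(m.group(1)))
            width = len(m.group(1))
            template = p[:m.start(1)] + "%%0%dd" % width + p[m.end(1):]
        return cls(template, (min(indices), max(indices)))

    def gen_path(self, idx):
        """Render the path for frame index `idx`."""
        return self.pattern % idx
```

With builders like this, callers never touch the parsing helpers directly; the sequence object is the one-stop shop.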

VGG16Featurizer should force user to call start() and stop()

Currently if we use eta.core.vgg16.VGG16Featurizer without explicitly calling start() and stop(), it will silently load and destroy a huge CNN every time featurize() is called. This is never what the user really wants.

I can see why Featurizer allows this to silently happen (setup/tear-down could be cheap), but VGG16Featurizer should raise an error here.

The other option is to set keep_alive=True, but then the naive user would be carrying around a CNN in memory, which also deserves an error.
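The proposed guard can be sketched as follows, assuming a Featurizer-style class with start()/stop() hooks; the names are illustrative, not eta's actual API:

```python
class FeaturizerNotStartedError(Exception):
    """Raised when featurize() is called before start()."""


class ExpensiveFeaturizer:
    """Featurizer whose setup is too costly to do implicitly per call."""

    def __init__(self):
        self._model = None

    def start(self):
        self._model = object()  # stands in for loading a large CNN

    def stop(self):
        self._model = None

    def featurize(self, data):
        if self._model is None:
            raise FeaturizerNotStartedError(
                "Call start() before featurize(); implicitly loading and "
                "destroying the model on every call is never intended")
        return data  # a real implementation would run the model here
```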

Make eta.core.video.FramesRanges more general

The need to pass around sets of numbers like [1, 5, 6, 7, 10] or "1,5-7,10" is pretty general. We should upgrade the eta.core.video.FramesRanges class to provide this general functionality.

It should accept strings (including "*") and lists (including [])

eta.core.config.Config should also understand how to accept fields of this new type.
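A hedged sketch of the general parser, assuming "*", "", and [] all mean "all frames"; the function name is illustrative:

```python
def parse_ranges(value):
    """Parse "1,5-7,10", a list of ints, "*", "", or [] into a sorted
    frame list. Returns None for the "all frames" case."""
    if value == "*" or value == "" or value == []:
        return None  # interpret as "all frames"
    if isinstance(value, list):
        return sorted(set(int(v) for v in value))
    frames = set()
    for part in value.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            frames.update(range(int(lo), int(hi) + 1))
        else:
            frames.add(int(part))
    return sorted(frames)
```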

`Serializable` needs to be reflective

We need everything in eta that is written via json to be reflective. This would enhance and simplify overall functionality.

I also think we should deprecate from_json and write_json to just read and write.

pipelines: need a way to have global config settings inherited by individual modules

Suppose I have a pipeline with a dozen modules that all require a "frames" setting because they are all working with the same video. It would be far easier to set this once in the top-level config and have it inherited. And there is less room for error.

This would be harder if there are multiple videos, but it would leave even less room for error.

(This is a thought I had while working with the pipeline bits. Up for discussion, of course, but wanted to get it down.)
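The inheritance could work like this, sketched with an assumed config layout (the "global_settings"/"modules"/"parameters" keys are illustrative, not eta's actual schema): top-level settings are merged into each module's config, with module-level values winning.

```python
def apply_global_settings(pipeline_config):
    """Merge top-level settings into each module config, letting any
    module-level value override the inherited one."""
    globals_ = pipeline_config.get("global_settings", {})
    for module in pipeline_config.get("modules", []):
        params = module.setdefault("parameters", {})
        for key, value in globals_.items():
            params.setdefault(key, value)
    return pipeline_config
```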

Need ability to assign names to modules in pipelines

This will allow us to, for example, write a pipeline with multiple instances of the same module in it in different places.

These "custom" names would be used when setting parameters and defining the module connections in the pipeline metadata file.
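For illustration, a hypothetical metadata fragment with two instances of the same module under custom names; the schema shown here is assumed, not eta's actual pipeline metadata format:

```python
# Two instances of a resize_videos module, distinguished by custom names
# that the connections refer to
pipeline = {
    "modules": {
        "resize_small": {
            "module": "resize_videos",
            "parameters": {"size": [320, 240]},
        },
        "resize_large": {
            "module": "resize_videos",
            "parameters": {"size": [1280, 720]},
        },
    },
    "connections": [
        {"source": "INPUT", "sink": "resize_small"},
        {"source": "resize_small", "sink": "resize_large"},
    ],
}
```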

Fresh install does not install tensorflow --- NO WAIT, sudo bash or bash...

Now that vgg is in the repo, we should have the install scripts install tensorflow.
On my mac, I got this after running the install script and then running embed_image.

jcorso@newbury-2 /voxel51/w/eta
$ cd examples/embed_vgg16
/voxel51/w/eta/examples/embed_vgg16
jcorso@newbury-2 /voxel51/w/eta/examples/embed_vgg16
$ python embed_image.py 
Traceback (most recent call last):
  File "embed_image.py", line 20, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow

Ah, after digging a bit deeper, this is actually a problem with the install script. It got up to the Python install bits, but then quit (without a message) because they failed. My suspicion is that those bits were not executed as sudo, and my Python requires sudo for installing for some reason that escapes me. (This is on a Mac.)

So, something needs to be changed/improved, even if it is the doc on how to run the install_externals as sudo.

Thoughts?

Add an `eta.core.config.Config.parse_enum` method to parse config fields that are enumerations

It would be useful to have an eta.core.config.Config.parse_enum() method that works like this:

class MyConfig(Config):

    def __init__(self, d):
        self.value = self.parse_enum(d, "value", Choices)

where the "enum" can be defined either as a class:

class Choices(Enum):
    A = valA
    B = valB

or a dict:

Choices = {
    "A": valA,
    "B": valB
}

A common pattern will be to use this mechanism when the user needs to choose among several classes or functions to use.

Serializable needs a write_json method

Need to add a Serializable.write_json method. We really shouldn't be calling serial.write_json directly; data I/O to disk should almost always go through a "data class" that implements Serializable.

Sample Data Should Be A Separate Download

We have been putting the sample data into the repository, but this will quickly bloat the repository if we add any sizable amount, making it hard to work with. We need to establish a separate data dump that can be fetched when the user wants to run the examples, etc.

Installs: virtualenv and cross-platform

Probably not best practice to rely on system-wide installs.

Also: the mac parts rely on brew. Some of us use port (macports) instead of brew. How to reconcile? (Virtualenv?)

Should modules be included as a package in eta?

Currently modules is just a set of executable Python code that uses the eta codebase. It is not a package (it has no "__init__.py" file), but it lives inside eta within the repo. I'd suggest either moving it outside of the eta directory or turning it into a package.

Is there a fundamental reason why we would not want to allow modules to import other modules? It would not be possible just to "import modulename", because the actual code may be executing somewhere else.

Functionality to query/list available modules and pipelines on the path

A new-to-ETA developer will want to get acquainted with the available functionality out of the box. A seasoned-ETA developer will want to learn what new modules or pipelines may have been added recently. A pipeline developer will need to list available modules.

ETA needs an apt-cache-like functionality to navigate the module and pipeline space.

Formalize the notion of conditional execution of modules

For example:

  • only resize a video if it is above a certain allowed resolution. This is currently achieved via a max_size argument of the resize_videos module, but perhaps this is a general enough need that we should provide formal support for it.
  • only resize a video if a size argument is provided; if no argument is provided, the module should be "skipped" altogether. This is currently achieved on a per-module basis in the resize_videos module by symlinking the outputs to the inputs, but perhaps this is a general enough need that we should provide formal support for it.
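One possible formalization of the two cases above, sketched with assumed names: a module declares a predicate that decides whether it runs or simply passes its inputs through.

```python
def run_module(process, should_run, inputs):
    """Run `process` on `inputs` if `should_run(inputs)` is true;
    otherwise "skip" the module by passing the inputs through
    (the formal analogue of symlinking outputs to inputs)."""
    if should_run(inputs):
        return process(inputs)
    return inputs
```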

Is a custom OpenCV build necessary or worth it?

We currently build OpenCV from source during our external installs, but it is causing us pain every time we re-install ETA on a new machine (new developers, production deployments, etc). Moreover, the only customization we currently do is setting the WITH_CUDA flag.

Should we continue building OpenCV from source, or would pip install opencv-python suffice for us?

Need ability to include/run one pipeline within another

Options:
(A) support this only at the pipeline metadata level by adding a "pipelines" field that allows access to the I/O of other pipelines; when a pipeline is built, a single pipeline config would be populated based on this information.
(B) support this at the pipeline config level by allowing pipeline configs to point to other pipeline configs.

I'm leaning towards (A).
