tri-ml / dgp
ML Dataset Governance Policy for Autonomous Vehicle Datasets
Home Page: https://tri-ml.github.io/dgp/
License: MIT License
See the live documentation here. The upper-left corner still says DGP 1.0, but at the time of this writing we are on v1.3.
Either automate part of the release process to bump version numbers in the docs before building them, for example as part of a GitHub Actions workflow, or do this manually around the time of a version release.
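One way to automate this, offered as a minimal sketch that assumes the docs are built with Sphinx and that the installed distribution is named dgp: have conf.py read the version from the package metadata so the docs can never drift from the release.
# docs/conf.py (sketch; assumes Sphinx-based docs and an installed `dgp` distribution)
from importlib.metadata import version as pkg_version

release = pkg_version("dgp")                 # full version string, e.g. "1.3.0"
version = ".".join(release.split(".")[:2])   # short "X.Y" shown by most themes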
I'm trying to use the 'depth' information, but the visualization result looks very strange.
I followed the DDAD.ipynb example to draw a depth image, but it looks like an empty image.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import get_cmap
from dgp.datasets.synchronized_dataset import SynchronizedSceneDataset

plasma_color_map = get_cmap('plasma')
ddad_train = SynchronizedSceneDataset(
    json_path,
    split='train',
    datum_names=['lidar', 'CAMERA_01', 'CAMERA_05', 'CAMERA_06', 'CAMERA_07', 'CAMERA_08', 'CAMERA_09'],
    generate_depth_from_datum='lidar'
)
sample_0 = ddad_train[0]
camera_01 = sample_0[0][0]
# color-mapped depth (keep RGB channels only)
depth_map = plasma_color_map(camera_01['depth'])[:, :, :3]
plt.imshow((camera_01['depth'] * 255).astype(np.uint8))
Is there a way to get the DDAD depth information and show it like the image below?
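A minimal sketch of one way to make the projected lidar depth visible, assuming the depth returned with generate_depth_from_datum='lidar' is a sparse HxW float array containing zeros wherever no lidar return projects; normalizing over only the valid pixels before applying the colormap usually avoids the "empty image" look:
import numpy as np
import matplotlib.pyplot as plt

depth = camera_01['depth']                      # assumed HxW float array, 0 = no lidar return
valid = depth > 0
vis = np.zeros_like(depth)
vis[valid] = depth[valid] / depth[valid].max()  # normalize only over valid pixels
plt.imshow(vis, cmap='plasma')
plt.axis('off')
plt.show()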
dgp/dgp/scripts/visualize_dataset.py attempts to import SynchronizedDataset from dgp.datasets.synchronized_dataset, but the class is actually named _SynchronizedDataset.
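Assuming the script is meant to use that class directly rather than a public alias, the corrected import would be the one-liner below:
from dgp.datasets.synchronized_dataset import _SynchronizedDataset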
The pre-push hook prevents pushing to a fork of the DGP repository, failing with the following error message.
Here, the virtual environment was created and activated by following this doc.
(dev) nehal@device:~/dgp$ git push nehaldgp feat/nehal/point-line-polygon-3d-proto
************* Module .pylintrc
.pylintrc:1: [E0015(unrecognized-option), ] Unrecognized option found: accept-no-param-doc, accept-no-return-doc, accept-no-yields-doc
Aborting push due to files with lint.
error: failed to push some refs to '[email protected]:nehalmamgain/dgp.git'
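A guess at the cause, offered as an assumption rather than a confirmed diagnosis: accept-no-param-doc, accept-no-return-doc, and accept-no-yields-doc are options of pylint's docparams extension, so the installed pylint only recognizes them when that plugin is loaded. A minimal .pylintrc sketch:
[MASTER]
# load the checker that defines the accept-no-*-doc options
load-plugins=pylint.extensions.docparams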
Hi, thanks for your good work! When I use the DDAD_tiny dataset with dgp, I encounter some errors. The code I used is as follows.
from dgp.datasets.synchronized_dataset import SynchronizedSceneDataset

DDAD_TRAIN_VAL_JSON_PATH = '/DDAD_tiny/ddad_tiny.json'
DATUMS = ['camera_01']
ddad_train = SynchronizedSceneDataset(
    DDAD_TRAIN_VAL_JSON_PATH,
    split='train',
    datum_names=DATUMS,
    generate_depth_from_datum='lidar'
)
The error report is:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home1/wangyufei/anaconda3/envs/tri/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home1/wangyufei/anaconda3/envs/tri/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home4/user_from_home1/wangyufei/dgp-1.0/dgp/datasets/base_dataset.py", line 1071, in _datum_index_for_scene
return scene.datum_index
File "/home4/user_from_home1/wangyufei/dgp-1.0/dgp/datasets/base_dataset.py", line 375, in datum_index
assert len(datum_key_to_idx_in_scene) == bad_datums + num_datums, "Duplicated datum_key"
AssertionError: Duplicated datum_key
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home1/wangyufei/anaconda3/envs/tri/lib/python3.6/code.py", line 91, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/home1/wangyufei/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/home1/wangyufei/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home4/user_from_home1/wangyufei/dc/packnet-sfm/test_dataset.py", line 30, in <module>
generate_depth_from_datum='lidar'
File "/home4/user_from_home1/wangyufei/dgp-1.0/dgp/datasets/synchronized_dataset.py", line 424, in __init__
only_annotated_datums=only_annotated_datums
File "/home4/user_from_home1/wangyufei/dgp-1.0/dgp/datasets/synchronized_dataset.py", line 83, in __init__
requested_autolabels=requested_autolabels
File "/home4/user_from_home1/wangyufei/dgp-1.0/dgp/datasets/base_dataset.py", line 704, in __init__
self.datum_index = self._build_datum_index()
File "/home4/user_from_home1/wangyufei/dgp-1.0/dgp/datasets/base_dataset.py", line 1080, in _build_datum_index
datum_index = list(proc.map(BaseDataset._datum_index_for_scene, self.scenes))
File "/home1/wangyufei/anaconda3/envs/tri/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home1/wangyufei/anaconda3/envs/tri/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
AssertionError: Duplicated datum_key
Would you provide some help?
Difficult to fetch from the repository after running git commit
Steps to reproduce the behavior:
When following Getting Started, presumably at make setup-linters, pre-commit run --all-files, or git commit inside the docker, permissions under .git change from user to root for the following files:
-rw-r--r-- 1 root root 73 Dec 7 11:46 COMMIT_EDITMSG
-rw-r--r-- 1 root root 0 Dec 7 12:34 FETCH_HEAD
-rw-r--r-- 1 root root 23 Dec 7 12:28 HEAD
-rw-r--r-- 1 root root 322 Dec 7 12:30 config
-rw-r--r-- 1 root root 39355 Dec 7 12:28 index
This prevents git operations outside the docker like
~/dgp$ git pull
error: cannot open .git/FETCH_HEAD: Permission denied
and inside the docker like
/home/dgp# git pull
Bad owner or permissions on /root/.ssh/config
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
On shared development machines where users do not have root permissions, it is impossible to restore the permissions changed by the container.
The desired behavior is to be able to pull from either inside or outside the container; with the above constraint (no root permissions), both workflows are blocked unless the repository is set up again from scratch.
The best fix would probably be for the above file permissions not to change to root in the first place.
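As a possible stop-gap (an assumption on my side, not something the repo documents): start the container with the host user's UID/GID so that files written by the hooks stay owned by the user; the image tag and mount path below mirror the ones used elsewhere in these reports.
docker run --rm -it --user "$(id -u):$(id -g)" -v "$PWD":/home/dgp dgp:latest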
Hi, thanks for your great work!
I found that there are no 3D annotations in the dataset. Did I miss something?
Hello,
I am unable to download the ParallelDomain GUDA dataset from the provided link (https://paralleldomain.com/public-datasets).
The link to the dataset appears to be broken/incorrect.
I use: curl -s https://tri-ml-public.s3.amazonaws.com/github/vidar/datasets/PD_guda.tar | tar xv -C vidar/
The PD GUDA data should be downloadable from the link provided on this page: https://paralleldomain.com/public-datasets
Thanks for your great work! Would you release LiDAR labels and 3D bounding box annotations for all scenes in the future?
Hello!
When installing
https://github.com/TRI-ML/packnet-sfm
I got the error:
Step 38/47 : RUN git clone https://github.com/TRI-ML/dgp.git && cd dgp && pip3 install -r requirements.txt
---> Running in 8fe2dbd1205b
Cloning into 'dgp'...
Requirement already satisfied: torch==1.4.0 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 17)) (1.4.0)
Requirement already satisfied: torchvision==0.5.0 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 18)) (0.5.0)
Collecting attrs==19.1.0
Downloading attrs-19.1.0-py2.py3-none-any.whl (35 kB)
Collecting awscli==1.16.192
Downloading awscli-1.16.192-py2.py3-none-any.whl (1.7 MB)
Requirement already satisfied: docutils>=0.10 in /usr/local/lib/python3.6/dist-packages (from awscli==1.16.192->-r requirements.txt (line 2)) (0.15.2)
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of attrs to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 2) and botocore==1.12.79 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
The conflict is caused by:
The user requested botocore==1.12.79
awscli 1.16.192 depends on botocore==1.12.182
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
The command '/bin/bash -cu git clone https://github.com/TRI-ML/dgp.git && cd dgp && pip3 install -r requirements.txt' returned a non-zero code: 1
So it looks like this needs a bit of a fix!
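Based on the resolver output above, one possible fix (a sketch under the assumption that nothing else in requirements.txt constrains botocore differently) is to align the botocore pin with what awscli 1.16.192 declares:
# requirements.txt (relevant lines only)
awscli==1.16.192
botocore==1.12.182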
The build-docker workflow fails on master with
#4 [internal] load metadata for docker.io/nvidia/cuda:11.1-devel-ubuntu18.04
#4 ERROR: docker.io/nvidia/cuda:11.1-devel-ubuntu18.04: not found
------
> [internal] load metadata for docker.io/nvidia/cuda:11.1-devel-ubuntu18.04:
------
error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: docker.io/nvidia/cuda:11.1-devel-ubuntu18.04: not found
Error: buildx failed with: error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: docker.io/nvidia/cuda:11.1-devel-ubuntu18.04: not found
Run the build-docker workflow on master manually or trigger it via a merge to master.
Update the base image in our Dockerfile.
Formerly:
FROM nvidia/cuda:11.1-devel-ubuntu18.04
I don't see this tag in the Docker registry any more; I suspect it was renamed or replaced with an alternative.
Fix:
FROM nvidia/cuda:11.1.1-devel-ubuntu18.04
The new image is here.
DGP features various linters: pylint, YAPF, SuperLinter, and even a commit linter (CI-only?). To run all linters, users have to do so manually or attempt a commit locally to trigger the git hooks.
While this introduces yet another tool, let's use something like pre-commit to manage our various linting tools. pre-commit provides a common entrypoint to containerized linting, making it easy to run the same checks that we use in CI locally so that people can fix linting issues before going to CI. We could configure and use (most of?) the same linters we currently use, and we'd remove .githooks/ entirely.
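A minimal .pre-commit-config.yaml sketch of what this could look like; the rev values and the exact hook selection are assumptions, not a settled proposal:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0          # assumed pin
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/pre-commit/mirrors-yapf
    rev: v0.32.0         # assumed pin
    hooks:
      - id: yapf
  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
Developers would then run pre-commit install once, and pre-commit run --all-files reproduces the CI checks locally.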
At the time of this writing, make test fails in the Docker environment after following the getting-started instructions.
docker pull ghcr.io/tri-ml/dgp:master
docker image tag ghcr.io/tri-ml/dgp:master dgp:latest
make docker-start-interactive
# either of the following fails
make build-proto
make test
For example, make test fails with:
root@hostname:/home/dgp# make test
python3 setup.py clean && \
rm -rf build dist && \
find . -name "*.pyc" | xargs rm -f && \
find . -name "__pycache__" | xargs rm -rf
Traceback (most recent call last):
File "setup.py", line 5, in <module>
from setuptools import find_packages, setup
ModuleNotFoundError: No module named 'setuptools'
Makefile:33: recipe for target 'clean' failed
make: *** [clean] Error 1
python3 is actually 3.6.9:
root@hostname:/home/dgp# python3
Python 3.6.9 (default, Jun 29 2022, 11:45:57)
but setuptools is installed for 3.7:
root@hostname:/home/dgp# pip show setuptools
Name: setuptools
Version: 63.2.0
Summary: Easily download, build, install, upgrade, and uninstall Python packages
Home-page: https://github.com/pypa/setuptools
Author: Python Packaging Authority
Author-email: [email protected]
License:
Location: /usr/local/lib/python3.7/dist-packages
Requires:
Required-by: astroid, grpcio-tools, xarray
Two solutions: one is to not install Python 3.6 at all so that python3 symlinks to 3.7; another is to update the Makefile to use python3.7 specifically.
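A minimal sketch of the second option, assuming the clean target looks roughly like the recipe echoed above (the variable name is illustrative):
# Makefile (sketch)
PYTHON ?= python3.7

clean:
	$(PYTHON) setup.py clean && \
	rm -rf build dist && \
	find . -name "*.pyc" | xargs rm -f && \
	find . -name "__pycache__" | xargs rm -rf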
Are there plans to publish the DDAD15M dataset?
Since 'key_line_2d' is not defined in ONTOLOGY_REGISTRY, an exception is raised when instantiating FrameSceneDataset():
FrameSceneDataset(
/usr/local/lib/python3.8/dist-packages/dgp/datasets/frame_dataset.py:211: in __init__
dataset_metadata = DatasetMetadata.from_scene_containers(scenes, requested_annotations, requested_autolabels)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'dgp.datasets.base_dataset.DatasetMetadata'>
scene_containers = [SceneContainer[<path_to_scene>][Samples: 100], SceneContainer[<path_to_scene>][Samples: 100], SceneContainer[<path_to_scene>][Samples: 100], ...]
requested_annotations = ['key_line_2d'], requested_autolabels = []
@classmethod
def from_scene_containers(cls, scene_containers, requested_annotations=None, requested_autolabels=None):
"""Load DatasetMetadata from Scene Dataset JSON.
Parameters
----------
scene_containers: list of SceneContainer
List of SceneContainer objects.
requested_annotations: List(str)
List of annotations, such as ['bounding_box_3d', 'bounding_box_2d']
requested_autolabels: List(str)
List of autolabels, such as['model_a/bounding_box_3d', 'model_a/bounding_box_2d']
"""
assert len(scene_containers), 'SceneContainers is empty.'
requested_annotations = [] if requested_annotations is None else requested_annotations
requested_autolabels = [] if requested_autolabels is None else requested_autolabels
if not requested_annotations and not requested_autolabels:
# Return empty ontology table
return cls(scene_containers, directory=os.path.dirname(scene_containers[0].directory), ontology_table={})
# For each annotation type, we enforce a consistent ontology across the
# dataset (i.e. 2 different `bounding_box_3d` ontologies are not
# permitted). However, an autolabel may support a different ontology
# for the same annotation type. For example, the following
# ontology_table is valid:
# {
# "bounding_box_3d": BoundingBoxOntology,
# "bounding_box_2d": BoundingBoxOntology,
# "my_autolabel_model/bounding_box_3d": BoundingBoxOntology
# }
dataset_ontology_table = {}
logging.info('Building ontology table.')
st = time.time()
# Determine scenes with unique ontologies based on the ontology file basename.
unique_scenes = {
os.path.basename(f): scene_container
for scene_container in scene_containers
for _, _, filenames in os.walk(os.path.join(scene_container.directory, ONTOLOGY_FOLDER)) for f in filenames
}
# Parse through relevant scenes that have unique ontology keys.
for _, scene_container in unique_scenes.items():
for ontology_key, ontology_file in scene_container.ontology_files.items():
# Keys in `ontology_files` may correspond to autolabels,
# so we strip those prefixes when instantiating `Ontology` objects
_autolabel_model, annotation_key = os.path.split(ontology_key)
# Look up ontology for specific annotation type
if annotation_key in ONTOLOGY_REGISTRY:
# Skip if we don't require this annotation/autolabel
if _autolabel_model:
if ontology_key not in requested_autolabels:
continue
else:
if annotation_key not in requested_annotations:
continue
ontology_spec = ONTOLOGY_REGISTRY[annotation_key]
# No need to add ontology-less tasks to the ontology table.
if ontology_spec is None:
continue
# If ontology and key have not been added to the table, add it.
if ontology_key not in dataset_ontology_table:
dataset_ontology_table[ontology_key] = ontology_spec.load(ontology_file)
# If we've already loaded an ontology for this annotation type, make sure other scenes have the same ontology
else:
assert dataset_ontology_table[ontology_key] == ontology_spec.load(
ontology_file
), "Inconsistent ontology for key {}.".format(ontology_key)
# In case an ontology type is not implemented yet
else:
> raise Exception(f"Ontology for key {ontology_key} not found in registry!")
E Exception: Ontology for key key_line_2d not found in registry!
/usr/local/lib/python3.8/dist-packages/dgp/datasets/base_dataset.py:592: Exception
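A possible workaround sketch until the registry supports this key natively; the import paths and the KeyLineOntology name are assumptions about the installed dgp version (they may not exist in older releases), so treat this as illustrative only:
# register an ontology handler for key_line_2d before building the dataset
from dgp.annotations import ONTOLOGY_REGISTRY
from dgp.annotations.ontology import KeyLineOntology  # hypothetical/assumed class

ONTOLOGY_REGISTRY['key_line_2d'] = KeyLineOntology
# ... then instantiate FrameSceneDataset(...) exactly as before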