geo_split's Introduction

Localization Is All You Evaluate

Data Leakage in Online Mapping Datasets and How to Fix It

CVPR 2024

The state-of-the-art methods for online mapping are based on supervised learning and are trained predominantly on two datasets: nuScenes and Argoverse 2. These datasets revisit the same geographic locations across the training, validation, and test sets, which inflates the reported performance numbers.

Specifically, over $80$% of nuScenes and $40$% of Argoverse 2 validation and test samples are located less than $5$ m from a training sample. The figure below displays an example of this, where three samples from the nuScenes dataset are highlighted. Despite being from different sets, the samples are situated in the same geographic location.
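The overlap statistic above can be reproduced in spirit with a simple distance check. The following is a minimal sketch, not the code used in the paper: the function name `fraction_near_train`, the brute-force search, and the toy coordinates are all assumptions made for illustration, and positions are taken to be 2D coordinates in metres.

```python
import math


def fraction_near_train(train_xy, eval_xy, radius_m=5.0):
    """Return the fraction of eval positions that have at least one
    training position strictly closer than radius_m (brute force)."""
    if not eval_xy:
        return 0.0
    near = 0
    for ex, ey in eval_xy:
        if any(math.hypot(ex - tx, ey - ty) < radius_m for tx, ty in train_xy):
            near += 1
    return near / len(eval_xy)


# Toy positions in metres, made up for illustration only.
train = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
val = [(1.0, 1.0), (10.0, 3.0), (50.0, 50.0), (24.0, 4.0)]

print(fraction_near_train(train, val))  # two of the four val samples are within 5 m
```

For full datasets a spatial index (for example a k-d tree) would be a more practical choice than the quadratic loop shown here.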

In our paper, Localization Is All You Evaluate, we propose splitting the nuScenes and Argoverse 2 datasets by the samples' positions, yielding Geographically Disjoint splits. This repository contains the proposed Near Extrapolation and Far Extrapolation splits and the code used to generate them.

We also release examples of how to convert the Original split pickle files of SOTA online mapping methods to Geographically Disjoint split pickle files.

nuScenes Near Extrapolation Splits

Argoverse 2 Near Extrapolation Splits

Usage

You can use the proposed splits to train and evaluate the performance of online mapping methods directly.

The Geographically Disjoint splits are defined in txt files (pkl files are also provided for convenience) under /near_extrapolation_splits and /far_extrapolation_splits respectively.

For the nuScenes Near Extrapolation splits there are two versions:

1 - near_extrapolation_splits/nuscenes/samples: all samples are used, and sequences that straddle a set boundary are split into two parts, each assigned to its respective set (see paper for details). The split files list the set assignment of every individual sample.

2 - near_extrapolation_splits/nuscenes/scenes: sequences that straddle a set boundary are removed. The split files list the scene names assigned to each set.
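The difference between the two versions can be illustrated with a small sketch. It assumes each sample is described by a hypothetical `(scene_name, sample_token, set_name)` tuple; the function names and the toy data are illustrative only and do not reflect the on-disk format of the split files.

```python
from collections import defaultdict


def split_by_samples(assignments):
    """'samples' version: keep every sample. A scene that crosses a set
    boundary contributes samples to more than one set."""
    out = defaultdict(list)
    for _scene, token, set_name in assignments:
        out[set_name].append(token)
    return dict(out)


def split_by_scenes(assignments):
    """'scenes' version: drop any scene whose samples fall into more than
    one set, and list the remaining scene names per set."""
    sets_per_scene = defaultdict(set)
    for scene, _token, set_name in assignments:
        sets_per_scene[scene].add(set_name)
    out = defaultdict(list)
    for scene, names in sets_per_scene.items():
        if len(names) == 1:
            out[next(iter(names))].append(scene)
    return dict(out)


# Toy assignments: scene-0002 straddles the train/val boundary.
assignments = [
    ("scene-0001", "a", "train"),
    ("scene-0001", "b", "train"),
    ("scene-0002", "c", "train"),
    ("scene-0002", "d", "val"),
    ("scene-0003", "e", "val"),
]

print(split_by_samples(assignments))
print(split_by_scenes(assignments))
```

In the toy data, the samples version keeps all five samples, while the scenes version discards scene-0002 entirely.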

For Far Extrapolation splits the name of the file indicates the city and the set. E.g. singapore.txt contains the scenes from Singapore and PIT+MIA.txt contains the log ids for Pittsburgh and Miami.

Create/Verify Geographical Splits and results in paper

If you want to verify the Geographically Disjoint splits, you can install the required packages and run the accompanying code as follows:

Install

conda create --name geosplits python=3.8
conda activate geosplits
pip install -r requirements.txt

Download data

Download according to the instructions in the respective repositories:

Generate Geographical Splits

Create nuScenes splits:

python src/nuscenes/generate_geo_split.py --data_dir /home/sun/MapTR/data/nuscenes 

Create Argoverse 2 splits:

python src/argoverse2/generate_geo_split.py --data_dir /media/sun/z/argoverse2/sensor

Generate original pkl files using the method of your choice

Generate the necessary dataset pkls following the instructions in the respective repositories:

Convert pickle files from a method to geographically disjoint split pkls

Convert the dataset pkl files you generated in the previous step to geographically disjoint split pkls:

python src/nuscenes/convert_pkls.py --method my-selected-method --pkl_dir /path/to/pkls/folder/of/my/selected/method --output_dir /path/to/output
python src/nuscenes/convert_pkls.py --method maptrv2 --pkl_dir /home/sun/MapTR/data/nuscenes/ --out_dir /home/sun/MapTR/data/nuscenes/ 
python src/argoverse2/convert_pkls.py --method my-selected-method --pkl_dir /path/to/pkls/folder/of/my/selected/method --output_dir /path/to/output 
python src/argoverse2/convert_pkls.py --method maptrv2 --pkl_dir /home/sun/MapTR/data/argoverse2/sensor --out_dir /home/sun/MapTR/data/argoverse2/sensor
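Conceptually, the conversion pools the entries of the original pkl files and redistributes them according to the new set assignment. The sketch below illustrates that idea only; it is not the logic of convert_pkls.py. It assumes each entry is a dict carrying a `token` key and that the new split is available as a token-to-set mapping. Both are assumptions about the data layout, and the name `regroup_infos` is hypothetical.

```python
def regroup_infos(original_infos, token_to_set):
    """Pool entries from all original sets and redistribute them by token.

    original_infos: dict mapping original set name -> list of entry dicts
    token_to_set:   dict mapping sample token -> new set name
    """
    regrouped = {}
    for infos in original_infos.values():
        for info in infos:
            set_name = token_to_set.get(info["token"])
            if set_name is None:
                continue  # token has no assignment in the new split; skip it
            regrouped.setdefault(set_name, []).append(info)
    return regrouped


# Toy data, made up for illustration.
original = {
    "train": [{"token": "a"}, {"token": "b"}],
    "val": [{"token": "c"}, {"token": "d"}],
}
new_assignment = {"a": "train", "b": "val", "c": "test"}

print(regroup_infos(original, new_assignment))
```

Entries without an assignment (token "d" in the toy data) are dropped, which mirrors how removed sequences would disappear from the converted files.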

The '--og_pkl_name' argument can be used to specify the base name of the original pkl files. E.g. the default for nuscenes is 'nuscenes_map_infos_temporal' and then '_train', '_val', '_test' will be appended to the base name to find the original pkl files.
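The naming rule described above can be sketched as follows. The helper `original_pkl_paths` is hypothetical, and the `.pkl` file extension is an assumption; only the default base name and the three suffixes come from the description above.

```python
from pathlib import Path


def original_pkl_paths(pkl_dir, og_pkl_name="nuscenes_map_infos_temporal",
                       suffixes=("_train", "_val", "_test"), ext=".pkl"):
    """Build the expected original pkl paths from a base name.

    The '.pkl' extension is assumed, not confirmed by the scripts.
    """
    return [Path(pkl_dir) / f"{og_pkl_name}{suffix}{ext}" for suffix in suffixes]


for path in original_pkl_paths("/path/to/pkls"):
    print(path.name)
```

Passing a different `og_pkl_name` changes only the base name; the set suffixes are appended in the same way.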

Train & Evaluate

Follow the instructions in the respective repositories for training and evaluation. Simply replace the path to the original pkl files with the geographical split pkls you created above.

Contributors

liljaadam, kinjonun
