
OpenTAD's Issues

Proposal Generation Only

Hello, I am interested in performing only temporal action proposal generation (class-agnostic localization) and do not want to classify each proposal. Could you please guide me on how to modify the code accordingly?
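One route, sketched here under the assumption that the detector head's num_classes field behaves as in the configs quoted later in this thread (this is a sketch, not an official OpenTAD recipe), is to collapse the head to a single foreground class:

# Hedged sketch: a single "action" class turns the detector into a
# class-agnostic proposal generator; per-class scoring is skipped.
model = dict(
    rpn_head=dict(
        num_classes=1,  # one class-agnostic "action" category
    ),
)

All ground-truth labels would then need to be mapped to class 0 at loading time, and post-processing would rank proposals by the single class score.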

AdaTAD License question

Does the Apache License apply to all models in the repo, and specifically does it apply to the AdaTAD model?
Thanks!!

EPIC-Kitchens-100 eval is NaN


#===========================================================================================================
annotation_path = "train_code/OpenTAD/data/epic_kitchens-100/annotations/epic_kitchens_verb.json"
class_map = "train_code/OpenTAD/data/epic_kitchens-100/annotations/category_idx_verb.txt"
data_path = "data/video_dataset/EPIC-Kitchens100/epic_kitchens_100_30fps_512x288/"
block_list = None

window_size = 768*8
scale_factor = 1
chunk_num = window_size * scale_factor // 16
# (768*8)/16 = 384 chunks, since VideoMAE takes 16 frames as input

# ==================================================
# Video Preprocessing: Sliding Window
# Frame Stride: 2
# Frame Number: 768×8
# ==================================================

dataset = dict(
    train=dict(
        type="EpicKitchensSlidingDataset",
        ann_file=annotation_path,
        subset_name="training",
        block_list=block_list,
        class_map=class_map,
        data_path=data_path,
        filter_gt=False,
        # ==================================================
        feature_stride=2,
        sample_stride=1,
        fps=30,
        offset_frames=8,
        window_size=window_size,
        window_overlap_ratio=0.5,
        # ==================================================
        pipeline=[
            dict(type="PrepareVideoInfo", format="mp4"),
            dict(type="mmaction.DecordInit", num_threads=4),
            dict(type="LoadFrames", 
                 num_clips=1, 
                 method="sliding_window", 
                 scale_factor=scale_factor),
            dict(type="mmaction.DecordDecode"),
            # ================================================================
            # Frame Resolution: 160×160
            # RandomResizedCrop + Flip + ImgAug + ColorJitter
            # ================================================================
            dict(type="mmaction.Resize", scale=(-1, 182)),
            dict(type="mmaction.RandomResizedCrop"),
            dict(type="mmaction.Resize", scale=(160, 160), keep_ratio=False),
            dict(type="mmaction.Flip", flip_ratio=0.5),
            dict(type="mmaction.ImgAug", transforms="default"),
            dict(type="mmaction.ColorJitter"),
            dict(type="mmaction.FormatShape", input_format="NCTHW"),
            dict(type="ConvertToTensor", keys=["imgs", "gt_segments", "gt_labels"]),
            dict(type="Collect", inputs="imgs", keys=["masks", "gt_segments", "gt_labels"]),
        ],
    ),
    val=dict(
        type="EpicKitchensSlidingDataset",
        ann_file=annotation_path,
        subset_name="val",
        block_list=block_list,
        class_map=class_map,
        data_path=data_path,
        filter_gt=False,
        # ==================================================
        feature_stride=2,
        sample_stride=1,
        fps=30,
        offset_frames=8,
        window_size=window_size,
        window_overlap_ratio=0.5,
        # ==================================================
        pipeline=[
            dict(type="PrepareVideoInfo", format="mp4"),
            dict(type="mmaction.DecordInit", num_threads=4),
            dict(type="LoadFrames", 
                 num_clips=1, 
                 method="sliding_window", 
                 scale_factor=scale_factor),
            dict(type="mmaction.DecordDecode"),
            dict(type="mmaction.Resize", scale=(-1, 160)),
            dict(type="mmaction.CenterCrop", crop_size=160),
            dict(type="mmaction.FormatShape", input_format="NCTHW"),
            dict(type="ConvertToTensor", keys=["imgs", "gt_segments", "gt_labels"]),
            dict(type="Collect", inputs="imgs", keys=["masks", "gt_segments", "gt_labels"]),
        ],
    ),
    test=dict(
        type="EpicKitchensSlidingDataset",
        ann_file=annotation_path,
        subset_name="val",
        block_list=block_list,
        class_map=class_map,
        data_path=data_path,
        filter_gt=False,
        # ==================================================
        test_mode=True,
        feature_stride=2,
        sample_stride=1,
        fps=30,
        offset_frames=8,
        window_size=window_size,
        window_overlap_ratio=0.5,
        # ==================================================
        pipeline=[
            dict(type="PrepareVideoInfo", format="mp4"),
            dict(type="mmaction.DecordInit", num_threads=4),
            dict(type="LoadFrames", 
                 num_clips=1, 
                 method="sliding_window", 
                 scale_factor=scale_factor),
            dict(type="mmaction.DecordDecode"),
            dict(type="mmaction.Resize", scale=(-1, 160)),
            dict(type="mmaction.CenterCrop", crop_size=160),
            dict(type="mmaction.FormatShape", input_format="NCTHW"),
            dict(type="ConvertToTensor", keys=["imgs"]),
            dict(type="Collect", inputs="imgs", keys=["masks"]),
        ],
    ),
)


evaluation = dict(
    type="mAP",
    subset="validation",
    tiou_thresholds=[0.3, 0.4, 0.5, 0.6, 0.7],
    ground_truth_filename=annotation_path,
)


#===========================================================================================================
_base_ = [
    "/mnt2/ninghuayang/train_code/OpenTAD/configs/_base_/models/actionformer.py",
]
model = dict(
    backbone=dict(
        type="mmaction.Recognizer3D",
        backbone=dict(
            type="VisionTransformerAdapter",
            img_size=224,
            patch_size=16,
            embed_dims=1024,
            depth=24,
            num_heads=16,
            mlp_ratio=4,
            qkv_bias=True,
            num_frames=16,
            drop_path_rate=0.1,
            norm_cfg=dict(type="LN", eps=1e-6),
            return_feat_map=True,
            with_cp=True,  # enable activation checkpointing
            total_frames=window_size * scale_factor,
            adapter_index=list(range(24)),
        ),
        data_preprocessor=dict(
            type="mmaction.ActionDataPreprocessor",
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            format_shape="NCTHW",
        ),
        custom=dict(
            pretrain="pretrained/vit-large-p16_videomae-k400-pre_16x4x1_kinetics-400_20221013-229dbb03.pth",
            pre_processing_pipeline=[
                dict(type="Rearrange", 
                     keys=["frames"], 
                     ops="b n c (t1 t) h w -> (b t1) n c t h w", 
                     t1=chunk_num),
            ],
            post_processing_pipeline=[
                # =========================
                # Spatial Average Pooling + Resize
                # Feature Resize Length: 768
                # =========================
                dict(type="Reduce", 
                     keys=["feats"], 
                     ops="b n c t h w -> b c t", 
                     reduction="mean"),
                dict(type="Rearrange", 
                     keys=["feats"], 
                     ops="(b t1) c t -> b c (t1 t)", 
                     t1=chunk_num),
                dict(type="Interpolate", 
                     keys=["feats"], 
                     size=768),
            ],
            norm_eval=False,  # also update the norm layers
            freeze_backbone=False,  # unfreeze the backbone
        ),
    ),
    projection=dict(
        in_channels=1024,
        max_seq_len=768,
        attn_cfg=dict(n_mha_win_size=9),
    ),
    rpn_head=dict(
        num_classes=97,
        prior_generator=dict(
            strides=[1, 2, 4, 8, 16, 32],
            regression_range=[(0, 4), (2, 8), (4, 16), (8, 32), (16, 64), (32, 10000)],
        ),
        loss_normalizer=250,
    ),
)

#=============
#Batch Size 2
#=============
solver = dict(
    train=dict(batch_size=2, num_workers=4),
    val=dict(batch_size=2, num_workers=2),
    test=dict(batch_size=2, num_workers=2),
    clip_grad_norm=1,
    amp=True,
    fp16_compress=True,
    static_graph=True,
    ema=True,
)

optimizer = dict(
    type="AdamW",
    lr=1e-4,
    weight_decay=0.05,
    paramwise=True,
    backbone=dict(
        lr=0,
        weight_decay=0,
        custom=[dict(name="adapter", lr=1e-4, weight_decay=0.05)],
        exclude=["backbone"],
    ),
)
scheduler = dict(type="LinearWarmupCosineAnnealingLR", 
                 #=============
                 #Warmup Epoch 5
                 #=============
                 warmup_epoch=5, 
                 max_epoch=20)

inference = dict(load_from_raw_predictions=False, save_raw_prediction=False)
post_processing = dict(
    pre_nms_topk=5000,
    nms=dict(
        use_soft_nms=True,
        sigma=0.4,
        max_seg_num=2000,
        iou_threshold=0,  # does not matter when using soft NMS
        min_score=0.001,
        multiclass=True,
        voting_thresh=0.75,  #  set 0 to disable
    ),
    save_dict=False,
)

#=============
#Total Epoch 60
#=============
workflow = dict(
    logging_interval=10,
    checkpoint_interval=2,
    val_loss_interval=-1,
    val_eval_interval=2,
    val_start_epoch=2,
    end_epoch=60,
)

work_dir = "exps"
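A NaN at evaluation time is often an AMP overflow in the long end-to-end backbone rather than an evaluator bug; a first experiment is setting amp=False in the solver above. Also note that this config interpolates features to a fixed size=768 and sets max_seq_len=768 while window_size is 768*8, so the window-level lengths are worth verifying. If the NaN persists, a hedged sketch like the one below (generic PyTorch, not OpenTAD's API; the forward call must be adapted to the codebase) can locate the first non-finite output:

# Hedged debugging sketch: report the first model output containing NaN/Inf.
import torch

def find_first_nonfinite(model, dataloader):
    model.eval()
    with torch.no_grad():
        batch = next(iter(dataloader))
        outputs = model(batch)  # adapt to the model's actual call signature
        for name, value in (outputs.items() if isinstance(outputs, dict) else []):
            if torch.is_tensor(value) and not torch.isfinite(value).all():
                return name  # first offending tensor
    return None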

Cannot use gdown to download Anet raw video data

When I use the following command to download _Anet_videos_15fps_short256.zip from Google Drive:

gdown [download link]

I get this error:

Failed to retrieve file url:

        Cannot retrieve the public link of the file. You may need to change
        the permission to 'Anyone with the link', or have had many accesses.
        Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.

You may still be able to access the file from the browser:

        [download link]

but Gdown can't. Please check connections and permissions.

Could you please change the permission to 'Anyone with the link'?
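For reference, once sharing is set to 'Anyone with the link', the download should also work through gdown's Python API (a sketch; FILE_ID is a placeholder for the ID embedded in the share link):

# Sketch: download a public Google Drive file via gdown's Python API.
import gdown

# FILE_ID is a placeholder; fuzzy=True also lets gdown extract the ID
# from a full .../file/d/FILE_ID/view share URL.
gdown.download(
    "https://drive.google.com/uc?id=FILE_ID",
    output="Anet_videos_15fps_short256.zip",
    quiet=False,
    fuzzy=True,
)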

AdaTAD Training Resume

Hi Shuming,

Will the training automatically resume if I kill the process and run it again?

Thanks.
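In case it helps while waiting for an answer, the generic pattern is to restore the newest checkpoint found in work_dir. This is a hedged sketch, not OpenTAD's actual trainer code, and the checkpoint key names are assumptions:

# Hedged sketch of auto-resume logic; not OpenTAD's implementation.
import glob
import os

import torch

def resume_if_possible(model, optimizer, work_dir):
    """Restore the newest checkpoint in work_dir; return the next epoch index."""
    ckpts = glob.glob(os.path.join(work_dir, "*.pth"))
    if not ckpts:
        return 0  # nothing to resume from
    state = torch.load(max(ckpts, key=os.path.getmtime), map_location="cpu")
    model.load_state_dict(state["state_dict"])    # key names are assumptions
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1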

EPIC-Kitchens-100 OpenTAD RuntimeError: CUDA error: device-side assert triggered

#===========================================================================================================
annotation_path = "train_code/OpenTAD/data/epic_kitchens-100/annotations/epic_kitchens_verb.json"
class_map = "train_code/OpenTAD/data/epic_kitchens-100/annotations/category_idx_verb.txt"
data_path = "data/video_dataset/EPIC-Kitchens100/epic_kitchens_100_30fps_512x288/"
block_list = None

window_size = 768*8
scale_factor = 1
chunk_num = window_size * scale_factor // 16
# (768*8)/16 = 384 chunks, since VideoMAE takes 16 frames as input

# ==================================================
# Video Preprocessing: Sliding Window
# Frame Stride: 2
# Frame Number: 768×8
# ==================================================

dataset = dict(
    train=dict(
        type="EpicKitchensSlidingDataset",
        ann_file=annotation_path,
        subset_name="training",
        block_list=block_list,
        class_map=class_map,
        data_path=data_path,
        filter_gt=False,
        # ==================================================
        feature_stride=2,
        sample_stride=1,
        fps=30,
        offset_frames=8,
        window_size=window_size,
        window_overlap_ratio=0.5,
        # ==================================================
        pipeline=[
            dict(type="PrepareVideoInfo", format="mp4"),
            dict(type="mmaction.DecordInit", num_threads=4),
            dict(type="LoadFrames", 
                 num_clips=1, 
                 method="sliding_window", 
                 scale_factor=scale_factor),
            dict(type="mmaction.DecordDecode"),
            # ================================================================
            # Frame Resolution: 160×160
            # RandomResizedCrop + Flip + ImgAug + ColorJitter
            # ================================================================
            dict(type="mmaction.Resize", scale=(-1, 182)),
            dict(type="mmaction.RandomResizedCrop"),
            dict(type="mmaction.Resize", scale=(160, 160), keep_ratio=False),
            dict(type="mmaction.Flip", flip_ratio=0.5),
            dict(type="mmaction.ImgAug", transforms="default"),
            dict(type="mmaction.ColorJitter"),
            dict(type="mmaction.FormatShape", input_format="NCTHW"),
            dict(type="ConvertToTensor", keys=["imgs", "gt_segments", "gt_labels"]),
            dict(type="Collect", inputs="imgs", keys=["masks", "gt_segments", "gt_labels"]),
        ],
    ),
    val=dict(
        type="EpicKitchensSlidingDataset",
        ann_file=annotation_path,
        subset_name="val",
        block_list=block_list,
        class_map=class_map,
        data_path=data_path,
        filter_gt=False,
        # ==================================================
        test_mode=True,
        feature_stride=2,
        sample_stride=1,
        fps=30,
        offset_frames=8,
        window_size=window_size,
        window_overlap_ratio=0.5,
        # ==================================================
        pipeline=[
            dict(type="PrepareVideoInfo", format="mp4"),
            dict(type="mmaction.DecordInit", num_threads=4),
            dict(type="LoadFrames", 
                 num_clips=1, 
                 method="sliding_window", 
                 scale_factor=scale_factor),
            dict(type="mmaction.DecordDecode"),
            dict(type="mmaction.Resize", scale=(-1, 160)),
            dict(type="mmaction.CenterCrop", crop_size=160),
            dict(type="mmaction.FormatShape", input_format="NCTHW"),
            dict(type="ConvertToTensor", keys=["imgs", "gt_segments", "gt_labels"]),
            dict(type="Collect", inputs="imgs", keys=["masks", "gt_segments", "gt_labels"]),
        ],
    ),
    test=dict(
        type="EpicKitchensSlidingDataset",
        ann_file=annotation_path,
        subset_name="val",
        block_list=block_list,
        class_map=class_map,
        data_path=data_path,
        filter_gt=False,
        test_mode=True,
        # ==================================================
        feature_stride=2,
        sample_stride=1,
        fps=30,
        offset_frames=8,
        window_size=window_size,
        window_overlap_ratio=0.5,
        # ==================================================
        pipeline=[
            dict(type="PrepareVideoInfo", format="mp4"),
            dict(type="mmaction.DecordInit", num_threads=4),
            dict(type="LoadFrames", 
                 num_clips=1, 
                 method="sliding_window", 
                 scale_factor=scale_factor),
            dict(type="mmaction.DecordDecode"),
            dict(type="mmaction.Resize", scale=(-1, 160)),
            dict(type="mmaction.CenterCrop", crop_size=160),
            dict(type="mmaction.FormatShape", input_format="NCTHW"),
            dict(type="ConvertToTensor", keys=["imgs"]),
            dict(type="Collect", inputs="imgs", keys=["masks"]),
        ],
    ),
)


evaluation = dict(
    type="mAP",
    subset="validation",
    tiou_thresholds=[0.3, 0.4, 0.5, 0.6, 0.7],
    ground_truth_filename=annotation_path,
)


#===========================================================================================================
_base_ = [
    "/mnt2/ninghuayang/train_code/OpenTAD/configs/_base_/models/actionformer.py",
]
model = dict(
    backbone=dict(
        type="mmaction.Recognizer3D",
        backbone=dict(
            type="VisionTransformerAdapter",
            img_size=224,
            patch_size=16,
            embed_dims=1024,
            depth=24,
            num_heads=16,
            mlp_ratio=4,
            qkv_bias=True,
            num_frames=16,
            drop_path_rate=0.1,
            norm_cfg=dict(type="LN", eps=1e-6),
            return_feat_map=True,
            with_cp=True,  # enable activation checkpointing
            total_frames=window_size * scale_factor,
            adapter_index=list(range(24)),
        ),
        data_preprocessor=dict(
            type="mmaction.ActionDataPreprocessor",
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            format_shape="NCTHW",
        ),
        custom=dict(
            pretrain="pretrained/vit-large-p16_videomae-k400-pre_16x4x1_kinetics-400_20221013-229dbb03.pth",
            pre_processing_pipeline=[
                dict(type="Rearrange", 
                     keys=["frames"], 
                     ops="b n c (t1 t) h w -> (b t1) n c t h w", 
                     t1=chunk_num),
            ],
            post_processing_pipeline=[
                # =========================
                # Spatial Average Pooling + Resize
                # Feature Resize Length: window_size (768×8)
                # =========================
                dict(type="Reduce", 
                     keys=["feats"], 
                     ops="b n c t h w -> b c t", 
                     reduction="mean"),
                dict(type="Rearrange", 
                     keys=["feats"], 
                     ops="(b t1) c t -> b c (t1 t)", 
                     t1=chunk_num),
                dict(type="Interpolate", 
                     keys=["feats"], 
                     size=window_size),
            ],
            norm_eval=False,  # also update the norm layers
            freeze_backbone=False,  # unfreeze the backbone
        ),
    ),
    projection=dict(
        in_channels=1024,
        max_seq_len=window_size,
        attn_cfg=dict(n_mha_win_size=-1),
    ),
)

#=============
#Batch Size 2
#=============
solver = dict(
    train=dict(batch_size=2, num_workers=2),
    val=dict(batch_size=2, num_workers=2),
    test=dict(batch_size=2, num_workers=2),
    clip_grad_norm=1,
    amp=True,
    fp16_compress=True,
    static_graph=True,
    ema=True,
)

optimizer = dict(
    type="AdamW",
    lr=1e-4,
    weight_decay=0.05,
    paramwise=True,
    backbone=dict(
        lr=0,
        weight_decay=0,
        custom=[dict(name="adapter", lr=1e-4, weight_decay=0.05)],
        exclude=["backbone"],
    ),
)
scheduler = dict(type="LinearWarmupCosineAnnealingLR", 
                 #=============
                 #Warmup Epoch 5
                 #=============
                 warmup_epoch=5, 
                 max_epoch=100)

inference = dict(load_from_raw_predictions=False, save_raw_prediction=False)
post_processing = dict(
    nms=dict(
        use_soft_nms=True,
        sigma=0.7,
        max_seg_num=2000,
        multiclass=True,
        voting_thresh=0.7,  #  set 0 to disable
    ),
    save_dict=False,
)

#=============
#Total Epoch 60
#=============
workflow = dict(
    logging_interval=50,
    checkpoint_interval=2,
    val_loss_interval=-1,
    val_eval_interval=2,
    val_start_epoch=10,
    end_epoch=60,
)

work_dir = "exps/thumos/adatad/e2e_actionformer_videomae_s_768x1_160_adapter"
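A device-side assert in a detection config is very often an out-of-range class index. Note that, unlike the first config in this thread, this one does not override rpn_head.num_classes, so it inherits whatever the base actionformer.py sets, while the EPIC verb taxonomy has 97 classes. A quick hedged check (the annotation layout below, database -> annotations -> label_id, is an assumption based on this config):

# Hedged sanity check: every ground-truth label must be < num_classes.
import json

NUM_CLASSES = 97  # must match the rpn_head.num_classes actually in effect
path = "train_code/OpenTAD/data/epic_kitchens-100/annotations/epic_kitchens_verb.json"
with open(path) as f:
    database = json.load(f)["database"]  # layout is an assumption

bad = [
    (vid, ann) for vid, video in database.items()
    for ann in video.get("annotations", [])
    if not 0 <= int(ann["label_id"]) < NUM_CLASSES
]
print(f"{len(bad)} annotations out of range")
# Re-running with CUDA_LAUNCH_BLOCKING=1 pins the assert to the failing op.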

Anet_videos_15fps_short256/missing_files.txt

Hi Shuming,

Did you upload Anet_videos_15fps_short256/missing_files.txt?

FileNotFoundError: [Errno 2] No such file or directory: 'data/activitynet-1.3/raw_data/Anet_videos_15fps_short256/missing_files.txt'

I downloaded [Update]_Anet_videos_15fps_short256.zip, but there was no missing_files.txt.

Where can I find it?

Thanks.

question about `scale_factor` in AdaTAD

I noticed that for ANet you use scale_factor = 4 to account for the ViT backbone downsampling, but scale_factor = 1 for THUMOS, although it uses the same backbone. Could you please explain the logic?
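One reading of the configs in this thread (an interpretation, not an authoritative answer): scale_factor controls how many raw frames are decoded per output feature slot, so the backbone consumes window_size * scale_factor frames (the total_frames field in the backbone config), which are later pooled and interpolated back to window_size features. The chunk counts then work out as:

# Arithmetic sketch based on the config fields above; VideoMAE consumes
# 16-frame clips, so chunk_num = window_size * scale_factor // 16.
def chunk_num(window_size, scale_factor, clip_len=16):
    total_frames = window_size * scale_factor  # frames fed to the backbone
    return total_frames // clip_len

print(chunk_num(768, 4))  # ANet-style setting:   3072 frames -> 192 chunks
print(chunk_num(768, 1))  # THUMOS-style setting:  768 frames ->  48 chunks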

AdaTAD on MultiTHUMOS

I noticed that the performance of AdaTAD on the multi-label dataset (MultiTHUMOS) has not been reported. Can you tell me what its mAP is?

Best_mAP in training

Hi Shuming, during training, is it possible to save checkpoints based on the best validation mAP?
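If the trainer only checkpoints on a fixed interval, the generic pattern looks like the hedged sketch below, with placeholder train/eval callables rather than OpenTAD's workflow code:

# Hedged sketch: keep the checkpoint with the highest validation mAP.
import torch

def fit_with_best_checkpoint(model, train_one_epoch, evaluate, num_epochs,
                             path="best_map.pth"):
    best_map = float("-inf")
    for epoch in range(num_epochs):
        train_one_epoch(model)         # placeholder training callable
        current_map = evaluate(model)  # placeholder: returns average mAP
        if current_map > best_map:
            best_map = current_map
            torch.save({"epoch": epoch,
                        "state_dict": model.state_dict(),
                        "mAP": best_map}, path)
    return best_map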

Cannot reproduce the reported results

Official results:

Backbone      GPUs  Setting  Frames  Img Size  mAP@0.3  mAP@0.4  mAP@0.5  mAP@0.6  mAP@0.7  ave. mAP
VideoMAE-S    2     AdaTAD   768     160       83.90    79.01    72.38    61.57    48.27    69.03
VideoMAE-B    2     AdaTAD   768     160       85.95    81.86    75.02    63.29    49.56    71.14
VideoMAE-L    2     AdaTAD   768     160       87.17    83.58    76.88    66.81    53.13    73.51
VideoMAE-H    2     AdaTAD   768     160       88.42    84.63    78.72    69.04    53.95    74.95
VideoMAEV2-g  2     AdaTAD   768     160       88.63    85.39    79.17    68.34    53.79    75.06
VideoMAEV2-g  2     AdaTAD   1536    224       89.93    86.83    81.24    69.97    57.36    77.07

e2e_thumos_videomaev2_g_768x1_160_adapter.py

2024-04-22 20:36:07 Train INFO: [Train]: Epoch 41 started
2024-04-22 20:39:06 Train INFO: [Train]: [041][00050/00099]  Loss=0.3434  cls_loss=0.1847  reg_loss=0.1587  lr_backbone=6.8e-05  lr_det=6.8e-05  mem=30703MB
2024-04-22 20:41:54 Train INFO: [Train]: [041][00099/00099]  Loss=0.3310  cls_loss=0.1753  reg_loss=0.1557  lr_backbone=6.7e-05  lr_det=6.7e-05  mem=30703MB
2024-04-22 20:50:28 Train INFO: Evaluation starts...
2024-04-22 20:50:48 Train INFO: Loaded annotations from validation subset.
2024-04-22 20:50:48 Train INFO: Number of ground truth instances: 3325
2024-04-22 20:50:48 Train INFO: Number of predictions: 422000
2024-04-22 20:50:48 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-22 20:50:48 Train INFO: Average-mAP: 74.85 (%)
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.30 is 88.80%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.40 is 85.10%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.50 is 78.95%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.60 is 68.09%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.70 is 53.30%

e2e_thumos_videomaev2_g_768x2_224_adapter.py

2024-04-23 08:09:01 Train INFO: [Train]: Epoch 39 started
2024-04-23 08:18:31 Train INFO: [Train]: [039][00050/00099]  Loss=0.2967  cls_loss=0.1572  reg_loss=0.1395  lr_backbone=1.4e-04  lr_det=7.1e-05  mem=51851MB
2024-04-23 08:27:33 Train INFO: [Train]: [039][00099/00099]  Loss=0.3542  cls_loss=0.1892  reg_loss=0.1650  lr_backbone=1.4e-04  lr_det=7.0e-05  mem=51851MB
2024-04-23 09:00:06 Train INFO: Evaluation starts...
2024-04-23 09:00:26 Train INFO: Loaded annotations from validation subset.
2024-04-23 09:00:26 Train INFO: Number of ground truth instances: 3325
2024-04-23 09:00:26 Train INFO: Number of predictions: 422000
2024-04-23 09:00:26 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-23 09:00:26 Train INFO: Average-mAP: 75.73 (%)
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.30 is 88.47%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.40 is 85.66%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.50 is 79.79%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.60 is 69.55%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.70 is 55.19%

Trained again without changing anything:

e2e_thumos_videomaev2_g_768x2_224_adapter.py

2024-04-20 07:39:00 Train INFO: [Train]: Epoch 39 started
2024-04-20 07:48:33 Train INFO: [Train]: [039][00050/00099]  Loss=0.3000  cls_loss=0.1612  reg_loss=0.1388  lr_backbone=1.4e-04  lr_det=7.1e-05  mem=51859MB
2024-04-20 07:57:37 Train INFO: [Train]: [039][00099/00099]  Loss=0.3349  cls_loss=0.1770  reg_loss=0.1578  lr_backbone=1.4e-04  lr_det=7.0e-05  mem=51859MB
2024-04-20 08:30:23 Train INFO: Evaluation starts...
2024-04-20 08:30:42 Train INFO: Loaded annotations from validation subset.
2024-04-20 08:30:42 Train INFO: Number of ground truth instances: 3325
2024-04-20 08:30:42 Train INFO: Number of predictions: 422000
2024-04-20 08:30:42 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-20 08:30:42 Train INFO: Average-mAP: 76.32 (%)
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.30 is 89.55%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.40 is 86.40%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.50 is 79.45%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.60 is 70.78%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.70 is 55.43%

Roadmap and Feedback

We keep this issue open to collect feature requests and user feedback so that we can keep improving this codebase.

If you don't find the feature you need in the roadmap, please leave a message here.

Thank you!

position of the adapter

Thank you for your outstanding work! How can I train adapters placed outside the backbone network in TAD models?
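For concreteness, a standard bottleneck adapter (a generic design sketched here, not AdaTAD's exact module) can sit outside the backbone, for example on the (B, C, T) feature sequence after the projection, with everything else frozen:

# Hedged sketch: a bottleneck adapter on (B, C, T) features outside the backbone.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        hidden = channels // reduction
        self.down = nn.Conv1d(channels, hidden, kernel_size=1)
        self.act = nn.GELU()
        self.up = nn.Conv1d(hidden, channels, kernel_size=1)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):  # x: (B, C, T)
        return x + self.up(self.act(self.down(x)))

# Train only the adapter: freeze every parameter whose name lacks "adapter".
# for name, p in model.named_parameters():
#     p.requires_grad = "adapter" in name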
