
TensorFlow Object Detection API - How to train on COCO dataset and achieve same mAP as the reported one?

I'm trying to reproduce the officially reported mAP of EfficientDet D3 in the Object Detection API by training on COCO using a pretrained EfficientNet backbone. The official COCO mAP is 45.4% and yet all I can manage to achieve is around 14%. I don't need to reach the same value, but I wish to at least come close to it.

I am loading the EfficientNet B3 checkpoint pretrained on ImageNet found here, and using the config file found here. The only parameters I changed are batch size (to fit into an RTX 3090), learning rate (0.08 was yielding loss=NaN so I reduced it to 0.01), and steps, which I increased to 600k. This is my pipeline.config file:

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 90
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 896
        max_dimension: 896
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 160
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b3_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 6
        num_filters: 160
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint: "/API/Tensorflow/models/research/object_detection/test_data/efficientnet_b3/efficientnet_b3/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "classification"
  batch_size: 2
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: false
  num_steps: 600000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 896
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-2
          total_steps: 600000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_train.record-00000-of-00100"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_val.record-00000-of-00050"
  }
}

These are the results:

[Plots: mAP and loss curves]

asked Oct 16 '25 06:10 by Laguilhoat


2 Answers

Your loss is too high. A loss that stays around 1 indicates that the model is not really learning. Here are a few things you can check:

  1. The dataset. Are all images used during training? Also have a look at the annotations. Are the classes and bounding boxes correct, or is there anything odd? For example, COCO's bounding boxes are given as absolute pixel values. If yours are given as relative values, you may need to rescale them.
  2. Is the image resized? If so, its bounding boxes also need to be resized.
  3. Check the bounding boxes. Plot a few images with their bounding boxes. If the boxes are in the wrong format or their values are incorrectly scaled, you'll see it.
  4. To narrow down the source of the bug, try loading weights for EfficientNet that have been trained on COCO and see what happens if you fine-tune them further with a very low learning rate. If that doesn't work, it is a strong indication that there are problems with the annotations.
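
Checks 1 to 3 can be partly automated before plotting anything. The sketch below is only an illustration with hypothetical helper names (`listed_shard_fraction`, `coco_to_normalized_corners`, `is_valid_normalized_box`), not code from the Object Detection API. It assumes the usual `-<index>-of-<total>` shard naming, that raw COCO annotations store boxes as `[x, y, width, height]` in absolute pixels, and that the TFRecords are expected to hold corners normalized to [0, 1]. Note that the `input_path` in the question names a single shard out of 100; if the path is read literally rather than as a glob pattern, only a small part of the training set would be seen, so that is worth verifying.

```python
import re


def listed_shard_fraction(input_paths):
    """Fraction of a sharded record set that the given paths name explicitly.

    Assumes '<name>-<index>-of-<total>' file names. Glob patterns such as
    '-?????-of-00100' are not expanded here.
    """
    if not input_paths:
        raise ValueError("no input paths given")
    pattern = re.compile(r"-(\d+)-of-(\d+)$")
    indices, total = set(), None
    for path in input_paths:
        match = pattern.search(path)
        if match is None:
            raise ValueError(f"not a numbered shard path: {path}")
        indices.add(int(match.group(1)))
        total = int(match.group(2))
    return len(indices) / total


def coco_to_normalized_corners(bbox, image_width, image_height):
    """Convert a COCO [x, y, width, height] pixel box to normalized
    (ymin, xmin, ymax, xmax)."""
    x, y, w, h = bbox
    return (y / image_height, x / image_width,
            (y + h) / image_height, (x + w) / image_width)


def is_valid_normalized_box(box, tol=1e-6):
    """True if the corners lie in [0, 1] and the box has positive area."""
    ymin, xmin, ymax, xmax = box
    return (-tol <= ymin < ymax <= 1 + tol) and (-tol <= xmin < xmax <= 1 + tol)


if __name__ == "__main__":
    paths = ["/DATASETS/COCO/coco_train.record-00000-of-00100"]
    print("fraction of shards listed:", listed_shard_fraction(paths))
    box = coco_to_normalized_corners([100, 50, 200, 100], 400, 200)
    print("normalized box:", box, "valid:", is_valid_normalized_box(box))
```

A box that fails `is_valid_normalized_box` after conversion (for example because it was still in pixels, or was not rescaled together with a resized image) is a good candidate to plot and inspect by hand.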

answered Oct 17 '25 19:10 by emely_pi


Two suggestions. First, batch size is an essential hyperparameter in deep learning: different batch sizes can lead to different training and test accuracies, so choosing a suitable batch size matters when training a neural network. [Source]

Using a batch size of 1 (or 2) for a model with this many parameters may be the reason for the lower accuracy.

A higher number of epochs does not compensate for a smaller batch size.
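
To put rough numbers on this, here is a small arithmetic sketch. It uses the linear learning-rate scaling heuristic, which is a common rule of thumb and not something the question or the API prescribes. The reference learning rate of 0.08, the batch size of 2 and the 600k steps come from the question. The reference batch size of 128 is an assumption about the original config that should be checked against the actual file, and 118,287 is the number of images in COCO train2017.

```python
COCO_TRAIN2017_IMAGES = 118_287
REFERENCE_BATCH_SIZE = 128  # assumption: verify against the original config


def linear_scaled_lr(reference_lr, reference_batch_size, batch_size):
    """Scale the learning rate in proportion to the batch size."""
    return reference_lr * batch_size / reference_batch_size


def epochs_seen(num_steps, batch_size, dataset_size):
    """Number of passes over the dataset for a given step count."""
    return num_steps * batch_size / dataset_size


if __name__ == "__main__":
    lr = linear_scaled_lr(0.08, REFERENCE_BATCH_SIZE, batch_size=2)
    epochs = epochs_seen(600_000, 2, COCO_TRAIN2017_IMAGES)
    print(f"scaled learning rate: {lr:.5f}")
    print(f"epochs at batch size 2: {epochs:.1f}")
```

Under these assumptions, 600k steps at batch size 2 amount to only about ten passes over the training set, and the scaled learning rate comes out well below the 0.01 used in the question. Treat both figures as starting points for experiments, not as known-good settings.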

Second, I noticed that the paper uses scale jittering for augmentation.
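
For reference, the `random_scale_crop_and_pad_to_square` option in the question's config (with `scale_min: 0.1`, `scale_max: 2.0`, `output_size: 896`) looks like the API's counterpart to this kind of jitter, so it may already be enabled. The sketch below shows the geometry of one scale-jitter draw as I understand the idea: resize by a random factor, take a random square crop, and pad what is missing. The function `scale_jitter_geometry` is hypothetical and is not the API's implementation. It only computes sizes and offsets and leaves the pixel and box transforms out.

```python
import random


def scale_jitter_geometry(height, width, output_size=896,
                          scale_min=0.1, scale_max=2.0, rng=random):
    """Return the resize, crop and pad geometry for one scale-jitter draw."""
    scale = rng.uniform(scale_min, scale_max)
    # Resize so that the longer side equals output_size * scale.
    ratio = output_size * scale / max(height, width)
    new_h = max(1, round(height * ratio))
    new_w = max(1, round(width * ratio))
    # Pick a random crop origin wherever the resized image exceeds the output.
    top = rng.randint(0, max(0, new_h - output_size))
    left = rng.randint(0, max(0, new_w - output_size))
    crop_h = min(output_size, new_h - top)
    crop_w = min(output_size, new_w - left)
    return {
        "scale": scale,
        "resized": (new_h, new_w),
        "crop_origin": (top, left),
        "crop_size": (crop_h, crop_w),
        # Zero padding needed to reach output_size x output_size.
        "padding": (output_size - crop_h, output_size - crop_w),
    }


if __name__ == "__main__":
    print(scale_jitter_geometry(480, 640, rng=random.Random(0)))
```

In a real pipeline the bounding boxes have to go through the same resize, crop and pad steps as the image, which ties back to the annotation checks in the other answer.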

answered Oct 17 '25 18:10 by YScharf


