Comments (7)

ijkguo commented on August 19, 2024

There is only one _minus operation in get_vgg_rcnn. What changes did you make?

from mx-rcnn.

 avatar commented on August 19, 2024

@precedenceguo I only changed the class labels in pascal_voc.py to fit my dataset and set config.TEST.CXX_PROPOSAL = False, because 'Symbol' doesn't have the 'Proposal' attribute.
Here is the get_vgg_rcnn code I use:

data = mx.symbol.Variable(name="data")
rois = mx.symbol.Variable(name='rois')
label = mx.symbol.Variable(name='label')
bbox_target = mx.symbol.Variable(name='bbox_target')
bbox_weight = mx.symbol.Variable(name='bbox_weight')

# reshape input
rois = mx.symbol.Reshape(data=rois, shape=(-1, 5), name='rois_reshape')
label = mx.symbol.Reshape(data=label, shape=(-1, ), name='label_reshape')
bbox_target = mx.symbol.Reshape(data=bbox_target, shape=(-1, 4 * num_classes), name='bbox_target_reshape')
bbox_weight = mx.symbol.Reshape(data=bbox_weight, shape=(-1, 4 * num_classes), name='bbox_weight_reshape')

# shared convolutional layers
relu5_3 = get_vgg_conv(data)

# Fast R-CNN
pool5 = mx.symbol.ROIPooling(
    name='roi_pool5', data=relu5_3, rois=rois, pooled_size=(7, 7), spatial_scale=1.0 / config.RCNN_FEAT_SRTIDE)
# group 6
flatten = mx.symbol.Flatten(data=pool5, name="flatten")
fc6 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096, name="fc6")
relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
# group 7
fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
# classification
cls_score = mx.symbol.FullyConnected(name='cls_score', data=drop7, num_hidden=num_classes)
cls_prob = mx.symbol.SoftmaxOutput(name='cls_prob', data=cls_score, label=label, normalization='batch')
# bounding box regression
bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
bbox_loss = mx.sym.MakeLoss(name='bbox_loss', data=bbox_loss_, grad_scale=1.0 / config.TRAIN.BATCH_ROIS)

# reshape output
cls_prob = mx.symbol.Reshape(data=cls_prob, shape=(config.TRAIN.BATCH_IMAGES, -1, num_classes), name='cls_prob_reshape')
bbox_loss = mx.symbol.Reshape(data=bbox_loss, shape=(config.TRAIN.BATCH_IMAGES, -1, 4 * num_classes), name='bbox_loss_reshape')

# group output
group = mx.symbol.Group([cls_prob, bbox_loss])
return group

ijkguo commented on August 19, 2024

We can see that the only _minus is bbox_pred - bbox_target.
Use sym.tojson() to check where the _minus2 is.
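
For anyone following along, here is a rough sketch of how the JSON from sym.tojson() could be searched for a node by name. It assumes the output has a top-level "nodes" list whose entries carry "op", "name" and "inputs" keys, with each input starting with the index of the source node. The helper name find_node and the four-node EXAMPLE_JSON graph are made up for illustration and are not taken from the actual symbol.

```python
import json


def find_node(symbol_json, node_name):
    """Return (index, op, input_names) for the named node, or None if absent."""
    nodes = json.loads(symbol_json)["nodes"]
    for index, node in enumerate(nodes):
        if node["name"] == node_name:
            # Each input entry starts with the index of the source node.
            input_names = [nodes[src[0]]["name"] for src in node["inputs"]]
            return index, node["op"], input_names
    return None


# Hypothetical, hand-written graph that only mimics the layout of tojson().
EXAMPLE_JSON = json.dumps({
    "nodes": [
        {"op": "FullyConnected", "name": "bbox_pred", "inputs": []},
        {"op": "null", "name": "bbox_target", "inputs": []},
        {"op": "Reshape", "name": "bbox_target_reshape", "inputs": [[1, 0]]},
        {"op": "_Minus", "name": "_minus2", "inputs": [[0, 0], [2, 0]]},
    ]
})

print(find_node(EXAMPLE_JSON, "_minus2"))
# prints (3, '_Minus', ['bbox_pred', 'bbox_target_reshape'])
```

With a real symbol you would pass sym.tojson() in place of EXAMPLE_JSON; the returned input names show which two tensors feed the subtraction.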

 avatar commented on August 19, 2024

@precedenceguo Yes, it indeed has only one 'minus', but its name is _minus2. Part of the .json file looks like this:

{
  "op": "null", 
  "param": {}, 
  "name": "bbox_target", 
  "inputs": [], 
  "backward_source_id": -1
}, 
{
  "op": "Reshape", 
  "param": {
    "keep_highest": "False", 
    "reverse": "False", 
    "shape": "(-1,84)", 
    "target_shape": "(0,0)"
  }, 
  "name": "bbox_target_reshape", 
  "inputs": [[83, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "_Minus", 
  "param": {}, 
  "name": "_minus2", 
  "inputs": [[82, 0], [84, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "smooth_l1", 
  "param": {"scalar": "1"}, 
  "name": "bbox_loss_", 
  "inputs": [[85, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "_Mul", 
  "param": {}, 
  "name": "_mul2", 
  "inputs": [[79, 0], [86, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "MakeLoss", 
  "param": {
    "grad_scale": "0.0078125", 
    "normalization": "null", 
    "valid_thresh": "0"
  }, 
  "name": "bbox_loss", 
  "inputs": [[87, 0]], 
  "backward_source_id": -1
}, 

ijkguo commented on August 19, 2024

OK, so
mxnet.base.MXNetError: InferShape Error in _minus2's rhs argument
Shape inconsistent, Provided=(1536,12), inferred shape=(256,12)
means that lhs is shaped (1536,12) and rhs is shaped (256,12). lhs is bbox_pred and rhs is bbox_target.

Would you please check their shapes again? bbox_pred comes from the bbox_pred fc layer and bbox_target comes from the data io.
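
As a quick sanity check on the numbers in that error, here is a small sketch of the arithmetic. It assumes both operands have 4 * num_classes columns, as in the posted symbol where bbox_pred uses num_hidden=num_classes * 4 and bbox_target is reshaped to (-1, 4 * num_classes). The function name compare_operand_shapes is hypothetical.

```python
def compare_operand_shapes(lhs_shape, rhs_shape, coords_per_box=4):
    """Summarize how two 2-D operand shapes differ."""
    same_columns = lhs_shape[1] == rhs_shape[1]
    return {
        "same_columns": same_columns,
        # Columns are 4 * num_classes, so divide to recover num_classes.
        "num_classes": lhs_shape[1] // coords_per_box if same_columns else None,
        # How many times more rows lhs has than rhs.
        "row_ratio": lhs_shape[0] / rhs_shape[0],
    }


# Shapes copied from the error message: lhs is bbox_pred, rhs is bbox_target.
report = compare_operand_shapes((1536, 12), (256, 12))
print(report)
```

The columns agree (12 = 4 * 3 classes), so only the row counts differ, by a factor of 6. The sketch does not say where that factor comes from; that still has to be traced back to the rois fed to bbox_pred and the targets produced by the io.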

ijkguo commented on August 19, 2024

Why not check out coco as an example of a different dataset?

 avatar commented on August 19, 2024

@precedenceguo Thank you! I will check out coco and give it a try.
