思悦使用基于VIS+LSTM视觉问答模型为主播搭建智能协同平台,帮助主播实现对粉丝、用户问题的实时应答、自动应答、一键选择应答。 与常见的QA问答系统相比,用户不仅可以输入简单的纯文字或图像询问,而且可以同步输入图片+文字组合,模型可给出预测的答案。
th extract_fc7.lua -split train
th extract_fc7.lua -split val
batch_size
: Batch size. Default is 10.split
: train/val. Default istrain
.gpuid
: 0-indexed id of GPU to use. Default is -1 = CPU.proto_file
: Path to thedeploy.prototxt
file for the VGG Caffe model. Default ismodels/VGG_ILSVRC_19_layers_deploy.prototxt
.model_file
: Path to the.caffemodel
file for the VGG Caffe model. Default ismodels/VGG_ILSVRC_19_layers.caffemodel
.data_dir
: Data directory. Default isdata
.feat_layer
: Layer to extract features from. Default isfc7
.input_image_dir
: Image directory. Default isdata
.
th train.lua
rnn_size
: Size of LSTM internal state. Default is 512.num_layers
: Number of layers in LSTMembedding_size
: Size of word embeddings. Default is 512.learning_rate
: Learning rate. Default is 4e-4.learning_rate_decay
: Learning rate decay factor. Default is 0.95.learning_rate_decay_after
: In number of epochs, when to start decaying the learning rate. Default is 15.alpha
: Alpha for adam. Default is 0.8beta
: Beta used for adam. Default is 0.999.epsilon
: Denominator term for smoothing. Default is 1e-8.batch_size
: Batch size. Default is 64.max_epochs
: Number of full passes through the training data. Default is 15.dropout
: Dropout for regularization. Probability of dropping input. Default is 0.5.init_from
: Initialize network parameters from checkpoint at this path.save_every
: No. of iterations after which to checkpoint. Default is 1000.train_fc7_file
: Path to fc7 features of training set. Default isdata/train_fc7.t7
.fc7_image_id_file
: Path to fc7 image ids of training set. Default isdata/train_fc7_image_id.t7
.val_fc7_file
: Path to fc7 features of validation set. Default isdata/val_fc7.t7
.val_fc7_image_id_file
: Path to fc7 image ids of validation set. Default isdata/val_fc7_image_id.t7
.data_dir
: Data directory. Default isdata
.checkpoint_dir
: Checkpoint directory. Default ischeckpoints
.savefile
: Filename to save checkpoint to. Default isvqa
.gpuid
: 0-indexed id of GPU to use. Default is -1 = CPU.
th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610.t7 -input_image_path data/train2014/COCO_train2014_000000405541.jpg -question 'What is the cat on?'
checkpoint_file
: Path to model checkpoint to initialize network parameters frominput_image_path
: Path to input imagequestion
: Question string
从VQA测试集中随机抽取图像+文字组合问题对,给出基于VIS+LSTM的视觉问答模型预测的答案。
Q: 站着的女生穿的衣服是什么颜色? A: 红褐色
Q: 女主播手里拿着什么? A: 蓝色牛仔裤
Q: 女生手里拿的是什么? A: 土豆
- [Exploring Models and Data for Image Question Answering], Ren et al., NIPS15
- [VQA: Visual Question Answering], Antol et al., ICCV15