Giter Club home page Giter Club logo

cmrc2018-drcd-bert's Introduction

BERT Baselines for CMRC 2018 and DRCD

This repository contains the BERT baseline systems for CMRC 2018 and DRCD (unofficial). These are two Chinese Span-Extraction Machine Reading Comprehension datasets (like SQuAD) publically available.

Content

Section Description
Datasets CMRC 2018 and DRCD
Usage How to train and test
Baseline Results BERT baseline results

Datasets

CMRC 2018 (Simplified Chinese): https://github.com/ymcui/cmrc2018

DRCD (Traditional Chinese): https://github.com/DRCSolutionService/DRCD

You can download these datasets through the links above, or alternatively, you can also download directly from data directory. Note that, we use SQuAD-like CMRC 2018 datasets, which could be accessed through link.

For more Chinese machine reading comprehension datasets, please infer: https://github.com/ymcui/Chinese-RC-Datasets

Dependency Requirements

There is nothing special dependency requirements, except for TensorFlow==1.12. Could also work on other versions of TensorFlow (not tested).

The code is based on official BERT implementation of run_squad.py.

Check: https://github.com/google-research/bert/blob/master/run_squad.py

Usage

Step 1: Download BERT weights (skip if you have them)

Step 2: Set correct local variables

  • $PATH_TO_BERT: the path to BERT weights (TensorFlow version)
  • $DATA_DIR: the path to your dataset
  • $MODEL_DIR: output directory for your model

Step 3: Training

Then, we use the following script for training. We take CMRC 2018 dataset and multi-lingual BERT as an example.

python run_cmrc2018_drcd_baseline.py \
	--vocab_file=${PATH_TO_BERT}/multi_cased_L-12_H-768_A-12/vocab.txt \
	--bert_config_file=${PATH_TO_BERT}/multi_cased_L-12_H-768_A-12/bert_config.json \
	--init_checkpoint=${PATH_TO_BERT}/multi_cased_L-12_H-768_A-12/bert_model.ckpt \
	--do_train=True \
	--train_file=${DATA_DIR}/cmrc2018_train.json \
	--do_predict=True \
	--predict_file=${DATA_DIR}/cmrc2018_dev.json \
	--train_batch_size=32 \
	--num_train_epochs=2 \
	--max_seq_length=512 \
	--doc_stride=128 \
	--learning_rate=3e-5 \
	--save_checkpoints_steps=1000 \
	--output_dir=${MODEL_DIR} \
	--do_lower_case=False \
	--use_tpu=False

Step 4: Evaluation

We use official evaluation script for CMRC 2018 and DRCD. Note that, as DRCD official does not provide evaluation script, we also use cmrc2018_evaluate.py for DRCD.

python cmrc2018_evaluate.py cmrc2018_dev.json predictions.json

Baseline Results

We provide both BERT-Chinese and BERT-multilingual baselines.

Note that, each baseline is performed on 10 runs, and we take average scores of them for reliable results (do not applied on hidden sets).

CMRC 2018

System DEV-EM DEV-F1 TEST-EM TEST-F1 CHALLENGE-EM CHALLENGE-F1 Note
BERT (Multi-lingual) 63.6 84.0 68.4 86.5 18.5 43.0 -
BERT (Chinese) 63.5 83.6 67.5 85.6 18.4 42.1 -
P-Reader (single model) 59.894 81.499 65.189 84.386 15.079 39.583 -
GM-Reader (ensemble) 58.931 80.069 64.045 83.046 15.675 37.315 -
MCA-Reader (ensemble) 66.698 85.538 71.175 88.090 15.476 37.104 -
Z-Reader (single model) 79.776 92.696 74.178 88.145 13.889 37.422 -
SRC->DS(±) (Yang et al., 2019) 49.2 65.4 - - - - -

Leaderboard: https://hfl-rc.github.io/cmrc2018/open_challenge/
Note that, some of the previous submission are using development set for training as well.

DRCD

System DEV-EM DEV-F1 TEST-EM TEST-EM Note
BERT (Multi-lingual) 83.0 89.7 83.2 89.8 -
BERT (Chinese) 82.2 89.3 81.6 88.8 -
SRC + DS(±) (Yang et al., 2019) 55.4 67.7 - - -
r-net (single model) - - 29.1 44.4 -

Disclaimer

This is not an official baseline release of the respective dataset.

Contact

For help or issues, please submit a GitHub issue.

cmrc2018-drcd-bert's People

Contributors

ymcui avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.