airdialogue's Introduction

AirDialogue

AirDialogue is a benchmark dataset for goal-oriented dialogue generation research. This Python library contains a collection of toolkits that come with the dataset.

What's New

  • Jul 13, 2020: Fixed a bug in BLEU evaluation. The current version gives higher BLEU scores. Added support for evaluating different roles and a KL-divergence metric (see --infer_metrics).
  • Jul 12, 2020: Updated the AirDialogue dataset to version v1.1. This release fixes typos and misalignment between the KB file and the dialogue file. Please download and use the new data.

Prerequisites

General

  • Python (verified on 3.7)
  • wget

Python Packages

  • tensorflow (tested on 1.15.0)
  • tqdm
  • nltk
  • flask (for visualization)

Install

To install the pre-built version from pip, use

pip install airdialogue-essentials

To install the bleeding-edge version from GitHub, clone the repository and run

python setup.py install

Quick Start

Scoring

The official scoring function evaluates the predictions of a trained model and compares them to the AirDialogue dataset.

airdialogue score --true_data PATH_TO_DATA_FILE --true_kb PATH_TO_KB_FILE \
    --infer_metrics bleu

--infer_metrics can be one of (bleu:all|rouge:all|kl:all|bleu:brief|kl:brief). The brief mode gives a single-number metric. (bleu|kl) is equivalent to (bleu:brief|kl:brief).
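The shorthand rule above can be illustrated with a small Python sketch. It is not the library's own argument parser: the helper name normalize_metric and the two lookup tables are hypothetical, and only the option strings come from the list above.

```python
# Option strings copied from the --infer_metrics description above.
LISTED_SPECS = {"bleu:all", "rouge:all", "kl:all", "bleu:brief", "kl:brief"}

# Shorthand forms the description says are equivalent to the brief mode.
SHORTHAND = {"bleu": "bleu:brief", "kl": "kl:brief"}


def normalize_metric(spec: str) -> str:
    """Expand a shorthand metric name and reject anything not listed above.

    Hypothetical helper for illustration; not part of the airdialogue API.
    """
    spec = spec.strip()
    spec = SHORTHAND.get(spec, spec)
    if spec not in LISTED_SPECS:
        raise ValueError(f"unrecognized --infer_metrics value: {spec!r}")
    return spec


print(normalize_metric("bleu"))    # bleu:brief
print(normalize_metric("kl:all"))  # kl:all
```

Under this reading, passing bleu and passing bleu:brief select the same single-number BLEU score.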

Context Generation

The context generator produces a valid context-action pair without conversation history.

airdialogue contextgen \
    --output_data PATH_TO_OUTPUT_DATA_FILE \
    --output_kb PATH_TO_OUTPUT_KB_FILE \
    --num_samples 100

Preprocessing

The AirDialogue preprocessing toolkit tokenizes dialogues. Preprocessing the AirDialogue data requires 50 GB of RAM. The job_type parameter is a set of 5 bits separated by |, which represent train|eval|infer|sp-train|sp-eval. The input_type parameter can be either context, for context-only data, or dialogue, for dialogue data with full history.

airdialogue prepro \
  --data_file PATH_TO_DATA_FILE \
  --kb_file PATH_TO_KB_FILE \
  --output_dir "./data/airdialogue/" \
  --output_prefix 'train' --job_type '0|0|0|1|0' --input_type context
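To make the job_type encoding concrete, here is a minimal Python sketch that maps the five positions to their names. The helper decode_job_type is hypothetical and not part of the library, and reading a 1 as "this job is selected" is an assumption based on the description above.

```python
from typing import Dict

# Order of the five positions, as given in the job_type description.
JOB_NAMES = ("train", "eval", "infer", "sp-train", "sp-eval")


def decode_job_type(job_type: str) -> Dict[str, bool]:
    """Map a string such as '0|0|0|1|0' to {job name: selected?}.

    Hypothetical helper for illustration; assumes '1' means selected.
    """
    bits = job_type.split("|")
    if len(bits) != len(JOB_NAMES) or any(b not in ("0", "1") for b in bits):
        raise ValueError(f"expected 5 bits separated by '|', got {job_type!r}")
    return {name: bit == "1" for name, bit in zip(JOB_NAMES, bits)}


selected = [name for name, on in decode_job_type("0|0|0|1|0").items() if on]
print(selected)  # ['sp-train']
```

Read this way, the command above selects only the sp-train job.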

Simulator

The simulator is built on top of the context generator. It provides not only a context-action pair but also a full conversation history generated by two templated chatbot agents.

airdialogue sim \
    --output_data PATH_TO_OUTPUT_DATA_FILE \
    --output_kb PATH_TO_OUTPUT_KB_FILE \
    --num_samples 100

Visualization

The visualization tool displays the content of the raw JSON file.

airdialogue vis --data_path ./data/airdialogue/json/

airdialogue's People

Contributors

hmjianggatech, josephch405, sun51

airdialogue's Issues

Bugs in the function `score_human_data` in evaluator/evaluator_main.py

In

expanded_kb = expanduser(flags.kb)
expanded_data = expanduser(flags.data)

It actually should load flags.true_kb and flags.true_data.

In

f2 = gfile.Open(expanded_kb)
with gfile.Open(expanded_data) as f:

This raises a TensorFlow error (tf v1.15):
AttributeError: module 'tensorflow._api.v1.compat.v1.io.gfile' has no attribute 'Open'
Should this use tf.gfile.GFile instead, as in lines 236-238?
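A minimal sketch of the two changes proposed in this issue might look like the following. It is not the repository's code: the helper open_true_files and its opener parameter are hypothetical, and Python's built-in open stands in as the default so the snippet runs without TensorFlow.

```python
from os.path import expanduser
from types import SimpleNamespace


def open_true_files(flags, opener=open):
    """Open the KB and dialogue files named by flags.true_kb and flags.true_data.

    Hypothetical helper. With TensorFlow 1.15 one would pass
    opener=tf.gfile.GFile, since gfile.Open is not available there.
    """
    # Read the true_* flags rather than flags.kb / flags.data.
    expanded_kb = expanduser(flags.true_kb)
    expanded_data = expanduser(flags.true_data)
    return opener(expanded_kb), opener(expanded_data)


# Stand-in for the parsed command-line flags; the paths are placeholders.
example_flags = SimpleNamespace(true_kb="~/kb.json", true_data="~/data.json")
print(expanduser(example_flags.true_kb))
```

Injecting the opener keeps the path handling separate from the choice of file API, so the same helper works with either open or tf.gfile.GFile.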
