Paper List for Open-eNded Language Generation (ONLG)

Contributed by Jian Guan, Zhexin Zhang, Zhuoer Feng

Introduction

Open-eNded Language Generation (ONLG) refers to generation tasks where the input provides only very limited information and many outputs are plausible for the same input (also known as one-to-many generation). ONLG roughly includes chit-chat dialog, story, review, and essay generation, among others.

Some active authors in the list

Minlie Huang, Stephen Roller, Nanyun Peng, Jianfeng Gao, Joelle Pineau, Angela Fan, Jason Weston, Ryan Lowe, Noah A. Smith...

0. Resource
1. Survey
2. Generative Model
3. Evaluation
- 3.1 Metric
- 3.2 Protocol for human evaluation
4. Others

0. Resource

STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation. Nader Akoury, Shufan Wang, Josh Whiting, Stephen Hood, Nanyun Peng, Mohit Iyyer. EMNLP 2020 [pdf]
GLUCOSE: GeneraLized and COntextualized Story Explanations. Nasrin Mostafazadeh, Aditya Kalyanpur, Lori Moon, David Buchanan, Lauren Berkowitz, Or Biran, Jennifer Chu-Carroll. EMNLP 2020 [pdf]
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning. Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, Xiang Ren. Findings of EMNLP 2020 [pdf]
A Large-Scale Chinese Short-Text Conversation Dataset. Yida Wang, Pei Ke, Yinhe Zheng, Kaili Huang, Yong Jiang, Xiaoyan Zhu, Minlie Huang. NLPCC 2020 [pdf]
Storytelling with Dialogue: A Critical Role Dungeons and Dragons Dataset. Revanth Rameshkumar, Peter Bailey. ACL 2020 [pdf]
MuTual: A Dataset for Multi-Turn Dialogue Reasoning. Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou. ACL 2020 [pdf]
Designing Precise and Robust Dialogue Response Evaluators. Tianyu Zhao, Divesh Lala, Tatsuya Kawahara. ACL 2020 short paper [pdf]
Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models. Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, James Pennebaker. ACL 2020 [pdf]
Exploring the Effect of Author and Reader Identity in Online Story Writing: the STORIESINTHEWILD Corpus. Tal August, Maarten Sap, Elizabeth Clark, Katharina Reinecke, Noah A. Smith. ACL 2020 [pdf]
Counterfactual Story Reasoning and Generation. Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi. EMNLP 2019 [pdf]
Hierarchical neural story generation. Angela Fan, Mike Lewis, and Yann Dauphin. ACL 2018 [pdf]
Modeling Naive Psychology of Characters in Simple Commonsense Stories. Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight, Yejin Choi. ACL 2018
LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics. Zhen Xu, Nan Jiang, Bingquan Liu, Wenge Rong, Bowen Wu, Baoxun Wang, Zhuoran Wang, Xiaolong Wang. NAACL 2018 [pdf]
Visual storytelling. Ting-Hao Kenneth Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell. NAACL 2016 [pdf]
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen. NAACL 2016 [pdf]
WikiPlots. Mark Riedl. Github 2017 [link]

1. Survey

A Survey of Knowledge-Enhanced Text Generation. Wenhao Yu, Chenguang Zhu, Zaitang Li, Zhiting Hu, Qingyun Wang, Heng Ji, Meng Jiang. arxiv 2020 [pdf]
Automatic Story Generation: State of the Art and Recent Trends. Brian Daniel Herrera-González, Alexander Gelbukh, and Hiram Calvo. MICAI 2020 [pdf]
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols. Sarah E. Finch, Jinho D. Choi. SIGDIAL 2020 [pdf]
Challenges in Building Intelligent Open-domain Dialog Systems. Minlie Huang, Xiaoyan Zhu, Jianfeng Gao. TOIS 2020 [pdf]
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions. Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson. arxiv 2020 [pdf]
A Survey of Evaluation Metrics Used for NLG Systems. Ananya B. Sai, Akash Kumar Mohankumar, Mitesh M. Khapra. arxiv 2020 [pdf]
Evaluation of Text Generation: A Survey. Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao. arxiv 2020 [pdf]
Survey on evaluation methods for dialogue systems. Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre & Mark Cieliebak. Artificial Intelligence Review 2020 [pdf]
Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation. Cristina Garbacea, Samuel Carton, Shiyan Yan, Qiaozhu Mei. arxiv 2019 [pdf]
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau. EMNLP 2016 [pdf]
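A Survey of Evaluation Methods for Dialogue Systems (对话系统评价方法综述). Weinan Zhang, Yangzi Zhang, Ting Liu. Scientia Sinica Informationis (中国科学: 信息科学) [pdf]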
A Survey of Available Corpora for Building Data-Driven Dialogue Systems. Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin, Joelle Pineau. arxiv 2015 [pdf]

2. Generative Model

2.1 Story

Facts2Story: Controlling Text Generation by Key Facts. Eyal Orbach, Yoav Goldberg. [pdf]
Story Generation with Rich Details. Fangzhou Zhai, Vera Demberg, Alexander Koller. COLING 2020 [pdf]
Consistency and Coherency Enhanced Story Generation. Wei Wang, Piji Li, Hai-Tao Zheng. arxiv 2020 [pdf]
Plan-CVAE: A Planning-based Conditional Variational Autoencoder for Story Generation. Lin Wang, Juntao Li, Dongyan Zhao, Rui Yan. CCL 2020 [pdf]
Controllable Multi-Character Psychology-Oriented Story Generation. Feifei Xu, Xinpeng Wang, Yunpu Ma, Volker Tresp, Yuyi Wang, Shanlin Zhou, Haizhou Du. CIKM 2020 [pdf]
Creative Storytelling with Language Models and Knowledge Graphs. Xinran Yang, Ilaria Tiddi. CIKM 2020 workshop [pdf]
Modeling Protagonist Emotions for Emotion-Aware Storytelling. Faeze Brahman, Snigdha Chaturvedi. EMNLP 2020 [pdf]
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking. Hannah Rashkin, Asli Celikyilmaz, Yejin Choi, Jianfeng Gao. EMNLP 2020 [pdf]
Cue Me In: Content-Inducing Approaches to Interactive Story Generation. Faeze Brahman, Alexandru Petrusca, Snigdha Chaturvedi. AACL 2020 [pdf]
Content Planning for Neural Story Generation with Aristotelian Rescoring. Seraphina Goldfarb-Tarrant, Tuhin Chakrabarty, Ralph Weischedel, Nanyun Peng. EMNLP 2020 [pdf]
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models. Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro. EMNLP 2020 [pdf]
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation. Jian Guan, Fei Huang, Zhihao Zhao, Xiaoyan Zhu, Minlie Huang. TACL 2020 [pdf]
Improving Neural Story Generation by Targeted Common Sense Grounding. Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian McAuley, Garrison Cottrell. EMNLP 2020 [pdf]
Narrative Interpolation for Generating and Understanding Stories. Su Wang, Greg Durrett, Katrin Erk. arxiv 2020 [pdf]
Strategies for structuring story generation. Angela Fan, Mike Lewis, and Yann Dauphin. ACL 2019 [pdf]
Plan-and-write: Towards better automatic storytelling. Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, and Rui Yan. AAAI 2019 [pdf]
Story ending generation with incremental encoding and commonsense knowledge. Jian Guan, Yansen Wang, and Minlie Huang. AAAI 2019 [pdf]
Learning to Write Stories with Thematic Consistency and Wording Novelty. Juntao Li, Lidong Bing, Lisong Qiu, Dongmin Chen, Dongyan Zhao, and Rui Yan. AAAI 2019 [pdf]
Learning to Predict Explainable Plots for Neural Story Generation. Gang Chen, Yang Liu, Huanbo Luan, Meng Zhang, Qun Liu, and Maosong Sun. arxiv 2019 [pdf]
Hierarchical neural story generation. Angela Fan, Mike Lewis, and Yann Dauphin. ACL 2018 [pdf]
Discourse-Aware Neural Rewards for Coherent Text Generation. Antoine Bosselut, Asli Celikyilmaz, Xiaodong He, Jianfeng Gao, Po-Sen Huang, Yejin Choi. ACL 2018 [pdf]
A skeleton-based model for promoting coherence among sentences in narrative story generation. Jingjing Xu, Xuancheng Ren, Yi Zhang, Qi Zeng, Xiaoyan Cai, and Xu Sun. EMNLP 2018 [pdf]
Event representations for automated story generation with deep neural nets. Lara Martin, Prithviraj Ammanabrolu, Xinyu Wang, William Hancock, Shruti Singh, Brent Harrison, and Mark Riedl. AAAI 2018 [pdf]
Discourse-Driven Narrative Generation with Bipartite Planning. David R. Winer and R. Michael Young. ACL 2016 [pdf]

2.2 Dialog

Learning to Plan and Realize Separately for Open-Ended Dialogue Systems. Sashank Santhanam, Zhuo Cheng, Brodie Mather, Bonnie Dorr, Archna Bhatia, Bryanna Hebenstreit, Alan Zemel, Adam Dalton, Tomek Strzalkowski, Samira Shaikh. Findings of EMNLP 2020 [pdf]
A Large-Scale Chinese Short-Text Conversation Dataset. Yida Wang, Pei Ke, Yinhe Zheng, Kaili Huang, Yong Jiang, Xiaoyan Zhu, Minlie Huang. NLPCC 2020 [pdf]
DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation. Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. ACL 2020 [pdf]
Target-Guided Open-Domain Conversation. Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric Xing, Zhiting Hu. [pdf]
Recipes for building an open-domain chatbot. Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. arxiv 2020 [pdf]
Towards a Human-like Open-Domain Chatbot. Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le. arxiv 2020 [pdf]
Adversarial Learning for Neural Dialogue Generation. Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky. EMNLP 2017 [pdf]

2.3 Others

PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation. Xinyu Hua, Lu Wang. EMNLP 2020 [pdf]
Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. OpenAI blog 2020 [pdf]
Progressive Generation of Long Text. Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing, Zhiting Hu. arxiv 2020 [pdf]
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer. ACL 2020 [pdf]
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. JMLR 2020 [pdf]
Long and Diverse Text Generation with Planning-based Hierarchical Variational Model. Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu. EMNLP 2019 [pdf]
Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. OpenAI blog 2019 [pdf]
Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. OpenAI blog 2018 [pdf]
Chinese poetry generation with planning based neural network. Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, Enhong Chen. COLING 2016 [pdf]
Sentence-Level Content Planning and Style Specification for Neural Text Generation. Xinyu Hua, Lu Wang. EMNLP 2019 [pdf]
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries. Xiaofei Sun, Chun Fan, Zijun Sun, Yuxian Meng, Fei Wu, Jiwei Li. arxiv 2020 [pdf]

3. Evaluation

3.1 Metric

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation. Jian Guan, Minlie Huang. EMNLP 2020 [pdf]
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems. Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang. EMNLP 2020 [pdf]
GRUEN for Evaluating Linguistic Quality of Generated Text. Wanzheng Zhu and Suma Bhat. Findings of EMNLP 2020 [pdf]
How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics. Prasanna Parthasarathi, Joelle Pineau, Sarath Chandar. arxiv 2020 [pdf]
BLEURT: Learning robust metrics for text generation. Thibault Sellam, Dipanjan Das, and Ankur Parikh. ACL 2020 [pdf]
Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation. Bo Pang, Erik Nijkamp, Wenjuan Han, Linqi Zhou, Yixian Liu, Kewei Tu. ACL 2020 [pdf]
Learning an Unreferenced Metric for Online Dialogue Evaluation. Koustuv Sinha, Prasanna Parthasarathi, Jasmine Wang, Ryan Lowe, William L. Hamilton, Joelle Pineau. ACL 2020 [pdf]
Evaluating Dialogue Generation Systems via Response Selection. Shiki Sato, Reina Akama, Hiroki Ouchi, Jun Suzuki, Kentaro Inui. ACL 2020 [pdf]
Speaker Sensitive Response Evaluation Model. JinYeong Bak, Alice Oh. ACL 2020 [pdf]
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation. Shikib Mehri, Maxine Eskenazi. ACL 2020 [pdf]
uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems. Yuma Tsuta, Naoki Yoshinaga, Masashi Toyoda. ACL 2020 [pdf]
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation. Weixin Liang, James Zou, Zhou Yu. ACL 2020 [pdf]
Designing Precise and Robust Dialogue Response Evaluators. Tianyu Zhao, Divesh Lala, Tatsuya Kawahara. ACL 2020 short paper [pdf]
Unsupervised Evaluation of Interactive Dialog with DialoGPT. Shikib Mehri, Maxine Eskenazi. SIGDIAL 2020 [pdf]
Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills. Eric Michael Smith, Mary Williamson, Kurt Shuster, Jason Weston, Y-Lan Boureau. ACL 2020 [pdf]
BERTScore: Evaluating text generation with BERT. Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. ICLR 2020 [pdf]
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems. Sarik Ghazarian, Ralph Weischedel, Aram Galstyan, Nanyun Peng. AAAI 2020 [pdf]
Learning to compare for better training and evaluation of open domain natural language generation models. Wangchunshu Zhou and Ke Xu. AAAI 2020 [pdf]
Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining. Ananya B. Sai, Akash Kumar Mohankumar, Siddhartha Arora, Mitesh M. Khapra. TACL 2020 [pdf]
How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning. Lu Li, Zhongheng He, Xiangyang Zhou, Dianhai Yu. arxiv 2019 [pdf]
MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M Meyer, and Steffen Eger. EMNLP 2019 [pdf]
TIGEr: Text-to-Image Grounding for Image Caption Evaluation. Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao. EMNLP 2019 [pdf]
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References. Prakhar Gupta, Shikib Mehri, Tiancheng Zhao, Amy Pavel, Maxine Eskenazi, Jeffrey Bigham. SIGDIAL 2019 [pdf]
Unifying Human and Statistical Evaluation for Natural Language Generation. Tatsunori Hashimoto, Hugh Zhang, Percy Liang. NAACL 2019 [pdf]
Better automatic evaluation of open-domain dialogue systems with contextualized embeddings. Sarik Ghazarian, Johnny Wei, Aram Galstyan, and Nanyun Peng. NAACL 2019 workshop [pdf]
Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses. Ananya B. Sai, Mithun Das Gupta, Mitesh M. Khapra, Mukundhan Srinivasan. AAAI 2019 [pdf]
Towards a Better Metric for Evaluating Question Generation Systems. Preksha Nema, Mitesh M. Khapra. EMNLP 2018 [pdf]
The price of debiasing automatic metrics in natural language evaluation. Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang. ACL 2018 [pdf]
RUSE: Regressor using sentence embeddings for automatic machine translation evaluation. Hiroki Shimanaka, Tomoyuki Kajiwara, and Mamoru Komachi. ACL 2018 workshop [pdf]
Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement. Jan Deriu, Mark Cieliebak. INLG 2018 [pdf]
RUBER: An unsupervised method for automatic evaluation of open-domain dialog systems. Chongyang Tao, Lili Mou, Dongyan Zhao, and Rui Yan. AAAI 2018 [pdf]
One “Ruler” for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning. Xiaowei Tong, Zhenxin Fu, Mingyue Shang, Dongyan Zhao, Rui Yan. IJCAI 2018 [pdf]
Adversarial Learning for Neural Dialogue Generation. Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky. EMNLP 2017 [pdf]
Topic-based Evaluation for Conversational Bots. Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, Ashwin Ram. NeurIPS 2017 workshop [pdf]
On Evaluating and Comparing Open Domain Dialog Systems. Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju. NeurIPS 2017 workshop [pdf]
Towards an automatic Turing test: Learning to evaluate dialogue responses. Ryan Lowe, Michael Noseworthy, Iulian Vlad Serban, Nicolas Angelard-Gontier, Yoshua Bengio, and Joelle Pineau. ACL 2017 Best Paper [pdf]
Evaluating Story Generation Systems Using Automated Linguistic Analyses. Melissa Roemmele, Andrew S. Gordon, Reid Swanson. ACM SIGKDD 2017 [pdf]
Adversarial evaluation of dialogue models. Anjuli Kannan and Oriol Vinyals. arxiv 2017 [pdf]
deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets. Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan. ACL 2015 [pdf]
ROUGE: A package for automatic evaluation of summaries. Chin-Yew Lin. ACL 2004 [pdf]
Bleu: a method for automatic evaluation of machine translation. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. ACL 2002 [pdf]

3.2 Protocol for human evaluation

STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation. Nader Akoury, Shufan Wang, Josh Whiting, Stephen Hood, Nanyun Peng, Mohit Iyyer. EMNLP 2020 [pdf]
Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems. Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak. EMNLP 2020 [pdf]
Towards Best Experiment Design for Evaluating Dialogue System Output. Sashank Santhanam, Samira Shaikh. INLG 2019 [pdf]
Communication-based Evaluation for Natural Language Generation. SCiL 2019 [pdf]
Domain-Independent turn-level Dialogue Quality Evaluation via User Satisfaction Estimation. SIGDIAL 2019 [pdf]
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons. Margaret Li, Jason Weston, Stephen Roller. AAAI 2019 [pdf]
ChatEval: A Tool for the Systematic Evaluation of Chatbots. João Sedoc, Daphne Ippolito, Arun Kirubarajan, Jai Thirani, Lyle Ungar, Chris Callison-Burch. ACL 2018 [pdf]
Towards a Method For Evaluating Naturalness in Conversational Dialog Systems. Victor Hung, Miguel Elvir, Avelino Gonzalez, Ronald DeMara. IEEE ICSMC 2009 [pdf]
Empirical Methods for Evaluating Dialog Systems. Tim Paek. ACL 2001 [pdf]

4. Others

If beam search is the answer, what was the question? Clara Meister, Tim Vieira, Ryan Cotterell. EMNLP 2020 [pdf]
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation. Moin Nadeem, Tianxing He, Kyunghyun Cho, James Glass. AACL 2020 [pdf]