Evaluation Papers for ChatGPT

License: MIT

Introduction

This repository collects dataset resources, evaluation papers, and detection tools for ChatGPT.

0. Survey

  1. ChatGPT: A Meta-Analysis after 2.5 Months.

    Christoph Leiter, Ran Zhang, Yanran Chen, Jonas Belouadi, Daniil Larionov, Vivian Fresen, Steffen Eger. [abs], 2023.2

1. Dataset Resources

  1. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection.

    Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu. [abs],[github], 2023.1

  2. ChatGPT: Jack of all trades, master of none.

    Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko. [abs],[github], 2023.2

  3. Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT.

    Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao. [abs],[github], 2023.2

  4. Is ChatGPT A Good Translator? A Preliminary Study.

    Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Zhaopeng Tu. [abs],[github], 2023.1

  5. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective.

    Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie. [abs],[github], 2023.2

  6. An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP).

    Paulo Shakarian, Abhinav Koyyalamudi, Noel Ngu, Lakshmivihari Mareedu. [abs],[github], 2023.2

Data statistics of these resources:

| Paper with Dataset | Task | #Examples |
|---|---|---|
| How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | QA + Dialog | 40,000 |
| ChatGPT: Jack of all trades, master of none | 25 classification / QA / reasoning tasks | 38,000 |
| Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT | Sentiment analysis / Paraphrase / NLI | 475 |
| Is ChatGPT A Good Translator? A Preliminary Study | Translation | 5,609 |
| On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective | Robustness | 2,237 |
| An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP) | Reasoning | 1,000 |

2. Evaluation Papers

2.1 Natural Language Understanding

  1. Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT.

    Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao. [abs],[github], 2023.2

  2. ChatGPT: Jack of all trades, master of none.

    Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko. [abs],[github], 2023.2

  3. How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks.

    Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuanjing Huang. [abs], 2023.3

2.2 Open-ended Generation

  1. Exploring AI Ethics of ChatGPT: A Diagnostic Analysis.

    Terry Yue Zhuo, Yujin Huang, Chunyang Chen, Zhenchang Xing. [abs], 2023.2

  2. Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech.

    Fan Huang, Haewoon Kwak, Jisun An. [abs], 2023.2

2.3 Long Text Summarization

  1. Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization.

    Xianjun Yang, Yan Li, Xinlu Zhang, Haifeng Chen, Wei Cheng. [abs], 2023.2

  2. Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?

    Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon. [abs], 2023.2

  3. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports.

    Katharina Jeblick, Balthasar Schachtner, Jakob Dexl, Andreas Mittermeier, Anna Theresa Stüber, Johanna Topalis, Tobias Weber, Philipp Wesp, Bastian Sabel, Jens Ricke, Michael Ingrisch. [abs], 2022.12

  4. Cross-Lingual Summarization via ChatGPT.

    Jiaan Wang, Yunlong Liang, Fandong Meng, Zhixu Li, Jianfeng Qu, Jie Zhou. [abs], 2023.2

2.4 Reasoning

  1. Mathematical Capabilities of ChatGPT.

    Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Alexis Chevalier, Julius Berner. [abs], 2023.1

  2. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

    Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang. [abs], 2023.2

  3. A Categorical Archive of ChatGPT Failures.

    Ali Borji. [abs], 2023.2

  4. An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP).

    Paulo Shakarian, Abhinav Koyyalamudi, Noel Ngu, Lakshmivihari Mareedu. [abs],[github], 2023.2

2.5 Multimodal

  1. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.

    Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung. [abs], 2023.2

  2. A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning.

    Zhisheng Tang, Mayank Kejriwal. [abs], 2023.2

2.6 Information Extraction

  1. Zero-Shot Information Extraction via Chatting with ChatGPT.

    Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, Wenjuan Han. [abs],[github],[demo], 2023.2

2.7 Other Domains

Education

  1. ChatGPT: The End of Online Exam Integrity?

    Teo Susnjak. [abs], 2022.12

  2. ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?

    Jürgen Rudolph, Samson Tan, Shannon Tan. [pdf], 2023.1

  3. Will ChatGPT get you caught? Rethinking of Plagiarism Detection.

    Mohammad Khalil, Erkan Er. [abs], 2023.2

Medicine

  1. How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment.

    Aidan Gilson, Conrad Safranek, Thomas Huang, Vimig Socrates, Ling Chi, R. Andrew Taylor, David Chartash. [pdf], 2022.12

  2. Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making.

    Arya Rao, John Kim, Meghana Kamineni, Michael Pang, Winston Lie, Marc D. Succi. [pdf], 2023.2

  3. Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness.

    Guido Zuccon, Bevan Koopman. [abs], 2023.2

Law

  1. ChatGPT Goes to Law School.

    Jonathan H. Choi, Kristin E. Hickman, Amy Monahan, Daniel Schwarcz. [abs], 2023

3. Detection Tools

3.1 Metrics

Metrics Before ChatGPT

  1. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.

    Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn. [abs],[demo], 2023.1

  2. GPTScore: Evaluate as You Desire.

    Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, Pengfei Liu. [abs],[github], 2023.2

  3. MAUVE Scores for Generative Models: Theory and Practice.

    Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui. [abs], 2022.12
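DetectGPT (item 1 above) scores a passage by comparing its log-probability under the source model with the average log-probability of slightly perturbed rewrites of it. Machine-generated text tends to sit near a local maximum of the model's log-probability, so perturbing it lowers the score more than perturbing human-written text does. The sketch below computes that perturbation discrepancy, optionally normalized by the standard deviation of the perturbed scores. It assumes the log-probabilities are already available; generating the perturbations (the paper uses T5 mask filling) and querying the model are left out. The function names and the default threshold are hypothetical.

```python
import statistics
from typing import Sequence


def perturbation_discrepancy(
    log_p_original: float,
    log_p_perturbed: Sequence[float],
    normalize: bool = True,
) -> float:
    """DetectGPT-style score: how much higher the model rates the original
    passage than the mean of its perturbed rewrites.

    log_p_original: log-probability of the candidate passage under the model.
    log_p_perturbed: log-probabilities of k perturbed rewrites (assumed to be
        precomputed; producing them is outside this sketch).
    """
    if len(log_p_perturbed) < 2:
        raise ValueError("need at least two perturbations to estimate the spread")
    mean_perturbed = statistics.fmean(log_p_perturbed)
    d = log_p_original - mean_perturbed
    if normalize:
        # Normalized variant: divide by the sample std of the perturbed scores.
        std = statistics.stdev(log_p_perturbed)
        if std > 0:
            d /= std
    return d


def looks_machine_generated(score: float, threshold: float = 1.0) -> bool:
    # Hypothetical threshold; in practice it would be tuned on labeled data.
    return score > threshold
```

For example, with an original log-probability of -100.0 and perturbed scores of -110.0, -112.0, -108.0 and -110.0, the raw discrepancy is 10.0 and the normalized one is about 6.12; a human-written passage would be expected to score close to zero.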

Using ChatGPT as evaluation metric

  1. Large Language Models Are State-of-the-Art Evaluators of Translation Quality.

    Tom Kocmi, Christian Federmann. [abs],[github], 2023.2

Metrics for detecting ChatGPT

  1. AI vs. Human -- Differentiation Analysis of Scientific Content Generation.

    Yongqiang Ma, Jiawei Liu, Fan Yi, Qikai Cheng, Yong Huang, Wei Lu, Xiaozhong Liu. [abs], 2023.1

3.2 Available Tools

  1. Hello-SimpleAI ChatGPT Detector: An open-source detection project consisting of three model versions for detecting ChatGPT-generated text: a QA version, a single-text version, and a linguistic version.
  2. GPTZero: A demo that detects writing generated by ChatGPT. Its creator built it as a safeguard after seeing students use the technology to cheat on assignments.
  3. OpenAI Classifier: A classifier fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic.
  4. Contentatscale AI Content Detector: A tool that returns a Human or AI content score for the submitted text, including a probability for each sentence.
  5. Writers AI Content Detector: A tool similar to Contentatscale that calculates a “Human-Generated Content” score from either a page URL or pasted text.

Statistics of these tools:

| Tool | Detection Target | Language | Input Range (# characters) |
|---|---|---|---|
| Hello-SimpleAI ChatGPT Detector | ChatGPT | en/zh | (0, ~1,500] (512 tokens) |
| GPTZero | LLM | en | (250, ∞) |
| OpenAI Classifier | LLM | en | (0, ∞) |
| Contentatscale AI Content Detector | AI content (NLP+SERP) | en | (0, 25,000] |
| Writers AI Content Detector | AI content | en | (0, 1,500] |
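The input ranges in the tool statistics can serve as a quick pre-check for which detectors will accept a given text. The helper below is a hypothetical sketch (the names `TOOL_INPUT_RANGES` and `tools_accepting` are illustrative): it hard-codes the ranges as listed, counts characters with Python's `len`, and treats the approximate Hello-SimpleAI limit of about 1,500 characters as exact, even though that model actually truncates at 512 tokens. The tools' real limits may differ from these figures or have changed since they were recorded.

```python
import math
from typing import Dict, List, Tuple

# Ranges copied from the tool statistics as (lower, upper] intervals:
# a text of n characters is accepted when lower < n <= upper.
# math.inf stands in for an unbounded upper limit.
TOOL_INPUT_RANGES: Dict[str, Tuple[float, float]] = {
    "Hello-SimpleAI ChatGPT Detector": (0, 1500),  # approximate: ~512 tokens
    "GPTZero": (250, math.inf),
    "OpenAI Classifier": (0, math.inf),
    "Contentatscale AI Content Detector": (0, 25_000),
    "Writers AI Content Detector": (0, 1500),
}


def tools_accepting(text: str) -> List[str]:
    """Return the tools whose listed input range admits len(text)."""
    n = len(text)
    return [name for name, (lo, hi) in TOOL_INPUT_RANGES.items() if lo < n <= hi]
```

For example, a 2,000-character essay exceeds the Hello-SimpleAI and Writers limits, so only GPTZero, the OpenAI Classifier, and Contentatscale remain; a 100-character snippet is too short for GPTZero.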

Contributors

imryanxu, shangqingtu, sparrowzheyuan18
