chatgpt-comparison-detection's People

Contributors

beyondguo, izhx, minqi824, sufeheisenberg

chatgpt-comparison-detection's Issues

Help with Open-Assistance

Hey hey - I'm one of the folks who started the work at LAION on open assistance. Want to speak on how we can work together?

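Question about how the F1 score is computed in model evaluation

Hello,
I am sorry to trouble the project team with this issue, but I have been unable to reproduce the results reported in the paper and would appreciate your help.
According to the description of your released chatgpt-detector-roberta-chinese model, it was trained on the mix-filter data.
I tested it with the script included at the end of this issue.

The result of testing on raw-full:
2024-03-05 19:44:46,902 - testing - INFO - test_doc: {'f1': 0.9976726144297905}

This differs noticeably from the figures in the table in the paper. Is my testing method wrong? If so, what is the correct way to test?

In any case, thank you for your work.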

import argparse
import os
import numpy as np
import sys
import evaluate
import pandas as pd
import torch
import logging
import torch.nn.functional as F
from torch.utils.data import DataLoader
from tqdm import tqdm
from datasets import Dataset, concatenate_datasets
from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix
from transformers import (
        AutoModelForSequenceClassification, 
        AutoTokenizer,
        AutoConfig,
        BertForSequenceClassification
    )

logging.basicConfig(level=logging.DEBUG, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('testing')
file_handler = logging.FileHandler('test.log') 
file_handler.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

sys.path.append('./')

_PARSER = argparse.ArgumentParser('ptm detector')

_PARSER.add_argument('--model_name', type=str, default='/data1/xxxxxx/DeepfakeText-chinese/model/chinese-roberta-wwm-ext', help='ptm model name')
_PARSER.add_argument('--roberta_model',type=str, default='/data1/xxxxxx/DeepfakeText-chinese/model/chatgpt-detector-roberta-chinese', help='roberta_model')
_PARSER.add_argument('--test_doc', type=str, default='../../data/zh_doc_test.csv', help='input doc test file path')
_PARSER.add_argument('--test_sent', type=str, default='../../data/shuffled_zh_sent_test.csv', help='input test sent file path')
_PARSER.add_argument('--batch_size', type=int, default=16, help='batch size')
_PARSER.add_argument('--epochs', type=int, default=2, help='epochs')
_PARSER.add_argument('--num_labels', type=int, default=2, help='num_labels')
_PARSER.add_argument('--cuda', type=str, default='0', help='gpu ids, like: 1,2,3')
_PARSER.add_argument('--seed', type=int, default=42, help='random seed.')
_PARSER.add_argument('--max_length', type=int, default=365, help='max_length')
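# type=bool would turn any non-empty string (even "False") into True; use --stacking / --no-stacking instead (Python 3.9+)
_PARSER.add_argument('--stacking', action=argparse.BooleanOptionalAction, default=True, help='stacking')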

_ARGS = _PARSER.parse_args()

if len(_ARGS.cuda) > 1:
    os.environ['TOKENIZERS_PARALLELISM'] = 'false'
    os.environ['TORCH_DISTRIBUTED_DEBUG'] = 'DETAIL'

os.environ["OMP_NUM_THREADS"] = '8'
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # if cuda >= 10.2
os.environ['CUDA_VISIBLE_DEVICES'] = _ARGS.cuda

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def create_dataloader(args: argparse.Namespace):
    """
    Returns the dataloaders for test_doc and test_sent, in that order.
    """
    datasets = []
    files = [args.test_doc, args.test_sent]
    for file in files:
        df = pd.read_csv(file)
        dataset = Dataset.from_pandas(df)
        datasets.append(dataset)
    tokenizer = AutoTokenizer.from_pretrained(args.model_name, trust_remote_code=True)
    def tokenize_fn(example):
        return tokenizer(example['answer'], max_length=args.max_length, padding='max_length', truncation=True)
    datasets = [datasets[0], datasets[1]]
    names = ['id', 'question', 'answer', 'source']
    tokenized_datasets = []
    for dataset in datasets:
        tokenized = dataset.map(
                        tokenize_fn,
                        batched=True,
                        remove_columns=names)
        tokenized_datasets.append(tokenized)
    def collate_fn(examples):
        return tokenizer.pad(examples,return_tensors='pt')
    
    dataloaders = []
    for dataset in tokenized_datasets:
        dataloader = DataLoader(dataset, shuffle=False, collate_fn=collate_fn, batch_size=args.batch_size)
        dataloaders.append(dataloader)
    return dataloaders

def eval(args, dataloaders):
    if args.stacking:
        # roberta_cnn_model = torch.load(args.roberta_cnn_model).to(device)
        # roberta_cnn_model.eval()
        # print("roberta_cnn_model loaded")
        
        # roberta_model = torch.load(args.roberta_model).to(device)
        # roberta_model.eval()

        config = AutoConfig.from_pretrained(
            args.roberta_model,
            num_labels=2,
        )
        roberta_model = BertForSequenceClassification.from_pretrained(
            args.roberta_model,
            config=config,
        ).to(device)
        
        # print(roberta_model.base_model)
        # exit()
        # for param in roberta_model.base_model.parameters():
        #     param.requires_grad = False
        print("roberta_model loaded")

        # roberta_rcnn_model = torch.load(args.roberta_rcnn_model).to(device)
        # roberta_rcnn_model.eval()
        # print("roberta_rcnn_model loaded")

        # roberta_rcnn_model = torch.load(args.roberta_rcnn_model).to(device)
        # roberta_rcnn_model.eval()
        # print("roberta_rcnn_model loaded")

        eval_name_list = ['test_doc', 'test_sent']
        for item, eval_name in enumerate(eval_name_list, 0):
            metric = evaluate.load("/data1/xxxxxx/DeepfakeText-chinese/dataset/metrics/f1")
            for step, batch in enumerate(tqdm(dataloaders[item], desc='Evaling', colour="green")):
                batch.to(device)
                with torch.no_grad():
                    labels = batch.pop('label')
                    outputs = roberta_model(**batch)['logits']
                predictions = outputs.argmax(dim=-1)
                predictions, references = predictions, labels
                metric.add_batch(
                    predictions=predictions,
                    references=references,
                )
            eval_metric = metric.compute()
            logger.info(f"{eval_name}: {eval_metric}")

dataloaders = create_dataloader(_ARGS)
eval(_ARGS, dataloaders)
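
A note for anyone comparing the number above with the paper's table: the script calls `metric.compute()` with no arguments, and as far as I know the Hugging Face `evaluate` F1 metric then uses binary averaging, which means it reports F1 for label 1 only. The sketch below is plain Python, not the project's evaluation code. It shows how per-class and macro F1 can differ on the same predictions. The helper names and the toy labels are made up for illustration.

```python
from typing import Sequence


def f1_for_class(preds: Sequence[int], refs: Sequence[int], positive: int) -> float:
    """F1 score when `positive` is treated as the positive class."""
    tp = sum(1 for p, r in zip(preds, refs) if p == positive and r == positive)
    fp = sum(1 for p, r in zip(preds, refs) if p == positive and r != positive)
    fn = sum(1 for p, r in zip(preds, refs) if p != positive and r == positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(preds: Sequence[int], refs: Sequence[int], classes=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(preds, refs, c) for c in classes) / len(classes)


# Toy data (hypothetical): six samples with label 0 and four with label 1.
refs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

print("F1 (label 1 as positive):", f1_for_class(preds, refs, positive=1))
print("F1 (label 0 as positive):", f1_for_class(preds, refs, positive=0))
print("macro F1:", macro_f1(preds, refs))
```

Before comparing against a published table, it is worth checking which of these variants the table reports and which label is treated as the positive class.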

Model code

Hello, could you tell me where the code for your detection model (the second model in the paper, based on RoBERTa) can be found?

Is the ling detector on Huggingface down? And other issues

Dear Hello-SimpleAI,

Thank you for your detectors, which have been valuable resources for my university admin duties. I've posted a similar discussion as this one on Huggingface, hoping one of these will evoke a reply.

  1. I've been making use of all three metrics (one from the single text interface and two from the linguistic interface). Is the 'ling' detector on Huggingface completely offline now? Is it anticipated to be running again at some point in the future?
    • I do see that the 'ling' detector is operational on modelscope and will switch to using it there.
  2. I haven't yet pored over the available data/code carefully. Is the material available enough for me to run your detectors locally (on my own computer)?
  3. The 'ling' detector on modelscope reports "Error" on the following text: "Imagine yourself strolling through the vibrant streets of Mexico City, where a lively celebration is in full swing. The atmosphere is charged with excitement as colors, music, and the exuberant spirit of the Mexican people fill the air. Amidst the bustling crowds, your attention is captivated by an array of national symbols proudly on display. The iconic Mexican flag unfurls, showcasing its bold tricolor of green, white, and red, which dances in the breeze. At the heart of this visual spectacle stands the unmistakable emblem of the Mexican eagle, with its fierce countenance and outstretched wings symbolizing a deep-rooted sense of national pride and identity. Now, let's transport ourselves to the enchanting landscapes and historic cities of the Netherlands, a country renowned for its rich cultural heritage."
    • Removing the phrase ", a country renowned for its rich cultural heritage" from the last sentence produces no Error. This also happens with the single-text detector on modelscope.
  4. The versions of the single text detector on Huggingface and Modelscope seem to be different (with the latter giving same numbers as your roberta detector). I, for one, would prefer the single text detector to remain different from your roberta detector (as the variety of detectors helps in my admin tasks).

Thank you!

Best,
Jay

Unable to reproduce the results in the paper

I am trying to reproduce the results in the paper. I load 'chatgpt-detector-roberta' from Hugging Face directly as both the model and the tokenizer. According to its model page, this model was trained on the mixed dataset, and the paper reports an F1 score of 99.44 on 'raw-full', but I am unable to obtain that figure. The dataset I used was downloaded from the Google Drive link provided in the HC3 README. Here are the results I obtained:
{'0': {'precision': 0.9994103425909546, 'recall': 0.9951852504256943, 'f1-score': 0.9972933215651661, 'support': 17031.0}, '1': {'precision': 0.9898640296662546, 'recall': 0.9987528061860813, 'f1-score': 0.994288552272163, 'support': 8018.0}, 'accuracy': 0.9963271986905665, 'macro avg': {'precision': 0.9946371861286046, 'recall': 0.9969690283058878, 'f1-score': 0.9957909369186646, 'support': 25049.0}, 'weighted avg': {'precision': 0.9963546382901745, 'recall': 0.9963271986905665, 'f1-score': 0.9963315170942771, 'support': 25049.0}}
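
The report above contains several different F1 values, so it may help to see how each is derived before comparing with the paper. The following is a minimal sketch in plain Python that recomputes the per-class, macro, and support-weighted F1 from the precision, recall, and support figures quoted in this issue. It does not establish which variant the paper's 99.44 refers to; that is not stated in this thread.

```python
# Precision, recall, and support copied from the classification report in this issue.
report = {
    "0": {"precision": 0.9994103425909546, "recall": 0.9951852504256943, "support": 17031},
    "1": {"precision": 0.9898640296662546, "recall": 0.9987528061860813, "support": 8018},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class F1: each label in turn is treated as the positive class.
per_class = {label: f1(v["precision"], v["recall"]) for label, v in report.items()}

# Macro average: unweighted mean over classes.
macro = sum(per_class.values()) / len(per_class)

# Weighted average: mean over classes weighted by the number of samples per class.
total = sum(v["support"] for v in report.values())
weighted = sum(per_class[label] * report[label]["support"] for label in report) / total

for name, value in [("class 0", per_class["0"]), ("class 1", per_class["1"]),
                    ("macro", macro), ("weighted", weighted)]:
    print(f"{name} F1: {value * 100:.2f}")
```

Rounded to two decimals, the variants span roughly 99.43 to 99.73, so the choice of averaging and positive class matters at the precision the paper reports.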

Do we have data splits?

Hi dear developers,

I am wondering whether the data splits (e.g. train/val/test) have been released. I saw an issue from 3 weeks ago with an official reply saying "We will release the train-test split later." However, when I inspected the data, it did not appear to have been split yet. Please let me know if the official data splits have been released. Thanks!

How to get CSV?

Hi, thank you for your nice work! I have a question: I can get the .json dataset, but how can I get the .csv file needed to train the model? Or what should the .csv data format be?
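
One possible approach, offered as a sketch rather than the project's official format: flatten each JSON record into one CSV row per answer. The code assumes each record has `question`, `human_answers`, and `chatgpt_answers` fields. The output columns (`id`, `question`, `answer`, `source`, `label`) mirror the ones used in the evaluation script posted in the F1 issue above. The label convention (0 = human, 1 = ChatGPT) and the helper names are assumptions for illustration.

```python
import csv
import io
import json

# Assumed column layout, taken from the user-posted evaluation script, not from the maintainers.
FIELDS = ["id", "question", "answer", "source", "label"]


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def records_to_rows(records, source="unknown"):
    """Emit one row per answer; label 0 = human, 1 = ChatGPT (assumed convention)."""
    rows = []
    for rec in records:
        for label, key in ((0, "human_answers"), (1, "chatgpt_answers")):
            for answer in rec.get(key, []):
                rows.append({
                    "id": len(rows),
                    "question": rec["question"],
                    "answer": answer,
                    "source": rec.get("source", source),
                    "label": label,
                })
    return rows


def write_csv(rows, fh):
    """Write the rows with a header line to an open text file handle."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory demo with a made-up record.
sample = [{
    "question": "What is HC3?",
    "human_answers": ["A corpus."],
    "chatgpt_answers": ["HC3 is a dataset.", "It compares answers."],
}]
rows = records_to_rows(sample, source="demo")
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

For a real file, `records_to_rows(load_jsonl(path))` would replace the in-memory sample, with `path` pointing at the downloaded data. When writing to disk, open the output with `newline=""` as the `csv` module documentation recommends.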

Have detection model baselines been open-sourced?

Hi developers,

I'm wondering whether the detection baseline code and pretrained models (e.g. RoBERTa and GLTR) have been released. I could not find them in the repo or on the project main page. Please let me know if they have been open-sourced. Thank you very much!

Algorithm is too weak, almost ineffective, please improve (details)

Sorry, but this article proves that this tool is simply not valid: https://www.zhihu.com/question/578268304/answer/2843077198 . Please let all the developers in the group read it carefully and continue to improve your algorithms.

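How do I deploy it myself?

Because my usage may be fairly frequent and heavy (a few thousand requests per week), I am considering building my own instance with a service such as Cloudflare Workers or Cloudflare Pages. How can I deploy it myself?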

What is the license?

Hi, may I ask what license the released dataset is under? We may use it in a commercial product, so we need to know the exact license. Thanks!
