PTT-Gossiping-Chatbot

A chatbot trained on data from the PTT Gossiping board.

This was a class project and a small toy I built for fun. If anyone wants to look at my ugly code, feel free to use it!

Dataset

問卦 ("question") posts from the PTT Gossiping board, May 2021.

Data Preprocessing

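  • Valid articles: articles whose title is tagged 問卦 and that have at least one comment (push, neutral reply, or boo)
  • Comments: only the first 100 comments of each article are kept
  • Articles and their comments are arranged into QA pairs: the article title is the question and the best comment is the answer
    • How the best comment is chosen
      1. Count the words that appear in every comment under the article
      2. Give each word a weight equal to its number of occurrences, so more frequent words weigh more
      3. For each comment, sum the weights of the words it contains
      4. Take the comment with the highest weight sum; on a tie, take the earliest comment
    • After processing, there are 48,560 QA pairs in total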
  • Word segmentation: CKIP
  • Tokenize
    • Character based: 4,764 distinct characters in total (Q+A)
    • Word based: 31,565 distinct words in total (Q+A)
    • Special tokens: unk sos end pad
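  • Padding
    • Word-based
      • Questions are padded to 30 words (the longest question has 30 words)
      • Answers are padded to 60 words (the longest answer has 59 words)
    • Character-based
      • Questions are padded to 42 characters (the longest question has 42 characters)
      • Answers are padded to 70 characters (the longest answer has 70 characters)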
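
The four best-comment steps listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `pick_best_comment` and the input format (each comment already segmented into a list of words, in posting order) are assumptions, and so is the choice to count a word every time it occurs in a comment when summing weights.

```python
from collections import Counter


def pick_best_comment(comments, max_comments=100):
    """Return the index of the best comment (hypothetical helper).

    comments: list of comments in posting order, each a list of
    already-segmented words. Must contain at least one comment.
    """
    # Only the first `max_comments` comments of an article are considered.
    kept = comments[:max_comments]

    # Steps 1-2: a word's weight is how often it occurs across all kept comments.
    weights = Counter(word for comment in kept for word in comment)

    # Step 3: a comment's score is the sum of the weights of its words
    # (assumption: repeated words inside one comment are counted each time).
    scores = [sum(weights[word] for word in comment) for comment in kept]

    # Step 4: max() returns the first maximal element, so ties go to the
    # earliest comment.
    return max(range(len(kept)), key=scores.__getitem__)


comments = [["good", "one"], ["good", "lol", "good"], ["one", "one"]]
best = pick_best_comment(comments)
print(best, comments[best])
```

Under this scoring, longer comments that reuse common words tend to win, which may be one reason the selection method is listed as something to improve.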

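As a rough sketch of the tokenize and padding steps, the snippet below builds a vocabulary that starts with the four special tokens and encodes a sequence to a fixed length. The helper names `build_vocab` and `encode`, the id order of the special tokens, and the decision not to insert `sos`/`end` here are all assumptions; the README does not say how the original code handles them.

```python
# Special tokens named in the README; their id order is an assumption.
SPECIAL_TOKENS = ["pad", "unk", "sos", "end"]


def build_vocab(sequences):
    """Map every token seen in `sequences` to an integer id (hypothetical helper)."""
    vocab = {token: i for i, token in enumerate(SPECIAL_TOKENS)}
    for sequence in sequences:
        for token in sequence:
            # New tokens get the next free id; known tokens keep theirs.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, then right-pad with `pad`."""
    ids = [vocab.get(token, vocab["unk"]) for token in tokens][:max_len]
    return ids + [vocab["pad"]] * (max_len - len(ids))


# Word-based: tokens are words produced by a segmenter such as CKIP.
word_vocab = build_vocab([["why", "is", "it"], ["it", "is", "fine"]])
print(encode(["why", "is", "that"], word_vocab, max_len=30)[:5])

# Character-based: tokens are simply the characters of the string.
char_vocab = build_vocab([list("問卦")])
print(len(encode(list("問卦"), char_vocab, max_len=42)))
```

The same two helpers serve both variants; only the tokens passed in and the target lengths (30/60 for words, 42/70 for characters) differ.
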
Architecture

  • Word-based Chatbot_arch
  • Character-based Chatbot_arch_char

Implementation result

  • Word-based

Word_based ex1

Word_based ex2

  • Character-based

not HTML

train_char_example2

Future work

  • Improve the method for choosing the best comment
  • Increase the amount of valid data (only one month is used at present)
  • Parameter fine-tuning
  • Architecture: BERT

Contributors

  • hsuanchia
