twcs2PersonaChat 🐦2🤖

This project takes the Twitter Customer Support dataset and converts it to the Persona Chat format. This is helpful for adapting the model described in this paper into a task-oriented version.

0 - Download the project

Click here and extract the zip to your preferred directory.

1 - Install pipenv

The first step is to install pipenv. Go to the project directory and run one of the following commands.

On macOS, you can use Homebrew:

brew install pipenv

or pip:

pip install pipenv

On Linux:

sudo apt install software-properties-common python-software-properties
sudo add-apt-repository ppa:pypa/ppa
sudo apt update
sudo apt install pipenv

2 - Install project requirements

In the project directory, run:

pip install -r requirements.txt

3 - Run the project

To run the project:

python cli.py [module_name] [options]

This project includes 3 modules: **getMetadata**, **preprocess**, and **personify**.
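
As an illustration of how a `[module_name] [options]` command line like this could be parsed, here is a minimal sketch using `argparse` subcommands. It is not taken from `cli.py`: the helper names `str2bool` and `build_parser` are hypothetical, and passing booleans as `True`/`False` strings is an assumption. Only the module names, option names, and defaults come from this README.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: turn 'True'/'False'-style strings into booleans."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per module named in this README."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="module", required=True)

    # getMetadata takes no options in this README.
    subparsers.add_parser("getMetadata")

    # preprocess: boolean options with the defaults documented in this README.
    preprocess = subparsers.add_parser("preprocess")
    boolean_options = [
        ("emojis", True), ("emoticons", True), ("urls", True),
        ("html_tags", True), ("acronyms", True),
        ("spelling", False), ("usernames", False),
    ]
    for name, default in boolean_options:
        preprocess.add_argument(f"--{name}", type=str2bool, default=default)

    # personify: brand filter and conversation limit.
    personify = subparsers.add_parser("personify")
    personify.add_argument("--brand", type=str, default=None)
    personify.add_argument("--limit", type=int, default=-1)
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
example_args = build_parser().parse_args(["preprocess", "--spelling", "True"])
```

With this layout each module only sees its own options, and an unknown module name is rejected by `argparse` before any processing starts.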

getMetadata

This module retrieves some metadata about the Twitter Customer Support dataset. To use it, run:

python cli.py getMetadata

preprocess

This module preprocesses the Twitter Customer Support dataset. Here are the options you can use:

  • --emojis: Boolean, if True, removes all emojis from the dataset (default: True)
  • --emoticons: Boolean, if True, removes all emoticons from the dataset (default: True)
  • --urls: Boolean, if True, replaces URLs in the dataset with the tag '(URL)' (default: True)
  • --html_tags: Boolean, if True, removes all HTML tags from the dataset (default: True)
  • --acronyms: Boolean, if True, converts acronyms to their meaning. E.g.: SMH -> So much Hate (default: True)
  • --spelling: Boolean, if True, spellchecks the dataset (default: False)
  • --usernames: Boolean, if True, tags usernames (default: False)

To run:

python cli.py preprocess [options]
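
To make a few of the options above concrete, the following is a rough sketch of what such cleaning steps might look like on a single tweet. It is not the project's implementation: the function `preprocess_text`, the regular expressions, and the `(USERNAME)` tag are assumptions made for illustration. Only the `(URL)` tag and the option names come from the list above. The emoji pattern covers only a few common Unicode ranges.

```python
import re

# Illustrative patterns only; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
USERNAME_RE = re.compile(r"@\w+")
# Rough emoji ranges (pictographs, dingbats); not an exhaustive definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess_text(text, emojis=True, urls=True, html_tags=True, usernames=False):
    """Apply the selected cleaning steps to one tweet and return the result."""
    if html_tags:
        text = HTML_TAG_RE.sub("", text)           # drop HTML tags
    if urls:
        text = URL_RE.sub("(URL)", text)           # tag URLs as '(URL)'
    if usernames:
        text = USERNAME_RE.sub("(USERNAME)", text)  # assumed tag name
    if emojis:
        text = EMOJI_RE.sub("", text)              # drop emojis
    # Collapse whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = preprocess_text("Hi <b>there</b> 😀 see https://t.co/abc")
```

Each step is guarded by its own flag, mirroring the way the options above can be switched on and off independently.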

personify

This module converts the dataset (preprocessed or not) to the Persona Chat format. The options are:

  • --brand: String, the name of a brand. Only interactions with that brand are used. If None, the whole dataset is used (default: None)
  • --limit: Integer, the maximum number of conversations to use. If -1, the whole dataset is used (default: -1)
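
The effect of `--brand` and `--limit` can be sketched as follows. This is a hypothetical illustration, not the project's code: the input layout (a `brand` name plus a list of `(speaker, text)` turns) is invented for the example, and the output assumes a Persona Chat style JSON layout with `personality` and `utterances` entries, each utterance holding a `history` and a `candidates` list. Here `candidates` contains only the brand's actual reply, and `personality` is left empty.

```python
def personify(conversations, brand=None, limit=-1):
    """Filter conversations and emit Persona Chat style dialog entries.

    conversations: list of dicts such as
        {"brand": "SomeBrand", "turns": [("customer", "..."), ("brand", "...")]}
    (a hypothetical layout chosen for this sketch).
    """
    # --brand: keep only conversations with the requested brand, if any.
    selected = [c for c in conversations if brand is None or c["brand"] == brand]
    # --limit: -1 means "use everything".
    if limit != -1:
        selected = selected[:limit]

    dialogs = []
    for conv in selected:
        history, utterances = [], []
        for speaker, text in conv["turns"]:
            # Every brand reply becomes a training example whose history
            # is everything said before it.
            if speaker == "brand" and history:
                utterances.append({"history": list(history), "candidates": [text]})
            history.append(text)
        dialogs.append({"personality": [], "utterances": utterances})
    return dialogs


example = [
    {"brand": "AppleSupport", "turns": [
        ("customer", "My phone died"), ("brand", "Sorry to hear that"),
        ("customer", "Thanks"), ("brand", "Anytime"),
    ]},
    {"brand": "Delta", "turns": [
        ("customer", "Late flight"), ("brand", "We apologize"),
    ]},
]
apple_only = personify(example, brand="AppleSupport")
```

Filtering by brand before applying the limit means `--limit` counts conversations with the chosen brand rather than conversations in the whole dataset. Whether the project applies them in this order is an assumption.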

If you have any ideas, just open an issue and tell us what you think!

If you'd like to contribute, please fork the repository and make changes as you'd like. Pull requests are warmly welcome.

โญ Star us on GitHub โ€” it helps!

This project is MIT licensed.

Contributors: gonced8, gonmelo