YOWAC

Hi there 👋, here is YOWAC

I am Your Own Word AutoComplete

A .NET Word add-in coupled with a natural language processing network built with 🤗 Transformers and fine-tuned on your own texts.

This repository contains PyTorch code to fine-tune a GPT-2-based language model on your own texts, code to help you prepare your own dataset for this, and code snippets that you can integrate into your own Word VB .NET add-in (or even just a VBA macro).
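Fine-tuning a causal language model such as GPT-2 usually involves concatenating the tokenized texts and cutting them into fixed-length blocks, with the input ids doubling as labels. The sketch below shows that step in plain Python, modelled on the `group_texts` preprocessing from the Hugging Face causal language modelling examples. It is an assumption that the notebooks here do it the same way, and `block_size=128` is only an illustrative value.

```python
from typing import Dict, List


def group_texts(tokenized: Dict[str, List[List[int]]], block_size: int = 128) -> Dict[str, List[List[int]]]:
    """Concatenate tokenized examples and split them into fixed-size blocks.

    For causal language modelling the labels are simply a copy of the input ids;
    GPT-2 style models shift them internally when computing the loss.
    """
    # Flatten every field (e.g. input_ids, attention_mask) into one long list.
    concatenated = {k: [tok for seq in seqs for tok in seq] for k, seqs in tokenized.items()}
    total_length = len(concatenated["input_ids"])
    # Drop the tail that does not fill a whole block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }
    # Copy each block so labels and inputs are independent lists.
    result["labels"] = [block.copy() for block in result["input_ids"]]
    return result
```

With 🤗 Datasets, a function like this would typically be applied to the tokenized dataset via `Dataset.map(group_texts, batched=True)`.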

Motivation

It can come in handy to autocomplete sentences in your own writing style with a shortcut button. If extensive email exchanges or reports are part of your daily work, you may have accumulated enough data to get this done with ML.
That's where the combination of a fine-tuned GPT model and VB .NET comes in.

VB .NET gets the Office part done (if you are using MS Office, that is). I intentionally kept this part separate, since someone might want to adapt it to a TeX editor, for example.
The fine-tuning of the GPT model happens with 🤗 Transformers.

Training and Results

So far I have tested it with English (distilgpt2), German (dbmdz/german-gpt2) and French (bigscience/bloom-560m, which is actually multilingual) models, with decent results.

  • The notebooks folder contains Jupyter notebooks to prepare the dataset, train the model and run predictions.
  • The dotnet_vba folder contains VBA code, easily adaptable to VB .NET, that takes a single sentence as a prompt and sends it as JSON to the model's entry point.
  • For obvious reasons, the data folder does not contain my personal and professional email exchanges, but the tiny_shakespeare dataset (from Andrej Karpathy's GitHub) and a dataset of Goethe texts compiled from free online libraries.
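As a rough Python analogue of what the dotnet_vba snippets do on the client side, the sketch below extracts the sentence currently being typed and wraps it in a JSON POST request. The endpoint URL, the `prompt`, `max_new_tokens` and `completion` field names, and all helper names are hypothetical placeholders, not the repository's actual schema.

```python
import json
import re
import urllib.request

# A sentence boundary: ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Hypothetical model endpoint; adjust to wherever the model is actually served.
ENDPOINT = "http://localhost:8080/predict"


def last_sentence(text_before_cursor: str) -> str:
    """Return the (possibly unfinished) sentence the user is typing."""
    return _SENTENCE_END.split(text_before_cursor.strip())[-1]


def build_request(prompt: str, url: str = ENDPOINT, max_new_tokens: int = 20) -> urllib.request.Request:
    """Wrap the prompt in a JSON POST request (field names are hypothetical)."""
    # ensure_ascii=False keeps umlauts and accents readable in the payload.
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}, ensure_ascii=False)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_completion(response_body: bytes) -> str:
    """Extract the continuation from a hypothetical {"completion": ...} reply."""
    return json.loads(response_body.decode("utf-8")).get("completion", "")
```

Sending the request is then a matter of calling `urllib.request.urlopen(req)` and passing the response body to `read_completion`; keeping the client this thin is what makes it easy to port to another editor.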

Training was done on Google Colab (around 30 min on Pro instances and a couple of hours on free ones).

Future stuff

  • 🔭 I’m currently working on a TorchServe container to deploy the model more efficiently.
  • ⚡ Fun fact: automatic language detection during data preparation helps a lot to clean up the dataset.
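Which language detector was used is not stated. As a stand-in, the sketch below uses a deliberately simple heuristic: count stopwords from small hand-picked English, German and French lists and keep only the paragraphs whose best-scoring language matches the language of the model being fine-tuned. The stopword lists and the helper names `guess_language` and `keep_language` are illustrative assumptions; a dedicated language-identification library would be more robust.

```python
import re
from typing import Dict, Iterable, List, Set

# Tiny, illustrative stopword lists (not exhaustive).
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "you", "for"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ich", "zu", "mit", "sie"},
    "fr": {"le", "la", "les", "et", "est", "de", "je", "pas", "que", "vous"},
}

# Runs of Unicode letters (no digits or underscores), so "réunion" is one word.
_WORD = re.compile(r"[^\W\d_]+")


def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or "unknown"."""
    words = [w.lower() for w in _WORD.findall(text)]
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def keep_language(paragraphs: Iterable[str], lang: str) -> List[str]:
    """Drop paragraphs that do not look like they are written in `lang`."""
    return [p for p in paragraphs if guess_language(p) == lang]
```

For an English model such as distilgpt2 you would keep `"en"` paragraphs, for dbmdz/german-gpt2 the `"de"` ones, and so on.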

Contributors: garnik-arut