
textgenie's Introduction

The generated sequences and the trained models will be uploaded to Google Drive.

The dependencies are listed in the requirements.txt file.

The models have been trained on the following three books:

  1. Republic by Plato (republic.txt)
  2. Moby Dick by Herman Melville (moby.txt)
  3. Adam Bede by George Eliot (adam.txt; the model was trained on only the first 89 chapters to reduce training time)

The generated sequences are stored in book_sequences.txt.

The models are stored as model_book.h5.

The tokenizer files are stored as pickle dumps named tokenizer_book.pkl.
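As a minimal sketch of how these files might be located and the tokenizer reloaded, assuming that "book" in the file names above is a placeholder for a book's short name (for example republic, moby or adam). That reading is an assumption, and artifact_paths and load_tokenizer are hypothetical helpers, not functions from this repository.

```python
import pickle
from pathlib import Path


def artifact_paths(book, root="."):
    """Build the expected file paths for one book.

    Assumes "book" in the README's file names is a placeholder
    for the book's short name, e.g. "republic".
    """
    root = Path(root)
    return {
        "sequences": root / f"{book}_sequences.txt",
        "model": root / f"model_{book}.h5",
        "tokenizer": root / f"tokenizer_{book}.pkl",
    }


def load_tokenizer(book, root="."):
    """Unpickle the tokenizer saved for the given book."""
    with open(artifact_paths(book, root)["tokenizer"], "rb") as fh:
        return pickle.load(fh)
```

Unpickling only works if the library that defined the tokenizer class is installed in the current environment, and pickle files should only be loaded from a trusted source.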

The notebook train_book.ipynb contains the code required for training the model.

train_republic.ipynb contains annotated code for better understanding.

predict.ipynb performs the final sentence completion using the stored model.

The folder also contains other books (jungle.txt, The Jungle Book, and eliot.txt); they were not used to train the model because of their smaller size.

The books were downloaded from the Project Gutenberg website (https://www.gutenberg.org/), a free repository of such books.

This project can be expanded in the future to style transfer on text: the user supplies a text and chooses an author, and the text is rewritten in the style of that author.

Limitations:

  1. When predicting new text, the model only considers the last 50 words of a sequence, so a longer seed text adds nothing.
  2. The tokenizer files contain word-to-integer mappings only for words that appear in the book, so supplying a seed word that was not in the book returns an error.
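Both limitations can be handled before the seed text reaches the model. The sketch below is one possible approach, not the code in predict.ipynb: prepare_seed is a hypothetical helper, word_index stands in for the word-to-integer mapping held by the tokenizer, and plain whitespace splitting is a simplification of real tokenization (it does not strip punctuation).

```python
def prepare_seed(seed_text, word_index, seq_length=50):
    """Encode a seed text, skipping unknown words and keeping the last seq_length.

    word_index: dict mapping each known word to its integer id
                (assumed to come from the stored tokenizer).
    Returns (encoded_ids, unknown_words).
    """
    words = seed_text.lower().split()
    # Separate words the vocabulary knows from those it does not.
    known = [w for w in words if w in word_index]
    unknown = [w for w in words if w not in word_index]
    # Only the trailing seq_length words influence the prediction.
    encoded = [word_index[w] for w in known][-seq_length:]
    return encoded, unknown


# Usage with a toy vocabulary and a short window:
vocab = {"the": 1, "whale": 2, "sea": 3}
encoded, unknown = prepare_seed("The whale swims the sea", vocab, seq_length=3)
print(encoded)  # [2, 1, 3]
print(unknown)  # ['swims']
```

Returning the unknown words lets the caller warn the user about them instead of failing on the first unseen word.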
