bntransformer's Introduction

bntransformer

bntransformer is built on top of transformers and provides inference for several transformer-based tasks in the Bengali language.

Installation

pip install bntransformer

or

pip install -U bntransformer

Dependency

  • PyTorch (1.6+)

Usage

Usage Notes

  • All tasks below use a default model for Bengali tokenization, question answering, named entity recognition, mask generation, translation, and text generation. The default model for each task is named in the comments of its example below.

  • You can also use your own locally trained transformers model or a model from the Hugging Face model hub. Just pass that model when you create the task class.

  • Example: with the BanglaQA class you can use the default model for inference as bnqa = BanglaQA(), or pass another model as bnqa = BanglaQA("another_model").

  • An example Colab notebook is available under examples.
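The notes above say a model argument may be either a local path or a Hugging Face model hub name. As a rough sketch of that distinction (the helper `model_source` is hypothetical and is not part of bntransformer), an existing directory can be treated as a local model and anything else as a hub identifier:

```python
import os
import tempfile


def model_source(model):
    """Classify a model argument as a local directory or a hub identifier.

    Hypothetical helper for illustration only; bntransformer does not
    expose this function.
    """
    return "local" if os.path.isdir(model) else "hub"


# A directory that exists on disk is treated as a local model.
with tempfile.TemporaryDirectory() as local_dir:
    print(model_source(local_dir))

# A name that is not a directory on disk is treated as a hub identifier.
print(model_source("sagorsarker/bangla-bert-base"))
```

Either kind of value would then be passed unchanged to the task class, for example BanglaQA(local_dir).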

Tokenization

from bntransformer import BanglaTokenizer

bntokenizer = BanglaTokenizer()
# You can pass a custom model path or another Bengali Hugging Face model name,
# for example: bntokenizer = BanglaTokenizer("bert-base-multilingual-uncased")
# By default it uses "sagorsarker/bangla-bert-base".
text = "আমি বাংলায় গান গাই ।"
tokens = bntokenizer.tokenize(text)
print(tokens)
# outputs: ['আমি', 'বাংলা', '##য', 'গান', 'গাই', '।']
encode_ids = bntokenizer.encode(text)
print(encode_ids)
decode_text = bntokenizer.decode(encode_ids)
print(decode_text)
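The token list above uses the WordPiece convention in which a leading ## marks a piece that continues the previous token. A minimal sketch of merging such pieces back into whole words follows; `merge_wordpieces` is a hypothetical helper, not a bntransformer function.

```python
def merge_wordpieces(tokens):
    """Join WordPiece continuation tokens ('##...') onto the preceding token."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Strip the '##' marker and append to the previous word.
            words[-1] += token[2:]
        else:
            words.append(token)
    return words


# Token list copied from the tokenizer output shown above.
tokens = ['আমি', 'বাংলা', '##য', 'গান', 'গাই', '।']
words = merge_wordpieces(tokens)
print(words)
print(len(words))  # 5
```

The merged words may differ slightly from the input text when the tokenizer normalizes characters, so decode() remains the safer way to recover text from ids.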

Bangla Question Answering

from bntransformer import BanglaQA

bnqa = BanglaQA()
# You can pass a custom QA model path or another Bengali Hugging Face QA model name.
# By default it uses "sagorsarker/mbert-bengali-tydiqa-qa".
context = "সূর্য সেন ১৮৯৪ সালের ২২ মার্চ চট্টগ্রামের রাউজান থানার নোয়াপাড়ায় অর্থনৈতিক ভাবে অস্বচ্ছল পরিবারে জন্মগ্রহণ করেন। তাঁর পিতার নাম রাজমনি সেন এবং মাতার নাম শশী বালা সেন। রাজমনি সেনের দুই ছেলে আর চার মেয়ে। সূর্য সেন তাঁদের পরিবারের চতুর্থ সন্তান। দুই ছেলের নাম সূর্য ও কমল। চার মেয়ের নাম বরদাসুন্দরী, সাবিত্রী, ভানুমতী ও প্রমিলা। শৈশবে পিতা মাতাকে হারানো সূর্য সেন কাকা গৌরমনি সেনের কাছে মানুষ হয়েছেন। সূর্য সেন ছেলেবেলা থেকেই খুব মনোযোগী ভাল ছাত্র ছিলেন এবং ধর্মভাবাপন্ন গম্ভীর প্রকৃতির ছিলেন।"
question = "মাস্টারদা সূর্যকুমার সেনের বাবার নাম কী ছিল ?"

answers = bnqa.find_answer(context, question)
print(answers)
# output: {'score': 0.8070710301399231, 'start': 131, 'end': 141, 'answer': 'রাজমনি সেন'}
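The output above includes start and end fields next to the answer text. Assuming these are character offsets into the context, as they are in the Hugging Face question-answering pipeline output, the answer can be recovered by slicing. The sketch below uses a shortened context and a hand-built result dictionary; `answer_span` is a hypothetical helper.

```python
def answer_span(context, result):
    """Slice the answer out of the context using character offsets."""
    return context[result["start"]:result["end"]]


# Shortened context taken from the example above.
context = "তাঁর পিতার নাম রাজমনি সেন এবং মাতার নাম শশী বালা সেন।"
answer = "রাজমনি সেন"

# Hand-built result shaped like the output shown above (the score is made up).
start = context.index(answer)
result = {"score": 0.8, "start": start, "end": start + len(answer), "answer": answer}

print(answer_span(context, result) == result["answer"])  # True
```

This kind of check is useful when highlighting the answer inside the original passage.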

Bangla NER

from bntransformer import BanglaNER

bnner = BanglaNER()
# You can pass a custom NER model path or another Bengali Hugging Face NER model name.
# By default it uses "neuropark/sahajBERT-NER".
sentence = "আমি জাহিদ হাসান এবং আমি ঢাকায় বাস করি ।"
output = bnner.ner_tag(sentence)
print(output)

Bangla Mask Generation

from bntransformer import BanglaMaskGeneration

bnunmasker = BanglaMaskGeneration()
# You can pass a custom mask generation model path or another Bengali Hugging Face model name.
# By default it uses "sagorsarker/bangla-bert-base".
sentence = "আমি জাহিদ হাসান এবং আমি [MASK] বাস করি । "
output = bnunmasker.generate_mask(sentence)
print(output)

Bangla To English Translation

from bntransformer import BanglaTranslation

bntrans = BanglaTranslation()
# You can pass a custom translation model path or another Bengali Hugging Face translation model name.
# By default it uses "Helsinki-NLP/opus-mt-bn-en".
bn_sentence = "আমার নাম জাহিদ, আমি ঢাকায় বাস করি।"
output = bntrans.bn2en(bn_sentence)
print(output)
# output: My name is Zahid, I live in Dhaka.

Bangla Text Generation

from bntransformer import BanglaTextGeneration

bntextgen = BanglaTextGeneration()
# You can pass a custom text generation model path or another Bengali Hugging Face text generation model name.
# By default it uses "flax-community/gpt2-bengali".
input_text = "আমি রতন এবং আমি"
output = bntextgen.generate_text(input_text)
print(output)

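Default Inference Models

  • Tokenization: sagorsarker/bangla-bert-base
  • Question answering: sagorsarker/mbert-bengali-tydiqa-qa
  • Named entity recognition: neuropark/sahajBERT-NER
  • Mask generation: sagorsarker/bangla-bert-base
  • Translation (Bengali to English): Helsinki-NLP/opus-mt-bn-en
  • Text generation: flax-community/gpt2-bengali

NB: You can also pass a local model path or another Hugging Face model name when creating the task class.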

bntransformer's People

Contributors

sagorbrur

bntransformer's Issues

Error building "tokenizers" wheel when installing "bntransformer" via pip in Colab environment

I encountered an error while trying to install the "bntransformer" library in a Colab environment, using both pip install bntransformer and pip install -U bntransformer. The installation failed with the following error message:

`
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bntransformer
Using cached bntransformer-2.1.0-py3-none-any.whl (6.2 kB)
Collecting transformers==4.6.1 (from bntransformer)
Using cached transformers-4.6.1-py3-none-any.whl (2.2 MB)
Collecting sentencepiece (from bntransformer)
Using cached sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.6.1->bntransformer) (3.12.0)
Collecting huggingface-hub==0.0.8 (from transformers==4.6.1->bntransformer)
Using cached huggingface_hub-0.0.8-py3-none-any.whl (34 kB)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.6.1->bntransformer) (1.22.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from transformers==4.6.1->bntransformer) (23.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.6.1->bntransformer) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.6.1->bntransformer) (2.27.1)
Collecting sacremoses (from transformers==4.6.1->bntransformer)
Using cached sacremoses-0.0.53-py3-none-any.whl
Collecting tokenizers<0.11,>=0.10.1 (from transformers==4.6.1->bntransformer)
Using cached tokenizers-0.10.3.tar.gz (212 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.6.1->bntransformer) (4.65.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.6.1->bntransformer) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.6.1->bntransformer) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.6.1->bntransformer) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.6.1->bntransformer) (3.4)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from sacremoses->transformers==4.6.1->bntransformer) (1.16.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from sacremoses->transformers==4.6.1->bntransformer) (8.1.3)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from sacremoses->transformers==4.6.1->bntransformer) (1.2.0)
Building wheels for collected packages: tokenizers
error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for tokenizers (pyproject.toml) ... error
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
`

Steps to reproduce

  • Open a new Colab Notebook.
  • Run pip install bntransformer or pip install -U bntransformer.

Additional information

  • Python version: 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0]
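In the log above, tokenizers is the only package fetched as a .tar.gz archive rather than a wheel, so pip has to build it locally, and that build is the step that fails. As a rough diagnostic sketch, a pip log can be scanned for such source archives. The helper name `source_builds` is hypothetical, and the pattern only assumes the "Using cached" and "Downloading" line formats seen in pip output like the one above.

```python
import re


def source_builds(pip_log):
    """Return archive names that pip fetched as source distributions (.tar.gz)."""
    pattern = re.compile(
        r"^\s*(?:Using cached|Downloading)\s+(\S+\.tar\.gz)\b",
        re.MULTILINE,
    )
    return pattern.findall(pip_log)


# Excerpt copied from the log in this issue.
log = """\
Collecting sacremoses (from transformers==4.6.1->bntransformer)
Using cached sacremoses-0.0.53-py3-none-any.whl
Collecting tokenizers<0.11,>=0.10.1 (from transformers==4.6.1->bntransformer)
Using cached tokenizers-0.10.3.tar.gz (212 kB)
"""

print(source_builds(log))  # ['tokenizers-0.10.3.tar.gz']
```

Any package listed by this scan is one that pip will try to compile on the machine, which narrows down where a wheel build error comes from.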
