rajspeaks / deep-learning-approach-to-bengali-word-embedding-using-bengaliword2vec-from-bnlp

Bengali word embedding using BengaliWord2Vec from BNLP. A mini project under the mentorship of Prof. Sandipan Ganguly, HIT-K.

License: GNU General Public License v3.0

Jupyter Notebook 100.00%
bnlp nlp natural-language-processing nlp-machine-learning nlp-parsing nlp-library word2vec word2vec-model word2vec-algorithm word2vec-embeddinngs

Bengali Word Embedding using BengaliWord2Vec

This is a mini project on Bengali word embedding using the BengaliWord2Vec model from the BNLP (Bengali Natural Language Processing) Toolkit, carried out under the mentorship of Prof. Sandipan Ganguly, Heritage Institute of Technology, Kolkata, India.

What is BNLP?

BNLP is a natural language processing toolkit for the Bengali language. It helps you tokenize Bengali text, embed Bengali words, perform Bengali POS tagging, and construct neural models for Bengali NLP tasks. It was developed by Prof. Sagor Sarker from Bangladesh.

Source Link: https://bnlp.readthedocs.io/en/latest/#word-embedding

BNLP GitHub: https://github.com/sagorbrur/bnlp

Installation:

  • PyPI package installer (tested with Python 3.6, 3.7, and 3.8):

    pip install bnlp_toolkit

    Or, to upgrade an existing installation:

    pip install -U bnlp_toolkit

What is Word Embedding?

A word embedding is a learned representation for text where words that have the same meaning have a similar representation. It is this approach to representing words and documents that may be considered one of the key breakthroughs of deep learning on challenging natural language processing problems.
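
As a minimal sketch of what "similar representation" means, the snippet below compares toy vectors with cosine similarity. The three-dimensional vectors and the word choices are hypothetical values picked for illustration; they do not come from any trained model, and real embeddings usually have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings (illustrative values, not from a real model).
embeddings = {
    "রাজা": [0.90, 0.80, 0.10],  # "king"
    "রানী": [0.85, 0.82, 0.15],  # "queen"
    "আম": [0.10, 0.20, 0.95],    # "mango"
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


words = list(embeddings)
related = cosine_similarity(embeddings[words[0]], embeddings[words[1]])
unrelated = cosine_similarity(embeddings[words[0]], embeddings[words[2]])

print(f"{words[0]} vs {words[1]}: {related:.3f}")
print(f"{words[0]} vs {words[2]}: {unrelated:.3f}")
```

With these toy values, the two related words score close to 1, while the unrelated pair scores much lower.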

What is Word2Vec?

Word2Vec is a statistical method for efficiently learning a standalone word embedding from a text corpus.

What is CBOW & Skip-Gram?

The CBOW model learns the embedding by predicting the current word based on its context. The continuous skip-gram model learns by predicting the surrounding words given a current word.

(Information Source: https://machinelearningmastery.com/what-are-word-embeddings/)
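
The difference between the two training objectives can be sketched by listing the training examples each one would draw from a sentence. This is only an illustration of how the pairs are formed, not a Word2Vec implementation: the function names `cbow_pairs` and `skipgram_pairs`, the window size of 1, and the sample sentence are all assumptions made for this example.

```python
def cbow_pairs(tokens, window=1):
    """Yield (context_words, target_word): CBOW predicts the target from its context."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, target


def skipgram_pairs(tokens, window=1):
    """Yield (center_word, context_word): skip-gram predicts each neighbour from the center."""
    for context, center in cbow_pairs(tokens, window):
        for neighbour in context:
            yield center, neighbour


# Hypothetical sample sentence, already split into tokens ("I sing songs in Bengali").
tokens = ["আমি", "বাংলায়", "গান", "গাই"]

cbow = list(cbow_pairs(tokens))
skipgram = list(skipgram_pairs(tokens))

for context, target in cbow:
    print("CBOW      context:", context, "-> target:", target)
for center, neighbour in skipgram:
    print("Skip-gram center:", center, "-> context:", neighbour)
```

CBOW produces one example per word, with the whole context as input, while skip-gram produces one example per (center, neighbour) pair.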

BengaliWord2Vec:

BengaliWord2Vec is a word-embedding utility from the BNLP Toolkit. It generates vectors for Bengali words and finds words with similar meanings.

Methodology:

  • First, I imported BengaliWord2Vec from BNLP.
  • Then I loaded the pre-trained model bnwiki_word2vec.model.
  • I took a Bengali word and generated both the vector shape and the vector values for that word.
  • I repeated the above step with another Bengali word.
  • Next, I passed a Bengali word and the pre-trained model (bnwiki_word2vec.model) to BengaliWord2Vec and limited the output to at most 10 similar words. This returned 10 Bengali words, each with its similarity score, whose meanings are approximately similar or related to the given word.
  • I repeated the previous step nine more times with different Bengali sample words and different word limits, and each time obtained Bengali words with approximately similar or related meanings.
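
The steps above can be sketched without the real model. The snippet below is a self-contained stand-in, not BNLP's API: the class `ToyWordVectors`, its methods `vector` and `most_similar`, the `topn` parameter, and the four toy vectors are all hypothetical. It assumes that a most-similar query ranks the other words by cosine similarity and keeps the top few, which is the usual approach for Word2Vec-style models.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyWordVectors:
    """Hypothetical stand-in for a pre-trained model: a word-to-vector lookup table."""

    def __init__(self, table):
        self.table = table

    def vector(self, word):
        """Return the stored vector for a word."""
        return self.table[word]

    def most_similar(self, word, topn=10):
        """Return up to topn (word, score) pairs, highest cosine similarity first."""
        query = self.table[word]
        scored = [
            (other, _cosine(query, vec))
            for other, vec in self.table.items()
            if other != word
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:topn]


# Toy 3-dimensional vectors (illustrative values only).
model = ToyWordVectors({
    "নদী": [0.90, 0.10, 0.20],    # "river"
    "সাগর": [0.80, 0.20, 0.30],   # "sea"
    "পুকুর": [0.85, 0.05, 0.25],  # "pond"
    "বই": [0.10, 0.90, 0.30],     # "book"
})

query = next(iter(model.table))         # first word in the table ("river")
vec = model.vector(query)
print("vector shape:", (len(vec),))     # analogous to inspecting the vector shape
print("vector values:", vec)

neighbours = model.most_similar(query, topn=2)  # limit the number of results
for word, score in neighbours:
    print(word, round(score, 3))
```

Changing `topn` corresponds to changing the word limit between runs, as in the last step of the list.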

Tools:

  1. Jupyter Notebook (Google Colab also works)
  2. Language: Python
  3. BNLP; Link: https://bnlp.readthedocs.io/en/latest/#word-embedding

Developer:

LinkedIn Profile: https://www.linkedin.com/in/itsrajdeepdas/

Note:

Not all Bengali words in the output share the meaning of the word I used in the code, but many of them are close in meaning.

Thank you
