kazimdzuber / langauge-identification-in-code-mixed-script
In this project, we created a unique corpus and, for the first time, used a highly imbalanced corpus for language identification. We performed sentence-level language identification on code-mixed script by applying several machine learning algorithms: Support Vector Classifier, Multinomial Naive Bayes, Random Forest, Decision Tree, and Logistic Regression. We manually tagged each code-mixed sentence; the corpus contains around 6,300 sentences involving three languages (Gujarati, Hindi, and English). We achieved 92% accuracy, which is the first time such high accuracy has been reported on highly imbalanced code-mixed data. We have also published a full-length survey paper on the topic.
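The description does not state which features or hyperparameters were used, so the following is only a minimal sketch of one of the listed models, Multinomial Naive Bayes, applied to sentence-level language identification. It is written in plain Python rather than with any particular library. The character-bigram features, the Laplace smoothing value, the class name `NaiveBayesLanguageID`, the label names, and the toy romanized sentences are all assumptions made for illustration and are not taken from the project's corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return character n-grams of a lowercased, space-padded sentence."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageID:
    """Hypothetical Multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram size (assumed, not from the project)
        self.alpha = alpha  # Laplace smoothing constant (assumed)

    def fit(self, sentences, labels):
        # Sentence counts per class give the class priors, which matter
        # when the corpus is imbalanced.
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.ngram_counts[label].update(char_ngrams(sentence, self.n))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        return self

    def predict(self, sentence):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(sentence, self.n):
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up romanized sentences; NOT drawn from the project's corpus.
train_sentences = [
    "mujhe yeh movie bahut pasand hai",
    "aaj main office nahi jaunga",
    "mane aa movie bahu game che",
    "hu aaje office nathi javano",
    "i really like this movie",
    "i will not go to the office today",
]
train_labels = ["hi", "hi", "gu", "gu", "en", "en"]

model = NaiveBayesLanguageID().fit(train_sentences, train_labels)
print(model.predict("tame kem cho"))
```

With a real, imbalanced corpus, accuracy alone can hide poor performance on the minority classes, so per-class precision and recall would be worth inspecting alongside it.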