- The entire dataset can't be loaded, so this model was trained and tested on only 5,000 records. Within those 5,000 records, the languages do not have an equal number of entries.
- There are 22 languages in the dataset. We need to decide whether to use all of them or only a selected subset.
- Check whether every language is predicted correctly.
- More data cleaning needs to be done, such as removing punctuation and special symbols (@, #, $, etc.).
- Write a function for item 3 (checking whether every language is predicted correctly).
- Try different algorithms and compare their results to see which one suits our case best. See: https://towardsdatascience.com/how-to-choose-the-right-machine-learning-algorithm-for-your-application-1e36c32400b9
- Any further improvements, features, etc.
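
As a starting point for the cleaning and per-language checking items above, here is a minimal sketch using only the Python standard library. The function names `clean_text` and `per_language_accuracy` are hypothetical and not part of the existing code. The cleaning step assumes that "punctuation and special symbols" means every character in a Unicode punctuation (`P*`) or symbol (`S*`) category; this keeps letters and combining marks from non-Latin scripts intact, which a plain ASCII filter would not.

```python
import unicodedata
from collections import Counter


def clean_text(text):
    """Replace punctuation and symbol characters with spaces, then collapse whitespace.

    Assumption: anything whose Unicode general category starts with
    "P" (punctuation) or "S" (symbol) counts as noise, e.g. @, #, $.
    """
    kept = (" " if unicodedata.category(ch)[0] in "PS" else ch for ch in text)
    # str.split() with no arguments drops runs of whitespace of any length.
    return " ".join("".join(kept).split())


def per_language_accuracy(y_true, y_pred):
    """Return {language: fraction of that language's samples predicted correctly}.

    y_true and y_pred are equal-length sequences of language labels.
    """
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lang: correct[lang] / total[lang] for lang in total}


if __name__ == "__main__":
    print(clean_text("Hello, World! #nlp @user $5"))
    # Labels below are made-up illustration data, not results from the model.
    print(per_language_accuracy(["en", "en", "fr"], ["en", "fr", "fr"]))
```

Reporting accuracy per language rather than one overall score should make it easier to spot languages that are underrepresented in the 5,000-record sample.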
This project is forked from tanvi355/language-detection-model.