

henrifroese commented on June 1, 2024

I believe the issue is that the proposed changes are on the Fix_Remove_Diacritics branch of the repository. With your command, pip installs the master branch, which does not contain the changes yet.
You can try !pip install git+https://github.com/SummerOfCode-NoHate/texthero.git@Fix_Remove_Diacritics; I think that should work. If not, you could just copy the functions I pasted in #72 and try them out directly.

from texthero.

henrifroese commented on June 1, 2024

Hi, I just had a look and opened a PR to fix this at #72. I pasted the new functions there; it would be great if you could try them out and comment there on whether everything works for you. Maybe I missed something regarding the Urdu language.

cmhashim commented on June 1, 2024

Please correct me if I am doing something wrong; I have never used GitHub before.
I installed your version with
!pip install git+https://github.com/SummerOfCode-NoHate/texthero.git
and tested it with the same code:

import pandas as pd 
import texthero as hero
s = pd.Series("Montréal, über, noël, 889, اِس, اُس")
s1 = hero.remove_diacritics(s)
s1

which gives the following output:

0    Montreal, uber, noel, 889, is, us
dtype: object

jbesomi commented on June 1, 2024

Hi Hashim,

thank you for your contribution!

@henrifroese and @cmhashim, probably the way we should design multilingual support for Texthero is to have:

from texthero.ur import hero
hero.remove_diacritics(...)

Where this remove_diacritics is specialized in dealing with Urdu text.

What's your opinion? That way we can keep the code in each function simple and develop each function for its specific language.

henrifroese commented on June 1, 2024

I think that if functions specifically for multilingual support are added (e.g. functions that handle Arabic script specifically), it would make sense to give them separate modules. However, this issue/fix is a more general improvement to remove_diacritics: by preventing transliteration, it can now handle everything it handled before plus Urdu, Arabic, etc. That's why I wouldn't put it in a separate module.
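A rough sketch of what "removing diacritics without transliterating" can look like. This is a minimal illustration under an assumption, not the code from #72: it assumes the approach is to decompose each string with Unicode NFKD normalization and then drop every combining mark, and the name remove_diacritics_no_translit is hypothetical.

```python
import unicodedata

import pandas as pd


def remove_diacritics_no_translit(s: pd.Series) -> pd.Series:
    """Drop combining marks from every string in `s` without transliterating.

    Hypothetical helper for illustration; not the code from the PR.
    """

    def _strip(text: str) -> str:
        # NFKD splits precomposed characters into a base character plus
        # combining marks, e.g. "é" -> "e" + U+0301 COMBINING ACUTE ACCENT.
        decomposed = unicodedata.normalize("NFKD", text)
        # combining() returns the canonical combining class, which is
        # non-zero for marks such as Latin accents and the Arabic kasra
        # and damma; every other character (including Arabic-script
        # letters) is kept as it is.
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    return s.astype(str).apply(_strip)


s = pd.Series("Montréal, über, noël, 889, اِس, اُس")
print(remove_diacritics_no_translit(s))
```

Because nothing is mapped to ASCII, the two Urdu words should keep their Arabic-script letters and lose only the kasra and damma, instead of turning into "is, us". Note that NFKD also splits letters such as آ (alef with madda) into a base letter plus a mark, so this sketch would strip marks that Urdu writers may consider part of the word.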

jbesomi commented on June 1, 2024

That makes complete sense! 👍

cmhashim commented on June 1, 2024
> from texthero.ur import hero
> hero.remove_diacritics(...)
>
> Where this remove_diacritics is specialized in dealing with Urdu text.

I think it's time to do this. It was my mistake to give such a simple example of Urdu text with diacritics; handling diacritics in Urdu/Arabic is much more complex. Some diacritics are part of Urdu words and must be written, while others can be dropped. Could we therefore have an optional argument that takes a list of diacritics to retain or remove?

Some examples:

retain_diacritics_eg_text = "فوراً, حتیٰ, آزاد, ہوئی"
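One possible shape for the requested optional argument is a keep parameter listing the marks that must survive. This is a hypothetical sketch: neither remove_diacritics_keep nor keep is an existing texthero API, and the chosen set of Urdu marks is an assumption. It decomposes with NFD, drops only the combining marks not listed in keep, and then recomposes with NFC so that a retained mark folds back into its precomposed letter (e.g. alef + maddah back into آ).

```python
import unicodedata
from typing import Iterable, Optional

import pandas as pd


def remove_diacritics_keep(
    s: pd.Series, keep: Optional[Iterable[str]] = None
) -> pd.Series:
    """Remove combining marks from `s`, except the ones listed in `keep`.

    Hypothetical API for illustration only; `keep` is not an existing
    texthero parameter.
    """
    keep_set = set(keep or ())

    def _strip(text: str) -> str:
        # Canonical decomposition: base letters and marks become separate
        # code points, e.g. U+0622 (alef with madda) -> U+0627 + U+0653.
        decomposed = unicodedata.normalize("NFD", text)
        kept = "".join(
            ch
            for ch in decomposed
            if ch in keep_set or not unicodedata.combining(ch)
        )
        # Recompose, so e.g. a retained maddah folds back into U+0622.
        return unicodedata.normalize("NFC", kept)

    return s.astype(str).apply(_strip)


# Assumed set of marks to retain for the examples above:
# fathatan, superscript alef, maddah above, hamza above.
URDU_KEEP = ["\u064b", "\u0670", "\u0653", "\u0654"]

s = pd.Series(["فوراً, حتیٰ, آزاد, ہوئی"])
print(remove_diacritics_keep(s, keep=URDU_KEEP))
```

Hamza above is in the list because, assuming ہوئی is stored with the precomposed letter ئ (U+0626), NFD splits it into yeh + hamza above, and dropping the hamza would change the word. Calling the function without keep strips all of these marks, which is the behavior of a plain remove_diacritics.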
