Comments (2)

balhafni commented on June 9, 2024

Hello, we currently do not support this, but you can accomplish it with a function like the one below:

import re

# Matches any character followed by two or more copies of itself,
# i.e. a run of three or more identical characters.
_REP_AR_RE = re.compile(r'(.)\1{2,}')

def remove_repetitions_ar(s, policy=1):
    """Reduces the repeated characters (more than two repeated)
    from an Arabic string to one or two characters based on the
    optional specified policy.
    Args:
        s (:obj:`str`): The string to be normalized.
        policy (:obj:`int`, optional):
            The reduction policy. If policy=`1` the repeated characters will
            be reduced to `1` character. If policy=`2` the repeated characters
            will be reduced to `2` characters. Defaults to `1`.
    Returns:
        :obj:`str`: The normalized string.
    """

    if policy == 1:
        # Keep a single copy of the repeated character.
        return _REP_AR_RE.sub(r'\1', s)
    elif policy == 2:
        # Keep two copies of the repeated character.
        return _REP_AR_RE.sub(r'\1\1', s)
    else:
        raise ValueError("Policy value should be either 1 or 2!")

>>> remove_repetitions_ar('مرحباااا')
'مرحبا'
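
Note that `(.)` matches any character except a newline, so the function above also collapses repeated Latin letters, digits, and punctuation. If only Arabic letters should be collapsed, one possible variant is sketched below. This is not part of camel_tools: the name `remove_repetitions_ar_only` is hypothetical, and treating U+0621 to U+064A as "Arabic letters" is an assumption that leaves out diacritics and extended Arabic characters.

```python
import re

# Assumption: "Arabic letter" means the basic range U+0621..U+064A.
# Diacritics and extended Arabic letters are deliberately not covered.
_AR_LETTER_REP_RE = re.compile(r'([\u0621-\u064A])\1{2,}')

def remove_repetitions_ar_only(s, policy=1):
    """Hypothetical variant: collapse runs of three or more identical
    Arabic letters to `policy` copies (1 or 2), leaving other text as is."""
    if policy not in (1, 2):
        raise ValueError("Policy value should be either 1 or 2!")
    # Replace each run with `policy` copies of the repeated letter.
    return _AR_LETTER_REP_RE.sub(lambda m: m.group(1) * policy, s)

print(remove_repetitions_ar_only('مرحباااا wooow'))
print(remove_repetitions_ar_only('مرحباااا', policy=2))
```

With this variant the repeated alef is reduced, while the repeated `o` in the Latin word is left untouched.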

Hope this is helpful.

manel-hikk commented on June 9, 2024

Yes, it helps a lot.
I asked because I saw the module camel_tools.morphology.errors.MorphologyError in the docs,
so I thought this module might be for errors like repeated characters.
Unfortunately, the docs don't have enough examples.
Is there any module in camel tools that checks for grammatical or orthographic errors and corrects them?
