Comments (6)

deanishe commented on July 28, 2024

Punctuation is, IMO, an entirely separate concern to diacritic folding. There's no reason to drop the distinction between é and e or ü and u on the floor because there's a non-ASCII apostrophe or an m-dash in the data.

Removing/replacing non-ASCII apostrophes and smart quotes is a matter for the search key generation function, not the query/key matching one (i.e. Workflow.filter()).

If it's relatively straightforward, I have no objection to adding a normalize_punctuation() method to Workflow, but treating n- and m-dashes as equivalent to non-ASCII letters wrt folding is just wrong. That is, however, contingent on non-ASCII punctuation being an actual problem.
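
To illustrate why the two concerns are separable, here is a minimal sketch. It does not use Alfred-Workflow's own ASCII_REPLACEMENTS dictionary; it assumes a stand-in folding step built on the standard library's unicodedata module, and fold_diacritics is a hypothetical name chosen for this example.

```python
# -*- coding: utf-8 -*-
import unicodedata


def fold_diacritics(text):
    """Hypothetical helper: drop combining marks after NFD decomposition.

    This is a stand-in for diacritic folding, not the library's method.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return u"".join(c for c in decomposed if not unicodedata.combining(c))


# Input contains e-acute, an em dash (U+2014), u-umlaut and a
# right single quotation mark (U+2019).
sample = u"caf\u00e9 \u2014 M\u00fcller\u2019s"
folded = fold_diacritics(sample)

# The accented letters are folded; the dash and the apostrophe are untouched.
print(repr(folded))
```

Under these assumptions the letters lose their diacritics while the em dash and the smart apostrophe pass through unchanged, so punctuation can be handled (or not) as its own step.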

WRT the location of the replacements dictionary: can't your editor collapse those lines? If not, what kind of a half-arsed editor are you using? 😉

I suppose that the ASCII_REPLACEMENTS dictionary could be moved to a separate module, but it should probably be accompanied either by all the non-ASCII-to-ASCII code or all the other "constants" (e.g. the paths to the icons).

from alfred-workflow.

fractaledmind commented on July 28, 2024

> WRT the location of the replacements dictionary: can't your editor collapse those lines? If not, what kind of a half-arsed editor are you using?

It's not good that I didn't know such magic existed until now, though it is good that I know now. That is crazy helpful.

You're right about the distinction. I was folding text to ASCII-only before storing in the other FTS database, so I needed to catch such non-ASCII punctuation marks. The only counter-argument I'd make is for key data likely to be searched. Say, an author's last name has a hyphen. The user searches van-buren. Will that match Van–Buren? If so, then no worries. I fixed what I needed for my particular case. You can alter Alfred-Workflow as you see fit for the general use-case.
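
The hyphen question can be made concrete with a small sketch. It deliberately does not call Workflow.filter(), whose matching rules are richer; naive_match is a hypothetical case-insensitive substring check used only to show how an en dash in the stored key interacts with an ASCII hyphen in the query.

```python
# -*- coding: utf-8 -*-
EN_DASH = u"\u2013"

# Search key as it might be stored, with an en dash between the name parts.
key = u"Van" + EN_DASH + u"Buren"
# Query as a user would plausibly type it, with an ASCII hyphen.
query = u"van-buren"


def naive_match(query, key):
    """Hypothetical stand-in matcher: case-insensitive substring test."""
    return query.lower() in key.lower()


# Compare the raw key against a key whose en dash was replaced beforehand.
before = naive_match(query, key)
after = naive_match(query, key.replace(EN_DASH, u"-"))

print(before, after)
```

With this simple matcher the raw key does not match and the normalised key does, which is the case for normalising punctuation when the search key is generated.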

deanishe commented on July 28, 2024

Ah right. You're using the methods directly.

Do people actually write their names with n-dashes? In any case, I guess smart apostrophes are definitely a thing, and I don't imagine many people use n-/m-dashes in Alfred's query box.

I'm unsure as to when to replace "smart" punctuation. My gut feeling is it should happen all the time.

What do you think? Which punctuation marks need ASCII-fying?

fractaledmind commented on July 28, 2024

I'd say the punctuation from Unidecode's [x000](https://raw.githubusercontent.com/iki/unidecode/master/unidecode/x000.py) and [x001](https://raw.githubusercontent.com/iki/unidecode/master/unidecode/x001.py) files. That should be sufficient, don't you think?

deanishe commented on July 28, 2024

There doesn't seem to be much punctuation in there, mostly just normal letters and whitespace.

I'm thinking more along the lines of n- and m-dashes and "smart" quotes, which would be replaced with their ASCII counterparts, and possibly stripping stuff like « and » entirely.
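
A replacement table along those lines might look like the sketch below. The table contents and the names DUMB_PUNCTUATION and normalize_punctuation are assumptions for illustration, not the mapping that ships with Alfred-Workflow.

```python
# -*- coding: utf-8 -*-
# Hypothetical mapping: smart quotes and dashes become ASCII,
# guillemets are stripped entirely (mapped to the empty string).
DUMB_PUNCTUATION = {
    u"\u2018": u"'",   # left single quotation mark
    u"\u2019": u"'",   # right single quotation mark / smart apostrophe
    u"\u201c": u'"',   # left double quotation mark
    u"\u201d": u'"',   # right double quotation mark
    u"\u2013": u"-",   # en dash
    u"\u2014": u"-",   # em dash
    u"\u00ab": u"",    # left guillemet
    u"\u00bb": u"",    # right guillemet
}


def normalize_punctuation(text):
    """Replace or strip each character listed in DUMB_PUNCTUATION."""
    for smart, dumb in DUMB_PUNCTUATION.items():
        text = text.replace(smart, dumb)
    return text


cleaned = normalize_punctuation(u"\u00abVan\u2013Buren\u2019s\u00bb")
accented = normalize_punctuation(u"caf\u00e9 \u2014 ok")
print(repr(cleaned), repr(accented))
```

Because the table only lists punctuation, accented letters such as é are left alone, which keeps this step independent of diacritic folding.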

deanishe commented on July 28, 2024

Added in v1.10.1.

Nothing (search keys or queries) is changed by default, but there is a Workflow.dumbify_punctuation() method that implements replacing "smart" punctuation with ASCII equivalents.

In v2, I will enable it by default, but probably only on search keys.
