Vietnamese input method editor (IME) extension for Firefox, Thunderbird, SeaMonkey, Komodo, etc.
There’s already an Accentuate.us extension that automatically adds diacritics to text as you type. But users may find that approach too aggressive or (depending on writing style and content) not quite accurate enough. AVIM could optionally integrate with Accentuate.us to provide suggestions for diacritics to insert, similar to what Google Input Tools does for Vietnamese.
Optionally remember on/off state per site, using FUEL or the Content Preferences service. Maybe limit this feature to bookmarks (FUEL), since the UI to set the preferences would be easier to figure out. Perhaps we could model the UI after Firebug’s panel enabling preferences.
I don’t think it’s as important to remember input methods per site: most users regularly use only one input method, whereas they may well type in another language on certain sites and have no use for AVIM there.
Another possibility would be to automatically turn AVIM on and off based on a page’s HTML lang attribute. That would be consistent with Firefox’s spell checking feature.
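As a rough sketch of the lang-based toggle, assuming we already have the effective lang value for the focused element: the helper names below are hypothetical, and returning null for "no language declared" is just one possible design, so that the user's manual setting still wins on untagged pages.

```javascript
// Hypothetical helper: true when a BCP 47 language tag denotes Vietnamese.
function isVietnameseLang(lang) {
  if (typeof lang !== "string") return false;
  // Language tags are case-insensitive; the primary subtag comes first.
  const primary = lang.trim().toLowerCase().split("-")[0];
  return primary === "vi";
}

// Returns true or false when the page declares a language, or null when it
// does not, so the caller can fall back to the user's manual on/off state.
function autoToggleState(lang) {
  if (typeof lang !== "string" || !lang.trim()) return null;
  return isVietnameseLang(lang);
}
```

With this shape, autoToggleState("vi-VN") turns AVIM on, autoToggleState("en-US") turns it off, and an empty or missing attribute leaves the current state alone.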
The stylesheet uses the :-moz-any() pseudoclass extensively, including for the rule that sets text-decoration on the panel and toolbar button labels. Firefox 2–3 don’t support this pseudoclass, so the cross-out effect doesn’t work, making it difficult to tell when AVIM is turned on.
The original, custom support for Scintilla in Komodo IDE/Edit was laid alongside plain text support: every call to modify the DOM was special-cased. But then Scintilla support was migrated to the TextControlProxy model, which operates on a line-by-line basis. Unfortunately, that makes undo/redo really annoying, especially when performing multicursor edits.
TextControlProxy operates on a line-by-line basis because most of its subclasses don’t have fine-grained access to the text control. But we have superb access to Scintilla, so we should do better.
iCloud doesn’t officially support Firefox, but it seems to work fairly well regardless. It provides more hooks than Google Docs, so we probably won’t have to resort to the fragile copy-paste method.
The browser’s autocompletion feature fails to update whenever AVIM handles a keystroke by modifying the text, as opposed to letting the keydown event through. AVIM should be sending an input event to tell the browser to autocomplete, but maybe it isn’t getting sent anymore.
If Silverlight is set to “Ask to activate” in about:addons, but the user unblocks a particular Silverlight object after page load, AVIM fails to recognize it. AVIM is only looking for Silverlight objects on page load. There must be a way to listen for plugins getting unblocked.
Update: This bug has been repurposed for Firefox Electrolysis support. Fennec Electrolysis support is now covered by #86.
AVIM supported the very first versions of Fennec (now Firefox Mobile), but then Fennec 4.0 adopted an IPC architecture called Electrolysis that severely restricts access to page content from chrome. Mozilla also tried to move desktop Firefox in that direction at one point.
Although such architectures are generally seen as optimal for extensions like AVIM that primarily interface with page DOM, AVIM relies on anonymous DOM nodes, internal command handlers, and various other features that would never be available to the page context. So simply shoving AVIM into a page script would break most of AVIM’s unique feature set and expose it to conflicts with in-page IMEs like avim.js.
I’m not sure what the solution is, but we’ve continually gotten requests to bring back Firefox Mobile support, because unfortunately people found out about it at about the time Electrolysis landed.
AVIM is unable to add diacritics in Komodo IDE and Komodo Edit’s incremental search bar. It’s possible that the application has an event handler for incrementally searching that happens to cancel further event propagation.
This task primarily consists of converting the complex web of overlays into functions that programmatically build AVIM’s UI. We’re partway there with the new toolbar.
Upon installation, bootstrap.js would manually build the UI in every existing window and register for new windows using FUEL/STEEL.
Upon uninstallation, it would delete window.avim from every existing window, remove UI, and remove the new window listener. The XPCOM component probably doesn’t need any changes.
AVIM used to work in Zoho Docs except for Zoho Show. It doesn’t work at all in Zoho Docs anymore.
Zoho Show uses #texteditable[contenteditable] but does something tricky behind the scenes. ShapeEvent.text and ShapeEditor.text.editor look like good starting points for a proxy.
AVIM creates tons of sandboxes every time a key is pressed. As far as I know, nothing hangs onto these sandboxes, but it might be a good idea to more thoroughly clean up by nuking them all at the end of the key event handlers.
Localization has always been done in BabelZilla. Its benefits include tight integration with extension development workflows and a community of translators who specialize in the Mozilla ecosystem. However, its interface is quite painful to use, and publishing localizations is a manual affair.
We should move localization to Transifex, which has good GitHub integration and a top-notch interface. It supports Mozilla DTD and .properties files. We can also convert amo.dtd into a Markdown file for easier translation. My only concerns are that most translators there have little to no familiarity with Mozilla-based applications, and that some rely too heavily on machine translation.
Neither AVIM’s Vietnamese Input menu nor its status bar panel functions correctly in Songbird 1.8, though its keyboard shortcuts and preference pane continue to work fine.
We should remove the dedicated VIQR* input method and let you choose between + and * and between ' and / for the VIQR input method. Per–input method preferences would also allow us to support an alternate version of Telex (entirely different dead keys) that’s in use in a few places.
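To make the idea concrete, here is a minimal sketch of per-input-method preferences for VIQR. The prefs object and the function name are hypothetical. Only the two choices mentioned above are configurable, and the other keys are the usual VIQR mark keys.

```javascript
// Standard VIQR mark keys. Only `horn` and `acute` are configurable here.
const VIQR_DEFAULTS = {
  acute: "'", grave: "`", hook: "?", tilde: "~", dot: ".",
  circumflex: "^", breve: "(", horn: "+",
};

// `prefs` is a hypothetical per-input-method preference object, for example
// { horn: "*", acute: "/" }. Unknown values fall back to the defaults.
function buildViqrKeys(prefs = {}) {
  const keys = { ...VIQR_DEFAULTS };
  if (prefs.horn === "+" || prefs.horn === "*") keys.horn = prefs.horn;
  if (prefs.acute === "'" || prefs.acute === "/") keys.acute = prefs.acute;
  return keys;
}
```

The current VIQR* method would then just be buildViqrKeys({ horn: "*" }), and an alternate Telex could presumably get its own defaults table in the same way.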
Firefox introduced event-based extension hooks to the find bar so that pdf.js can search PDFs. It would be really neat if AVIM could customize in-page find to ignore diacritics until diacritics are added to the search terms. The find engine would probably involve querying for text nodes that match a certain regular expression.
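One way the regular expression could be built, as a sketch only: match against NFD-normalized text, let an unaccented letter in the query match any accented variant, and require an accented letter to match exactly. The function names are made up, and this says nothing about how the find bar hooks would actually call it.

```javascript
const MARKS = "[\\u0300-\\u036f]"; // combining diacritical marks

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a pattern to run against NFD-normalized text.
function buildFindPattern(query) {
  let source = "";
  for (const ch of query.normalize("NFC")) {
    const decomposed = ch.normalize("NFD");
    if (ch === "d" || ch === "D") {
      source += "[d\\u0111]"; // plain d also matches d with stroke
    } else if (/^[a-zA-Z]$/.test(ch)) {
      source += ch + MARKS + "*"; // plain letter: any diacritics allowed
    } else if (decomposed.length > 1) {
      // Accented letter: exactly these marks, and no others.
      source += escapeRegExp(decomposed) + "(?!" + MARKS + ")";
    } else {
      source += escapeRegExp(ch);
    }
  }
  return new RegExp(source, "iu");
}

function matchesQuery(query, text) {
  return buildFindPattern(query).test(text.normalize("NFD"));
}
```

So searching for "viet" would find “Việt”, but once the user types “việt”, plain "viet" no longer matches. A real find engine would also have to map match offsets in the normalized text back to the original text nodes, which this sketch ignores.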
Google Docs was supported off and on around 2011 with a very fragile solution that groped around to determine the context and mimicked copy-paste to modify the text. It broke because Google kept changing the DOM. Meanwhile, they added Virtual Keyboard to the site, so users stopped pleading for Google Docs support in AVIM. Nonetheless, users expect AVIM to work everywhere within the browser, and Google Docs should be no exception. In principle, the implementation could be made to work again; we just need to figure out where things have moved.
AdBlock Plus includes basic preference controls in its about:addons page instead of opening a separate dialog. There is an API for doing so. It’s a terrible UI – checkboxes to the right of their labels? – but arguably more accessible than the separate dialog box. It just feels more integrated.
With the javascript.options.strict preference set, Firefox currently warns sometimes when typing using AVIM. We need to get to the bottom of these warnings and make use of some modern JavaScript idioms.
Convert var to let so that variables are block-scoped and can’t be accidentally reused
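A small illustration of the kind of accidental reuse this guards against. It is a generic example, not code taken from AVIM.

```javascript
// With `var`, all three callbacks share one function-scoped `i`.
function collectWithVar() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks.map(function (f) { return f(); });
}

// With `let`, each loop iteration gets its own binding of `i`.
function collectWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((f) => f());
}
```

collectWithVar() returns [3, 3, 3], while collectWithLet() returns [0, 1, 2].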
The uppercase eth (Ð) and uppercase retroflex D (Ɖ) both look identical to the Vietnamese Đ. The eth used to be prevalent on Vietnamese-language websites, and it still crops up once in a while due to copy-paste. AVIM should recognize and normalize both characters to Đ.
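The normalization itself could be as small as the sketch below. The function name is hypothetical, and where AVIM would call it (on paste, on keystroke, or on demand) is left open.

```javascript
// U+00D0 (eth) and U+0189 (African D, the capital of retroflex d) are
// replaced with U+0110 (D with stroke). Lowercase forms are left alone
// because they do not resemble the Vietnamese letter.
const D_LOOKALIKES = /[\u00D0\u0189]/g;

function normalizeDStroke(text) {
  return text.replace(D_LOOKALIKES, "\u0110");
}
```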
The toolbar button icon is just a quick and dirty placeholder that I whipped up in Seashore. It’s off-center and doesn’t render well on HiDPI screens. Although the built-in Australis toolbar icons are PNGs, I think it’s possible to do a good approximation of the Character Encoding icon, substituting “Đ” for “æ”, in SVG. There may need to be a different version for each platform, but the resulting size will probably be smaller than the current PNG.
WebODF uses a custom <canvas>-based editor that AVIM doesn’t know how to hook into. We should make it possible to input Vietnamese into the ODT editing demo.
AVIM hooks into the IME and DiMENSION extension to turn input fields red when AVIM is on. There are several other extensions that serve the same purpose, including IMEStatus. AVIM should hook into those extensions too.
As Phan Tùng Quân notes, Ctrl+Alt+V is a rather inconvenient shortcut for toggling AVIM, something that happens very often. (On the Mac, ⌥⌘V is a bit more comfortable.) AVIM used Ctrl+Shift+V back in 2008 (from version 20080224.59 to 20080224.87), but users complained that it conflicted with AdBlock Plus, leaving them with no shortcut for AVIM at all. I don’t think Mozilla would be very happy if AVIM forcefully remapped AdBlock Plus’s shortcut, so we should find something else that is both easy to reach and logical.
Mudim uses Alt+/ (not available on the Mac)
My AVIM uses F12 (often conflicts with Firefox or OS functions)
AVIM for Chrome uses double Ctrl (double ⌃ on the Mac)
Google Input Tools uses Ctrl+G (⌃G on the Mac)
VietInput has no keyboard shortcut
AVIM for Chrome’s double Ctrl sounds pretty good. (We can keep Ctrl+Alt+V for those who are used to it.) We could possibly improve on this approach by playing a subtle sound cue, like Sticky Keys on Windows and OS X.
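For reference, a double-Ctrl detector doesn’t need much state. The sketch below is not how AVIM for Chrome does it. The class name, the 400 ms window, and the reliance on the key and timeStamp properties of keyboard events are all assumptions.

```javascript
// Detects two clean taps of Control within `maxGapMs`. A tap is "clean"
// when no other key goes down between Control's keydown and keyup, so
// ordinary shortcuts such as Ctrl+C never count.
class DoubleCtrlDetector {
  constructor(onToggle, maxGapMs = 400) {
    this.onToggle = onToggle;
    this.maxGapMs = maxGapMs;
    this.ctrlDown = false;   // Control is currently held
    this.dirty = false;      // another key was pressed during this hold
    this.lastTapTime = null; // release time of the previous clean tap
  }

  // `event` only needs `type`, `key`, and `timeStamp`.
  handle(event) {
    const isCtrl = event.key === "Control";
    if (event.type === "keydown") {
      if (!isCtrl) {
        this.dirty = true;
        this.lastTapTime = null;
      } else if (!this.ctrlDown) { // ignore auto-repeat keydowns
        this.ctrlDown = true;
        this.dirty = false;
      }
      return;
    }
    if (event.type !== "keyup" || !isCtrl || !this.ctrlDown) return;
    this.ctrlDown = false;
    if (this.dirty) {
      this.lastTapTime = null;
      return;
    }
    const prev = this.lastTapTime;
    if (prev !== null && event.timeStamp - prev <= this.maxGapMs) {
      this.lastTapTime = null; // a third tap starts a new pair
      this.onToggle();
    } else {
      this.lastTapTime = event.timeStamp;
    }
  }
}
```

The same handler would receive both keydown and keyup events. The sound cue, if we add one, would go in the onToggle callback.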
In rich text editors, tone marks are lost when shifted to the last character in the word. For example, xo'a becomes “xoa”, instead of “xoá”, when the Old Accents option is disabled.
AVIM registers too many undo levels (about one per letter). Undo transactions were coalesced in 20080224.139, but the undo levels came back in 20080728.325 with the custom SpliceTxn transaction object. To get coalescing working again, SpliceTxn needs to implement the merge() method on Components.interfaces.nsITransaction.
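Here is roughly what a coalescing merge() could look like. This is a simplified stand-in, not the real SpliceTxn: it edits a plain object with a string value property instead of an editor’s DOM, and the constructor arguments are made up. The method names follow nsITransaction.

```javascript
// A simplified stand-in for AVIM's SpliceTxn. `target` is any object with a
// string `value` property.
class SpliceTxn {
  constructor(target, pos, removed, inserted) {
    this.target = target;
    this.pos = pos;
    this.removed = removed;   // text replaced by this edit
    this.inserted = inserted; // text put in its place
  }

  // Replaces `oldText` at `pos` with `newText`.
  splice(oldText, newText) {
    const v = this.target.value;
    this.target.value =
      v.slice(0, this.pos) + newText + v.slice(this.pos + oldText.length);
  }

  doTransaction() { this.splice(this.removed, this.inserted); }
  undoTransaction() { this.splice(this.inserted, this.removed); }
  redoTransaction() { this.doTransaction(); }

  // Called on the previous transaction after `other` has been done.
  // Returning true tells the caller that `other` was absorbed.
  merge(other) {
    const chained =
      other instanceof SpliceTxn &&
      other.target === this.target &&
      other.pos === this.pos &&
      other.removed === this.inserted;
    if (chained) this.inserted = other.inserted;
    return chained;
  }
}
```

This version only merges an edit that rewrites exactly the span the previous edit produced (say, “u” to “ư” to “ữ”), so a single undo restores “u”. Coalescing a whole word’s worth of keystrokes would need a looser rule.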
AVIM should only consider the nearest valid syllable when adding diacritics. That would make it much easier to type multisyllabic loan words like “kilômét” (kilo^me't) as well as Vietnam2000Vietnamese2020.
Perhaps it needs to be a preference (that could replace the existing spelling enforcement preference), because spelling enforcement would be severely weakened. Should adding + to “chỗo” result in “chỗơ” or “chỡo” or “chõo+”? We need to think about these issues more deeply.
AVIM’s existing site is a pain to maintain and localize. I experimented with building a new site atop Movable Type; it looked pretty but was similarly hard to maintain. Now that AVIM has moved to GitHub, it’s only natural that we adopt GitHub Pages. Hopefully we can still get the installation workflow right.
I don’t know if the “Test Drive” page can be hosted on GitHub Pages, especially with the embedded Ace editor and Silverlight applet. Perhaps it needs to remain a static HTML page.
When editing a bookmark’s tags, Firefox autofills matching tags and selects the suggested text. Because AVIM refuses to add diacritics when text is selected, it’s very difficult to type diacritics in the tag field.
In an Eclipse Orion editor, typing chu+~ in VIQR gets you “chữ”, but as soon as you press any key or click anywhere, the word reverts to “chu”. Effectively, Orion support is just an illusion.
AVIM ignores the URL bar and any <input type="url"> fields when urlbar is in the blacklist. But the blacklist is intended for matching element IDs, not element types, and it’s a very advanced preference. Instead, there should be a dedicated preference for ignoring URL fields, like the password preference. When migrating users to the new preference format, urlbar should be removed from the blacklist and the new preference should be set.
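The migration step might look something like this. The preference names ignoredFieldIds and ignoreUrlFields are placeholders rather than AVIM’s real preference keys, and the blacklist is assumed to be a whitespace-separated string of IDs.

```javascript
// Preference names here are hypothetical stand-ins for AVIM's real ones.
// Returns a new preferences object and leaves the input untouched.
function migrateUrlBlacklist(prefs) {
  const ids = (prefs.ignoredFieldIds || "").split(/\s+/).filter(Boolean);
  const remaining = ids.filter((id) => id.toLowerCase() !== "urlbar");
  const hadUrlbar = remaining.length !== ids.length;
  return {
    ...prefs,
    ignoredFieldIds: remaining.join(" "),
    // Keep an explicit value if the new preference was already set.
    ignoreUrlFields:
      prefs.ignoreUrlFields !== undefined ? prefs.ignoreUrlFields : hadUrlbar,
  };
}
```

A user who had deliberately removed urlbar from the blacklist ends up with the new preference turned off, which preserves their existing behavior.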
For Australis (Firefox 29+), AVIM hides the Add-on Bar panel unconditionally, because the “Add-on Bar” sits right next to AVIM’s toolbar button. We should hide the preference, too, or show it as unchecked and disabled.
AVIM adds a toolbar button to Firefox’s toolbar palette but not Thunderbird’s. Thunderbird’s status bar is shown by default, so it isn’t such a big deal, but some users may prefer to have AVIM’s controls in the Composition Toolbar.
AVIM loads tons and tons of code into every single window. It’s a huge waste and possibly a big performance issue. Mozilla has suggested that we shove everything into a JavaScript module (.jsm), but that would break support for older browsers like Firefox 2.x. (Do we care?) Alternatively, we could move everything into an XPCOM service, which would work in every application AVIM supports.
In an effort to tame the original AVIM input engine code long ago, I replaced many complex, multiply-nested conditionals with opportunistic use of early returns. This approach is more useful in some parts of the code, like ckspell(), than in others. By a back-of-the-napkin estimate, converting avim.js to conform to structured programming style would save almost 2 kB. I wonder if we could refactor some of this code to eliminate the large branches that incentivize early returns.
The Mudim monitor stopped working in Firefox 4. At the time, I didn’t bother fixing it because Mudim wasn’t available for Firefox 4, but then Mozilla started automatically bumping up its maximum version. Long story short, we need the Mudim monitor again. It should appear at the top of about:addons, like the “restart to install” banner.
Debug builds of AVIM have a built-in test harness that automatically types and verifies all the words in an input file (such as the FVDP corpus) in various configurations. The test harness made for an interesting demonstration years ago, but these days it makes much more sense to run unit tests on the command line, without Firefox running.
Before we can get conventional unit tests up and running, we should factor out the core input method engine (#20) to remove the dependency on the DOM. The unit tests should be run via SpiderMonkey or, failing that, some other JavaScript command-line tool like Node.js.
Vietnamese text in PDFs is usually typeset in non-Unicode fonts that use VNI, VPS, ABC, or TCVN3 layouts. PDF.js renders this text fine, but the underlying representation is a mangled mess. Because AVIM specializes in Vietnamese input tools, it’s uniquely suited to detecting legacy-encoded Vietnamese text and converting it on the fly when finding or copying inside a PDF.
PDF.js’ text layer includes a <div> for each run of text; each <div> has a data-font-name attribute that identifies the font used for that run. There must be some way to map that identifier to the original font name, which we can then use to guess an encoding. The names of VNI-encoded fonts always begin with “VNI-”, for instance.
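Assuming we do manage to recover the original font name, guessing the encoding could be a simple prefix lookup. In the sketch below, only the “VNI-” rule comes from the paragraph above. The “.Vn” rule for TCVN3 (ABC) fonts is my assumption and would need checking against real PDFs, and the function names are hypothetical.

```javascript
// Only the "VNI-" rule is taken from the issue text. The ".Vn" rule for
// TCVN3 (ABC) fonts is an assumption added for illustration.
const ENCODING_RULES = [
  { prefix: "VNI-", encoding: "VNI" },
  { prefix: ".Vn", encoding: "TCVN3" },
];

// Embedded PDF font subsets are tagged with six uppercase letters and "+",
// as in "ABCDEF+VNI-Times".
function stripSubsetTag(fontName) {
  return fontName.replace(/^[A-Z]{6}\+/, "");
}

// Returns an encoding name, or null when the font gives no hint.
function guessLegacyEncoding(fontName) {
  const name = stripSubsetTag(fontName);
  const rule = ENCODING_RULES.find((r) => name.startsWith(r.prefix));
  return rule ? rule.encoding : null;
}
```

A null result would mean leaving the text alone, which is the safe default for Unicode fonts.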