mozilla-extensions / firefox-voice

Firefox Voice is an experiment in a voice-controlled web user agent.
License: Mozilla Public License 2.0
The max height for a popup seems to be 600px, so we need to work within that constraint even when a card has lots of text, an image, etc. CC @awallin
Right now it opens the mic and never closes. The popup should call window.close() after giving the user feedback for a reasonable amount of time.
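A minimal sketch of that behavior (the function name and delay are assumptions, not existing code; the popup itself would pass in its own `window`):

```javascript
// Hypothetical sketch: give the user feedback, then close the popup.
const FEEDBACK_DELAY_MS = 1500; // assumed "reasonable" delay; tune as needed

function closeAfterFeedback(win, delayMs = FEEDBACK_DELAY_MS) {
  // The popup would call closeAfterFeedback(window) once it has shown its
  // feedback. Returns the timer id so the close can be cancelled if the
  // user keeps interacting with the popup.
  return setTimeout(() => win.close(), delayMs);
}
```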
We need to propose a schema and submit it to
We have some work in this Google doc
We need to create something like the work in this repository
We could support intents based on text summaries of an article:
Presumably we'd just use some text summarization service
We could support an intent that lets you move through search results:
To implement this we will have to detect and save information about any searches in a tab, then detect and save the list of search results, and then match the current page against that list to determine what would come next. The implementation is fairly involved.
We should decide on another chime sound for when you start audio (replacing https://jcambre.github.io/vf/mic_open_chime.ogg)
Right now, the text input field is overlapping with the header. We also need to fix the submit button so it shows proper behavior on hover.
We'll need to do some substitutions in manifest.json, so we should generate it. While I've used mustache in the past, I think ejs is even simpler.
The template should be rerendered every time npm start is run.
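The substitution step can be sketched without committing to a template library yet; a real build script would call ejs.render() (or mustache) instead of this hand-rolled replacement. The {{name}} placeholder syntax and the variable names are assumptions:

```javascript
// Stand-in for the manifest template step; ejs would replace this function.
function renderManifest(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => {
    if (!(name in vars)) throw new Error(`Missing template variable: ${name}`);
    return vars[name];
  });
}

// Wired up from an npm "prestart" script so it runs on every `npm start`:
//   const template = fs.readFileSync("manifest.json.ejs", "utf-8");
//   fs.writeFileSync("manifest.json",
//     renderManifest(template, { version: "0.1.0" }));
```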
We might want to support copying some text or pieces of a page:
What we copy can be HTML or text, but unfortunately we can't smartly choose between them based on the paste destination (we can try to make the text paste work OK when we copy HTML, but it's limited).
We use bodymovin for animations, but we want to use lottie instead. They both consume the same animations.
We can pop a tab into its own window
Implementation is trivial.
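It really is trivial, since windows.create() accepts a tabId that moves an existing tab into the new window. A sketch, with the browser API object injected so the logic can be exercised outside Firefox:

```javascript
// Move the active tab into its own window. `browserApi` would be the
// WebExtension `browser` global in the real add-on; it is a parameter here
// only for testability.
async function popTabIntoWindow(browserApi) {
  const [tab] = await browserApi.tabs.query({ active: true, currentWindow: true });
  return browserApi.windows.create({ tabId: tab.id });
}
```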
We could add an intent to pin tabs:
We could potentially support other tab movement intents:
The action implementation is trivial.
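For pinning specifically, the action is a one-liner with tabs.update(). A sketch (the browser API object is again injected purely for testability):

```javascript
// Pin or unpin the active tab. In the real add-on `browserApi` would be the
// WebExtension `browser` global.
async function setPinned(browserApi, pinned) {
  const [tab] = await browserApi.tabs.query({ active: true, currentWindow: true });
  return browserApi.tabs.update(tab.id, { pinned });
}
```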
Right now the read intent handler tries to enter reader mode and start narration, but we've had a problem actually getting it to start. Maybe something in https://github.com/mozilla/firefox-narrate-experiment would help.
In #52 the read intent didn't get copied over (or maybe it's empty?). Also we need to make it work.
When trying to use the keyboard shortcut (Command-.), getUserMedia never returns in the popup. This is similar to the behavior when the extension doesn't have media permission.
I like to use milestones for prioritization and triage: issues with no milestone are untriaged, and there's a milestone for the next release, one for the backlog, and maybe one extra like Stretch (especially for code-based issues that don't affect the product experience but that I might want to do anyway).
Timer intents would support:
We could implement a timer natively, or attempt some integration. Technically Google supports timers via search, but we'd have to leave the page open, and the interface isn't particularly nice.
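A native implementation could be quite small: setTimeout plus a record of pending timers so a later "cancel timer" intent can find them. A sketch (function names and the notification step are assumptions):

```javascript
// Hypothetical native timer support: no Google integration needed.
const pendingTimers = new Map();
let nextTimerId = 0;

function startTimer(seconds, onDone) {
  const id = ++nextTimerId;
  const handle = setTimeout(() => {
    pendingTimers.delete(id);
    onDone(id); // the extension might play a chime / show a notification here
  }, seconds * 1000);
  pendingTimers.set(id, handle);
  return id;
}

function cancelTimer(id) {
  const handle = pendingTimers.get(id);
  if (handle === undefined) return false; // unknown or already fired
  clearTimeout(handle);
  pendingTimers.delete(id);
  return true;
}
```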
This file isn't used any longer.
The find intent doesn't seem to work right. It always falls back to a search for me, and I have a hard time constructing the right words to trigger the regex.
We'd like other extensions to be able to extend the capabilities of this project.
An open issue: how do we extend the intent parser given these extensions?
Two options for extensions that could support Firefox Voice to demonstrate how this works:
(These are good options because I developed them and can make the changes.)
We should make an intents/ directory in the extension, and each intent should go in there. We should use one directory per "category" of intent (e.g., play music and pause music are separate intents, but both would live in a music/ folder). For now I think it can be as simple as, say, intents/find/find.js (I find a kajillion index.js files a bit hard to handle, so I'd rather clone the directory name as the main file).
We should fix that. And figure out what it means along the way.
While the other files in vendor/ are copied as part of npm install, I wasn't able to figure out where webrtc_vad.(js,wasm) came from (and when I did find some candidate files, they didn't match what we had). Ideally we would minify the JS file as we copy it in.
Simply:
Ideally this would display in the popup (not as a new tab). A Google search usually produces a good result. DuckDuckGo cards don't seem to work well here.
We want a circleci task that builds and maybe lightly tests the project (all we have currently are eslint-style tests).
This might even be as simple as a script, but I feel like the surface area is somewhat unclear without some documentation or a process.
Technically Selenium testing should be possible, but ugh.
We could support:
To actually do this we'd have to create our own record of active tabs, as the APIs don't reflect tab history very well. We might want to consider how long a user has to dwell on a tab for us to treat it as "active".
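A sketch of that record, assuming a dwell threshold (the 2-second default is made up). Activations shorter than the threshold are treated as "passing through" and never enter the history:

```javascript
// Our own record of tab activations, fed from a browser.tabs.onActivated
// listener, since the WebExtension APIs don't expose tab history well.
class TabHistory {
  constructor(dwellMs = 2000) { // assumed dwell threshold
    this.dwellMs = dwellMs;
    this.history = [];          // dwelled-on tab ids, most recent last
    this.current = null;        // { tabId, since }
  }

  // Call on each tab activation with the tab id and a timestamp.
  activate(tabId, now = Date.now()) {
    if (this.current && now - this.current.since >= this.dwellMs) {
      this.history.push(this.current.tabId);
    }
    this.current = { tabId, since: now };
  }

  // Tab id for a "go back to my previous tab" intent, or null.
  previousTab() {
    return this.history.length ? this.history[this.history.length - 1] : null;
  }
}
```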
We should accept some messages in the background process that will be used to assemble and then submit the telemetry ping.
We should keep a pending payload, and allow intents or other components to add partial data. Then some final message/event will send the ping, and the payload will be reset.
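A sketch of that pending-payload pattern (the field names are made up; the real ping schema is still being proposed):

```javascript
// Background-process telemetry assembly: intents and other components add
// partial data; a final event submits the ping and resets the payload.
function makeTelemetry(submitPing) {
  let payload = {};
  return {
    add(partial) { Object.assign(payload, partial); },
    finish() {
      const ping = { ...payload, timestamp: Date.now() };
      payload = {}; // reset for the next interaction
      submitPing(ping);
      return ping;
    },
  };
}
```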
The code in background.js all involves starting the experience, but that happens automatically now with default_popup in manifest.json.
Right now the code in content.js, specifically around stm_start, does direct UI manipulation. Instead it should fire off some kind of events, and some other code will hook those up to UI changes (per #34).
We want to deploy builds of the add-on (via #2) to some server. We need to be able to upload to this server from CircleCI. The URL isn't too important (it won't host any site, just the xpi and an update.xml file).
Some S3 location perhaps? Circle needs to be able to do the uploading.
We want to add intents to play music:
To do this we need to create a list of music services we want to support, detect which music service the user already uses, possibly allow setting the music service explicitly, and then write music-service-specific code to interact with the individual players.
Right now the text input seems to be a <span>. This is extra work and has accessibility problems. We should just use an input. All the styles can still be overridden so it can look like whatever (though it takes somewhat more work).
We aren't sure yet if we have to use a remote-hosted intent parser, or if we might be able to construct a local wasm-based parser. This will be an ongoing experiment.
At least in testing (npm start) I'm frequently seeing a problem where the microphone isn't acquired within the 2-second window. I haven't determined whether it's an indefinite problem or not. Just trying again seems to fix it.
One possible hacky fix: if it's an issue of warming up the mic and/or permission, we could open the onboarding tab on startup, and close it after the mic is acquired. That would be OK for onboarding generally.
I'd like to move log.js over from screenshots, or maybe from personal-history-archive.
Right now we set up ports in the add-on, but I don't think we need to; we can just use browser.runtime.sendMessage/onMessage, which I think will be slightly easier to manage.
I added a logging system, but the log messages are very ad hoc. I think it might be useful to have a general log viewer, that shows us both incidental information that's logged, and some essential information like text input, parsed intents, error messages, etc.
This doesn't have any product value, but I think I may find it useful if only for my own work.
In this model you would first indicate where you are taking notes, then add things:
This has some relation to #77 and copy intents, except we'd immediately put the text into a specific destination. Integrating with different note-taking tools would take some effort (especially if we want, for instance, to be cursor-position-neutral), but it's not incredibly hard.
We need some tests we can run against our different intent parsing approaches (regex, Snips, remote, etc):
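One shape this could take is a parser-agnostic corpus: each case pairs an utterance with the intent (and slots) that every implementation should produce. The utterances and intent names below are illustrative, not the real corpus:

```javascript
// Shared fixture, usable against any parser implementation.
const CORPUS = [
  { utterance: "find the part about pricing", intent: "find" },
  { utterance: "search for hiking boots", intent: "search", slots: { query: "hiking boots" } },
  { utterance: "pin this tab", intent: "pinTab" },
];

// `parse` is any implementation: regex-based, Snips, a remote service, etc.
// Returns the fraction of utterances resolved to the expected intent.
function scoreParser(parse, corpus = CORPUS) {
  let correct = 0;
  for (const { utterance, intent } of corpus) {
    const result = parse(utterance);
    if (result && result.intent === intent) correct++;
  }
  return correct / corpus.length;
}
```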
It wouldn't be a huge amount of work to add some simple screenshot-related intents:
The first two implicitly take a screenshot of the viewport.
This would not use Firefox Screenshots, but would simply make the screenshot (which isn't very hard). Taking a screenshot of just a portion of the page would be out of scope.
Some ways to ask for weather:
In almost all cases Google returns an appropriate weather card for its search. Can we simply detect these and display that card in the popup? Google does not display things like 10 day forecast for Keene particularly well.
Right now we have a bunch of case statements around intents. Instead, intents should register themselves. Ideally this would include any regexes, sample statements, and the handlers.
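A minimal sketch of that registration pattern (names and the fallback behavior are assumptions). Each intents/<category>/<name>.js module would call registerIntent() on load, replacing the central case statement:

```javascript
const intentRegistry = [];

function registerIntent({ name, match, examples, run }) {
  // match: regex recognizing the utterance; examples: sample statements for
  // docs/tests; run: the handler, given the regex match.
  intentRegistry.push({ name, match, examples, run });
}

function dispatch(utterance) {
  for (const intent of intentRegistry) {
    const m = utterance.match(intent.match);
    if (m) return intent.run(m);
  }
  return null; // caller falls back to, say, a plain search
}
```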
Any Google query should have Safe Search on.
We should have a code glossary of the terms we are using in the codebase. Something simple and short in docs/
In several products we use Sentry to collect unexpected exceptions (i.e., get field reports of bugs in our product). Maybe we should do that for this project?
Getting access to Sentry is pretty easy, but we have to add some collection to the extension, especially to collect errors that come from content scripts and some more unusual locations.
We should have a popup/ui.js file, which handles the UI but does not have non-UI internal logic. This includes moving some HTML into popup.html.
We might want a bunch of standard ones (e.g., a few of these – but not all!) We should decide how we want to do discussion, UX, etc.
We can translate both pages and specific text:
Both could use translate.google.com. Translating a specific word could happen in the popup.
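The two cases map to two URL shapes on translate.google.com; treat the exact query parameters below as assumptions about the site's URL scheme:

```javascript
// Whole-page translation: open the translating proxy view.
function translatePageUrl(pageUrl, targetLang = "en") {
  return `https://translate.google.com/translate?sl=auto&tl=${targetLang}&u=${encodeURIComponent(pageUrl)}`;
}

// Specific word/phrase: this result could instead be shown in the popup.
function translateTextUrl(text, targetLang = "en") {
  return `https://translate.google.com/?sl=auto&tl=${targetLang}&text=${encodeURIComponent(text)}`;
}
```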