w3c / mediasession
Media Session API
Home Page: https://w3c.github.io/mediasession/
License: Other
It seems highly insufficient, UI-wise, not to have preloaded the image and a bit of the music file. Since you basically cannot release a player without that, we should make sure the examples cover it, as it might also illustrate some API holes. (E.g. we might need to have more state on when artwork is loaded and such.)
As far as I can tell, there are at most four MediaSession objects in existence, one per kind that appears in the page. Media objects get assigned the appropriate one to their `.session` property automagically, based on their `.kind` property.
This seems weird. You're really assigning media objects to mutually exclusive sets; the set they belong to determines how they behave wrt media focus and events. Why not reify this pattern directly? Add a global somewhere (`window.mediaSessions`?) with four set-likes hanging off of it statically with appropriate names, which you can add media objects to. To maintain mutual exclusivity, I guess the `add()` method on the set-likes needs to check for the presence of its argument in the other sets, and remove it automatically. Then you can pretend to have a MutationObserver looking for `kind=''` attribute changes and translating them into `.add()` on the appropriate set-like.
If people need to know which set a media object is in, they can either check `.has()` on the various ones, or we can expose a static method on the global that just tells you. Or keep `.kind` on media objects as a readonly attribute that reflects the membership.
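The set-like pattern above can be sketched in plain JavaScript. All names here (`mediaSessions`, `ExclusiveSet`, `kindOf`) are hypothetical illustrations of the comment, not part of any spec:

```javascript
// Four mutually exclusive set-likes, one per kind, hanging off a global.
const KINDS = ["content", "transient", "transient-solo", "ambient"];

class ExclusiveSet {
  constructor(peers) {
    this.peers = peers;       // shared list of all four set-likes
    this.members = new Set();
  }
  add(el) {
    // Maintain mutual exclusivity: evict from every other set first.
    for (const peer of this.peers) peer.members.delete(el);
    this.members.add(el);
    return this;
  }
  has(el) { return this.members.has(el); }
  delete(el) { return this.members.delete(el); }
}

const peers = [];
const mediaSessions = {};
for (const kind of KINDS) {
  const s = new ExclusiveSet(peers);
  peers.push(s);
  mediaSessions[kind] = s;
}

// The static helper mentioned above: report which set an object is in.
function kindOf(el) {
  return KINDS.find((k) => mediaSessions[k].has(el)) || "";
}
```

Adding an element to a second set automatically removes it from the first, which is exactly the mutual-exclusivity invariant the comment asks `add()` to enforce.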
https://mediasession.spec.whatwg.org/#media-session-release
Optionally, based on platform conventions, the user agent must prevent any hardware and/or software media keys from controlling playback of current media session’s active participating media elements.
Here we are talking about the media session's active participating media elements. But step 2 says,
"If current media session has one or more active participating media elements then abort the remaining steps and return". So when we reach step 5 the current media session will have no active participating media elements; in other words, if there are active media elements in the current session we will never reach step 5.
So is the step 5 description correct?
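The control flow being questioned can be illustrated with a small sketch; the session object and step numbering here are stand-ins for the spec prose, not a real implementation:

```javascript
// Illustration of why step 5 can never observe active participating
// media elements: step 2 aborts whenever any exist.
function releaseMediaSession(session) {
  // Step 2: if there are one or more active participating media
  // elements, abort the remaining steps and return.
  if (session.activeParticipants.length > 0) return "aborted at step 2";
  // Steps 3-4 elided.
  // Step 5: by this point activeParticipants is necessarily empty,
  // so prose about "active participating media elements" here is moot.
  return "reached step 5 with " + session.activeParticipants.length +
         " active participants";
}
```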
Does any OS support per-app support of volume control through media keys? I've not seen this before.
https://mediasession.spec.whatwg.org/#extensions-to-the-htmlmediaelement-interface
Let m have no current media session, if it currently has one.
You want to use "set" here, not "let". And you want to set "m's current media session", not "m".
In the README, it says the spec should support "Any additional buttons, e.g. like/favorite." I think we should restrict the scope to a limited set initially (i.e., whatever set is currently interoperably supported across keyboards and mobile).
https://mediasession.spec.whatwg.org/#mediasession-constructors
Set s’s kind attribute to the value of media session category.
This does not make sense, since the kind attribute a) only has a getter and b) that getter is already defined to return the value of media session category. So it seems better to simply remove this step.
https://mediasession.spec.whatwg.org/#current-media-session
When a media element is created the user agent
Needs a comma after created.
Mozilla still feels quite strongly that this API needs to work with Flash. Rationale being that we want existing sites to be able to make use of this functionality ASAP, while they transition away from relying on plugins.
Would it be possible for MediaSession to also accept an HTMLObjectElement? It won't really have any effect on the mobile ecosystem, which doesn't use Flash anyway.
This issue is based on the discussion starting at #48 (comment) and the reply comment at #48 (comment):
I think it would be better to introduce some concept like "audio producer" and say that media elements and audio contexts both have an audio producer. Maybe eventually audio producer gets defined in terms of web audio, e.g. as an audio worker.
So I will work on replacing 'participating media elements' with a generic like 'participating audio producers'.
The net effect of this would be to isolate the interface points between media session and media elements/audio contexts down to just a single section in the spec, instead of spreading it throughout.
I will then work on defining interaction of media sessions with audio producers in separate algorithms (e.g. 'pause an audio producer', 'unpause an audio producer', 'duck an audio producer', 'unduck an audio producer', etc).
Right now those algorithms will probably treat AudioContext and HTMLMediaElement objects separately (e.g. 'For each AudioContext-based audio producer...suspend() the audio context', 'For each HTMLMediaElement-based audio producer...pause() the media element', etc). We could then eventually replace those algorithms with something based on AudioWorker or whatever our unified 'audio producer' ends up being.
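The per-type dispatch described above can be sketched as follows. This is a hypothetical illustration of the proposed algorithms; the duck-typed check stands in for the spec's separate per-type branches:

```javascript
// Sketch of a 'pause an audio producer' algorithm that treats the two
// producer types separately, as the comment above describes.
function pauseAudioProducer(producer) {
  if (typeof producer.suspend === "function") {
    // AudioContext-based audio producer: suspend() the audio context.
    producer.suspend();
    return "suspended";
  }
  // HTMLMediaElement-based audio producer: pause() the media element.
  producer.pause();
  return "paused";
}
```

A unified 'audio producer' concept would eventually let this collapse into a single branch.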
This spec is great! It should get some love.
It looks like all MediaSession does is expose a kind, via `el.kind` or `el.session.kind`. Rather than have a MediaSession object at all, we can just move the release() method over to media objects. Unless you plan on expanding it to do more in the future?
https://mediasession.spec.whatwg.org/#extensions-to-the-htmlmediaelement-interface
(whether both null or both the same media session) then
This needs a comma before then.
https://mediasession.spec.whatwg.org/#media-session-release
For every other media session known to the user agent, run the following substeps, passing in each media session as incumbent media session:
This omits releases happening outside the user agent, e.g. a notification sound from a calendar app finishing, causing the audio focus to return to the user agent and potentially unducking or unpausing non-transient audio.
In lifecycle, it states:
When a new media session is created its current state must be initially set to idle.
You already stated this in the text about creating the media session. Please DRY up the spec, or it gets confusing.
The MediaSession kind corresponds to a number of behaviors (does it interrupt others, can it mix with others, etc.) that we could expose directly:
partial interface MediaSession {
attribute boolean transient;
attribute boolean mixable;
attribute boolean pauseInBackground; // silly name
}
The content kind would correspond to all of these being false, while e.g. transient-solo is `{transient: true, mixable: false}`.
In this setup, kinds would be a matter for HTMLMediaElement only.
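The kind-to-flags correspondence could be tabulated as below. Only the "content" and "transient-solo" rows are given in the comment above; the "transient" and "ambient" rows are assumptions for illustration:

```javascript
// Hypothetical mapping from each kind to the proposed boolean behaviors.
const kindBehaviors = {
  "content":        { transient: false, mixable: false, pauseInBackground: false },
  "transient":      { transient: true,  mixable: true,  pauseInBackground: false }, // assumed
  "transient-solo": { transient: true,  mixable: false, pauseInBackground: false },
  "ambient":        { transient: false, mixable: true,  pauseInBackground: false }, // assumed
};
```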
https://mediasession.spec.whatwg.org/#mediaremotecontrols-event-handlers
When the user presses the MediaTrackPrevious media key,
How should this happen relative to the task that is already queued for pressing that key? Should it in fact all be part of the same task, or a set of tasks (and if so, in what order)?
It seems the only special bit is about receiving them while not in focus. Anything else? @jakearchibald
https://mediasession.spec.whatwg.org/#mediasession-constructors
then set s’s controls attribute
It would be good to avoid this kind of language. What actually happens is that you set an internal slot of s, and the controls getter returns the value of that internal slot. New specifications should be written with that model in mind.
https://mediasession.spec.whatwg.org/#mediasession-constructors
If media session category is "content" then
This needs a comma after "content". Same for the other hit for '" then'.
Proposal:
partial interface AudioContext {
attribute MediaSession? session;
}
As far as possible, AudioContext and HTMLMediaElement should integrate with MediaSession in the same way. An important difference is that `new AudioContext()` creates an audio context which is already playing (silence), so it would be like connecting a playing media element to a session, which we haven't wanted to do.
We have had a lot of informal discussion around how web developers should be able to set media session metadata. Let's use this issue to track input and ideas on how we could do this.
For "content"-based media sessions we want to display meaningful metadata within notification area and lock screen control interfaces. We plan to expose 'live' attributes for web developers to set and change metadata dynamically before or during the running of an active media session.
That means we are currently thinking about the following:
partial interface MediaSession {
// null except if `this.kind === "content"`
readonly attribute MediaSessionMetaData? metadata;
};
interface MediaSessionMetaData {
attribute DOMString title;
attribute DOMString artist;
attribute DOMString album;
attribute USVString artwork;
};
What we also need to discuss is what the default metadata should be if these attributes are not set before a media session becomes active, and what we should display if these attributes are unset during an active media session. We could use the top-level browsing context's title and favicon, define new default media session metadata meta element extensions, or leave it to user agents to decide what to display when no metadata is provided.
If there are no objections then I plan to add this to the specification this week.
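As a concrete illustration of the proposed shape, here is a runnable stub of the interface; in a supporting user agent MediaSession and its metadata would of course be provided by the platform, and the field values are made up:

```javascript
// Stub of the proposed metadata interface so the usage below can be
// tried outside a browser.
class MediaSessionMetaData {
  constructor() {
    this.title = ""; this.artist = ""; this.album = ""; this.artwork = "";
  }
}
class MediaSession {
  constructor(kind = "content") {
    this.kind = kind;
    // metadata is null except for "content" sessions, per the proposal.
    this.metadata = kind === "content" ? new MediaSessionMetaData() : null;
  }
}

const session = new MediaSession("content");
session.metadata.title = "Episode 12";
session.metadata.artist = "Example Podcast";
```

Because the attributes are 'live', a page could update them mid-session, e.g. when a playlist advances to the next track.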
https://mediasession.spec.whatwg.org/#indefinitely-pause
must invoke its pause() method
We should not invoke public methods from prose as they can be overridden by script. Please instead invoke the algorithm pause() invokes directly. This might involve filing a bug on the relevant specification to give that algorithm a name.
So that we can compare and discuss more concretely, can we commit the different proposals to a document which can be iterated upon by pull requests?
I'm interested in documenting the MediaSession idea. @richtr, do you want to make an HTMLMediaElement-only proposal? @marcoscaceres, is there a proposal that has support at Mozilla you want to share?
https://mediasession.spec.whatwg.org/#the-mediaremotecontrols-interface
https://mediasession.spec.whatwg.org/#the-mediasession-interface
The MediaRemoteControls interface would have to internally link back to its MediaSession, so the split is a bit artificial from an implementation point of view.
iOS has this nice feature to skip time (like 30 seconds) vs. skip to the next track. It's quite useful for audio books, where you generally never want to skip to the next chapter, but you do want to rewind to re-listen to something, or to quickly find your place in a chapter you already heard (e.g., if you were distracted or fell asleep).
Notice the icons even show the skip amount time - so nice! :)
The media session is currently suspended from having platform-level media focus and its participating media elements are either paused or ducked
"ducked"?
I would have thought you'd design the lower-level primitive first, then see what libraries people build with it regarding HTMLMediaElements, and then try to standardize the winning ideas.
It has become apparent that our current model, which requires that one of the media session's participants start using the audio output device, is actually slightly more restrictive than iOS, and of course also Android. Both platforms make it possible to request audio focus, thus interrupting other apps, before beginning to produce any audio output. The iOS restriction is that you will not become the "now playing" app until you start producing audio, which is what gives you control over the playback controls in the drawer.
I propose something like this:
partial interface MediaSession {
Promise<void> activate();
Promise<void> deactivate();
}
Using with media elements:
var session = new MediaSession();
session.setMetadata({title: 'Punk Rock'});
var audio = new Audio('music');
audio.session = session;
audio.oncanplay = function() {
session.activate().then(function() {
audio.play();
});
};
Using with Web Audio:
var session = new MediaSession();
session.setMetadata({title: 'Synth Pop'});
var context = new AudioContext(session);
// context starts inactive because a session was passed
// prepare the context for playback
session.activate().then(function() {
context.resume();
});
It's clear that you activate and then play. This is the actual order in the current model as well, but the activation is implicit by attempting to play.
This is not to the exclusion of implicit activation by media elements, but that could be defined in terms of the web-facing API. For Web Audio I can't see a reason to do implicit activation at all.
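Defining implicit activation in terms of the web-facing API could look like the sketch below. `FakeSession` and `FakeAudio` are minimal stand-ins so the control flow can run anywhere; they are not real platform objects:

```javascript
// Implicit activation by a media element, expressed via the explicit
// activate() API proposed above.
class FakeSession {
  constructor() { this.active = false; }
  activate() { this.active = true; return Promise.resolve(); }
}
class FakeAudio {
  constructor(session) { this.session = session; this.playing = false; }
  // play() on a session-attached element first activates its session,
  // i.e. implicit activation defined in terms of the web-facing API.
  play() {
    return this.session.activate().then(() => { this.playing = true; });
  }
}
```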
That would make it much easier to find out where various definitions are invoked.
https://mediasession.spec.whatwg.org/#media-session-states
4.1. States
Please just use the enum names. E.g. "idle", "active", ...
Why are play/pause not exposed?
https://mediasession.spec.whatwg.org/#extensions-to-the-htmlmediaelement-interface
If the new value is null then set the m’s kind attribute to the empty string.
This needs to be far more clear what kind of attribute is being set. IDL or content?
And after that is clarified, how does that affect "When a media element’s kind attribute is set, changed, or removed then the user agent must update media element’s media session for media element."?
This has big implications for the shape of the API.
Android: registerMediaButtonEventReceiver and a newer MediaSession API allow apps to handle media buttons. Both appear to be orthogonal to audio focus and audio playback.
iOS:
CC @sicking, @jernoble, @richtr, @marcoscaceres. Anyone else?
The spec says:
Set media session’s current media session type to the corresponding media session type of media session category. If no corresponding media session type can be found for the provided media session category or media session category is empty, then set media session’s current media session type to "content".
Won't WebIDL have thrown, because the media session category is not one of MediaSessionKind? Also, why don't you just default kind to "content" in the WebIDL constructor? Then you are assured the media session category is content when no argument is given (and you don't need the redundant prose).
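The suggested default could be expressed directly in the IDL, along these lines (a sketch using the constructor-annotation syntax of the time; the member list is elided):

```
[Constructor(optional MediaSessionKind kind = "content")]
interface MediaSession {
  // ...
};
```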
https://mediasession.spec.whatwg.org/#mediasession-constructors
The MediaSession(kind) constructor when invoked must run these steps:
You need commas around "when invoked" I think.
https://mediasession.spec.whatwg.org/#assigning-a-media-session-declaratively
is Default then
Needs a comma before then. Happens at least twice.
See https://mediasession.spec.whatwg.org/#idl-index
The current name is a bit odd. Usually content attributes use "foo-bar" while IDL properties use "fooBar". Should we use the same convention here? Is there some precedent? Off the top of my head, I don't know of any composed enumerated attribute.
https://mediasession.spec.whatwg.org/#interrupting-a-media-session-from-a-media-element
When the pause() method on a media element is invoked
This needs to be more clearly identified as monkey patching and should link to the corresponding bug that either makes pause() extensible or adjusts pause() to integrate the monkey patch (the latter is generally a better solution).
In order to not close the door on having MediaSession stand on its own, rename release() to deactivate() to match a potential activate() method that would bring the media session into the active state. These are the names used on iOS.
CC @domenic
I think this will be solved once we make this a distinct object, but filing it just in case. We need to be able to detect what MediaMetadata features the user agent supports.
https://mediasession.spec.whatwg.org/#the-mediasession-interface
Media sessions implement the following interface:
This seems incorrect. MediaSession objects implement it and have a corresponding media session...
I've created https://github.com/richtr/universal-remote-control-access which demonstrates how, if we concentrate on implementing only an HTMLMediaElement-centric model for remote control events access, we can use that implementation to relay remote control events to any other kind of in-page web content (such as Web Audio API content, Flash-based media players, presentations and slideshows) we may wish to control via available hardware and software remote control interfaces.
It works by very efficiently generating an arbitrary-length WAV audio blob on the client side and then uses that to obtain media focus via a 'dummy' HTMLMediaElement. This then enables the web page to obtain remote control event access, mediated through this dummy HTMLMediaElement object. The web page can then register to handle events fired at that element to drive any non-HTMLMediaElement content they wish.
This example library will allow web developers to request and obtain media focus for any length of time they wish to hold it and/or until another web page or application takes its media focus away.
tl;dr: If we concentrate on allowing web pages to obtain media keys and media focus around HTMLMediaElement objects only, then there are ways that could be used to control any in-page web content (such as Flash-based media players, Web Audio API streams and presentations) within a web page as desired.
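The silent-WAV trick the library uses can be sketched as below. The header layout follows the standard RIFF/WAVE PCM format; the function name and parameters are illustrative, not the library's actual API (assumption: 16-bit mono PCM):

```javascript
// Generate an ArrayBuffer holding a silent 16-bit mono PCM WAV file;
// a page would wrap it in a Blob and point a dummy <audio> at it to
// obtain media focus, as described above.
function silentWav(seconds, sampleRate = 8000) {
  const numSamples = Math.floor(seconds * sampleRate);
  const dataSize = numSamples * 2;            // 2 bytes per 16-bit sample
  const buf = new ArrayBuffer(44 + dataSize); // 44-byte RIFF/WAVE header
  const v = new DataView(buf);
  const str = (off, s) => { for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i)); };
  str(0, "RIFF"); v.setUint32(4, 36 + dataSize, true); str(8, "WAVE");
  str(12, "fmt "); v.setUint32(16, 16, true);           // fmt chunk size
  v.setUint16(20, 1, true); v.setUint16(22, 1, true);   // PCM, mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true);                // byte rate
  v.setUint16(32, 2, true); v.setUint16(34, 16, true);  // block align, bits
  str(36, "data"); v.setUint32(40, dataSize, true);
  return buf; // in a page: new Blob([buf], { type: "audio/wav" })
}
```

The sample data is left zeroed, which is silence in signed 16-bit PCM, so the dummy element plays nothing audible while holding media focus.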
In the MediaSession proposal, what's the use case that motivates having the media session know when it's being ducked? I can't think of any really valid one (the app has no business knowing if I'm getting notifications or whatever).
If there is no strong use case, I would prefer we remove `.onduckstart|end` and the related state.
From the spec:
When playing media on the web, developers are currently forced to adopt a single default platform modality for playing all media content. On the other hand, native applications can access much richer media integration options with an underlying platform. On mobile devices, native application developers can request many different forms of media integration with the platform to obtain access to headphone buttons, lock screens and notification areas as needed. On desktop devices, native applications have access to keyboard media key events. Native application developers can specify the conditions in which media content should pause or duck on audio interruptions (i.e. pause or lower the volume for the duration of an interruption), continue playing out when application focus is lost or the device screen is switched off and interface with internal and external remote controllers.
If media sessions allow web developers to opt in to custom platform-level media behavior on different platforms, why do we insist on enforcing strict, arbitrary platform-level integration in the case that web media content has not opted in to that?
Currently on mobile devices, by default, <audio> will continue playing out when the web browser is backgrounded and/or the device's screen is switched off. It may provide notification area controls and allow users to play and pause the audio content from the notification area. Clicking on the notification may bring the user back to the browser and, ideally, bring the tab making the noise to the foreground. It may display audio metadata on the homescreen, obtained from either the <audio> element or document metadata (such as document title and favicon). It may allow only one media element to play out at a time, or mix multiple media elements to play out at the same time. It may automatically pause <video> when the browser is backgrounded. ...or it may not do any or some of these things, depending on which browser you try.
All of this behavior is a.) inconsistently provided across different web browsers, b.) completely magic in that it cannot be observed or controlled by web applications and c.) must be opted-out of (instead of having to opt-in to it in the first place) by web developers through the introduction of media sessions.
In line with the principles of the extensible web manifesto we must try to explain or remove this auto-magic behavior for default media handling by specifying how 'default' media playback should perform consistently across different web browsers and devices.
So what should we do? Describe the current magic of default media handling somehow? Choose a single sensible, consistent modality for default media handling on mobile devices (e.g. by default let's choose to treat all media as 'ambient' content)? Or should we just leave the magic of default media handling alone and not try to explain it in programmatic terms and continue to leave this up to implementors to decide what platform-level inter-operation and integration to provide by default for media content?
So that I can request https://foo.spec.whatwg.org/ we need a value for foo. Much of the shape of the spec is still unknown, but the concepts involved are known: audio focus and media keys. Options:
mediasession (after MediaSession)
)https://mediasession.spec.whatwg.org/#media-session-invocation
If this step is run and platform-level media focus can not be obtained for any reason then this algorithm must stall on this step.
The media element should also be paused.
https://github.com/richtr/html-media-focus
@richtr, do you want me to invite you to the whatwg org so that you can fiddle with this repo too? If that's OK by you, @marcoscaceres?