w3c / mediasession Goto Github PK

View Code? Open in Web Editor NEW

123.0 43.0 35.0 1.5 MB

Media Session API

Home Page: https://w3c.github.io/mediasession/

License: Other

Python 7.20% Makefile 0.29% Bikeshed 92.51%

mediasession's Introduction

Media Session API

https://w3c.github.io/mediasession/

This standardization project aims to add support for media keys and audio focus to the Web. Media keys, like play, pause, fast forward and rewind, are found on keyboards, headsets, remote controls, and on lock screens of mobile devices.

Explainer

The explainer of the MediaSession API can be found here.

Use cases

The use cases of the MediaSession API can be found here.

Extensibility

Our goal is to provide developers with low-level primitives that both help explain the platform, while also allowing developers to build rich multimedia experiences that leverage media key events. Keeping to our commitment to the extensible web manifesto, we want to allow the media key events to be routed to wherever you need them in your web application. At the same time, we want to make sure that whatever solution we come up with is easy to use – by possibly extending existing HTML elements or APIs.

Limitations

Access to media keys and lock screen UI will only be granted when audio playback begins, ensuring that audio focus is not taken from another application prematurely and that lock screen UI is only shown when it can be used. This matches the iOS model.

Contribute

This spec is built using Bikeshed.

Update index.bs and send a Pull Request with your changes. When your Pull Request will be merged, a new index.html will be generated. If you want to test locally, you can run make to generate index.html using Bikeshed's web interface. However, you should not send the index.html file in your Pull Request.

To run Bikeshed locally, install Bikeshed and then run bikeshed spec in the working directory.

Everyone is welcome to contribute! See the CONTRIBUTING.md file for practical licensing details for contributions.

Code of conduct

We are committed to providing a friendly, safe and welcoming environment for all. Please read and respect the W3C Code of Conduct.

mediasession's People

Contributors

Stargazers

Watchers

mediasession's Issues

"If this step is run and platform-level media fo..."

https://mediasession.spec.whatwg.org/#media-session-invocation

If this step is run and platform-level media focus can not be obtained for any reason then this algorithm must stall on this step.

The media element should also be paused.

"then set s’s controls attribute"

https://mediasession.spec.whatwg.org/#mediasession-constructors

then set s’s controls attribute

It would be good to avoid this kind of language. What actually happens is that you set an internal slot of s and controls getter returns the value of that internal slot. New specifications should be written with that model in mind.

"When the pause() method on a media element is i..."

https://mediasession.spec.whatwg.org/#interrupting-a-media-session-from-a-media-element

When the pause() method on a media element is invoked

This needs to be more clearly identified as monkey patching and should link to the corresponding bug that either makes pause() extensible or adjusts pause() to integrate the monkey patch (the latter is generally a better solution).

"4.1. States "

https://mediasession.spec.whatwg.org/#media-session-states

4.1. States

Please just use the enum names. E.g. "idle", "active", ...

Media session metadata

We have had a lot of informal discussion around how web developers should be able to set media session metadata. Let's use this issue to track input and ideas on how we could do this.

For "content"-based media sessions we want to display meaningful metadata within notification area and lock screen control interfaces. We plan to expose 'live' attributes for web developers to set and change metadata dynamically before or during the running of an active media session.

That means we are currently thinking about the following:

partial interface MediaSession {
  // null except if `this.kind === "content"`
  readonly attribute MediaSessionMetaData? metadata;
};

interface MediaSessionMetaData {
  attribute DOMString title;
  attribute DOMString artist;
  attribute DOMString album;
  attribute USVString artwork;
};

What we also need to discuss is what the default metadata should be if these attributes are not set prior to a media session becoming active or what we should display if these attributes are unset during an active media session. We could use the top-level browsing context's title and favicon, and/or define new default media session metadata meta element extensions or leave it to user agents to decide what to display when no metadata is provided.

If there are no objections then I plan to add this to the specification this week.

MediaSession duck-related events - lack of use case?

In the MediaSession proposal, what's the use case that motivates having the media session know when it's being ducked? I can't think of any really valid one (the app has no business knowing if I'm getting notifications or whatever).

If there is no strong use case, I would prefer we remove .onduckstrart|end and the related state.

Flatten MediaRemoteControls into MediaSession

https://mediasession.spec.whatwg.org/#the-mediaremotecontrols-interface
https://mediasession.spec.whatwg.org/#the-mediasession-interface

The MediaRemoteControls interface would have to internally link back to its MediaSession, so the split is a bit artificial from an implementation point of view.

Replace 'participating media elements' with 'participating audio producers'

This issue is based on the discussion starting at #48 (comment) and the reply comment at #48 (comment):

I think it would be better to introduce some concept like "audio producer" and say that media elements and audio contexts both have an audio producer. Maybe eventually audio producer gets defined in terms of web audio, e.g. as an audio worker.

So I will work on replacing 'participating media elements' with a generic like 'participating audio producers'.

The net effect of this would be to isolate the interface points between media session and media elements/audio contexts down to just a single section in the spec, instead of spreading it throughout.

I will then work on defining interaction of media sessions with audio producers in separate algorithms (e.g. 'pause an audio producer', 'unpause an audio producer', 'duck an audio producer', 'unduck an audio producer', etc).

Right now those algorithms will probably treat AudioContext and HTMLMediaElement objects separately (e.g. 'For each AudioContext-based audio producer...suspend() the audio context', 'For each HTMLMediaElement-based audio producer...pause() the media element', etc). We could then eventually replace those algorithms with something based on AudioWorker or whatever our unified 'audio producer' ends up being.

"transientsolo" should be renamed "transient-solo"

See https://mediasession.spec.whatwg.org/#idl-index

The current name is a bit odd. Usually content attributes will use "foo-bar" while IDL properties use "fooBar". Should we use the same convention here? Are there some precedent? On top of my head, I don't know of any composed enumerated attribute.

Figure out the coupling between audio focus/session, audio playback and remote control events

This has big implications for the shape of the API.

Android:

The Audio Focus API allows apps to pause, resume and duck audio as appropriate. Ideally, one should request focus and start audio playback if the request is successful, but it seems possible to play without audio focus and to get audio focus without playing.
The old registerMediaButtonEventReceiver and a newer MediaSession API allow apps to handle media buttons. Both appear to be orthogonal to audio focus and audio playback.

iOS:

The Audio Session API mediates which app is playing and how they deal with interruptions. The documentation says ‘For app sound to play, or for recording to work, your audio session must be active.’ and ‘The system has final authority to activate or deactivate any audio session present on a device.’
The Remove Control Events allows apps to handle media buttons. Crucially, ‘Your app must be the “Now Playing” app. Restated, even if your app is the first responder and you have turned on event delivery, your app does not receive remote control events until it begins playing audio.’ (It's not clear to me if “Now Playing” means having an active audio session, or also having a playing media player.)

CC @sicking, @jernoble, @richtr, @marcoscaceres. Anyone else?

ducked?

The media session is currently suspended from having platform-level media focus and its participating media elements are either paused or ducked

"ducked"?

Volume

Does any OS support per-app support of volume control through media keys? I've not seen this before.

Pick a spec name

So that I can request https://foo.spec.whatwg.org/ we need a value for foo. Much of the shape of the spec is still unknown, but the concepts involved are known: audio focus and media keys. Options:

audio-focus
media-keys
media-focus (suggested by @mounirlamouri)
media-session (as in MediaSession)
something else?

Use dfn.js

That would make it much easier to find out where various definitions are invoked.

"Any additional buttons, e.g. like/favorite."

In the README, it says the spec should support "Any additional buttons, e.g. like/favorite." I think we should restrict the scope to a limited set initially (i.e., whatever set is currently interoperably supported across keyboards and mobile).

"is Default then"

https://mediasession.spec.whatwg.org/#assigning-a-media-session-declaratively

is Default then

Needs a comma before then. Happens at least twice.

Transform MediaSession.kind into a set of bools/enums

The MediaSession kind corresponds to a number of behaviors (does it interrupt others, can it mix with others, etc.) that we could expose directly:

partial interface MediaSession {
  attribute boolean transient;
  attribute boolean mixable;
  attribute boolean pauseInBackground; // silly name
}

The content kind would correspond to all of these being false, while e.g. transient-solo is {transient: true, mixable: false }.

In this setup, kinds would be a matter for HTMLMediaElement only.

MediaSession auto-connection seems weird

As far as I can tell, there are at most four MediaSession objects in existence, one per kind that appears in the page. Media objects get assigned the appropriate one to their .session property automagically based on their .kind property.

This seems weird. You're really assigning media objects to mutually-exclusive sets; the set they belong to determines how they behave wrt media focus and events. Why not reify this pattern directly? Add a global somewhere (window.mediaSessions?) with four set-likes hanging off of it statically with appropriate names, which you can add media objects to. To maintain mutual-exclusivity, I guess the add() method on the set-likes needs to check for the presence of its argument in the other sets, and remove it automatically. Then you can pretend to have a MutationObserver looking for kind='' attribute changes and translating them into .add() on the appropriate set-like.

If people need to know which set a media object is in, they can either check .has() on the various ones, or we can expose a static method on the global that just tells you. Or keep .kind on media objects as a readonly that reflects the membership.

What to do with the default media handling 'magic' in mobile browsers?

From the spec:

When playing media on the web, developers are currently forced to adopt a single default platform modality for playing all media content. On the other hand, native applications can access much richer media integration options with an underlying platform. On mobile devices, native application developers can request many different forms of media integration with the platform to obtain access to headphone buttons, lock screens and notification areas as needed. On desktop devices, native applications have access to keyboard media key events. Native application developers can specify the conditions in which media content should pause or duck on audio interruptions (i.e. pause or lower the volume for the duration of an interruption), continue playing out when application focus is lost or the device screen is switched off and interface with internal and external remote controllers.

If media sessions allow web developers to opt-in to custom platform-level media behavior on different platforms why do we insist on enforcing strict, arbitrary platform-level integration in the case that web media content has not opted-in to that?

Currently on mobile devices, by default, <audio> will continue playing out when the web browser is backgrounded and/or the device's screen is switched off. It may provide notification area controls and allow users to play and pause the audio content from the notification area. Clicking on the notification may bring the user back to the browser and, ideally, bring the tab making the noise to the foreground. It may display audio metadata on the homescreen, obtained from either the <audio> element or document metadata (such as document title and favicon). It may only allow only one media element to play out at a time or mix multiple media elements to play out at the same time. It may automatically pause <video> when the browser is backgrounded. ...or it may not do any or some of these things depending on which browser you try.

All of this behavior is a.) inconsistently provided across different web browsers, b.) completely magic in that it cannot be observed or controlled by web applications and c.) must be opted-out of (instead of having to opt-in to it in the first place) by web developers through the introduction of media sessions.

In line with the principles of the extensible web manifesto we must try to explain or remove this auto-magic behavior for default media handling by specifying how 'default' media playback should perform consistently across different web browsers and devices.

So what should we do? Describe the current magic of default media handling somehow? Choose a single sensible, consistent modality for default media handling on mobile devices (e.g. by default let's choose to treat all media as 'ambient' content)? Or should we just leave the magic of default media handling alone and not try to explain it in programmatic terms and continue to leave this up to implementors to decide what platform-level inter-operation and integration to provide by default for media content?

Skip by time and skip by track

iOS has this nice feature to skip time (like 30seconds) vs skip to next track. It's quite useful for audio books, where you generally never want to skip to next chapter - but you do want to rewind to re-listen to something, or to quickly find your place in a chapter you already heard (e.g., if you were distracted or fell asleep).

Notice the icons even show the skip amount time - so nice! :)

"pause the user agent"

https://mediasession.spec.whatwg.org/#pause

pause the user agent

Needs a comma after pause, twice.

Evangelism stuff

This spec is great! It should get some love.

Logo
Twitter account
Entry on https://spec.whatwg.org/
Intro blog post on https://blog.whatwg.org/

"The MediaSession(kind) constructor when invoked..."

https://mediasession.spec.whatwg.org/#mediasession-constructors

The MediaSession(kind) constructor when invoked must run these steps:

You need commas around "when invoked" I think.

"If the new value is null then set the m’s kind ..."

https://mediasession.spec.whatwg.org/#extensions-to-the-htmlmediaelement-interface

If the new value is null then set the m’s kind attribute to the empty string.

This needs to be far more clear what kind of attribute is being set. IDL or content?

And after that is clarified, how does that affect "When a media element’s kind attribute is set, changed, or removed then the user agent must update media element’s media session for media element."?

Expose play/pause

Why are play/pause not exposed?

"must invoke its pause() method"

https://mediasession.spec.whatwg.org/#indefinitely-pause

must invoke its pause() method

We should not invoke public methods from prose as they can be overridden by script. Please instead invoke the algorithm pause() invokes directly. This might involve filing a bug on the relevant specification to give that algorithm a name.

Rename MediaSession.release() to deactivate()

In order to not close the door on having MediaSession stand on its own, rename release() to deactivate() to match a potential activate() method that would bring the media session into the activate state. These are the names used on iOS.

CC @domenic

Media session " initially set to idle".

In lifecycle, it states:

When a new media session is created its current state must be initially set to idle.

You already stated this in the text about creating the media session. Please DRY-up the spec or it gets confusing.

"Media sessions implement the following interface:"

https://mediasession.spec.whatwg.org/#the-mediasession-interface

Media sessions implement the following interface:

This seems incorrect. MediaSession objects implement it and have a corresponding media session...

Expose MediaSession.activate() and define any implicit activation in terms of it

It has become apparent that our current model, which requires that one of the media session's participants start using the audio output device, is actually slightly more restrictive than iOS, and of course also Android. Both platforms make it possible to request audio focus, thus interrupting other apps, before beginning to produce any audio output. The iOS restriction is that you will not become the "now playing" app until you start producing audio, which is what gives you control over the playback controls in the drawer.

I propose something like this:

partial interface MediaSession {
  Promise<void> activate();
  Promise<void> deactivate();
}

Using with media elements:

var session = new MediaSession();
session.setMetadata({title: 'Punk Rock'});
var audio = new Audio('music');
audio.session = session;
audio.oncanplay = function() {
  session.activate().then(function() {
    audio.play();
  });
};

Using with Web Audio:

var session = new MediaSession();
session.setMetadata({title: 'Synth Pop'});
var context = new AudioContext(session);
// context is started inactive because session was
// prepare the context for playback
session.activate().then(function() {
  context.resume();
});

It's clear that you activate and then play. This is the actual order in the current model as well, but the activation is implicit by attempting to play.

This is not to the exclusion of implicit activation by media elements, but that could be defined in terms of the web-facing API. For Web Audio I can't see a reason to do implicit activation at all.

"When the user presses the MediaTrackPrevious me..."

https://mediasession.spec.whatwg.org/#mediaremotecontrols-event-handlers

When the user presses the MediaTrackPrevious media key,

How should this happen relative to the task that is already queued for pressing that key? Should it in fact all be part of the same task, or a set of tasks (what order?).

Document in spec-like form the proposals on the table

So that we can compare and discuss more concretely, can we commit the different proposals to a document which can be iterated upon by pull requests?

I'm interested in documenting the MediaSession idea. @richtr, do you want to make a HTMLMediaElement-only proposal? @marcoscaceres, is there a proposal that has support at Mozilla you want to share?

Possible WebIDL violation?

The spec says:

Set media session’s current media session type to the corresponding media session type of media session category. If no corresponding media session type can be found for the provided media session category or media session category is empty, then set media session’s current media session type to "content".

Won't WebIDL have thrown because the media session is not one of MediaSessionKind? Also, why don't you just default kind to "content" in the WebIDL constructor? Then you are assured the media session category is content when no argument is given (and you don't need the redundant prose).

" (whether both null or both the same media sess..."

https://mediasession.spec.whatwg.org/#extensions-to-the-htmlmediaelement-interface

(whether both null or both the same media session) then

This needs a comma before then.

When and how should we load media metadata artwork?

It seems highly insufficient UI-wise to not have preloaded the image and a bit of the music file. Since you basically cannot release a player that does not have that, we should make sure that the examples cover it I think, as it might illustrate some API holes as well. (E.g. we might need to have more state on when artwork is loaded and such.)

" Set s’s kind attribute to the value of media s..."

https://mediasession.spec.whatwg.org/#mediasession-constructors

Set s’s kind attribute to the value of media session category.

This does not make sense since you the kind attribute a) only has a getter and b) that getter is already defined to return the value of media session category. So it seems better to simply remove this step.

What value does MediaSession interface bring?

It looks like all MediaSession does is:

Give people a second way to tell what kind of session the media is in (they can do either el.kind or el.session.kind.
Release media focus for all the sources with the given kind.

Rather than have a MediaSession object at all, we can just move the release() method over to media objects. Unless you plan on expanding it to do more in the future?

"Optionally, based on platform conventions, the ..."

https://mediasession.spec.whatwg.org/#media-session-release

Optionally, based on platform conventions, the user agent must prevent any hardware and/or software media keys from controlling playback of current media session’s active participating media elements.

Here we are talking about media session's active participating media elements. But step2 says,
"If current media session has one or more active participating media elements then abort the remaining steps and return". So when we reach step5 the current media session will not have active participating media elements, in other words if there are active media elements in the current session we will not reach step5.
So is step5 description correct?

Collect media keys use cases

https://github.com/richtr/html-media-focus

@richtr, do you want me to invite you to the whatwg org so that you can fiddle with this repo too? If that's OK by you, @marcoscaceres?

Integrate with Web Audio

Proposal:

partial interface AudioContext {
  attribute MediaSession? session;
}

As far as possible, AudioContext and HTMLMediaElement should integrate with MediaSession in the same way. An important difference is that new AudioContext() creates an audio context which is already playing (silence), so it would be like connecting a playing media element to a session, which we haven't wanted to do.

Demo: Enabling media centric remote control event access allows us to implicitly control any other in-page content

I've created https://github.com/richtr/universal-remote-control-access which demonstrates how, if we concetrate on implementing only a HTMLMediaElement-centric model for remote control events access, we can use that implementation to relay remote control events to any other kind of in-page web content (such as Web Audio API content, Flash-based media players, presentations and slideshows) we may wish to control via available hardware and software remote control interfaces.

It works by very efficiently generating an arbitrary length WAV audio blob on the client-side and then uses that to obtain media focus via a 'dummy' HTMLMediaElement. This then enables the web page to obtain remote control event access - mediated through this dummy HTMLMediaElement object. The web page can then register to handle events fired at that element to drive any non-'HTMLMediaElement' content they wish.

This example library will allow web developers to request and obtain media focus for any length of time they wish to hold it and/or until another web page or application takes its media focus away.

tl;dr If we concentrate on allowing web pages to obtain media keys and media focus around HTMLMediaElement objects only then there are ways that could be used to control any in-page web content (such as Flash-based media players, Web Audio API streams and presentations) within a web page as desired.

Why is the HTMLMediaElement sugar being designed in parallel with the lower-level primitive?

I would have thought you'd design the lower-level primitive first, then see what libraries people build with it regarding HTMLMediaElements, and then try to standardize the winning ideas.

Make MediaMetadata more elaborate

Track position
Number of seconds
...?

"For every other media session known to the user..."

https://mediasession.spec.whatwg.org/#media-session-release#media-session-release

For every other media session known to the user agent, run the following substeps, passing in each media session as incumbent media session:

This omits releases happening outside the user agent, e.g. a notification sound from a calendar app finishes, causing the audio focus to return to the user agent potentially unducking or unpausing non-transient audio.

"If media session category is "content" then"

https://mediasession.spec.whatwg.org/#mediasession-constructors

If media session category is "content" then

This needs a comma after "content". Same for the other hit for '" then'.

Why can't keyboard events be used here?

It seems the only special bit is about receiving them while not in focus. Anything else? @jakearchibald

Feature detection for MediaMetadata

I think this will be solved once we make this a distinct object, but filing it just in case. We need to be able to detect what MediaMetadata features the user agent supports.

" Let m have no current media session, if it cur..."

https://mediasession.spec.whatwg.org/#extensions-to-the-htmlmediaelement-interface

Let m have no current media session, if it currently has one.

You want to use "set" here, not "let". And you want to set "m's current media session", not "m".

"When a media element is created the user agent"

https://mediasession.spec.whatwg.org/#current-media-session

When a media element is created the user agent

Needs a comma after created.

Making this work with Flash

Mozilla still feels quite strongly that this API needs to work with Flash. Rationale being that we want existing sites to be able to make use of this functionality ASAP, while they transition away from relying on plugins.

Would it be possible for MediaSession to also accept a HTMLObjectElement? It won't really have any effect on the mobile ecosystem, which doesn't use Flash anyway.

w3c / mediasession Goto Github PK

mediasession's Introduction

Media Session API

Explainer

Use cases

Extensibility

Limitations

Contribute

Code of conduct

mediasession's People

Contributors

Stargazers

Watchers

Forkers

mediasession's Issues

Recommend Projects

Recommend Topics

Recommend Org