
webvtt's Introduction

WebVTT

This is the source of the WebVTT specification.

Contributions can be made via the W3C Text Tracks Community Group.

You can file new issues from the specification itself. There are also old bugs reported in W3C BugZilla.

Generating the spec

This spec is generated using bikeshed.

To generate a CG draft, run:

$ bikeshed spec

To generate a WD snapshot, run e.g.:

$ ./snapshot.sh WD 2016-01-01 2015-12-08

Also see https://github.com/w3c/webvtt/commit/754f13e3cf03d6036c3e4628c6920d17b412f778 for manual fixup of the generated output.

To format the index.bs file, run:

$ ./format.py index.bs

webvtt's People

Contributors

alastor0325, autokagami, benjaminschaaf, ckennedy44, deniak, foolip, gkatsev, himorin, nigelmegitt, palemieux, plehegar, rjksmith, silviapfeiffer, tidoust, tmichel07, zcorpan


webvtt's Issues

Expose the fallback language in the API

In #257 I made it possible for the "List of WebVTT Node Objects" object to have an "applicable language", but this can't be exposed in getCueAsHTML() since that object maps to a DocumentFragment, which can't have a lang attribute.

One way to make this work is to change from DocumentFragment to an HTML div or span element, and set lang on it if it has an "applicable language".

Existing scripts that just insert whatever comes out of getCueAsHTML() into the document should continue to work; they would just get an extra element. I suppose span is a bit more versatile than div because it is allowed in more places (e.g. in a p element). OTOH, just dumping all cues and having them be separated from each other by default could be nice.

An alternative could be to have the fallback language be a property of TextTrack, and have JS be responsible for setting a lang attribute on a container element in which it inserts cues with getCueAsHTML().

Change syntax for cue id to a setting?

The parser is a bit annoying because the id is before the timings. I think it would be better if the id was a setting of the cue instead, as in:

00:00:00.000 --> 00:00:02.000 id:foo
Hello world

This would restrict the id to not contain spaces.

For the parser, it would mean that a new block can just look at the first line to determine what kind of block it is, instead of checking the second line first to see if it contains -->. For instance, the following is a valid cue with the id "NOTE", which is a bit weird (and similarly for other block types we introduce):

NOTE
00:00:00.000 --> 00:00:02.000
Hello world

If we were to change this, then legacy files would have their IDs dropped on the floor (as invalid blocks terminated by a line that contains --> which starts a new cue).
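
A minimal sketch of the parser simplification described above, assuming the `id:` setting were adopted. The function names are illustrative only, and the NOTE check is simplified to a prefix test:

```python
def classify_block_proposed(lines):
    """Proposed syntax: the first line alone decides the block type."""
    first = lines[0]
    if "-->" in first:
        return "cue"
    if first.startswith("NOTE"):
        return "note"
    return "unknown"


def classify_block_current(lines):
    """Rough model of the current syntax: an id line may precede the timings."""
    if "-->" in lines[0]:
        return "cue"
    if len(lines) > 1 and "-->" in lines[1]:
        # The first line is treated as the cue id, even if it reads "NOTE".
        return "cue"
    if lines[0].startswith("NOTE"):
        return "note"
    return "unknown"
```

With the current syntax the second function has to peek at the second line, which is why a block starting with "NOTE" can still turn out to be a cue.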

Is it too late to change this? What do people think?

Allow FF alongside space and tab in the syntax

https://w3c.github.io/webvtt/#file-structure

U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab)

The parser allows U+000C FORM FEED here (and in other places, just not right after the signature).

Fixing #221 properly would mean that almost every time the parser had seen a FF, it would be a syntax violation. But that seems a bit silly. It seems saner to define "whitespace" as space, tab and FF, in the syntax.

Note: not suggesting any change to the parser.
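
As an illustration of the suggested definition, here is a small sketch that treats space, tab and FF as one "whitespace" set. The helper name is hypothetical and the example settings are only sample input:

```python
import re

# Assumption: "whitespace" in the syntax means U+0020, U+0009 and U+000C.
_WEBVTT_WHITESPACE = re.compile("[ \t\f]+")


def split_on_webvtt_whitespace(text):
    """Split text on runs of space, tab or form feed, dropping empty parts."""
    return [part for part in _WEBVTT_WHITESPACE.split(text) if part]
```

Python's argument-less `str.split()` is deliberately not used here, because it would also split on line terminators.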

Clarify "Valid BCP47 language tag"

In http://dev.w3.org/html5/webvtt/#webvtt-cue-text you still say the cue language span "must be a valid BCP 47 language tag", where 'valid' has a particular meaning in BCP 47 and it is not clear if you intend that specific meaning. You should: (a) omit the word valid; (b) clarify that you mean BCP 47 valid; or (c) use the word 'well-formed' instead. (Valid in BCP 47 means that the implementation checks to see that all of the subtags are registered, at least as of an implementation specific date)

I think (b) is intended. HTML has the same requirement for its lang attribute. But it's not a requirement on implementations in HTML and WebVTT, but on authors.

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28255#c17

Allow more styling (block, padding, rounded corners)

Related bug https://www.w3.org/Bugs/Public/show_bug.cgi?id=25633

Also #235 (comment)

The styling abilities in WebVTT are pretty limited now, in particular it's not possible per spec to achieve e.g. Safari's default rendering, nor is it possible to do anything about the background box other than changing color or setting a background image.

I think it should be possible to change between inline background and block background; set padding, rounded corners, box-decoration-break. Maybe even border (probably have to set box-sizing:border-box by default). But we need to be careful to not screw up the positioning algorithm.

Missing internal file-wide language declaration

See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28255#c16

Straw-man syntax:

WEBVTT
lang:en

00:00:00.000 --> 00:00:05.000
Hello world.

The in-file language would be added to the language stack after the fallback language (if any) (see #257), and set on the "list of WebVTT Node Objects" object regardless of the fallback language, so a node's applicable language would be

  1. Language of the nearest <lang>
  2. Language of in-file declaration (this issue)
  3. Language of external fallback language (<track srclang>) #257
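
The lookup order above could be sketched as follows. This is only an illustration of the straw-man proposal; the function and parameter names are assumptions, not spec terms:

```python
from typing import Optional, Sequence


def applicable_language(
    lang_stack: Sequence[str],
    in_file_language: Optional[str],
    fallback_language: Optional[str],
) -> Optional[str]:
    """Resolve a node's language using the three-step order listed above."""
    if lang_stack:
        # 1. Nearest enclosing <lang> (the innermost entry is last).
        return lang_stack[-1]
    if in_file_language:
        # 2. In-file declaration (hypothetical "lang:en" header line).
        return in_file_language
    # 3. External fallback, e.g. <track srclang>; may be None.
    return fallback_language
```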

Are implementors interested in supporting this?

incomplete ruby implementation [I18N-ISSUE-431]

[moving to github from https://www.w3.org/Bugs/Public/show_bug.cgi?id=28265]

i was just reviewing this with a mind to close our i18n tracker issue for now (and reopen as support for the other aspects of the html5 markup model spreads), when it occurred to me that webvtt doesn't support the rb element.

This is widely supported in browsers (see the test results at http://www.w3.org/International/tests/repo/results/ruby-html#position), and i think it should therefore be supported by webvtt. Adding it allows for additional styling options, as well as reducing the potential for confusion in authors, who are used to using it in HTML.

Note that HTML5 supports all of the following:

<ruby><rb>...</rb><rt>...</rt>...
<ruby><rb>...<rt>...
<ruby>...<rt>...</rt>...

Definition of "text alignment" doesn't match processing requirements

https://w3c.github.io/webvtt/#webvtt-cue-text-alignment

A text alignment
An alignment for all lines of text within the cue box, in the dimension of the writing direction and the paragraph direction [BIDI], one of:

Start alignment
The text is aligned towards the paragraph direction start side of the cue box.

As far as I can tell, this text is wrong. The processing model sets 'unicode-bidi' to plaintext which isolates paragraphs, and then sets 'text-align' to start for "Start alignment", and the behavior of CSS in that case does not match the description above.

See
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3719
and flip the value of 'text-align' between start and end (in Gecko or WebKit or Blink), to see what the effect of the processing requirements is.

So for start alignment, each line is aligned to the start edge for that line, which is separate from the cue's "paragraph direction".

My understanding is that @r12a is happy with the effect of the processing requirements, so I suggest we fix the description here to match.

timings in header followed by a blank line is not parsed correctly

https://w3c.github.io/webvtt/#file-parsing

If the character indicated by position is a U+000A LINE FEED (LF) character, advance position to the next character in input.

Consider this file:

WEBVTT
00:00:00.000 --> 00:00:01.000

Test

This parses into a cue with cue text "Test" per spec, AFAICT. This looks like a bug.

Metadata header loop

14 The character indicated by position is a U+000A LINE FEED (LF) character. Advance position to the next character in input.

15 If line contains the three-character substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then set the already collected line flag and jump to the step labeled cue loop.

Now position is at the blank line.

17 Cue loop: If the already collected line flag is set, then jump to the step labeled cue creation.

Cue creation:

22 If line contains the three-character substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then jump to the step labeled timings below.

28 Timings: Unset the already collected line flag.

29 Collect WebVTT cue timings and settings from line using regions for cue. If that fails, jump to the step labeled bad cue.

didn't fail...

31 Cue text loop: If position is past the end of input, then jump to the step labeled cue text processing.

32 If the character indicated by position is a U+000A LINE FEED (LF) character, advance position to the next character in input.

This is the bug. This skips past the blank line so position is at the start of the "Test" line.
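
The walk-through above can be modelled with a short simulation. This is a simplified sketch, not the spec's algorithm: it assumes LF-only line endings, does not parse the timings, and the function names are illustrative.

```python
def collect_line(text, pos):
    """Collect characters up to (not including) the next LF; return (line, new_pos)."""
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)
    return text[pos:end], end


def cue_text_after_header_timings(text):
    """Model the quoted steps for a file whose second line contains '-->'."""
    pos = 0
    _signature, pos = collect_line(text, pos)  # "WEBVTT"
    pos += 1                                   # skip the LF after the signature
    line, pos = collect_line(text, pos)        # header line
    pos += 1                                   # step 14: skip the LF after it
    if "-->" not in line:                      # step 15 would not apply
        return None
    # Steps 17, 22, 28, 29: timings would be parsed from `line` here.
    cue_lines = []
    while pos < len(text):                     # step 31
        if text[pos] == "\n":                  # step 32: this skips the blank line
            pos += 1
        line, pos = collect_line(text, pos)
        if line == "":
            break
        cue_lines.append(line)
    return "\n".join(cue_lines)
```

Running it on the file above gives the cue text "Test", matching the behavior reported in this issue.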

Out-of-range position value

https://w3c.github.io/webvtt/#cues

If the position is numeric, then return the value of the position and abort these steps

If the position value is outside the range [0, 100], what value should we return?
We should have a mechanism like the one for #webvtt-cue-line, returning a default value when the position is out of range.
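
A minimal sketch of such a fallback, assuming an out-of-range numeric position is treated like "auto". The alignment-based defaults used here (0, 100, 50) are assumptions for illustration, not quoted from the spec:

```python
def computed_position(position, alignment):
    """Return a position in [0, 100], falling back to a default when needed."""
    if isinstance(position, (int, float)) and 0 <= position <= 100:
        return position
    # Hypothetical fallback, shared by "auto" and out-of-range values.
    if alignment in ("start", "left"):
        return 0
    if alignment in ("end", "right"):
        return 100
    return 50
```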

Have tags for specifying color and background color

See #261 (comment)

Again I suppose this would be needed on the cue level as well as on the element level, if it should be possible to override the default background color.

I don't particularly like adding new elements for color, it seems like it would be better to be able to set colors on any element, like you can set classes on any element.

We could allow color specifiers at the end of start tags, as annotations. For elements that already require annotations, we could make a "#" switch from annotation to color. (It would be possible to write stylesheets that work in legacy clients using ^= attribute selector, e.g. v[voice^=Roger].)

WEBVTT

00:00:00.000 --> 00:00:10.000.foo.bar align:left #FFF #000
<v.foo.bar Roger #FFF #000>Hello

The cue itself has classes "foo" and "bar", color "#FFF", background color "#000".
The v element has the same classes, annotation "Roger", same colors.

To support non-opaque colors, we can use hex annotation with 4 or 8 characters (this is supported in CSS now, per spec at least). e.g. fully transparent black is #0000 or #00000000. https://drafts.csswg.org/css-color/#hex-notation

The colors specified this way would be overridable with CSS, so it's like HTML <font color>, not like style="...".
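
To make the proposal concrete, here is a sketch of how trailing color specifiers might be separated from an annotation. The syntax is only a suggestion in this issue, so the function below is hypothetical; it accepts 3, 4, 6 or 8 hex digits after "#":

```python
import re

HEX_COLOR = re.compile(r"#(?:[0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})")


def split_annotation(raw):
    """Return (annotation, color, background) from the text after the classes."""
    tokens = raw.split()
    colors = []
    # Peel at most two hex colors off the end of the annotation.
    while tokens and len(colors) < 2 and HEX_COLOR.fullmatch(tokens[-1]):
        colors.insert(0, tokens.pop())
    color = colors[0] if colors else None
    background = colors[1] if len(colors) > 1 else None
    return " ".join(tokens), color, background
```

For the v element in the example, `split_annotation("Roger #FFF #000")` separates the annotation "Roger" from the two colors.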

Change the syntax of Region headers

See https://www.w3.org/Bugs/Public/show_bug.cgi?id=18657

It is suggested to change the syntax here to use REGION blocks before the cues (consistent with STYLE), instead of headers. I suppose this would also be an opportunity to change from = to : so that would be consistent with cue settings.

If we use a block it would be possible to allow linebreaks between settings here, if people think that would be nice to be able to do. e.g.

WEBVTT

REGION
id:fred
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

REGION
id:bill width:40% lines:3 regionanchor:100%,100% viewportanchor:90%,90% scroll:up

...

Apple has shipped support for regions, so need to check with them if this change is OK. @dwsinger ?
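
A sketch of how a parser might read the block form suggested above, treating line breaks and spaces between settings alike. This assumes the `name:value` syntax from this proposal; the function name is illustrative and no setting values are validated:

```python
def parse_region_block(block):
    """Parse a REGION block into a dict of settings, or None if not a REGION."""
    lines = block.strip().split("\n")
    if lines[0].strip() != "REGION":
        return None
    settings = {}
    for line in lines[1:]:
        for token in line.split():
            name, sep, value = token.partition(":")
            if sep and name and value:
                settings[name] = value
    return settings
```

Both layouts in the example (one setting per line, or all settings on one line) go through the same loop.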

Consider adding 'ruby-align' as a supported CSS property

https://w3c.github.io/webvtt/#selectordef-cue

::cue

In https://www.w3.org/Bugs/Public/show_bug.cgi?id=28183 it is pointed out that 'ruby-align' exists and could be useful. It's supported in Gecko and IE.

We have an align cue setting which sets 'text-align' on the root element, and we don't allow setting 'text-align' with ::cue or ::cue(), but I don't see any problem with allowing 'ruby-align'. It doesn't affect the size of the ruby box, and you might want different alignment for the whole cue vs. ruby. For instance, if a cue contains a single ruby annotation and the cue is start-aligned, it could make sense to start-align the ruby as well. But if a cue contains a normal run of text and has some ruby annotations here and there, it might make more sense to use the default space-around ruby alignment regardless of the cue's alignment.

Support character escapes in classes (<c.foo&amp;bar>)

https://w3c.github.io/webvtt/#webvtt-start-tag-class-state

WebVTT start tag class state
Jump to the entry that matches the value of c:
[...]
Anything else
Append c to buffer and jump to the step labeled next.

Should we support character escapes in classes?

<c.foo&amp;bar>

The class name above is parsed to foo&amp;bar, not foo&bar, i.e. the selector to match it would be ::cue(.foo\&amp\;bar) and not ::cue(.foo\&bar). I think this is surprising and unnecessary; I don't see any problem with supporting escapes here as well as in the annotation (which is already supported).
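
For illustration, a sketch of what decoding escapes in a class name could look like. The set of escapes is taken from the cue text escapes (amp, lt, gt, lrm, rlm, nbsp); the function itself is hypothetical and not part of the tokenizer:

```python
ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lrm;": "\u200e",
    "&rlm;": "\u200f",
    "&nbsp;": "\u00a0",
}


def unescape_class(name):
    """Decode known escapes in a single left-to-right pass."""
    out = []
    i = 0
    while i < len(name):
        for escape, char in ESCAPES.items():
            if name.startswith(escape, i):
                out.append(char)
                i += len(escape)
                break
        else:
            # Not the start of a known escape: keep the character as-is.
            out.append(name[i])
            i += 1
    return "".join(out)
```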

The syntax disallows "&" in classes (so the above is not valid):

https://w3c.github.io/webvtt/#webvtt-cue-span-start-tag

WebVTT cue span start tag has a tag name and either requires or disallows an annotation, and consists of the following components, in the order given:

A U+003C LESS-THAN SIGN character (<).
The tag name.
Zero or more occurrences of the following sequence:
U+002E FULL STOP character (.)
One or more characters other than U+0009 CHARACTER TABULATION (tab) characters, U+000A LINE FEED (LF) characters, U+000D CARRIAGE RETURN (CR) characters, U+0020 SPACE characters, U+0026 AMPERSAND characters (&), U+003C LESS-THAN SIGN characters (<), U+003E GREATER-THAN SIGN characters (>), and U+002E FULL STOP characters (.), representing a class that describes the cue span’s significance.

Clarify that <b> can use a class name and CSS

https://w3c.github.io/webvtt/#webvtt-bold-object

WebVTT Bold Objects

See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28262

Also http://www.w3.org/International/questions/qa-b-and-i-tags (the Japanese example)

For HTML @r12a said that the way to handle this example in HTML is to use span or b with a class name, and use CSS to style it differently for different languages. So we should clarify that this is possible in WebVTT as well and add an example of how to do that for this in particular.

introduce cue-level classes

I would like the WebVTT spec to define a method for publishers to include custom cue settings.

I would like to be able to mark some cues as the beginning of a paragraph. At TED Conferences, we use data like this to construct a readable transcript from the subtitles created by our volunteer translators. You can see an example in the English transcript of Shonda Rhimes' TEDTalk.

We currently use a custom JSON format to transmit this data. We would like to switch to a more standard format, but need support for this additional per-caption metadata.

Something like HTML's data-* attributes would serve our purposes nicely.

For example:

WEBVTT

00:00:00.960 --> 00:00:04.616 data-paragraph:true
So a while ago, I tried an experiment.

00:00:04.640 --> 00:00:08.080
For one year, I would say yes
to all the things that scared me.

cc @zcorpan @jwarchol @bendk
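
A sketch of how a consumer might pull such custom settings out of a cue's settings text, assuming the `data-` prefix suggested above. The function name and the split between "known" and "custom" settings are illustrative:

```python
def split_cue_settings(settings_text):
    """Return (standard_settings, custom_data) from the text after the timings."""
    standard, custom = {}, {}
    for token in settings_text.split():
        name, sep, value = token.partition(":")
        if not (sep and name and value):
            continue  # ignore malformed tokens
        if name.startswith("data-"):
            custom[name[len("data-"):]] = value
        else:
            standard[name] = value
    return standard, custom
```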

Tokenizer doesn't parse escapes in annotation

https://w3c.github.io/webvtt/#webvtt-start-tag-annotation-state

WebVTT start tag annotation state
Jump to the entry that matches the value of c:

U+003E GREATER-THAN SIGN character (>)
Advance position to the next character in input, then jump to the next "end-of-file marker" entry below.

End-of-file marker
Remove any leading or trailing space characters from buffer, and replace any sequence of one or more consecutive space characters in buffer with a single U+0020 SPACE character; then, return a start tag whose tag name is result, with the classes given in classes, and with buffer as the annotation, and abort these steps.

Anything else
Append c to buffer and jump to the step labeled next.

This doesn't tokenize escapes, AFAICT. But the syntax allows escapes in annotations:

https://w3c.github.io/webvtt/#webvtt-cue-span-start-tag

If the start tag requires an annotation: a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character, followed by one or more of the following components, the concatenation of their representations having a value that contains at least one character other than U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) characters:

WebVTT cue span start tag annotation text, representing the text of the annotation.
A WebVTT cue amp escape, representing a "&" character in the text of the annotation.
A WebVTT cue lt escape, representing a "<" character in the text of the annotation.
A WebVTT cue gt escape, representing a ">" character in the text of the annotation.
A WebVTT cue lrm escape, representing a U+200E LEFT-TO-RIGHT MARK Unicode bidirectional formatting character in the text of the cue.
A WebVTT cue rlm escape, representing a U+200F RIGHT-TO-LEFT MARK Unicode bidirectional formatting character in the text of the cue.
A WebVTT cue nbsp escape, representing a U+00A0 NO-BREAK SPACE character in the text of the cue.

Minimal resolution of overlapping cues is difficult

If there is a position to which the boxes in boxes can be moved while maintaining the relative positions of the boxes in boxes to each other such that none of the boxes in boxes would overlap any of the boxes in output, and all the boxes in boxes would be within the video’s rendering area, then move the boxes in boxes to the closest such position to their current position,

The spec does not propose an algorithm for doing this, and I suspect that it is equivalent to implementing a constraint solver for systems of linear inequalities. (I'm not against this; if a constraint solver is the way to create the best layout, so be it.)

It's not difficult to display a new cue and find a non-overlapping place for it (or to determine that there is no such place). But figuring out how to minimally move around existing cues to make room for a new one is nontrivial.
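
To illustrate the easy half of the problem, here is a sketch that places one new box without moving the existing ones. It is a simplification under stated assumptions: boxes are axis-aligned rectangles, only vertical movement is tried, and the search step is a hypothetical parameter rather than anything from the spec.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any interior area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def place_vertically(new: Box, placed: Sequence[Box],
                     area_height: float, step: float = 1.0) -> Optional[Box]:
    """Try offsets 0, -step, +step, -2*step, ... and return the first fit."""
    max_steps = int(area_height / step)
    for n in range(max_steps + 1):
        offsets = (0.0,) if n == 0 else (-n * step, n * step)
        for dy in offsets:
            candidate = new._replace(y=new.y + dy)
            inside = candidate.y >= 0 and candidate.y + candidate.height <= area_height
            if inside and not any(overlaps(candidate, other) for other in placed):
                return candidate
    return None  # no free position in the rendering area
```

Moving several already-placed boxes at once while keeping their relative positions is the part this sketch does not attempt.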

Native controls cue overlap avoidance works badly for vertical

https://w3c.github.io/webvtt/#processing-model

If the user agent is exposing a user interface for video, add to output one or more completely transparent positioned CSS block boxes that cover the same region as the user interface.

Most desktop browsers at least have a horizontal controls bar at the bottom of the video, showing and hiding based on hover and paused state. This works fine for horizontal cues. But for vertical cues, assuming the default size:100%, it means they don't fit anywhere.

I suppose it's similar if you have a horizontal cue and a vertical cue active at the same time, which is not unreasonable if you have e.g. English and vertical Japanese tracks both activated.

https://zcorpan.github.io/live-webvtt-viewer/#vtt=WEBVTT%0A%0A00%3A00%3A00.000+--%3E+00%3A00%3A10.000%0AHorizontal+1%0A%0A00%3A00%3A00.000+--%3E+00%3A00%3A10.000+vertical%3Alr+size%3A100%25%0AVertical+1%0A%0A00%3A00%3A01.000+--%3E+00%3A00%3A10.000%0AHorizontal+2

Isolate bidi for ruby text?

Should the text in <rt> be isolated for the purposes of bidi? It seems to me that it should be possible to use e.g. English ruby text for Arabic ruby base.

<ruby>مرحبا!<rt>Hello!</ruby>

I think setting 'unicode-bidi' to "plaintext" for <rt> would fix this.

CC @r12a

Wrong description for computing "x/y-position"

https://w3c.github.io/webvtt/#processing-model

If the WebVTT cue snap-to-lines flag is set, then run the appropriate steps from the following list:

If I understand correctly, the y-position should be computed when the WebVTT cue's writing direction is horizontal, and x-position for the vertical writing direction.

Consider this: step 6 above sets the y-position to 0 and describes it as one of the "temporary positions used to calculate box dimensions below." However, the following steps still set only the x-position, so the y-position stays at zero, which looks like a serious error.

WebVTT alignment cue setting keywords

4.4. WebVTT cue settings
https://w3c.github.io/webvtt/#cue-settings

"A WebVTT alignment cue setting configures the alignment of the text within the cue. The keywords are relative to the cue text’s lines' base direction; for left-to-right English text, "start" means left-aligned."

i suspect the second sentence ought to say

"The start and end keywords are relative to the cue text’s lines' base direction; for left-to-right English text, "start" means left-aligned."

Explicitly handle NOTE blocks in the parser

@foolip

zcorpan_: if you could also handle NOTE and explicitly ignore it, it would make it obvious how to avoid emitting console warnings for them in implementations

This is editorial but seems like a good idea to me.

Wrong BOM characters mentioned in spec

Hi,

Section 4.1 of the WebVTT spec says:
"An optional U+FEFF BYTE ORDER MARK (BOM) character."
This is a UTF-16 BOM, while a few lines above the spec says:
"A WebVTT file must consist of a WebVTT file body encoded as UTF-8"
At the end of the spec, under 'magic number(s)' the BOM EF BB BF is used, so I believe the first occurrence is just wrong.

Thanks

Eran
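
One way to check how the two passages quoted in this report relate is to encode the character and look at the bytes. This sketch only uses Python's standard codecs:

```python
BOM = "\ufeff"  # U+FEFF as a single code point

# Encoding the code point as UTF-8 yields the three-byte sequence.
utf8_bytes = BOM.encode("utf-8")

# The "utf-8-sig" codec strips a leading BOM when decoding.
decoded = b"\xef\xbb\xbfWEBVTT".decode("utf-8-sig")
```

Here `utf8_bytes` is the byte sequence EF BB BF and `decoded` is "WEBVTT", so the "U+FEFF" wording and the magic-number bytes describe the same mark at different levels (code point versus encoded bytes).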

Change the WebVTT standard link at the top of the github/w3c/webvtt page

The top of the page https://github.com/w3c/webvtt includes the text:

WebVTT Standard https://webvtt.spec.whatwg.org/

This should now be changed in line with the new publication location to https://w3c.github.io/webvtt/ and it would also be helpful to point to the WG Rec Track version, for reference anyway, at http://www.w3.org/TR/webvtt1/

This is also what Silvia summarised at http://dev.w3.org/cvsweb/html5/webvtt/README.txt?rev=1.1;content-type=text%2Fx-cvsweb-markup

Explain the difference between cue direction and paragraph direction

See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28266

The processing model is OK but it is very unclear what the effect is. We should add a note clarifying this, with an example. In particular clarify that the cue direction (which is determined by the direction of the first paragraph in the cue) affects positioning of the cue, but unicode-bidi:plaintext still makes the direction of each paragraph be isolated from each other.

Also see http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3718

Add default styling to "well-known" classes

See #261 (comment)

I think the default rendering should be the same in all user agents, but I don't really mind having a simple default stylesheet for webvtt with some classes like "red".

One problem is if we want to support changing the background color of the background box, we probably have to add a way to specify classes on cues.

This would actually be backwards compatible syntax if we need classes on cues:

WEBVTT

00:00:00.000 --> 00:00:10.000.foo.bar align:start
Hello world.
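
A sketch of how the end timestamp and the trailing classes in this syntax might be pulled apart. The regular expression and function name are assumptions made for illustration; the syntax itself is only a suggestion here:

```python
import re

# hours are optional; classes are dot-separated and contain no whitespace or dots
END_TIME = re.compile(r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})((?:\.[^\s.]+)*)$")


def split_end_time(token):
    """Return (timestamp, classes) for a token like '00:00:10.000.foo.bar'."""
    match = END_TIME.match(token)
    if match is None:
        return None
    timestamp, class_part = match.groups()
    classes = class_part.split(".")[1:] if class_part else []
    return timestamp, classes
```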

Revamp the parser

While fixing #219 (comment), I found that the flat structure of the parsing algorithm, with its "jump to step foo" instructions, makes things hard to work with. Explore a different way to specify the parser that can e.g. share steps for skipping to the next blank line or "-->", and makes it easier to add new kinds of blocks.

This should be purely editorial.

Also see http://krijnhoetmer.nl/irc-logs/whatwg/20151022#l-301 for some discussion.
