Comments (23)

BigBlueHat avatar BigBlueHat commented on June 8, 2024

(apparently I forgot to click "comment" about 3.5 hours ago...glad you found Web Annotation @llemeurfr! 😃)

Hi all. I'd highly recommend a look at Open Annotation's successor Web Annotation (full disclosure, I'm one of the editors 😺).

It supports storing selectors precisely for serving the "multi-prong approach" use cases @danielweck mentioned: https://www.w3.org/TR/annotation-model/#selectors

Using that model, it's possible to store CFI, XPath, and CSS together and use them together or in succession to accomplish the anchoring, positioning, or reading state retrieval.

Also, the New York Public Library is using the model to share reading state for patrons as they move between various devices. So it's flexible beyond just highlighting, commenting, and bookmarking.

Lastly, to @HadrienGardeur's comment, it's quite possible to re-use Selectors (from Web Annotation) across media types (vs. fragment identifiers such as CFI, which are media type specific). This is at minimum useful for reapplying selectors among variations of (X)HTML, but the more broadly applicable selectors like TextQuoteSelector can work across EPUB, PDF, CommonMark/Markdown, HTML, etc.
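
To make the "store several selectors together" idea concrete, here is a minimal sketch of a Web Annotation target carrying a CFI, an XPath, a CSS selector, and a TextQuoteSelector side by side. The annotation id, source URL, and every selector value are made-up illustration data, and `selector_types` is a hypothetical helper, not part of any library.

```python
import json

# All identifiers and values below are illustrative assumptions.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/anno/1",
    "type": "Annotation",
    "target": {
        "source": "https://example.org/book/chapter1.xhtml",
        # Several selectors describing the same segment in different ways.
        "selector": [
            {
                "type": "FragmentSelector",
                "conformsTo": "http://www.idpf.org/epub/linking/cfi/epub-cfi.html",
                "value": "epubcfi(/6/4!/4/10/3:10)",
            },
            {"type": "XPathSelector", "value": "/html/body/p[5]"},
            {"type": "CssSelector", "value": "body > p:nth-of-type(5)"},
            {
                "type": "TextQuoteSelector",
                "exact": "Laurentides",
                "prefix": "Lors de ses vacances dans les ",
                "suffix": ", il a décidé",
            },
        ],
    },
}


def selector_types(anno: dict) -> list:
    """Hypothetical helper: list the selector types attached to the target."""
    return [s["type"] for s in anno["target"]["selector"]]


if __name__ == "__main__":
    print(selector_types(annotation))
    print(json.dumps(annotation, ensure_ascii=False, indent=2))
```

A consumer can then try the selectors in whatever order suits the media type, for example skipping the CFI when the target is not an EPUB.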

from architecture.

rkwright avatar rkwright commented on June 8, 2024

From Juan Corona:
I'm aware of some of the flaws, but I've been made biased (figuratively, not literally) towards keeping CFIs. It would help if someone could list out the flaws.

rkwright avatar rkwright commented on June 8, 2024

From Juan Corona:
XPath? Query selectors? Roll our own with the goal of only using it internally?

HadrienGardeur avatar HadrienGardeur commented on June 8, 2024

I think that we need more than one pointer, and an API that will return these different values.

HadrienGardeur avatar HadrienGardeur commented on June 8, 2024

I had something like that in mind:

{
  "highlight": "Laurentides",
  "before": "Lors de ses vacances dans les ",
  "after": "il a décidé de réécrire Readium SDK",
  "locators": 
  {
    "cfi": "...",
    "xpath": "...",
    "page": 147,
    "position": 12.67
  }
}  
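
One possible way to model this structure in code is sketched below. The class names `Locator` and `Locators` and the `to_json` method are assumptions made for illustration; only the field names come from the JSON above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Locators:
    """The individual pointers; any of them may be absent."""
    cfi: Optional[str] = None
    xpath: Optional[str] = None
    page: Optional[int] = None
    position: Optional[float] = None


@dataclass
class Locator:
    """Text context plus a bundle of pointers (hypothetical class name)."""
    highlight: str
    before: str = ""
    after: str = ""
    locators: Locators = field(default_factory=Locators)

    def to_json(self) -> str:
        data = asdict(self)
        # Leave out pointers that were never computed.
        data["locators"] = {
            key: value
            for key, value in data["locators"].items()
            if value is not None
        }
        return json.dumps(data, ensure_ascii=False)


if __name__ == "__main__":
    loc = Locator(
        highlight="Laurentides",
        before="Lors de ses vacances dans les ",
        after="il a décidé de réécrire Readium SDK",
        locators=Locators(page=147, position=12.67),
    )
    print(loc.to_json())
```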

rkwright avatar rkwright commented on June 8, 2024

I like this (especially the quote :-) But it could get quite messy in terms of resolving mismatches, i.e. where the various methods don't agree. But I like the concept.

HadrienGardeur avatar HadrienGardeur commented on June 8, 2024

Well, having multiple locators means that you have to resolve mismatches, but it also means that you have a better chance of being able to anchor an annotation if the content is ever updated.

IMO, a single locator is a very weak approach for a number of reasons:

  • for a search API or annotations, we need context that a CFI or XPath can't provide
  • locators that basically go through an XML tree (CFI, XPath) are much more likely to break when there's an update than the ones that are a little more "vague" (text, location)

It doesn't mean that we need to use 4-5 locators each and every time, but it should be part of the design.
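
A rough sketch of how a design with several optional locators might be consumed: try each kind in a fixed order and keep the first one that still resolves. The precedence order, the `resolve` function, and the resolver callables are all assumptions for illustration, not an existing Readium API.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed ranking from most to least precise; a real system may differ.
PRECEDENCE = ["cfi", "xpath", "position", "page"]

# A resolver takes the stored value and returns a target, or None on failure.
Resolver = Callable[[object], Optional[str]]


def resolve(locators: dict, resolvers: Dict[str, Resolver]) -> Optional[Tuple[str, str]]:
    """Return (kind, target) for the first locator that resolves, else None."""
    for kind in PRECEDENCE:
        value = locators.get(kind)
        resolver = resolvers.get(kind)
        if value is None or resolver is None:
            continue  # this kind was not stored, or we cannot handle it
        target = resolver(value)
        if target is not None:
            return kind, target
    return None


if __name__ == "__main__":
    # Stub resolvers: the CFI is treated as broken by an update, the page still works.
    stub_resolvers = {
        "cfi": lambda value: None,
        "page": lambda value: f"page-{value}",
    }
    print(resolve({"cfi": "...", "page": 147}, stub_resolvers))
```

With this shape, storing only one or two locators remains valid; the extra ones simply give the loop more chances to succeed.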

llemeurfr avatar llemeurfr commented on June 8, 2024

Another approach could be to consider that the RS must handle CFI + XPath + page ... but that a reference will be indicated either via one or the other option. This is more or less what XPointer tried a long time ago (with its different "schemes").

Note also that XPointer, as an extension of XPath, introduces the Point (position between characters) and the Range, i.e. what CFI mimics. I don't like XPointers for their complexity, but a slimmed-down version could be interesting if CFIs are rejected.

Re. the context of search/annotation (the "snippet" as Google names it) I think that this must be discussed at a different level, not the "pointer" level.

Also, the "location break if update" is a problem with no solution, isn't it?

mickael-menu-mantano avatar mickael-menu-mantano commented on June 8, 2024

We don't have to resolve every locator, just the most precise one, thus ignoring mismatches.

Multiple kinds of locators can be useful too when we want to go to a specific place without knowing the exact content (CFI). For example, "go to page 45" using a progression control. Or "go to last page (-1) of spine item X" when trying to go backward to the previous spine item.

@llemeurfr wrote: "Also, the 'location break if update' is a problem with no solution, isn't it?"

That's where the context (snippet) could be useful, to recover broken locations.
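
As a sketch of that recovery step, the function below searches the updated text for the highlighted string and prefers the occurrence whose surrounding text matches the stored context. The name `find_with_context` is hypothetical, and the exact-match scoring is a deliberate simplification; a real implementation would likely need fuzzy matching.

```python
from typing import Optional


def find_with_context(text: str, highlight: str,
                      before: str = "", after: str = "") -> Optional[int]:
    """Return the start offset of the best occurrence of `highlight`, or None.

    Each occurrence scores one point if it is preceded by `before` and one
    point if it is followed by `after`; ties go to the earliest occurrence.
    """
    best, best_score = None, -1
    start = text.find(highlight)
    while start != -1:
        end = start + len(highlight)
        score = 0
        if before and text[:start].endswith(before):
            score += 1
        if after and text[end:].startswith(after):
            score += 1
        if score > best_score:
            best, best_score = start, score
        start = text.find(highlight, start + 1)
    return best


if __name__ == "__main__":
    sample = ("Les Laurentides sont belles. "
              "Lors de ses vacances dans les Laurentides, il a décidé de partir.")
    print(find_with_context(sample, "Laurentides",
                            before="vacances dans les ", after=", il a décidé"))
```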

olivierkorner avatar olivierkorner commented on June 8, 2024

If we were to move away from CFI as a location approach, migration of existing bookmarks could become a headache for vendors. Having multiple kinds of locators might indeed make the transition smoother, but I think we need to keep a CFI resolver somewhere.

danielweck avatar danielweck commented on June 8, 2024

The "Open Annotation in EPUB" draft proposes the use of multiple "selector" schemes too:
http://www.idpf.org/epub/oa/#h.4o298bjh2atb
...but bear in mind that this is about an authoring and interchange serialization format, which has different design goals than an internal, implementation-specific, machine processing-oriented data model and persistence / archival syntax. Both care about interoperability, but the applicable scope is different.

As a developer, if I had to re-think from scratch a "document position / range" design, I would probably come up with a design very similar to CFI. CFI character references operate at a low document level (UTF-16 code units, although I will admit that code points have benefits too), so this maps well with DOM ranges and JavaScript string libraries.
Just like with character offsets, the CFI "XML path" syntax is, crucially, canonical, which allows comparing / sorting without having to load the target document / DOM. This is a useful quality, for example when storing / managing / rendering / navigating locations that target separate documents within a publication.
Note that XPath, XPointer's various schemes, Selectors, etc. do not support this, which is why highlights / bookmarks / annotations services on the web usually implement their own internal format.
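
To illustrate the sorting point, here is a simplified sketch that orders CFI strings using only their text, with no document loaded. The function `cfi_sort_key` is hypothetical and handles only plain element steps with an optional trailing character offset; ranges, temporal and spatial offsets, and escaped brackets inside assertions are out of scope.

```python
import re


def cfi_sort_key(cfi: str):
    """Build a sortable key from a simple CFI (illustrative subset only)."""
    body = cfi.strip()
    if body.startswith("epubcfi(") and body.endswith(")"):
        body = body[len("epubcfi("):-1]
    # Drop bracketed assertions such as [chap01ref]; they do not affect order.
    body = re.sub(r"\[[^\]]*\]", "", body)
    path, _, offset = body.partition(":")
    # Steps are separated by "/" and by the "!" indirection marker.
    steps = tuple(int(step) for step in re.split(r"[/!]", path) if step)
    return steps, int(offset) if offset else 0


if __name__ == "__main__":
    cfis = [
        "epubcfi(/6/4[chap01ref]!/4/10/3:10)",
        "epubcfi(/6/4[chap01ref]!/4/2/1:0)",
        "epubcfi(/6/2!/4/2/1:5)",
    ]
    for cfi in sorted(cfis, key=cfi_sort_key):
        print(cfi)
```

Comparing the steps numerically matters: a plain string sort would place step `/10` before step `/2`.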

The problem with CFI is that it is, by definition and by design, not a human-friendly serialization format. CFI references are really only meant to be generated and consumed by tools / software libraries.

So, I too think that from a usability perspective, a multi-prong approach is necessary. That being said, behind the scenes I would keep working with CFI. Contrary to popular belief (and that is partly our fault because we tend to label evil pagination bugs with the term "cfi" ;) ), the part that is hard to implement is all to do with DOM / CSS idiosyncrasies (browser discrepancies). The CFI bits are actually already pretty solidly implemented.

Dan

HadrienGardeur avatar HadrienGardeur commented on June 8, 2024

@danielweck lack of readability and lack of implementation outside of EPUB-land are just two of the problems with a CFI-only approach.

There are other valid reasons:

  • an extra context (text, position in the table of contents, page number) is necessary to display a number of things such as annotations or search results to the user
  • CFI can very easily break if the document is updated, and since we want to prepare for the future this covers PWP/BFF where publications will live on the Web and can be frequently updated

CFIs might still be our number one priority when we attempt to anchor back an annotation to a document, but relying strictly on it is IMO potentially harmful.
Also, I don't think that the left part of the CFI should be used at all, only the right part is truly useful.

We also need to explore if using HTML instead of XHTML will have an impact on CFI.

llemeurfr avatar llemeurfr commented on June 8, 2024

I followed @danielweck's link and moved on to the W3C Web Annotation Data Model spec https://www.w3.org/TR/2016/CR-annotation-model-20160906/#selectors which is more recent than the Open Annotation link given in the IDPF spec. I read there that "Multiple Selectors can be given to describe the same Segment in different ways in order to maximize the chances that it will be discoverable later, and that the consuming user agent will be able to use at least one of the Selectors." -> this is what @HadrienGardeur has proposed.

danielweck avatar danielweck commented on June 8, 2024

@llemeurfr also take a look at
http://w3c.github.io/web-annotation/selector-note/#selectors
I posted an issue about TextPositionSelector:
http://w3c.github.io/web-annotation/selector-note/#TextPositionSelector_def
here:
w3c/web-annotation#350

danielweck avatar danielweck commented on June 8, 2024

And just a shout-out for a useful TextPositionSelector implementation:
https://github.com/tilgovi/dom-anchor-text-position
https://github.com/tilgovi/dom-seek
;)
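
Position-based selectors have to settle the unit question raised earlier in the thread (UTF-16 code units versus code points). The sketch below shows why the two can disagree: Python strings index by code point, whereas JavaScript strings and DOM ranges index by UTF-16 code unit. The helper name `code_point_to_utf16_offset` is an assumption for illustration and is not taken from the libraries linked above.

```python
def code_point_to_utf16_offset(text: str, cp_offset: int) -> int:
    """Convert an offset counted in code points to UTF-16 code units."""
    # Each UTF-16 code unit is 2 bytes; characters outside the Basic
    # Multilingual Plane take two units (a surrogate pair).
    return len(text[:cp_offset].encode("utf-16-le")) // 2


if __name__ == "__main__":
    sample = "a😺b"
    # The offset of "b" differs depending on which unit is counted.
    print(sample.index("b"), code_point_to_utf16_offset(sample, sample.index("b")))
```

An offset stored in one unit and read back in the other will drift by one position for every such character that precedes it.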

BigBlueHat avatar BigBlueHat commented on June 8, 2024

I'll also note (now that you've found @tilgovi's fabulous code), that there's an effort to build some amount of the selection bits into the upcoming Apache Annotator (incubating) project--which is the hopeful successor of the AnnotatorJS project.

We're just getting that on its feet, but you're welcome to participate, and certainly we hope to have code built there which Readium (and everyone) can benefit from.

Apache Annotator (incubating) Mailing List

danielweck avatar danielweck commented on June 8, 2024

Thanks @BigBlueHat for the heads-up re. Apache Annotator!

iherman avatar iherman commented on June 8, 2024

Just to be clear: that note (the plan is to publish it as a formal note in early 2017) is just an extract of the Web Annotation Model specification, making it easier to use for those who are only interested in selectors.

Ivan

On 11 Oct 2016, at 21:50, Daniel Weck wrote:

@llemeurfr also take a look at
http://w3c.github.io/web-annotation/selector-note/#selectors
I posted an issue about TextPositionSelector:
http://w3c.github.io/web-annotation/selector-note/#TextPositionSelector_def
w3c/web-annotation#350

danielweck avatar danielweck commented on June 8, 2024

@HadrienGardeur's doc:
https://github.com/readium/readium-2/tree/master/locators

Note that 3.1 Fragment Selector lists EPUB3 CFI as a valid potential type:
http://w3c.github.io/web-annotation/selector-note/index-respec.html#FragmentSelector_def
...but this is different from Hadrien's suggestion to use only the rightmost part of CFI expressions (without dereferencing the leftmost OPF / spine-related "path").

PS, fun fact:
Web Annotation Data Model is a W3C Proposed Recommendation dated 17 January 2017 ;)
(in the future from now!)
http://w3c.github.io/web-annotation/model/wd2/

HadrienGardeur avatar HadrienGardeur commented on June 8, 2024

@danielweck the list of selectors in the W3C Web Annotation Data Model spec is indeed a good source of inspiration, and I think we're aiming for something quite similar.

For the syntax though, I'm not entirely sure that we need to adopt their JSON-LD based model since we're not dealing with annotations (at least not yet) and so far the only place where we plan on representing locators is in search, where a different syntax would probably be more useful.

iherman avatar iherman commented on June 8, 2024

HadrienGardeur avatar HadrienGardeur commented on June 8, 2024

Thanks for those links @iherman !

For search, we don't really need a fragment id based syntax since we'll need to output a JSON document anyway.
But it could prove useful to support a fragment id based syntax purely for navigation. It would be up to the reading system to decide when such fragments are useful.

iherman avatar iherman commented on June 8, 2024
