Comments (5)

fuxialexander commented on August 27, 2024
  1. For a new PDF, if you read and annotate it in org-noter with org-noter-pdftools, I believe all annotations will be added incrementally in page order, which should avoid the problem you mentioned.

  2. Do you think the solution @UndeadKernel mentioned here weirdNox/org-noter#94 (comment) solves your issue? If what you want is a function that syncs all existing annotations with the org file every time it is called, I think what he mentioned will work and is feasible to achieve. We already have a variable, org-noter-pdftools-use-unique-org-id, that lets org-noter-pdftools store the annotation ID in PDF-tools; this ID is unique and will not change across uses.

  3. I'm not very sure about the connection between org-drill and an incremental skeleton creation function, could you elaborate more?

from org-pdftools.

UndeadKernel commented on August 27, 2024

Hey @fuxialexander, I was wondering about doing what you mention in your first point above. When I create a new annotation in org-noter, using pdf-annot-add-highlight-markup-annotation for example, am I supposed to see a new headline added with what was highlighted?

Syncing the PDF annotation comments with the notes defined in org-noter would also be a rather nice thing to have. That way, you can more easily share the commented paper with others.

If you want help programming anything like this, give me a few pointers and I might be able to implement it myself.

fuxialexander commented on August 27, 2024

@UndeadKernel You need to select the text and call org-noter-insert-note rather than use pdf-annot-add-highlight-markup-annotation. Then the text will be highlighted (you can tweak the color using custom variables) and a heading will be inserted.

fuxialexander commented on August 27, 2024

@UndeadKernel For your second question, you might want to look into pdf-info-editannot and org-narrow-to-subtree.

(pdf-info-editannot 'annot-1-18 '((contents . "Some org text")))

This sets the contents of the annotation (e.g. a highlight) with the ID "annot-1-18" to "Some org text". IDs are saved in the org-pdftools link, and thus also in the org-heading property in org-noter files.
Then you just need to grab the text you want inside save-excursion, using org-narrow-to-subtree or some other way, and call pdf-info-editannot to insert it.
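
A minimal sketch of that idea, assuming you already know the annotation ID and pass it in yourself. The names my/org-subtree-body and my/org-subtree-to-annot are hypothetical and not part of org-noter or org-pdftools; only pdf-info-editannot and the Org functions are existing calls.

```elisp
(require 'org)
(require 'subr-x)

;; `pdf-info-editannot' comes from pdf-tools; declaring it here only
;; silences the byte-compiler when pdf-tools is not loaded yet.
(declare-function pdf-info-editannot "pdf-info"
                  (id modifications &optional file-or-buffer))

(defun my/org-subtree-body ()
  "Return the text under the heading at point.
The heading line and its property drawer are skipped.
Hypothetical helper, not part of org-noter."
  (save-excursion
    (org-back-to-heading t)
    (let ((end (save-excursion (org-end-of-subtree t) (point))))
      ;; Move past the planning line, property drawer and blank lines.
      (org-end-of-meta-data t)
      (string-trim
       (buffer-substring-no-properties (min (point) end) end)))))

(defun my/org-subtree-to-annot (annot-id &optional pdf-file)
  "Copy the body of the heading at point into annotation ANNOT-ID.
ANNOT-ID is a symbol such as `annot-1-18'.  PDF-FILE is the PDF's
file name or buffer; it is an assumption that you supply it when
calling from the notes buffer."
  (pdf-info-editannot annot-id
                      `((contents . ,(my/org-subtree-body)))
                      pdf-file))
```

With point inside a note heading, something like (my/org-subtree-to-annot 'annot-1-18 "/path/to/paper.pdf") would then push the note text into that annotation. Reading the ID out of the heading property automatically is left out on purpose, since the exact link format is not shown here.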

alessivs commented on August 27, 2024

@suiokami

I believe that we can create a workable PDF IR system by mostly stitching together existing tools.

Additionally, we can structure the extracted content for org-drill by adding the :drill: tag, making sure the header has a faux subheader so that org-drill recognizes the entry as a card etc.

I believe org-drill integration should only exist as a separate package. IR requires more than stitching existing tools together.

Firstly, org-drill is a terribly designed package (except perhaps as a proof of concept) and is mistaken in many important respects, which makes it a bad choice for coupling. It claims to implement algorithms (SM-5 and SM-8) of which only an outline exists, and late or mid-interval repetitions are not adjusted. Sensitive algorithm-related features are implemented with absolutely no validation, neither theoretical nor user-guided. It measures flipped cards together rather than separately, and cards with randomized answer sides are also computed together. One bug sat for many months in which its implementation of SM-2 was not even doing what it was meant to do, with absolutely no notice to the users. And so on.

As an alternative to tight integration, I suggest developing an independent package, with an actual focus on IR, that:

  • Records headline IDs
  • Keeps track of which IDs are active (subject to incremental review), and which ones are done (i.e. dismissed, to be skipped)
  • Lists and visits active headings according to a priority function.
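
To make that concrete, here is a rough sketch of such a tracker in plain elisp. Every name in it (my-ir-item, my-ir-record, my-ir-dismiss, my-ir-queue) is hypothetical, and the convention that a lower number means higher priority is my assumption. It keeps state in memory only and leaves out persistence and the actual visiting of headings.

```elisp
(require 'cl-lib)

;; One record per tracked Org headline.  All names are hypothetical.
(cl-defstruct my-ir-item
  id                  ; Org headline ID (a string)
  (priority 50)       ; assumption: lower number = reviewed sooner
  (status 'active))   ; `active' (under review) or `done' (dismissed)

(defvar my-ir-items (make-hash-table :test #'equal)
  "Map from headline ID to its `my-ir-item' record.")

(defun my-ir-record (id &optional priority)
  "Start tracking headline ID as active, with optional PRIORITY."
  (puthash id
           (make-my-ir-item :id id :priority (or priority 50))
           my-ir-items))

(defun my-ir-dismiss (id)
  "Mark headline ID as done so the review queue skips it."
  (let ((item (gethash id my-ir-items)))
    (when item
      (setf (my-ir-item-status item) 'done))))

(defun my-ir-queue (&optional priority-fn)
  "Return the IDs of active headlines, most urgent first.
PRIORITY-FN maps an item to a number and defaults to
`my-ir-item-priority'."
  (let (active)
    (maphash (lambda (_id item)
               (when (eq (my-ir-item-status item) 'active)
                 (push item active)))
             my-ir-items)
    (mapcar #'my-ir-item-id
            (cl-sort active #'<
                     :key (or priority-fn #'my-ir-item-priority)))))
```

The priority function is a parameter so that the scheduling policy can change without touching the bookkeeping. Visiting the first entry of the queue could then be done with org-id-goto, assuming the IDs are regular Org IDs.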

org-drill is not enough to implement incremental reading, be it for PDFs or any other document format. See: Minimum definition of incremental reading (spoiler: the incremental reading review function is part of a separate algorithm, not one of the SM ones; this is important).

If you do not wish to implement mechanisms for review of Org headings such as a global priority queue, priority protection, overload management, and so on, it is still possible to implement incremental reading (in the SuperMemo 2000-2004 sense) by using reading lists (which are still prioritized lists). In any case, it is far more than org-drill provides.

Lastly, we are painting ourselves into a corner by limiting the proposed incremental reading process to the structure provided by PDF highlights. (The assumption here is that by focusing on highlights we can go back to them from our learning material.) A valuable portion of IR is incremental elaboration; this process potentially mutates the neat structure inherited from a PDF document into a more personal "cognitive structure" (if you will), where relying on back-references to highlights from internalized knowledge material may not be practical (unless, maybe, you're memorizing poetry or lyrics). This new cognitive structure would be the Org reproduction/annotation/summary/elaboration of the PDF, and it is to be the new source of truth from which active-recall cards will be derived. The PDF will exist only as a passive reference, or for bookkeeping purposes, or may simply disappear into oblivion.

EDIT: expanded below

My thinking, which I describe here, is that we have an incremental extraction of highlighted elements, or even more simply an advice-add or function that triggers an extraction via org-noter's code for every highlight made, as it's made.

The mechanism for this process already exists thanks to a few interactive functions.

  • org-noter-create-skeleton reads the bookmarks of the outline section of the PDF, and recreates them in the synced org file with precise locations. In many cases it is a good enough structure to kick off a semantic (branch) review process (i.e. based on, or structured by, a hierarchy of topics marked by document sections)
  • org-noter-insert-note, with a text selection, inserts a precise note and intelligently locates the appropriate Org heading candidates (usually the corresponding document section of the created skeleton) to place it into; without a text selection, it inserts a note anchored to the page.
  • org-noter-insert-precise-note is a godsend. It makes it possible to do what org-noter-create-skeleton does when dealing with bookmark-less, image scan-only, or otherwise problematic PDFs (see: What's so hard about PDF text extraction?). Since it doesn't depend on PDF capabilities (except the ability to point to a coordinate on the page) and only deals with Org, you can insert a precise note pointing anywhere. Run this function interactively whenever you see a structural element you would like to insert into the synced Org file; you remain in control of the outline structure at all times. If the PDF doesn't allow text selection, you fill the heading with your own text and still have a precise location to go back to for further processing. You cannot do this with highlights.

The difference with your proposal is:

  • It doesn't bank on the ability of PDF text to be highlighted (works for any document).
  • Because it doesn't deal with PDF annotations, one is not constrained by the comparatively low number of possible annotation formats permitted by the PDF spec (however neatly the current org-pdftools package helps with this limitation)
  • You remain in control of the semantic structure; not the author or producer of the PDF.
  • The PDF is not the source of truth anymore. It is your elaboration of it, and ultimately the active recall material, that becomes part of long-term memory; it need not relate to a single PDF.

In short, the obvious: annotating Org is far better than annotating PDF.

The (somewhat) bad news is that it is still up to you how to schedule review of your structured notes. I export to SuperMemo itself (wrapping ox-clip for now) and use it to go back to one of the headings of the PDF-linked Org file when it tells me to (until, in the end, everything to be remembered is managed by SM). There's no reason one cannot do an implementation in pure elisp. I just want to point out that basing it on PDF annotations is an inferior approach, that a few conceptions about incremental reading embedded in the proposal may be slightly mistaken, and that the proposed feature perhaps should not be part of one of the existing noter packages.
