alphapapa / org-web-tools

View, capture, and archive Web pages in Org-mode

License: GNU General Public License v3.0

Emacs Lisp 50.10% Makefile 1.66% Shell 48.24%

org-web-tools's Issues

heads up...tracking a problem with archive.today and also wget options

Just a heads up...

For some reason archive.today requests are failing (no, not using Cloudflare) and then the backup wget is failing because it does not like the '--execute robots=off' option.

I'm going to try to solve the archive.today problem first but I'll race ya! ;)

Problem with autoloads between files?

Thank you for fixing the last issue so quickly. I'm getting now a new error when I try to attach a website:
byte-code: Symbol’s function definition is void: org-web-tools--read-url

I'm still using GNU Emacs 25.2.2 on Ubuntu 18.04 :)

I don't know if it's relevant, but this is my init config:

(use-package org-web-tools
    :ensure t
    :after org
    :bind (:map org-mode-map
        ("C-c w" . org-web-tools-attach-url-archive)
        ("C-c W" . org-web-tools-view-archive)))

Unable to get submitid

Hello,

This is my first time using this tool, I ran it on this link and it failed. I don't know how to get a more descriptive backtrace.

Debugger entered--Lisp error: (error "Unable to get submitid")
  error("Unable to get submitid")
  org-web-tools-archive--archive.is-submitid()
  org-web-tools-archive--archive.is-url-id("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-archive--archive.is-archive-url("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-archive--archive.is("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-attach-url-archive--1("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-archive-attach("http://www.hifi-forum.de/viewthread-185-6402.html#1" nil nil)
  funcall-interactively(org-web-tools-archive-attach "http://www.hifi-forum.de/viewthread-185-6402.html#1" nil nil)
  command-execute(org-web-tools-archive-attach record)
  execute-extended-command(nil "org-web-tools-archive-attach" nil)
  funcall-interactively(execute-extended-command nil "org-web-tools-archive-attach" nil)
  command-execute(execute-extended-command)

Pandoc failed with org-web-tools-read-url-as-org

Emacs 27.050.
Windows 10.
pandoc 1.17.5.1
I just receive "Pandoc failed" when executing the function org-web-tools-read-url-as-org. Starting Emacs with --debug-init does not give any additional information.

View archived web pages via Hypothesis

Thanks for the awesome package! I wonder if org-web-tools-archive-view can be configured to open archived web pages via Hypothesis? Unfortunately, their web proxy doesn't seem to work with local web pages (the bookmarklet does work, though). Could there be a workaround for this issue?

SSL Problems with archive.fo

This is most likely not something you can fix.
However I thought it might be good to report it here anyway in case someone else runs into this problem.

archive.today seems to have some SSL problems at the moment.
At least it doesn't really work for me when I tried to use it with Firefox.
I got this error:

Cannot communicate securely with peer: no common encryption algorithm(s). Error code: SSL_ERROR_NO_CYPHER_OVERLAP

In Emacs I get this:

error in process filter: gnutls-negotiate: GnuTLS error: #<process archive.today>, -12
error in process filter: GnuTLS error: #<process archive.today>, -12

Make org format customizable

Hello,

Thanks very much for the great package: I use it with org-feed and finally getting feeds content is much more reliable!

About the issue: currently downloading a link results in something like

[[link][title]] :website:
timestamp
** Article
contents

For my use case I do not need the ** Article heading.
This is enforced in org-web-tools--url-as-readable-org in this bit here:

...
    (with-temp-buffer
      (org-mode)
      ;; Insert article text
      (insert converted)
      ;; Demote in-article headings
      (org-web-tools--demote-headings-below 2)
      ;; Insert headings at top
      (goto-char (point-min))
      (insert "* " link " :website:" "\n\n"
              timestamp "\n\n"
              "** Article" "\n\n")
      (buffer-string))))

I have the feeling that this can be abstracted into a function, say format-article-contents, which defaults to your template but can be configured by the user. Something along the lines of:

...
    (format-article-contents link timestamp converted))))

;; Named so that it does not redefine the built-in `format'.
(defun format-article-contents (link timestamp contents)
  "Format the article CONTENTS with LINK, TIMESTAMP, and an article heading."
  (with-temp-buffer
    (org-mode)
    ;; Insert article text
    (insert contents)
    ;; Demote in-article headings
    (org-web-tools--demote-headings-below 2)
    ;; Insert headings at top
    (goto-char (point-min))
    (insert "* " link " :website:" "\n\n"
            timestamp "\n\n"
            "** Article" "\n\n")
    (buffer-string)))

Would that make sense? For now I am using a modified version of org-web-tools--url-as-readable-org, but I really would like to not miss any future enhancement of this nice package :)
Thanks very much for the time spent in this!
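
One way the request above could be met is with a variable that holds the formatting function. The following is only a sketch: every name starting with my/ is hypothetical and does not exist in org-web-tools, and the real function would also have to adjust how far in-article headings are demoted when the "** Article" heading is dropped.

```elisp
;; Hypothetical names throughout; none of these exist in org-web-tools.
(defun my/article-format-default (link timestamp contents)
  "Return an entry for CONTENTS that includes the \"** Article\" heading."
  (concat "* " link " :website:\n\n"
          timestamp "\n\n"
          "** Article\n\n"
          contents))

(defun my/article-format-flat (link timestamp contents)
  "Return an entry for CONTENTS without the \"** Article\" heading."
  (concat "* " link " :website:\n\n"
          timestamp "\n\n"
          contents))

(defvar my/article-format-function #'my/article-format-default
  "Function called with LINK, TIMESTAMP and CONTENTS to build the entry.")

(defun my/format-article (link timestamp contents)
  "Format CONTENTS by calling `my/article-format-function'."
  (funcall my/article-format-function link timestamp contents))
```

A user who does not want the extra heading would then set my/article-format-function to #'my/article-format-flat (or to a function of their own) instead of copying and modifying the whole conversion function.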

Weird pandoc behavior (?)

Hi, thanks a lot for the very nice tool. I started playing with it for my web clipping activities. There is, however, some weird behavior in either pandoc or org-web-tools; I cannot tell which at the moment. It happened like this: I have pandoc installed on my Mac mini running macOS High Sierra. When I tried to use org-web-tools-read-url-as-org, the minibuffer said something like "Can not test pandoc, please report a bug". Just by chance, I opened the terminal, first ran which pandoc to check that it was installed (it was), and then ran pandoc in the terminal as a test. Somehow, while pandoc was running in the terminal, org-web-tools-read-url-as-org worked. It stopped working when I stopped pandoc in the terminal.
Is there anything I need to change in the environment or Emacs configuration file for it to work properly without needing to keep pandoc running in a terminal? I do have (when (memq window-system '(mac ns x)) (exec-path-from-shell-initialize)) in my .emacs file, added after googling about Emacs on macOS not being able to find executables in /usr/local/bin.
Thanks again for the nice tool and also the helm-org-rifle tool, I love both of them a lot.

org-element-at-point and w3m error

I am getting this error now after an Emacs update...

Error running timer ‘org-reveal’: (error "‘org-element-at-point’ cannot be used in non-Org buffer #<buffer w3m> (w3m-mode)") [2 times]

It happens when I am in a w3m buffer, trying to capture it...
with the command:
C-c c
then:
w

I hadn't had such an error in the past...

Wrong type argument: integer-or-marker-p, nil

Recently I'm getting this error message org-web-tools--url-as-readable-org: Wrong type argument: integer-or-marker-p, nil with a lot of articles (here is an example).

This happens with Emacs 26.1 on both Arch Linux and Debian.

condition-case: Bad url: /

When I run M-x org-web-tools-insert-web-page-as-entry for a URL, the error is as follows:

condition-case: Bad url: /

I don't know why this happens.

My emacs version is 26.3 on macOS 10.15.4.

Error when the HTML contains an empty title

org-web-tools-insert-link-for-url raises the following error when the HTML of the url contains a title element with no content:

Debugger entered--Lisp error: (wrong-type-argument arrayp nil)
 replace-regexp-in-string("\n" " " nil t t)
 s-replace("\n" " " nil)
 org-web-tools--cleanup-title(nil)

I may fix this issue if I have time, but I'll just file it for now. Actually, the command doesn't make sense if the web page contains no title, so I have no idea what to do with this case.
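
A nil-safe cleanup step is one possible way to handle this case. The sketch below is an assumption about how it could look, not the package's code: my/cleanup-title and its FALLBACK argument are hypothetical, and falling back to the URL is just one choice for pages without a title.

```elisp
(require 'subr-x) ; provides `string-trim' on older Emacs versions

;; Hypothetical helper; not part of org-web-tools.
(defun my/cleanup-title (title &optional fallback)
  "Return TITLE with newlines turned into spaces and whitespace trimmed.
If TITLE is nil or blank, return FALLBACK (for example the URL) instead."
  (let ((clean (and (stringp title)
                    (string-trim
                     (replace-regexp-in-string "\n" " " title t t)))))
    (if (and clean (not (string= clean "")))
        clean
      fallback)))
```

With this shape, a caller could pass the URL as FALLBACK so that the inserted link uses the URL itself as its description when the title element is empty.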

The tool is not able to insert some pages

First, thanks for your time and effort here. Second, I'm trying to move away from a cloud-based read-it-later service and found org-web-tools to be a powerful tool for downloading articles to my machine.

But I ran into a problem with some links where the tool was not able to download the content and instead left the following in the buffer:

 The plain HTTP request was sent to HTTPS port

Here are some of those links:

https://www.artofmanliness.com/articles/the-problem-with-minimalism/
https://www.stilldrinking.org/programming-sucks
https://hackernoon.com/function-composition-with-lodash-d30eb50153d1
https://alistapart.com/article/cult-of-the-complex
https://medium.com/@Imaginary_Cloud/javascript-ecosystem-overview-2018-fa9f776ddf74

handling same-page relative links, such as footnotes

hi ap, & as the others have said, thx for this great package.

i was impressed when i converted an academic essay with -read-url-as-org and it rendered all the footnote anchors as org-links, but it turns out they are relative links to nowhere, not to the notes at bottom of page/document. ditto the footnote links back up to the body of the text. i guess it is just a pandoc issue? is there any way they could be further processed somehow?

for me they appear as [[#en41][41]], while the footnote return links appear as [[#fn1][↩]], having been rendered from something like <a id="fn41" class="endnote-link" href="#en41" rel="footnote">41</a>.

my example text:
https://monthlyreview.org/2014/07/01/surveillance-capitalism/

i'm not sure if its something that should be supported, but thought i'd mention it in case there is a workaround or if others have the same issue.

thx again.

Curl error

After issuing any "M-x org-web-tools-*" commands, I am getting this error:

plz: plz: Curl error: "Curl error", #s(plz-error (6 . "Couldn't resolve host. The given remote host was not resolved.") nil nil)

archive.today archiving function does not work anymore

I'm not sure what argument this is; I'm calling the function without a choose-fn argument.

Archive not yet available.

When I try to org-web-tools-archive-attach I get "Archive not yet available", even after 6 retries. When I manually go to archive.today and submit the URL, the archive is created in seconds. What can be the reason? I tried 2 URLs: first an ft.com article, second a random post on Reddit.

Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-EgsajH/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (1/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-EO31P2/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (2/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-vPabrh/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (3/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-acU6fX/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (4/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-RbDen2/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (5/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-1zI1YB/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Retrying with other functions...
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-ON8K5y/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Retrying with other functions...
"wget output:

/data/data/com.termux/files/usr/bin/wget: unrecognized option '--execute robots=off'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
"
wget output:

/data/data/com.termux/files/usr/bin/wget: unrecognized option '--execute robots=off'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.

Error running timer: (error "No such directory found via CDPATH environment variable")
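
The wget message quotes '--execute robots=off' as a single option, which suggests that the flag and its value reach wget as one argument rather than two. Assuming the options are kept as a list of strings that is handed to the process unchanged, a sketch like the following would split such entries first; my/split-options is a hypothetical helper, not something the package provides.

```elisp
;; Hypothetical helper; not part of org-web-tools.
(defun my/split-options (options)
  "Return OPTIONS with every string split into separate arguments.
An entry such as \"--execute robots=off\" becomes two list elements."
  (apply #'append (mapcar #'split-string-and-unquote options)))
```

Writing the option list with "--execute" and "robots=off" as two separate strings in the first place would have the same effect.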

add space around markup text from "abcbolddef" to "abc bold def"

some html like this:

<p>abc<strong>bold</strong>def</p>
<p>abc<em>italic</em>def</p>
<p>abc<code>verbatim</code>def</p>

when converted to org, pandoc produced result as below, which emacs cannot recognize and highlight.

abc*bold*def

abc/italic/def

abc=verbatim=def

so i wish to add similar function to 'org-web-tools--clean-pandoc-output'.

Symbol’s function definition is void: caddr

Running emacs 25.3.2. Installed org-web-tools from MELPA. I'm getting errors when trying to run org-web-tools-insert-web-page-as-entry or org-web-tools-insert-link-for-url

Contacting host: boingboing.net:443
org-web-tools--html-title: Symbol’s function definition is void: caddr
Contacting host: boingboing.net:443
org-web-tools--eww-readable: Symbol’s function definition is void: caddr
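
As far as I know, caddr only became a core function in Emacs 26; on Emacs 25 it came from the obsolete cl library. A version-independent spelling, sketched here with a hypothetical helper name, is to take the third element with nth, or to use cl-caddr from cl-lib.

```elisp
(require 'cl-lib)

;; Hypothetical helper showing two spellings that also work on Emacs 25.
(defun my/third (list)
  "Return the third element of LIST without relying on `caddr'."
  (nth 2 list))

;; `cl-caddr' from cl-lib is an equivalent alternative.
(defun my/third-cl (list)
  "Return the third element of LIST using `cl-caddr'."
  (cl-caddr list))
```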

"Unable to test Pandoc"

I am trying to apply org-web-tools-insert-link-for-url, and I get:

 Unable to test Pandoc!  Please report this bug! (include the output of "pandoc --dump-args --no-wrap")

From the terminal, running pandoc --dump-args --no-wrap I get:

[pandoc warning] --no-wrap is deprecated. Use --wrap=none or --wrap=preserve instead.

I am running Emacs 25.1 on FreeBSD 11.1, with Pandoc 1.19.2.1

ultra clean output via pandoc

As was already noted, pandoc does not entirely clean unnecessary information out of the HTML file.

for example when I simply download

https://www.theguardian.com/politics/2017/aug/30/may-to-press-japan-on-its-eu-trade-deal-in-hopes-of-a-model-for-uk
and convert it to org via pandoc there are a lot of

#+BEGIN_HTML

#+END_HTML

which is what happens when I use
org-web-tools-insert-web-page-as-entry

fortunately there is a workaround as it was pointed out to me on the pandoc mailing list, namely

pandoc -f html-raw_html-native_divs -t org May_to_press_Japan_on_its_EU_trade_deal_in_hopes_of_a_model_for_UK_Politics_The_Guardian.html -o neu.org

so could these options be included, in org-web-tools?

thanks

Uwe Brauer
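
For illustration, the reader options from the command line above could be passed to pandoc from Emacs roughly as follows. This is a sketch only: the my/ names are hypothetical, it assumes a pandoc on the PATH that accepts the html-raw_html-native_divs reader spec quoted in this report, and it is not how org-web-tools itself calls pandoc.

```elisp
;; Hypothetical names; the reader spec is the one quoted in this report.
(defvar my/pandoc-html-reader "html-raw_html-native_divs"
  "Pandoc input format, with raw HTML and native divs disabled.")

(defun my/pandoc-org-args ()
  "Return the pandoc arguments for converting HTML to Org."
  (list "-f" my/pandoc-html-reader "-t" "org"))

(defun my/html-to-org (html)
  "Convert the string HTML to Org syntax with pandoc and return it."
  (with-temp-buffer
    (insert html)
    ;; Replace the buffer contents with pandoc's output.
    (let ((status (apply #'call-process-region (point-min) (point-max)
                         "pandoc" t t nil (my/pandoc-org-args))))
      (unless (equal status 0)
        (error "Pandoc failed: %s" (buffer-string)))
      (buffer-string))))
```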

Feature: replace existing URLs in Org document

First of all, thanks for this package. It's making my notes, which often include a lot of (lists of) links to articles and so on, a lot more readable.

I have quite a few notes which already include a number of plain URLs or URLs with a short version of the page title. To make these notes more readable I would like to automatically change their description to the title of the linked page. Instead of yanking each URL and calling org-web-tools-insert-link-for-url, it would be handy to have another command which looks at the URL under the cursor, and replaces it with a link that has the title of the page as its description. If the link already has a description it could simply replace the existing description.

MELPA recipe force org install

Hello,

I noticed that org-web-tools' MELPA entry lists org 9.x as dependency: http://melpa.org/#/org-web-tools

That forces Emacs to install org 9.x, although it may be already installed via org-plus-contrib, like in my case.

I am afraid having two org installs may lead to some confusion.

I am not sure if this is something to let MELPA team know.

Sorry for pestering you with minor issues.

Thank you for writing this package

Hey @alphapapa,

Thanks a lot for putting your time into this package. It's crazy useful. org-web-tools-insert-web-page-as-entry is a staple in my workflow.

Feel free to close this. 🙂

Compilation errors

This package has a few compilation errors:

In org-web-tools--pandoc-no-wrap-option:
org-web-tools.el:140:7:Warning: reference to free variable
    ‘org-web-tools--pandoc-no-wrap-option’
org-web-tools.el:141:13:Warning: assignment to free variable
    ‘org-web-tools--pandoc-no-wrap-option’

In org-web-tools--check-pandoc-no-wrap-option:
org-web-tools.el:164:22:Warning: reference to free variable
    ‘org-web-tools-pandoc-sleep-time’

In org-web-tools--remove-bad-characters:
org-web-tools.el:181:40:Warning: reference to free variable
    ‘org-web-tools-pandoc-replacements’

In org-web-tools--get-url:
org-web-tools.el:342:53:Warning: reference to free variable
    ‘url-http-end-of-headers’

In end of data:
org-web-tools.el:505:1:Warning: the following functions are not known to be
    defined: libxml-parse-html-region, string-empty-p

Update template example for newer Org, and add autoload for function

I used the snippet from the code to setup my org-capture template:

 ("w" "web site from clipboard" entry (file "~/org/articles.org")
  "%(org-web-tools--url-as-readable-org)")

but I get the error Template is not a valid Org entry or tree.

Not sure what's going on here.

"X selection unavailable for this frame"

Installed org-web-tools.
In org-mode, with cursor over a link in Org format: [[URL][Description]]
Attempt to M-x any of the commands results in the error message: "X selection unavailable for this frame."

Is it possible to support images?

Howdy @alphapapa, thanks for another amazing package!

I would love to download images as well. For instance this article works just fine with eww-readable, and a couple of images are critical to understanding the context.

Looking at the org-web-tools code, it appears that images are not fetched at all and therefore cannot be displayed. Pandoc support may be the other potential pitfall.

Am I on the right track, or are there other issues for supporting images that I'm not seeing?

Add option to disable use of `eww-readability` in `org-web-tools-read-url-as-org`

I am taking advantage of org-web-tools-read-url-as-org to retrieve word definitions and synonyms from web pages as org buffers. This is truly convenient as I get the content in org syntax and can use no matter what site I want without having to rely on a package.

Unfortunately content is missing for some urls. Examples:

Tracked this down to the line (eww-score-readability dom) in org-web-tools--eww-readable. If I remove that line from the function it inserts all content. eww-score-readability is rather cryptic so not sure how to solve this issue.

Setting up with org capture templates and doct

I'd appreciate help on using the doct package with org-web-tools, or rather the correct way to use org-web-tools commands with org capture since I'm new to elisp. This problem feels closer to a syntax error.

The initial goal is to be able to capture a URL (often from eww, or from a general clipboard copy) as a task, and extend this to construct other templates to capture entire webpages.

The function I have with the template is below. This uses the %x expansion fed into org-web-tools--get-url.

;; function holding the doct template
(defun sr/todo-file-ext-link-act-date ()
'("* %{todo-state} %(org-web-tools--get-url '%x)"
":PROPERTIES:"
":CREATED: %<%Y-%m-%d %a %H:%M>"
":PLANNED: %^t"
":END:"
"%?"))

Org capture template for URL's via doct:
(I've only pasted the relevant capture snippet below rather than all the templates I use)

(setq org-capture-templates
      (doct '(("Todo" :keys "t"
               :file "~/my_org/todo.org"
               :prepend t
	       :children (("External link"  
			   :keys "e"
			   :type entry
			   :headline "@reading"
			   :todo-state "TODO"
			   :template sr/todo-file-ext-link-act-date))))))

The error is "Capture abort: Invalid read syntax: ")". I am using org 9.3.6

org-bracket-link-regexp is obsolete since org 9.3

org-bracket-link-regexp is now an alias for org-link-bracket-re.
You can no longer obtain a link description from the third match group; it is now in the second one.
The following expression in org-web-tools--read-org-bracket-link sets desc to nil, which causes an error in org-web-tools-read-url-as-org:

(when (re-search-forward org-bracket-link-regexp (point-at-eol) t)
          (setq target (match-string-no-properties 1)
                desc (match-string-no-properties 3)))

Instead, you could rewrite it as follows:

(when (re-search-forward org-link-bracket-re (point-at-eol) t)
          (setq target (match-string-no-properties 1)
                desc (match-string-no-properties 2)))

You'll probably need a workaround for supporting Org < 9.3, though.
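
A possible shape for such a workaround is to pick the regexp and the description group at run time, depending on whether org-link-bracket-re is defined. The function name below is hypothetical, and unlike the original snippet it searches from the beginning of the current line rather than from point.

```elisp
(require 'org)

;; Hypothetical helper; not the package's own function.
(defun my/read-org-bracket-link ()
  "Return (TARGET . DESCRIPTION) for the first bracket link on this line.
Return nil if the line has no bracket link."
  (let* ((new-regexp-p (boundp 'org-link-bracket-re))
         ;; `symbol-value' avoids byte-compiler warnings about whichever
         ;; variable is missing or obsolete in the running Org version.
         (regexp (symbol-value (if new-regexp-p
                                   'org-link-bracket-re
                                 'org-bracket-link-regexp)))
         ;; The description is group 2 in the new regexp, group 3 in the old.
         (description-group (if new-regexp-p 2 3)))
    (save-excursion
      (beginning-of-line)
      (when (re-search-forward regexp (line-end-position) t)
        (cons (match-string-no-properties 1)
              (match-string-no-properties description-group))))))
```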

org-web-tools--get-first-url fails in terminal

When running emacs -nw:

ELISP> (org-web-tools--get-first-url)
*** Eval error ***  Wrong type argument: stringp, nil

Relevant code.

The problem is that (gui-get-selection 'CLIPBOARD) returns nil, thus the first item is nil, which gets passed to string-match, which blows up when it sees nil.
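
A guard against non-string candidates would avoid the error. The sketch below takes the candidate list as an argument so that it can be tried without a clipboard; my/first-url is a hypothetical name, and in the package the list would come from the clipboard and the kill-ring.

```elisp
(require 'cl-lib)
(require 'rx)

;; Hypothetical helper; CANDIDATES stands in for the clipboard and kill-ring.
(defun my/first-url (candidates)
  "Return the first element of CANDIDATES that looks like an HTTP(S) URL.
Elements that are not strings, such as nil, are skipped."
  (cl-loop for item in candidates
           when (and (stringp item)
                     (string-match-p (rx bos "http" (optional "s") "://")
                                     item))
           return item))
```

In a terminal frame, (my/first-url (cons (gui-get-selection 'CLIPBOARD) kill-ring)) would then skip the nil clipboard entry and fall through to the kill-ring.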

convert only part of website? (selected/copied text)

Another suggestion i have is to allow the same tools to work on selected parts of a website and not the whole webpage. for example convert only the selected/copied text? any thoughts on this?

Symbol’s function definition is void: file-attribute-size

I'm getting this error when I try to attach a website:
org-web-tools-attach-url-archive: Symbol’s function definition is void: file-attribute-size

I'm using GNU Emacs 25.2.2 on Ubuntu 18.04
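
To my knowledge, file-attribute-size was added in Emacs 26.1, which would explain the error on 25.2. A compatibility sketch, using a hypothetical helper name, falls back to the position of the size in the list returned by file-attributes:

```elisp
;; Hypothetical compatibility helper; not part of org-web-tools.
(defun my/file-size (file)
  "Return the size of FILE in bytes, or nil if FILE does not exist."
  (let ((attributes (file-attributes file)))
    (when attributes
      (if (fboundp 'file-attribute-size)
          (file-attribute-size attributes)
        ;; The size is the eighth element (index 7) of the attribute list.
        (nth 7 attributes)))))
```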

archive and attach all urls in an entry

Is it possible to run a single command that archives and attaches all URLs in an Org entry? Right now, if I run org-web-tools-archive-attach on an entry, it archives and attaches whatever the first URL in the kill-ring is, as far as I can tell. It only works when the point is on a link.

Document `org-web-tools--org-link-for-url` in README

This is a very useful function to include in bookmark capture templates but it's not documented in the README.

Cryptic errors?

Hi all:

I can't get a single org-web-tools-FOO command to work. I keep getting things like /443 Name or service not known

or Before first headline at position 1 in buffer FOO.org.

Any ideas as to what I'm doing wrong?

Open local url?

Is it possible to use org-web-tools-read-url-as-org with a local html file? I tried:

(org-web-tools-read-url-as-org "/path/to/index.htm")

But I got a "bad url" error.

get rid of pandoc left overs

Hi again

Ok couldnt wait until i got home to play around with it :)

so when using pandoc i've noticed before that it leaves a lot of junk, like:


[pandoc warning] --no-wrap is deprecated. Use --wrap=none or --wrap=preserve instead.
#+BEGIN_HTML
  <div class="mw-parser-output">
#+END_HTML

#+BEGIN_HTML
  <div class="hatnote navigation-not-searchable" role="note">
#+END_HTML

i get around this by manually using this function

(defun z/clean-html-2-org-pandoc ()
  "Clean up leftovers in pandoc's Org output in the current buffer."
  (interactive)
  ;; Turn quote blocks into R source blocks.
  (replace-string "#+BEGIN_QUOTE" "#+BEGIN_SRC R :session Rorg  :results none" nil (point-min) (point-max))
  (replace-string "#+END_QUOTE" "#+END_SRC" nil (point-min) (point-max))
  ;; Clean extra \\
  (replace-string "\\\\" "" nil (point-min) (point-max))
  ;; Clean \_
  (replace-string "\\_" "_" nil (point-min) (point-max))
  ;; `z/bullets-to-spaces' is not shown in this report.
  (z/bullets-to-spaces))

perhaps it could be of use and you can use this or something similar to auto clean the files right after the pandoc process?

best and thx again

handle images in links

@c1-g i tried out your branch, it works well for relative links to files and images.

i also noticed another issue, which perhaps you could address?
(it is in the main branch too, but maybe you have such knowhow?)

html like this (an image that is also a link):

<a href="/vorratsdatenspeicherung" hreflang="de"><img src="/sites/default/files/styles/medium_crop/public/2017-09/fsa-unschuldsvermutung_%20John-Paul_Bader_cc-by-sa2.jpg?itok=CQntlFzw" width="410" height="208" alt="John-Paul Bader, CC BY SA 2.0" loading="lazy" typeof="foaf:Image" class="image-style-medium-crop" />

renders into a kind of hyperactivated org link

[[https://digitalcourage.de/digitale-selbstverteidigung][[[https://digitalcourage.de/sites/default/files/styles/medium_crop/public/2017-09/IMG_20160107_155735159.jpg?h=63c968e9&itok=rkduUPh5]]]]

i.e. it generates two links, one from href= and one from img src=, with mangled square brackets.

SVG links broken

Hi again!

thx for extending the alphapapa universe ;-)

i will test the package later as im traveling yet one quick note

the svg links in the homepage seem broken :)

best

`org-element-at-point` cannot be used in non-Org buffer

i recently updated my packages, now i receive an error when i try to use org-web-tools-read-url-as-org.

the backtrace i see looks like this (w url and html args substituted):

error("`org-element-at-point' cannot be used in non-Org buffer %S (%s)" #<buffer  *temp*> fundamental-mode)
org-element-at-point(nil cached)
org-before-first-heading-p()
org-back-to-heading-or-point-min(t)
org-get-property-block()
org-at-property-p()
org-web-tools--remove-custom_id_properties()
org-web-tools--clean-pandoc-output()
org-web-tools--html-to-org-with-pandoc(my-urls-html)
org-web-tools--url-as-readable-org(my-url)

i wonder if when updating, org was updated, and the behaviour of org-element-at-point has been changed?
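
The backtrace shows the Org functions running in a temporary buffer that is still in fundamental-mode, which newer Org versions reject. A sketch of one way around this is to switch the temporary buffer to org-mode before calling them; the macro name is hypothetical and this is not necessarily how the package addresses it.

```elisp
(require 'org)

;; Hypothetical macro; not part of org-web-tools.
(defmacro my/with-temp-org-buffer (contents &rest body)
  "Insert CONTENTS into a temporary buffer in `org-mode' and run BODY there."
  (declare (indent 1))
  `(with-temp-buffer
     (insert ,contents)
     ;; Skip user mode hooks; only the major mode itself is needed here.
     (delay-mode-hooks (org-mode))
     (goto-char (point-min))
     ,@body))
```

Cleanup code that calls functions such as org-at-property-p could then run inside this macro instead of a plain with-temp-buffer.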

Feature: Use browser cookies file

Edit: This might be out of scope for this package and instead the place of some sync browser with url-cookies-file type package that runs on the idle timer. Maybe not though and I'm sure others have had this need that use org-web-tools before.

I link to private github repositories and would like to have the title of the page hidden unless logged in rather than link text of "Page not found - Github".

I plan on trying to implement this at some point, but want to leave the idea here in case I don't get to it, someone wants to pick it up immediately, or there are ideas about how to implement it.

After a little research, we can do the following for Firefox:

1. Copy the cookies.db file to a temporary file (Firefox holds an exclusive lock on the SQLite database).
2. Select the cookies from the database and save the result in a variable.
3. Delete the temporary cookies database.
4. Export to the format expected by url-retrieve-synchronously (I couldn't find this format).
5. Append the values to url-cookies-file.

Then url-retrieve-synchronously, and by extension org-web-tools-insert-link-for-url, can access anything you've logged into with Firefox!

fetch link at point

i thought to add this to org-web-tools, in order to say, run org-web-tools-read-url-as-org on a link in an elfeed entry or in eww, or on also an org link:

(defun org-web-tools--get-first-url ()
  "Return (shr) URL or org link URL at point, or URL in clipboard, or first URL in the `kill-ring', or nil if none."
  (or (shr-url-at-point nil) ; elfeed or eww links
      (plist-get (get-text-property (point) 'htmlize-link) :uri) ; org links
      (cl-loop for item in (append (list (gui-get-selection 'CLIPBOARD))
                                   kill-ring)
               when (and item (string-match (rx bol "http" (optional "s") "://") item))
               return item)))

could poss add some others, could poss find more rigorous way of fetching them...

defvar pandoc command

It's good practice to use a defvar variable to set external commands: this way, it's especially useful for systems where pandoc is installed but not necessarily in the PATH (e.g. with Nix or Guix) or for alternate installs of pandoc.

(defvar org-web-tools-pandoc-command "pandoc")
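
A slightly fuller sketch of the same idea uses defcustom, so that the command shows up in the customize interface, and resolves the executable before use. The my/ names are hypothetical stand-ins for whatever names the package would choose.

```elisp
;; Hypothetical names; a stand-in for the variable proposed above.
(defcustom my/pandoc-command "pandoc"
  "Name of, or full path to, the pandoc executable."
  :type 'string
  :group 'external)

(defun my/pandoc-executable ()
  "Return the full path of the pandoc executable, or signal an error."
  (or (executable-find my/pandoc-command)
      (error "Cannot find pandoc executable: %s" my/pandoc-command)))
```

On Nix or Guix, a user could then set my/pandoc-command to the store path of pandoc without touching PATH.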

Symbol's function definition is void: temp-dir

Need to add a backtrace log.
Happened on a fresh Ubuntu 18.10 with emacs 25.2

"Pandoc failed" on certain websites

On some web pages I get an error "Pandoc failed". On others capture succeeds. For example on these two pages randomly chosen:

https://sachachua.com/blog/2007/12/planner-basic-configuration/
http://muto.ca/b/19-Rmail.html

The first fails, but the second succeeds.

Fix/modify relative URLs when inserting entries

After using `org-web-tools-insert-web-page-as-entry' or `org-web-tools-convert-links-to-page-entries', some relative URLs within the web page are still relative in the Org entry, which means they are now relative to the Org file the page was inserted into, rather than to the original web page.
I think it would be useful if these URLs were made "absolute", so that they refer to the corresponding online content.
I think this applies to links and to references to files such as images.
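
Emacs ships url-expand-file-name in url-expand.el, which resolves a link against a base URL. A minimal sketch of using it for this purpose follows; my/absolute-url is a hypothetical wrapper, and finding the relative links in the converted Org text is left out.

```elisp
(require 'url-expand)

;; Hypothetical wrapper around the built-in `url-expand-file-name'.
(defun my/absolute-url (link base)
  "Return LINK resolved against BASE, the URL of the page it came from.
A LINK that is already absolute is returned unchanged."
  (url-expand-file-name link base))
```

For a page fetched from https://example.com/blog/post.html, a link such as /images/a.png would then be rewritten to point at example.com rather than at the local Org file.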

Error when trying to use org-web-tools-archive-attach

Hi, I'm getting the error below when trying this function--I have the cursor in an org file, and running this with eval-expression:

(org-web-tools-archive-attach "https://karpathy.github.io/2021/03/27/forward-pass/")

Error is:

org-web-tools-archive--archive.is-archive-url: Wrong type argument: stringp, nil

Am pretty sure I'm calling the function correctly. I get the following trace:

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  org-web-tools-archive--archive.is-url-id("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-archive--archive.is-archive-url("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-archive--archive.is("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-attach-url-archive--1("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-archive-attach("https://karpathy.github.io/2021/03/27/forward-pass...")
  eval-expression((org-web-tools-archive-attach "https://karpathy.github.io/2021/03/27/forward-pass...") nil nil 127)
  funcall-interactively(eval-expression (org-web-tools-archive-attach "https://karpathy.github.io/2021/03/27/forward-pass...") nil nil 127)
  command-execute(eval-expression)

uses obsolete cl package

Nice package. But when I load it, I get the warning:

Package cl is deprecated

This can be fixed by using

(require 'cl-lib) ; the updated package for 'cl' functions
