org-web-tools's Introduction

org-web-tools

This file contains library functions and commands useful for retrieving web page content and processing it into Org-mode content.

For example, you can copy a URL to the clipboard or kill-ring, then run a command that downloads the page, isolates the “readable” content with eww-readable, converts it to Org-mode content with Pandoc, and displays it in an Org-mode buffer. Another command does all of that but inserts it as an Org entry instead of displaying it in a new buffer.

Installation

Requirements

  • Emacs 27.1 or later.
  • Commands that process HTML into Org require Pandoc. Note: The output of current Pandoc versions differs substantially from versions that may still be present in stable Linux distros. If you encounter any issues, please install a more recent version of Pandoc.

MELPA

After installing from MELPA, just run one of the commands below. If you want to use any of the functions in your own code, you should (require 'org-web-tools).
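
A minimal configuration sketch, assuming `use-package` is available and MELPA is already listed in `package-archives`. The key bindings are arbitrary illustrations, not defaults shipped with the package.

```elisp
;; Sketch: install org-web-tools from MELPA and bind two of its commands.
;; The "C-c w" keys are a hypothetical choice, not package defaults.
(use-package org-web-tools
  :ensure t
  :bind (("C-c w l" . org-web-tools-insert-link-for-url)
         ("C-c w e" . org-web-tools-insert-web-page-as-entry)))
```

Because the commands are bound through `:bind`, the package is loaded lazily the first time one of them is called.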

Usage

Commands

  • org-web-tools-insert-link-for-url: Insert an Org-mode link to the URL in the clipboard or kill-ring. Downloads the page to get the HTML title.
  • org-web-tools-insert-web-page-as-entry: Insert the web page for the URL in the clipboard or kill-ring as an Org-mode entry, as a sibling heading of the current entry.
  • org-web-tools-read-url-as-org: Display the web page for the URL in the clipboard or kill-ring as Org-mode text in a new buffer, processed with eww-readable.
  • org-web-tools-convert-links-to-page-entries: Convert all URLs and Org links in current Org entry to Org headings, each containing the web page content of that URL, converted to Org-mode text and processed with eww-readable. This should be called on an entry that solely contains a list of URLs or links.
  • org-web-tools-archive-attach: Download archive of page at URL and attach with org-attach. If CHOOSE-FN is non-nil (interactively, with universal prefix), prompt for the archive function to use. If VIEW is non-nil (interactively, with two universal prefixes), view the archive immediately after attaching. (See also org-board).
  • org-web-tools-archive-view: Open Zip file archive of web page. Extracts to a temp directory and opens with browse-url-default-browser. Note: the extracted files are left on-disk in the temp directory.

Functions

These are used in the commands above and may be useful in building your own commands.

  • org-web-tools--dom-to-html: Return parsed HTML DOM as an HTML string. Note: This is an approximation and is not necessarily correct HTML (e.g. IMG tags may be rendered with a closing “</img>” tag).
  • org-web-tools--eww-readable: Return “readable” part of HTML with title.
  • org-web-tools--get-url: Return content for URL as string.
  • org-web-tools--html-to-org-with-pandoc: Return string of HTML converted to Org with Pandoc. When SELECTOR is non-nil, the HTML is filtered using esxml-query SELECTOR and re-rendered to HTML with org-web-tools--dom-to-html, which see.
  • org-web-tools--url-as-readable-org: Return string containing Org entry of URL’s web page content. Content is processed with eww-readable and Pandoc. Entry will be a top-level heading, with article contents below a second-level “Article” heading, and a timestamp in the first-level entry for writing comments.
  • org-web-tools--demote-headings-below: Demote all headings in buffer so the highest level is below LEVEL.
  • org-web-tools--get-first-url: Return URL in clipboard, or first URL in the kill-ring, or nil if none.
  • org-web-tools--read-url: Return a URL by searching at point, then in clipboard, then in kill-ring, and finally prompting the user.
  • org-web-tools--read-org-bracket-link: Return (TARGET . DESCRIPTION) for Org bracket LINK or next link on current line.
  • org-web-tools--remove-dos-crlf: Remove all DOS CRLF (^M) in buffer.
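
As a rough illustration of building your own command from these functions, the sketch below inserts the readable Org version of a page at point. The command name is hypothetical, and the argument lists are assumptions based on the descriptions above (that `org-web-tools--read-url` can be called without arguments, and that `org-web-tools--url-as-readable-org` accepts the URL as its first argument); check the source before relying on them.

```elisp
(require 'org-web-tools)

(defun my/org-web-tools-insert-readable-here ()
  "Insert the readable Org version of a URL at point.
Hypothetical example command; not part of org-web-tools."
  (interactive)
  ;; `org-web-tools--read-url' looks at point, the clipboard, and the
  ;; kill-ring before prompting the user.
  (let ((url (org-web-tools--read-url)))
    ;; Assumption: this function takes the URL as its first argument
    ;; and returns the Org entry as a string.
    (insert (org-web-tools--url-as-readable-org url))))
```

The double-hyphen names are internal functions, so they may change or disappear between releases (as `org-web-tools--html-title` did in 1.3).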

Changelog

1.3

Changes

  • Errors from Pandoc are now displayed. (#47. Thanks to c1-g.)

Fixes

  • Default options to Wget (see #35).
  • Finding URL in clipboard on MacOS and Windows. (See #66. Thanks to @askdkc.)
  • Org timestamp format when inserting pages. (#54. Thanks to p4v4n for reporting.)

Internal

  • Use plz HTTP library and make various related optimizations.

Removed

  • Internal function org-web-tools--html-title. (If your program used this function, it’s trivially reimplemented; see source code.)

1.2

Improvements

  • Archiving tools:
    • Can use multiple functions to attempt archiving.
    • Associated options control retry attempts, delays, and fallbacks to other functions.
    • Functions to archive Web pages with wget and tar:
      • Function org-web-tools-archive--wget-tar archives a URL’s Web page, including page resources.
      • Function org-web-tools-archive--wget-tar-html-only archives a URL’s HTML only.
    • Command org-web-tools-archive-view handles both zip and tar archives.
    • The default settings use wget and tar to archive pages (because the archive.today service has not worked reliably with external tools for a long time).

Changes

  • Option org-web-tools-archive-fn defaults to using wget and tar to archive pages to XZ archives with HTML and page resources. (The archive.is service has not worked reliably with other tools for a long time.)
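
A small configuration sketch for the option described above, assuming it accepts one of the archive functions named in these release notes as its value:

```elisp
;; Sketch: archive only the HTML rather than the page plus its resources.
;; Assumption: `org-web-tools-archive-fn' accepts a function such as this one.
(setq org-web-tools-archive-fn #'org-web-tools-archive--wget-tar-html-only)
```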

Fixes

  • org-web-tools--org-link-for-url now returns the URL if the HTML page has no title tag. This avoids an error, e.g. when used in an Org capture template.

Compatibility

  • Emacs 27.1 or later is now required.
  • Updated for Org 9.3’s changes to org-bracket-link-regexp. (Thanks to Aaron Zeng and Akira Komamura.)
  • Activate org-mode in temporary buffer for org-web-tools--html-to-org-with-pandoc. (#56. Thanks to mooseyboots.)
  • Use compat library.

1.1.2

Fixed

  • Only test non-nil items in org-web-tools--get-first-url. This makes it work properly in non-GUI Emacs sessions. (Thanks to Ben Sima for reporting.)

1.1.1

Fixed

  • Require org-attach.

1.1

Additions

  • Command org-web-tools-attach-url-archive.
  • Command org-web-tools-view-archive.
  • Function org-web-tools--read-url.

1.0.1

Changes

  • Remove all property drawers that contain the CUSTOM_ID property from Pandoc output.

1.0

  • First declared stable release.

Development

Contributions and suggestions are welcome.

License

GPLv3

org-web-tools's People

Contributors

alphapapa, askdkc, bcc32, c1-g

org-web-tools's Issues

"Unable to test Pandoc"

I am trying to apply org-web-tools-insert-link-for-url, and I get:

 Unable to test Pandoc!  Please report this bug! (include the output of "pandoc --dump-args --no-wrap")

From the terminal, running pandoc --dump-args --no-wrap I get:

[pandoc warning] --no-wrap is deprecated. Use --wrap=none or --wrap=preserve instead.

I am running Emacs 25.1 on FreeBSD 11.1, with Pandoc 1.19.2.1.

uses obsolete cl package

Nice package. But when I load it, I get the warning:

Package cl is deprecated

This can be fixed by using

(require 'cl-lib) ; the updated package for 'cl' functions

handling same-page relative links, such as footnotes

hi ap, & as the others have said, thx for this great package.

i was impressed when i converted an academic essay with -read-url-as-org and it rendered all the footnote anchors as org-links, but it turns out they are relative links to nowhere, not to the notes at bottom of page/document. ditto the footnote links back up to the body of the text. i guess it is just a pandoc issue? is there any way they could be further processed somehow?

for me they appear as [[#en41][41]]. while the footnote return links appear as [[#fn1][↩], having been rendered from something like <a id="fn41" class="endnote-link" href="#en41" rel="footnote">41</a>.

my example text:
https://monthlyreview.org/2014/07/01/surveillance-capitalism/

i'm not sure if its something that should be supported, but thought i'd mention it in case there is a workaround or if others have the same issue.

thx again.

"X selection unavailable for this frame"

  • Installed org-web-tools.

  • In org-mode, with cursor over a link in Org format: [[URL][Description]]

  • Attempt to M-x any of the commands results in the error message: "X selection unavailable for this frame."

The tool is not able to insert some pages

First, thanks for your time and effort here, second I'm trying to move from a cloud-based read-it-later service and found org-web-tools as a powerful tool to download articles on my machine.

But I faced a problem with some links where the tool was not able to download content and instead give off the following in the buffer:

 The plain HTTP request was sent to HTTPS port

Here are some of those links:

https://www.artofmanliness.com/articles/the-problem-with-minimalism/
https://www.stilldrinking.org/programming-sucks
https://hackernoon.com/function-composition-with-lodash-d30eb50153d1
https://alistapart.com/article/cult-of-the-complex
https://medium.com/@Imaginary_Cloud/javascript-ecosystem-overview-2018-fa9f776ddf74

Add option to disable use of `eww-readability` in `org-web-tools-read-url-as-org`

I am taking advantage of org-web-tools-read-url-as-org to retrieve word definitions and synonyms from web pages as org buffers. This is truly convenient as I get the content in org syntax and can use no matter what site I want without having to rely on a package.

Unfortunately content is missing for some urls. Examples:

Tracked this down to the line (eww-score-readability dom) in org-web-tools--eww-readable. If I remove that line from the function it inserts all content. eww-score-readability is rather cryptic so not sure how to solve this issue.

Cryptic errors?

Hi all:

I can't get a single org-web-tools-FOO command to work. I keep getting things like /443 Name or service not known

or Before first headline at position 1 in buffer FOO.org.

Any ideas as to what I'm doing wrong?

Open local url?

Is it possible to use org-web-tools-read-url-as-org with a local html file? I tried:

(org-web-tools-read-url-as-org "/path/to/index.htm")

But I got a "bad url" error.

fetch link at point

i thought to add this to org-web-tools, in order to say, run org-web-tools-read-url-as-org on a link in an elfeed entry or in eww, or on also an org link:

(defun org-web-tools--get-first-url ()
  "Return (shr) URL or org link URL at point, or URL in clipboard, or first URL in the `kill-ring', or nil if none."
  (or (shr-url-at-point nil) ; elfeed or eww links
      (plist-get (get-text-property (point) 'htmlize-link) :uri) ; org links
      (cl-loop for item in (append (list (gui-get-selection 'CLIPBOARD))
                                   kill-ring)
               when (and item (string-match (rx bol "http" (optional "s") "://") item))
               return item)))

could poss add some others, could poss find more rigorous way of fetching them...

Feature: replace existing URLs in Org document

First of all, thanks for this package. It's making my notes, which often include a lot of (lists of) links to articles and so on, a lot more readable.

I have quite a few notes which already include a number of plain URLs or URLs with a short version of the page title. To make these notes more readable I would like to automatically change their description to the title of the linked page. Instead of yanking each URL and calling org-web-tools-insert-link-for-url, it would be handy to have another command which looks at the URL under the cursor, and replaces it with a link that has the title of the page as its description. If the link already has a description it could simply replace the existing description.

Curl error

After issuing any "M-x org-web-tools-*" commands, I am getting this error:

plz: plz: Curl error: "Curl error", #s(plz-error (6 . "Couldn't resolve host. The given remote host was not resolved.") nil nil)

Compilation errors

This package has a few compilation errors:

In org-web-tools--pandoc-no-wrap-option:
org-web-tools.el:140:7:Warning: reference to free variable
    ‘org-web-tools--pandoc-no-wrap-option’
org-web-tools.el:141:13:Warning: assignment to free variable
    ‘org-web-tools--pandoc-no-wrap-option’

In org-web-tools--check-pandoc-no-wrap-option:
org-web-tools.el:164:22:Warning: reference to free variable
    ‘org-web-tools-pandoc-sleep-time’

In org-web-tools--remove-bad-characters:
org-web-tools.el:181:40:Warning: reference to free variable
    ‘org-web-tools-pandoc-replacements’

In org-web-tools--get-url:
org-web-tools.el:342:53:Warning: reference to free variable
    ‘url-http-end-of-headers’

In end of data:
org-web-tools.el:505:1:Warning: the following functions are not known to be
    defined: libxml-parse-html-region, string-empty-p

Fix/modify relative URLs when inserting entries

After using org-web-tools-insert-web-page-as-entry or org-web-tools-convert-links-to-page-entries, some relative URLs within the web page are still relative in the Org entry, which means they are now relative to the Org file the page was inserted into, rather than to the original web page.
I think it would be useful if these URLs were made absolute, so that they refer to the corresponding online content.
I think this applies to links and to references to files such as images.
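
One possible way to make such links absolute is Emacs's built-in `url-expand-file-name`, which resolves a relative URL against a base URL. The helper name and the example URLs below are hypothetical; this is only a sketch of the idea, not how org-web-tools handles links.

```elisp
(require 'url-expand)

(defun my/absolutize-link (link base-url)
  "Return LINK resolved against BASE-URL.
Hypothetical helper; not part of org-web-tools."
  ;; Fully qualified links are returned unchanged; relative ones are
  ;; expanded against BASE-URL.
  (url-expand-file-name link base-url))

;; A root-relative image path resolved against the page it came from.
(my/absolutize-link "/images/figure.png" "https://example.com/blog/post.html")
```

Applying this to every link in the converted entry would still require walking the Org buffer and rewriting each link target, which is not shown here.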

Is it possible to support images?

Howdy @alphapapa, thanks for another amazing package!

I would love to download images as well. For instance this article works just fine with eww-readable, and a couple of images are critical to understanding the context.

Looking at the org-web-tools code, it appears that images are not fetched at all and therefore cannot be displayed. Pandoc support may be the other potential pitfall.

Am I on the right track, or are there other issues for supporting images that I'm not seeing?

Make org format customizable

Hello,

Thanks very much for the great package: I use it with org-feed and finally getting feeds content is much more reliable!

About the issue: currently downloading a link results in something like

  • [[link][title]] :website:
    timestamp
    ** Article
    contents

For my use case I do not need the ** Article heading.
This is enforced in org-web-tools--url-as-readable-org in this bit here:

...
    (with-temp-buffer
      (org-mode)
      ;; Insert article text
      (insert converted)
      ;; Demote in-article headings
      (org-web-tools--demote-headings-below 2)
      ;; Insert headings at top
      (goto-char (point-min))
      (insert "* " link " :website:" "\n\n"
              timestamp "\n\n"
              "** Article" "\n\n")
      (buffer-string))))

I have the feeling that this can be abstracted into a function (called org-web-tools-format-article here, as a placeholder name) which defaults to your template but can be configured by the user. Something along the lines of:

...
    (org-web-tools-format-article link timestamp converted))))

(defun org-web-tools-format-article (link timestamp contents)
  "Format article CONTENTS as an Org entry with LINK, TIMESTAMP, and an Article heading."
  (with-temp-buffer
    (org-mode)
    ;; Insert article text
    (insert contents)
    ;; Demote in-article headings
    (org-web-tools--demote-headings-below 2)
    ;; Insert headings at top
    (goto-char (point-min))
    (insert "* " link " :website:" "\n\n"
            timestamp "\n\n"
            "** Article" "\n\n")
    (buffer-string)))

Would that make sense? For now I am using a modified version of org-web-tools--url-as-readable-org, but I really would like to not miss any future enhancement of this nice package :)
Thanks very much for the time spent in this!

org-element-at-point and w3m error

I am getting this error now after an Emacs update...

Error running timer ‘org-reveal’: (error "‘org-element-at-point’ cannot be used in non-Org buffer #<buffer w3m> (w3m-mode)") [2 times]

It happens when I am in a w3m buffer, trying to capture it...
with the command:
C-c c
then:
w

I hadn't had such an error in the past...

condition-case: Bad url: /

When I run M-x org-web-tools-insert-web-page-as-entry for a URL, I get the following error:

condition-case: Bad url: /

I don't know why this happens.

My emacs version is 26.3 on macOS 10.15.4.

MELPA recipe force org install

Hello,

I noticed that org-web-tools' MELPA entry lists org 9.x as dependency: http://melpa.org/#/org-web-tools

That forces Emacs to install org 9.x, although it may be already installed via org-plus-contrib, like in my case.

I am afraid having two org installs may lead to some confusion.

I am not sure if this is something to let MELPA team know.

Sorry for pestering you with minor issues.

get rid of pandoc left overs

Hi again

OK, I couldn't wait until I got home to play around with it :)

So when using pandoc I've noticed before that it leaves a lot of junk, like:


[pandoc warning] --no-wrap is deprecated. Use --wrap=none or --wrap=preserve instead.
#+BEGIN_HTML
  <div class="mw-parser-output">
#+END_HTML

#+BEGIN_HTML
  <div class="hatnote navigation-not-searchable" role="note">
#+END_HTML

i get around this by manually using this function

(defun z/clean-html-2-org-pandoc ()
  "Clean up Pandoc's Org output in the current buffer."
  (interactive)
  (replace-string "#+BEGIN_QUOTE" "#+BEGIN_SRC R :session Rorg  :results none" nil (point-min) (point-max))
  (replace-string "#+END_QUOTE" "#+END_SRC" nil (point-min) (point-max))
  ;; Clean extra \\
  (replace-string "\\\\" "" nil (point-min) (point-max))
  ;; Clean \_
  (replace-string "\\\_" "_" nil (point-min) (point-max))
  (z/bullets-to-spaces))

perhaps it could be of use and you can use this or something similar to auto clean the files right after the pandoc process?

best and thx again

Z

SSL Problems with archive.fo

This is most likely not something you can fix.
However I thought it might be good to report it here anyway in case someone else runs into this problem.

archive.today seems to have some SSL problems at the moment.
At least it doesn't really work for me when I tried to use it with Firefox.
I got this error:

Cannot communicate securely with peer: no common encryption algorithm(s). Error code: SSL_ERROR_NO_CYPHER_OVERLAP

In Emacs I get this:

error in process filter: gnutls-negotiate: GnuTLS error: #<process archive.today>, -12
error in process filter: GnuTLS error: #<process archive.today>, -12

Weird pandoc behavior (?)

Hi, thanks a lot for the very nice tool. I started using it for my web clipping. There is, however, some weird behavior from either pandoc or org-web-tools; I cannot tell which at this moment. I have pandoc installed on my Mac mini running macOS High Sierra. When I tried to use org-web-tools-read-url-as-org, the minibuffer said something like "Can not test pandoc, please report a bug". Just by chance, I opened the terminal, ran which pandoc to check that it was installed (it was), and then ran pandoc in the terminal as a test. Somehow, while pandoc was running in the terminal, org-web-tools-read-url-as-org worked. It stopped working when I stopped pandoc in the terminal.
Is there anything I need to change in the environment or the Emacs configuration file for it to work properly without keeping a terminal open with pandoc running? I do have (when (memq window-system '(mac ns x)) (exec-path-from-shell-initialize)) in my .emacs file, which I added after googling about Emacs on macOS not being able to find executables in /usr/local/bin.
Thanks again for the nice tool and also the helm-org-rifle tool, I love both of them a lot.

Unable to get submitid

Hello,

This is my first time using this tool, I ran it on this link and it failed. I don't know how to get a more descriptive backtrace.

Debugger entered--Lisp error: (error "Unable to get submitid")
  error("Unable to get submitid")
  org-web-tools-archive--archive.is-submitid()
  org-web-tools-archive--archive.is-url-id("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-archive--archive.is-archive-url("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-archive--archive.is("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-attach-url-archive--1("http://www.hifi-forum.de/viewthread-185-6402.html#1")
  org-web-tools-archive-attach("http://www.hifi-forum.de/viewthread-185-6402.html#1" nil nil)
  funcall-interactively(org-web-tools-archive-attach "http://www.hifi-forum.de/viewthread-185-6402.html#1" nil nil)
  command-execute(org-web-tools-archive-attach record)
  execute-extended-command(nil "org-web-tools-archive-attach" nil)
  funcall-interactively(execute-extended-command nil "org-web-tools-archive-attach" nil)
  command-execute(execute-extended-command)

Thank you for writing this package

Hey @alphapapa,

Thanks a lot for putting your time into this package. It's crazy useful. org-web-tools-insert-web-page-as-entry is a staple in my workflow.

Feel free to close this. 🙂

Setting up with org capture templates and doct

I'd appreciate help on using the doct package with org-web-tools, or rather the correct way to use org-web-tools commands with org capture since I'm new to elisp. This problem feels closer to a syntax error.

The initial goal is to be able to capture a URL (often from eww, or from a general clipboard copy) as a task, and extend this to construct other templates to capture entire webpages.

The function I have with the template is below. This uses the %x expansion fed into org-web-tools--get-url.

;; function holding the doct template
(defun sr/todo-file-ext-link-act-date ()
'("* %{todo-state} %(org-web-tools--get-url '%x)"
":PROPERTIES:"
":CREATED: %<%Y-%m-%d %a %H:%M>"
":PLANNED: %^t"
":END:"
"%?"))

Org capture template for URL's via doct:
(I've only pasted the relevant capture snippet below rather than all the templates I use)

(setq org-capture-templates
      (doct '(("Todo" :keys "t"
               :file "~/my_org/todo.org"
               :prepend t
	       :children (("External link"  
			   :keys "e"
			   :type entry
			   :headline "@reading"
			   :todo-state "TODO"
			   :template sr/todo-file-ext-link-act-date))))))

The error is "Capture abort: Invalid read syntax: ")". I am using org 9.3.6

handle images in links

@c1-g i tried out your branch, it works well for relative links to files and images.

i also noticed another issue, which perhaps you could address?
(it is in the main branch too, but maybe you have such knowhow?)

html like, this (an image that is also a link):

<a href="/vorratsdatenspeicherung" hreflang="de"><img src="/sites/default/files/styles/medium_crop/public/2017-09/fsa-unschuldsvermutung_%20John-Paul_Bader_cc-by-sa2.jpg?itok=CQntlFzw" width="410" height="208" alt="John-Paul Bader, CC BY SA 2.0" loading="lazy" typeof="foaf:Image" class="image-style-medium-crop" />

renders into a kind of hyperactivated org link

[[https://digitalcourage.de/digitale-selbstverteidigung][[[https://digitalcourage.de/sites/default/files/styles/medium_crop/public/2017-09/IMG_20160107_155735159.jpg?h=63c968e9&itok=rkduUPh5]]]]

i.e. it generates two links, one from href= and one from img src=, with mangled square brackets.

archive and attach all urls in an entry

Is it possible to run a single command to archive all URLs in an Org entry and attach them? Right now, if I run org-web-tools-archive-attach on an entry, it archives and attaches whatever the first URL in the kill-ring is, as far as I can tell. It only works when the point is on a link.

`org-element-at-point` cannot be used in non-Org buffer

i recently updated my packages, now i receive an error when i try to use org-web-tools-read-url-as-org.

the backtrace i see looks like this (w url and html args substituted):

error("`org-element-at-point' cannot be used in non-Org buffer %S (%s)" #<buffer  *temp*> fundamental-mode)
org-element-at-point(nil cached)
org-before-first-heading-p()
org-back-to-heading-or-point-min(t)
org-get-property-block()
org-at-property-p()
org-web-tools--remove-custom_id_properties()
org-web-tools--clean-pandoc-output()
org-web-tools--html-to-org-with-pandoc(my-urls-html)
org-web-tools--url-as-readable-org(my-url)

i wonder if when updating, org was updated, and the behaviour of org-element-at-point has been changed?

Symbol’s function definition is void: caddr

Running emacs 25.3.2. Installed org-web-tools from MELPA. I'm getting errors when trying to run org-web-tools-insert-web-page-as-entry or org-web-tools-insert-link-for-url

Contacting host: boingboing.net:443
org-web-tools--html-title: Symbol’s function definition is void: caddr
Contacting host: boingboing.net:443
org-web-tools--eww-readable: Symbol’s function definition is void: caddr

Error when the HTML contains an empty title

org-web-tools-insert-link-for-url raises the following error when the HTML of the url contains a title element with no content:

Debugger entered--Lisp error: (wrong-type-argument arrayp nil)
 replace-regexp-in-string("\n" " " nil t t)
 s-replace("\n" " " nil)
 org-web-tools--cleanup-title(nil)

I may fix this issue if I have time, but I'll just file it for now. Actually, the command doesn't make sense if the web page contains no title, so I have no idea what to do with this case.

add space around markup text from "abc*bold*def" to "abc *bold* def"

some html like this:

<p>abc<strong>bold</strong>def</p>
<p>abc<em>italic</em>def</p>
<p>abc<code>verbatim</code>def</p>

when converted to org, pandoc produced result as below, which emacs cannot recognize and highlight.

abc*bold*def

abc/italic/def

abc=verbatim=def

so i wish to add similar function to 'org-web-tools--clean-pandoc-output'.

org-web-tools--get-first-url fails in terminal

When running emacs -nw:

ELISP> (org-web-tools--get-first-url)
*** Eval error ***  Wrong type argument: stringp, nil

Relevant code.

The problem is that (gui-get-selection 'CLIPBOARD) returns nil, thus the first item is nil, which gets passed to string-match, which blows up when it sees nil.
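
The failure mode described above can be sketched in isolation: skipping non-string items before matching avoids handing nil to the regexp matcher. The helper name is hypothetical, and this is an illustration of the guard rather than the package's actual fix.

```elisp
(require 'cl-lib)

(defun my/first-http-url (items)
  "Return the first string in ITEMS that starts with http:// or https://.
Non-string items such as nil are skipped.  Hypothetical helper."
  (cl-loop for item in items
           ;; The `stringp' check runs first, so nil never reaches the matcher.
           when (and (stringp item)
                     (string-match-p (rx bos "http" (optional "s") "://") item))
           return item))

;; Simulates a terminal session where the clipboard entry is nil.
(my/first-http-url (list nil "plain text" "https://example.com/page"))
```

In a non-GUI session the list would start with the nil returned by `gui-get-selection`, followed by the contents of the kill-ring.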

SVG links broken

Hi again!

thx for extending the alphapapa universe ;-)

i will test the package later as im traveling yet one quick note

the svg links in the homepage seem broken :)

best

Z

ultra clean output via pandoc

This was already noted: pandoc does not entirely clean up the HTML file of unnecessary information.

for example when I simply download

https://www.theguardian.com/politics/2017/aug/30/may-to-press-japan-on-its-eu-trade-deal-in-hopes-of-a-model-for-uk
and convert it to org via pandoc there are a lot of

#+BEGIN_HTML

#+END_HTML

which is what happens when I use
org-web-tools-insert-web-page-as-entry

fortunately there is a workaround as it was pointed out to me on the pandoc mailing list, namely

pandoc -f html-raw_html-native_divs -t org May_to_press_Japan_on_its_EU_trade_deal_in_hopes_of_a_model_for_UK_Politics_The_Guardian.html -o neu.org

so could these options be included, in org-web-tools?

thanks

Uwe Brauer
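
For anyone who wants to try the Pandoc options mentioned above from Emacs Lisp, here is a minimal sketch. The function name is hypothetical, it assumes a `pandoc` executable is on `exec-path`, and it is not how org-web-tools itself invokes Pandoc.

```elisp
(defun my/html-to-org-without-raw-html (html)
  "Convert the string HTML to Org with Pandoc, dropping raw HTML and native divs.
Hypothetical helper; requires a `pandoc' executable on `exec-path'."
  (with-temp-buffer
    (insert html)
    ;; Send the buffer to Pandoc and replace its contents with the output.
    (let ((status (call-process-region (point-min) (point-max) "pandoc"
                                       t t nil
                                       "-f" "html-raw_html-native_divs"
                                       "-t" "org"
                                       "--wrap=none")))
      ;; A non-zero or non-integer status means Pandoc did not exit cleanly.
      (unless (and (integerp status) (zerop status))
        (error "Pandoc failed: %s" (buffer-string)))
      (buffer-string))))
```

The `--wrap=none` flag is the replacement that Pandoc's own deprecation warning suggests for `--no-wrap`.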

Error when trying to use org-web-tools-archive-attach

Hi, I'm getting the error below when trying this function--I have the cursor in an org file, and running this with eval-expression:

(org-web-tools-archive-attach "https://karpathy.github.io/2021/03/27/forward-pass/")

Error is:

org-web-tools-archive--archive.is-archive-url: Wrong type argument: stringp, nil

Am pretty sure I'm calling the function correctly. I get the following trace:

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  org-web-tools-archive--archive.is-url-id("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-archive--archive.is-archive-url("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-archive--archive.is("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-attach-url-archive--1("https://karpathy.github.io/2021/03/27/forward-pass...")
  org-web-tools-archive-attach("https://karpathy.github.io/2021/03/27/forward-pass...")
  eval-expression((org-web-tools-archive-attach "https://karpathy.github.io/2021/03/27/forward-pass...") nil nil 127)
  funcall-interactively(eval-expression (org-web-tools-archive-attach "https://karpathy.github.io/2021/03/27/forward-pass...") nil nil 127)
  command-execute(eval-expression)

Problem with autoloads between files?

Thank you for fixing the last issue so quickly. I'm getting now a new error when I try to attach a website:
byte-code: Symbol’s function definition is void: org-web-tools--read-url

I'm still using GNU Emacs 25.2.2 on Ubuntu 18.04 :)

I don't know if its relevant but this would be my init config:

(use-package org-web-tools
    :ensure t
    :after org
    :bind (:map org-mode-map
        ("C-c w" . org-web-tools-attach-url-archive)
        ("C-c W" . org-web-tools-view-archive)))

defvar pandoc command

It's good practice to put external commands in a variable (defvar). This is especially useful on systems where pandoc is installed but not necessarily in the PATH (e.g. with Nix or Guix), or for alternate installs of pandoc.

(defvar org-web-tools-pandoc-command "pandoc")

Feature: Use browser cookies file

Edit: This might be out of scope for this package and belong instead in a separate package that syncs the browser's cookies with url-cookies-file on an idle timer. Maybe not, though, and I'm sure others who use org-web-tools have had this need before.

I link to private github repositories and would like to have the title of the page hidden unless logged in rather than link text of "Page not found - Github".

I plan on trying to implement this at some point, but want to leave the idea here in case I don't get to it, someone wants to pick it up immediately, or there are ideas about how to implement it.

After a little research we can do the following for firefox:

  • copy the cookies.db file to a temporary file (in firefox's case they obtain an exclusive lock on an sqlite database)
  • select cookies from db and save result in variable
  • delete temporary cookies db
  • export to format expected by url-retrieve-synchronously ( couldn't find this format)
  • append values to url-cookies-file

Then url-retrieve-synchronously, and by extension org-web-tools-insert-link-for-url, can access anything you've logged into with Firefox!

Archive not yet available.

When I try org-web-tools-archive-attach I get "Archive not yet available", even after 6 retries. When I manually go to archive.today and submit the URL, the archive is created in seconds. What can be the reason? I tried 2 URLs: first an ft.com article, then a random post on reddit.

Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-EgsajH/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (1/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-EO31P2/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (2/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-vPabrh/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (3/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-acU6fX/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (4/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-RbDen2/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Archive not yet available.  Retrying in 15 seconds (5/6 attempts)
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-1zI1YB/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Retrying with other functions...
Contacting host: archive.today:80
Wrote /data/data/com.termux/files/usr/tmp/org-web-tools-archive-ON8K5y/https%3A%2F%2Fwww.reddit.com%2Fr%2FEntrepreneur%2Fcomments%2Fyluj3x%2Flaunching_my_first_business_soon_criticize_it%2F--cfqeE.zip
Retrying with other functions...
"wget output:

/data/data/com.termux/files/usr/bin/wget: unrecognized option '--execute robots=off'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
"
wget output:

/data/data/com.termux/files/usr/bin/wget: unrecognized option '--execute robots=off'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.

Error running timer: (error "No such directory found via CDPATH environment variable")

org-bracket-link-regexp is obsolete since Org 9.3

org-bracket-link-regexp is now an alias for org-link-bracket-re.
With the new regexp, the link description is in the second match group, not the third.
The following expression in org-web-tools--read-org-bracket-link therefore sets desc to nil, which causes an error in org-web-tools-read-url-as-org:

(when (re-search-forward org-bracket-link-regexp (point-at-eol) t)
  (setq target (match-string-no-properties 1)
        desc (match-string-no-properties 3)))

Instead, you could rewrite it as follows:

(when (re-search-forward org-link-bracket-re (point-at-eol) t)
  (setq target (match-string-no-properties 1)
        desc (match-string-no-properties 2)))

You'll probably need a workaround for supporting Org < 9.3, though.
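
One possible workaround is to pick the regexp and the description group at run time. A minimal sketch, assuming that Org < 9.3 only defines org-bracket-link-regexp (description in group 3) and that org-link-bracket-re is unbound there; my/read-org-bracket-link is a hypothetical name, not the package's function:

```elisp
;; Hypothetical helper, not part of org-web-tools.
(require 'org)

(defun my/read-org-bracket-link ()
  "Return (TARGET . DESC) for the next bracket link on the current line, or nil.
Assumes Org >= 9.3 defines `org-link-bracket-re' with the description in
group 2, and that older Org only has `org-bracket-link-regexp' with the
description in group 3."
  (let* ((new-p (boundp 'org-link-bracket-re))
         ;; `symbol-value' avoids referring to either variable directly,
         ;; so neither the obsolete nor the missing one triggers a warning.
         (regexp (symbol-value (if new-p
                                   'org-link-bracket-re
                                 'org-bracket-link-regexp)))
         (desc-group (if new-p 2 3)))
    (save-excursion
      (when (re-search-forward regexp (line-end-position) t)
        (cons (match-string-no-properties 1)
              (match-string-no-properties desc-group))))))
```

With point at the start of a line containing [[https://example.com][Example]], this should return the target and description as a cons cell on either side of the 9.3 change, though I have only reasoned through the older group numbering rather than tested it.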
