Giter Club home page Giter Club logo

org-web-tools's Introduction

org-web-tools

https://melpa.org/packages/org-web-tools-badge.svg https://stable.melpa.org/packages/org-web-tools-badge.svg

This file contains library functions and commands useful for retrieving web page content and processing it into Org-mode content.

For example, you can copy a URL to the clipboard or kill-ring, then run a command that downloads the page, isolates the “readable” content with eww-readable, converts it to Org-mode content with Pandoc, and displays it in an Org-mode buffer. Another command does all of that but inserts it as an Org entry instead of displaying it in a new buffer.

Installation

Requirements

  • Emacs 25.1 or later.
  • Commands that process HTML into Org require Pandoc. Note: The output of current Pandoc versions differs substantially from versions that may still be present in stable Linux distros. If you encounter any issues, please install a more recent version of Pandoc.

MELPA

If you installed from MELPA, just run one of the commands below. If you want to use any of the functions in your own code, you should (require 'org-web-tools).

Manual

Install dash.el, esxml, request, and s.el. Then require this package in your init file:

(require 'org-web-tools)

Usage

Commands

  • org-web-tools-insert-link-for-url: Insert an Org-mode link to the URL in the clipboard or kill-ring. Downloads the page to get the HTML title.
  • org-web-tools-insert-web-page-as-entry: Insert the web page for the URL in the clipboard or kill-ring as an Org-mode entry, as a sibling heading of the current entry.
  • org-web-tools-read-url-as-org: Display the web page for the URL in the clipboard or kill-ring as Org-mode text in a new buffer, processed with eww-readable.
  • org-web-tools-convert-links-to-page-entries: Convert all URLs and Org links in current Org entry to Org headings, each containing the web page content of that URL, converted to Org-mode text and processed with eww-readable. This should be called on an entry that solely contains a list of URLs or links.
  • org-web-tools-archive-attach: Download archive of page at URL and attach with org-attach. If CHOOSE-FN is non-nil (interactively, with universal prefix), prompt for the archive function to use. If VIEW is non-nil (interactively, with two universal prefixes), view the archive immediately after attaching. (See also org-board).
  • org-web-tools-archive-view: Open Zip file archive of web page. Extracts to a temp directory and opens with browse-url-default-browser. Note: the extracted files are left on-disk in the temp directory.

Functions

These are used in the commands above and may be useful in building your own commands.

  • org-web-tools--dom-to-html: Return parsed HTML DOM as an HTML string. Note: This is an approximation and is not necessarily correct HTML (e.g. IMG tags may be rendered with a closing “</img>” tag).
  • org-web-tools--eww-readable: Return “readable” part of HTML with title.
  • org-web-tools--get-url: Return content for URL as string.
  • org-web-tools--html-title: Return title of HTML page.
  • org-web-tools--html-to-org-with-pandoc: Return string of HTML converted to Org with Pandoc. When SELECTOR is non-nil, the HTML is filtered using esxml-query SELECTOR and re-rendered to HTML with org-web-tools--dom-to-html, which see.
  • org-web-tools--url-as-readable-org: Return string containing Org entry of URL’s web page content. Content is processed with eww-readable and Pandoc. Entry will be a top-level heading, with article contents below a second-level “Article” heading, and a timestamp in the first-level entry for writing comments.
  • org-web-tools--demote-headings-below: Demote all headings in buffer so the highest level is below LEVEL.
  • org-web-tools--get-first-url: Return URL in clipboard, or first URL in the kill-ring, or nil if none.
  • org-web-tools--read-url: Return a URL by searching at point, then in clipboard, then in kill-ring, and finally prompting the user.
  • org-web-tools--read-org-bracket-link: Return (TARGET . DESCRIPTION) for Org bracket LINK or next link on current line.
  • org-web-tools--remove-dos-crlf: Remove all DOS CRLF (^M) in buffer.

Changelog

1.2-pre

Compatibility

Fixed

  • org-web-tools--org-link-for-url now returns the URL if the HTML page has no title tag. This avoids an error, e.g. when used in an Org capture template.

Improvements

  • Archiving tools:
    • Can use multiple functions to attempt archiving.
    • Associated options control retry attempts, delays, and fallbacks to other functions.
    • Functions to archive Web pages with wget and tar:
      • Function org-web-tools-archive--wget-tar archives a URL’s Web page, including page resources.
      • Function org-web-tools-archive--wget-tar-html-only archives a URL’s HTML only.
    • Command org-web-tools-archive-view handles both zip and tar archives.
    • The default settings attempt to archive with archive.is, and if that fails after retrying for 75 seconds, falls back to using wget and tar.

1.1.2

Fixed

  • Only test non-nil items in org-web-tools--get-first-url. This makes it work properly in non-GUI Emacs sessions. (Thanks to Ben Sima for reporting.)

1.1.1

Fixed

  • Require org-attach.

1.1

Additions

  • Command org-web-tools-attach-url-archive.
  • Command org-web-tools-view-archive.
  • Function org-web-tools--read-url.

1.0.1

Changes

  • Remove all property drawers that contain the CUSTOM_ID property from Pandoc output.

1.0

  • First declared stable release.

Development

Contributions and suggestions are welcome.

License

GPLv3

org-web-tools's People

Contributors

alphapapa avatar bcc32 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.