Giter Club home page Giter Club logo

org-sql's Introduction

Org-SQL

This is a SQL backend for Emacs Org-Mode. It scans through text files formatted in org-mode, parses them, and adds key information such as todo keywords, timestamps, and links to a relational database. For now only SQLite is supported.

Motivation and Goals

Despite the fact that Emacs is the (second?) greatest text editor of all time, it is not a good data analysis platform, and neither is Lisp a good data analysis language. This is the strong-suite of other languages such as Python and R which have powerful data manipulation packages (dplyr and pandas) as well as specialized presentation platforms (shiny and dash). The common thread between these data analysis tools is a tabular data storage system, for which SQL is the most common language.

Therefore, the primary goal of org-sql is to provide a link between the text-based universe of Emacs / Org-mode and the table-based universe of data analysis platforms.

A common use case for org-mode is a daily planner. Within this use case, some questions that can easily be answered with a SQL-backed approach:

  • How much time do I spend doing pre-planned work? (track how much time is spent clocking)
  • How well do I estimate how long tasks take? (compare effort and clocked time)
  • Which types of tasks do I concentrate on the most? (tag entries and sort based on effort and clocking)
  • How indecisive am I? (track how many times schedule or deadline timestamps are changed)
  • How much do I overplan? (track number of canceled tasks)
  • How much do I delegate (track properties indicating which people are working on tasks)
  • How many outstanding tasks/projects do I have? (count keywords and tags on headlines)

There are other uses for an Org-mode SQL database. If one has many org files scattered throughout their filesystem, a database is an easy way to aggregate key information such as links or timestamps. Or if one primary uses org-mode for taking notes, it could be a way to aggregate and analyze meeting minutes.

Of course, these could all be done directly in Org-mode with Lisp code (indeed there are already built-in functions for reporting aggregated effort and clock-time). But why do that when one could analyze all org files by making a descriptive dashboard with a relatively few lines of R or Python code?

Installation

Download the package from MELPA

M-x package-install RET org-sql RET

Alternatively, clone this repository into your config directory

git clone [email protected]:ndwarshuis/org-sql.git ~/config/path/org-sql/

Once obtained, add the package to load-path and require it

(add-to-list 'load-path "~/config/path/org-sql/")
(require 'org-sql)

One can also use use-package to automate this entire process

(use-package org-sql
  :ensure t
  ;; add config options here...
  :config)

Dependencies

Emacs-Lisp packages

  • Org-mode (tested with 9.x)
  • dash (tested with 2.x)

Installed programs

  • sqlite (tested with 3.x)

Configuration

General behavior

  • org-sql-sqlite-path: the path to the sqlite database to be created
  • org-sql-files: list of org files to insert into database
  • org-sql-pragma: pragma to use for new connections (useful for performance tuning)
  • org-sql-buffer: the name of the buffer for the SQLite connection
  • org-sql-debug: turn on SQL transaction debug output in the message buffer

Database storage

These options control what data gets stored in the database, and are useful to minimize the size of the database as well as the time it takes to update:

  • org-sql-ignored-properties: list of properties to ignore
  • org-sql-ignored-tags: list of tags to ignore
  • org-sql-ignored-link-types: list link types to ignore (eg mu4e, file)
  • org-sql-included-healine-planning-types: planning types (eg :closed, :scheduled) to include
  • org-sql-included-contents-timestamp-types: type of timestamps (eg active, inactive) to include
  • org-sql-included-logbook-types: types of logbook entries to include (eg note, reschedule, etc)
  • org-sql-use-tag-inheritance: add inherited tags to the database
  • org-sql-store-clocks: whether to include clocks (nil implies no clock notes are desired either)
  • org-sql-store-clock-notes: whether to include clock notes

Logbooks

Much of the extracted data from org-sql pertains to logbook entries, and there are a number of settings that effect how this data is generated in org files and how it may be parsed reliably.

Firstly, one needs to set the relevant org-mode variables in order to capture logging information. Please refer to the documentation in org-mode itself for their meaning:

  • org-log-done
  • org-log-reschedule
  • org-log-redeadline
  • org-log-note-clock-out
  • org-log-refile
  • org-log-repeat
  • org-todo-keywords (in this one can set which todo keywords changes are logged)

Obtaining the above information for the database assumes that org-log-note-headings is left at its default value. This limitation may be surpassed in the future.

Additionally, for best results it is recommended that all logbook entries be contained in their own drawer. This means that org-log-into-drawer should be set to LOGBOOK and org-clock-into-drawer should be set to t (which means clocks go into a drawer with hardcoded name LOGBOOK). Without these settings, org-sql needs to guess where the logbook entries are based on location and pattern matching, which is not totally reliable.

Usage

Initializing

Run org-sql-user-reset. This will create a new database and initialize it with the default schema. It will also delete an existing database before creating the new one if it exists in org-sql-sqlite-path.

Updating

Run org-sql-user-update. This will synchronize the database with all files as indicated in org-sql-files by first checking if the file is in the database and inserting it if not. If the file is already present, it will check the md5 to assess if updates are needed. This function will insert the entire content of any org file that is either new or changed.

Note that the database will take several seconds to minutes if inserting many files depending on the speed of your device (particularly IO) and the size/number of files. This operation will also block Emacs until complete.

Clearing all data

Run org-sql-user-clear-all. This will clear all data but leave the schema.

Database Layout

The database is arranged by files at the top level and by org-mode features moving down to child tables. Primary keys are foreign keys are marked with P and F in parens respecively. All dates are converted into unix time integers before entering into the database.

hierarchy

The databases are arranged as follows according to their foreign key contraints:

  • files
    • headlines
      • tags
      • properties
      • clocking
      • logbook
        • state_changes
        • planning_changes

files

Stores one row per org file

columntypedescription
file_path (P)textpath to an org file
md5textmd5 sum of the org file
sizeintegersize of the file in bytes
time_modifiedintegerunused
time_createdintegerunused
time_accessedintegerunused

headlines

Stores one row for each headline in a given org file as well as any atomic metadata attached to that headline

columntypedescription
file_path (PF)textpath to file containing the headline
headline_offset (F)integeroffset of the headline’s first character in the org file
tree_path (P)textoutline tree path of the headline
headline_texttextraw text of the headline
keywordtextthe TODO state keyword
effortintegerthe value of the Effort property in minutes
prioritycharcharacter value of the priority
archivedbooltrue if the headline has an archive tag
commentedbooltrue if the headline has a comment keyword
contentstextunused

links

Stores one row for each link in a given org file and headline

columntypedescription
file_path (PF)textpath to file containing the link
headline_offset (F)integeroffset of the headline’s first character in the org file
link_offset (P)integeroffset of the link’s first character in the org file
link_pathtexttarget of the link (url or similar)
link_texttextdisplay text of the link in the org buffer
link_typetexttype of the link (eg http, mu4e, file, etc)

properties

Stores one row for each property identified in an org file and a given headline. This is also where one could identify if a headline is archived as it will have properties such as ARCHIVE_TIME

columntypedescription
file_path (PF)textpath to file containing the property
headline_offset (F)integeroffset of the headline’s first character in the org file
property_offset (P)integeroffset of the property’s first character in the org file
key_texttextthe property key
val_texttextthe property value
inheritedbooleanunused

tags

Stores tags similarly to the properties table

columntypedescription
file_path (PF)textpath to file containing the tag
headline_offset (PF)integeroffset of the headline’s first character in the org file
tag (P)textthe tag value
inherited (P)boolean1 if inherited, 0 if not

clocking

Stores one row for each clock entry identified in an org file and its given headline

columntypedescription
file_path (PF)textpath to file containing the clock
headline_offset (F)integeroffset of the headline’s first character in the org file
clock_offset (P)integeroffset of the clock’s first character in the org file
time_startintegertimestamp for the start of the clock
time_endintegertimestamp for the end of the clock
clock_notetextthe note text beneath the clock if available

logbook

Stores one row for each entry in the logbook underneath a headline (excluding clocks). Some entries may have additional information associated with them for planning and state changes as given in the child tables below.

columntypedescription
file_path (PF)textpath to file containing the entry
headline_offset (F)integeroffset of the headline’s first character in the org file
entry_offset (P)integeroffset of the entry’s first character in the org file
entry_typetexttype of the logbook entry (see org-log-note=headines)
time_loggedintegeretimestamp for when the entry was taken
headertextthe first line of the note, usually standardized
notenotethe text underneath the entry header

note: the header should match org-log-note-headings unless it is a clock note

state_changes

Stores one row per logbook entry with state change information (as triggered by any keywords configured to log in org-todo-headings)

columntypedescription
file_path (PF)textpath to file containing the entry
entry_offset (PF)integeroffset of the clock’s first character in the org file
state_oldtextformer todo state keyword
state_newtextupdated todo state keyword

planning_changes

Stores one row per logbook entry with planning changes as triggered by setting org-log-reschedule and org-log-redeadline.

columntypedescription
file_path (PF)textpath to file containing the entry
entry_offset (PF)integeroffset of the entry’s first character in the org file
timestamp_offset (F)integeroffset of the timestamp for the former planning entry

timestamp

In the future, this will have one row for each timestamp under a headline. For now it just refers to timestamps in the planning block (eg CLOSED, SCHEDULED, or DEADLINE) and timestamps in logbook headings for planning changes. It does not include the timestamp for the time a log entry was taken (this is stored in the logging table above) nor does it store timestamps for clocks (stored in the clocking table). Eventually this table will include timestamps in logbook notes as well as headline contents.

columntypedescription
file_path (PF)textpath to file containing the entry
headline_offset (F)integeroffset of the entry’s first character in the org file
timestamp_offet (P)integeroffset of the timestamp for the former planning entry
raw_valuetextthe string of the timestamp as it appears in org buffers
typetexteither inactive or active
planning_typetexteither closed, scheduled, deadline, or null
warning_typetextthe warning type (all or first)
warning_valueintegervalue of warning shift
warning_unittextunit for warning (hour, day, week, month, year)
repeat_typetextthe repeater type (catch-up, restart, cumulate)
repeat_valueintegervalue of repeater shift
repeat_unittextunit for repeater (hour, day, week, month, year)
timeintegerthe time of the timestamp in unix time
time_endintegerlike time but for the end if this is a range
resolutiontextminute if timestamp specifies hours/minutes, else day
resolution_endtextlike resolution but for the end if this is a range

NOTES:

  • Inactive-range and active-range will not appear in the type column. Use time_end instead to determine if the timestamp is a ranged timestamp
  • There is no distinction in this table alone between planning timestamps that are part of the planning section (eg CLOSED: [whatever]) and those from logbook entries that pertain to planning changes (eg Removed deadline, was [whatever]). This distinction requires joining the planning_changes table, which will only reference logbook planning change timestamps.

Contributions

Pull requests welcome, especially those for other SQL implementations.

Acknowledgments

The idea for this is based on John Kitchin’s implementation, which uses emacsql as the SQL backend.

org-sql's People

Contributors

ndwarshuis avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.