
ht's Introduction

ht - headless terminal

ht (short for headless terminal) is a command line program that wraps an arbitrary other binary (e.g. bash, vim, etc.) with a VT100 style terminal interface, i.e. a pseudoterminal client (PTY) plus terminal server, and allows easy programmatic access to the input and output of that terminal (via JSON over STDIN/STDOUT). ht is built in Rust and works on macOS and Linux.

screenshot of raw terminal output vs ht output

Use Cases & Motivation

ht is useful for programmatically interacting with terminals, which is important for programs that depend heavily on the Terminal as UI. It is useful for testing and for getting AI agents to interact with terminals the way humans do.

The original motivating use case was making terminals easy for LLMs to use. I was trying to use LLM agents for coding, and needed something like a headless browser but for terminals.

Terminals are one of the oldest and most prolific UI frameworks in all of computing. And they are stateful so, for example, when you use an editor in your terminal, the terminal has to manage state about the cursor location. Without ht, an agent struggles to manage this state directly; with ht, an agent can just observe the terminal like a human does.

Installing

Download and use the latest binary for your architecture.

Building

Building from source requires the Rust compiler (1.74 or later), and the Cargo package manager. If they are not available via your system package manager then use rustup.

To download the source code, build the binary, and install it in $HOME/.cargo/bin run:

cargo install --git https://github.com/andyk/ht

Then, ensure $HOME/.cargo/bin is in your shell's $PATH.

Alternatively, you can manually download the source code and build the binary with:

git clone https://github.com/andyk/ht
cd ht
cargo build --release

This produces the binary in release mode (--release) at target/release/ht. There are no other build artifacts so you can just copy the binary to a directory in your $PATH.

Usage

Run ht to start an interactive bash shell running in a PTY (pseudo-terminal).

To launch a different program (a different shell, another program) run ht <command> <args...>. For example:

  • ht fish - starts fish shell
  • ht nano - starts nano editor
  • ht nano /etc/fstab - starts nano editor with /etc/fstab opened

Another way to run a specific program, e.g. nano, is to launch ht without a command, i.e. use bash by default, and start nano from bash by sending nano\r ("nano" followed by "return" control character) to the process input. See input command below.

The default size of the virtual terminal window is 120x40 (cols by rows), which can be changed with the --size argument. For example: ht --size 80x24. The window size can also be changed dynamically; see the resize command below.

Run ht -h or ht --help to see all available options.
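When ht is launched from another program, the invocation described above can be assembled as an argument list. The following is a minimal Python sketch, not part of ht itself: ht_argv is a hypothetical helper name, and the sketch assumes that options are placed before the wrapped command.

```python
from typing import List, Optional, Sequence


def ht_argv(command: Optional[Sequence[str]] = None,
            cols: int = 120, rows: int = 40) -> List[str]:
    """Build an argument list for launching ht (hypothetical helper).

    Defaults mirror the documented default window size of 120x40.
    """
    if cols <= 0 or rows <= 0:
        raise ValueError("cols and rows must be positive")
    argv = ["ht", "--size", f"{cols}x{rows}"]
    # With no command given, ht starts bash by default.
    argv.extend(command or [])
    return argv
```

The resulting list can be passed to subprocess.Popen, for example ht_argv(["nano", "/etc/fstab"], cols=80, rows=24).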

Live terminal preview

ht comes with a built-in HTTP server which provides a handy live terminal preview page.

To enable it, start ht with -l / --listen option. This will print the URL of the live preview.

By default it listens on 127.0.0.1 and a system assigned, dynamic port. If you need it to bind to another interface, or a specific port, pass the address to the -l option, e.g. -l 0.0.0.0:9999.

API

ht provides 2 types of API: STDIO and WebSocket.

The STDIO API allows control and introspection of the terminal using STDIN, STDOUT and STDERR.

The WebSocket API provides several endpoints for getting terminal updates in real time. It is not enabled by default and requires starting the built-in HTTP server with the -l / --listen option.

STDIO API

ht uses a simple JSON-based protocol for sending commands to its STDIN. Each command must be sent on a separate line and be a JSON object with its "type" field set to one of the supported commands (below).

Some of the commands trigger events. ht may also internally trigger various events on its own. To subscribe to desired events use --subscribe [<event-name>,<event-name>,...] option when starting ht. This will print the events as they occur to ht's STDOUT, as JSON-encoded objects. For example, to subscribe to view snapshots (triggered by sending takeSnapshot command) use --subscribe snapshot option. See events below for a list of available event types and their payloads.

Diagnostic messages (notices, errors) are printed to STDERR.
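The protocol described above can be driven from a small client. Below is a minimal Python sketch under stated assumptions: the helper names (encode_command, start_ht, send, read_events) are hypothetical, ht is assumed to be on $PATH, and each event is assumed to arrive on its own line of STDOUT.

```python
import json
import subprocess
from typing import IO, Iterable, Iterator


def encode_command(command: dict) -> str:
    """Serialize one command as a single newline-terminated JSON line."""
    if "type" not in command:
        raise ValueError('command needs a "type" field')
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(command) + "\n"


def start_ht(events: Iterable[str] = ("snapshot",)) -> subprocess.Popen:
    """Launch ht subscribed to the given events (assumes ht is on $PATH)."""
    return subprocess.Popen(
        ["ht", "--subscribe", ",".join(events)],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def send(proc: subprocess.Popen, command: dict) -> None:
    """Write one command to ht's STDIN and flush so it is delivered now."""
    proc.stdin.write(encode_command(command))
    proc.stdin.flush()


def read_events(stream: IO[str]) -> Iterator[dict]:
    """Yield decoded events, assuming one JSON object per line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

A typical session would call start_ht(), then send(proc, {"type": "takeSnapshot"}), and then take the next item from read_events(proc.stdout).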

sendKeys

sendKeys command allows sending keys to a process running in the virtual terminal as if the keys were pressed on a keyboard.

{ "type": "sendKeys", "keys": ["nano", "Enter"] }
{ "type": "sendKeys", "keys": ["hello", "Enter", "world"] }
{ "type": "sendKeys", "keys": ["^x", "n"] }

Each element of the keys array can be either a key name or an arbitrary text. If a key is not matched by any supported key name then the text is sent to the process as is, i.e. like when using the input command.

The key and modifier specifications were inspired by tmux.

The following key specifications are currently supported:

  • Enter
  • Space
  • Escape or ^[ or C-[
  • Tab
  • Left - left arrow key
  • Right - right arrow key
  • Up - up arrow key
  • Down - down arrow key
  • Home
  • End
  • PageUp
  • PageDown
  • F1 to F12

Modifier keys are supported by prepending a key with one of the prefixes:

  • ^ - control - e.g. ^c means Ctrl + C
  • C- - control - e.g. C-c means Ctrl + C
  • S- - shift - e.g. S-F6 means Shift + F6
  • A- - alt/option - e.g. A-Home means Alt + Home

Modifiers can be combined (for arrow keys only at the moment), so combinations such as S-A-Up or C-S-Left are possible.

C- control modifier notation can be used with ASCII letters (both lower and upper case are supported) and most special key names. The caret control notation (^) may only be used with ASCII letters, not with special keys.

Shift modifier can be used with special key names only, such as Left, PageUp etc. For text characters, instead of specifying e.g. S-a just use upper case A.

Alt modifier can be used with any Unicode character and most special key names.

This command doesn't trigger any event.
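The key notation above is easy to generate programmatically. The following Python sketch uses hypothetical helper names (ctrl, send_keys) and only the C- notation and key names listed in this section.

```python
import json


def ctrl(letter: str) -> str:
    """Return the C- notation for Ctrl + <letter>, e.g. "C-c"."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return f"C-{letter}"


def send_keys(*keys: str) -> str:
    """Build a sendKeys command line from key names and literal text."""
    return json.dumps({"type": "sendKeys", "keys": list(keys)}) + "\n"
```

For example, send_keys("nano", "Enter") opens nano from the shell, and send_keys(ctrl("x"), "n") builds the same key sequence as the third example above.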

input

input command allows sending arbitrary raw input to a process running in the virtual terminal.

{ "type": "input", "payload": "ls\r" }

In most cases it's easier and recommended to use the sendKeys command instead.

Use the input command if you don't want any special input processing, i.e. no mapping of key names to their respective control sequences.

For example, to send Ctrl-C shortcut you must use "\u0003" (0x03) as the payload:

{ "type": "input", "payload": "\u0003" }

This command doesn't trigger any event.
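Raw payloads such as "\u0003" can be derived rather than memorized: in ASCII, the control character for a letter is the letter's code with the upper bits cleared. A minimal Python sketch, with hypothetical helper names ctrl_char and input_command:

```python
import json


def ctrl_char(letter: str) -> str:
    """Return the ASCII control character for Ctrl + <letter>."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    # "C" is 0x43; keeping only the low five bits gives 0x03 (Ctrl-C).
    return chr(ord(letter.upper()) & 0x1F)


def input_command(payload: str) -> str:
    """Build an input command line; json.dumps escapes control characters."""
    return json.dumps({"type": "input", "payload": payload}) + "\n"
```

Here input_command(ctrl_char("c")) produces the Ctrl-C command shown above, and input_command("ls\r") produces the ls example.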

takeSnapshot

takeSnapshot command allows taking a textual snapshot of the terminal view.

{ "type": "takeSnapshot" }

This command triggers a snapshot event.

resize

resize command allows resizing the virtual terminal window dynamically by specifying new width (cols) and height (rows).

{ "type": "resize", "cols": 80, "rows": 24 }

This command triggers a resize event.

WebSocket API

The WebSocket API currently provides 2 endpoints:

/ws/events

This endpoint allows the client to subscribe to events that happen in ht.

The sub query parameter should be set to a comma-separated list of desired events, e.g. /ws/events?sub=init,snapshot.

Events are delivered as JSON encoded strings, using WebSocket text message type.

See events section below for the description of all available events.
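As a small illustration, the endpoint URL can be assembled like this in Python. The helper name events_url is hypothetical, and the ws:// scheme is an assumption based on the endpoint being served by the built-in HTTP server; host and port are whatever address ht printed or was given via -l.

```python
from typing import Iterable


def events_url(host: str, port: int, events: Iterable[str]) -> str:
    """Build a /ws/events URL with a comma-separated sub parameter."""
    names = list(events)
    if not names:
        raise ValueError("subscribe to at least one event")
    # Assumption: plain ws:// scheme, matching the built-in HTTP server.
    return f"ws://{host}:{port}/ws/events?sub={','.join(names)}"
```

The returned string can be handed to any WebSocket client library; messages arrive as JSON text frames.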

/ws/alis

This endpoint implements the JSON flavor of the asciinema live stream protocol, which allows pointing asciinema player directly at ht to get a real-time terminal preview. This endpoint is used by the live terminal preview page mentioned above.

Events

The events emitted to STDOUT and via /ws/events WebSocket endpoint are identical, i.e. they are JSON-encoded objects with the same fields and payloads.

Every event contains 2 top-level fields:

  • type - type of event,
  • data - associated data, specific to each event type.

The following event types are currently available:

init

Same as the snapshot event (see below) but sent only once: as the first event after ht starts (when sent to STDOUT), and upon establishing a WebSocket connection.

output

Terminal output. Sent when an application (e.g. shell) running under ht prints something to the terminal.

Event data is an object with the following fields:

  • seq - a raw sequence of characters written to a terminal, potentially including control sequences (colors, cursor positioning, etc.)

resize

Terminal resize. Sent when the terminal is resized with the resize command.

Event data is an object with the following fields:

  • cols - current terminal width, number of columns
  • rows - current terminal height, number of rows

snapshot

Terminal window snapshot. Sent when the terminal snapshot is taken with the takeSnapshot command.

Event data is an object with the following fields:

  • cols - current terminal width, number of columns
  • rows - current terminal height, number of rows
  • text - plain text snapshot as multi-line string, where each line represents a terminal row
  • seq - a raw sequence of characters, which when printed to a blank terminal puts it in the same state as ht's virtual terminal
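A consumer can decode these events with a few lines of Python. This is a minimal sketch: parse_event and snapshot_rows are hypothetical helper names, and SAMPLE is a made-up snapshot event used only to show the shape described above, not real ht output.

```python
import json
from typing import List, Tuple


def parse_event(line: str) -> Tuple[str, dict]:
    """Split one JSON-encoded event into its type and data fields."""
    event = json.loads(line)
    return event["type"], event["data"]


def snapshot_rows(data: dict) -> List[str]:
    """Return the text of a snapshot (or init) event as one string per row."""
    return data["text"].splitlines()


# Made-up example event for illustration only.
SAMPLE = '{"type": "snapshot", "data": {"cols": 4, "rows": 2, "text": "ab\\ncd", "seq": ""}}'

kind, data = parse_event(SAMPLE)
if kind == "snapshot":
    for row in snapshot_rows(data):
        print(row)
```

Working from the text field keeps the consumer free of escape-sequence handling; the seq field is only needed when the terminal state has to be replayed elsewhere.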

Testing on command line

ht is aimed at programmatic use given its JSON-based API. However, you can also try it out by launching it in a normal desktop terminal emulator, typing JSON-encoded commands on the keyboard, and observing the output on STDOUT.

rlwrap can be used to wrap STDIN in a readline based editable prompt, which also provides history (up/down arrows).

To use rlwrap with ht:

rlwrap ht [ht-args...]

Python and Typescript libs

Here are some experimental versions of simple Python and TypeScript libraries that wrap ht: htlib.py and htlib.ts.

TODO: either pull those into this repo or fork them into their own htlib repo.

Possible future work

  • update the interface to return the view with additional color and style information (text color, background, bold/italic/etc.), also in a simple JSON format (so no dealing with color-related escape sequences either). The frontend could render this using HTML (e.g. with styled pre/span tags, similar to how asciinema-player does it) or with SVG.
  • support subscribing to view updates, to avoid needing to poll (see issue #9)
  • native integration with asciinema for recording terminal sessions (see issue #8)

Alternatives and related projects

expect is an old related tool that lets you spawn an arbitrary binary and then send input to it and specify what output you expect it to generate next.

Also, note that if there exists an explicit API to achieve your given task (e.g. a library that comes with the tool you're targeting), it will probably be less bug prone/finicky to use the API directly rather than working with your tool through ht.

See also this hackernews discussion where a bunch of other tools were discussed!

Design doc

Here is the original design doc we used to drive the project development.

License

All code is licensed under the Apache License, Version 2.0. See LICENSE file for details.

ht's People

Contributors

ku1ik, andyk, mandx

ht's Issues

Escape sequences - current state of support

Does ht interpret ANSI escape codes? There are many protocols that I think ht could interpret and transform into JSON messages. Or if not, then explicitly forward them further over getView API (or a new getUnsanitizedView / getRawView).

Some protocols that are somewhat important for modern terminal experience:

  • colors,
  • clickable links,
  • mouse integration,
  • clipboard copy/paste,
  • setting title and cwd

Sending signals to foreground process

Does ht support sending a signal to current (foreground) process? How to send SIGINT or SIGKILL?

My Unix knowledge is rusty, but I think ht doesn't need to know what the foreground process is. If you send a signal to pty, AFAIK the kernel will somehow forward it to the foreground process in the process group.

Agent tool use case examples

Hi,

I am curious whether you have a tools_example.py to share, showing how to function-call some sample commands or interact via the new Mistral7b v0.3 (which is the first open weights model to support function calling)?

Cheers

Varlink API

ht's API is an RPC-style newline-delimited JSON sent over stdin/stdout. This reminds me strongly of varlink, an IPC standard used by systemd and podman (the docker alternative). The biggest difference is that varlink uses null-byte-delimited JSON rather than newline-delimited JSON, and usually uses Unix sockets instead of stdin/stdout (so that many clients can connect at once).

What could be gained?

  • Instead of maintaining a list of supported methods in project README, with "this method returns XYZ" / "this method doesn't return", you could maintain a varlink IDL file
  • Streaming support, so that clients don't have to poll getView
  • Error messages by setting the error field in reply JSON
  • varlink CLI and systemd-varlinkctl CLI may make scripting and debugging easier
  • Autogenerated correct clients in many languages
  • Introspection - every varlink server has to reply to GetInfo and GetInterfaceDescription messages. This makes it easy for clients to see if a server version is new enough to support some new methods.

Note

This is an opinionated suggestion, feel free to disregard.

ht ssh remote_host. show prompt and show us a way to feed input

The raw terminal gave a lot of garbled characters. I hope ht can make an interactive ssh session cleaner.

ht ssh username@remote-host
launching command "ssh username@remote-host" in terminal of size 120x40

It waited there forever. In a normal terminal, ssh prompts for a password. How do I get the prompt and feed in my input?
