sharkdp / numbat Goto Github PK

A statically typed programming language for scientific computations with first class support for physical dimensions and units

Home Page: https://numbat.dev

License: Apache License 2.0

Rust 92.19% HTML 2.31% JavaScript 1.77% Vim Script 0.23% Handlebars 2.22% Shell 0.30% CSS 0.83% PHP 0.13%

calculator physics programming-language statically-typed terminal-based units web-app

numbat's Introduction

Try online! • Documentation • Tutorial • Syntax reference • Installation • Development

Numbat is a statically typed programming language for scientific computations
with first class support for physical dimensions and units.

Key Features

Click to learn more.

Physical dimensions as types

Numbat has a static type system where physical dimensions like Length and Time act as types. Definitions of constants and functions can optionally contain type annotations that will be statically enforced. If the types are not specified, they will be inferred (Speed, Money and Frequency in the screenshot).

See this article to learn more about Numbat's type system.

First-class physical units

Numbat is focused on computations with units. Units are therefore treated as first-class citizens. They can be entered in various ways (km/h or kilometer/hour, GiB or gibibytes, ° or degree, m² or m^2). New units can be introduced on the spot (unit pixel). Compatible units can be converted easily using the -> operator (30 km/h -> mph, 1 mrad -> degree, 5 in + 2 ft -> cm, 27 weeks -> days). And unit expressions are simplified using various heuristics (15 km/h * 30 min = 7.5 km).

Comprehensive standard library

Numbat's standard library comes with a large number of physical dimensions and units (SI, US Customary, Imperial, Nautical, Astronomical, Atomic, Nuclear, …). See this reference page for a complete overview. It also contains a lot of mathematical and physical constants as well as a large range of pre-defined functions.

Strict syntax

Numbat's parser never tries to be "smart" on syntactically incorrect input. This means you will either get a (descriptive) error message, or you can trust the result of your calculation.

Excellent error messages

Numbat aims to provide descriptive and helpful error messages.

Interactive terminal

Numbat has been designed for an interactive use-case with small "one off" computations. Opening the numbat interpreter without any arguments starts a REPL with a familiar readline interface, including all the usual features like a command history, Ctrl-R search or tab completion.

Modular and customizable

The whole system of physical dimensions and units is specified Numbat's standard library, which is written in the Numbat language itself. It is therefore easily extensible by providing a init.nbt file. For example, a single line (unit bathtub = 150 L) is usually enough to add a new unit. Users can even choose to write their own prelude module, allowing for arbitrary modifications to the unit system.

Assertions

With its static type system, Numbat already enforces correctness of your calculations on a physical dimension level. But some checks can only be made at runtime. Numbat provides an assert_eq procedure that allows you to check for exact equality using assert_eq(12 ft, 1 in) or approximate equality using assert_eq(c, 300_000 km/s, 1% × c). This can be useful to make sure that intermediate results do not change during a restructuring of your calculation.

Non-features

Numbat is a scientific calculator. It's not a computer algebra system that solves differential equations or computes intergrals. Try WolframAlpha instead.
There is no graphical user interface with buttons like x², 1/x or DEG/RAD. Qalculate! is a fantastic tool that supports both text as well as graphical input.
Numbat supports a huge range of physical units. If you need something even more comprehensive, please consider contributing. Or try GNU units.

Installation

See this page for details on how to install the native command-line version of Numbat.

Development

Run Numbat CLI

cargo run -- <numbat args>

Install the CLI version

cargo install -f --path numbat-cli

Run all tests

cargo test

Working on the `prelude`

If you are working on Numbat's standard library, it is convenient to point the NUMBAT_MODULES_PATH environment variable to the numbat/modules/ folder. This way, you don't have to recompile Numbat to see your changes.

Alternatively, you can create a symlink from ~/.config/numbat/modules to the numbat/modules/ folder in the repository (see this page for the standard paths on other operating systems).

Contact us

To contact us, either open a GitHub issue or discussion, or pop into #numbat on Libera.Chat (link to webchat).

numbat's People

Contributors

Stargazers

Watchers

numbat's Issues

Proper command-line interface

Usage:

insect [--debug] [--expr <EXPR>] [<file>]

Implement exponentiation operator for expressions

also, rewrite the prelude

Modules / Imports

Allow for imports such that we can structure the Prelude into multiple documents (e.g. separate maths from physics to allow users to easily construct custom preludes).

Do we want to go for a simple "include this file" mechanism or implement something cleaner like an actual module system with namespaces, etc.?

Dimension annotations

Further expand the idea of dimension-annotations via : (or related syntax), allowing users to write:

let d: Length = 2 meter
let t: Time = 2 meter # error

# simple functions
kineticEnergy(m: Mass, v: Speed) -> Energy = 0.5 m v^2
kineticEnergy(5 gram, 100 km/h)
kineticEnergy(5 kg, 3 meter/second^2) # error

# generics?
let f<T>(x: T) -> T^2 = 2*x^2

Introduce optimization stage

After type checking, we could introduce an optimization stage in the compiler that could - in theory - perform a lot of optimizations, as most (so far: all) of the code is pure and could be evaluated at compile time.

There are also some simple optimizations that would generate cleaner byte code, like:

Negate(Constant(n)) => Constant(-n)

Examples for nontrivial exponents: adiabatic exponent

https://en.wikipedia.org/wiki/Heat_capacity_ratio

Functions: FFI

Allow calling of foreign functions (Rust code).

extern fn sqrt<T>(x: T) -> T^(1/2)
extern fn sqr<T>(x: T) -> T^2
extern fn sin(x: Angle) -> Scalar

WASM

As early as possible, make sure that we can build everything with WASM. We shouldn't rely on dependcies/technologies that don't work with WASM.

Allow only one fundamental unit per dimension?

[question] Why is there a new `fn` keyword? Is the existing function declaration syntax not sufficient?

Title.

Syntax for defining unit properties

We need a way to declare whether or not units allow prefixes. And we need a way to add aliases.

Ideas:

unit meter[si_prefixes,short="m"]: Length

#[si_prefixes,short="m"]
unit meter: Length

or we always use new declarations for aliases:

unit minute = …
unit minutes = minute
unit min = minute

Add quantities as runtime concept

Advanced conversion operations

users might also like to explicitly convert to:

a base SI representation, like kg m³ / s², e.g. expr -> base
a shortest-possible representation, e.g. J·m e.g. expr -> shortest
to the canonical representation (see also #3), e.g. expr -> canonical

Do we also need/want to allow for conversions to dimensions? is something like expr -> Energy useful? Users could always add type annotations, i.e. use sth like expr: Energy to make sure that something is an energy.

see #3 for implicit conversions

Constants and variables

It would be cool to define constants like pi or e in Numbat code itself (like we currently do in the prelude). That would require us to introduce the concept of constants / immutable variables.

Interesting challenges

List of interesting challenges when implementing a parser, type checker and bytecode interpreter for scientific calculations with physical dimensions and units:

Parser
- Precedence of unary minus compared with exponentiation depends on which side of the ^ we are on: -2^3 == -(2^3), but 2^-3 == 2^(-3).
- Parsing SI prefixes. Problems with non-normal units (US customary, Imperial), only allow metric prefixes for SI units, long-and-short prefixes, …
- Ambiguous edge cases:
  - ft: foot or femto tonne?
  - cd: candela or centi days?
- 2e3 is a number but 2e is 2 × e.
- Implicit multiplication
Type checker
- The problem with exponentiation (see README)
- Minor: asymmetry between add/sub and mul/div: the first two can produce type check errors, the latter can't.
- Type inference as a system of equations
- We need generics, otherwise we can not type something as simple as sqrt(x) = x^(1/2)
- Polymorphic zero? meter + 0 should be fine. But general case can probably not be solved, e.g. meter + (second - second)
- Rational exponents (eg definition of planck length)
Unit representation
- Angles (are angles dimensionless, does deg decay to a scalar?, sin(scalar)?, sin(angle)?)
- Units like cm/m or inch/meter which have the dimension of a scalar.
- Units with additive offsets like °C or °D
- Units with rational exponents
- Formatting of units, e.g. J/(m²·K)
Evaluation
- Prefixes need to be handled separately, otherwise you run into numerical problems when e.g. 1 / 2ns is being converted to GHz
- Distinguish between decimal, rational, integer?
- Simplification of units kg m²/s² back to J. How are unit system boundaries treated?
- Exact conversions between units?
Good error messages
- Incompatible dimensions/units
Features
- User-defined units

SI prefixes

This needs to be checked rather sooner than later. Does our "user-defined units" concepts fare well with SI prefixes that can come directly before the identifier? Or do we only support those prefixes on specified units (we probably should)? Or only on a small set of builtin units (that would be a bit sad)?

kilometer => prefix(kilo) · identifier(meter)
km => prefix(kilo) · identifier(meter)

Rational exponents in dimensions

We need this to allow for things like x^(1/2).

Zero should be polymorphic

3 meter + 0 should work. Similarly: something like hypot(2m, 0) should work.

Add assert_eq statements

let x = 3 + 4;

assert_eq(x, 7);

where assert_eq is of type

fn assert_eq<D>(x: D, y: D) -> !

Implicit multiplication

Interesting resources

https://www.cs.ox.ac.uk/people/samuel.staton/papers/tlca2015.pdf

Functions: recursion

In order for recursion to make any sense, we will probably need some boolean types, comparison operations, and maybe a ternary operator or an if <cond> then <expr1> else <expr2> expression.

Use high-precision numeric type

https://crates.io/crates/rug (LGPL! Possible to compile to WASM?)
Simply use f64 or something like f128? => we could put numeric values on the stack

Introduce comments

Functions: type inference

It would be very cool if we could infer function parameter types and return types. For example, given hypothetical code like

fn g(t: Time) -> Speed = …

fn f(d, t) = (d + 1 meter) / g(t)

we could easily infer from d + 1 meter that d: Length, and from g(t) that t: Time. Furthermore, we could compute the return type to be Length / Speed =Time, leading to an inferred type of

fn f(d: Length, t: Time) -> Time

This also relates to generics (#25), as some (most?) functions would probably have inferred generic types. For example, in the absence of any annotations,

kineticEnergy(m, v) = 1/2 * m * v^2

would have an inferred type of

kineticEnergy<T0, T1>(m: T0, v: T1) -> T0 * T1^2 = 1/2 * m * v^2

Possible constraints

Addition, Subtraction, Conversion: these binary operators act as constraints on their operands. For example, an addition like
```
expr1 + expr2
```
imposes a simple [expr1] == [expr2] constraint. Similarly, expr1 -> expr2 also forces expr1 to have the same dimension as expr2.
Function application: A function call like
```
f(expr1, expr1, …, exprN)
```
imposes N constraints, one for each argument: [exprK] == [parameter k of f]. A simple example would be something like
```
fn normalize_len(l: Length) -> Length = …
fn f(d, alpha) = normalize_length(d) * sin(alpha)
```
where we could use the function calls to easily get the types of d and alpha.
Exponentiation: Since the exponent in an expression like
```
expr1 ^ expr2
```
always needs to be a scalar, we can infer [expr2] == 1. In general, we can not assume that expr1 is also a scalar (meter^2 is perfectly fine, for example). So there is also no constraint for expr1.

Formalization

Given a (partially annotated) function definition like one of these (Cx is a concrete type):

f(x0, x1, x2, …) = …
f(x0, x1: C1, x2, …) = …
f(x0, x1, x2, …) -> C_ret = …
f(x0, x1: C1, x2, …) -> C_ret = …
…

We start by filling in blanks with free type variables T0, T1, … and T_ret:

f(x0: T0, x1: T1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: T1, x2: T2, …) -> C_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> C_ret = …
…

Next, we walk the AST of the right hand side expression and record all constraints that we discover (see above). For example, given a declaration like f(x0, x1, x2) = sqrt((x0 · x2² + second) · second / x1²) -> meter, we would identify:

T0 · T2^2 == Time (from x0 · x2² + second)
((T0 · T2^2) · Time / T1²)^(1/2) == T0^(1/2) · T1 · T2 · Time^(1/2) == Length (from the -> conversion operator)

These constraints would then have to be converted to a system of linear equations on the exponents.

  1 * exponents0 + 0 * exponents1 + 2 * exponents2 == (1, 0)
1/2 * exponents0 + 1 * exponents1 + 1 * exponents2 == (1, -1/2)

where exponents_i and the right hand sides would be vectors with one rational number entry for each of the D physical dimensions. In total, this is a set of N_constraints · D equations, where the left-hand side will be the same regardless of the physical dimension.

Examples

Unique solution

Example 1

g(t: Length) -> Length = …
f(x, y, z) = g(sqrt(x)/y) * (y + 2 * second) * 2^z

From 2^z, we infer [z]: 1.
From y + 2 * second, we infer [y]: Time
From g(sqrt(x)/y), we infer sqrt(x)/y: Length, i.e. sqrt(x): Length · Time or x: Length² · Time².
Finally, there return type can be computed as Length · Time

f(x: Length² · Time², y: Time, z: 1) -> Length · Time = g(sqrt(x)/y) * (y + 2 * second) * 2^z

Example 2

f(x) = x + x^2

should be inferred as

f(x: Scalar) -> Scalar

Example 3

(from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)

f(x,y,z) = x*x + y*y*y*y*y + z*z*z*z*z*z

should ideally be inferred as

f<T>(x: T^15, y: T^6, z: T^5) -> T^30

Underdetermined

For a function definition like

f(x, y) = x + y^3

we could infer one of those (equivalent) types:

f(x: T, y: T^(1/3)) -> T = x + y^3
f(x: T^3, y: T) -> T^3 = x + y^3

Overdetermined

A function like

f(x) = (x + meter)/(y + second)

should lead to a type check error.

VM: pop function arguments off the stack

Short unit names

Functions: generics / parametric polymorphism

Add support for generic functions, to allow things like

fn sqr(x) = x^2

Syntax

We can go the Rust route and do something like

fn sqr<T>(x: T) -> T^2 = x^2

Or we could add a special syntax for type variables, e.g. $T:

fn sqr(x: $T) -> $T^2 = x^2

Or go the Haskell way and use lowercase expressions as variables:

fn sqr(x: d) -> d^2 = x^2

that could work well with inferred types (#29):

f(x: d0, y: d1, z: d2) -> dR = …

Experiment: generate LLVM IR

Instead of bytecode, simply(?) generate LLVM IR?
https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl03.html

Demonstrate performance improvement over PureScript version with a benchmark

Just to get some hard data on the performance improvement (and also because I'm personally curious, hehe).

Write integration test for prelude.ins

Automatically load prelude.ins

Use a different parsing technique

A Pratt parser would probably be quite useful for us because of the large number of (binary) operators.

https://tratt.net/laurie/blog/2020/which_parsing_approach.html
https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
https://blog.reverberate.org/2013/09/ll-and-lr-in-context-why-parsing-tools.html

Shorthand notation for new unit including new dimension?

maybe we should add "syntactic sugar" and allow 'unit px' to mean: 'dimension px; unit px: px'

Canonical unit for dimension

With our new dimension concept, we can define a canonical unit for a certain dimension. For example: meter: Length, joule: Energy. Or if someone uses Imperial, ft: Length. Then, if the user doesn't ask for an explicit conversion, we can figure out the target dimension and use the canonical unit.

This will allow us to solve sharkdp/purescript-quantities#27

Function declarations
Local variables?
Arity errors

Things to think about:

Mathematica-style function calls, super convenient for repls: 4.0 // sqrt.
should foo(x) always be parsed as a function call? or could it be an implicit multiplication?

import currencies

30 € -> THB

Internally, that module would fetch currency data and define all those currencies.

Allow for non-standard units like inch
Implement conversion operator

Run all examples in unit tests

Run everything in examples and make sure it passes
Run everything in examples/errors and make sure it fails (with the right error)