
C3 Language

C3 is a programming language that builds on the syntax and semantics of the C language, with the goal of evolving it while still retaining familiarity for C programmers.

It's an evolution, not a revolution: the C-like language for programmers who like C.

Precompiled binaries are available for Windows, Debian and MacOS (see the Installing section below).

The manual for C3 can be found at www.c3-lang.org.

vkQuake

Thanks to full ABI compatibility with C, it's possible to mix C and C3 in the same project with no effort. As a demonstration, vkQuake was compiled with a small portion of the code converted to C3 and compiled with the c3c compiler. (The fork can be found at https://github.com/c3lang/vkQuake)

Design Principles

  • Procedural "get things done"-type of language.
  • Try to stay close to C - only change what's really necessary.
  • C ABI compatibility and excellent C integration.
  • Learning C3 should be easy for a C programmer.
  • Data is inert.
  • Avoid "big ideas" & the "more is better" fallacy.
  • Introduce some higher level conveniences where the value is great.

C3 owes its inspiration to the C2 language: to iterate on top of C without trying to be a whole new language.

Example code

The following code shows generic modules (more examples can be found at https://c3-lang.org/references/docs/examples/).

module stack (<Type>);
// Above: the parameterized type is applied to the entire module.

struct Stack
{
    usz capacity;
    usz size;
    Type* elems;
}

// Type methods offer dot syntax calls,
// so this function can either be called 
// Stack.push(&my_stack, ...) or
// my_stack.push(...)
fn void Stack.push(Stack* this, Type element)
{
    if (this.capacity == this.size)
    {
        this.capacity *= 2;
        if (this.capacity < 16) this.capacity = 16;
        this.elems = realloc(this.elems, Type.sizeof * this.capacity);
    }
    this.elems[this.size++] = element;
}

fn Type Stack.pop(Stack* this)
{
    assert(this.size > 0);
    return this.elems[--this.size];
}

fn bool Stack.empty(Stack* this)
{
    return !this.size;
}

Testing it out:

import stack;

// Define our new types, the first will implicitly create 
// a complete copy of the entire Stack module with "Type" set to "int"
def IntStack = Stack(<int>);
// The second creates another copy with "Type" set to "double"
def DoubleStack = Stack(<double>);

// If we had added "def IntStack2 = Stack(<int>)"
// no additional copy would have been made (since we already
// have a parameterization of Stack(<int>)), so it would
// be the same as declaring IntStack2 an alias of IntStack

// Importing an external C function is straightforward
// here is an example of importing libc's printf:
extern fn int printf(char* format, ...);

fn void main()
{
    IntStack stack;
    // Note that C3 uses zero initialization by default
    // so the above is equivalent to IntStack stack = {};
    
    stack.push(1);
    // The above can also be written IntStack.push(&stack, 1); 
    
    stack.push(2);
    
    // Prints pop: 2
    printf("pop: %d\n", stack.pop());
    // Prints pop: 1
    printf("pop: %d\n", stack.pop());
    
    DoubleStack dstack;
    dstack.push(2.3);
    dstack.push(3.141);
    dstack.push(1.1235);
    // Prints pop: 1.123500
    printf("pop: %f\n", dstack.pop());
}

In what ways does C3 differ from C?

  • No mandatory header files
  • New semantic macro system
  • Module based name spacing
  • Subarrays (slices)
  • Compile time reflection
  • Enhanced compile time execution
  • Generics based on generic modules
  • "Result"-based zero overhead error handling
  • Defer
  • Value methods
  • Associated enum data
  • No preprocessor
  • Less undefined behaviour and added runtime checks in "safe" mode
  • Limited operator overloading to enable userland dynamic arrays
  • Optional pre and post conditions

Current status

The current stable version of the compiler is version 0.5.

The upcoming 0.6 release will focus on expanding the standard library. Follow progress in the GitHub issue tracker.

If you have suggestions on how to improve the language, either file an issue or discuss C3 on its dedicated Discord: https://discord.gg/qN76R87.

The compiler is currently verified to compile on Linux, Windows and MacOS.

Support matrix

Platform                 | Native C3 compiler? | Target supported        | Stack trace | Threads  | Sockets  | Inline asm
Win32 x64                | Yes                 | Yes + cross compilation | Yes         | Yes      | Yes      | Yes*
Win32 Aarch64            | Untested            | Untested                | Untested    | Untested | Untested | Yes*
MacOS x64                | Yes                 | Yes + cross compilation | Yes         | Yes      | Yes      | Yes*
MacOS Aarch64            | Yes                 | Yes + cross compilation | Yes         | Yes      | Yes      | Yes*
iOS Aarch64              | No                  | Untested                | Untested    | Yes      | Yes      | Yes*
Linux x86                | Yes                 | Yes                     | Yes         | Yes      | Yes      | Yes*
Linux x64                | Yes                 | Yes                     | Yes         | Yes      | Yes      | Yes*
Linux Aarch64            | Yes                 | Yes                     | Yes         | Yes      | Yes      | Yes*
Linux Riscv32            | Yes                 | Yes                     | Yes         | Yes      | Yes      | Untested
Linux Riscv64            | Yes                 | Yes                     | Yes         | Yes      | Yes      | Untested
ELF freestanding x86     | No                  | Untested                | No          | No       | No       | Yes*
ELF freestanding x64     | No                  | Untested                | No          | No       | No       | Yes*
ELF freestanding Aarch64 | No                  | Untested                | No          | No       | No       | Yes*
ELF freestanding Riscv64 | No                  | Untested                | No          | No       | No       | Untested
ELF freestanding Riscv32 | No                  | Untested                | No          | No       | No       | Untested
FreeBSD x86              | Untested            | Untested                | No          | Yes      | Untested | Yes*
FreeBSD x64              | Untested            | Untested                | No          | Yes      | Untested | Yes*
NetBSD x86               | Untested            | Untested                | No          | Yes      | Untested | Yes*
NetBSD x64               | Untested            | Untested                | No          | Yes      | Untested | Yes*
OpenBSD x86              | Untested            | Untested                | No          | Yes      | Untested | Yes*
OpenBSD x64              | Untested            | Untested                | No          | Yes      | Untested | Yes*
MCU x86                  | No                  | Untested                | No          | No       | No       | Yes*
Wasm32                   | No                  | Yes                     | No          | No       | No       | No
Wasm64                   | No                  | Untested                | No          | No       | No       | No

* Inline asm is still a work in progress

More platforms will be supported in the future.

What can you help with?

  • If you wish to contribute with ideas, please file issues or discuss on Discord.
  • Interested in contributing to the stdlib? Please get in touch on Discord.
  • Compilation instructions for other Linux and Unix variants are appreciated.
  • Would you like to contribute bindings to some library? It would be nice to have support for SDL, Raylib and more.
  • Build something with C3 and show it off and give feedback. The language is still open for significant tweaks.
  • Start work on the C -> C3 converter which takes C code and does a "best effort" to translate it to C3. The first version only needs to work on C headers.
  • Do you have some specific area you have deep knowledge of and could help make C3 even better at doing? File or comment on issues.

Installing

Installing on Windows with precompiled binaries

  1. Download the zip file: https://github.com/c3lang/c3c/releases/download/latest/c3-windows.zip (a debug version is also available on the releases page)
  2. Unzip exe and standard lib.
  3. If you don't have Visual Studio 17 installed you can either do so, or run the msvc_build_libraries.py Python script which will download the necessary files to compile on Windows.
  4. Run c3c.exe.

Installing on Debian with precompiled binaries

  1. Download the tar file: https://github.com/c3lang/c3c/releases/download/latest/c3-linux.tar.gz (a debug version is also available on the releases page)
  2. Unpack executable and standard lib.
  3. Run ./c3c.

Installing on Mac with precompiled binaries

  1. Make sure you have XCode with command line tools installed.
  2. Download the zip file: https://github.com/c3lang/c3c/releases/download/latest/c3-macos.zip (a debug version is also available on the releases page)
  3. Unzip executable and standard lib.
  4. Run ./c3c.

Installing on Arch Linux

There is an AUR package for the c3c compiler : c3c-git.

Due to some issues with the LLVM packaged for Arch Linux, the AUR package will download and use LLVM 16 for Ubuntu-23.04 to compile the c3c compiler.

You can use your AUR package manager:

paru -S c3c-git
# or yay -S c3c-git
# or aura -A c3c-git

Or clone it manually:

git clone https://aur.archlinux.org/c3c-git.git
cd c3c-git
makepkg -si

Building via Docker

You can build c3c using either an Ubuntu 18.04 or 20.04 container:

./build-with-docker.sh 18

Replace 18 with 20 to build through Ubuntu 20.04.

For a release build specify:

./build-with-docker.sh 20 Release

A c3c executable will be found under bin/.

Installing on OS X using Homebrew

  1. Install CMake: brew install cmake
  2. Install LLVM 15: brew install llvm
  3. Clone the C3C github repository: git clone https://github.com/c3lang/c3c.git
  4. Enter the C3C directory: cd c3c
  5. Create a build directory: mkdir build
  6. Change to the build directory: cd build
  7. Set up CMake build for debug: cmake ..
  8. Build: cmake --build .

Getting started with a "hello world"

Create a main.c3 file with:

module hello_world;
import std::io;

fn void main()
{
   io::printn("Hello, world!");
}

Make sure you have the standard libraries at either ../lib/std/ or /lib/std/.

Then run

c3c compile main.c3

The generated binary will by default be named after the module that contains the main function. In our case that is hello_world, so the resulting binary will be called hello_world or hello_world.exe, depending on the platform.

Compiling

Compiling on Windows

  1. Make sure you have Visual Studio 17 2022 installed or alternatively install the "Buildtools for Visual Studio" (https://aka.ms/vs/17/release/vs_BuildTools.exe) and then select "Desktop development with C++" (there is also c3c/resources/install_win_reqs.bat to automate this)
  2. Install CMake
  3. Clone the C3C github repository: git clone https://github.com/c3lang/c3c.git
  4. Enter the C3C directory: cd c3c
  5. Set up the CMake build cmake -B build -G "Visual Studio 17 2022" -A x64 -DCMAKE_BUILD_TYPE=Release
  6. Build: cmake --build build --config Release
  7. You should now have a c3c.exe executable.

You can try it out by running some sample code: c3c.exe compile ../resources/examples/hash.c3

Note that if you run into linking issues when building, make sure that you are using the latest version of VS17.

Compiling on Ubuntu 20.10

  1. Make sure you have a C compiler that handles C11 and a C++ compiler, such as GCC or Clang. Git also needs to be installed.
  2. Install CMake: sudo apt install cmake
  3. Install LLVM 15 (or greater: C3C supports LLVM 15-17): sudo apt-get install clang-15 zlib1g zlib1g-dev libllvm15 llvm-15 llvm-15-dev llvm-15-runtime liblld-15-dev liblld-15
  4. Clone the C3C github repository: git clone https://github.com/c3lang/c3c.git
  5. Enter the C3C directory: cd c3c
  6. Create a build directory: mkdir build
  7. Change to the build directory: cd build
  8. Set up CMake build: cmake ..
  9. Build: cmake --build .

You should now have a c3c executable.

You can try it out by running some sample code: ./c3c compile ../resources/examples/hash.c3

Compiling on Void Linux

  1. As root, ensure that all project dependencies are installed: xbps-install git cmake llvm15 lld-devel libcurl-devel ncurses-devel zlib-devel libzstd-devel libxml2-devel
  2. Clone the C3C repository: git clone https://github.com/c3lang/c3c.git
    • If you only need the latest commit, you may want to make a shallow clone instead: git clone https://github.com/c3lang/c3c.git --depth=1
  3. Enter the directory: cd c3c
  4. Create a build directory: mkdir build
  5. Enter the build directory: cd build
  6. Create the CMake build cache: cmake ..
  7. Build: cmake --build .

Your c3c executable should have compiled properly. You may want to test it: ./c3c compile ../resources/examples/hash.c3
For a system-wide installation, run the following as root: cmake --install .

Compiling on other Linux / Unix variants

  1. Install CMake.
  2. Install or compile the LLVM and LLD libraries (version 15 or higher)
  3. Clone the C3C github repository: git clone https://github.com/c3lang/c3c.git
  4. Enter the C3C directory: cd c3c
  5. Create a build directory: mkdir build
  6. Change to the build directory: cd build
  7. Set up the CMake build for debug: cmake .. (you may need to manually provide the path to the LLVM CMake directories, e.g. cmake -DLLVM_DIR=/usr/local/opt/llvm/lib/cmake/llvm/ ..)
  8. Build: cmake --build .

A note on compiling for Linux/Unix/MacOS: libcurl is needed to be able to fetch vendor libraries. The CMake script should detect it if it is available. This functionality is non-essential, and it is perfectly fine to use the compiler without it.

Licensing

The C3 compiler is licensed under LGPL 3.0, the standard library itself is MIT licensed.

Editor plugins

Editor plugins can be found at https://github.com/c3lang/editor-plugins.

Contributing unit tests

  1. Write the test, either adding to existing test files in /test/unit/ or add a new file. (If testing the standard library, put it in the /test/unit/stdlib/ subdirectory).
  2. Make sure that the test functions have the @test attribute.
  3. Run the tests and check that they pass. (Recommended settings: c3c compile-test -O0 test/unit, where test/unit/ is the relative path to the test directory; adjust as required.)
  4. Make a pull request for the new tests.

c3c's People

Contributors

3than3, c34a, cpiernikowski, data-man, davidgm94, gdm85, its-kenta, jasmcaus, jb-perrier, kathanshukla, kvk1920, lerno, mathis2003, matkuki, nikos1001, odnetnini, pierrec, pinicarus, pitust, pixelrifts, poly2it, raynei86, sarahisweird, seekingmeaning, shv187, ssmid, thecalculus, tonis2, wiguwbe, wraithglade


c3c's Issues

Easy use of 3rd party libraries

The problem: there's third party library source which contains desired functionality. I want to use this functionality.

However, there could be some negative factors:

  • The library contains a lot more than what I want. An extreme example is the GUI library Qt: its source package was half a gigabyte compressed.
  • The library is written in an idiomatic style completely unsuited to the project. Different naming conventions, for example. Or a different style of allocators. Or they use their own thread pool. Or ...
  • Public names from the library may clash with my own names. The usual solution, making names long enough to be unique (project_module_submodule_finally_the_name), is a cure worse than the disease. Java went over the top here.
  • Someone else, unaware of the library's trickery, may later try to use another part of the library. Greek tragedy follows.

How these problems could be solved:

  • One rewrites the desired functionality from scratch.
  • One tries to fix the library.
  • One uses the library as is, hoping for the best and that one won't be around when it blows up later.

Is there some other, easier way to use 3rd party libraries?

Yes, there's a solution. It even doesn't depend on implicit imports or linearized code, it could be used with or without these features.

How would it work?

  1. Somewhere inside the project tree, create a directory for this 3rd party library. E.g. this-library/ (if linearized, 0085.this-library/).

  2. Unpack the library into this directory, with no or minimal changes.

  3. Create a source file this-library.x (or 0085.this-library.x) in the same place as the directory. The name part must be the same; that is the key. In the end it would look like:

    ...
    this-library.x
    this-library/
          ... 3rd party source code here
    ...
    
  4. This file this-library.x is now the only interface to the library. It would contain exported functions which you may use, and these functions would invoke the 3rd party code. This file is what you use in the rest of the project.

  5. Nothing but this-library.x would be able to use the 3rd party library. No exceptions. Including the 3rd party library anywhere else would be forbidden, an error. Public symbols from the library won't be public (the compiler would ensure this); they would be visible only inside this-library.x and nowhere else. The library gets completely isolated.

  6. This invisibility of the 3rd party library's public symbols to the rest of the project would reduce the potential for name clashes.

  7. This way a 3rd party library could be used safely, without changes. It feels like a win-win solution.


Special considerations if project is linearized and uses implicit imports:

  • 3rd party library would see only those files "before" it. If 3rd party libraries are placed at the start of the project, it would ensure relative safety against name clashes.
  • 3rd party libraries isolated this way won't need to be written in linear style (but could)
  • nothing from isolated 3rd party library would be visible to the "following" source. Only the public symbols from interface file (0085.this-library.x). This again reduces potential for name clashes.

Are there any downsides to this?

  • One needs to be careful to notice that a source file and a directory have the same name, to recognize this mechanism. This could perhaps be mitigated by making the names more visible (e.g. by adding # at the start of the interface file name and the directory name), but I do not like this.
  • This solution is handy only if you need a small part of the library. If you want to reuse more than what reasonably fits into a single interface file, then you are on your own.
  • It doesn't help if the library does horrible things inside, like starting its own thread pool.
  • This solution is not helping with DLLs, static libraries or object files.

This solution offers one big unexpected ability: the ability to safely use different variants of the same library.

Let's say there are two versions of the same library. Both export the same symbols, both offer the same functionality. If you name the interface files differently, you may use both in a single project, without fear of name clashes.

It could look like this:

    ....
    this-library.v1.3.x   // interface file for version 1.3
    this-library.v1.3/
        ... source code for 1.3 version
    this-library.v1.4.x  // interface for version 1.4
    this-library.v1.4/
        ... source code for 1.4 version
    ...

Names exported from interface files would depend on whether overloads are allowed and how they are resolved. Some tricks could be used to make it simpler. I may write about it.

However, if the library does nasty things inside, like creating a process-unique name (e.g. a named shared section for IPC), then having two library versions is risky or even impossible.

const constraint for function arguments

This is followup to the discussion about the best way to implement various constraints.

C has const parameters. They are claimed to improve safety of the code.

It has following disadvantages:

  • the const annotation is placed in the function definition, as if the function writer were naturally inclined to misuse every non-const parameter. The caller of the function sees nothing about constness at the call site. An IDE may help a bit here, but when you just skim the source, you get no visual hint at all.
  • It is inflexible. What if you expect a parameter not to change in one place, but to change in another? Should you create two almost identical functions?
  • const is easy to subvert by casts.
  • const is often used incorrectly. const char const * anyone?
  • const data are not const transitive. Const data may contain a non-const pointer, and modification through such an inner pointer is not prevented. (D fixed this.)

In C++ the situation gets even worse, doubling the codebase with identical const overloads, with mutable on top. (I have even heard that a mutable const annotation is valid.)


AFAIK Jai decided not to bother with the whole const humbug. However, I see one approach which I think is easy to use, informative and flexible.

Constness annotation would be required at caller place. It would look like:

int* p = ...
int* q = ...

// Here pointer p is supposed to be const (nothing gets mutated through it).
// Data through pointer q may (or may not) be modified.
foo(!p, q); // the "!" means NOT modified inside

// Now I expect the opposite: p is mutable, q is not.
foo(p, !q); 

If the compiler sees a clear violation of the constraint (there is an unconditional update via p inside function foo), then it is an error. This is similar to C's const.

However, the situation may be more complicated. There may be casts, the updates may be hidden behind complex logic, const transitivity may not be implemented.

The compiler knows when this happens, i.e. when it is impossible to verify immutability statically. It would then employ a runtime check.


In the simplest case, it would insert a comparison between the old and new values.

int x = 0;
foo(!&x);

would be transformed into:

int x = 0;
int __old_value_of_x = x;
foo(&x);
assert(x == __old_value_of_x | Detailed description of error);

What if the data passed into a function are way too big to be copied?

Then the compiler could use a hash of the data, and compare the two hashes.

int array[1000000] = ...
foo(!array);

would be transformed into:

int array[1000000] = ...
uint __array_old_crc = __calculate_crc(array);
foo(array);
uint __array_new_crc = __calculate_crc(array);
assert( __array_old_crc == __array_new_crc | Detailed description of error);

Wouldn't it be too slow?

Well, yeah, in debug mode. Nothing is for free. A useless feature like C's const has no runtime cost, but it fails to cover complicated situations.

If the situation is absolutely certain to be checked at compile time, there would be no runtime cost.

If someone gets paranoid and annotates everything with !, then they should also get a faster machine with more memory.


The hash is not 100% foolproof; it may give a false pass (not spotting a change).

Correct. Only 99.999% of cases would be caught (1 miss out of 2^32 for a 32-bit hash). The computer isn't omnipotent.


What if the checked data are truly massive and even fast CRC check would be too slow?

The compiler could calculate the hash over only a certain amount of the data. It is better to offer at least a partial check.

Alternatively, the compiler may refuse the ! check if the data is known to be too big at compile time. If such a situation is discovered only during the check itself, there could be a runtime error telling the developer to remove the check.


What if the requirements are more complicated? You expect some part of the data to stay const, while some other part of the data is changed. C++ has mutable for this situation.

This is asking for way too much. The compiler isn't omnipotent, and compiler features do not come for free. It is all left up to you.

Test the code thoroughly. Tests should have access to everything, including internal data.


Isn't "!" already used as negation?

Yes, but this could be changed. The not from C (in #include <iso646.h>) could become the mandatory negation operator. The !! trick could (should!) also be replaced by something else.

Also ! is easy to miss inside complicated math expressions. As a parameter annotation it is somewhat more noticeable.

The ! may signal some desirable quality: !argument could mean the argument is immutable, and fn! could mean a pure function (unlike an ordinary fn). Other handy uses could be found later.


Alternatively, some other style could be used:

foo(@p);
foo(#p);
foo(_p_);

But I like ! most, it feels intuitive. (Some languages use ! as semantic marker, e.g. Ruby).


Some notes:

  • Ensuring constness is supposed to be a rare situation, not an obsession as in C. This would keep the overhead down.

  • Constness should be checked mainly inside tests. Builds with tests are expected to be slower to compile and slower to run.

  • The compiler-inserted assert should report the proper line (the line where the function is called, not the location of the assert).

  • Unneeded const checks, e.g. on an integer value (no pointer involved), should be errors.

  • The feature could be implemented gradually. First the compiler may ignore all "!", then basic compile time checks could be added, then simple runtime checks and finally more complicated runtime checks.

Assert

Standard assert. Should tie into unreachable.

How explicit imports could be eliminated

I am the one who questioned the need for explicit imports on Discord. It was the first time I tried that site, and hopefully the last.

What I had in mind: explicit imports (#includes, in C) are a consequence of past hardware limitations. They clutter the code while carrying very little useful information.

Similarly, separate compilation of modules (in C jargon translation units) is due to ancient memory limits. It adds complexity (intermediate object files and all the mess accompanying them) and increases overall compilation time.

A language from 2019 should not be limited by constraints from the 1970's. All source files for a project should compile as one indivisible unit. This cuts down on stale versions chaos and allows global optimizations. (C/C++ does allow this, with so called "unity build". Unfortunately it is frowned upon by people living in the glorious past when machines had 8 kB RAM.)

This arrangement would also make it easier to eliminate explicit imports. Once the compiler has parsed all source files, it knows all the symbols and can resolve them inside modules. Only when there is ambiguity would a symbol need its module name.


How about the ability to rename imported module to a better name?

This feature (namespace renaming) is available in C++. I have yet to see code which uses it. I perceive it as a misfeature. If a module name is really so atrocious, why not rename the file and be done with it?

How about the ability to import only some parts of a module?

IMO not needed. I cannot imagine a legitimate use for it. Another impractical me-too misfeature.

What else would make implicit imports more handy?

Simplicity. One file = one module. One module = one file. No exceptions. The file name (without path and extension) is the module name. That name is not repeated inside: the compiler already knows it, and you know it too. If you use invalid characters, it is your fault. Rename the file.

This would allow easy file renaming, easy moving of source files within the project hierarchy, easy merging or splitting of files. Try any of that in a large C/C++ project.

assert

This topic is related to testing proposal, but could be implemented/used even without any tests.

assert is underrated. In C it is an ordinary macro. This cripples its immense potential.

I propose making assert a special form, and making the compiler aware of what assert means. (If my tests proposal is implemented, the assert2 and verify alternatives would need the same treatment.)


I should be able to see more detailed information whenever assert fires.

void foo()
{
  int x, y, z;
  ...
  assert(x < 0 | x = ?x, y = ?y, z = ?z);
}

If this assert fires, it would show following:
"Assertion failed in file/line."
"Test in file/line was running." // the optional part
"Assert statement: x < 0"
"x = 1, y = 2, z = 3"

The syntax is intentionally different from C's printf. It is a special form, after all, so why pretend otherwise. The compiler would ensure that any variable after the "?" has some sensible visual representation.


Historical note: similar functionality was implemented in C++, using evil macro trickery. It looked this way, and it did work:

SMART_ASSERT(x < 0)(x)(y)(z);

http://web.archive.org/web/20180730133346/http://www.drdobbs.com/cpp/enhancing-assertions/184403745

There was an attempt to add this functionality to the Boost collection of libraries, but it got too complex and the effort was abandoned. I then tried to cut it down to the bare bones, but even so, using it left the compiler consuming huge amounts of memory and compilation painfully slow. It was unusable for anything serious.


assert can implement preconditions and postconditions. No need for special syntax as in D.

void foo()
{
  assert(x < 0); // precondition
  defer assert(x > 0); // postcondition
  ...
}

The compiler should cope with the assert being removed in release mode, leaving the defer empty.


assert could be used to implement some constraints, like non-nilness.

void foo(void* p)
{
  assert(p != 0); // this says to the compiler: p is non-nil
  ...
}

...
foo(0); // the most simple way to produce compile time error

Documentation generator could discover such non-nil property and show it in the docs. Smart IDE could do it too.

The non-nil constraint could bubble up through the call tree, depending on how smart the compiler is.

Compiler could also handle slightly more complicated situations like this:

void foo(void* p)
{
  if (p == 0) {
    assert2(false); // this convinces the compiler that p is non-nil
    return;
  }
  ...
}

When a constraint gets too complicated to be enforced statically (this happens incredibly fast; probably only non-nilness and simple integer ranges could reasonably be checked), at least the runtime check would still be there.


assert could potentially implement invariants for structures

struct x
{
  int i;
  int j;
  int k;
  void* p;

  assert(p != 0);
  assert(i + j + k == 10);
}

These constraints would be placed at the end of functions modifying the structure.

I am not sure about this feature. It looks nice for simple cases, but would it handle more realistic complex situations?


assert can be used to catch Heisenbugs.

To catch a Heisenbug, a lot of effort is needed, much more than printing out some local variable. It could be implemented like this:

void foo()
{

  assert(heisenbug == false | x = ?x, y = ?y) {
     // This is a block of ordinary code, invoked when the assert fires.
     // It should create a very detailed description of the problem
     // and return it as a string. This string would then be appended
     // to the previously shown values of x and y.

     string s = "";
     ... // fill in the string with every calculated detail

     "42"  // found the solution, return it into the assert
  }
  ...
}

If this syntax is ambiguous, perhaps this could work:

void foo()
{
  assert(heisenbug == false | x = ?x, y = ?y, 
    { "42" }
  ); //  closing bracket of assert
  ...
}

padding

Here C screwed up thoroughly. Padding bytes are invisibly added by the compiler, depending on architecture and compiler settings. Expensive tools (Purify, etc.) are sold to check that these inserted bytes are not used by mistake. People waste a lot of time making sure the code hasn't quietly broken, and they invent rules and commenting styles to deal with this.

Here is a solution which is (a) explicit, (b) checked for correctness by the compiler and (c) doesn't require a new language form like an @annotation.

Padding is a problem only inside structs.

struct X     // default alignment is 8
{
  byte b;
   // "padding" is a keyword, but only inside a struct.
   // It says the compiler should treat the next 3 bytes as padding.
  padding(3 B);
  int i1; // padded to 4
  int i2; // padded to 4
  padding(4 B);
  double d; // padded to 8
  byte b2;
  padding(7 B); // making the whole structure exactly 32 bytes long
}

This way everybody sees where the padding is, and the compiler could and should check that the padding makes members naturally aligned (an error if a mistake is made). Padding bytes are not accessible in any way.

Missing, wrong, or superfluous (padding(0 B)) padding should be an error.

If a structure is well designed and has no internal padding, there would be no "padding" keywords, and everything would be nice and clean.

If C3 is compiled via C, padding could be transformed into explicit bytes in the C structure.
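How that transformation could look in today's C: padding becomes named byte arrays, and the "compiler checks" become `_Static_assert`s on offsets and size. A sketch only; the member names `pad1`/`pad2`/`pad3` are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approximation of the explicit-padding proposal in C11:
   each padding(N B) becomes a named byte array, and compile-time
   asserts verify the layout the author intended. */
struct X {
    uint8_t b;
    uint8_t pad1[3];  /* stand-in for padding(3 B) */
    int32_t i1;
    int32_t i2;
    uint8_t pad2[4];  /* padding(4 B) */
    double  d;
    uint8_t b2;
    uint8_t pad3[7];  /* padding(7 B), rounding the size up to 32 */
};

_Static_assert(offsetof(struct X, i1) == 4,  "i1 must start at offset 4");
_Static_assert(offsetof(struct X, d)  == 16, "d must start at offset 16");
_Static_assert(sizeof(struct X) == 32, "struct X must be exactly 32 bytes");
```

Unlike the proposed keyword, these byte arrays are still ordinary accessible members, so the "padding bytes are not accessible" guarantee would still need compiler support.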


What if I need nonstandard padding, contrary to best practices?

struct X
{
  byte b;
  padding!(0 B); // "!" says: no padding here, I want it that way. 
  int i; // not properly padded
  padding(3 B); // naturally expected padding
  int x; // padded to 4 B as it should
}

Non-standard struct size:

struct Y
{
  int i;
  byte b;
  padding!(0 B); // no padding at the end
}

This only makes sense in combination with nonstandard alignment and the struct being part of an array. Such trickery could (and even should) be forbidden by the language.


Is platform-dependent padding possible?

struct X
{
  byte b;
  padding(if (32 bit) {3} else {7}); // some kind of compile time if  
  void* p;
}

or

struct X
{
  byte b;
  if (32 bit) { // compile time if
    padding(3 B);
  } else {
    padding(7 B);
  }
  void* p;
}

Could this be used for bit fields? (Invented syntax.)

struct X
{
  bit b;
  padding(2 bit);
  bit c[2];
  padding(1 bit);
  bit d;
}

(I would be careful with bitfields in the language: too messy.)


How about alignment? By default, a structure should be aligned to its largest member.

struct Z
{
  align (2 B); // nonstandard alignment (default would be 8 B);
  double d1;
  double d2;
}

I believe padding calculations (as suggested above) should not be influenced by nonstandard alignment. At runtime, a structure Z with reduced alignment could be placed anywhere, with its actual alignment ranging from the declared value up to 8, so making assumptions about optimal padding is not possible.

The interaction between non-standard alignment and padding is confusing. If it were up to me, I would not allow non-standard structure alignment at all.

There could even be a special case where nonstandard alignment is used only when the struct is part of an array, static or dynamic. But this is probably far too big a complication.

Transpiling non-standard alignment into C would require platform-specific code.

I think non-standard alignment is a misfeature, an optimization trick gone mad.

testing

There's one project taking testing seriously, SQLite:
https://www.sqlite.org/testing.html

My ideal is to make testing trivial.


How the tests should look:

...
... // code
...

TEST()
{
  assert(1 + 1 == 2);
}

TEST()
{
  assert(2 + 2 == 5);
}

...
... // more code
...

That's all. Not a single character more. No unique names, no manual registration of tests, no #includes of frameworks. No set-up/tear-down vomit. No other infrastructure.

Just type these 8 characters and 3 newlines and a new empty test is ready to run. I can do it in a second or two. Anything taking more time is unacceptable.
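To show the idea is implementable, here is a sketch of zero-registration tests in plain C using GCC/Clang extensions: each `TEST()` expands to a uniquely named function plus a constructor that adds it to a global list. `TEST`, `run_all_tests`, and the list itself are all invented names, not part of any real framework.

```c
#include <assert.h>

/* Zero-registration tests (GCC/Clang only): TEST() defines an anonymous
   test function and registers it before main() runs, via a constructor. */
typedef void (*test_fn)(void);

static test_fn tests[128];
static int test_count;
static int checks_passed;   /* bumped by the sample tests below */

#define TEST_NAME2(base, line) base##line
#define TEST_NAME(base, line) TEST_NAME2(base, line)
#define TEST()                                                    \
    static void TEST_NAME(test_body_, __LINE__)(void);            \
    __attribute__((constructor))                                  \
    static void TEST_NAME(test_reg_, __LINE__)(void)              \
    { tests[test_count++] = TEST_NAME(test_body_, __LINE__); }    \
    static void TEST_NAME(test_body_, __LINE__)(void)

static void run_all_tests(void)
{
    for (int i = 0; i < test_count; i++)
        tests[i]();
}

TEST()
{
    assert(1 + 1 == 2);
    checks_passed++;
}

TEST()
{
    assert(2 * 2 == 4);
    checks_passed++;
}
```

The `__LINE__` trick gives the anonymity the proposal asks for; a compiler feature could do the same portably and without the 128-slot limit.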


It should be possible to run tests both in debug and in release mode. The latter is important, to make sure that neither the optimizer nor I screwed up.

I envision copying the section for the debug/release target under the name debug-with-tests/release-with-tests and adding a TESTS define there. That should be all it takes to enable testing.


The old assert is good enough for the tests. I've seen test frameworks define a dozen assert-like functions, e.g. a specific assert to compare two floating point numbers. This is not needed.

However, the tests should also run in release mode. asserts usually work only in debug mode, because there are so many of them that they would kill release-mode performance. I am a defensive programmer; I use assert more than any other construct. I check even completely trivial things, and found at least one compiler bug this way.

So my proposal is to have 3 variants of asserts:

  1. The traditional assert. Not active in release mode. Can be placed anywhere.

  2. "Assert for release mode". Active if and only if tests are enabled. Available only inside TEST(), nowhere else. I named it verify.

    TEST()
    {
      assert(1 + 1 == 2); // does the check in debug mode, in release it is empty test
    }
    
    TEST()
    {
      verify(1 + 1 == 2); // works both in debug and release modes. 
                          // MUST NOT be used outside tests
    } 
    
  3. "assert used for impossible situations". Many error situations are almost impossible. I place assert(false) into such paths, to be doubly sure.

    It would look like this:

    ...
    if (impossible-situation) {
       assert(false); // cannot happen
       return;
    }
    ...
    

However, within tests I try to provoke such impossible situations, and then check whether the code reacts properly. The old assert(false) would always fire, which I do not want in this very specific situation.

I solved it by adding yet another form of assert, unimaginatively named assert2. It fires only in debug mode and only when no test is currently running. In other words, it does not fire during actual testing.

It would look like this:

```
bool foo() {
   ...
   if (impossible-situation) {
     assert2(false);
     return false;
   }
   return true;
}

TEST()
{
   ... // prepare impossible situation
  verify(foo() == false); // the assert2 won't fire, the test would pass
}

... // normal code
foo(); // if impossible situation happens, assert2 fires.
```
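The three assert variants can be sketched in plain C. This is an emulation under one big assumption: a global `test_running` flag that the (hypothetical) test runner sets; `verify` and `assert2` are the proposal's names, implemented here as macros.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag set by the test runner while a test executes. */
static int test_running = 0;

/* verify: active in all builds; meant for use inside tests only. */
#define verify(cond)                                          \
    do {                                                      \
        if (!(cond)) {                                        \
            fprintf(stderr, "verify failed: %s\n", #cond);    \
            abort();                                          \
        }                                                     \
    } while (0)

/* assert2: like assert, but silent while a test is running, so tests
   can deliberately provoke "impossible" situations. */
#ifdef NDEBUG
#define assert2(cond) ((void)0)
#else
#define assert2(cond) assert(test_running || (cond))
#endif

/* Example: a function with an "impossible" error path. */
static int foo(int impossible_situation)
{
    if (impossible_situation) {
        assert2(0);   /* fires only outside tests, in debug mode */
        return 0;
    }
    return 1;
}
```

With `test_running` set, a test can call `foo(1)` and verify the error return without `assert2` firing, exactly the behavior the proposal describes.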

All kinds of asserts should be able to find out whether a test is currently running, and if so, the file/line of that test. This information should then be shown.


Test runner. This is what I call the code that actually runs the tests.

The most basic test runner could be provided in the stdlib. Just two functions are all that's needed.

It could look like:

TEST()
{
}

void main()
{
  #ifdef TESTS
    if (some hint-indicating-that-I-want-to-run-tests) 
    {
      run_all_tests(); // stdlib function which magically invokes all tests
    }
  #endif
 
   ... // normal code
}

Or I may want to run tests only from recently modified source files. That's surprisingly easy to do in C.

How it may look:

void main()
{
  #ifdef TESTS
    // stdlib function which magically invokes tests
    // from files modified within last 2 hours.
    //
    // Run *every* time the application starts 
    // since it would cause only minimal delay.
    run_recent_tests(); 
  #endif
 
   ... // normal code
}

A developer may write his own test runners. For this, tests could have attributes. Attributes should have a completely free format.

It may look like:

TEST() // no attributes test
{
}

TEST(name = "aaa") // named test
{
}

TEST( time < 1 ms) // test which should run under 1 millisecond
{
}

// test with many attributes
TEST(name = "aaa" | foo | bar(baz) |  10 ms < time < 30 ms | something^2 )
{
}

"|" is attribute separator.

Custom test runner would be able to access the attributes, and act accordingly.

For example, if there's an explicit time limit, the runner would check that the test is faster. If it is not, it could repeat the test up to 5 times, and if it is always slower, show an error. (Tests could have some reasonable default timeout, to catch unexpectedly slow tests. I used 30 milliseconds; whatever was slower had to be annotated.)


Tests could serve as leak detectors!

Test runners could also check other things, for example making sure that a test does not leak. It was quite hard to implement, but it did miracles for me. No more silly leaks; the tests caught everything. These checks were done automatically, with no extra work for me and no silly noise within the tests.


One test attribute needs compiler support. Sometimes I want to be sure that certain code does not compile. An uncompilable test could do it:

TEST(uncompilable) // no other attributes allowed together with "uncompilable"
{
  ... // something what compiler refuses
}

The compiler would verify that the code fails to compile. The test itself won't be passed to any test runner.

The content of an uncompilable test should still be syntactically correct, e.g. no misplaced parentheses. If such tests are needed (I have doubts), the uncompilable feature could be extended.


These tests could also be used to collect profiling data.

TEST() // ordinary test
{
}

TEST(profiling)
{
  ... // do something useful for profiling
}

A special test runner would pick up only the tests intended for profiling and run them. This way relevant data could be gathered by the profiler.

Performance regression data could be collected similarly.


Here's a biggie. It is often desirable to run tests checking code which deals with files, networks, databases and other slow and clumsy things. However, I do not want these tests to take forever.

What I want is to replace slow APIs like fopen with something of mine, something that only mocks the real functionality, in no time. And I want to do it easily, with no complicated mocking framework.

How it may look like:

TEST()
{
   replace fopen = func () { ... }  // replace stdlib fopen with my code
   replace fclose = func () { ... } // my fclose alternative
   ...
   ...  // code indirectly invoking many fopen/fclose
}

The compiler would understand what I am trying to do. It would (if testing is enabled) replace all calls to fopen with invocations through a function pointer, which by default points to the good old fopen.

However, inside that test the fopen pointer would be replaced by my own function. At the end of the test, the compiler would restore the original fopen.

Similarly, constants could be replaced by a value in memory, set and reset within the relevant tests. E.g. I may want to change some timeout constant to make the test run quickly.

TEST()
{
  replace TCPIP_TIMEOUT = 0.001; // default is 2 hours
  ...
  ... code checking whether timeout works
}
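The function-pointer indirection the compiler would generate can be written by hand today. A sketch with invented names (`fopen_ptr`, `config_exists`, `fake_fopen`): code under test calls through the pointer, and a test swaps the pointer in and restores it afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* Manual version of the proposed compiler rewrite: all "fopen" calls
   go through this pointer, which defaults to the real fopen. */
static FILE *(*fopen_ptr)(const char *, const char *) = fopen;

/* Code under test: indirectly uses fopen via the pointer. */
static int config_exists(const char *path)
{
    FILE *f = fopen_ptr(path, "r");
    if (!f) return 0;
    fclose(f);
    return 1;
}

/* A mock that pretends every file exists, without touching the
   file the caller named. */
static FILE *fake_fopen(const char *path, const char *mode)
{
    (void)path; (void)mode;
    return tmpfile();   /* any valid FILE* will do for this sketch */
}
```

The language feature would do the save/swap/restore automatically at test boundaries; here the test has to do it itself.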

A handy feature from the Zig language. Sometimes tests share a lot of initial and ending code. This may be solved like this:

TEST()
{
  common-set-up-code();

  TEST()
  {
    test1-code();
  }

  TEST()
  {
    test2-code();
  }

  common-tear-down-code()
}

This would transform into:

TEST()
{
  common-set-up-code();
  test1-code();
  common-tear-down-code()
}

TEST()
{
  common-set-up-code();
  test2-code();
  common-tear-down-code()
}

A handy idea from the D language. Tests do not need to be only at top level.

struct x
{
  ...
  TEST()
  {
    .. test related to this structure
  }
}

void foo()
{
  if (impossible) {
    TEST()
    {
      ... // test invoking the impossible situation, makes sense to be here
    }
    return;
  }
}

These tests are a nice-to-have feature for special circumstances.


Tests should have access to all parts of a structure, whether private or public. Really thorough testing needs this.


Having many tests makes a source file potentially very, very long. (I had no problem having thousands of lines of code with tests in a single file.) I propose a "test companion file" feature.

If something.x is a source file, then something.tests.x would be its test companion file. This companion file would be allowed to contain only tests and their helpers. Nothing could be exported out of this file. Tests in this file would have full access to the "parent" something.x, as if it were appended to the end of that parent file.

A potentially useful sub-feature is to give the reader hints about the existence of tests in the companion file:

--- something.x = source file ---
...
TEST() // ordinary test
{
}
// forward declaration of a test; it has to be in the test companion under this name
TEST(name = "x");
TEST(name = "y"); // another forward

TEST() // normal test again
{
}

---- something.tests.x  = test companion file ---

// all forward tests need to be here, in the same order, 
// for easier lookup
TEST(name = "x")
{
}

TEST(name = "y")
{
}

Both ways to place tests, in the source file and in the test companion file, should be supported at the same time. Sometimes the first way is better, sometimes the other.


The testing framework expects that an individual test either finishes (= everything works as expected), or the execution ends with an assert being fired. This stops the program (e.g. by breaking into the debugger).

I do not support failing tests. If something goes wrong and an assert fires, that bug should be fixed. Period. Collecting failures makes no sense.

However, if failing tests are desirable, they could be supported. It would increase the complexity of the testing infrastructure, though.


I hope I didn't kill anyone with this long text. In the D language forum someone once claimed that D's unittest is the most useful feature of the language.

It is possible to do even better, and it is not that hard compared to many language features. Large parts of the test support could be done via libraries.

project configuration file

The problem: a project grows too big to be built by hand. Several compilation targets are needed; there are too many source files and dependencies between them.

Traditional solution: use a makefile. If the makefile gets too complicated, try a makefile generator. If this doesn't help, why not try something like bjam, a Turing-complete tool for masochists. Or you may try some XML-based tool; XML is the future, I heard! A big company could hire a build engineer.


Could this unsolvable problem be fixed?

Yes, if the old way of building object file after object file is dropped.

  1. Every project with more than one source file would require a mandatory project configuration: a file with a fixed name (e.g. c3-project.cfg) in the project root.

  2. This file would contain complete information on how to build the project.

  3. The compiler would have only one option: which target from the configuration to build. The other parameter would be the project root itself.
    c3 -debug-with-tests c:\my_project_root_directory

    debug-with-tests is the desired target, fully described in the configuration.
    There would be no other command line options available in the compiler.

  4. The compiler would then read the project configuration, extract what to build and how, and do it.


How would the project configuration look?

Like an INI file. Not like a makefile or other abominations. There would be no smartness inside, no trickery, no showing off how clever one can get.

Example:

[global-settings]
enforce-parenthesis-around-binary-operators = *true | false
implicit-paddings-allowed-in-structs = true | *false
allow-utf8-characters-in-source = none | *in-strings-and-comments | anywhere
...

// every single compilation target is described here,
// completely and independently of other targets
[debug-with-tests]
defines = DEBUG, TESTS
optimization-option = *O0 | O1 | O2
libraries-path=...
...

[debug-no-tests]
defines = DEBUG
optimization-option = *O0 | O1 | O2
libraries-path=...
...

[release-with-tests]
....

[release-full-speed]
...

Wouldn't this mean a lot of duplication?

Yes, and that's intentional.

If you use the same file path in 20 different places, so be it. Copy it 20 times, and if you need to change it, do it in all 20 places.

There should be no possible doubt about how the project is being built. Any smartness, shared settings, settings computed on the fly, or clever use of tool features is strictly forbidden.


Could such configuration be made foolproof?

Yes, and this is the most important part of this post. The compiler would know exactly what the configuration has to contain and would complain if something is wrong.

  1. The compiler will check that all items expected in the configuration are present, in the correct order, without typos. If e.g. the global setting implicit-paddings-allowed-in-structs is missing, that is an error. The error message should say what is expected.

  2. The compiler will refuse all items it does not understand (typos, settings from previous versions). No garbage would be allowed in the configuration.

  3. All available options for a setting need to be present in the configuration file:

    implicit-paddings-allowed-in-structs = true | *false
    optimization-option = *O0 | O1 | O2
    

    This is in order to make it absolutely clear what the available options are, and to avoid guessing games. Exactly one of these options has to be selected, e.g. by *. No typo or wrong order among the options would be allowed.

  4. Options which are not finite (e.g. a list of libraries) would be verified as much as possible. Paths would be checked for validity. No effort to make this foolproof should be spared.

  5. Problems during building should result in complete error information. No effort should be spared here either.
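The "exactly one option selected with *" rule is mechanical enough to sketch. Below is a hypothetical helper (`parse_choice` is an invented name) that extracts the starred option from a settings line and rejects lines with zero or multiple stars.

```c
#include <assert.h>
#include <string.h>

/* Parse a "pick exactly one with *" setting line such as
   "optimization-option = *O0 | O1 | O2".
   Copies the selected option into out and returns 1, or returns 0
   when there is no '=' or not exactly one starred option. */
static int parse_choice(const char *line, char *out, unsigned long outsz)
{
    const char *eq = strchr(line, '=');
    if (!eq) return 0;

    int stars = 0;
    const char *p = eq + 1;
    while (*p) {
        while (*p == ' ' || *p == '|') p++;      /* skip separators */
        if (!*p) break;
        const char *start = p;
        while (*p && *p != ' ' && *p != '|') p++; /* scan one token */
        if (*start == '*') {
            stars++;
            unsigned long n = (unsigned long)(p - start - 1);
            if (n + 1 > outsz) return 0;
            memcpy(out, start + 1, n);           /* drop the '*' */
            out[n] = '\0';
        }
    }
    return stars == 1;
}
```

A real implementation would also verify the option list itself against the compiler's known set, per rule 3 above.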


Wouldn't it be a lot of work to implement?
Yes, it would.

Is there some way to run other build systems (e.g. for the C part of the project)?
Yes, there could be some command to run an external process. It would be at the user's full risk.

What are some other advantages of this approach?

  1. It does not rely on a cryptic and fragile compiler command line. There's exactly one place where all the nicely named build options are stored. Unlike makefiles, these options are easy to edit.
  2. Creating a new build target is easy: just copy the closest section and modify it. If you do not know what the options mean, bad luck. Deleting a target is equally easy.
  3. You see everything in one place, in the expected order, and with all possible options. This is something no other system offers.
  4. The typical mistake, "Oops, I forgot to specify this", will be very limited.

Jai uses its own language to create the build targets.
That is not needed.

Should the code have access to the build options used, at compile time?
Yes. An artificial example (not recommended in practice): depending on whether the optimization is for size or for speed, a different code path could be selected by a compile-time if.

What if I make a mistake in the configuration?
Your problem. At least you have it all in one place, which makes it easier to spot bugs.


A hypothetical, very advanced IDE could show the effects of configuration changes within the project, e.g. which code is used because of this option, or where that option makes a difference.

Enumset

A set of enum values, stored by ordinal.
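A minimal sketch of what such an enum set could look like, emulated in C with one bit per ordinal (the names `EnumSet`, `enumset_add`, etc. are invented):

```c
#include <assert.h>

/* Enum set stored by ordinal: bit N of the word is set when the
   enum value with ordinal N is a member. */
enum Color { RED, GREEN, BLUE, COLOR_COUNT };

typedef unsigned int EnumSet;

static EnumSet enumset_add(EnumSet s, enum Color c)    { return s | (1u << c); }
static EnumSet enumset_remove(EnumSet s, enum Color c) { return s & ~(1u << c); }
static int     enumset_has(EnumSet s, enum Color c)    { return (s >> c) & 1u; }
```

A language-level Enumset could pick the backing integer width from the enum's element count automatically.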

defer

defer is very useful, but could potentially be even more so. (I intentionally ignore defer for errors for now.)


It may sometimes be handy to cancel a defer.

func void test(int x)
{
    // defer may produce a symbol, which could be used inside the function
    some-symbol-name = defer printf("A");
    if (...) {
        un-defer[some-symbol-name]; // this cancels the defer action
    }
    if (...) {
        un-defer[some-symbol-name]; // multiple cancellation is OK
    }
}

un-defer would replace manual fiddling with boolean flags.
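For reference, here is the C status quo that un-defer would replace: a boolean flag, set by hand, decides in the cleanup path whether the "deferred" action still applies. Names are illustrative.

```c
#include <assert.h>

/* Observable effect of the "deferred" action, for demonstration. */
static int cleanup_ran;

static void cleanup(void) { cleanup_ran = 1; }

static void test(int cancel)
{
    int do_cleanup = 1;          /* the "defer cleanup();" */
    cleanup_ran = 0;

    if (cancel)
        do_cleanup = 0;          /* the "un-defer", done by hand */

    if (do_cleanup) cleanup();   /* end of scope runs the defer */
}
```

The proposal moves exactly this flag bookkeeping into the compiler, so the programmer writes intent (defer / un-defer) instead of mechanism.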


It may be also handy to change what is being deferred.

func void test(int x)
{
    some-symbol-name = defer printf("A");
    if (...) {
      re-defer[some-symbol-name] printf("B"); // use this action, not the previously set one
     }
    if (...) {
      re-defer[some-symbol-name] printf("C");
     }
}


Non-scoped defer: the ability to have a function-scoped defer at any place.

func void test(int x)
{
    if (...) {
      // will fire when function ends, not when the current scope does
      defer<function-scope>  printf("A"); 
     }
}

Defer action wouldn't be allowed to use local scope variables.

The symbol for such a defer would also have function-wide visibility:

func void test(int x)
{
    if (...) {
      my-symbol = defer<function-scope>  printf("A"); 
     }
    if (...) {
      un-defer[my-symbol];
    }
}

When a defer symbol is generated, it has to be used; an unused symbol is an error. Symbols should be unique per function. A deferred block should not contain an ordinary return or an error return.

A defer placed directly inside a defer section should not be allowed, to keep the code sane.

mandatory space around binary operators

The language could mandate a space on both sides of every binary operator. So instead of:

x*=(y+1)*z;

it would need to be written as:

x *= (y + 1) * z;

which is common practice anyway.

In addition to readability, it would create the opportunity to use - inside names. Words separated by a minus (more precisely, by the typographic dash) are somewhat easier to read than words separated by an underscore (a recent typewriter hack) or even than CamelCase.

Ability to read project sources like a book

A programmer gets someone else's code to grok. Now he has two immediate problems:

  1. Where to start reading?
  2. Where to go next?

He selects some part randomly and then jumps back and forth, trying to understand at least something. This is painful for humans, who are trained from an early age to read texts linearly. An IDE may help a little here, but the intense pain remains.


Are there some languages which could be read linearly?

Yes. I am aware of two systems.

  1. Forth languages, at least those which use numbered code blocks instead of source files. They use a nifty technique called hyper-static scoping: newer definitions simply overwrite old definitions, but the old code keeps using whatever was current before.

  2. An experimental Lisp written in (simple) C++ called Wart ( sources, description in a blog ). The author decided to keep it simple. His source files are named like 003tokenize.cc and 011types.cc and contain ordinary C++ code.

    When the time to compile comes, a simple tool collects all these files, joins them in order into one big source file and compiles the result.

    This file structure allows one to read the whole project like a book. Every bit of code depends only on previous code. No need to jump back and forth.

    Documentation and test files also have their numbers, which further improves readability.

    Reading such code was an unusual joy.


Could a language similar to C be "linearized"?

I believe so, and I see it as the major feature of my dreamed-up C-like language.

How would it work?

  1. All source files would have a numeric prefix, like 0020.something.x or 0140.something-else.x. That numeric prefix would be unique and used only for ordering purposes. So for example the module in file 0040.my-module.x would have the module name my-module. This invisibility of the numbers inside the code makes reordering simpler and easier.

  2. Project directories would follow the same rules: a directory with sources would have a number, and so would the sources inside. A project may look like:

    0010.first-source.x
    0020.second-source.x
    0030.directory1/
            01.sub-part1.x
            02.sub-part2.x
            03.sub-part3.x
    0040.third-source.x
    0050.directory2/
            1.another-subpart1.x
            2.another-subpart2.x
    0060.fourth-source.x
    
  3. A source file would "see" all files "before" it (using implicit imports). So in the previous example, file 0040.third-source.x would be able to call anything from files 0010.first-source.x and 0020.second-source.x, and also anything exported by the source files inside 0030.directory1/. Things defined in "later" files (like 0060.fourth-source.x) are not accessible.

    This arrangement makes it easy to read the code step by step. At every moment one knows where to go next, and knowledge of the "previous" code is enough to understand the "current" code.

  4. If really needed, one could forward-declare and invoke functions implemented later, like C does. This is expected to be a rare exception.

  5. A single-source-file project would not need to use this scheme; any file name could be used. This would be a special case.


What are other advantages of the above solution?

  • It reduces the risk of overloading gone wrong. Once you write code and it works, it doesn't matter that some time later you mistakenly reuse an old name. The old code won't be affected; only the new code will complain, and that is easier to fix.
  • An IDE (in my case Visual C++) shows the numbered files nicely sorted.
  • The ordered approach also allows the developer to have a single file with all project-wide constants as the first file. (This file is different from and unrelated to the project build configuration!) Instead of having important constants and project-wide macros spread randomly in unknown places, they would be in one well-known place. I planned to make it mandatory, with a fixed name for that file.
  • This approach would also help with reading individual libraries, if they depend only on the standard library or their dependencies are manageable.

What are disadvantages of the above solution?

I tried hard to iron out troubles and corner cases, and I believe I covered them all.

However:

  • This approach would require iron discipline, not just in file naming, but especially in avoiding forward declarations. I see discipline as a good thing, but that's just an opinion. Fans of the quick-and-dirty style would be frustrated.
  • Third-party libraries would need to follow the same rules (or be easy to change). While they could, in theory, be handled differently, this exceptionalism would make a big mess and devalue the original idea.
    This requirement would also apply to the stdlib.
  • Reordering the project is more demanding than with the mud-ball approach. One would need to ensure "linearizability" all the time.
  • This approach needs implicit imports. Explicit imports would destroy its advantages.

Could a language offer choice to select between the traditional approach and the enforced linearization?

Possibly. I planned to use only the linear approach (with the exception of single-source-file projects) to keep things simple. Allowing either this or the traditional approach (per project) would increase the complexity of the compiler a lot.

Var arrays

  • Var array initializers
  • Var array append
  • Var array alloc
  • Var array free
  • Var array casts -> subarray, pointer
  • Var array slice copy -> subarray, pointer, array, vararray

very-very-long-name as short-name

This is proposal for feature, mainly intended to shorten long names, also could be used to rename modules.

Long descriptive names are good; schools beat this obvious truth into every beginner. However, using long names is a pain, as people find out rather soon.

Some languages noticed this ambivalence and try to remedy it.

  • For C++ I saw the recommendation to use
    const auto& x = very_long_name;
    ... // use the short x as an alias
    One must hope the compiler will optimize the inner reference away.
  • D has alias:
    alias short_name = very_long_name;
    Unfortunately, D also uses alias for some type trickery (changing one struct type to another struct type; I couldn't understand it when I read the docs).
  • C has nothing. Well, macros could be used here, but not even a heavy user of the preprocessor would recommend that.

Could such a feature be done right?

Yes. I'll describe it right now. It would look like:

very-very-long-name as short-name;

It would work as compiler-verified text replacement. No other fancy features or hidden trickery.


Use as module renamer:

At the top of a module (file) there could be statements like:

my-long-module-name as short-name;
module.submodule1.submodule2 as foo;
  • The rename will be visible and valid until the end of the module (end of the file).

  • The compiler would check that module names exists and are valid.

  • It should also ensure that the new short name is used at least twice (or perhaps 3 or 4 times). Unused renames would not be allowed.

  • The old name would disappear; using it would be an error. This is to reduce confusion from having two names for one thing.

  • Only part of full module name could be renamed:
    level1.level2.level3.leaf
    could be renamed as
    level1.level2.level3 as foo;
    and used this way:
    foo.leaf.xyz

  • A module rename should be allowed only at the top of a module (file), not in the middle or within other constructs. If such a need ever arises, it would be easy to allow it later.

  • If there are multi-file modules, I'd recommend allowing renaming only at the top of the first file. Other ways would just be too confusing.

Generic (parametrized) modules could be used this way:
vector(double, float, int) as vec;

Potentially, a module rename could be made public, to be seen by other modules. However, I do not like it: a small innocent change in one place could cause trouble at the other end of the project.
public very-long-name as short-name;
or
=====very-long-name as short-name;

Another potential feature would be the ability to hide a certain module (with impact on symbol resolution), if the language has implicit imports. I do not like this either, though.
very-long-name as; // make the whole module very-long-name disappear


Use to shorten function argument names:

It would look like this:

void circle(
    int x-coordinate-of-circle-center-in-mm   as x,
    int y-coordinate-of-circle-center-in-mm   as y,
    int radius-of-circle-in-mm                as r
   )
{
   ... // here I can use x but NOT x-coordinate-of-circle-center-in-mm
}

Inside function body x, y and r would be allowed, not the long descriptive names.

Now you may wonder, why not do this instead:

void circle(
    int x,  // x-coordinate-of-circle-center-in-mm
    int y,  // y-coordinate-of-circle-center-in-mm
    int r   // radius-of-circle-in-mm
   )
{
   ...
}

Reasons:

  1. The long name would be presented in documentation, in IDE hints, etc. The short name makes sense only locally, inside the function body.

  2. Named function arguments should use the long name, not the short one:

    void foo(int long-descriptive-name as x) { .. }
    ...
    ...
    ...
    ...
    ...
    ... // in far distance from the function definition
    foo(long-descriptive-name = 1); // informative
    foo(x = 1); // virtually useless, NOT allowed
    

Replacing long names within function body:

The renaming could be used to shorten chains of names. It is similar to my crazy ditto feature. (I feel these two features would complement each other.)

void foo()
{
    something1.something2.something3.x = 1;
    something1.something2.something3.y = 2;
    something1.something2.something3.z = 3;
}

could be rewritten as:

void foo()
{
    something1.something2.something3 as bar;
    bar.x = 1;
    bar.y = 2;
    bar.z = 3;
}

It should have restrictions like the module rename:

  • Renamed names need to be valid. The names also need to be complete, i.e. no renaming of just half of a name.
  • Renamed names have to be used at least 2 (3, 4) times inside the function body.
  • A rename would be valid within the current context (whole function or code block).
  • There could be an artificial limit on the maximum number of rename definitions within a single function (e.g. 1 or 2).

Renaming within a function body should have some restrictions, so as not to confuse the reader:

void foo()
{
  bar().baz().x = 1;
  bar().baz().y = 2;
  bar().baz().z = 3;
}

could be renamed as:

void foo()
{
  bar().baz() as a;
  a.x = 1;
  a.y = 2;
  a.z = 3;
}

but it hides the function calls happening inside. I do not like this.

It is possible to go really wild, with parameters:

void foo()
{
  bar(10).baz().x = 1;
  bar(20).baz().y = 2;
  bar(30).baz().z = 3;
}

into:

void foo()
{
  bar(_).baz() as a; // the _ means free parameter
  a(10).x = 1;
  a(20).y = 2;
  a(30).z = 3;
}

but I dislike this even more.


as is visually unremarkable. Are there alternatives?

For example ===> could be used:

long-module-name ===> short-module-name;
void foo(int long-name ===> x) {}

The word alias could be used, as in D, if easy parsing requires it:

alias long-module-name = short-module-name;
void foo(int long-name alias x) {}

Enum type values

min, max, array:

enum MyEnum
{
  A,
  B = 100,
  C = 3
}

MyEnum.max => 100
MyEnum.min => 0
MyEnum.array => { A, B, C }
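What the compiler would generate for min/max/array can be emulated by hand in C with an X-macro; this sketch is only an illustration of the generated data, using an invented `MYENUM_VALUES` list.

```c
#include <assert.h>

/* Single source of truth for the enum's names and values. */
#define MYENUM_VALUES X(A, 0) X(B, 100) X(C, 3)

enum MyEnum {
#define X(name, val) name = val,
    MYENUM_VALUES
#undef X
};

/* The .array the proposal asks for, in declaration order. */
static const enum MyEnum myenum_array[] = {
#define X(name, val) name,
    MYENUM_VALUES
#undef X
};

enum { MYENUM_COUNT = sizeof myenum_array / sizeof myenum_array[0] };

/* The .max the proposal asks for, computed over the array. */
static enum MyEnum myenum_max(void)
{
    enum MyEnum m = myenum_array[0];
    for (int i = 1; i < MYENUM_COUNT; i++)
        if (myenum_array[i] > m) m = myenum_array[i];
    return m;
}
```

A built-in feature would make this boilerplate and its maintenance cost disappear entirely.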

Add $assert

$assert runs compile-time asserts. This is static_assert in C++.
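C11 already provides this as _Static_assert, so a $assert could map directly onto it; the struct below is just an illustrative example.

```c
#include <assert.h>

/* Compile-time checks: the build fails if a condition is false. */
_Static_assert(sizeof(int) >= 4, "int must be at least 32 bits");

struct Header { unsigned int magic; unsigned short version; };
_Static_assert(sizeof(struct Header) >= 6, "Header lost a field?");
```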

String switch

string x = ...
switch (x)
{
  case "a": ...
  case "foo": ...
}
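In today's C this has to be written as a strcmp chain; a string switch is essentially sugar for it (a compiler could lower it to hashing or a trie instead). `classify` is an invented example function.

```c
#include <assert.h>
#include <string.h>

/* What a string switch lowers to in plain C: an if/else chain. */
static int classify(const char *x)
{
    if (strcmp(x, "a") == 0)   return 1;  /* case "a"   */
    if (strcmp(x, "foo") == 0) return 2;  /* case "foo" */
    return 0;                             /* default    */
}
```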

operator precedence

Right now the documentation presents this simple table:

  1. (), [], ., postfix ++ and --
  2. prefix -, ~, prefix *, &, prefix ++ and --
  3. infix *, /, %
  4. <<, >>
  5. ^, |, infix &
  6. +, infix -
  7. ==, !=, >=, <=, >, <
  8. &&, ||
  9. ternary ?:
  10. =, *=, /=, %=, +=, -=, <<=, >>=, &=, ^=, |=
  11. ,

It feels like part of some mnemonic quiz.

Is this thing really needed? After decades of programming, I am still unable to recall the operator priorities in C.


What do the different languages do?

  • One implementation of Prolog has 1,000 priorities available for custom operators. I am not kidding. Nobody uses Prolog.

  • Smalltalk does not use priorities: 1 + 2 * 3 is 9, not 7. If you don't like it, add parenthesis. Nobody uses Smalltalk.

  • Lisp does not have operators and is missing the whole problem. Unfortunately, nobody uses Lisp.


What would be the simplest, most natural and most readable solution?

Use parentheses for all binary operators, always, without exception. It clears up all doubts, ambiguities and misunderstandings. Sane programmers do it already. Programmers coding in several similar languages must do it.

Examples:

// When parentheses are not needed:
a + b + c // does not need parentheses, evaluated left to right, as (a + b) + c
a - b - c  // ditto
a / b / c  // ditto
a * b * c  // ditto

// When parentheses are required:
a + (b * c)
(a + b) * c
if ((a < b) && (c <= d)) ...
(a << 2) + 1
x *= (a * 2)

In addition to simplicity and readability, there is one more advantage: the order of evaluation can be fixed (unlike in C).
The compiler then won't be able to reorder expressions, bringing in unexpected integer overflow/underflow or loss of FP precision. This matters if the language doesn't otherwise care about overflow/underflow. (An important topic which should be documented in detail.)

The resulting C code could reflect this predictability. (This is yet another important topic. Do C promotion rules impact the result of C3 expressions, or not?)


Are there some disadvantages?
A big one: no IOCCC-like contests.
If an expression has way too many parentheses, that is a signal to split it into simpler parts.


Btw, the comma operator in C is a misfeature and should not be allowed.

Create main wrapper for nicer main parameters.

Main should be either of the following:

func void main()
func void main(string[] args)
func void main(char*[] args)
func int main()
func int main(string[] args)
func int main(char*[] args)

This is done by creating a _main which calls the correct main. This will also fix #35.

Static initializers

Static initialization is described in the idea section.

Globals and constants should be possible to initialize automatically at startup.

Expanded testing framework

We need the following tests:

  1. Semantic and syntax error checks: that certain code generates the correct syntax / semantic error
  2. Conversely, that some constructs pass semantic analysis.
  3. Check LLVM IR output.

Bitstruct

bitstruct : int
{
  int foo : 3;
  int bar : 2;
  bool x : 1;
}

Enum functions

module foo;
enum MyEnum
{
  A,
  B = 100,
  C = 3,
}

MyEnum.fromOrdinal(1) => MyEnum.B
C.ordinal() => 2
MyEnum.fromName("A") => MyEnum.A
B.name() => "B"
enum.fromFullName("MyEnum.A") => MyEnum.A
B.fullName() => "MyEnum.B"
enum.fromQualifiedName("foo::MyEnum.C") => MyEnum.C
A.qualifiedName() => "foo::MyEnum.A"

errors

Here is a proposal for better error syntax. It is in the spirit of the existing mechanism, but simpler (one keyword instead of five), cleaner (less visual noise) and safer (the compiler keeps track of all errors).

It is safer, but not 100% safe against certain bugs, just like the previous mechanism.

A step-by-step explanation follows, each step with an example.


This is a function which neither produces any error nor receives any error from within the function body. The compiler records this important property somewhere.

fn void foo()
{
}

This is a function which may produce an error. The error does not need any previous declaration or other ceremony. The function signature does not need to be modified.

The compiler knows that the function could produce an error, and notes it somewhere.

fn void foo()
{
   if (...) {
      //
      // err is THE ONLY keyword of error mechanism
      //
      return err.MY-ERROR-OCCURRED; 
   }
}

This function could produce several different errors. They do not need to be predefined or belong to some group, nothing like that. The compiler notices all possible errors and keeps tabs on them. No error information is lost or ignored.

fn void foo()
{
   if (...) {
      return err.WHATEVER1; 
   }

   if (...) {
      return err.WHATEVER2; 
   }
}

If some code calls a function which could produce one or more errors, this code MUST handle these errors.

Forgetting any error won't be allowed. This would be a compiler error:

void foo() { return err.XYZ; }

void bar()
{
  foo();
}   // <== unhandled error (XYZ) at this point, compilation fails

Here the function bar doesn't handle the error that could be received from foo. The compiler reports this and stops.


The simplest way to handle an error is to move it up the call stack.

The compiler notices that this new function now produces errors, and which ones. It all gets recorded somewhere.

void foo() { return err.XYZ; }

void bar()
{
  foo();
  foo();
  foo();

  // equivalent of C++: catch(...) { throw; }
  if (err) { 
    return err;
  }
}

The compiler now knows that the function bar may produce the error XYZ (from foo).

Every possible error is rethrown in the bar function, by:

  if (err) { 
    return err; // safe, no error information lost
  }

The compiler is smart enough to recognize the meaning of if (err).


The function may decide to handle received errors.

void foo()
{
  if (...) return err.X;
  if (...) return err.Y;
  if (...) return err.Z;
}

void bar()
{
  foo();
  foo();
  foo();

  switch (err) {
   case X: ;  // error consumed, is no more
   case Y: return err; // this error is rethrown up the call stack
   case Z: return err.SOMETHING-ELSE; // different error is produced
  }
}

The compiler sees that every single possible error is handled by the switch. This is good; otherwise the compiler would complain.

The compiler also notices that function bar may produce errors Y and SOMETHING-ELSE.


Catching and handling all errors in one big switch may not always be the desirable design.

void foo()
{
  if (...) return err.X;
  if (...) return err.Y;
  if (...) return err.Z;
}

void bar()
{
  foo();
  ...
  if (err == X) { } // error X handled (by swallowing it), Y and Z still active
  ...

  // all remaining possible errors handled now
  switch (err) {
   case Y: ;
   case Z: ;
  }
}

The compiler would keep track of all errors throughout the function body, removing those that were handled and adding new ones. At the end of the function, no error is allowed to disappear without handling.


The compiler would guard against non-existent errors.

void foo()
{
  if (...) return err.X;
  if (...) return err.Y;
  if (...) return err.Z;
}

void bar()
{
  foo();
  ...
  if (err == TYPO) { }  // here compiler complains. No such error could be caught here.
}

This fails to compile.


Error handling would be intuitive with regard to scopes.

void foo()
{
  if (...) return err.X;
  if (...) return err.Y;
  if (...) return err.Z;
}

void bar()
{
  foo();
  ...
  if (...) {
    foo();
    if (err == X) {} // handles X from the second foo
  }
  ...
  if (err == X) {} // handles X from the first foo
  if (err == Y) {} // handles Y from both calls
  if (err == Z) {} // handles Z from both calls
  // every error handled by now
}

The traditional rules for using goto would need to be observed for errors, e.g. an error cannot skip the initialization of a value used later.


The main function has to catch all errors happening inside.

Every thread main function has to catch all errors happening inside.


The address of a function which can produce errors cannot be taken. This fixes otherwise unsolvable problems with function pointers. I'd thought a lot about it, and there's no other way.

This way, the address of a function which can return an error cannot be passed into C code. Another potential disaster avoided.

void foo() { return err.FOO; }

void bar()
{
  &foo; // NO WAY
}

If there's a single thing from this proposal that should be accepted, it is this.


Errors should not be allowed to leave any TEST. (There may be a demand to allow tests to fail by not handling a certain error. I do not like this, but it is up for consideration.)

void foo()
{
  if (...) return err.X;
  if (...) return err.Y;
  if (...) return err.Z;
}

TEST()
{
  foo();
  if (err == X) { assert(false); };
  foo();
  switch (err) {
  case X:; // from the second call
  case Y:; // from both calls
  case Z:; // from both calls
  }
 // no error left here
}

Using errors does not mean 100% safety against mistakes. It is still possible to leak resources while using errors.

void bar()
{
  consume-two-pointers(allocate-something(), allocate-something());

  switch (err) {
  ...
  } 
}

If the first pointer is successfully allocated and the second call fails with an error, then the first pointer leaks.

The solution is to test the code thoroughly, to find out such situations, and then rearrange the code.

It should eventually look like this:

void bar()
{
  void* p1 = 0;
  void* p2 = 0;

  p1 = allocate-something();
  p2 = allocate-something();

  consume-two-pointers(p1, p2);

  if (err) {
    if (p1) free(p1);
    if (p2) free(p2);
  }

  switch (err) {
  ... // handle individual errors
  }
}

The defer catch construct can look like:

defer if (err) return err; // any error rethrown

This would be an exception to the general ban on using return inside defer.

defer if (err == XYZ) return err.ABC;  // specific error consumed, new one returned

and

defer switch (err) {
  case ... // all errors that can occur need to be handled here
  case ...
  }

The compiler would need to ensure that defer processes all unhandled errors, and does not try to deal with already handled errors. That would be a bug.

Since this could get tricky fast, really good compiler error messages are needed here.


Explanations of why I chose or avoided certain features.

  • Instead of throw XYZ I use return err.XYZ. One keyword less, readability remains.
  • I do not use error sets. Not needed. The compiler knows all the names and won't allow any mistake. A documentation generator or IDE can find all possibly produced errors and show them. Another keyword gone.
  • The throws function annotation is either useless or too detailed, and it seems not to be checked by the compiler. I made it shorter and less noisy, saved one keyword, and have it all verified by the compiler.
  • I replaced catch with switch (err). One keyword less, readability remains.
  • The try keyword is not very helpful. I got rid of it. The compiler makes sure that nothing goes unnoticed; this is better than noise spread throughout the code.
  • Catching an error subset is a dangerous misfeature, prone to blow up unexpectedly.
  • Similarly, I do not allow a default branch in switch (err). Too dangerous.
  • try ... else is superfluous and visually rather confusing. I use better ways.

Not allowing taking the address of an error-returning function is the only way to deal with the problem. I'd thought about it for a long time, and unless one spends a lot of time inventing some very complicated tricks, it cannot be done correctly. Safer interfacing with C is then a welcome side effect.


Minor considerations:

  • My solution does not require a "one size fits all" internal implementation of errors with a union. The potentially better ABI would still be stable and predictable.

  • Converting an error into name string is still possible, though I see little use for this.

  • Converting an error into an integer is still possible, though I see little use for this. The compiler would assign a unique, nonzero number to every error mentioned in the project. When all source files are built together, this is trivial.

  • The compiler needs to be vigilant against errors getting lost. Examples:

    void foo() { return err.XYZ; }
    
    void bar()
    {
     foo();
    
     if (err) { 
       if (...) return; // <<== not allowed, some error may disappear 
                        // without explicit handling
       return err;
     }
    
     foo();
     if (err) {
       return err.MY-NEW-ERROR; // <<== not allowed to replace 
                               // many possible errors w/o explicit handling
     }
    }
    
    void baz()
    {
     foo();
    
     if (err == XYZ) {  // <<== explicit handling occurred here
       if (...) return; // OK
       return err;
     }
    }
    
    
    
    void foo2() { return err.XYZ2; }
    
    void baz2() {
     foo();
     if (err) {
       foo2(); // can have error, this has to be handled
       if (err) {
         return err; // produces XYZ2 from foo2
       } 
       return err; // produces XYZ from foo
     }
    }
    
  • "No memory" error may receive special handling. Some projects may decide not to handle low memory situations.

    There could be an option in the project configuration file, in its "global settings" part. If the option "on-low-memory-do-exit" is selected, then allocation functions (there has to be some way to annotate them as such) won't produce the usual NO-MEMORY error but will terminate the process. All existing code (e.g. in foreign libraries) that handles the NO-MEMORY situation could then be safely ignored by the compiler.

  • There should also be a project configuration option to disallow errors entirely. If I do not like this feature, I could ban it, and no code with error handling would compile.

Safe varargs

func void test(int... foo)
{
   printf("args = %d\n", foo.len);
}

Auto conversion to a subarray.

enums

Enums, as they are now, are better than in C, but could use a few more improvements:

  • switch on an enum should be exhaustive. Possibly even stricter: the default branch would not be allowed. This is to avoid errors where you add a new enum value and some place quietly becomes buggy.
  • enums which are used as named numbers should require all enum items to have an explicit numeric value. No funny autoincrement and its restarts, no hidden bugs. No iota.
  • enums with numeric values should always be sorted in ascending order when defined, just to minimize potential bugs.
  • enums with numeric values could also be (distinct) floats. The limitation to integers is artificial.
  • enums without a numeric value should be symbols only, i.e. with no way to extract a number out of them. Example: enum cardinal-directions { NORTH, SOUTH, EAST, WEST }. They could be used as parameters, in switch, etc., but not as named numbers. Internally, they would be implemented as numeric enums.
  • repeated enum qualification in one switch branch may not be needed:
switch (h) 
    {
        case Height.LOW, MEDIUM: // here the second appearance of enum name is omitted
            io.printf("Not high");
            // Implicit break.
        case Height.HIGH:
            io.printf("High");
    }

Token stream / array

Instead of parsing tokens on demand, store tokens in an array. For this, it's important that expanding the array is cheap and that a token can be cheaply retrieved by id.

Generic declaration

generic foo(x)
{
  case int:
    bar(x);
  case double:
    baz(x);
}

generic foo(float x)
{
  return blurb(x);
}

whitebox testing

This is a description of a feature which would depend on the proposed testing infrastructure.

Checking function output is easy:

TEST()
{
  verify(add(2, 2) == 4);
}

However, you may want to be sure of a few more things:

  • be sure that the function add allocates between 420 and 560 bytes of memory and that it also deallocates the same number of bytes.
  • be sure that the add function opens files foo.txt, then bar.txt and finally baz.txt, in this exact order.
  • be sure that this function doesn't use network.

Is it possible to check such a wide variety of things?

Yes. One would use a certain kind of logging. I first saw a detailed description here:
http://akkartik.name/post/tracing-tests

I'd implemented it as a C library, but I was disappointed with the results. Too much hassle for little gain. I came to the conclusion that compiler support is needed to make it usable in practice.

I'll describe the ideal system.


Any function could generate so-called traces: strings with (hopefully) useful information.

void* my_allocate(uint n)
{
   ...
   TRACE("my_allocate allocated %u B", n);
   ...
}

TRACE could use printf-style formatting. If debug mode is on (it would be too slow for release), tests are enabled (it is usable only inside tests), and some test is actually asking to record this string (I'll explain this later), then the resulting string ("my_allocate allocated 32 B") is stored somewhere.

If the above conditions are not all met, the compiler will ignore the trace, as if it was never there.

A function could generate multiple traces, or none at all. Fewer is usually better, for performance and for keeping the code clean of noise.


To reduce the visual noise, compiler could automatically insert invisible traces that a function was invoked.

This innocent empty function:

void foo()
{
}

would secretly turn into:

void foo()
{
  TRACE("foo().my-module-name"); // NOT "my-module-name.foo()"
}

This happens if and only if some test was asking for this information (will explain later) plus debug mode on plus tests on.

If the above conditions are not all met, compiler will not insert anything.

The generated text "foo().my-module-name" will be stored somewhere. Since it is a static string, only a pointer to it needs to be kept, not the string itself. This saves memory.

When I tried to write these function-is-called traces manually, it was (1) too much work, and (2) too much noise. Compiler support is necessary.


By now I covered the supply side.

Generated traces are "consumed" by tests.


This test is not interested in traces. Therefore, when it runs, no traces are stored. (Some flag is set to false, and every TRACE checks that flag.)

TEST() // totally uninterested in traces
{
}

Not storing traces makes the test really fast. Most tests should be like this.


Here I have a test which expresses interest in 2 different traces:

#ifdef DEBUG // traces make sense only in debug mode
TEST()
{
  record-traces("my_allocate allocated", "foo()");
  ...
}
#endif

If some TRACE string starts with "my_allocate allocated" or with "foo()", it will be recorded. By pure luck, both these strings are generated at some place. The compiler is happy, and will keep both those traces in the code (it won't remove them for "inactivity").

What if nobody generates one of these requested strings? Compiler error.


What if I make a mistake?

#ifdef DEBUG 
TEST()
{
  record-traces("qwwertyui");
  ...
}
#endif

No such trace could ever be generated (there's no prefix "qwwertyui" present in any TRACE, nor is there any function whose name starts with "qwwertyui"). The compiler announces an error.

This keeps the impact of typos down. Compiler support is necessary for this.


I'd previously expressed interest in 2 existing traces; now I run some tested code. The requested traces that get generated (and only these, not all possibly generated traces, to keep memory consumption down) are stored somewhere. Let's say that I expect a string starting with "my_allocate allocated" to be stored once and "foo()" twice. (In other words, I believe there has to be one allocation and then the function foo is invoked twice.)

Now I want to make sure my assumptions were right.

#ifdef DEBUG 
TEST()
{
  record-traces("my_allocate allocated", "foo()");

  ...
  ... // run code collecting the traces
  ...

  // This will try to find the first recorded trace which starts with this string.
  // If such trace is found, the complete string is returned.
  // If no such trace exists, NULL is returned.
  const char* s = expect_trace("my_allocate allocated");
  if (!s) {
    verify(false | my_allocate() not called at all, I expected allocating 32 B);
  }

  uint n = ... // extract the number of bytes from string "my_allocate allocated 32 B"
  if (n != 32) {
    verify(false | I expected 32 B allocated, got ?n B);
  }

  // Now I expect two "foo().my-module-name" traces to be found
  s = expect_trace("foo()");
  if (!s) {
    verify(false | First trace foo() not found);
  }

  s = expect_trace("foo()");
  if (!s) {
    verify(false | Second trace foo() not found);
  }

  // let's check there is no third foo() call
  s = expect_trace("foo()");
  if (s) {
    verify(false | Third, unexpected trace foo() found);
  }
}
#endif // DEBUG

The compiler will make sure that all traces asked for by the function expect_trace were previously requested by record-traces. This will keep typos and mistakes down.

The compiler could also check that all requested traces are later inspected, so that one doesn't record something clearly not needed.


When the current test ends, all collected traces are deleted. Compiler ensures this.

The next test may start another trace collection round.


This kind of whitebox testing is not something to be done on massive scale. It is intended for complicated situations.

If one doesn't need it, one won't be paying the price (in performance or in complexity of tests).

The traces inside functions would be either invisible (most would be added automatically) or limited to the most important events (allocations, I/O, network access). Very ordinary functions should not have explicit traces inside.

String

  • basically a subarray
  • allocate underlying array in various ways
  • conveniences

debugging

What are the current plans or ideas for debugging C3 code?

From what I know, languages transpiled to C have a big problem with this. Nim didn't have a usable debugger for many years. (I am not able to quickly find out their current situation.)

One relatively easy way would be to implement a language interpreter (bytecode or tree-walking) in addition to the C route. Debugging support could be placed there. This solution would also allow tricks like debugging compile-time code (which I am not sure other approaches would manage). On the other hand, calls to C code would be complicated.

Implement !! operator

According to the docs, someCall()!! should implicitly return the error if "someCall" returns an error.

Simd types

Possible syntax:

int[<3, 4>] x;
int<[3, 4]> y;
int<3, 4> z;
