hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/

License: Apache License 2.0

Rust 82.11% PowerShell 0.36% Batchfile 0.02% Assembly 17.51%

assembly assembly-language assembler instruction-set instruction-set-architecture processor-architecture microprocessor bytecode bytecode-compiler compiler

customasm's Introduction

Hey, there! Check out some of my other projects outside of GitHub:

Lorenzi's Jisho: Japanese-English dictionary, with a completely custom Single-Page App front-end library. I've put in a lot of effort for this library, working on good page switching and responsive design -- I think it turned out beautiful!
Game Boards: Online registry for team matches and tournament lounges of competitive games, like Mario Kart. It still uses an old front-end architecture... It's probably going to be the next thing I'll revamp!
...Some other projects can be found on my homepage!

Feel free to contact me:

Twitter @hlorenzi_

customasm's Issues

Support for character (ASCII) constants

Since adding a character LCD to my project, I often find myself needing to define a character constant. Currently, I need to do something like this:

ASCII_SPACE = 32

Would be really nice to be able to do this:

ASCII_SPACE = ' '

Optionally, it would be nice to use a character as an argument in an instruction and have it substituted with its ASCII value, e.g.:

mova ' '
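
The request boils down to evaluating a quoted character as its ASCII code. A minimal sketch in Rust, assuming a hypothetical helper named `char_literal_value` (this is not customasm's actual parser):

```rust
/// Parses a single-quoted character literal such as `' '` and returns its
/// ASCII code. Hypothetical helper for illustration; not customasm's parser.
fn char_literal_value(token: &str) -> Option<u8> {
    let inner = token.strip_prefix('\'')?.strip_suffix('\'')?;
    let mut chars = inner.chars();
    let c = chars.next()?;
    // Accept exactly one ASCII character between the quotes.
    if chars.next().is_some() || !c.is_ascii() {
        return None;
    }
    Some(c as u8)
}
```

With this, `ASCII_SPACE = ' '` would evaluate to the same value as `ASCII_SPACE = 32`.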

"Unknown Variable" error on Instruction names which are not meant to be variables

I got a few instructions which use 3 Registers as a sort of index with offset. Registers C and D being combined into CD to form the base index, and Register B being added as the offset.

so my instruction to load from an address pointed to by CD + B to some other address is:

LD (addr), (CDB)

and i also got some instructions that load from one address to another directly, which would be:

LD (addr), (addr)

and this must be why the assembler thinks that CDB is a variable, because it sees the second instruction as the correct one instead of checking if there is one with a more correct/relevant name.

and i cannot think of an elegant way to get rid of this.

i tried removing the brackets, i tried using different brackets, i tried adding characters inside CDB to make it not count as a variable, but nothing worked to tell the assembler "this is not a variable, it's text, search an instruction with exactly this text"

Errors do not go on stderr

Assembler errors do not go to stderr, which makes capturing output via -p needlessly problematic.
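
A sketch of the separation being asked for, in Rust. The function name `report` and its parameters are assumptions made for illustration; the point is only that diagnostics and assembled output go to different writers:

```rust
use std::io::Write;

/// Writes assembled output to `out` and diagnostics to `err`, so the two
/// streams can be redirected independently. Names are illustrative only.
fn report<O: Write, E: Write>(
    out: &mut O,
    err: &mut E,
    bytes: &[u8],
    errors: &[&str],
) -> std::io::Result<()> {
    for e in errors {
        writeln!(err, "error: {}", e)?;
    }
    // Only emit output when assembly succeeded.
    if errors.is_empty() {
        out.write_all(bytes)?;
    }
    Ok(())
}
```

In a real tool the two writers would be `std::io::stdout()` and `std::io::stderr()`.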

Weird Concatenation bug on release 0.9.1 and latest dev

Creating an instruction such as this one:
"do {value} -> 0x0 @ value[19:0]" results in a strange bug when value is 0x8000:
ff 80 00
This bug is very inconsistent. Formatting the instruction this way: "do {value} -> 0x1 @ value[19:0]" results in the bug disappearing with 0x8000, however doing this: "do 0x8000 | 0x1000" results in this output:
1f 90 00 when it should be 10 90 00
Experiments with different operations made the bug disappear as well. Placing the "or" operation in a variable and using that results in the same bug.

I really enjoy this program, and I would really love to use it. This bug, however, is giving me a headache as I try to work around it. Everywhere I turn it decides to come up again. Here's a list of things I've tried:
adjusting the #bits value in cpudef
using different numbers with value
splitting value[20:0] into value[20:16] @ value[15:0]
using different operators in the instruction definition, like | and ^

all of these resulted in the last 8 bits of the number being ff or 1f
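
For reference, the expected bytes can be worked out by hand. The Rust sketch below models slicing and concatenation on plain integers, assuming (as the expected output `10 90 00` implies) that a one-digit hex literal such as `0x1` is 4 bits wide. The helpers `slice`, `concat`, and `to_hex_bytes` are hypothetical and are not taken from customasm:

```rust
/// Extracts bits `hi..=lo` of `value`, as in `value[hi:lo]`.
/// Returns the extracted bits together with their width.
fn slice(value: u64, hi: u32, lo: u32) -> (u64, u32) {
    let width = hi - lo + 1;
    let mask = if width >= 64 { u64::MAX } else { (1u64 << width) - 1 };
    ((value >> lo) & mask, width)
}

/// Concatenates `(bits, width)` fields left to right, like the `@` operator.
fn concat(fields: &[(u64, u32)]) -> (u64, u32) {
    fields.iter().fold((0, 0), |(acc, total), &(bits, width)| {
        ((acc << width) | bits, total + width)
    })
}

/// Renders a value of `width` bits (a multiple of 8) as lowercase hex bytes.
fn to_hex_bytes(value: u64, width: u32) -> String {
    (0..width / 8)
        .rev()
        .map(|i| format!("{:02x}", (value >> (i * 8)) & 0xff))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `0x0 @ value[19:0]` with `value = 0x8000` gives `00 80 00`, and `0x1 @ value[19:0]` with `value = 0x8000 | 0x1000` gives `10 90 00`. The upper bits are never set, which suggests the `ff` and `1f` prefixes come from an unwanted sign extension somewhere.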

have the #cpudef separate from the ASM code

i got an idea

basically, instead of having to define the CPU for every ASM program you make, why not have the entire CPU definition in a separate file that just gets "called" by the program file

something like a file with a specific name, let's say "NES_CPU" which only contains the "#cpudef" part
then in the actual Assembly program instead of having to define the CPU you just use some command to call the file that has the CPU definition in it. something like "#cpu " in this case (if the NES_CPU file is in the same directory) it would be "#cpu NES_CPU"

this way if you want to make a new program, instead of having to copy the entire CPU definition you just use the #cpu command to call whatever CPU you want to use for the program.

of course this means the compiler also needs to know where the CPU definition is located but that is already defined in the #cpu command

Different length for instructions and data?

Hi, I'm working on an 8 bit computer with 16 bit (2 byte) instructions. Those 2 bytes are stored at the same address, so the issue I'm facing is that the pc from the assembler gets increased by 2, when it should only be increased by 1. This causes jumps to a label to not work.

The solution would be to use #bits 16 but that messes up the data memory. Is there any way to use #bits 16 on the program bank and #bits 8 on the data bank?

I currently have a workaround (accept an address as u9 instead of u8 and ignore the last bit), but I'd like to know if there is a native solution.

Thank you for this great program.

Runtime Error: unreachable executed

when assembling the below code:

#cpudef
{
    #bits 8
	byte {b} -> b[7:0]
    nop -> 0x00
	set {value}, {reg} -> 0x01 @ value[7:0] @ reg[7:0]
	jmp {address} -> 0x02 @ address[15:0]
	add {val} -> 0x03 @ val[7:0]
	dbg -> 0x04
	sub {val} -> 0x05 @ val[7:0]
	smem {pos},{val} -> 0x06 @ pos[15:0] @ val[7:0]
	rmem {pos},{reg} -> 0x07 @ pos[15:0] @ reg[7:0]
	memr {pos},{reg} -> 0x08 @ pos[15:0] @ reg[7:0]
}

nop

jmp .loop

.string:
	byte 'h'
	byte 'e'
	byte 'l'
	byte 'l'
	byte 'o'
	
.start:
	byte .string
	
.loop:
	rmem .start, 0
	dbg
	set .start, 0
	add 1
	memr .start, 0

it shows Runtime Error: unreachable executed

First declared instruction without parameter isn't recognized when using directive in `#ruledef`

Hi!

I have another weird bug to report: when I declare instructions without parameters (and sometimes even with), the said instruction isn't recognized. For instance:

#bits 8

#ruledef
{
    #labelalign 4

    ex   => 0x00
    cpy  => 0x01
}

main:
	ex

Doesn't compile. But if I just invert the declaration of ex and cpy:

#bits 8

#ruledef
{
#labelalign 4

cpy  => 0x01
ex   => 0x00

}

main:
ex

Now it compiles perfectly.

Everything works fine if I remove the #labelalign, so I guess it's the culprit, but I don't understand why. Also, I can put anything instead of #labelalign such as #blabla, this gives the same result - the #blabla is not recognized as invalid.

I think it may be closely related to #55, so feel free to close this issue if it is.

instruction asserts not selecting a previous instruction

The following code should work in all cases:

  jal {long} -> {
    assert((pc & 0xff00) != (long & 0xff00))
    0xef @ long[7:0] @ long[15:8]
  }
  jal {short} -> {
    assert((pc & 0xff00) == (short & 0xff00))
    0xdf @ short[7:0]
  }

It's checking to see if the current program counter address is in the same page (upper 8-bits) as the value to jump to. If it is, it can save a byte (and a cycle) and select the in-page jump. If not, it needs to select a long jump and emit an extra byte.
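
The intended selection logic, written out as a small Rust sketch. The opcodes and byte order follow the two rules above; the function name `encode_jal` is made up for illustration:

```rust
/// Encodes `jal target` as the short in-page form when `pc` and `target`
/// share the same upper 8 bits, and as the long form otherwise.
/// Opcodes 0xdf/0xef and the low-byte-first order follow the rules above.
fn encode_jal(pc: u16, target: u16) -> Vec<u8> {
    if (pc & 0xff00) == (target & 0xff00) {
        // Same page: opcode plus low byte only.
        vec![0xdf, (target & 0xff) as u8]
    } else {
        // Different page: opcode, low byte, high byte.
        vec![0xef, (target & 0xff) as u8, (target >> 8) as u8]
    }
}
```

Exactly one of the two assertions holds for any given `pc` and target, so one of the two rules should always match.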

This code only seems to ever select the second instruction, it won't select the first instruction. If I reverse the order, again it only seems to select the second instruction, if the assertion fails, it produces this:

error: failed to resolve instruction
 100 |     cmp #DUMP_CMD 
 101 |     bne _endthen_10 
 102 |       jal dump 
     |       ^^^^^^^^  
 103 |  
 104 |       jmp _begin_for_08 
     cpudef.s:533:5 533:46:
     error: assertion failed
      531 |   } 
      532 |   jal {short} -> { 
      533 |     assert((pc & 0xff00) == (short & 0xff00)) 
          |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  
      534 |     0xdf @ short[7:0] 
      535 |   }

The assertion should succeed on the other instruction but it doesn't want to select that for some reason. I haven't dived into the code to find out why.

Alignment feature -- Feature request

Hi,

I am very pleased with your program. However, I need some advice on the following matter.

If I set the #align to 8 bits, and need to have all my labels assembled to addresses divisible by 2 or 4 (like some ARM or MIPS processors), how can I achieve this?
For example:
label1:
#d8 1
label2:
#d8 2

If the label1 starts on the 0x10 address, then the label2 would start at the 0x11, but I would like to have it on the address 0x12 (so it would be on the even address, not the odd address).

So far, I was only able to do it manually, by inserting #d8 0 tokens to align the label to the proper address. Is there a way to achieve this automatically using the latest release?

If I align everything to 16 bits, then all my addresses are actually half the real addresses, because then everything is counted in words, not bytes ( I cannot use #d8 actually). I need to align everything to 8 bits.

Best regards,
Milan
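
The padding that is currently inserted by hand with `#d8 0` follows from simple round-up arithmetic. A minimal Rust sketch, with hypothetical helper names `align_up` and `padding`:

```rust
/// Rounds `addr` up to the next multiple of `align` (which must be non-zero).
fn align_up(addr: u32, align: u32) -> u32 {
    let rem = addr % align;
    if rem == 0 { addr } else { addr + (align - rem) }
}

/// Number of zero padding bytes to emit before a label at `addr`.
fn padding(addr: u32, align: u32) -> u32 {
    align_up(addr, align) - addr
}
```

For the example above, a label that would land on 0x11 with 2-byte alignment moves to 0x12 after one padding byte.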

Instruction size vs #bits

My little FPGA processor that I'm working on has 16 bit wide instructions but it can address individual bytes. This mostly works with "#bits 8" but it causes some confusion eg the instructions stream should always consist of 16 bit word quantities. Maybe there needs to be two quantities, the addressable width and the instruction width? The instruction width would influence the width of the outputted data.

All in all, this project is terrific and I look forward to seeing it progress!

I'm hoping I can have a crack at another output format, a VHDL array, instead of being lazy and munging another output format with perl or similar. But I need to learn some Rust first.....

Disassembler option?

One thing i would really like is the ability to disassemble existing Binary files.
it could be done with the Command prompt like assembling but would require more parameters
for example:

customasm -d <bin file to disassemble> <CPU file to describe the Instruction set> <starting address of the Program> <starting address inside the file where it starts to disassemble from (optional)> <end address in the file where it stops disassembling (optional)>

example: customasm -d test.bin 6502.cpu 0xE000

and the format for the file generated could be similar to other Disassemblers:

<Address> <Data> <some space> <Labels> <Instructions>
example, this 6502 Disassembler:

E001   A2 FF      LDX #$FF
E003   9A         TXS
E004   A9 00      LDA #$00
E006   AA         TAX
E007   A8         TAY

I know this format already kinda exists for Assembling, but without showing labels (which is kinda a shame) and the formatting sometimes breaks if you have a lot in a single line... plus it doesn't use uppercase only HEX, which is just Heresy.

obviously it would be a lot of work and i'm asking for a lot.
but hey maybe it could be a long term goal.

and one last thing, the current version added some kind of type specifications for values. it would be awesome if those were explained more in the documentation... i didn't see them there at least.

Web version doesn't work on edge

Clicking the assemble button has no effect on MS Edge. The console reports an error "TextEncoder" is not defined.

Listing Output

Great tool! (Kept me from having to write my own assembler for my CPU. :) )

Would it be hard to add an "annotated hex listing output"? I looked a little at the output formats, and it didn't look like any of them have this sort of support.

I'm thinking something like this:

### Input:

#addr 0x80
lda #10
.loop:
sbc #1
out A
bnz .loop ; loop til zero

### Output:

; File: <input>:0
; #addr 0x80
0x80  01 0A    ; lda #10
; .loop:
0x82  0B 01    ; sbc #1
0x84  2C       ; out A
0x85  27 FA    ; bnz .loop ; loop til zero

Basically, the output would be something you might get if you ran the output of the tool back through a disassembler. Except that the tool can cheat, since it could know exactly what line corresponds to each group of bytes that was emitted.

If you're watching a simulated CPU, it would be very easy to follow this listing by following your CPU's PC. It would also be massively useful for just telling if your program is assembling to what you expect. (Emitting proper instructions and so on.)

I wouldn't mind contributing some code to this, but I'm not very familiar with Rust (yet), and I'm even less familiar with the architecture of this tool, so I don't know if the assembler even retains enough information for this feature to be feasible. If it doesn't keep some sort of mapping between the output bytes and the source line, I can imagine this might be a pretty invasive change.

Maybe this would work best as an additional output? That way the assembler can just write to the "listing log" as it goes along, and doesn't have to remember a bunch of extra information to produce an output.
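
A rough Rust sketch of such a "listing log", assuming the assembler can record an address, the emitted bytes, and the source text for each line. The `Entry` type and `render_listing` function are hypothetical and do not reflect customasm's internals:

```rust
/// One listing entry: the address, the bytes emitted there, and the source line.
struct Entry {
    addr: u32,
    bytes: Vec<u8>,
    source: String,
}

/// Formats entries in the style proposed above. Lines that emit no bytes
/// (labels, directives) are shown as plain comments.
fn render_listing(entries: &[Entry]) -> String {
    let mut out = String::new();
    for e in entries {
        if e.bytes.is_empty() {
            out.push_str(&format!("; {}\n", e.source));
        } else {
            let hex: Vec<String> = e.bytes.iter().map(|b| format!("{:02X}", b)).collect();
            // Pad the byte column so the trailing comments line up.
            out.push_str(&format!("0x{:02X}  {:<8} ; {}\n", e.addr, hex.join(" "), e.source));
        }
    }
    out
}
```

Appending one `Entry` per source line as assembly proceeds keeps the mapping between bytes and source without any later reconstruction.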

Traditional macro assembler functions

At the end of the day, there's no escape from them.
It'd be nice if #define, #ifdef, #undef, and #ifundef were implemented, at least. Fullblown macros would be ideal, never-the-less. They'd be highly useful in larger projects.

"Expected Expression" Error

i'm back yet again with another confusing error! hope you missed me!

I'm currently working on building my own 65C02, and in order to test instructions i need code to test them with.

i wrote this simple piece to test the BBS/BBR instructions:

START:
	BBR 0,0,.TESTR
	HLT
	
	.TESTR:
	BBS 0,0,.TESTS
	HLT
	
	.TESTS:
	LDA #0x69
HLT

but i'm getting this error:

testing2.C02C:35:7 31:7:
error: expected expression
  29 |
  30 | START:
  31 |   BBR 0 ,0,.TESTR
     |        ^
  32 |   HLT
  33 |

testing2.C02C:35:7 35:7:
error: expected expression
  33 |
  34 |   .TESTR:
  35 |   BBS 0 ,0,.TESTS
     |        ^
  36 |   HLT
  37 |

and this is the part in the CPU file:

BBR {val},{src},{src1} -> {val}[3:0] @ 0xF[3:0] @ src[7:0] @ ({src1} - pc)[7:0]

BBS {val},{src},{src1} -> ({val} + 8)[3:0] @ 0xF[3:0] @ src[7:0] @ ({src1} - pc)[7:0]

also while we're at it, why are all of the "#" keywords or commands like "#d", "#str", etc. case sensitive?
and lastly is it possible to have a "#bankdef" fill all empty or unused bytes with a specific number instead of of just 0's?

Tokens as part of Instructions

I feel like i've done this already....
anyways, I'm currently writing the CPU file for the 68k (a painful project to be honest)
and it has an conditional Branch instruction where the condition used in the instruction is defined by the 2 letters following the instruction name (without space between them).

so the base instruction is: Bcc
where cc is condition. so BGE for example would be a Branch on Greater or Equal

so i thought i could just use tokens for that instead of having to write each instruction individually:

#tokendef CC
{
	CC = 0b0100
	CS = 0b0101
	EQ = 0b0111
	GE = 0b1100
	GT = 0b1110
	HI = 0b0010
	LE = 0b1111
	LS = 0b0011
	LT = 0b1101
	MI = 0b1011
	NE = 0b0110
	PL = 0b1010
	VC = 0b1000
	VS = 0b1001
}

	B{con:CC} {src}		-> {assert(src <= 0xFF), 0b0110[3:0] @ {con}[3:0] @ {src}[7:0]}
	B{con:CC} {src}		-> {assert(src >  0xFF), 0b0110[3:0] @ {con}[3:0] @ 0b00000000[7:0] @ {src}[15:0]}

but this doesn't seem to work when i try it, for example with: BVS 0xF3
but it does work when i use B VS 0xF3 instead... which is a problem

on a completely unrelated note,
Is it somehow possible to have a program start at a specific address without everything before being filled with 0's?
like when you write code for the 6502 your Program is likely going to be at the end of the address range, so you need to start it quite far down.
using "#addr" though is a problem as when you then generate the .BIN file it's filled with 0's until the code starts, which is annoying as it has to be edited out manually in order to be used.

Basically i'm asking for something like the ORG statement from a lot of other assemblers that tells the assembler that the code starts at that address but doesn't fill the rest of the address space from 0 to there.

Assert selects wrong opcode depending on label position

#cpudef
{
	jmp {address} -> { assert(address <=   0xFF), 0x01 @ address[7:0]  }
	jmp {address} -> { assert(address <= 0xFFFF), 0x02 @ address[15:0] }
}

bla:
jmp bla

Output: 01 00

jmp bla
bla:

Output: 02 00 03

This does work fine if you pass in a number like 0xAB or 0xBEEF. Apparently this issue happens throughout the program, anything above the current instruction gets the short version, anything below it gets the long version.

I love this program by the way, thank you so much for making it! 😁

Add symbol output

Hi, Thanks for the great product! I have been using it for couple of years.
I think that you don't have the following feature: symbol list. This means that when you assemble the output file, you could have an additional option, for example: -sym symbols.txt
That option would generate the symbol list (symbols.txt file) containing addresses of all labels in the output assembled file. Something like this:
0x00000000 start:
0x000000fa main:
etc.
That feature would help me implement debugger that would show labels instead of addresses.
I hope I was clear enough for you.
Best regards,
Milan
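
The requested file format is straightforward to produce once the label addresses are known. A Rust sketch, assuming the symbols are available as `(name, address)` pairs; `render_symbols` is a hypothetical name:

```rust
/// Renders a symbol table as `0x<address> <name>:` lines, sorted by address.
fn render_symbols(symbols: &[(&str, u32)]) -> String {
    let mut sorted = symbols.to_vec();
    sorted.sort_by_key(|&(_, addr)| addr);
    sorted
        .iter()
        .map(|(name, addr)| format!("0x{:08x} {}:\n", addr, name))
        .collect()
}
```

A debugger could then parse this file line by line to map addresses back to label names.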

Possible bug

Hi,
I am using this kind of instruction description:
add r0 -> 4'0x0 @ 4'0x1 @ 4'0x2 @ 4'0x3
add r1 -> 4'0x1 @ 4'0x1 @ 4'0x2 @ 4'0x3

Until now, it worked perfectly. However, when I added this description:
add r2 -> 4'0x0 @ 4'0x0 @ 4'0x8 @ 4'0x9
It produced: 0077, instead of 0089

When I described the instruction this way:
add r2 -> 4'0x0 @ 4'0x0 @ 8'0x89
It produced the correct code: 0089

Also, if I use this:
add r2 -> 4'0x2 @ 4'0x1 @ 4'0x8 @ 4'0x9
It produces the correct code: 2189

If I use this:
add r2 -> 4'0x0 @ 4'0x1 @ 4'0x8 @ 4'0x9
It produces the correct code: 0189

Only when I use:
add r2 -> 4'0x0 @ 4'0x0 @ 4'0x8 @ 4'0x9
it produces the wrong code: 0077

I have only three lines in the instruction description:
add r0 -> 4'0x0 @ 4'0x1 @ 4'0x2 @ 4'0x3
add r1 -> 4'0x1 @ 4'0x1 @ 4'0x2 @ 4'0x3
add r2 -> 4'0x0 @ 4'0x0 @ 4'0x8 @ 4'0x9

Am I missing something? I have created hundreds of lines this way and only discovered this problem by accident today.

Best regards,
Milan

thread main panic on bank overlap check

First of all: Thank you very much for that great assembler!

There is a thread main panic, during bank overlapping check, if the first bank has no outp set:

A simple program to recreate the error looks like this:
#include "6502.cpu"

#bankdef "data"
{ #addr 0x0000, #size 0x2000 }

#bankdef "pgm"
{ #addr 0x2000, #size 0xC000, #outp 0 }

#bank "pgm"
P: #d8 0

#bank "data"
Q: #res 1

The stack backtrace produced is this:
stackbacktrace.txt

The error disappears if you define the bank without outp after the bank with outp.

I believe the error is in file assembler.rs function check_bank_overlap. The check being done on line 244 for self.bankdefs[j] should be repeated at line 249 for self.bankdefs[i].
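
To illustrate the suggested fix without quoting the real source: the Rust sketch below is not the actual `check_bank_overlap` code, and the `Bank` struct and `find_output_overlap` function are invented for the example. It shows an overlap check that skips banks without `outp` on both sides of the comparison:

```rust
/// A bank with a size and an optional position in the output file.
struct Bank {
    size: usize,
    outp: Option<usize>,
}

/// Returns the first pair of banks whose output ranges overlap.
/// Banks without `outp` occupy no output space and are skipped whether
/// they appear as the first or the second bank of a pair.
fn find_output_overlap(banks: &[Bank]) -> Option<(usize, usize)> {
    for i in 0..banks.len() {
        let start_i = match banks[i].outp {
            Some(s) => s,
            None => continue,
        };
        for j in (i + 1)..banks.len() {
            let start_j = match banks[j].outp {
                Some(s) => s,
                None => continue,
            };
            let end_i = start_i + banks[i].size;
            let end_j = start_j + banks[j].size;
            if start_i < end_j && start_j < end_i {
                return Some((i, j));
            }
        }
    }
    None
}
```

With the two banks from the example program ("data" without `outp`, "pgm" with `#outp 0`), this returns no overlap regardless of declaration order.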

Missing "address out of bank range" check on instructions addresses

The error "address out of bank range check" is not triggered when an instruction goes out of bank range.
The program is being assembled and the instructions are placed out of the bank boundaries (causing several problems in my case, of course).
The check is being done for labels or data directives but not for instructions.

I already submitted a PR ( #37 ) and created two new unit tests, check them out:

test("test -> 0x12", "#bankdef \"hello\" { #addr 0, #size 3, #outp 0 } \n #res 1 \n test \n test",  Pass((4, "001212")));

test("test -> 0x12", "#bankdef \"hello\" { #addr 0, #size 2, #outp 0 } \n #res 1 \n test \n test",  Fail(("asm", 4, "out of bank range")));
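
The check these two tests exercise can be sketched in Rust as follows. This is an illustration of the idea only, not the code from the PR, and `check_in_bank` is a hypothetical name:

```rust
/// Checks that `len` units written at `offset` stay inside a bank of `size`.
/// Returns the new offset on success.
fn check_in_bank(offset: usize, len: usize, size: usize) -> Result<usize, String> {
    let end = offset + len;
    if end > size {
        Err(format!("out of bank range: {} > {}", end, size))
    } else {
        Ok(end)
    }
}
```

Applied to `#res 1`, `test`, `test` (one unit each), a bank of size 3 accepts all three writes, while a bank of size 2 rejects the last one.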

Unclear License

There is no license file. It's unclear if it can be distributed as part of something else.

Custom register names -- Feature request

Hello,
I am developing a custom cpu with the following syntax:
mov r0, r1 -> 4'0x1 @ 4'0x0 @ 4'0x0 @ 4'0x1
mov r3, r2 -> 4'0x2 @ 4'0x3 @ 4'0x0 @ 4'0x1

Is it possible to make a rule like this:
mov r{d}, r{s} -> s[3:0] @ d[3:0] @ 4'0x0 @ 4'0x1

I have eight registers and this would make my syntax file significantly shorter.
Best regards,
Milan

Usage as a library

Hi,

I'm currently developing a virtual machine library that uses custom instructions, which means a custom assembly language.

I've used CustomASM for a while now using the CLI and I really enjoy the possibility of the projects (alignment, labels, arithmetic, etc.), but I'd like to integrate it to my library. Sadly, when I looked at your files, it doesn't seem you expose enough functions to enable concrete usage as a library :/

Do you have any plan to enable such usage? It'd be really great to be able to do something like:

let src: &'static str = "<<< asm code here >>>";
let hex: Vec<u8> = customasm::assemble_to_hex(src).unwrap();

println!("Hex: {:02x?}", hex);

Thanks in advance for your answer :)

!128 calculated incorrectly

For some reason, !128 is "00000000 01111111" when it should be "11111111 01111111".

Interestingly, !256 is "11111110 11111111" and !64 is "11111111 10111111".

I did manage to fix it by making bigint_not look like this:

fn bigint_not(x: BigInt) -> BigInt
{
	bigint_bitmanipulate(x, BigInt::zero(), |a, _b| !a)
}

But my rust skills are pretty much nil and I am sure there is a more elegant solution, otherwise I would cook up a PR for you.
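
The expected bit patterns can be cross-checked against Rust's built-in two's-complement NOT on a fixed-width integer. This is only a reference calculation (the helper name `not_as_16_bits` is made up), not a replacement for the BigInt code:

```rust
/// Bitwise NOT of `x`, shown as a 16-bit two's-complement bit pattern
/// split into two groups of 8 bits.
fn not_as_16_bits(x: i16) -> String {
    let bits = format!("{:016b}", (!x) as u16);
    format!("{} {}", &bits[..8], &bits[8..])
}
```

For 128 this yields `11111111 01111111`, matching the expected value above, and it agrees with the reported results for 256 and 64.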

"potentially ambiguous token after parameter" error [MC68000]

while working on my 68k CPU file, which is getting more and more stressful, i came across this:

error: potentially ambiguous token after parameter; try using a separating `,`
 144 |
 145 |   ;  Dn <- Dn + (D16/32)
 146 |   ADD.B {dest:D_REGS}, {src}.W                      -> 0b1101[3:0] @ {dest}[2:0] @ 0b000[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
     |                             ^

this is a problem because that is just how the 68k works.
0x00000000.W is interpreted as a word (16 bit), and 0x00000000.L is interpreted as a long (32 bit)

here the snippet from the CPU file:

;	Dn <- Dn + (D16/32)
	ADD.B {dest:D_REGS}, {src}.W			-> 0b1101[3:0] @ {dest}[2:0] @ 0b000[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
	ADD.W {dest:D_REGS}, {src}.W			-> 0b1101[3:0] @ {dest}[2:0] @ 0b001[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
	ADD.L {dest:D_REGS}, {src}.W			-> 0b1101[3:0] @ {dest}[2:0] @ 0b010[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
	ADD.B {dest:D_REGS}, {src}.L			-> 0b1101[3:0] @ {dest}[2:0] @ 0b000[2:0] @ 0b111[2:0] @ 0b001[2:0] @ {src}[31:0]
	ADD.W {dest:D_REGS}, {src}.L			-> 0b1101[3:0] @ {dest}[2:0] @ 0b001[2:0] @ 0b111[2:0] @ 0b001[2:0] @ {src}[31:0]
	ADD.L {dest:D_REGS}, {src}.L			-> 0b1101[3:0] @ {dest}[2:0] @ 0b010[2:0] @ 0b111[2:0] @ 0b001[2:0] @ {src}[31:0]
	
	ADD.B {dest:D_REGS}, {src:s16}			-> 0b1101[3:0] @ {dest}[2:0] @ 0b000[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
	ADD.W {dest:D_REGS}, {src:s16}			-> 0b1101[3:0] @ {dest}[2:0] @ 0b001[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
	ADD.L {dest:D_REGS}, {src:s16}			-> 0b1101[3:0] @ {dest}[2:0] @ 0b010[2:0] @ 0b111[2:0] @ 0b000[2:0] @ {src}[15:0]
	ADD.B {dest:D_REGS}, {src:i32}			-> 0b1101[3:0] @ {dest}[2:0] @ 0b000[2:0] @ 0b111[2:0] @ 0b001[2:0] @ {src}[31:0]
	ADD.W {dest:D_REGS}, {src:i32}			-> 0b1101[3:0] @ {dest}[2:0] @ 0b001[2:0] @ 0b111[2:0] @ 0b001[2:0] @ {src}[31:0]
	ADD.L {dest:D_REGS}, {src:i32}			-> 0b1101[3:0] @ {dest}[2:0] @ 0b010[2:0] @ 0b111[2:0] @ 0b001[2:0] @ {src}[31:0]

the top 6 instructions are the same as the bottom 6, except that the size of the address is forced by the programmer (either .W(ord) or .L(ong))
the bottom 6 automatically use the best address size, if the address fits inside a 16 bit signed integer it uses the .W(ord) addressing mode, if it doesn't fit it uses the .L(ong) addressing mode.
(at least i hope instruction priority works from top to bottom and that i'm using the s16 and i32 things correctly)

currently my only real way around this is to comment out the forced address sizes and only have the automatically deciding ones. technically the forced ones aren't even necessary so i don't know if this is worth "fixing".

Endianness directive?

It's not currently possible to specify the endianness of the target architecture, which could lead to confusion for datatypes larger than its #bits size.
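
The ambiguity is easy to see with a 16-bit value on an 8-bit target. A small Rust sketch using the standard `to_le_bytes` and `to_be_bytes` methods; the wrapper `emit_u16` is hypothetical:

```rust
/// Splits a 16-bit value into bytes in the requested byte order.
fn emit_u16(value: u16, little_endian: bool) -> [u8; 2] {
    if little_endian {
        value.to_le_bytes()
    } else {
        value.to_be_bytes()
    }
}
```

The value 0x1234 becomes `34 12` in little-endian order and `12 34` in big-endian order, so a directive would need to pick one.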

Add C source file output format

Hi, this is the tool I was looking for. Thanks. Great for custom FPGA processors.

Could you please make a --format "c" which creates a c source file
which contains an array with the binary?

This would make it easier to integrate the binary into xilinx sdk or altera.

Regards
Thomas
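
One possible shape for such an output, sketched in Rust. The array type, the layout, and the function name `to_c_array` are assumptions; the request does not specify them:

```rust
/// Renders `data` as a C array definition. The array name is chosen by the
/// caller; `data` is expected to be non-empty.
fn to_c_array(name: &str, data: &[u8]) -> String {
    let items: Vec<String> = data.iter().map(|b| format!("0x{:02x}", b)).collect();
    format!(
        "const unsigned char {}[{}] = {{ {} }};\n",
        name,
        data.len(),
        items.join(", ")
    )
}
```

The resulting text can be written to a `.c` file and compiled alongside the rest of a firmware project.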

Feature Request: Disassembly

Would it be possible to add the ability to disassemble compiled bytecode using a ruleset?

stack overflow error when trying to assemble files with recursive include directives

steps to reproduce:

create file a.asm with contents #include "b.asm"
create file b.asm with contents #include "a.asm"
execute customasm a.asm

output of customasm a.asm:

customasm v0.10.6
assembling `a.asm`...

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

I think the assembler should terminate gracefully instead of crashing. (maybe when an include depth limit is reached?)
The error also occurs when a file includes itself, maybe this shouldn't be allowed in the first place?
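
Instead of a depth limit, the cycle can be detected directly by tracking which files are currently being processed. A Rust sketch, assuming a hypothetical `check_includes` function and an in-memory map standing in for the file system:

```rust
use std::collections::HashMap;

/// Walks `#include` edges depth-first and reports the first cycle found.
/// `includes` maps a file name to the files it includes; it stands in for
/// reading real files. `stack` holds the chain of files being processed.
fn check_includes<'a>(
    file: &'a str,
    includes: &HashMap<&'a str, Vec<&'a str>>,
    stack: &mut Vec<&'a str>,
) -> Result<(), String> {
    if stack.contains(&file) {
        return Err(format!("recursive include: {} -> {}", stack.join(" -> "), file));
    }
    stack.push(file);
    if let Some(children) = includes.get(file) {
        for &child in children {
            check_includes(child, includes, stack)?;
        }
    }
    // Done with this file; it may legitimately be included again elsewhere.
    stack.pop();
    Ok(())
}
```

Because a file is removed from the stack once it is finished, including the same file from two different places is still allowed; only true cycles (including a file that includes itself) are rejected.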

Specifying literals size

Hi there,

I'm using CustomASM for quite a bit of time now and I found it really enjoyable to use.
Still, I have one problem that is quite problematic to me: it's not possible to specify the maximum size of literals.

To understand the problem, let's consider the following ASM definition:

#cpudef
{
    #bits 8
    
    example {value} -> 0x10 @ 0x00 @ 0x00 @ value[7:0]
}

As you can see, the example instruction only uses 1 byte of its only operand.
Now let's consider we use this instruction like this:

main:
  example 0xAB
  example 0xABCD

This code will produce the following output:

 outp | addr | data

  0:0 |    0 |                ; main:
  0:0 |    0 | 10 00 00 ab    ; example 0xAB
  4:0 |    4 | 10 00 00 cd    ; example 0xABCD

The problem is we lose here the most significant byte of the operand in the second call. This is an error that can happen, but after compiling if we do not read the produced assembly we'll just be stuck with a bug that's hard to debug.

So I think it would be really useful to specify the maximum size of a literal, and throw a compilation error if the provided one exceeds this size. Something like this for instance:

#cpudef
{
    #bits 8
    
    example {value{1}} -> 0x10 @ 0x00 @ 0x00 @ value[7:0]
}

Or whatever syntax you may choose. In the case the literal is higher than 0xFF, an error would be thrown to indicate the literal is too big.

I don't know if this is complicated to implement though.
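
Whatever syntax is chosen, the underlying check is a simple range test. A Rust sketch with a hypothetical helper `checked_fit`:

```rust
/// Returns `value` if it fits in `bits` bits, otherwise an error
/// (instead of silently dropping the upper bits).
fn checked_fit(value: u64, bits: u32) -> Result<u64, String> {
    if bits < 64 && (value >> bits) != 0 {
        return Err(format!("value 0x{:X} does not fit in {} bits", value, bits));
    }
    Ok(value)
}
```

With an 8-bit limit, `0xAB` passes and `0xABCD` is rejected rather than being truncated to `cd`.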

Error "Instruction size did not converge after iterations" [v0.11.1]

sadly the new update isn't perfect... (the code assembles perfectly with v0.10.6)
here the complete output:

error: instruction size did not converge after iterations
 --> Testing2.R816:12:3:
 10 |
 11 |   .LOOP:
 12 |     LD R0, (X)
    |     ^^^^^^^^^^
 13 |     LD (SCRN_START, X), R0
 14 |     INC XL

here the CPU file: https://pastebin.com/RUUadTQK
and the code i tried to assemble: https://pastebin.com/uakEHJjY

[v0.10.4] "thread 'main' panicked at 'at partial', src\syntax\parser.rs:89:6" Error

strangely been getting these Rust Errors instead of regular Assembler errors like "no matching instruction"

no idea if i just got worse at programming that it makes the Assembler crash, or the Assembler got some issue.

here the file i tried to assemble and the CPU file:

CPU FILE

PROGRAM

No way to sanely represent highly-CISC instruction sets

Some instruction sets, like that of the DEC VAX, or even x86-64, are currently not representable in customasm without a insanely large cpudef. This could be solved in a few ways, the most doable of them being a way to describe a operand mode as a pattern, so, for example, the instructions of the VAX, which have 26 possible operand modes in any particular operand spot, could be represented as so:

#opdef op_byte
{
   ${lit} -> 0b11 @ lit[0:5]
   {reg: gpr} -> 0x5 @ {reg}
   ({reg: gpr}) -> 0x6 @ {reg}
   -({reg: gpr}) -> 0x7 @ {reg}
   ({reg: gpr})+ -> 0x8 @ {reg}
   #{imm: i8} -> 0x8F @ {imm}
   @({reg: gpr}+) -> 0x9 @ {reg}
   @#{imm: i8} -> 0x9F @ {imm}
   b^{disp: s8}({reg: gpr}) -> 0xA @ {gpr} @ {disp}
   ...
}
...
   addb2 {a: op_byte}, {o: op_byte} -> {
      0x80 @ a @ o
   }
   addb3 {a: op_byte}, {b: op_byte}, {o: op_byte} -> {
      0x81 @ a @ b @ o
   }
   addw2 {a: op_word}, {b: op_word}
   addw3 {a: op_word}, {b: op_word}, {o: op_word}
   addl2 {a: op_longword}, {b: op_longword}
...

Of course, this still wouldn't work (as the VAX has variable width operands, syntax that's probably incompatible with customasm, and floating point support.), but it'd be a step closer.

Add the ability to access symbols that are part of subsymbols

Example

;Access to another file, this would allow more complex organization
mov !STRINGS.__some_interesting_string, r48

Specifically allow the dot to be used to access sublabels

STRINGS:
.__some_interesting_string:
#str "I am interesting\n\0"

And for example allow scoping of labels so that higher labels are accessible from within sublabels

"pattern clashes with a previous instruction pattern" error

I'm stupid again and cannot see why this error occurs.

https://pastebin.com/0zQz353f

it says that with line 131, if i remove that it will say the same thing for the instruction below. and so on. if i remove all "REG <- REG" instructions it throws out that error again for line 205

Donation link?

Hi there,

I've been using your library, CustomASM, for a while now. I'm currently making a complete virtual machine system which allows me to write more complex programs a lot faster.

I've requested a few features in the past and they have been implemented very quickly, and I see you're still very involved in the project as the recent release of the v0.11 shows, coming with a lot faster processing speed for what I've seen.

So I'd like to thank you for all your amazing work, but sadly I didn't see any donation link nor github sponsor system on your repository.

I opened this issue both because you don't have a public email address so I couldn't contact you privately, and I think other developers who are using your library right now may be interested in donating a bit to support this library as well.

Anyway, tell me if you have a Patreon / Github Sponsor program / or anything else, and thanks again for your work on this great project ;)

Disabling `pc` variable

Hi,

I'm currently using CustomAsm for a tiny virtual machine and I recently ran into a problem about the usage of pc.

My assembly language has a register named pc where the address of the next instruction is stored. This register can be accessed directly, but they do not provide the expected result when used with CustomAsm.

Let's take for instance this code:

#cpudef
{
    #bits 8
    
	#tokendef reg {
		pc = 0xAB
	}
	
    ld {reg}, {short} -> 0x01 @ reg[7:0] @ short[15:8] @ short[7:0]
}

#addr 0x100

main:
  ld pc, 0xF0
  ld pc, 0xF0

The expected result is:

100:0 |  100 | 01 ab 00 f0    ; ld pc, 0xF0
104:0 |  104 | 01 ab 00 f0    ; ld pc, 0xF0

But the one CustomAsm provides is:

100:0 |  100 | 01 00 00 f0    ; ld pc, 0xF0
104:0 |  104 | 01 04 00 f0    ; ld pc, 0xF0

As pc is replaced by the address of the current instruction.

So I'd like to know if there is by any chance a way to either disable or rename this variable? I think it is really useful in programs so it would be great if we'd be able to still use it, but under another name (like for instance $pc or something).

Thanks in advance for your answer :)

Feature Request: Multiline comments

Can you add support for /* ... */ comments in asm blocks?

Altera MIF and Intel Hex format

This tool is helpful for people who write custom CPUs with custom ISAs on FPGAs. It would be helpful if the tool could also output files in Intel HEX or MIF format.
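For reference, the Intel HEX format requested here is small enough to sketch. The Python snippet below is only an illustration of the record layout, not part of customasm; the function names `ihex_record` and `to_intel_hex` are made up for this example, and it assumes all addresses fit in 16 bits (no extended-address records).

```python
def ihex_record(address, record_type, data):
    """Build one Intel HEX record: ':' + length + address + type + data + checksum."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, record_type]) + data
    # The checksum is the two's complement of the low byte of the sum of all bytes.
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(data, start=0, row=16):
    """Encode data as type-00 data records, then a type-01 end-of-file record."""
    lines = [
        ihex_record(start + offset, 0x00, data[offset:offset + row])
        for offset in range(0, len(data), row)
    ]
    lines.append(ihex_record(0, 0x01, b""))
    return "\n".join(lines)


# Hypothetical input: four assembled bytes placed at address 0x100.
print(to_intel_hex(bytes([0x01, 0xAB, 0x00, 0xF0]), start=0x100))
```

Each output line is one record, so a loader can verify it independently by summing its bytes (including the checksum) and checking that the low byte is zero.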

Feature Request: Treat strings as byte arrays.

I would like to be able to do something like this and have it compile properly:

ascii_message:
        #d8 "Hello, World!", 0 ; 8-bit ASCII characters
unicode_message:
        #d16 "Hello, World!", 0 ; 16-bit Unicode characters
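
To make the requested behaviour concrete, here is a rough Python sketch of what the two directives might emit. It is an assumption about the intended semantics, not customasm's implementation: each character's code point is written as one big-endian unit of the given width, and `string_to_units` is a hypothetical helper name.

```python
def string_to_units(text, width_bits):
    """Emit each character's code point as one big-endian unit of width_bits bits."""
    unit_bytes = width_bits // 8
    out = bytearray()
    for ch in text:
        # int.to_bytes raises OverflowError if the code point does not fit.
        out += ord(ch).to_bytes(unit_bytes, "big")
    return bytes(out)


print(string_to_units("Hi", 8).hex())    # two bytes, one per character
print(string_to_units("Hi", 16).hex())   # four bytes, two per character
```

Under this assumption the 16-bit form simply zero-extends each ASCII character, and the trailing `, 0` in the example would add one more zero unit of the same width.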

Issue with `#labelalign`

Hi!

I've recently run into a weird bug with #labelalign: it seems that labels aren't aligned correctly anymore, and also that the directive isn't always recognized. Here is an example:

#ruledef
{
	test => 0x01 @ 0x02 @ 0x03 @ 0x04
}

a:
	test
b:
	test

This works fine and produces the expected result. But now, if I add a #labelalign directive:

#ruledef
{
	#labelalign 16
	test => 0x01 @ 0x02 @ 0x03 @ 0x04
}

a:
	test
b:
	test

This code doesn't compile anymore. I've also tried moving the directive outside the #ruledef, but I get the same result. Strangely, I don't have this problem with the (huge) declaration file I use in my current project with this exact same syntax, so I don't know what's causing it.

The other problem is that this directive has no effect on the produced code. Even when the directive is accepted, writing #labelalign 100000000000 doesn't align anything :/

I don't know where the bug comes from, but it's definitely a weird one. Do you have any idea what's causing it?

Thread 'main' panicked when parent of a symbol is missing

I've been testing the new symbol layers in the 0.11.0 release and I encountered this error:
thread 'main' panicked at 'index 2 out of range for slice of length 1', src/asm/symbol.rs:133:64

I looked at the code and it seems the error occurs when the parent of a label can't be found. This code causes the crash:

GLOBAL_LABEL:

    JMP ..local_label
    

..local_label:

But neither of these does:

GLOBAL_LABEL:
.aux_label:
    JMP ..local_label
    

..local_label:

.GLOBAL_LABEL:

    JMP ..local_label
    

..local_label:

Relative Addressing Labels

At the moment, labels are basically just a definition of the address they are at, I believe. Have you considered adding relative addressing labels? Some instructions in the ISA I am working on, such as branches, use relative addressing, so labels don't work at the moment. Maybe this is already a thing and I have just missed it?
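
As an illustration of what a relative label would have to compute, here is a small Python sketch. The conventions are assumptions made for the example, not taken from the ISA in question: the offset is measured from the address of the following instruction and is stored as a signed two's-complement field. `relative_offset` is a hypothetical helper name.

```python
def relative_offset(target, instr_addr, instr_len, bits=8):
    """Return the two's-complement field for a branch from instr_addr to target."""
    # Assumed convention: the offset is relative to the next instruction.
    offset = target - (instr_addr + instr_len)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= offset <= hi:
        raise ValueError(f"branch offset {offset} does not fit in {bits} bits")
    return offset & ((1 << bits) - 1)


# Backward branch: a 2-byte instruction at 0x104 jumping to a label at 0x100.
print(hex(relative_offset(0x100, 0x104, 2)))
```

If the ISA measures offsets from the branch instruction itself, pass `instr_len=0`; the range check is what turns an out-of-reach label into an assembly-time error.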

Feature request: assembling to object files and linking

It would be interesting to be able to assemble to object files and link them. The SDCC code could be a good starting point.
I used customasm in my little project. I'm able to use 128-bit constants!
https://github.com/physnoct/softmicro

Feature request: macro assembler

First: awesome project! This is so helpful for writing terse assemblers.

@tchebb and I are writing a ruleset for the UM (ICFP 2006) and we would like to be able to define macros, for example:

#ruledef {
  loadi {reg}, {val} => 13`4 @ reg`3 @ val
  zero {reg} => loadi reg, 0
}

But this does not seem to be supported.

Incorrect Assembly of certain numbers

#align 24
JNZ {check_addr}, {branch_addr} -> 6'0x04 @ check_addr[4:0] @ 5'0 @ branch_addr[7:0]

will produce f***** whatever the input, when it should be 1*****. By changing the above to

#align 24
JNZ {check_addr}, {branch_addr} -> 6'0x04 @ 18'0x00

it works correctly. Seems to be something to do with the alignment and how it concatenates bits together.
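
The expected leading digit can be checked by hand. The Python sketch below mimics the `@` concatenation for the first rule (it is not customasm's code; `concat_fields` is a hypothetical helper and the operand values are arbitrary examples). Because the 6-bit opcode 0x04 is 000100 in binary, the top four bits of the 24-bit word are 0001 regardless of the operands.

```python
def concat_fields(*fields):
    """Concatenate (value, width) bit fields, most significant field first."""
    result, total = 0, 0
    for value, width in fields:
        # Mask each value to its declared width before appending it.
        result = (result << width) | (value & ((1 << width) - 1))
        total += width
    return result, total


# Arbitrary example operands; all bits set is the worst case.
check_addr, branch_addr = 0x1F, 0xFF
word, width = concat_fields((0x04, 6), (check_addr, 5), (0, 5), (branch_addr, 8))
print(f"{word:0{width // 4}x}")  # 13e0ff
```

The fields add up to exactly 24 bits (6 + 5 + 5 + 8), so no padding is needed to meet the alignment.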

Strings as parameters

I wasn't able to find an example of how one might go about doing something like this:

#ruledef
{
    pushn {n: u8}, {data: str}
}

pushn 4, "test"

I saw mention of the #d directive in the migration guide, but not of how it would be used in context.

Move multiple definition problem

My processor has a "push multiple registers" opcode, a bit like MOVEM on the 68K. I'm struggling to represent it, since though I've defined a token:

I can't then do:

    popmulti r1+r2,(r7)

Is there a solution to this, without defining all 256 combinations? :-)
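
For context, instructions of this kind are commonly encoded with a register bitmask rather than one opcode per combination. The Python sketch below shows that encoding only; it is not customasm syntax, and it assumes eight registers r0 to r7 with bit n standing for register rn. `register_mask` is a hypothetical helper name.

```python
def register_mask(reg_list, count=8):
    """Turn a list such as 'r1+r2' into a bitmask with bit n set for register rn."""
    mask = 0
    for name in reg_list.split("+"):
        name = name.strip().lower()
        if not (name.startswith("r") and name[1:].isdigit()):
            raise ValueError(f"unknown register {name!r}")
        index = int(name[1:])
        if index >= count:
            raise ValueError(f"unknown register {name!r}")
        mask |= 1 << index
    return mask


print(f"{register_mask('r1+r2'):08b}")  # 00000110
```

With this layout the 256 combinations are just the 256 values of one byte, so the assembler only needs a way to OR together the per-register values.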
