chasm's People

Contributors

cheatercodes, cplir-c, earthcomputer, foundationgames, iotabread, kroppeb, leo40git, oroarmor, williambl

chasm's Issues

Create a proper release workflow

A proper release workflow needs to be developed.

Currently, every push to the main branch publishes a -SNAPSHOT version to maven with the current version number.
A manually triggered release action publishes the current version to maven.

This is not ideal for multiple reasons:

  • After a release, the next push to main doesn't necessarily update the version
  • We can't really automate the version increment

Please share your ideas below!

Make `ChasmClassWriter` and co more lenient

Right now, the class writer essentially requires a fully specced node to create the classes.

It would be better if certain optional fields could be omitted.
For example, fields currently have 7 nodes:

  • access modifier
  • name
  • descriptor
  • signature
  • value
  • annotations
  • attributes

Only access, name and descriptor are required, and ChasmClassWriter should simply provide defaults for the others if they are missing.
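As a sketch of that idea (assuming fields are represented as plain maps here; the real ChasmClassWriter works on Chasm's Node types, and the concrete default values are assumptions):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fill in defaults for omitted optional field entries.
// The key names follow the list above; the map-based representation and the
// chosen defaults are assumptions, not ChasmClassWriter's actual API.
class FieldDefaults {
    static Map<String, Object> withDefaults(Map<String, Object> field) {
        // access, name and descriptor are required; fail fast if absent.
        for (String required : new String[] {"access", "name", "descriptor"}) {
            if (!field.containsKey(required)) {
                throw new IllegalArgumentException("Missing required entry: " + required);
            }
        }
        Map<String, Object> result = new HashMap<>(field);
        // Optional entries: an absent signature/value means "none",
        // absent annotations/attributes mean "empty".
        result.putIfAbsent("signature", null);
        result.putIfAbsent("value", null);
        result.putIfAbsent("annotations", new ArrayList<>());
        result.putIfAbsent("attributes", new ArrayList<>());
        return result;
    }
}
```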

Language Test Platform

We need a test platform for testing the language as well as transformers written with it.

Basically the same as #25, but for the chasm-lang submodule.

Shallow Write Locks

Currently, all locks in Chasm are "deep locks".
However, in most cases, shallow locks would be preferable.

Imagine targeting a single instruction in a method. If a transformation applies a write lock to said instruction, all other transformations are not allowed to change anything about it.

However, in most cases the intended behavior will be that no one modifies the location of said instruction. Shallow locks provide that option, allowing other transformations to modify the children of the target, but not the target itself.
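To make the distinction concrete, here is a minimal sketch (illustrative only, not Chasm's actual lock API) modeling a lock on a node path:

```java
import java.util.List;

// Illustrative sketch: a lock on a node path. A deep lock blocks writes to
// the locked node and all of its children; a shallow lock only blocks writes
// to the node itself, leaving its children writable.
class PathLock {
    private final List<String> path;
    private final boolean deep;

    PathLock(List<String> path, boolean deep) {
        this.path = path;
        this.deep = deep;
    }

    boolean blocksWriteTo(List<String> writePath) {
        if (writePath.equals(path)) {
            return true; // the locked node itself is never writable
        }
        // writes to children are only blocked by deep locks
        return deep
                && writePath.size() > path.size()
                && writePath.subList(0, path.size()).equals(path);
    }
}
```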

Inspect Local Variable Representation

Let's have a look at the following java method:

private void localTest(int parameter1, String parameter2, List<String> parameter3) {
    int local1 = 5;
    System.out.println(local1);
    int local2 = 42;
    System.out.println(local2);
}

The following is the Chasm-Internal representation of said method:

{
	access: 2, 
	name: "localTest", 
	parameters: [{
			type: "I", 
			name: "P0", 
		},{
			type: "Ljava/lang/String;", 
			name: "P1", 
		},{
			type: "Ljava/util/List;", 
			name: "P2", 
		},], 
	returnType: "V", 
	signature: "(ILjava/lang/String;Ljava/util/List<Ljava/lang/String;>;)V", 
	exceptions: [], 
	annotations: [], 
	code: {
		instructions: [{
				label: "L0", 
			},{
				opcode: 8, 
			},{
				opcode: 54, 
				var: "V3", 
			},{
				label: "L1", 
			},{
				opcode: 178, 
				owner: "java/lang/System", 
				name: "out", 
				descriptor: "Ljava/io/PrintStream;", 
			},{
				opcode: 21, 
				var: "V3", 
			},{
				opcode: 182, 
				owner: "java/io/PrintStream", 
				name: "println", 
				descriptor: "(I)V", 
				isInterface: false, 
			},{
				label: "L2", 
			},{
				opcode: 16, 
				operand: 42, 
			},{
				opcode: 54, 
				var: "V12", 
			},{
				label: "L3", 
			},{
				opcode: 178, 
				owner: "java/lang/System", 
				name: "out", 
				descriptor: "Ljava/io/PrintStream;", 
			},{
				opcode: 21, 
				var: "V12", 
			},{
				opcode: 182, 
				owner: "java/io/PrintStream", 
				name: "println", 
				descriptor: "(I)V", 
				isInterface: false, 
			},{
				label: "L4", 
			},{
				opcode: 177, 
			},{
				label: "L5", 
			},], 
		sourceLocals: [{
				name: "this", 
				descriptor: "Lorg/quiltmc/chasm/Main$Inner;", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 0, 
			},{
				name: "parameter1", 
				descriptor: "I", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 1, 
			},{
				name: "parameter2", 
				descriptor: "Ljava/lang/String;", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 2, 
			},{
				name: "parameter3", 
				descriptor: "Ljava/util/List;", 
				signature: "Ljava/util/List<Ljava/lang/String;>;", 
				start: "L0", 
				end: "L5", 
				index: 3, 
			},{
				name: "local1", 
				descriptor: "I", 
				signature: null, 
				start: "L1", 
				end: "L5", 
				index: 4, 
			},{
				name: "local2", 
				descriptor: "I", 
				signature: null, 
				start: "L3", 
				end: "L5", 
				index: 5, 
			},], 
		tryCatchBlocks: [], 
		lineNumbers: [{
				line: 23, 
				label: "L0", 
			},{
				line: 24, 
				label: "L1", 
			},{
				line: 25, 
				label: "L2", 
			},{
				line: 26, 
				label: "L3", 
			},{
				line: 27, 
				label: "L4", 
			},], 
	}, 
}

First, let's observe:

  • The local variable analysis assigns V3 as the name for local1
  • The local variable analysis assigns V12 as the name for local2

However, the debug information provided in sourceLocals is not in any way associated with said locals.
I want to investigate the possibility and usefulness of merging these representations somehow.

One problem currently is that it is very hard, if not impossible, to match a local by its type, which is very common with Mixin.
Another possible problem is the duplicate information shared between signature and descriptor.

Please suggest different representations below.

Code Style

We are in desperate need of a uniform code style, at least for this repository.

Requirements would be:

  • Automated checks possible (ideally via gradle)
  • Integration in Intellij
  • Integration in Eclipse

Checkstyle comes to mind, I think it has good support and should be sufficient.
We don't need anything super specific, but just having standards on general formatting and imports would be good.

Immutable Nodes for Transformers

While transformers should take care to not modify the nodes that are passed to them, we can probably avoid some future bugs by passing immutable nodes to Transformers and Transformations.

There is already an attempt at immutability made, but it is fairly minimal and there is also no easy way to create a mutable node again (if that is something that is needed).

Alternatively, it might make more sense to provide some sort of ImmutableNodeView, so that we can offer immutability without copying.
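A minimal sketch of the view idea, using a plain Map in place of Chasm's MapNode (names are illustrative):

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Sketch of an immutable view: reads pass through to the underlying map,
// mutation attempts fail, and no deep copy is made. A real ImmutableNodeView
// would wrap Chasm's Node types instead of java.util.Map.
final class ImmutableMapView<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> delegate;

    ImmutableMapView(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
        // AbstractMap implements get/containsKey/size on top of this,
        // and its put/remove throw UnsupportedOperationException.
        return Collections.unmodifiableMap(delegate).entrySet();
    }
}
```

Because the view holds a reference rather than a copy, later changes to the underlying node remain visible through it.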

Allow Transformers to Depend on Undefined Transformers

Transformers are allowed to declare ordering dependencies on each other.
However, Chasm currently doesn't handle transformers depending on undefined transformers.
The intended behavior is described here:

Transitive Dependencies

Imagine the transformer ids A and C are defined, but B isn't. Given the dependencies

  • A mustRunAfter B
  • B mustRunAfter C

The following should behave equivalently

  • A mustRunAfter C

Missing dependencies

Imagine the transformer ids A and C are defined, but B isn't. Given the dependencies

  • A mustRunAfter B
  • A mustRunAfter C

The following should behave equivalently

  • A mustRunAfter C
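The intended reduction can be sketched as follows (the graph representation and method names are assumptions): edges pointing at undefined transformer ids are replaced by the edges they transitively imply, then dropped.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch: collapse mustRunAfter edges that pass through undefined ids.
class DependencyResolver {
    // deps: transformer id -> set of ids it must run after
    static Map<String, Set<String>> resolve(Map<String, Set<String>> deps, Set<String> defined) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String id : defined) {
            Set<String> resolved = new LinkedHashSet<>();
            for (String dep : deps.getOrDefault(id, Set.of())) {
                collect(dep, deps, defined, resolved, new HashSet<>());
            }
            result.put(id, resolved);
        }
        return result;
    }

    private static void collect(String dep, Map<String, Set<String>> deps,
                                Set<String> defined, Set<String> out, Set<String> seen) {
        if (!seen.add(dep)) {
            return; // guard against cycles through undefined ids
        }
        if (defined.contains(dep)) {
            out.add(dep); // defined dependency: keep the edge
        } else {
            // undefined id: replace it with the edges it transitively implies
            for (String next : deps.getOrDefault(dep, Set.of())) {
                collect(next, deps, defined, out, seen);
            }
        }
    }
}
```

With `A mustRunAfter B` and `B mustRunAfter C` and only A and C defined, this yields `A mustRunAfter C`, matching both examples above.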

Chasm-Lang code generation

Currently, the chasm-lang module allows parsing a file into an expression.
However, there are multiple use cases for instead generating code from a programmatically generated expression.
Most notably:

  • A remapper parsing an expression, remapping certain values, and then writing the expression back to a file
  • A Chasm frontend generating a transformer

In order to support this, chasm-lang needs a way to create arbitrary expressions programmatically and write them to a file.

Change chasm internals to use the `Ast` builder class

Currently, you will see the following type of code everywhere in chasm's internals:

new MapNode(new LinkedHashMap<>())

and other even more verbose code. All this could be cleaned up using the Ast class, for example the above could be rewritten as Ast.emptyMap().
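As a self-contained illustration of what such a helper boils down to (the MapNode stand-in below is minimal; only Ast.emptyMap() is named in this issue, everything else is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in so this sketch compiles on its own;
// the real MapNode lives in Chasm and has more API.
class MapNode {
    final Map<String, Object> entries;

    MapNode(Map<String, Object> entries) {
        this.entries = entries;
    }
}

// Sketch of the Ast helper: it hides the verbose constructor call
// behind an intention-revealing factory method.
final class Ast {
    private Ast() {}

    static MapNode emptyMap() {
        return new MapNode(new LinkedHashMap<>());
    }
}
```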

`ChasmClassWriter` cleanup

ChasmClassWriter is a big and messy implementation for turning a node back into a class file.

The class should at least have some functionality moved into methods, and maybe also into other classes.

New Transformer Format

As mentioned in #89, the current format for transformers/transformations has some problems regarding long-lasting locks and the effects of transformer ordering. After thinking about these issues for a while, I am now ready to propose a radically different approach from what came before.

How did we get here?

When Chasm was originally developed (and still called ASMR), a "transformer" was approximately defined as a class that

  • declares dependencies
  • declares "reads"
  • declares "writes"

Based on their dependencies, transformers would then be sorted into rounds. Within each round, the reads and writes of transformers would be sorted to avoid as many conflicts as possible.

While much of Chasm has changed since then, this base structure was kept, although made more declarative in the process. A transformer would now

  • declare dependencies
  • generate transformations (each consisting of reads and a single write)

The behavior is essentially the same as before: Based on their dependencies, transformers are sorted into rounds, and within a round transformations are sorted to avoid conflicts.

One thing that always bothered me was that dependencies are treated before worrying about conflicts. When sorting transformers into rounds, we could simply create one round per transformer and this way avoid all explicit conflicts. Of course this would result in a very high number of silent undetected conflicts and crashes. Therefore, our goal was to sort transformers into as few rounds as possible, to get as much out of our conflict resolution as possible.

Another problem was discovered recently: If a transformer wants to see the changes made by a mod, it declares a round dependency on said mod. This, however, prevents it from seeing the initial state of the classes. As a workaround, it was suggested to provide access to the initial classes for verification purposes. However, this is a hardcoded special case, and it feels like the wrong approach for Chasm.

A new proposal

Transformations no longer exist

To avoid the problems of having multiple levels of ordering, all ordering is reduced to a single level. Technically transformations still exist, but they are now executed as a group as part of the transformer and not relevant to collision handling.

Transformers specify targets

Since transformations are gone but targets are still needed, targets are now defined on transformer level. "Target" refers to any specific node or slice in the class tree.

Locking is now done on target level

If a target is defined as locking, it will behave as a lock from the moment it is created until the transformer is applied.

Transformer application is atomic

Since there is no sub-ordering of transformers, transformers are now atomic. That's a good thing! This simplifies many things like renaming classes or adding locals. If you need something to not be atomic, then just put it in separate transformers.

Dependencies are specified at target level

This one might feel a bit odd, but the reasoning is simple: in order to declare a target, the target must exist. Therefore, it is the targets that care most about being read before or after other transformers are applied.

Some implementation details

In general, the process is now as follows:

  • Transformers are sorted according to the ordering implied by their targets (if possible)
  • Targets are read as soon as all their dependencies have been applied

Transformer Format

A transformer has two members:

  • targets
  • apply

targets is a map of target_id to target.
Each target contains the following members:

  • after: a list of transformer ids that need to apply before this target is read
  • before: a list of transformer ids that need to apply after this target is read
  • type: Either "read" or "write"
  • lock: The locking behavior of the target
  • target: A function taking in the classes at target read time and returning a slice or node target

lock defines the locking behavior of the target. I'm not sure about the details yet, but it will be something along these lines:

  • none: The target may be modified, as long as it can still be found
  • shallow: The target may not be modified, but its child nodes may
  • full: The target may not be modified at all

apply in the transformer will be run as soon as possible after all targets have been read.
The argument to this function will be a map of target_id to the results of reading the respective target.
The return value is a map of target_id to a replacement for the specified target, for all targets with type "write".

Transformer context

When creating a transformer, the following values are provided as "globals":

  • id: contains the id of this transformer
  • transformers: contains the ids of all registered transformers

Example transformer

A very simple transformer is shown here. It makes the class com.example.ExampleClass public.

{
  targets: {
    a: {
      after: [
        "transformer_a"
      ],
      before: [
        "transformer_b"
      ],
      type: "write",
      lock: "none",
      target: classes -> {
        node: classes[c -> c.name = "com/example/ExampleClass"][0].access
      }
    }
  },
  apply: sources -> {
    a: sources.a | 0x0001
  }
}

Label Representation

ASM represents labels as pseudo-instructions. Right now, CHASM instead attaches label information to instructions.

While I personally prefer it this way, it comes with its own issue: there might be trailing labels in the ASM representation that can't be expressed in CHASM.

Right now this is solved by inserting a trailing NOP instruction at the end of each method. This seems to work fine for now, and we might also be able to strip this trailing NOP when writing the class file.

However, I feel like this might require additional discussion.

Chasm-Lang Public API

Chasm-Lang currently doesn't have a split between api and internals.
I propose the following structure:

  • ast becomes internal
  • interop gets moved out of chasm-lang as per #49
  • op becomes api, with static functions to instantiate the relevant interfaces
  • visitor and the generated parser/lexer become internal
  • The execution environment (in the base module) becomes internal with a single clean public API class for evaluating expressions
  • Directly Chasm related classes get moved out of the module according to #49

Remapping

Chasm currently has no solution for remapping.
Some proposed solutions are:

A remap object, which gets processed during build

{
	remap: {
		my_class: {
			class: "ex/MyClass"
		},
		my_field: {
			field: "myField",
			descriptor: "F",
			owner: my_class.class
		}
	},
	target_class: classes[c -> c.name = remap.my_class.class],
}

A remap function, which gets evaluated to a remapped version of the object passed in, evaluated during build

{
	my_class: remap({
		class: "ex/MyClass"
	}),
	my_field: remap({
		field: "myField",
		descriptor: "F",
		owner: my_class.class
	}),
	target_class: classes[c -> c.name = my_class.class],
}

For this to work, there needs to be a context in which only the remap object or remap functions are evaluated, without evaluating the rest of the transformer.

String Constant Extraction

The current code makes use of lots of string constants which is error prone.

The constants should be extracted into global constants that can be reused across different classes.
This could also be very useful to expose as an API to frontends or the language implementation.

NodePaths don't get adjusted when slices are injected

Currently, SliceManager is responsible for keeping track of slice indices.
When a slice gets applied, all other slice indices targeting the same node get moved in order to still point to the correct target.
However, I forgot to apply this same operation to all NodePaths that contain this node's index.

Transformation and Transformer Locking

Background and Terminology

At the start of Chasm, all Transformers are created, usually read from Chassembly files. They can declare ordering relative to other transformers. According to the specified ordering, Transformers are sorted into (as few as possible!) so-called "Rounds". The rounds are then applied one after the other. Therefore, a Transformer experiences the following relevant steps in its lifetime:

  1. Creation (Start of Chasm)
  2. Application (Start of Round)
  3. Completion (End of Round)
  4. Destruction (End of Chasm)

At the start of a round, during transformer application, a Transformer can create any number of Transformations. A Transformation declares during creation which segments of code it requires read access to, called Sources, as well as one primary Target it will replace on application. The Transformations of all Transformers running in the current round are then sorted according to the ordering implied by the Sources and the Target.

Analogous to the Transformer, a Transformation lives through at most the following steps during its lifetime:

  1. Creation (Transformer application)
  2. Application (Replacing Target)
  3. Completion (End of Round)
  4. Destruction (End of Chasm)

Internally, Chasm only differentiates between Target and Sources by their writability. For the remainder of this discussion, Target therefore either refers to the single writeable Target or any one of the read-only Sources.

Locking

The idea of locking is to allow a Transformation to declare restrictions on which Targets and Sources may be modified by other Transformations. In theory, locking can be performed on different levels:

  • Transformer-level locking
  • Transformation-level locking
    More granular targets are preferable, but a lock may only use lifetime events from its current level.

Effects on Ordering

The fundamental goal of Chasm is to apply as many Transformations as possible without violating any contracts. Some contracts are implied by the specified Targets (write inner before outer), but many of them are defined via locking. Therefore, locking on Transformation-level is a fundamental requirement as it is the primary way to affect ordering.

Locking on Transformer-level would also make it possible to affect the ordering of Transformers, but it is unclear if this is something that is wanted.

Current Locking

Currently, we only use Transformation-level locking. Additionally, Locks can't outlive a Transformation, so we're limited to the following lock options:

  • Lock from start of round until Transformation application (Lock.Before)
  • Lock from Transformation application until end of round (Lock.After)

Questions

With this background, the following questions require discussion:

Is current locking sufficient?

Imagine a mod that has only a single transformer for its entire functionality. A different mod might modify the same code, silently overwriting the original mod. This is a conflict that would currently be detected, but only if they both run in the same round.

The current solution would be to have the first mod generate an additional transformer running in the very last round, which verifies correct application or crashes otherwise.

Do we want longer lived locking?

This would allow us to create locks that last past the end of the current round. The above example could instead create a lock that lasts from after application to the very end of Chasm.

Do we want Transformer-level locking?

Declaring locks on Transformer-level would allow us to create earlier locks, e.g. from the start of Chasm until Transformer application. I'm not sure this is useful in any way, but if someone has a use case, let me know.

Syntactic sugar for multi-parameter functions

Currently, multi-parameter functions have no explicit support in chassembly, but they can be emulated using functions that return functions. For example, here's a function that adds its two inputs:

{
  add: x -> y -> x + y
}

it can be called like this:

add(2)(3)

Although this works, it makes for a lot of parentheses at call sites. One way to improve the situation would be to add syntactic sugar for applying the function:

add(2, 3)

which would be semantically exactly the same as above.

Others have proposed a similar syntax to declarations (I'm personally not sold on this):

{
  add: (x, y) -> x + y
}

this has the disadvantage that it requires increasing the lookahead in the parser to 2.

If we go ahead with the declaration syntax, one possibility is that this will have the same semantics as the arrow declaration. That is, the following functions would be exactly equivalent:

{
  add1: (x, y) -> z -> x + y + z,
  add2: x -> (y, z) -> x + y + z
}

Another option is to have the special declaration syntax mean something semantically different: making partial application impossible. Thus, with the above function declarations:

{
  a: add1(x, y, z), // ok
  b: add1(x, y)(z), // ok
  c: add1(x)(y, z), // error
  d: add1(x)(y)(z), // error
}

Boolean Operators and Short Circuiting

As mentioned in #42, we could merge the operators && and || with & and | respectively.
The reason they are currently split up is because the boolean operator supports short-circuiting.
As @Earthcomputer correctly said, all operators can be short-circuiting, so there's no need for this differentiation.
However, I'm not entirely sure how to implement this properly and safely.

If anyone feels like tackling this problem, please discuss it here before opening a PR.

Refactor and Internalize NodePath

The NodePath is in need of a refactor:
It is essentially just a wrapper around a list; maybe it should instead extend AbstractList?

I also don't want it to be API (for the language implementations), so I'd want to internalize it.
That means that NodeTarget and SliceTarget need to accept a Node and a ListNode instead of the path like they do right now.

This is also a good opportunity to address #23:

  • Rename to startsWith
  • Flip the implementation logic (swap this and other in the function)
  • Flip the use logic (swap this and other at every call site)

Consider redesigning `ChasmProcessor.process()` API

Currently, ChasmProcessor.process() returns all classes after transformation.
ChasmProcessor.process(true) limits this to only return ClassData for classes that have been modified.
However, there's a technicality that the latter can't return classes that have been deleted entirely.
This behavior is unintuitive and not very useful.
Additionally, the latter comes with a performance benefit in cases where only a few classes have been modified (which will likely be the common case), since it doesn't need to recompute unchanged classes.
I want to propose an API change that makes this behavior more obvious.

Instead of returning a List<ClassData>, the method should return something like a List<ClassResult>, where ClassResult is an interface that allows retrieving the name of the class, its Metadata (if present), and its byte representation (if present).
There would then be different implementations:

  • UnmodifiedClass can forward calls to its internal ClassReader
  • ModifiedClass can extract the relevant information from its internal MapNode representing the class
  • DeletedClass should be able to return the class name, but will return null for metadata and class bytes
  • Possibly AddedClass, which would be the same as ModifiedClass but with a different semantic

This can be performant because the methods are evaluated on demand rather than eagerly when the method is called.
It also makes it easy to use if you simply want all classes, since you can just call the methods on the interface.
And finally, it allows you to skip unmodified classes if that is relevant for your implementation.
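Sketched as an interface (member names are taken from the description above; the exact signatures, and typing Metadata as Object, are assumptions):

```java
// Sketch of the proposed ClassResult interface.
interface ClassResult {
    String getName();
    Object getMetadata();   // null if no metadata is present
    byte[] getClassBytes(); // computed lazily; null for deleted classes
}

// One of the proposed implementations: a class removed by a transformer
// still reports its name, but has no metadata or bytes.
final class DeletedClass implements ClassResult {
    private final String name;

    DeletedClass(String name) {
        this.name = name;
    }

    public String getName() { return name; }
    public Object getMetadata() { return null; }
    public byte[] getClassBytes() { return null; }
}
```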

Some additional notes:

  • Chasm currently doesn't keep track of the input classes, this needs to change if we want to detect removed classes.
  • ClassData has a similar interface to ClassResult, so maybe it should just be converted to an interface instead.

Use CongoCC

JavaCC21 has been rebranded as CongoCC and, really, existing users of JavaCC21 are strongly encouraged to switch to Congo. That is where all new development will be happening.

One thing you may find appealing is that CongoCC is on Maven Central, since this was some sort of issue regarding JavaCC21.

A more important point, however, is that the cluttered legacy JavaCC syntax is no longer supported in CongoCC. Later versions of JavaCC21 did (or do) have a syntax converter, which you can run via:

 java -jar javacc-full.jar convert MyGrammar

But I thought to give you a converted file for your convenience. The attached file is functionally the same as your older Parser.jj but is entirely in the newer streamlined syntax:

Parser.ccc.txt

It seems that the project builds/tests as before using that file.

Implement intrinsics in chassembly where possible

I propose that we should implement intrinsic functions in chassembly whenever it is possible to do so. Currently, some intrinsic functions (like from_entries) are possible to reimplement in chassembly, while others (like file_bytes) are not. Please note that just because the function is implemented in chassembly, that does not mean that the chassembly implementation is actually run, and that an equivalent Java implementation isn't run instead. The meaning of "intrinsic" is simply that the interpreter is allowed to treat the function as special.

Doing this would have the following advantages:

  • Makes adding new intrinsic functions easier.
    • No new Java code needed.
    • Doesn't need to be added to the formal chassembly specification when it is written.
  • Provides an easy to access reference implementation for intrinsics that are implemented in both chassembly and Java.
  • Expands the scope of intrinsics we can add. Currently we are very hesitant to add new intrinsics; I'm hoping that doing this will allow for more of a standard-library-esque feel to them, without some of the downsides this would have previously meant.
  • 3rd party analysis tools don't need to implement every intrinsic themselves.

Round dependencies

We might require transformers to be run in entirely separate "rounds" to other transformers.

This would mean sorting Transformers rather than Transformations before applying Transformers.
The transformers would then be applied in their round, with each round receiving the transformed classes of the previous round.

Transformation sorting would still happen in the round itself.

Degenerify `ValueNode`

Right now ValueNode is unnecessarily generic. There is no benefit to the generics here, in fact they just get in the way, forcing us to do unchecked casts.

Things to do:

  • Change ValueNode<T> to ValueNode, replacing all T with Object
  • Fix all related code occurrences
  • Check the return type of getValue wherever it's used
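The degenerified class would reduce to something like this sketch (the real ValueNode has more API; this only shows the shape of the change):

```java
// Sketch of the degenerified node: no type parameter, the value is just Object.
// Callers check the runtime type explicitly where they previously relied on T.
final class ValueNode {
    private final Object value;

    ValueNode(Object value) {
        this.value = value;
    }

    Object getValue() {
        return value;
    }
}
```

Call sites that relied on the generic type would switch to explicit checks, e.g. `if (node.getValue() instanceof Integer) { ... }`.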

Better Error Messages

In my opinion, the biggest pain point in developing for chasm-lang right now is the unhelpful error messages produced. Since #42, most expressions hold a reference to a ParseTree (though it might be null occasionally). This can be used to print the location in the source file as well as the actual expression text. However, this is currently not used in any exceptions.

A simple improvement would be to print additional information in the thrown exceptions. Alternatively, the system could even support printing multiple errors instead of simply failing at the first.

If you're interested in helping with this issue, please discuss here before opening a PR.

CONTRIBUTING.md

We need a CONTRIBUTING.md file that points out the checkstyle task and other things.

This is probably something I need to do myself, but I'm putting this here so I don't forget.

Split ChasmProcessor

While not the biggest class, ChasmProcessor could probably be split up.
I think that ChasmProcessor should only operate on sets of classes, probably in the form of ClassReaders.
That would mean removing the jar processing from the class, maybe moving it into a JarProcessor?
This new class could then also take care of (optionally) removing jar signing.

Node Metadata

In the future, I'd like to be able to provide meaningful error messages on conflict.

While most conflicts should be recognized during transformation sorting, some might only be detected during application.

In order to provide additional information in error messages, I'd like to attach metadata to nodes. Primarily, this metadata would be the origin of a node (e.g. transformer id). Additionally, this might be the cleaner way to attach path information to a node.

Language Specification

This is a tracking issue for the chasm-lang specification.

The current specification can be found at chasm-lang/doc/Specification.md.

This issue should be referenced by all issues relating to the language specification and will stay open until the language has stabilized.

Write Lock Support

In addition to specifying a target to read or replace, a transformation might also want to specify that a certain section of code should not be modified, before or after application of the transformation.

This issue exists to discuss the requirement for such a system, as well as the implementation.

Additionally, similar read locks might be useful in some cases.

Remove javadoc requirement for internal classes

Checkstyle currently complains about missing javadocs for classes in the internal package.
While adding javadocs is encouraged for internal classes, it is not important and shouldn't cause warnings or errors, which clutter the output.

Proper Transformation Sorting

The current TransformationSorter is not complete and may fail in cases that could actually be resolved.

This sorting needs to be rewritten to allow successfully applying transformers wherever possible.
This will probably require some sort of topological sorting.
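As a starting point, the core of such a sorter could be Kahn's algorithm. This is only a sketch of the dependency-ordering step; the real sorter must additionally resolve read/write conflicts, and the map-of-sets representation is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Kahn's algorithm: repeatedly emit nodes with no remaining predecessors.
// Assumes every id referenced in `edges` is contained in `nodes`.
class TopoSort {
    // edges: id -> set of ids that must come after it
    static List<String> sort(Map<String, Set<String>> edges, Set<String> nodes) {
        Map<String, Integer> indegree = new HashMap<>();
        for (String n : nodes) {
            indegree.put(n, 0);
        }
        for (Set<String> successors : edges.values()) {
            for (String s : successors) {
                indegree.merge(s, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String n : nodes) {
            if (indegree.get(n) == 0) {
                ready.add(n);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String s : edges.getOrDefault(n, Set.of())) {
                if (indegree.merge(s, -1, Integer::sum) == 0) {
                    ready.add(s);
                }
            }
        }
        if (order.size() != nodes.size()) {
            // a cycle means the transformations genuinely cannot be ordered
            throw new IllegalStateException("Cycle detected; cannot sort");
        }
        return order;
    }
}
```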

Parameter Representation

Parameter information is provided via ASM in different ways. However, CHASM might want to represent them differently.

Ideally I'd like to provide a parameters list for each method, where each entry represents a parameter and its type.

However, there are two issues with this:

  • Assigning parameter annotations to parameters seems to not be clearly defined
  • Type information would be duplicated between the method descriptor and the parameter type information

I'm not sure how to solve this issue, any feedback would be welcome.

Empty slices in maps

So far we have two types of targets that a transformer can replace: node targets and list slice targets. I propose adding a third type of target, a map empty slice target.

Motivation

You can insert into a list by replacing an empty slice in the list. However, there is currently no way to insert into a map without replacing the whole map. Therefore I am proposing this to be able to insert into a map.

This would solve the issue raised by @CheaterCodes here, and make it easier to improve the chasm tree further before release if necessary.

Implementation

An empty map slice in chasm will look something like the following, with a boolean specifying whether we're targeting the entire map or an empty slice within the map.

{
   node: foo.some_map,
   inside: true, // true if this is an empty map slice ("inside" the map), false if it's the whole map (the default).
}

The locking behaviour of an empty map slice would be similar to that of an empty list slice; that is, never conflicting with another empty map slice, and only ever conflicting with locks that target the map itself or any of its parents.

The Chasm implementation will have to be updated to support a third type of slice.
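The locking rules above could be sketched as follows. This is not the actual Chasm API; `PathLock`, `conflictsWith`, and the path-as-string-list model are stand-ins to make the proposed conflict behaviour concrete.

```java
import java.util.List;

final class PathLock {
    final List<String> path;   // path to the locked node
    final boolean inside;      // true = empty slice "inside" a map

    PathLock(List<String> path, boolean inside) {
        this.path = path;
        this.inside = inside;
    }

    boolean conflictsWith(PathLock other) {
        // Empty map slices never conflict with each other
        if (this.inside && other.inside) {
            return false;
        }
        // An empty slice only conflicts with locks on the map itself or a parent
        if (this.inside) {
            return isPrefix(other.path, this.path);
        }
        if (other.inside) {
            return isPrefix(this.path, other.path);
        }
        // Two whole-node locks conflict if either path contains the other
        return isPrefix(this.path, other.path) || isPrefix(other.path, this.path);
    }

    /** Returns true if a is a (non-strict) prefix of b. */
    private static boolean isPrefix(List<String> a, List<String> b) {
        return b.size() >= a.size() && b.subList(0, a.size()).equals(a);
    }
}
```

Under these rules a lock on a *child* of the map does not conflict with an empty slice in the map, which matches the "map itself or any of its parents" wording above.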

Develop a Test Platform

We need some sort of tests to ensure correctness of the code.
Unfortunately, I'm no good at designing tests, so if anyone has some good ideas, put them here!

Things that should be tested:

  • Reading and writing classes
  • Reading and writing jars
  • Using chasm without transformers shouldn't change any classes
  • Valid transformers that can be applied without conflicts must be able to be applied
  • Valid transformers that conflict must be detected
  • Probably a lot more later
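The "no transformers changes nothing" case can be phrased as a round-trip identity check. The sketch below uses a plain `UnaryOperator<byte[]>` as a stand-in for the real processing pipeline; an actual test would feed real class bytes through `ChasmProcessor`.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

final class RoundTripTest {
    /** Asserts that processing with no transformers leaves the class bytes unchanged. */
    static void assertRoundTrip(UnaryOperator<byte[]> processor, byte[] classBytes) {
        byte[] result = processor.apply(classBytes);
        if (!Arrays.equals(classBytes, result)) {
            throw new AssertionError("Processing without transformers changed the class");
        }
    }
}
```

The same helper could be reused for jar round-trips by iterating over the entries of a jar file.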

Consider switching to JavaCC21

The chasm-lang module currently uses ANTLR to generate its lexer and parser.
While this is fine in principle, it introduces both an additional compile-time dependency and a runtime dependency.
Since I expect the chasm-lang module to be used in gradle plugins, having many dependencies is undesirable.
Besides ANTLR, there is currently also a dependency on ASM and Chasm as well, which would be removed as described in #49.
This means that by dropping ANTLR, we could remove all runtime dependencies from the module.

The suggested alternative is JavaCC21.
It works very similarly to ANTLR in principle, so the conversion shouldn't be particularly hard.
The advantage is that it generates all the classes required at runtime.

Note that shadowing ANTLR would be possible, however I am still unsure about licensing behaviour when shadowing libraries.
JavaCC21, on the contrary, requires no particular licensing, since the files generated by it belong to us.

Chassembly: Builder pattern for code generation

Rather than dealing with the verbosity and repetitive-ness of manually constructing AST nodes, something like this could be made available:

MapNode chassemble() {
  return Nodes.mapBuilder()
    .put("foo", Nodes.binary(5, PLUS, 3))
    .put("bar", Nodes.list(Nodes.string("hello"), Nodes.ref("foo")))
  .build();
}

Missing Doc Comments

None of the files in the org.quiltmc.chasm.asm package have method documentation, but the ClassWriter class has some line comments in its methods.
In the org.quiltmc.chasm.transformer package, only the Target interface has method documentation.
In the org.quiltmc.chasm.tree package, only the Node interface has a method doc comment, and it's only one.
In the org.quiltmc.chasm package, the ChasmProcessor class, the SliceManager class, and the TransformationSorter class have line comments in one method, and none of the classes have method doc comments.

I'm filing this because, for example, I don't know what the path is in this code, and I thought doc comments would help anyone reading it.
I could ask more specific questions, but I'm not sure if that would be more helpful.

Local Variable Representation

ASM provides local variable information in multiple different ways:

  • Local debug information
  • Local annotations
  • Stack frames

In order to simplify writing transformers, it would be ideal to represent locals in a less fragmented way.
One possible representation:

{
  name: <string>,
  descriptor: <string>,
  index: <integer>,
  annotations: [ ... ]
  scope: [
    {
      from: <label>
      to: <label>
    },
    [...]
  ]
}

However, I'm unsure if this representation always works. ASM's MethodVisitor::visitLocalVariableAnnotation suggests that a variable could have different indices in different scopes. We could maybe circumvent this by representing local variable indices with strings rather than integers, and converting during parsing and writing.
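The string-index idea from the last paragraph might look like the sketch below. Field names and the symbolic-index scheme are illustrative assumptions, not the actual Chasm tree format; the symbolic name would be resolved to concrete slot indices per scope when writing.

```
{
  name: "foo",
  descriptor: "I",
  index: "local_foo", // symbolic index, resolved to slots during writing
  annotations: [ ... ],
  scope: [ ... ]
}
```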

Restructure modules

The current structure of the Chasm project consists of two modules:

  • chasm contains the main and other closely related logic
  • chasm-lang contains the Chasm-Lang implementation as well as interoperability with Chasm

Because of this structure, chasm-lang currently holds a dependency on chasm.

Since most projects including Chasm in the future will likely also require Chasm-Lang, it would make sense to make chasm depend on chasm-lang instead. In addition, chasm-lang will likely often be used without chasm, for example in chasm-frontends looking to generate Chasm-Lang code, as well as in remappers needing to parse it. Therefore I would like to propose that Chasm-Lang be stripped of all Chasm-related functions in order to remove the dependency.

Note that this, in combination with #48 would mean that Chasm-Lang is dependency free.

This leaves us with two options:

  • Add Chasm-Lang as a dependency to Chasm, moving all the interoperability into the main module
  • Add a new module Chasm-Util responsible for interop and other utility functions like scanning the classpath or handling jars

Which one would be preferred?

Intrinsic Functions

This issue keeps track of missing intrinsic functions.

If you want to implement one of these functions, feel free to open a PR. For an example on how to implement these, check the Intrinsics class, specifically the len() intrinsic.

  • map({list: <list>, function: <lambda>}) -> <list>: Maps the elements of list using function and returns them in a new list.
  • flatten(<list>) -> <list>: Flattens a list of lists into a new list. Flattening is not recursive.

Expect this list to change as functions are implemented or new ones are required.
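As an illustration, a one-level (non-recursive) flatten could look like this in plain Java. The real intrinsic would operate on Chasm's list nodes rather than `java.util.List`, so this is only a sketch of the intended semantics.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenSketch {
    /** Flattens one level of nesting: [[1, 2], [3]] -> [1, 2, 3]. Not recursive. */
    static <T> List<T> flatten(List<? extends List<? extends T>> listOfLists) {
        List<T> result = new ArrayList<>();
        for (List<? extends T> inner : listOfLists) {
            result.addAll(inner);
        }
        return result;
    }
}
```

Because it only removes one level, `[[1, [2]], [3]]` would flatten to `[1, [2], 3]`, leaving the inner list intact.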
