cadixdev / lorenz

A representation of JVM deobfuscation mappings for Java.

License: MIT License

Java 94.87% Groovy 3.94% Kotlin 1.19%

lorenz's Issues

Kotlin DSL

Add a DSL for Kotlin, similar to the one introduced for Groovy in #23.

Why does parameter indexing work the way it does?

Perhaps this is a legacy from what was decided with #2, but I thought I'd ask anyway.

The indexing system Lorenz uses matches the one the JDT uses, indexing parameters by encounter order from 0. This contrasts with MCP, Enigma and Tiny V2, which all index parameters from 1 for non-static methods (to leave this as 0) and respect doubles and longs taking up two LVT slots (skipping the next index for them). Whilst the difference might be quite subtle, it can make a bit of a mess:

void methodA(int a, int b, int c);

static void methodB(int a, long b, int c) {
}

Lorenz is off by one for methodA, which is a little annoying but resolvable for MCP at least, as it has a static method list, so any method not on it can have 1 taken off the parameter name index. With methodB Lorenz starts out consistent with the others, but then c has index 2 in Lorenz and 3 in the mapping formats. This is resolvable in Enigma and Tiny V2, as they both have the method descriptors and so can subtract the extra slots taken by doubles/longs appropriately.

If you combine both however you find a situation where neither mapping solution can work with Lorenz without additional outside information. MCP knows about static methods but not method descriptors, whilst Enigma and Tiny V2 know the opposite. The JDT however can know about both, as IMethodBindings provide whether they are static and the types of any parameters.

Of course this is presuming following mapping formats was/is important, as opposed to just representing parameters as they're declared, given that it does put some onus on whatever is using Lorenz to also have to account for a method being static or parameters being doubles/longs. Guess the crux of the matter is whether this was an active consideration/decision for indexing to work the way it does or just a side effect of what is (to be fair) the most obvious way to do indexing from the surface.
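
To make the difference concrete, here is a minimal sketch of converting a declaration-order index (as Lorenz and the JDT count) into an LVT-style slot index (as MCP, Enigma and Tiny V2 count). The class and method names (ParamSlots, parameterTypes, toLvtSlot) are hypothetical and not part of Lorenz; the sketch assumes the caller knows both the method descriptor and whether the method is static.

```java
import java.util.ArrayList;
import java.util.List;

final class ParamSlots {

    // Splits the parameter section of a JVM method descriptor, e.g. "(IJI)V" -> [I, J, I].
    static List<String> parameterTypes(String descriptor) {
        List<String> types = new ArrayList<>();
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++; // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i); // object type ends at ';'
            }
            i++;
            types.add(descriptor.substring(start, i));
        }
        return types;
    }

    // Converts a declaration-order index (from 0) into a local variable table slot.
    static int toLvtSlot(String descriptor, boolean isStatic, int declIndex) {
        List<String> types = parameterTypes(descriptor);
        int slot = isStatic ? 0 : 1; // slot 0 holds `this` for instance methods
        for (int p = 0; p < declIndex; p++) {
            String type = types.get(p);
            // long (J) and double (D) occupy two slots; arrays of them are references
            slot += (type.equals("J") || type.equals("D")) ? 2 : 1;
        }
        return slot;
    }

    public static void main(String[] args) {
        System.out.println(toLvtSlot("(III)V", false, 0)); // methodA, parameter a
        System.out.println(toLvtSlot("(IJI)V", true, 2));  // methodB, parameter c
    }
}
```

Both inputs are needed for the conversion, which is the point made above: a format that lacks either the static flag or the descriptor cannot do this on its own.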

Support blank comments in IO readers

https://github.com/canyonmodded/River/blob/60eb9843d3435fd3bae47f7de666f78f6a2ace3a/server.tsrg#L1-L3

https://github.com/PigeonMC/BetaMappings/blob/cf4a75f586183465a634b2ba3e9f72c94817611c/serverToClientObf.tsrg#L1-L4

I can't find many examples of comments, but it wouldn't hurt to ignore lines starting with # in the CSRG/TSRG/SRG readers.
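
As a rough illustration of the idea, a reader could drop comment lines before handing the rest to the format parser. This is a standalone sketch, not Lorenz's reader code; CommentFilter and meaningfulLines are hypothetical names, and treating whitespace-only lines as skippable is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentFilter {

    // Returns the lines that should reach the parser, skipping blank lines and
    // lines whose first non-whitespace character is '#'.
    static List<String> meaningfulLines(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Keep the original line: TSRG relies on leading tabs for members.
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        String input = "# generated file\n\na b\n\tc d\n";
        System.out.println(meaningfulLines(input).size());
    }
}
```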

AbstractClassMappingImpl.complete(...) infers wrong method name

Context

I have three classes:

abstract class A {
  private void foo() {
    [...]
  }
}

interface B {
  void foo();
}

class C extends A implements B {
  public void foo() {
    [...]
  }
}

I want to remap these classes using the following mapping set:

A.foo()V -> A.methodA()V
B.foo()V -> B.methodB()V

Problem

Expected result:
C.foo()V is overriding B.foo() and should therefore be remapped to C.methodB()V

Actual result:
C.foo()V gets remapped to C.methodA()V even though A.foo()V is private and can't be overridden.

Note that the erroneous result is also ordering dependent, so it might not occur in all situations.

Origin

This is caused by the changes in commit 2a7c0b6.
The changes were made in response to issue CadixDev/Mercury#14.

The argument made in the code comments doesn't make sense to me:

Check if there are any methods here that override the return type of a parent method.

While the issue in Mercury is correct in saying that "Java allows overriding methods with different return types, if the type is a subclass of the OG return type", the same is not applicable to the JVM, as the spec says:

An instance method mC can override another instance method mA iff all of the following are true:

mC has the same name and descriptor as mA.
[...]
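
A minimal sketch of the check I would expect, assuming the completion step can see whether a parent method is private. The types here (OverrideSketch, ParentMethod, inheritedName) are hypothetical stand-ins, not Lorenz classes, and only the two conditions discussed in this issue are modelled (same name and descriptor, parent not private).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class OverrideSketch {

    // Hypothetical description of a method declared in a parent type.
    static final class ParentMethod {
        final String name;
        final String descriptor;
        final boolean isPrivate;
        final String mappedName;

        ParentMethod(String name, String descriptor, boolean isPrivate, String mappedName) {
            this.name = name;
            this.descriptor = descriptor;
            this.isPrivate = isPrivate;
            this.mappedName = mappedName;
        }
    }

    // Picks the mapped name to inherit: the parent method must have the same name
    // and descriptor, and must not be private (private methods cannot be overridden).
    static Optional<String> inheritedName(String name, String descriptor, List<ParentMethod> parents) {
        for (ParentMethod parent : parents) {
            boolean sameSignature = parent.name.equals(name) && parent.descriptor.equals(descriptor);
            if (sameSignature && !parent.isPrivate) {
                return Optional.of(parent.mappedName);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<ParentMethod> parents = Arrays.asList(
                new ParentMethod("foo", "()V", true, "methodA"),   // A.foo is private
                new ParentMethod("foo", "()V", false, "methodB")); // B.foo is an interface method
        System.out.println(inheritedName("foo", "()V", parents).orElse("foo"));
    }
}
```

With the private check in place the result no longer depends on the order in which the parents are visited.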

Preamble

Lorenz has been in development now for over 2 years - in which time it has gone through a number of iterations and been used in countless projects. It has been refined to such an art that the readers in Lorenz feel galaxies apart from their counterparts in Nocturne. Despite the many leaps that Lorenz has made - no 1.x.x release has been made. I am making it my mission this year to address the lack of a Lorenz 1.x, and am working on a number of pre-1.0.0 interim releases.

Timeline

Please note that this is only a rough estimate of what the timeline will be like - I have a very busy year and may well not stick to this timeline particularly well. Having said that, I have been generous on the later releases to attempt to compensate for that.

Required Features

There are a number of features that are required for a 1.0.0 release, though the eventual feature list may well be larger than this.

Support for both reading and writing in the SRG format (including the two variants: TSRG and CSRG)
Support for both reading and writing in the JAM format
Support for both reading and writing in the Enigma format

Format Specifics

There are a number of things that need to exist in order to support the formats, and further complexities to support multiple, differing, formats side-by-side.

SRG

The SRG and variant formats support the following mapping types, with the given data:

Class Mappings -> obfuscated name, de-obfuscated name
Field Mappings -> obfuscated name, de-obfuscated name
Method Mappings -> obfuscated name, obfuscated signature, de-obfuscated name

JAM + Enigma

Both the JAM and Enigma formats, by contrast, support the following mapping types, with the given data (the additions over SRG are the obfuscated signature on field mappings and the method parameter mappings):

Class Mappings -> obfuscated name, de-obfuscated name
Field Mappings -> obfuscated name, obfuscated signature, de-obfuscated name
Method Mappings -> obfuscated name, obfuscated signature, de-obfuscated name
Method Parameter Mappings -> obfuscated name, de-obfuscated name, index

These differences make it difficult to support the two mapping formats - especially in cases where it is desired to convert an SRG mappings file to an Enigma mappings file, for example. There are a few solutions that could be used:

Introduce FieldTypeProviders that would find the type of fields without an associated type
Introduce bytecode analysis to Lorenz

I won’t get into the arguments for and against the above solutions here, other than to mention that I am swaying towards the former.

Incorporating into Nocturne

One key element of the Lorenz 1.0.0 release is making sure it is in such a state that it could be incorporated into Nocturne to replace the ageing mappings model that exists. I won’t reiterate any specific formats that need be supported here - however it is worth mentioning that Nocturne has a number of hooks within the mappings model. There are a few solutions that could be used:

Convert the mappings model to be composed of interfaces, providing a default implementation
Introduce an event bus throughout the model to allow for tools such as Nocturne to listen on specific events

Again, I will not comment on arguments for or against here - but I am swaying towards the former option.

Closing Note

There is no set-in-stone feature list for each pre-1.0.0 release currently, as I will select a number of features I wish to address in that development-cycle at the start of the cycle alongside a GitHub issue to track the progress made.

This specification exists to inform myself and others of what needs to be done for Lorenz 1.0.0, however the pre-1.0.0 releases will better inform me of specifics that I have not yet had to consider. Keep an eye out for my further, more-technical specifications for those releases.

Gradle task `uploadArchives` is run implicitly

The default Gradle tasks run the uploadArchives task implicitly, without it being specifically defined. This makes the project difficult to build on systems without the required configuration.

Edit: It also doesn't seem to define a jar task. Unless I misunderstand its intention, your Gradle buildfile is very broken.

Flush isn't called on standard MappingFormats

The only way to force the entire mapping to print is to use TextMappingFormat#createWriter(Writer) and flushing the writer yourself.
Affected formats: (All of them)

CSrgMappingFormat
TSrgMappingFormat
XSrgMappingFormat
SrgMappingFormat

Reproducible test cases

MappingSet mappings = MappingSet.create();
mappings.createTopLevelClassMapping("A", "B");
MappingFormats.SRG.createWriter(System.out).write(mappings);
MappingFormats.CSRG.createWriter(System.out).write(mappings);
MappingFormats.TSRG.createWriter(System.out).write(mappings);
MappingFormats.XSRG.createWriter(System.out).write(mappings);
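
For comparison, the same symptom can be reproduced with plain java.io, assuming the writers wrap the target stream in a buffered writer internally (I have not confirmed that this is the exact cause). FlushDemo is a hypothetical class used only for this illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class FlushDemo {

    // Writes one SRG-style class line through a BufferedWriter and reports
    // what actually reached the underlying sink.
    static String write(boolean flush) {
        StringWriter sink = new StringWriter();
        try {
            BufferedWriter buffered = new BufferedWriter(sink);
            buffered.write("CL: A B");
            buffered.newLine();
            if (flush) {
                buffered.flush(); // without this the line stays in the buffer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println("without flush: [" + write(false).trim() + "]");
        System.out.println("with flush:    [" + write(true).trim() + "]");
    }
}
```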

MappingSet#deobfuscate is called twice for return types

The following method is only used when remapping the return type of method signatures. It looks to me like the intention of the method was to remap if the given Type is a FieldType, and simply return the given type if it's not a FieldType (i.e. Void). But for some reason it calls deobfuscate a second time on the result of the first deobfuscate call. Again, this method is only used for return types, for parameter types the FieldType-accepting deobfuscate method is called directly.

Lorenz/lorenz/src/main/java/org/cadixdev/lorenz/MappingSet.java

Lines 311 to 316 in 6851a5e

public Type deobfuscate(final Type type) {
    if (type instanceof FieldType) {
        return this.deobfuscate(this.deobfuscate((FieldType) type));
    }
    return type;
}

With certain mapping sets, this causes deobfuscate to give invalid results. This was observed when trying to call reverse on a MappingSet containing class mappings like the following:

BinaryHeap -> Path
Path -> PathEntity

Any mappings which had PathEntity as the return type were changed to have BinaryHeap as the return type in the reversed mappings.

Using my own reverse and deobfuscate methods which call deobfuscate(FieldType) only once fixed the issue for me, but given I'm not sure why it's remapping twice, I'm hesitant to call it the "proper" fix, so I haven't opened a PR.
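
The effect can be modelled with a plain map standing in for the class mappings. This is only a sketch: DoubleRemap, once and twice are hypothetical names, and the map holds the reversed form of the two class mappings quoted above.

```java
import java.util.HashMap;
import java.util.Map;

final class DoubleRemap {

    // Looks the name up once, leaving unmapped names unchanged.
    static String once(Map<String, String> mappings, String name) {
        return mappings.getOrDefault(name, name);
    }

    // Mimics the nested deobfuscate call by remapping the already remapped name.
    static String twice(Map<String, String> mappings, String name) {
        return once(mappings, once(mappings, name));
    }

    public static void main(String[] args) {
        // Reversed form of: BinaryHeap -> Path, Path -> PathEntity
        Map<String, String> reversed = new HashMap<>();
        reversed.put("Path", "BinaryHeap");
        reversed.put("PathEntity", "Path");

        System.out.println(once(reversed, "PathEntity"));  // Path
        System.out.println(twice(reversed, "PathEntity")); // BinaryHeap
    }
}
```

Whenever a mapped name is itself a key in the same set, the second lookup walks one step further down the chain than intended.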

Binary Formats

Currently Lorenz only supports text-based mappings formats. This is a problem, as it prevents formats such as kin from being supported. These formats are well worth supporting as they are far smaller in size.

I have begun some preliminary work on supporting binary formats, though I still need to investigate things such as compression further before I can complete the job.

Classes with a `$` in sources have names and references unintentionally changed

So I discovered this while using mercury to update CraftFabric (https://github.com/tr7zw/CraftFabric) to the latest yarn mappings.

So one example class was class CraftScoreboard$2 implements Function<net.minecraft.scoreboard.Team, Team>

Now I agree this is definitely some very odd naming, but I'll continue. So I pushed the sources through mercury and the outputted result ends up as class 2 implements...

Furthermore some constructors were outputted incorrectly, such as the following:

CraftScoreboard$2(CraftScoreboard var1) {
    this.this$0 = var1;
}

where it becomes

(CraftScoreboard var1) {
    this.this$0 = var1;
}

Also some references were unintentionally changed, such as new CraftScoreboard$2(this) -> new 2(this)

Faulty canInherit calls on anonymous classes

Caused by https://github.com/CadixDev/Lorenz/blob/develop/lorenz/src/main/java/org/cadixdev/lorenz/impl/model/AbstractClassMappingImpl.java#L217-L221.

Identified through anomalous output.

k k
	b LOGGER
k$a k$a
	a LINUX
	b SOLARIS
	c WINDOWS
	d OSX
	e UNKNOWN
	f $VALUES
k$a$1 k$a$1
k$a$2 k$a$2

Create MappingSets from MappingSetModelFactory

Lorenz basically forces all MappingSets to be an instance of MappingSetImpl, as there is no way to construct a MappingSet directly from a MappingSetModelFactory.
Also, there are many places where MappingSet.create() is called even though a MappingSetModelFactory is available.

Proposed solution:

Add a createMappingSet method to MappingSetModelFactory
Make MappingSet.create(MappingSetModelFactory) use the model factory
Call MappingSet.create(MappingSetModelFactory) or MappingSetModelFactory.createMappingSet() where possible (the three calls of create() in the MappingSet class don't use the existing MappingSetModelFactory)

In-memory extension data

Effectively the same as kashike implemented in heart.

There should be extension data (provided by lorenz) for manually toggling a mapping as de-obfuscated - specifically useful in Survey's EnumNameMapper (in case they weren't obfuscated, prevents another mapper giving them another name thinking they weren't deobf'd).

Composition of incomplete mappings leads to weird behavior in `complete`

Not sure whether this is arguably more a Mercury issue, but all the code in question is in Lorenz, so posting it here.

If you have a mapping set ab which exhaustively lists all mappings, including overridden methods, and a mapping set bc which is intended to chain from state b but doesn't list overridden methods, instead relying on complete to infer them, then the composition of the mappings, ab.merge(bc), will never complete those methods, instead leaving them in their original state as in ab.

For example, if we have class AImpl extends A where both have method a() and the mapping sets are as follows

ab:

A.a() -> B.b()
AImpl.a() -> BImpl.b()

bc:

B.b() -> C.c()

the result is

A.a() -> C.c()
AImpl.a() -> BImpl.b()

and completion will never properly complete the mapping.

Not sure what the solution here is. In my project using the library, I used a workaround where I modified complete to use put instead of putIfAbsent and basically treat the highest mapping in the class hierarchy as canonical, which worked for my purposes, but ideally you'd want the most complete mapping to dominate, regardless of where it is in the hierarchy.
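
The difference between the two behaviours can be sketched with plain maps keyed by method signature. This is not the actual complete implementation; CompleteSketch and its parameters are hypothetical, and the maps stand in for the method mappings of a parent and a child class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompleteSketch {

    // Propagates parent method mappings into the child. With parentWins = false the
    // child's existing entries are kept (putIfAbsent); with parentWins = true the
    // parent's mapping is treated as canonical (put).
    static Map<String, String> complete(Map<String, String> parent,
                                        Map<String, String> child,
                                        boolean parentWins) {
        Map<String, String> result = new LinkedHashMap<>(child);
        for (Map.Entry<String, String> entry : parent.entrySet()) {
            if (parentWins) {
                result.put(entry.getKey(), entry.getValue());
            } else {
                result.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parent = new LinkedHashMap<>();
        parent.put("a()V", "c"); // A.a() -> C.c() after the merge
        Map<String, String> child = new LinkedHashMap<>();
        child.put("a()V", "b");  // AImpl.a() -> BImpl.b(), left over from ab

        System.out.println(complete(parent, child, false).get("a()V"));
        System.out.println(complete(parent, child, true).get("a()V"));
    }
}
```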

While making more complete mappings "infectious" in a way which climbs up the hierarchy would require significant restructuring of the code, and may not really be necessary as it's a bit of an edge case, for incorporation into the upstream I think a solution which at very least will not override a more complete mapping with a less complete one is necessary. (Though, that's a tradeoff in and of itself, in that arguably it's not ideal for application of mappings to change the semantics of the code, and it may be preferable for a less complete mapping to override a more complete one rather than for application of the mappings to make a method no longer override a parent's method.)

The parenthesized concern aside, such a solution would more or less necessitate keeping track of the history of the mapping mergers for each mapping, and inheriting the parent mapping only if it's "longer" (i.e. more composition steps). It also raises questions regarding complicated topologies that can emerge when matching multiple mapping sets.

E.g. if we have classes class AImpl extends A where both have method a(), and the user merges ab.merge(bc).merge(ad) resulting in a mapping set with both

A.a() -> B.b() -> C.c()
AImpl.a() -> DImpl.d()

Should A.a() -> C.c() override AImpl.a() -> DImpl.d()? That chain is longer, but on the other hand it doesn't actually continue the latter mapping.

XSRG Support

Support the XSRG format, which is the same as SRG but with field types: FD: oCls/oFld oDesc mCls/mFld mDesc.
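
A field line in that layout could be parsed along these lines. This is a standalone sketch rather than Lorenz reader code; XsrgFieldLine and its field names are hypothetical, and the example class and field names in main are made up.

```java
final class XsrgFieldLine {

    final String obfOwner, obfName, obfDescriptor;
    final String deobfOwner, deobfName, deobfDescriptor;

    private XsrgFieldLine(String[] obf, String obfDescriptor, String[] deobf, String deobfDescriptor) {
        this.obfOwner = obf[0];
        this.obfName = obf[1];
        this.obfDescriptor = obfDescriptor;
        this.deobfOwner = deobf[0];
        this.deobfName = deobf[1];
        this.deobfDescriptor = deobfDescriptor;
    }

    // Splits "owner/name" at the last slash, since the owner may itself contain slashes.
    private static String[] splitOwnerAndName(String qualified) {
        int slash = qualified.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Expected owner/name: " + qualified);
        }
        return new String[] {qualified.substring(0, slash), qualified.substring(slash + 1)};
    }

    // Parses a line of the form: FD: oCls/oFld oDesc mCls/mFld mDesc
    static XsrgFieldLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 5 || !parts[0].equals("FD:")) {
            throw new IllegalArgumentException("Not an XSRG field line: " + line);
        }
        return new XsrgFieldLine(splitOwnerAndName(parts[1]), parts[2],
                splitOwnerAndName(parts[3]), parts[4]);
    }

    public static void main(String[] args) {
        XsrgFieldLine field = parse("FD: a/b I net/example/Foo/bar I");
        System.out.println(field.deobfOwner + " " + field.deobfName + " " + field.deobfDescriptor);
    }
}
```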

Duplicate field mappings do not repair signatures

This is hard to explain, so have a chart:
This is Lorenz 0.5.4's current output, when merging int->obf (has signatures) and obf->srg (no signatures)

Left	Right	Output
name: A->B signature: O	name: B->C signature: null	name: A->C signature: O
name: A->A signature: O	name: A->A signature: null	name: A->A signature: null

I would expect the 2nd row, which is when the mappings are "loosely duplicated", to still copy over the signature name from the left mappings.
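
A sketch of the fallback I have in mind, using hypothetical types (FieldMerge, Field) rather than Lorenz's field mapping classes. It assumes the merged mapping should keep the left-hand signature whenever one is present and only fall back to the right-hand one otherwise.

```java
final class FieldMerge {

    // Hypothetical field mapping: obfuscated name, de-obfuscated name, optional signature.
    static final class Field {
        final String obf;
        final String deobf;
        final String signature; // may be null when the format carries no signatures

        Field(String obf, String deobf, String signature) {
            this.obf = obf;
            this.deobf = deobf;
            this.signature = signature;
        }
    }

    // Chains left and right, keeping whichever signature is available (left first).
    static Field merge(Field left, Field right) {
        String signature = left.signature != null ? left.signature : right.signature;
        return new Field(left.obf, right.deobf, signature);
    }

    public static void main(String[] args) {
        Field merged = merge(new Field("A", "A", "O"), new Field("A", "A", null));
        System.out.println(merged.obf + "->" + merged.deobf + " signature: " + merged.signature);
    }
}
```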

Merge Strategies

There are a number of scenarios where multiple mapping sets may be present, and users may want to merge them. I can think of a variety of actual situations where this could be handy.

Obf -> Deobf (Chained mappings), obfuscated values = first.obfName, de-obfuscated values = second.deobfName, matched by first.deobfName == second.obfName
Deobf -> Deobf, obfuscated values = first.deobfName, de-obfuscated values = second.deobfName, matched by first.obfName == second.obfName
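
The two strategies listed above can be sketched with plain name-to-name maps. MergeStrategies, chained and deobfToDeobf are hypothetical names; in particular, keeping the first set's de-obfuscated name when the second set has no matching entry is an assumption, not a decided behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MergeStrategies {

    // Obf -> Deobf (chained): match first.deobfName against second.obfName.
    static Map<String, String> chained(Map<String, String> first, Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : first.entrySet()) {
            // Assumption: with no continuation in `second`, keep the intermediate name.
            merged.put(entry.getKey(), second.getOrDefault(entry.getValue(), entry.getValue()));
        }
        return merged;
    }

    // Deobf -> Deobf: match on the shared obfuscated name and map one
    // de-obfuscated name to the other.
    static Map<String, String> deobfToDeobf(Map<String, String> first, Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : first.entrySet()) {
            String other = second.get(entry.getKey());
            if (other != null) {
                merged.put(entry.getValue(), other);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("a", "B");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("B", "C");
        System.out.println(chained(first, second));
    }
}
```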
