cadixdev / lorenz
A representation of JVM deobfuscation mappings for Java.
License: MIT License
Add a DSL for Kotlin, similar to the one introduced for Groovy in #23.
Perhaps this is a legacy from what was decided with #2, but I thought I'd ask anyway.
The indexing system Lorenz uses matches the one the JDT uses, indexing parameters in encounter order from 0. This contrasts with MCP, Enigma and Tiny V2, which all index parameters from 1 for non-static methods (to leave this as 0) and respect doubles and longs taking up two LVT slots (skipping the next index for them). Whilst the difference might be quite subtle, it can make a bit of a mess:
void methodA(int a, int b, int c);
static void methodB(int a, long b, int c) {
}
Lorenz is an index out for methodA, which is a little annoying but resolvable for MCP at least, as it has a static method list, so any method not on that list can have 1 taken off the parameter name index. With methodB, Lorenz starts out consistently with the others, but then c has index 2 in Lorenz and 3 in the mapping formats. This is resolvable in Enigma and Tiny V2, as they both have the method descriptors and so can take off the double/long gains appropriately.
If you combine both, however, you find a situation where neither mapping solution can work with Lorenz without additional outside information. MCP knows about static methods but not method descriptors, whilst Enigma and Tiny V2 know the opposite. The JDT, however, can know about both, as IMethodBindings provide whether they are static and the types of any parameters.
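To make the two indexing schemes concrete, here is a minimal sketch of the conversion from a declaration-order index to an LVT slot. It is not Lorenz code: the class and method names are hypothetical, and it assumes the caller already knows the parameter descriptors and whether the method is static, which is exactly the outside information discussed above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lorenz.
final class ParameterIndexSketch {

    /**
     * Converts a 0-based declaration-order index into a local variable table slot.
     *
     * @param declIndex  index of the parameter in declaration order, starting at 0
     * @param paramTypes field descriptors of all parameters in order, e.g. "I", "J"
     * @param isStatic   whether the method is static (no `this` in slot 0)
     */
    static int toLvtSlot(int declIndex, List<String> paramTypes, boolean isStatic) {
        int slot = isStatic ? 0 : 1; // slot 0 holds `this` for instance methods
        for (int i = 0; i < declIndex; i++) {
            String type = paramTypes.get(i);
            // long (J) and double (D) each occupy two slots
            slot += (type.equals("J") || type.equals("D")) ? 2 : 1;
        }
        return slot;
    }

    public static void main(String[] args) {
        // void methodA(int a, int b, int c): parameter c
        System.out.println(toLvtSlot(2, Arrays.asList("I", "I", "I"), false));
        // static void methodB(int a, long b, int c): parameter c
        System.out.println(toLvtSlot(2, Arrays.asList("I", "J", "I"), true));
    }
}
```

In both calls c has declaration index 2 but lands in slot 3, once because of `this` and once because of the long.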
Of course this is presuming following mapping formats was/is important, as opposed to just representing parameters as they're declared, given that it does put some onus on whatever is using Lorenz to also have to account for a method being static or parameters being doubles/longs. Guess the crux of the matter is whether this was an active consideration/decision for indexing to work the way it does or just a side effect of what is (to be fair) the most obvious way to do indexing from the surface.
I can't find many examples of comments, but it wouldn't hurt to ignore lines starting with # in the CSRG/TSRG/SRG readers.
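A rough sketch of what that filtering could look like, independent of the actual reader classes (the class and method names here are made up for illustration). It keeps the original line untouched, on the assumption that TSRG-style formats rely on leading whitespace for member lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-filter, not part of Lorenz.
final class CommentFilterSketch {

    /** Drops blank lines and lines whose first non-whitespace character is '#'. */
    static List<String> mappingLines(List<String> rawLines) {
        List<String> kept = new ArrayList<>();
        for (String raw : rawLines) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // comment or blank line, nothing to parse
            }
            kept.add(raw); // keep the untrimmed line so indentation survives
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# generated file", "CL: a Foo", "", "FD: a/b Foo/bar");
        System.out.println(mappingLines(lines));
    }
}
```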
I have three classes:
abstract class A {
    private void foo() {
        [...]
    }
}
interface B {
    void foo();
}
class C extends A implements B {
    public void foo() {
        [...]
    }
}
I want to remap these classes using the following mapping set:
A.foo()V -> A.methodA()V
B.foo()V -> B.methodB()V
Expected result:
C.foo()V is overriding B.foo() and should therefore be remapped to C.methodB()V
Actual result:
C.foo()V gets remapped to C.methodA()V even though A.foo()V is private and can't be overridden.
Note that the erroneous result is also ordering dependent, so it might not occur in all situations.
The reason for this is the changes in commit 2a7c0b6.
The changes were made in response to issue CadixDev/Mercury#14.
The argument made in the code comments doesn't make sense to me:
Check if there are any methods here that override the return type of a parent method.
While the issue in Mercury is correct in saying that "Java allows overriding methods with different return types, if the type is a subclass of the OG return type.", the same is not applicable to the JVM, as the spec says:
An instance method mC can override another instance method mA iff all of the following are true:
- mC has the same name and descriptor as mA.
[...]
The minimum fix would be to add accessibility checks to the erroneous code, preventing it from inheriting a name from a private method. However, keep in mind that the JVM requires strictly identical method descriptors, so this solution could still cause issues (a method String foo() cannot override the method Object foo() in the JVM, and inferring the names that way is still erroneous).
My preferred fix would be to revert the referenced commit in Lorenz and instead fix the issue in a tool specifically made for source remapping if possible (e.g. Mercury).
If separating asm remapping from source remapping is too difficult, maybe there could be a flag of some sort to change that behaviour.
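As a sketch of the stricter check described above, the following compares name and descriptor exactly and refuses to inherit from private or static methods. It is a simplification, not the Lorenz implementation: the names are hypothetical, and the package-private rules of the JVM specification are deliberately left out.

```java
import java.lang.reflect.Modifier;

// Hypothetical check, not part of Lorenz; package-private visibility is ignored.
final class OverrideCheckSketch {

    /**
     * Returns true if a child method may take its name from the given parent method.
     *
     * @param parentAccess access flags of the parent method (same bits as java.lang.reflect.Modifier)
     */
    static boolean mayInheritName(int parentAccess, String parentName, String parentDesc,
                                  String childName, String childDesc) {
        if (Modifier.isPrivate(parentAccess) || Modifier.isStatic(parentAccess)) {
            return false; // private and static methods are never overridden
        }
        // The JVM requires an identical name and descriptor, return type included.
        return parentName.equals(childName) && parentDesc.equals(childDesc);
    }

    public static void main(String[] args) {
        // A.foo()V is private, so C.foo()V must not pick up its mapping.
        System.out.println(mayInheritName(Modifier.PRIVATE, "foo", "()V", "foo", "()V"));
        // B.foo()V is a public interface method, so C.foo()V may.
        System.out.println(mayInheritName(Modifier.PUBLIC | Modifier.ABSTRACT, "foo", "()V", "foo", "()V"));
    }
}
```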
When remapping a field that shares its name with a field in the parent class, the parent's mapping is wrongly used instead. (This also depends on the order in which the classes are processed.) I believe I have narrowed it down to AbstractClassMappingImpl.complete.
Example code:
public class LorenzTest {
    static class Parent { int f; }
    static class Child extends Parent { int f; }
    private LorenzRemapper remapper;
    @Before
    public void setup() {
        MappingSet mappings = MappingSet.create();
        mappings.getOrCreateClassMapping("LorenzTest$Parent").createFieldMapping("f", "parentField");
        mappings.getOrCreateClassMapping("LorenzTest$Child").createFieldMapping("f", "childField");
        remapper = new LorenzRemapper(mappings, new ReflectionInheritanceProvider(getClass().getClassLoader()));
    }
    @Test
    public void testHiddenField() {
        assertEquals(remapper.mapFieldName("LorenzTest$Parent", "f", "I"), "parentField");
        assertEquals(remapper.mapFieldName("LorenzTest$Child", "f", "I"), "childField"); // fails
    }
    @Test
    public void testHiddenField2() {
        assertEquals(remapper.mapFieldName("LorenzTest$Child", "f", "I"), "childField");
        assertEquals(remapper.mapFieldName("LorenzTest$Parent", "f", "I"), "parentField");
    }
}
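For comparison, here is a small stand-alone sketch of the lookup order the test expects: a class's own field mapping wins, and the parent is consulted only when the class declares no mapping for that name. The model below uses plain maps and hypothetical names; it does not reflect how AbstractClassMappingImpl is actually structured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not part of Lorenz.
final class FieldLookupSketch {
    private final Map<String, Map<String, String>> fieldMappings = new HashMap<>();
    private final Map<String, String> superclassOf = new HashMap<>();

    void addField(String owner, String obfName, String deobfName) {
        fieldMappings.computeIfAbsent(owner, k -> new HashMap<>()).put(obfName, deobfName);
    }

    void setSuperclass(String child, String parent) {
        superclassOf.put(child, parent);
    }

    /** Walks from the owner upwards and returns the first mapping found, or the name unchanged. */
    String mapFieldName(String owner, String name) {
        for (String cls = owner; cls != null; cls = superclassOf.get(cls)) {
            Map<String, String> fields = fieldMappings.get(cls);
            if (fields != null && fields.containsKey(name)) {
                return fields.get(name);
            }
        }
        return name;
    }

    public static void main(String[] args) {
        FieldLookupSketch lookup = new FieldLookupSketch();
        lookup.setSuperclass("LorenzTest$Child", "LorenzTest$Parent");
        lookup.addField("LorenzTest$Parent", "f", "parentField");
        lookup.addField("LorenzTest$Child", "f", "childField");
        System.out.println(lookup.mapFieldName("LorenzTest$Child", "f"));
        System.out.println(lookup.mapFieldName("LorenzTest$Parent", "f"));
    }
}
```

Because nothing is copied from parent to child ahead of time, the result does not depend on the order in which the classes are queried.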
Just curious, is this project still relevant now that Fabric's mapping-io library exists?
Lorenz has been in development now for over 2 years - in which time it has gone through a number of iterations and been used in countless projects. It has been refined to such an art that the readers in Lorenz feel galaxies apart from their counterparts in Nocturne. Despite the many leaps that Lorenz has made - no 1.x.x release has been made. I am making it my mission this year to address the lack of a Lorenz 1.x, and am working on a number of pre-1.0.0 interim releases.
Please note that this is only a rough estimate of what the timeline will be like - I have a very busy year and may well not stick to this timeline particularly well. Having said that, I have been generous on the later releases to attempt to compensate for that.
There are a number of features that are required for a 1.0.0 release, though the eventual feature list may well be larger than this.
There are a number of things that need to exist in order to support the formats, and further complexities to support multiple, differing, formats side-by-side.
The SRG and variant formats support the following mapping types, with the given data:
Whereas both the JAM and Enigma formats support the following mapping types, with the given data (note the extra inclusions in bold):
These differences make it difficult to support the two mapping formats - especially in cases where it is desired to convert an SRG mappings file to an Enigma mappings file, for example. There are a few solutions that could be used:
I won’t get into the arguments for and against the above solutions here, other than to mention that I am swaying towards the former.
One key element of the Lorenz 1.0.0 release is making sure it is in such a state that it could be incorporated into Nocturne to replace the ageing mappings model that exists. I won’t reiterate any specific formats that need be supported here - however it is worth mentioning that Nocturne has a number of hooks within the mappings model. There are a few solutions that could be used:
Again, I will not comment on arguments for or against here - but I am swaying towards the former option.
There is no set-in-stone feature list for each pre-1.0.0 release currently, as I will select a number of features I wish to address in that development-cycle at the start of the cycle alongside a GitHub issue to track the progress made.
This specification exists to inform myself and others of what needs to be done for Lorenz 1.0.0, however the pre-1.0.0 releases will better inform me of specifics that I have not yet had to consider. Keep an eye out for my further, more-technical specifications for those releases.
The default Gradle tasks run the uploadArchives task implicitly, without it being specifically defined. This makes the project difficult to build on systems without the required configuration.
Edit: It also doesn't seem to define a jar task. Unless I misunderstand its intention, your Gradle buildfile is very broken.
The only way to force the entire mapping to print is to use TextMappingFormat#createWriter(Writer) and flush the writer yourself.
Affected formats: (All of them)
CSrgMappingFormat
TSrgMappingFormat
XSrgMappingFormat
SrgMappingFormat
MappingSet mappings = MappingSet.create();
mappings.createTopLevelClassMapping("A", "B");
MappingFormats.SRG.createWriter(System.out).write(mappings);
MappingFormats.CSRG.createWriter(System.out).write(mappings);
MappingFormats.TSRG.createWriter(System.out).write(mappings);
MappingFormats.XSRG.createWriter(System.out).write(mappings);
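The issue does not say why the output goes missing, but the workaround suggests buffered output that is never flushed. The following stand-alone sketch shows that general effect with plain java.io classes only; it makes no claim about how the Lorenz writers are implemented, and the class name is hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Illustration of buffering in java.io, not Lorenz code.
final class FlushSketch {

    /** Returns the sink contents before and after flushing a buffered writer. */
    static String[] beforeAndAfterFlush() {
        try {
            StringWriter sink = new StringWriter();
            BufferedWriter writer = new BufferedWriter(sink);
            writer.write("CL: A B");
            writer.newLine();
            String before = sink.toString(); // text is still held in the buffer
            writer.flush();
            String after = sink.toString();  // text has now reached the sink
            return new String[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = beforeAndAfterFlush();
        System.out.println("before flush: '" + result[0] + "'");
        System.out.println("after flush: '" + result[1].trim() + "'");
    }
}
```

If this is indeed the cause, flushing at the end of write, or closing the writer in a try-with-resources block, would be the usual remedies.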
The following method is only used when remapping the return type of method signatures. It looks to me like the intention of the method was to remap if the given Type is a FieldType, and simply return the given type if it's not a FieldType (i.e. Void). But for some reason it calls deobfuscate a second time on the result of the first deobfuscate call. Again, this method is only used for return types; for parameter types the FieldType-accepting deobfuscate method is called directly.
Lorenz/lorenz/src/main/java/org/cadixdev/lorenz/MappingSet.java
Lines 311 to 316 in 6851a5e
With certain mapping sets, this causes deobfuscate to give invalid results. This was observed when trying to call reverse on a MappingSet containing class mappings like the following:
BinaryHeap -> Path
Path -> PathEntity
Any mappings which had PathEntity as the return type were changed to have BinaryHeap as the return type in the reversed mappings.
Using my own reverse and deobfuscate methods which call deobfuscate(FieldType) only once fixed the issue for me, but given I'm not sure why it's remapping twice, I'm hesitant to call it the "proper" fix, so I haven't opened a PR.
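The effect can be reproduced without Lorenz by modelling the reversed class mappings as a plain map and applying the lookup once versus twice. This is only a sketch with hypothetical names; the real method operates on Type objects rather than strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical string-based model of the remapping, not Lorenz code.
final class DoubleRemapSketch {

    static String remapOnce(Map<String, String> classes, String name) {
        return classes.getOrDefault(name, name);
    }

    /** Mirrors the suspected bug: the result of the first remap is remapped again. */
    static String remapTwice(Map<String, String> classes, String name) {
        return remapOnce(classes, remapOnce(classes, name));
    }

    public static void main(String[] args) {
        // Reverse of: BinaryHeap -> Path, Path -> PathEntity
        Map<String, String> reversed = new HashMap<>();
        reversed.put("Path", "BinaryHeap");
        reversed.put("PathEntity", "Path");

        System.out.println(remapOnce(reversed, "PathEntity"));  // Path
        System.out.println(remapTwice(reversed, "PathEntity")); // BinaryHeap
    }
}
```

Whenever one mapping's output name is also another mapping's input name, the second pass follows the chain one step too far.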
Currently Lorenz only supports text-based mapping formats. This is a problem, as it prevents formats such as kin from being supported. These formats are well worth supporting as they are far smaller in size.
I have begun some preliminary work on supporting binary formats, though I still need to investigate things such as compression further and complete the job.
I discovered this while using Mercury to update CraftFabric (https://github.com/tr7zw/CraftFabric) to the latest yarn mappings.
One example class was class CraftScoreboard$2 implements Function<net.minecraft.scoreboard.Team, Team>
Now, I agree this is definitely some very odd naming, but I'll continue. I pushed the sources through Mercury and the output ends up as class 2 implements...
Furthermore some constructors were outputted incorrectly, such as the following:
CraftScoreboard$2(CraftScoreboard var1) {
this.this$0 = var1;
}
where it becomes
(CraftScoreboard var1) {
this.this$0 = var1;
}
Also some references were unintentionally changed, such as new CraftScoreboard$2(this) -> new 2(this)
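One way output like this could arise (an assumption, since the issue does not identify the cause) is treating everything after the last $ as the simple name of an inner class. The sketch below shows that naive split and a guarded variant that falls back to the full name when the remainder is not a valid identifier; both helpers are hypothetical.

```java
// Hypothetical helpers, not taken from Lorenz or Mercury.
final class InnerNameSketch {

    /** Naive split: everything after the last '$' is taken as the simple name. */
    static String naiveSimpleName(String name) {
        return name.substring(name.lastIndexOf('$') + 1);
    }

    /** Keeps the full name when the part after '$' cannot start a Java identifier. */
    static String guardedSimpleName(String name) {
        String candidate = naiveSimpleName(name);
        boolean valid = !candidate.isEmpty() && Character.isJavaIdentifierStart(candidate.charAt(0));
        return valid ? candidate : name;
    }

    public static void main(String[] args) {
        System.out.println(naiveSimpleName("CraftScoreboard$2"));   // 2
        System.out.println(guardedSimpleName("CraftScoreboard$2")); // CraftScoreboard$2
    }
}
```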
Identified through anomalous output.
k k
b LOGGER
k$a k$a
a LINUX
b SOLARIS
c WINDOWS
d OSX
e UNKNOWN
f $VALUES
k$a$1 k$a$1
k$a$2 k$a$2
Lorenz basically forces all MappingSets to be an instance of MappingSetImpl, as there is no way to construct a MappingSet directly from a MappingSetModelFactory.
Also, there are many occurrences where MappingSet.create() is called even though a MappingSetModelFactory is available.
Proposed solution:
- Add a createMappingSet method to MappingSetModelFactory
- Make MappingSet.create(MappingSetModelFactory) use the model factory
- Use MappingSet.create(MappingSetModelFactory) or MappingSetModelFactory.createMappingSet() where possible (the three calls of create() in the MappingSet class don't use the existing MappingSetModelFactory)
Effectively the same as kashike implemented in heart.
There should be extension data (provided by Lorenz) for manually marking a mapping as de-obfuscated. This is specifically useful in Survey's EnumNameMapper: in case the names weren't obfuscated, it prevents another mapper from giving them another name, thinking they weren't deobf'd.
Not sure whether this is arguably more a Mercury issue, but all the code in question is in Lorenz, so posting it here.
If you have a mapping set ab which exhaustively lists all mappings including overridden methods, and a mapping set bc which is intended to chain from state b but doesn't list overridden methods, instead relying on complete to infer them, the composition of the mappings, ab.merge(bc), will never complete those methods, instead leaving them in their original state as in ab.
For example, if we have class AImpl extends A where both have method a() and the mapping sets are as follows
ab:
A.a() -> B.b()
AImpl.a() -> BImpl.b()
bc:
B.b() -> C.c()
the result is
A.a() -> C.c()
AImpl.a() -> BImpl.b()
and completion will never properly complete the mapping.
Not sure what the solution here is. In my project using the library, I used a workaround where I modified complete to use put instead of putIfAbsent and basically treat the highest mapping in the class hierarchy as canonical, which worked for my purposes, but ideally you'd want the most complete mapping to dominate, regardless of where it is in the hierarchy.
While making more complete mappings "infectious" in a way which climbs up the hierarchy would require significant restructuring of the code, and may not really be necessary as it's a bit of an edge case, for incorporation into the upstream I think a solution which at very least will not override a more complete mapping with a less complete one is necessary. (Though, that's a tradeoff in and of itself, in that arguably it's not ideal for application of mappings to change the semantics of the code, and it may be preferable for a less complete mapping to override a more complete one rather than for application of the mappings to make a method no longer override a parent's method.)
The parenthesized concern aside, such a solution would more or less necessitate keeping track of the history of the mapping mergers for each mapping, and inheriting the parent mapping only if it's "longer" (i.e. more composition steps). It also raises questions regarding complicated topologies that can emerge when matching multiple mapping sets.
E.g. if we have classes class AImpl extends A where both have method a(), and the user merges ab.merge(bc).merge(ad), resulting in a mapping set with both
A.a() -> B.b() -> C.c()
AImpl.a() -> DImpl.d()
Should A.a() -> C.c() override AImpl.a() -> DImpl.d()? That chain is longer, but on the other hand it doesn't actually continue the latter mapping.
Support the XSRG format, which is the same as SRG but with field types: FD: oCls/oFld oDesc mCls/mFld mDesc.
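A minimal sketch of parsing one such field line, assuming whitespace-separated tokens and that the field name follows the last / of each owner/name pair. The class is hypothetical and the sample line is invented for illustration.

```java
// Hypothetical parser for a single XSRG field line, not Lorenz code.
final class XsrgFieldLineSketch {
    final String obfOwner, obfName, obfDesc;
    final String deobfOwner, deobfName, deobfDesc;

    private XsrgFieldLineSketch(String[] obf, String obfDesc, String[] deobf, String deobfDesc) {
        this.obfOwner = obf[0];
        this.obfName = obf[1];
        this.obfDesc = obfDesc;
        this.deobfOwner = deobf[0];
        this.deobfName = deobf[1];
        this.deobfDesc = deobfDesc;
    }

    /** Splits "owner/name" at the last '/', since owners may contain package separators. */
    private static String[] splitMember(String member) {
        int slash = member.lastIndexOf('/');
        if (slash <= 0 || slash == member.length() - 1) {
            throw new IllegalArgumentException("Expected owner/name but got: " + member);
        }
        return new String[] {member.substring(0, slash), member.substring(slash + 1)};
    }

    /** Parses "FD: oCls/oFld oDesc mCls/mFld mDesc". */
    static XsrgFieldLineSketch parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 5 || !parts[0].equals("FD:")) {
            throw new IllegalArgumentException("Not an XSRG field line: " + line);
        }
        return new XsrgFieldLineSketch(splitMember(parts[1]), parts[2], splitMember(parts[3]), parts[4]);
    }

    public static void main(String[] args) {
        XsrgFieldLineSketch field = parse("FD: a/b I com/example/Foo/bar I");
        System.out.println(field.obfOwner + "." + field.obfName + " -> "
                + field.deobfOwner + "." + field.deobfName + " : " + field.deobfDesc);
    }
}
```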
This is hard to explain, so have a chart:
This is Lorenz 0.5.4's current output when merging int->obf (has signatures) and obf->srg (no signatures):
Left | Right | Output |
---|---|---|
name: A->B signature: O | name: B->C signature: null | name: A->C signature: O |
name: A->A signature: O | name: A->A signature: null | name: A->A signature: null |
I would expect the 2nd row, which is when the mappings are "loosely duplicated", to still copy over the signature name from the left mappings.
There are a number of scenarios where multiple mapping sets may be present, and users may want to merge them. I can think of a variety of actual situations where this could be handy.
- obfuscated values = first.obfName, de-obfuscated values = second.deobfName, matched by first.deobfName == second.obfName
- obfuscated values = first.deobfName, de-obfuscated values = second.deobfName, matched by first.obfName == second.obfName
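The two matching rules above can be sketched over plain name maps (obfuscated name to de-obfuscated name). The method names chain and rebase are made up for this illustration and are not an existing or proposed Lorenz API; entries without a match are simply dropped here, which is one of several possible policies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical merge strategies over plain name maps, not Lorenz code.
final class MergeStrategiesSketch {

    /** Result maps first.obfName to second.deobfName, matched by first.deobfName == second.obfName. */
    static Map<String, String> chain(Map<String, String> first, Map<String, String> second) {
        Map<String, String> out = new LinkedHashMap<>();
        first.forEach((obf, deobf) -> {
            String next = second.get(deobf);
            if (next != null) {
                out.put(obf, next);
            }
        });
        return out;
    }

    /** Result maps first.deobfName to second.deobfName, matched by first.obfName == second.obfName. */
    static Map<String, String> rebase(Map<String, String> first, Map<String, String> second) {
        Map<String, String> out = new LinkedHashMap<>();
        first.forEach((obf, deobf) -> {
            String other = second.get(obf);
            if (other != null) {
                out.put(deobf, other);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("a", "Foo");

        Map<String, String> continuesFirst = new LinkedHashMap<>();
        continuesFirst.put("Foo", "Bar");
        System.out.println(chain(first, continuesFirst)); // {a=Bar}

        Map<String, String> sameObfNames = new LinkedHashMap<>();
        sameObfNames.put("a", "Baz");
        System.out.println(rebase(first, sameObfNames)); // {Foo=Baz}
    }
}
```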