rgrig / barista
fork of Barista, an OCaml library for handling Java bytecode
Home Page: http://barista.x9c.fr/
License: GNU Lesser General Public License v3.0
It would speed up topl a lot if we could parse only signatures.
The type inferer wrongly identifies which objects are initialized by INVOKESPECIAL. It is possible, for example, to execute the same NEW twice before executing a corresponding INVOKESPECIAL.
Topl was doing this, but it is really Barista's job.
Any code that passes the type inference in Barista should pass JVM's type checking. See issue 12 of topl for a counterexample.
Type inference should use the types from Section 4.10.1.1 of the JVM7 spec. For example, the following should be illegal.
Barista fails on interfaces with default methods. The function AccessFlag.check_method_flags should implement what S4.6 of the JVM8 spec says (or whatever section it is in the JVM9 spec).
Make the symbolic execution size-agnostic. This should be much easier now that locals are size-agnostic in the high-level representation.
It is somewhat risky to preserve unknown attributes. They could contain, for example, references into the constant pool that become wrong. It might be safer (1) to warn when an unknown attribute is decoded, and (2) to drop it.
bincheck should count Longs in the constant pool as 2 entries; see 14f7cfe.
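For reference, the counting rule (JVMS S4.4.5: Long and Double take two constant-pool slots) can be sketched as follows. The pool_entry type and function names here are illustrative, not Barista's actual types.

```ocaml
(* Simplified constant-pool element; only the size rule matters here. *)
type pool_entry =
  | Utf8 of string
  | Integer of int32
  | Long of int64      (* occupies two pool indices per JVMS S4.4.5 *)
  | Double of float    (* likewise *)

(* Number of pool indices consumed by a list of entries. *)
let pool_size entries =
  List.fold_left
    (fun n e -> match e with Long _ | Double _ -> n + 2 | _ -> n + 1)
    0 entries
```

So a pool containing one Utf8 and one Long counts as 3 entries, which is the behavior bincheck should enforce.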
There are several Coder.decode_attr_ functions that are unimplemented. In particular, processing Tomcat 7 fails because Coder.decode_attr_bootstrap_methods is unimplemented.
Reduce the use of polymorphic variants.
Pros of PV: you don't need to decide to which module they belong, you can do cool stuff with subtyping.
Cons of PV: you get less help from the type system, and they are one of the more complicated features of OCaml.
What prompted this: the data type t for constants in (HighInstruction.LDC t).
At the moment, encode and encode_code in HighAttribute are mutually recursive. For decoding we broke up the recursion, and we might want to do the same for encoding.
I would like to put all code for the high-level representation of the bytecode in a module High that has submodules Class, Instruction, Field, Attribute, Method, and so on.
At the moment, we have both the old high-level representation (Class, Instruction, Attribute, ...) and the new high-level representation (HighClass and its submodules HighInstruction, HighAttribute, ...). In a few instances I could not understand OCaml's name resolution rules when one has both module A and module B.A, so I think it's a bad idea to have both Instruction and High.Instruction. In other words, the renaming I propose here should be done only after we get rid of the old high-level representation altogether.
There should be a high-level way to deal with the argument-to-local correspondence in clients of Barista.
Example of the problem: if the arguments are [Class Foo, long, int], then the locals are [0, 1, 3], because long occupies two local slots.
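A helper for this correspondence could look like the sketch below. The arg type and function names are hypothetical, not Barista's API; only the slot-size rule (long and double are category-2, per JVMS S2.6.1) is from the spec.

```ocaml
(* Illustrative argument descriptors. *)
type arg = Class of string | Long | Int | Double | Float

(* JVM category sizes: long and double occupy two local slots. *)
let arg_size = function Long | Double -> 2 | _ -> 1

(* Local index of each argument: a running sum of the sizes so far.
   For [Class "Foo"; Long; Int] this yields [0; 1; 3]. *)
let locals_of_args args =
  List.rev
    (snd
       (List.fold_left
          (fun (next, acc) a -> (next + arg_size a, next :: acc))
          (0, []) args))
```

Exposing something like this would let clients stop recomputing slot offsets by hand.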
The ConstantValue attribute must not exist in the high-level representation of the bytecode. Instead, the constant should be attached to the field. This way, we avoid situations where the type of the field and the type of the constant used to initialize it do not match.
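One way to encode this proposal is to make the initializer part of the field record itself; the types below are a sketch with made-up names, not Barista's actual representation.

```ocaml
(* Illustrative constant type; a real version would cover all
   ConstantValue kinds (int, long, float, double, String). *)
type constant =
  | Int_const of int32
  | Long_const of int64
  | String_const of string

(* The initializer lives on the field, so a field and its constant
   can be checked for type agreement in one place. *)
type field = {
  name : string;
  descriptor : string;
  init : constant option;  (* None when the field has no initializer *)
}
```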
The bytecode generated by TOPL on Tomcat 7 crashes Infer, so it is likely invalid.
The flowgraph should be built during symbolic execution (especially because of JSR). Then symbolic execution should also return this graph. Finally, [compute_max_stack_locals] should filter the entry points of basic blocks (using the flow graph).
The following are times of running parts of topl on dacapo:
Encoding is more than twice as slow as decoding. There is no good reason for this difference; it should be fixed.
Right now we always use the wide variant of instructions. To pick the smallest size we need another fixed-point computation: Changing the size of instructions may change the size of jumps, which in turn may change the size of instructions.
I'm not sure if this really makes a difference for the speed of the bytecode. If it only affected the size of the bytecode, I wouldn't bother.
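The iteration described above is a plain fixed-point computation; a minimal generic sketch, independent of Barista's actual types, is below. In Barista, f would recompute each instruction's encoded size from the current jump offsets, and iteration stops once the size assignment is stable.

```ocaml
(* Apply [f] repeatedly until the result stops changing (w.r.t. [eq]).
   Termination relies on [f] being monotone on a finite domain, which
   holds here: instruction sizes only grow, and are bounded. *)
let rec fixpoint eq f x =
  let y = f x in
  if eq x y then x else fixpoint eq f y
```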
It is undesirable to distinguish between INVOKEINTERFACE and INVOKEVIRTUAL in the high-level representation. The choice should be made based on whether the type name is that of an interface or that of a class. Which brings up another point: It is impossible to distinguish between interfaces and classes given their name. In particular, Barista won't complain if someone uses both INVOKEs on the same type name. Is there a setup in which such bytecode will run without error?
I'm inclined to say that the high-level representation should distinguish between names of classes and names of interfaces. One option is to introduce a type Name.for_interface, another is to have a map Name.for_class -> bool. However, this whole issue needs more thought.
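The second option (a map from names to their kind) could be sketched as follows; all names here are illustrative, and the mnemonic strings stand in for the real instruction constructors.

```ocaml
module StringMap = Map.Make (String)

(* Whether a type name denotes a class or an interface. *)
type kind = Klass | Interface

(* Pick the invoke instruction from the callee's kind, so clients
   never choose between the two INVOKEs by hand. Unknown names
   default to INVOKEVIRTUAL here; a real version should error. *)
let pick_invoke kinds name =
  match StringMap.find_opt name kinds with
  | Some Interface -> "INVOKEINTERFACE"
  | Some Klass | None -> "INVOKEVIRTUAL"
```

The phantom-type option (a separate Name.for_interface) would push the same check to compile time, at the cost of duplicating some name-handling code.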
Name.for_class is a pair that tries to distinguish the name of an inner class. However, there seems to be nothing in the spec that constrains the names of nested classes in any way.
Recently, Rasmus solved a cyclic dependency between submodules of HighClass by separating types and operations into different modules. It might be a good idea to have one module with the types and one module with the operations: High and, respectively, HighOperations. (Each will have several submodules, such as High.Attribute, High.Instruction, and HighOperations.Attribute.)
The module High would have only an ml file, and no mli file. So we won't have to keep those huge variants in sync.
Related: #3
@yoff inserted a comment saying that SymbExe should be size-agnostic. I'm filing this issue so we have a place to discuss it.
These are two attributes that need some extra work during encoding and decoding. They hold debugging information, so they aren't high priority.
Class files with this property exist and may lead to errors during symbolic execution, usually:
"SE: found top where a reference was expected".
The symbolic execution in coder.ml doesn't handle long and double properly. (For example, TOPL's test longarg fails.)
The function Coder.encode_attr_line_number_table has a dummy implementation and a todo note on it.
We need to implement something sensible. (Why? I think this is what happens: Infer detects errors in the bytecode instrumented by TOPL but does not print anything because it has no line numbers.)
The following scenario is allowed by our SE, but not by the spec:
We must have a type Right_half, or one with a better name.
Put s8_of_2s4 into Utils and make sure it's used everywhere (grep for shift_left).
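Assuming s8_of_2s4 combines two 32-bit words (high, low) into one signed 64-bit value, as for CONSTANT_Long's high_bytes/low_bytes in JVMS S4.4.5, the Utils version could be:

```ocaml
(* Combine two 32-bit halves into a signed 64-bit value. The low half
   must be masked so its sign bit is not extended into the high half. *)
let s8_of_2s4 (hi : int32) (lo : int32) : int64 =
  Int64.logor
    (Int64.shift_left (Int64.of_int32 hi) 32)
    (Int64.logand (Int64.of_int32 lo) 0xFFFFFFFFL)
```

The masking of the low word is exactly the step that ad-hoc shift_left code tends to get wrong, which is one reason to centralize this.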
It's quite clear now that we have a full fork. We don't want to maintain two high-level representations. All that depends on the old high-level representation should be removed.