Giter Club home page Giter Club logo

barista's People

Contributors

rgrig avatar

Stargazers

 avatar

Watchers

 avatar  avatar

barista's Issues

treatment of new in SymbExe

The type inferer wrongly identifies which objects are initialized by INVOKESPECIAL. It is possible, for example, to execute the same NEW twice before executing a corresponding INVOKESPECIAL.

symbolic execution is too lenient

Any code that passes the type inference in Barista should pass JVM's type checking. See issue 12 of topl for a counterexample.

Type inference should use the types from Section 4.10.1.1 of the JVM7 spec. For example, the following should be illegal.

  • Put an uninitialized reference in an array of Object.
  • Disobey the class hierarchy.

support java 8 flag combinations

Barista fails on interfaces with default methods. The function AccessFlag.check_method_flags should implement what S4.6 of JVM8 spec says (or whatever section it is in JVM9).

size-agnostic symbolic execution

Make the symbolic execution size-agnostic. This should be much easier now that locals are size-agnostic in the high-level representation.

unknown attributes

It is somewhat risky to preserve unknown attributes. They could contain, for example, references inside the constant pool that become wrong. It might be safer (1) to warn when an unknown attribute is decoded, and (2) drop it.

Decode all attributes

There are several Coder.decode_attr_ functions that are unimplemented. In particular, processing Tomcat7 fails because Coder.decode_attr_bootstrap_methods is unimplemented.

polymorphic variants

Reduce the use of polymorphic variants.

Pros of PV: you don't need to decide to which module they belong, you can do cool stuff with subtyping.
Cons of PV: you get less help from the type system, they are one of the more complicated features of OCaml

What prompted this: the data type t for constants in (HighInstruction.LDC t).

mutual recursion

At the moment, encode and encode_code in HighAttribute are mutually recursive. For decoding we broke up the recursion, and we might want to do the same for encoding.

rename modules to High.Class, High.Instruction, and so on

I would like to put all code for the high-level representation of the bytecode in a module High that has submodules Class, Instruction, Field, Attribute, Method, and so on.

At the moment, we have both the old high-level representation (Class, Instruction, Attribute, ....) and the new high-level representation (HighClass and its submodules HighInstruction, HighAttribute, ...) In a few instances I could not understand OCaml's name resolution rules when one has both module A and module B.A, so I think it's a bad idea to have both Instruction and High.Instruction. In other words, the renaming that I propose here should be done only after we get rid of the old high-level representation altogether.

argument--local correspondence

There should be a high-level way to deal with the argument--local correspondence in clients of Barista.

Example of the problem: If the arguments are [Class Foo, long, int], then the locals are [0,1,3], because long occupies 2 locals.

the type of ConstantValue_attribute

The ConstantValue attribute must not exist in the high-level representation of the bytecode. Instead, the constant should be tucked on the field. This way, we don't have a situation where the type of the field and the type of the constant used to initialize it do not match.

print StackMapTable only at entry points of basic blocks

The flowgraph should be built during symbolic execution (especially because of JSR). Then symbolic execution should also return this graph. Finally, [compute_max_stack_locals] should filter the entry points of basic blocks (using the flow graph).

speed of encoding

The following are times of running parts of topl on dacapo:

  • 7m30s for zip+decode+encode+topl instrumentation
  • 6m33s for zip+decode+encode
  • 1m41s for zip+decode+inherit
  • 1m35s for zip+decode
  • 13s for zip

Encoding is more than twice slower than decoding. There's no good reason for this difference; should be fixed.

use non-wide instructions in encoding

Right now we always use the wide variant of instructions. To pick the smallest size we need another fixed-point computation: Changing the size of instructions may change the size of jumps, which in turn may change the size of instructions.

I'm not sure if this really makes a difference for the speed of the bytecode. Just for the size of the bytecode I wouldn't bother.

high-level INVOKE

It is undesirable to distinguish between INVOKEINTERFACE and INVOKEVIRTUAL in the high-level representation. The choice should be made based on whether the type name is that of an interface or that of a class. Which brings up another point: It is impossible to distinguish between interfaces and classes given their name. In particular, Barista won't complain if someone uses both INVOKEs on the same type name. Is there a setup in which such bytecode will run without error?

I'm inclined to say that the high-level representation should distinguish between names of classes and names of interfaces. One option is to introduce a type Name.for_interface, another is to have a map Name.for_class -> bool. However, this whole issue needs more thought.

Name.for_class

Name.for_class is a pair that tries to distinguish the name of an inner class. However, there seems to be nothing in the spec that constraints the name of nested classes in any way.

big variants should not be in MLI

Recently, Rasmus solved a cyclic dependency between submodules of HighClass by separating types and operations into different modules. It might be a good idea to have one module with the types and one module with the operations: High and, respectively, HighOperations. (Each will have several submodules, such as High.Attribute, High.Instruction, and HighOperations.Attribute.)

The module High would have only an ml file, and no mli file. So we won't have to keep those huge variants in sync.

Related:
#3

handle LocalVariable(Type)Table

These are two attributes that need some extra work during encoding and decoding. They hold debugging information; so, they aren't high priority.

long and double in symexec

The symbolic execution in coder.ml doesn't handle long and double properly. (For example, TOPL's test longarg fails.)

save line numbers

The function Coder.encode_attr_line_number_table has a dummy implementation and a todo note on it.

We need to implement something sensible. (Why? I think this happens: Infer detects errors in the bytecode instrumented by TOPL but does not print anything because it has no line numbers.)

long values are not typechecked correctly

The following scenario is allowed by our SE, but not by the spec:

  • write Long in local variable 0
  • write Int in local variable 1
  • read Long from local variable 0

We must have a type Right_half, or with a better name.

remove old high-level representation

It's quite clear now that we have a full fork. We don't want to maintain two high-level representations. All that depends on the old high-level representation should be removed.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.