pysoot's Introduction

PySoot

pysoot is a lifter from JAR/APK files to a Soot-like Python IR.

The master branch supports Python 3; the py2k branch supports Python 2.

Installation

pip install -e .

How to use

from pysoot.lifter import Lifter
input_file = "tests/test_samples/simple1.jar" # the jar/apk you want to analyze
lifter = Lifter(input_file) # the default IR is Shimple, the default input_format is jar
classes = lifter.classes # get the IR of all the classes (as a dict of classes)
print(classes[list(classes.keys())[0]]) # print the IR of one of the translated classes

Many other examples are in tests/test_pysoot.py

lifter.soot_wrapper gives direct access to some Soot functionality. As of now, I have added functions from Hierarchy.java, but it is easy (and "almost" automatic) to add others.

Requirements

  • Java. Currently tested using OpenJDK 8 (sudo apt-get install openjdk-8-jdk).

Other components used by pysoot are:

  • Jython. Already included in this repo; it is not necessary to install it. The embedded version "simulates" a virtualenv with pysoot installed.
  • soot-trunk.jar. This is a slightly modified version of the pre-compiled Soot JAR. At some point, I will upload its source code and the compilation script somewhere. pysoot should also work with a normal version of soot-trunk.jar.

Internals

Components

pysoot works by running Soot (compiled in the embedded soot-trunk.jar) using Jython (embedded) and the code in soot_manager.py.

jython_wrapper.py and jython_runner.py establish a bidirectional IPC channel that allows a Python process to call methods of an instance of a class in Jython (data is serialized/deserialized using pickle). jython_wrapper.py runs in Python, while jython_runner.py runs in Jython. In the future we could release this IPC layer as a separate component.
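As a rough illustration of this kind of channel, here is a minimal sketch of a pickle-based method-call protocol over a socket. It is not the code in jython_wrapper.py or jython_runner.py: the names Counter, Proxy, serve, send_msg and recv_msg are made up for this example, both ends run in the same process (one in a thread), and the length-prefixed framing is an assumption. Pickle protocol 2 is used because it is the highest protocol a Python 2.7 implementation can read.

```python
import pickle
import socket
import struct
import threading


def send_msg(sock, obj):
    """Pickle obj and send it with a 4-byte big-endian length prefix."""
    payload = pickle.dumps(obj, protocol=2)
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closed the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed message and unpickle it."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))


class Counter:
    """Hypothetical target object living on the 'runner' side."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


def serve(sock, target):
    """Runner loop: receive (method, args), call it, send back the result."""
    while True:
        request = recv_msg(sock)
        if request is None:  # shutdown sentinel
            break
        method, args = request
        try:
            reply = ("ok", getattr(target, method)(*args))
        except Exception as exc:
            reply = ("error", repr(exc))
        send_msg(sock, reply)


class Proxy:
    """Wrapper side: forwards method calls over the socket."""

    def __init__(self, sock):
        self._sock = sock

    def call(self, method, *args):
        send_msg(self._sock, (method, args))
        status, value = recv_msg(self._sock)
        if status == "error":
            raise RuntimeError(value)
        return value

    def close(self):
        send_msg(self._sock, None)


def demo():
    wrapper_sock, runner_sock = socket.socketpair()
    worker = threading.Thread(target=serve, args=(runner_sock, Counter()))
    worker.start()
    proxy = Proxy(wrapper_sock)
    results = [proxy.call("add", 2), proxy.call("add", 5)]
    proxy.close()
    worker.join()
    wrapper_sock.close()
    runner_sock.close()
    return results
```

Calling demo() sends two add requests to the object on the other end of the socket and collects the replies. In a real two-interpreter setup, only values that both sides can unpickle should cross the channel.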

lifter.py uses this IPC channel to ask Jython to create and serialize the IR.

Classes in pysoot.sootir are used both by the Jython code and the Python one.

Data-Flow Overview

Python --> lifter.py --> jython_wrapper.py --> Jython --> jython_runner.py --> soot_manager.py --> Soot --> Soot IR

Jython --> Soot IR --> classes in pysoot.sootir --> jython_runner.py, pickle --> Python --> jython_wrapper.py, unpickle --> classes in pysoot.sootir --> lifter.py


Pysoot Architecture

pysoot's People

Contributors

antoniobianchi333, conand, dipanjan, ltfish, mohitrpatil, rhelmot, thrsten, twizmwazin, zhangysh1995, zwimer

pysoot's Issues

Recursive symlink prevents building sdist

Currently pysoot uses a symlink to "install" itself into the Jython environment. This prevents building an sdist with python -m build --sdist. An alternative solution needs to be found.

Can't find java.lang.System.out in basic example

I'm trying to run one of the basic samples (java_crackme1) and I'm receiving the following complaints:

WARNING | 2022-07-23 18:52:39,336 | angr.engines.soot.field_dispatcher | Couldn't find field in in classes [java.lang.System].
WARNING | 2022-07-23 18:52:39,337 | angr.engines.soot.field_dispatcher | Couldn't find field out in classes [java.lang.System].
WARNING | 2022-07-23 18:52:39,352 | angr.engines.soot.expressions.newarray | Array size <BV32 argc_0_32> can exceed maximum size. It gets bounded with the maximum <BV32 0x3e8>.

Similar errors about not being able to find java.lang.System.out are repeated later and no useful output is produced.

I've tried using OpenJDK 8 and OpenJDK 11 and am running this all on an Arm64 system. (Fedora under macOS)

Consider adding Android Sdk to the CI

This is definitely low priority.
We can continue doing what was implemented in 30b3cda, but at some point we may want to consider adding the Android SDK to the CI.
It is more than 2GB.

@rhelmot ping me if you want to do this at some point

Encountering weird problems when analyzing some APK files

I am a beginner, so please forgive me if these issues are actually easy to solve. I have been stuck with these things for a long time.

Description

When I load different APK files using the following Python code, three weird problems occur that prevent the program from returning the correct results.

Python code:

from pysoot.lifter import Lifter

android_sdk_path = "/home/yyy/Android/Sdk/platforms"
apk_file = "xxx.apk"
lifter = Lifter(apk_file, input_format="apk", ir_format="jimple", android_sdk=android_sdk_path)
print("{} has {} classes".format(apk_file, len(lifter.classes)))

Problem 1: The wrong API version is chosen.

The output is:

pysoot.errors.JythonClientException: JYTHON SOCKET CLOSED
STDOUT:
b'Soot CUSTOM: Scene created! >>> 3\n'b'Chosen APIVersion: [-1, -1, -1] --> 3\n'
STDERR:
......b'java.lang.RuntimeException: java.lang.RuntimeException: error: target android.jar (/home/yyy/Android/Sdk/platforms/android-3/android.jar) does not exist.\n'

or

pysoot.errors.JythonClientException: JYTHON SOCKET CLOSED
STDOUT:
b'Soot CUSTOM: Scene created! >>> 3\n'b'Chosen APIVersion: [63055, 28, 19] --> 63055\n'
STDERR:
......b'java.lang.RuntimeException: java.lang.RuntimeException: Required APIVersion (63055) is not available. maxAPI is: 34\n'
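Both error messages above refer to an android-N/android.jar file under the SDK platforms directory. A small check like the following can show which API levels are actually installed before handing the path to the Lifter. This is only a sketch: available_api_levels and check_api_level are helper names invented here, not part of pysoot, and the demo builds a throwaway directory instead of using a real SDK.

```python
import os
import re
import tempfile


def available_api_levels(platforms_dir):
    """Return sorted API levels N for which platforms_dir/android-N/android.jar exists."""
    levels = []
    for entry in os.listdir(platforms_dir):
        match = re.fullmatch(r"android-(\d+)", entry)
        jar = os.path.join(platforms_dir, entry, "android.jar")
        if match and os.path.isfile(jar):
            levels.append(int(match.group(1)))
    return sorted(levels)


def check_api_level(platforms_dir, wanted):
    """Explain whether android.jar for the wanted API level is installed."""
    levels = available_api_levels(platforms_dir)
    if wanted in levels:
        return "ok"
    if not levels:
        return "no android-N/android.jar found"
    return "android-{} missing; installed: {}".format(wanted, levels)


def demo():
    # Build a fake platforms directory with API levels 28 and 29 installed.
    with tempfile.TemporaryDirectory() as root:
        for level in (28, 29):
            platform = os.path.join(root, "android-{}".format(level))
            os.makedirs(platform)
            open(os.path.join(platform, "android.jar"), "w").close()
        os.makedirs(os.path.join(root, "android-30"))  # directory without android.jar
        return (
            available_api_levels(root),
            check_api_level(root, 29),
            check_api_level(root, 3),
        )
```

This only tells you whether the jar that the chosen API version points to exists. It does not explain why a value such as -1 or 63055 was read from the APK in the first place.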

Problem 2: Encountered small uint that is out of range.

The output is:

pysoot.errors.JythonClientException: JYTHON SOCKET CLOSED
STDOUT:
b'Soot CUSTOM: Scene created! >>> 3\n'b'Chosen APIVersion: [29, 28, 19] --> 29\n'b"Using '/home/yyy/Android/Sdk/platforms/android-29/android.jar' as android.jar\n"b"Warning: exception while processing dex file '/home/yyy/apkfiles/testapk/Mijia/v6.0.214.apk'\n"b'Exception: org.jf.util.ExceptionWithContext: Encountered small uint that is out of range at offset 0x38\n'
STDERR:
......b'org.jf.util.ExceptionWithContext: org.jf.util.ExceptionWithContext: Encountered small uint that is out of range at offset 0x38\n'
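In the standard dex header, the 32-bit field at offset 0x38 is string_ids_size, so one way to narrow this down is to look at the header of each classes*.dex inside the APK and see whether that field holds a value that does not fit in a signed 32-bit integer. The sketch below does that with the standard library only. The helper names are invented for this example, the out-of-range check is my reading of the error message rather than something confirmed against the parser, and the demo runs on a synthetic file, not on the APK from this report.

```python
import io
import re
import struct
import zipfile

# (name, offset) of some 32-bit little-endian fields in the standard dex header.
DEX_HEADER_FIELDS = (
    ("file_size", 0x20),
    ("header_size", 0x24),
    ("map_off", 0x34),
    ("string_ids_size", 0x38),
    ("string_ids_off", 0x3C),
)


def read_dex_header(dex_bytes):
    """Extract the magic and a few header fields from raw dex bytes."""
    if len(dex_bytes) < 0x70:
        raise ValueError("file is shorter than a standard dex header")
    header = {"magic": bytes(dex_bytes[:8])}
    for name, offset in DEX_HEADER_FIELDS:
        (header[name],) = struct.unpack_from("<I", dex_bytes, offset)
    return header


def suspicious_fields(header):
    """Names of fields whose value does not fit in a signed 32-bit int."""
    return [
        name
        for name, value in header.items()
        if name != "magic" and value > 0x7FFFFFFF
    ]


def dex_headers_in_apk(apk_file):
    """Map each classes*.dex entry of an APK (path or file object) to its header."""
    with zipfile.ZipFile(apk_file) as apk:
        return {
            name: read_dex_header(apk.read(name))
            for name in apk.namelist()
            if re.fullmatch(r"classes\d*\.dex", name)
        }


def demo():
    # Synthetic dex header with a deliberately out-of-range string_ids_size.
    dex = bytearray(0x70)
    dex[:8] = b"dex\n035\x00"
    struct.pack_into("<I", dex, 0x20, len(dex))    # file_size
    struct.pack_into("<I", dex, 0x24, 0x70)        # header_size
    struct.pack_into("<I", dex, 0x38, 0xFFFFFFFF)  # string_ids_size
    apk = io.BytesIO()
    with zipfile.ZipFile(apk, "w") as archive:
        archive.writestr("classes.dex", bytes(dex))
    apk.seek(0)
    headers = dex_headers_in_apk(apk)
    return {name: suspicious_fields(h) for name, h in headers.items()}
```

If a real APK shows an unexpected magic or an implausible value here, the dex file is probably not in the plain format the parser expects, and the problem would then be in the input rather than in the Python code above.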

Problem 3: The program took more than 4 hours to execute and did not return any results.

Environment

OS: Ubuntu 20.04
CPU:
Architecture: x86_64
Model name: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
CPU(s): 16
MEM: 128G
Python: 3.8.10
pysoot: 7.7.12.1

APK File

Jython Error

Exception in thread "main" java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
at org.python.core.PyString.<init>(PyString.java:64)
at org.python.core.PyString.<init>(PyString.java:70)
at org.python.core.Py.newString(Py.java:641)
at org.python.core.PySystemState.initRegistry(PySystemState.java:800)
at org.python.core.PySystemState.doInitialize(PySystemState.java:1045)
at org.python.core.PySystemState.initialize(PySystemState.java:974)
at org.python.core.PySystemState.initialize(PySystemState.java:930)
at org.python.core.PySystemState.initialize(PySystemState.java:925)
at org.python.util.jython.run(jython.java:263)
at org.python.util.jython.main(jython.java:142)

This occurs when I try to test my angr-dev setup.
The test target is angr doc/example/java_androidnative1.
I am just running solve.py (I changed the SDK location).


My environment:
Linux version 4.19.0-10-amd64 (Debian 8.3.0-6) (info from cat /proc/version)
Python 3.7.3 (using virtual python environment.)

I'm trying to fix this problem; it seems to be caused by Jython.
I'm new to Jython and angr, so please forgive me if this is a noob question.

Example error

The print statement in the documentation example needs to be updated, because the way it is currently written does not work.
For Python 3 it should be print(classes[list(classes.keys())[0]]).
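The underlying reason is that dict.keys() returns a view in Python 3, not a list, so it cannot be indexed directly. A short standalone illustration follows, using a made-up dict in place of lifter.classes; the class names and values are placeholders, and the failing form is presumably what the old example used.

```python
# Stand-in for lifter.classes: a dict mapping class names to IR objects.
classes = {
    "simple1.Class1": "<IR of Class1>",
    "simple1.Class2": "<IR of Class2>",
}

try:
    classes.keys()[0]  # worked in Python 2, where keys() returned a list
    indexable = True
except TypeError:
    indexable = False  # Python 3: 'dict_keys' object is not subscriptable

# Form suggested in this issue: materialize the keys as a list first.
first_via_list = classes[list(classes.keys())[0]]

# Equivalent alternative that avoids building the whole list.
first_via_iter = classes[next(iter(classes))]
```

Both working forms pick the first key in insertion order, so they return the same class.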

Refactor bindings to use an external bindings library

Description

Currently pysoot's architecture involves the Python (3) library communicating with a Python (2) program running in Jython, in a JVM along with Soot. This is less than ideal, especially considering Python 2 has been sunset for a while now. To address this, pysoot needs to be refactored away from this architecture.

I've taken a very brief look at the state of Java-to-Python bindings, and unfortunately it doesn't seem great. However, the library Py4J does seem promising! An implementation based around Py4J would still depend on an external JVM, but it would allow us to ditch our own communication code and replace it with a small Java program. I might be too optimistic, but the Python side should also mostly just be adapting the exposed Java types into something ergonomic from Python.

Alternatives

Someone with more experience using Java libraries in Python might know how to do this better. If that's you, please get in touch!

Another option is to consider replacing Soot with SootUp, a new library intended to succeed Soot.

Lastly, pypcode also nominally supports lifting JVM bytecode! I'm not aware if anyone has tested this support, but it might also be a suitable replacement for pysoot for some use cases. If anyone gives this a test, feedback would be super cool.

Additional context

Right now improving and modernizing pysoot is not a high priority for the angr team. We're looking for community members who would be interested in undertaking this effort.

Pysoot Looping through shimple code?

Is there a way to loop through the IR and extract each statement?
For example, consider the following Shimple code:
`//<pysoot.sootir.soot_class.SootClass object at 0x7fb8ffb91200>
class HelloWorld extends java.lang.Object{

//<pysoot.sootir.soot_method.SootMethod object at 0x7fb8ffbfd848>
void <init>(){
	//<Block 0 [0], 3 statements>
	r0 <- @this[HelloWorld]
	r0.<init>() [specialinvoke java.lang.Object.<init>()]
	return null
}

//<pysoot.sootir.soot_method.SootMethod object at 0x7fb8ffbfdc48>
public static void main(java.lang.String[]){
	//<Block 0 [0], 4 statements>
	r0 <- @parameter0[java.lang.String[]]
	$r1 = StaticFieldRef ('out', 'java.lang.System')
	$r1.println('"Hello, World"') [virtualinvoke java.io.PrintStream.println(java.lang.String)]
	return null
}

}
`

Let's say I want to modify the call on $r1 so that it prints "Hello, World2" instead of "Hello, World". How could I do this with the current architecture of pysoot?
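For the looping part of the question, a sketch along these lines may help. It assumes that each class object exposes a methods list, each method a name and a blocks list, and each block a statements list; these attribute names are inferred from the printed output above ("Block 0 [0], 4 statements") and should be checked against pysoot.sootir. The demo uses stand-in objects so that it runs without pysoot; with the real library, the dict would be lifter.classes.

```python
from types import SimpleNamespace


def iter_statements(classes):
    """Yield (class_name, method_name, block_index, statement) for every statement."""
    for class_name, cls in classes.items():
        for method in cls.methods:
            for block_index, block in enumerate(method.blocks):
                for stmt in block.statements:
                    yield class_name, method.name, block_index, stmt


def demo():
    # Stand-in objects mimicking the assumed class/method/block layout.
    block = SimpleNamespace(
        statements=["r0 <- @parameter0[java.lang.String[]]", "return null"]
    )
    method = SimpleNamespace(name="main", blocks=[block])
    classes = {"HelloWorld": SimpleNamespace(methods=[method])}
    return list(iter_statements(classes))
```

This only covers reading the IR. The README describes pysoot as a lifter and does not mention writing a modified IR back to a JAR or APK, so changing the string constant would likely need a different tool.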
