LibScout

LibScout is a lightweight and effective static analysis tool for detecting third-party libraries in Android apps. The detection is resilient against common bytecode obfuscation techniques, including identifier renaming and code-based obfuscations such as reflection-based API hiding or control-flow randomization.
LibScout requires the original library SDKs (compiled .jar/.aar files), from which it extracts library profiles that are then used to detect the libraries in Android apps.

Unique features:

  • Library detection resilient against many kinds of bytecode obfuscation
  • Capability of pinpointing the exact library version (in some cases to a set of 2-3 candidate versions)
  • Capability of handling dead-code elimination, by computing a similarity score against baseline SDKs
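
To illustrate the idea behind the similarity score, here is a minimal sketch. It is not LibScout's actual algorithm (the publications below describe that). It assumes, purely for illustration, that a library profile and an app can each be reduced to a set of class-level hashes, and it scores a library by the fraction of its hashes that are found in the app. The function name and the hash strings are hypothetical.

```python
def containment_score(lib_hashes: set, app_hashes: set) -> float:
    """Fraction of the library's class hashes that also occur in the app."""
    if not lib_hashes:
        return 0.0
    return len(lib_hashes & app_hashes) / len(lib_hashes)


# Hypothetical profile of a baseline SDK with four classes.
lib_profile = {"h1", "h2", "h3", "h4"}
# Hypothetical app: three library classes survived, one was removed as dead code.
app_classes = {"h1", "h2", "h3", "x9"}

score = containment_score(lib_profile, app_classes)
print(f"similarity: {score:.2f}")
```

A score below 1.0 would then indicate a partial match, for example a library whose unused classes were stripped from the app.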

For technical details and large-scale evaluation results, please refer to our publications:

Reliable Third-Party Library Detection in Android and its Security Applications
https://www.infsec.cs.uni-saarland.de/~derr/publications/pdfs/derr_ccs16.pdf

Keep me Updated: An Empirical Study of Third-Party Library Updatability on Android
https://www.infsec.cs.uni-saarland.de/~derr/publications/pdfs/derr_ccs17.pdf

If you use LibScout in a scientific publication, we would appreciate citations using these BibTeX entries: [bib-ccs16] [bib-ccs17]

Library Profiles and Scripts

To facilitate the use of LibScout, we are happy to release our datasets to the community.
You can find the following resources in the data and scripts directories:

Library Profiles (last updated: 06/27/2017)

All ready-to-use library profiles for detecting libraries in apps are available in the data directory as a compressed .zip file.
It currently includes 205 unique libraries and 3,071 library versions.
For convenience, data/library-data.csv contains a complete list of libraries and versions, including metadata such as release dates.

Scripts (scripts/mvn-central)

The scripts directory also contains a Python script that automatically downloads original library SDKs, including complete version histories, from Maven Central.
The set of libraries we currently retrieve is stored in a JSON file.
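
As a rough illustration of what such a retrieval involves, the sketch below builds download URLs using the standard Maven Central repository layout. It is not the crawler's actual code, and the structure of the JSON file is not assumed here: the helper functions are hypothetical and take plain Maven coordinates, with OkHttp used only as an example.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def artifact_url(group_id: str, artifact_id: str, version: str, ext: str = "jar") -> str:
    """Build the download URL of one library version in the standard Maven layout."""
    group_path = group_id.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


def metadata_url(group_id: str, artifact_id: str) -> str:
    """URL of maven-metadata.xml, which lists the published versions of an artifact."""
    group_path = group_id.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/maven-metadata.xml"


# Example coordinates; pass ext="aar" for Android archive artifacts.
print(artifact_url("com.squareup.okhttp3", "okhttp", "3.2.0"))
print(metadata_url("com.squareup.okhttp3", "okhttp"))
```

A crawler could fetch the metadata file first to enumerate the versions, then download each version via its artifact URL.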

For copyright reasons, we cannot publicly provide the original library SDKs. If you are interested in this data, send us an email. We also welcome contributions to LibScout or our library database (either original SDKs or scripts for automatic retrieval from sources other than Maven Central).

Contact us for comments, feedback, how to contribute: Erik Derr [[email protected]]

Detecting vulnerable library versions

LibScout has built-in functionality to report library versions that are affected by the security vulnerabilities listed below.
Detected vulnerable versions are tagged with [SECURITY], patched versions with [SECURITY-FIX].
This information is encoded in the library.xml files that were used to generate the profiles. We try to update the list and profiles whenever we encounter new security issues. If you can share information, please let us know.

Library    | Version(s)              | Fix Version   | Vulnerability                        | Link
Airpush    | < 8.1                   | > 8.1         | Unsanitized default WebView settings | Link
Apache CC  | 3.2.1 / 4.0             | 3.2.2 / 4.1   | Deserialization vulnerability        | Link
Dropbox    | 1.5.4 - 1.6.1           | 1.6.2         | DroppedIn vulnerability              | Link
Facebook   | 3.15                    | 3.16          | Account hijacking vulnerability      | Link
MoPub      | < 4.4.0                 | 4.4.0         | Unsanitized default WebView settings | Link
OkHttp     | 2.1-2.7.4 / 3.0.0-3.1.2 | 2.7.5 / 3.2.0 | Certificate pinning bypass           | Link
SuperSonic | < 6.3.5                 | 6.3.5         | Unsafe functionality exposure via JS | Link
Vungle     | < 3.3.0                 | 3.3.0         | MitM attack vulnerability            | Link
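
For readers who want to check a detected version against the ranges in the table above by hand, here is a minimal sketch. It is not how LibScout itself flags versions (that relies on the tags in library.xml). The helper names are hypothetical, only purely numeric dotted versions are handled, and the OkHttp ranges are copied from the table and treated as inclusive.

```python
def parse_version(version: str, width: int = 3) -> tuple:
    """Turn a dotted numeric version such as '2.1' into a padded tuple like (2, 1, 0)."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Vulnerable OkHttp ranges (inclusive), copied from the table above.
OKHTTP_VULNERABLE = [("2.1", "2.7.4"), ("3.0.0", "3.1.2")]


def is_vulnerable(version: str, ranges: list) -> bool:
    """True if the version lies inside any of the inclusive (low, high) ranges."""
    v = parse_version(version)
    return any(parse_version(low) <= v <= parse_version(high) for low, high in ranges)


for candidate in ("2.7.4", "2.7.5", "3.1.2", "3.2.0"):
    print(candidate, is_vulnerable(candidate, OKHTTP_VULNERABLE))
```

Comparing padded tuples avoids the pitfalls of string comparison, where "2.10" would sort before "2.9".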

In our last scan of free apps on Google Play (05/25/2017), LibScout detected more than 20,000 apps containing one of these vulnerable library versions. These results have been reported to Google's ASI program (still under investigation).

LibScout Repo Structure


|_ build.xml (ant build file to generate runnable .jar)
|_ data (library profiles and supplemental data sets)
|    |_ library-data.csv (library meta data)
|    |_ library-profiles-21.06.zip (all library profiles)
|    |_ app-version-codes.csv (app packages with valid version codes)
|_ lib
|    pre-compiled WALA libs, Apache commons*, log4j, Android SDK 
|_ logging
|    |_ logback.xml (log4j configuration file)
|_ scripts
|    |_ mvn-central
|         |_ mvn-central-crawler.py (script to retrieve complete library histories from mvn-central)
|_ src
    source directory of LibScout (de/infsec/tpl). Includes some open-source,
    third-party code to parse AXML resources / app manifests etc.

Getting Started

  1. LibScout requires Java 1.8 or higher. A runnable jar can be generated with the ant script build.xml
  2. Modes of operation (provided via -o switch):
    Profile and Match mode require an Android SDK, provided via the -a switch, to distinguish app code from framework code.
    For your convenience, you can use the one provided in the lib directory.
    1. Library Profiling (-o profile)
      Generate library profiles from original library SDKs (.jar and .aar files are supported). Besides the library file, this mode requires a library.xml that contains metadata about the library (e.g. name and version). A library.xml template can be found in the assets directory. Use the -v switch to generate trace profiles, i.e. profiles that include class and method signatures, with methods limited to public ones; trace profiles are required as input for the library API analysis:
      java -jar LibScout.jar -o profile -a lib/android-X.jar -x ${lib-dir/library.xml} ${lib-dir/lib.[jar|aar]} 
    2. Library Matching (-o match)
      Detect libraries in apps using pre-generated profiles (this example logs to directory + serializes results):
      java -jar LibScout.jar -o match -a lib/android-X.jar -p <path-to-lib-profiles> -s -d <log-dir> $someapp.apk  
    3. Database creation (-o db)
      Generate a SQLite database from library profiles and serialized app stats:
      java -jar LibScout.jar -o db -p <path-to-lib-profiles> -s <path-to-app-stats> 
    4. Library API robustness analysis (-o lib_api_analysis)
      Analyzes changes in the set of library APIs across versions (additions/removals/modifications). Checks for SemVer compliance, i.e. whether the change in the version string matches the changes in the public API set. SemVer compliance statistics are logged, while API robustness data is written out in JSON format (use the -j switch to configure the output path). If you use this mode, you have to provide trace profiles (generated via -o profile -v).
      java -jar LibScout.jar -o lib_api_analysis -p <path-to-lib-profiles> -j <json-output-path> 
  3. Output formats: There are three different output formats available (individually configurable).
    1. Textual logging. By default, LibScout logs to stdout. Use the -d switch to redirect output to files. The -m switch disables any text output.
    2. JSON output can be enabled via -j switch.
    3. The analysis results per app can also be serialized to disk using the -s switch. This is particularly useful for large-scale evaluations. After all apps have been processed, you can use the database creation mode (-o db) to generate a single SQLite file from the serialized results (the DB structure can be found in class de.infsec.tpl.stats.SQLStats).
  4. If you are interested in digging into the source, here are some classes to start with:
    • de.infsec.tpl.TplCLI:    Starting class including CLI parsing and logging init
    • de.infsec.tpl.LibraryProfiler:   Starting class to extract library profiles
    • de.infsec.tpl.LibraryIdentifier:   Code to match lib profiles and application bytecode
    • de.infsec.tpl.hash.HashTree:   main data structures used for profiles
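
The SemVer compliance check described for the lib_api_analysis mode can be illustrated with a minimal sketch. This is not LibScout's implementation. It assumes that a public API is represented as a set of method signature strings, that a modified method shows up as one removal plus one addition, and that only a version bump smaller than required counts as a violation. All names and signatures below are hypothetical.

```python
LEVELS = {"patch": 0, "minor": 1, "major": 2}


def expected_change(old_api: set, new_api: set) -> str:
    """Smallest SemVer bump that the API difference requires."""
    if old_api - new_api:   # something was removed or modified: breaking change
        return "major"
    if new_api - old_api:   # additions only: backward compatible
        return "minor"
    return "patch"


def actual_change(old_version: str, new_version: str) -> str:
    """Which part of a MAJOR.MINOR.PATCH version string was bumped."""
    old = [int(part) for part in old_version.split(".")]
    new = [int(part) for part in new_version.split(".")]
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


def is_semver_compliant(old_version: str, new_version: str, old_api: set, new_api: set) -> bool:
    """True if the declared bump is at least as large as the API changes require."""
    needed = expected_change(old_api, new_api)
    declared = actual_change(old_version, new_version)
    return LEVELS[declared] >= LEVELS[needed]


old_api = {"Client.get(String)", "Client.close()"}
new_api = {"Client.get(String)"}  # close() was removed
print(is_semver_compliant("1.2.0", "1.3.0", old_api, new_api))
```

In this example a public method was removed while only the minor version was incremented, so the release is reported as non-compliant.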
