Object Analyzer

A pipeline to extract and analyze object lifetimes from a Java program.

The Object Analyzer runs a given Java program in a modified JVM (the AntTracks JVM) that collects profiling information for every object allocated, along with details of every garbage collection event. The JVM writes this information to highly compressed trace files, which are read by the Analyzer and converted to Parquet files. The processed file is then used to generate visualizations via the analysis scripts.

More details can be found in the paper titled "Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads".

Usage

Requirements

  • Docker (tested on v20.10)
  • bash (tested on v5.0)
  • A Java 8 program, or a compiled JAR targeted for Java 8 (Java class file version 52.0 and below).

Running on a JAR file

If you have a JAR file that you want to analyze, use the run.sh script as:

./run.sh <absolute path to JAR file> [<args to JAR file>...]

The outputs will be saved to a subdirectory in ./outputs/ - look for the last line in the script's output to get the full path.

The output directory contains:

  • output subdirectory contains graphs generated by the analysis.
  • data subdirectory contains raw object-level data as a CSV (this can be quite large - please delete it if not required) and as a Parquet file.
  • trace_files subdirectory contains the trace, symbols and class definitions files generated by the modified AntTracks JVM.
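The raw object-level CSV in the data subdirectory can be explored directly. A minimal sketch using only the standard library, assuming hypothetical column names (class, allocation_site, lifetime - the actual schema produced by the pipeline may differ):

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of the raw object-level CSV; the real column
# names and layout in the data/ subdirectory may differ.
RAW_CSV = """\
class,allocation_site,lifetime
java.lang.String,Main.run:42,120
java.lang.String,Main.run:42,80
byte[],Reader.fill:17,5
"""

def mean_lifetime_by_class(csv_text):
    """Group object records by class and average their lifetimes."""
    totals = defaultdict(lambda: [0, 0])  # class -> [lifetime sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        acc = totals[row["class"]]
        acc[0] += int(row["lifetime"])
        acc[1] += 1
    return {cls: s / n for cls, (s, n) in totals.items()}

print(mean_lifetime_by_class(RAW_CSV))
# -> {'java.lang.String': 100.0, 'byte[]': 5.0}
```

The same grouping generalizes to the Parquet file (e.g. via pandas or pyarrow) when the CSV has been deleted to save space.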

Running on generated trace files

If you have generated a trace file (with symbols and class definitions files too) using the AntTracks JVM separately, you can use the on_traces.sh script as:

./on_traces.sh <absolute path to trace file>

Note: the directory containing the trace file must also contain the symbols and class definitions files with the same suffix.
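The companion-file requirement above can be verified before invoking on_traces.sh. The naming scheme in this sketch (trace_&lt;suffix&gt;, symbols_&lt;suffix&gt;, classdefinitions_&lt;suffix&gt;) is purely an illustrative assumption - adapt it to the names your AntTracks JVM run actually produced:

```python
from pathlib import Path

def missing_companions(trace_path):
    """Return names of companion files missing next to the trace file.

    The naming scheme (trace_<suffix>, symbols_<suffix>,
    classdefinitions_<suffix>) is an illustrative assumption, not the
    documented AntTracks convention; adjust as needed.
    """
    trace = Path(trace_path)
    suffix = trace.name.split("_", 1)[-1]
    wanted = [f"symbols_{suffix}", f"classdefinitions_{suffix}"]
    return [name for name in wanted if not (trace.parent / name).exists()]
```

An empty return value means both companion files were found and on_traces.sh can be run on the trace file.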

The outputs will be saved to a subdirectory in ./outputs/ - look for the last line in the script's output to get the full path.

The output directory structure is similar to that of Running on a JAR file, except that the trace_files subdirectory is not generated.

Directory structure

  • ant-tracks-jvm: the modified AntTracks JVM. Note: this is modified slightly from the original AntTracks JVM to support applications that run multiple JVMs concurrently. The Object Analyzer pipeline assumes that this modified AntTracks JVM is used.
  • ant-tracks-analyzer: a Java CLI application that reuses the source code of the original AntTracks Analyzer to extract object data and lifetimes in a processable format.
  • analysis: Python scripts used to generate Parquet files and visualizations from the processed CSVs generated by the Analyzer.
  • custom-benchmarks: a set of JMH-based Java micro-benchmarks made to replicate some patterns observed in Big Data benchmarks.
  • IonutBench: a JMH implementation of some of Ionut Balosin's garbage collector benchmarks.
  • sample-program: a sample Java 11 benchmark program that is compiled with Java 8 compatibility. The generated JAR file (./gradlew jar) can be analyzed using ./run.sh $PWD/sample-program/build/libs/sample-program.jar 1000 10000 (1000 10000 are arguments to the program).
  • vmtrace: a JVMTI agent that tracks the allocations of all objects. Currently not used in the pipeline since it cannot find the death/collection of objects.

Limitations

  • Only Java 8 applications (or compiled JARs targeted for Java 8 compatibility) are supported. See the sample program for an example on how to configure Gradle to target Java 8 even if a higher Java version is used for compilation.
    • This is due to a limitation of the AntTracks JVM, since it is a modified Java 8 JVM.
    • The analysis scripts are agnostic of the data source and only expect the data in a particular format. If it's possible to get the same data through another source that supports newer JVMs (perhaps something like vmtrace), the same lifetime analysis can be performed.
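Whether a JAR actually targets Java 8 can be confirmed from its class file headers: a class file starts with the magic number 0xCAFEBABE followed by minor and major version fields, and Java 8 corresponds to major version 52 (as noted in the requirements). A minimal sketch using only the standard library:

```python
import struct

def class_file_major_version(data):
    """Return the major version from raw .class file bytes.

    The header is big-endian: 4-byte magic (0xCAFEBABE), 2-byte minor
    version, 2-byte major version. Java 8 emits major version 52, so a
    JAR is Java 8 compatible only if every class is at 52 or below.
    """
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major

# A fabricated 8-byte header for demonstration: magic, minor 0, major 52.
header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)
print(class_file_major_version(header) <= 52)  # -> True (Java 8 compatible)
```

To vet a whole JAR, apply this check to the first 8 bytes of each .class entry read via zipfile.ZipFile.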

Citing

If you find this work useful, please cite our paper, "Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads". A BibTeX entry is given below:

@inproceedings{10.1145/3491204.3527473,
author = {Sarnayak, Samyak S. and Ahuja, Aditi and Kesavarapu, Pranav and Naik, Aayush and Kumar V., Santhosh and Kalambur, Subramaniam},
title = {Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads},
year = {2022},
isbn = {9781450391597},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3491204.3527473},
doi = {10.1145/3491204.3527473},
abstract = {Java uses automatic memory allocation where the user does not have to explicitly free used memory. This is done by the garbage collector. Garbage Collection (GC) can take up a significant amount of time, especially in Big Data applications running large workloads where garbage collection can take up to 50 percent of the application's run time. Although benchmarks have been designed to trace garbage collection events, these are not specifically suited for Big Data workloads, due to their unique memory usage patterns. We have developed a free and open source pipeline to extract and analyze object-level details from any Java program including benchmarks and Big Data applications such as Hadoop. The data contains information such as lifetime, class and allocation site of every object allocated by the program. Through the analysis of this data, we propose a small set of benchmarks designed to emulate some of the patterns observed in Big Data applications. These benchmarks also allow us to experiment and compare some Java programming patterns.},
booktitle = {Companion of the 2022 ACM/SPEC International Conference on Performance Engineering},
pages = {121--128},
numpages = {8},
keywords = {big data, java, java virtual machine, garbage collection, hadoop},
location = {Beijing, China},
series = {ICPE '22}
}

License

GPLv2
