appthreat / atom

Atom is a novel intermediate representation for applications and a standalone tool that is powered by chen.

Home Page: https://appthreat.com

License: Apache License 2.0

Scala 14.45% Batchfile 0.01% Shell 0.35% JavaScript 1.56% PowerShell 0.08% TypeScript 22.87% Python 0.14% HTML 12.01% Rust 48.10% Dockerfile 0.43%
code-analysis supply-chain intermediate-representation application-analytics exploit-prediction reachability-analysis variant-analysis vulnerability-analysis

atom's Introduction

Atom (⚛)

Atom is a novel intermediate representation for applications and a standalone tool powered by the chen library. The intermediate representation (a network with nodes and links) is optimized for operations typically used for application analytics and machine learning, including slicing and vectoring.

Our vision is to make atom useful for many use cases such as:

  • Supply-chain analysis: Generate evidence of external library usage including the flow of data from sources to sinks. Atom is used by OWASP cdxgen to improve the precision and comprehensiveness of the generated CycloneDX document.
  • Vulnerability analysis: Describe vulnerabilities with evidence of affected symbols, call paths, and data-flows. Enable variant and reachability analysis at scale.
  • Exploit prediction: Predict exploits using precise representations of vulnerabilities, libraries, and applications.
  • Threat-model and attack-vector generation: Generate precise threat models and attack vectors for applications at scale.
  • Application context detection: Generate context useful for summarization and risk-profile generation (e.g. services, endpoints, and data attributes).
  • Mind-maps for applications: Automate summarization of large and complex applications as a developer tool.

and more.

Installation

Atom comprises a core (a standalone chen application developed in Scala) and a Node.js wrapper module. It is currently distributed as an npm package.

npm install @appthreat/atom
# sudo npm install -g @appthreat/atom

Install cdxgen to generate a Software Bill of Materials (SBOM), which is required for reachables slicing.

npm install -g @cyclonedx/cdxgen --omit=optional

atom native-image

atom is available as a native image built using GraalVM Community Edition.

curl -LO https://github.com/AppThreat/atom/releases/latest/download/atom-amd64
chmod +x atom-amd64
./atom-amd64 --help

On Windows

curl -LO https://github.com/AppThreat/atom/releases/latest/download/atom.exe
.\atom.exe --help

NOTE: cdxgen is not bundled into the native image, so it needs to be installed separately.

CLI Usage

Usage: atom [parsedeps|data-flow|usages|reachables] [options] [input]

  input                    source file or directory
  -o, --output <value>     output filename. Default app.⚛ or app.atom in windows
  -s, --slice-outfile <value>
                           export intra-procedural slices as json
  -l, --language <value>   source language
  --with-data-deps         generate the atom with data-dependencies - defaults to `false`
  --remove-atom            do not persist the atom file - defaults to `false`
  -x, --export-atom        export the atom file with data-dependencies to graphml - defaults to `false`
  --export-dir <value>     export directory. Default: atom-exports
  --file-filter <value>    the name of the source file to generate slices from. Uses regex.
  --method-name-filter <value>
                           filters in slices that go through specific methods by names. Uses regex.
  --method-parameter-filter <value>
                           filters in slices that go through methods with specific types on the method parameters. Uses regex.
  --method-annotation-filter <value>
                           filters in slices that go through methods with specific annotations on the methods. Uses regex.
  --max-num-def <value>    maximum number of definitions in per-method data flow calculation - defaults to 2000
Command: parsedeps
Extract dependencies from the build file and imports
Command: data-flow [options]
Extract backward data-flow slices
  --slice-depth <value>    the max depth to traverse the DDG for the data-flow slice - defaults to 7.
  --sink-filter <value>    filters on the sink's `code` property. Uses regex.
Command: usages [options]
Extract local variable and parameter usages
  --min-num-calls <value>  the minimum number of calls required for a usage slice - defaults to 1.
  --include-source         includes method source code in the slices - defaults to false.
  --extract-endpoints      extract http endpoints and convert to openapi format using atom-tools - defaults to false.
Command: reachables [options]
Extract reachable data-flow slices based on automated framework tags
  --source-tag <value>     source tag - defaults to framework-input.
  --sink-tag <value>       sink tag - defaults to framework-output.
  --include-crypto         includes crypto library flows - defaults to false.
  --help                   display this help message

Sample Invocations

Generate an atom

# Java project (compile the project before running atom)
atom -o app.atom -l java .
atom -o app.atom -l jar <jar file>
export ANDROID_HOME=<path to android sdk>
atom -o app.atom -l apk <apk file>

Create a reachables slice for a Java project.

cd <path to repo>
cdxgen -t java --deep -o bom.json .
atom reachables -o app.atom -s reachables.json -l java .

Create a data-flow slice for a Java project.

atom data-flow -o app.atom --slice-outfile df.json -l java .

Create a usages slice for a Java project.

atom usages -o app.atom --slice-outfile usages.json -l java .

Learn more about slices or view some samples

Extract HTTP endpoints in openapi format using atom-tools

Atom can automatically invoke the atom-tools convert command to extract HTTP endpoints from the usages slices. Pass the argument --extract-endpoints to enable this feature.

pip install atom-tools
atom usages --extract-endpoints -o app.atom --slice-outfile usages.json -l java .

A file called openapi.generated.json will be created with the endpoint information.
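To work with the generated file programmatically, a minimal sketch like the one below lists the endpoints it contains. It assumes only the standard OpenAPI layout: a top-level `paths` object whose path items are keyed by lower-case HTTP method. `list_endpoints` is a hypothetical helper and is not part of atom or atom-tools.

```python
import json
from pathlib import Path

# HTTP methods that may appear as keys of an OpenAPI path item object
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI document."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters"
            if key.lower() in HTTP_METHODS:
                endpoints.append((key.upper(), path))
    return sorted(endpoints)


spec_file = Path("openapi.generated.json")
if spec_file.exists():
    for method, path in list_endpoints(json.loads(spec_file.read_text())):
        print(method, path)
```

Non-operation keys are skipped deliberately. Without that check, shared `parameters` entries would be reported as fake endpoints.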

Export atom to graphml or dot format

It is possible to export each method in an atom, along with its data dependencies, to graphml or dot format. Pass --export-atom (-x) to enable this feature.

atom -o app.atom -l java --export-atom --export-dir <export dir> <path to application>

The resulting graphml files can be imported into Neo4j or NetworkX for further analysis. Pass --export-format dot to export in dot format instead.

atom -o app.atom -l java --export-atom --export-format dot --export-dir <export dir> <path to application>

In dot format, individual representations such as the AST, CDG, and CFG are also exported.

To also compute and include data-dependency graph (DDG) information in the exported files, pass --with-data-deps.

atom -o app.atom -l java --export-atom --export-dir <export dir> --with-data-deps <path to application>
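Before loading an export into Neo4j or NetworkX, it can help to sanity-check it. The sketch below builds a source-to-targets adjacency map using only Python's standard library. It assumes the exported files use standard GraphML `node` and `edge` elements and have a `.graphml` extension, and it ignores any atom-specific attributes. `load_graphml_adjacency` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{http://graphml.graphdrawing.org/xmlns}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def load_graphml_adjacency(xml_data) -> dict[str, list[str]]:
    """Return {node id: [target ids]} for a GraphML document given as str or bytes."""
    adjacency: dict[str, list[str]] = {}
    for element in ET.fromstring(xml_data).iter():
        kind = local_name(element.tag)
        if kind == "node":
            adjacency.setdefault(element.get("id"), [])
        elif kind == "edge":
            adjacency.setdefault(element.get("source"), []).append(element.get("target"))
    return adjacency


export_dir = Path("atom-exports")  # default value of --export-dir
if export_dir.is_dir():
    for graph_file in sorted(export_dir.rglob("*.graphml")):
        adjacency = load_graphml_adjacency(graph_file.read_bytes())
        edge_count = sum(len(targets) for targets in adjacency.values())
        print(f"{graph_file.name}: {len(adjacency)} nodes, {edge_count} edges")
```

Matching on the local tag name keeps the sketch working whether or not the file declares the GraphML namespace.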

Container usage

docker run --rm -v /tmp:/tmp -v $HOME:$HOME -v $(pwd):/app:rw -it ghcr.io/appthreat/atom atom --help
# podman run --rm -v /tmp:/tmp -v $HOME:$HOME -v $(pwd):/app:rw -it ghcr.io/appthreat/atom atom --help

Example for a Java project.

docker run --rm -v /tmp:/tmp -v $HOME:$HOME -v $(pwd):/app:rw -it ghcr.io/appthreat/atom atom -l java -o /app/app.atom /app
# podman run --rm -v /tmp:/tmp -v $HOME:$HOME -v $(pwd):/app:rw -it ghcr.io/appthreat/atom atom -l java -o /app/app.atom /app

Languages supported

  • C/C++
  • H (C/C++ Header files alone)
  • Java (Requires compilation)
  • Jar
  • Android APK (Requires Android SDK. Set the environment variable ANDROID_HOME)
  • JavaScript
  • TypeScript
  • Python
  • PHP (Requires PHP >= 7.0. Supports PHP 5.2 to 8.3)

Atom Specification

The intermediate representation used by atom is available under the same open-source license (Apache-2.0). The specification is available in protobuf, markdown, and html formats.

The current specification version is 1.0.0

Generating atom files

Atom files (app.⚛ or app.atom) are zip files containing serialized protobuf data. The atom CLI is the preferred way to generate these files, but it is also possible to write a generator from scratch using the proto specification. Samples are available in Python and Deno for interested users, along with proto bindings for additional languages.

Example code snippet for generating an atom in Python:

from zipfile import ZipFile

# Python bindings generated from the atom proto specification
# (assumed to be importable as `atom`, as in the Python sample)
import atom

# Create a method fullname property
method_full_name = atom.CpgStructNodeProperty(
    name=atom.NodePropertyName.FULL_NAME, value=atom.PropertyValue("main")
)

# Create a method node with the fullname property
method = atom.CpgStructNode(
    key=1, type=atom.NodeType.METHOD, property=[method_full_name]
)

# Create an atom with a single node
atom_struct = atom.CpgStruct(node=[method])

# Create an atom (app.atom) by serializing this data into a zip file;
# bytes(atom_struct) yields the protobuf-encoded message
with ZipFile("app.atom", "w") as zip_file:
    zip_file.writestr("cpg.proto", bytes(atom_struct))
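Because an atom is an ordinary zip file, its contents can be inspected with the standard library alone. The minimal sketch below lists each entry with its uncompressed size. Decoding the protobuf payload would additionally need the generated bindings and is left out here. `list_atom_entries` is a hypothetical helper.

```python
from zipfile import ZipFile


def list_atom_entries(source) -> dict[str, int]:
    """Map each entry in an atom (zip) file to its uncompressed size in bytes.

    `source` may be a path or a binary file-like object, as accepted by ZipFile.
    """
    with ZipFile(source) as zip_file:
        return {info.filename: info.file_size for info in zip_file.infolist()}
```

For the file written by the snippet above, `list_atom_entries("app.atom")` should report a single `cpg.proto` entry.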

License

Apache-2.0

Developing / Contributing

Install Java 21 and Node.js > 21, then run:

sbt clean stage scalafmt test createDistribution
cd wrapper/nodejs
bash build.sh && sudo npm install -g .

Using atom with chennai

chennai is the recommended query interface for working with atom.

chennai> importAtom("/home/almalinux/work/sandbox/apollo/app.atom")

Atom tools

Check out atom-tools for some project ideas involving atom slices.

Enterprise support

Enterprise support, including custom language development and integration services, is available via AppThreat Ltd. Free community support is also available via Discord.

Sponsors

YourKit supports open source projects with innovative and intelligent tools for monitoring and profiling Java and .NET applications. YourKit is the creator of YourKit Java Profiler, YourKit .NET Profiler, and YourKit YouMonitor.

atom's People

Contributors

cerrussell, davidbakereffendi, prabhu

Forkers

cerrussell

atom's Issues

Warnings from repotests

With shiftleft-java-example

[INFO ] Pass io.joern.x2cpg.passes.base.AstLinkerPass completed in 18 ms (2% on mutations). 333 + 0 changes committed from 1 parts.
[INFO ] Start of enhancement: io.joern.x2cpg.passes.base.ContainsEdgePass
[INFO ] Enhancement io.joern.x2cpg.passes.base.ContainsEdgePass completed in 45 ms. 3349  + 0 changes committed from 547 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.TypeUsagePass
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=CALL, srcNodeId=4119, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=CALL, srcNodeId=4121, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=CALL, srcNodeId=4847, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=CALL, srcNodeId=4849, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=CALL, srcNodeId=4869, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=CALL, srcNodeId=4871, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=LOCAL, srcNodeId=4118, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=LOCAL, srcNodeId=4846, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=LOCAL, srcNodeId=4868, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=IDENTIFIER, srcNodeId=4120, dstNodeType=TYPE, dstFullName=java.util.Iterator
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=IDENTIFIER, srcNodeId=4126, dstNodeType=TYPE, dstFullName=java.util.Iterator

juice-shop

[INFO ] Enhancement io.joern.x2cpg.passes.base.ContainsEdgePass completed in 442 ms. 278961  + 0 changes committed from 8978 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.TypeUsagePass
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=929, dstNodeType=TYPE, dstFullName=Challenge
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=1257, dstNodeType=TYPE, dstFullName=User
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=1562, dstNodeType=TYPE, dstFullName=User
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=1640, dstNodeType=TYPE, dstFullName=Delivery
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=1767, dstNodeType=TYPE, dstFullName=Address
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=1889, dstNodeType=TYPE, dstFullName=Card
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=2291, dstNodeType=TYPE, dstFullName=Product
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=2439, dstNodeType=TYPE, dstFullName=Memory
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=2674, dstNodeType=TYPE, dstFullName=Product
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=2706, dstNodeType=TYPE, dstFullName=Product
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=2727, dstNodeType=TYPE, dstFullName=Product
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=4525, dstNodeType=TYPE, dstFullName=SecurityQuestion
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=13893, dstNodeType=TYPE, dstFullName=ErrorWithParent
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=14098, dstNodeType=TYPE, dstFullName=ErrorWithParent
[INFO ] Could not create edge. Destination lookup failed. edgeType=EVAL_TYPE, srcNodeType=METHOD_PARAMETER_IN, srcNodeId=14304, dstNodeType=TYPE, dstFullName=ErrorWithParent

Reachables slicing should accept the path to the BOM file

Currently, the BOM file is expected to be called bom.json and placed within the same application directory for reachables slicing to work.

Instead, the path to the BOM file could be passed explicitly to support more use cases. This requires a new global ConfigFile creation pass in chen that can look into other directories.

[Slice] call tree slicing

A simple call-tree slice built on top of the CFG. This could improve on usage slicing while performing better than reachables slicing. We can try to squeeze in automatic tags, too.
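The idea could start from something like the sketch below: a depth-limited walk over caller-to-callee edges that emits a nested tree and cuts cycles. It assumes the call edges have already been extracted from the atom into a plain dict. `call_tree`, the dict shape, and the default depth of 5 are all hypothetical choices and do not represent chen's API.

```python
def call_tree(calls: dict[str, list[str]], root: str, max_depth: int = 5) -> dict:
    """Expand `root` into a nested {"method": ..., "calls": [...]} tree.

    Expansion stops at max_depth and skips callees already on the current
    path, so recursive call chains cannot loop forever.
    """

    def expand(method: str, depth: int, path: frozenset) -> dict:
        node = {"method": method, "calls": []}
        if depth >= max_depth:
            return node
        for callee in calls.get(method, []):
            if callee in path:
                continue  # cycle: callee is already an ancestor
            node["calls"].append(expand(callee, depth + 1, path | {callee}))
        return node

    return expand(root, 0, frozenset({root}))


calls = {"main": ["a", "b"], "a": ["c", "main"], "c": ["a"]}
print(call_tree(calls, "main"))
```

The sketch tracks cycles per path rather than globally. As a result, a method reached through two different callers appears in both branches, which matches how a call tree is usually read.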

Extract Python Dependencies & Imports

Use Joern to:

  • Find a call to setup in /setup.py
  • Determine if the argument install_requires is set
  • If it is set, tag the associated List as a list of dependencies (Question: Is this always a list? Could this point to a file?)
  • Parse all the imports in the project and note which modules are used (excl. test and local imports)
  • Use the output format:
{
  "modules": []
}

Tasks

  • Build extraction logic
  • Wire logic to CLI and give JSON output
  • Refactor unit test to use folders for test data
  • Have a separate list for builtin_modules by using a hardcoded list of DEFINES for python similar to jssrc.

[container] astgen commands are missing in the container image

[root@5fdef671090c app]# which astgen
/usr/bin/which: no astgen in (/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/opt/vendor/bin:/opt/java/21.0.1-graalce/bin:/opt/maven/3.9.6/bin:/usr/local/bin/:/root/.local/bin:/opt/android-sdk-linux/cmdline-tools/latest/bin:/opt/android-sdk-linux/tools:/opt/android-sdk-linux/tools/bin:/opt/android-sdk-linux/platform-tools:)

Slice generated for requests includes requests

Could we filter out the internal modules from the current folder in the generated modules slice?

Look for the setup arguments name and packages and add their values to localModuleNames. Sometimes, name could point to a variable, and packages could be a list.

cpg.call("setup").argument.argumentName("packages").where(_.file.name(".*setup.py")).code.toSet
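A Python-only sketch of the same idea is shown below. It collects the literal `name` and `packages` arguments of `setup()` and drops matching modules from a modules list. It uses Python's `ast` module rather than a chen query, and `local_module_names` / `filter_internal_modules` are hypothetical names. Values that are variables or calls such as `find_packages()` are simply skipped.

```python
import ast


def local_module_names(setup_source: str) -> set[str]:
    """Collect literal `name` and `packages` values passed to setup()."""
    local: set[str] = set()
    for node in ast.walk(ast.parse(setup_source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        func_name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if func_name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg not in ("name", "packages"):
                continue
            try:
                value = ast.literal_eval(kw.value)
            except (ValueError, TypeError):
                # e.g. name=NAME or packages=find_packages(): not a literal, skipped
                continue
            if isinstance(value, str):
                value = [value]
            if not isinstance(value, (list, tuple)):
                continue
            # "requests.adapters" -> "requests"; distribution names may use "-" for "_"
            local.update(str(v).split(".")[0].replace("-", "_") for v in value)
    return local


def filter_internal_modules(modules: list[str], setup_source: str) -> list[str]:
    """Drop modules that belong to the project being analyzed."""
    local = local_module_names(setup_source)
    return [m for m in modules if m.split(".")[0] not in local]
```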

Add annotations to usage slice

Annotations could also be added to usage slices. When adding annotations, all of the annotations of a given method or class must be added with references to the className and methodName, along with the lineNumber and columnNumber. In addition, all of the attributes of each annotation must be retained.

Example:

@GetMapping(value = { "/", "/home" })
public String home(Model model, HttpSession session) {

https://github.com/HooliCorp/vuln-spring/blob/master/src/main/java/com/example/vulnspring/WebController.java#L46

In the above example, we need to retain information such as:

  • typeFullName of the annotation is org.springframework.web.bind.annotation.GetMapping
  • args list is [{name: "value", type: "java.util.List<String>", values: ["/", "/home"]}]
  • methodFullName
  • lineNumber
  • columnNumber
  • fileName

If a method or a class has multiple annotations, then every annotation must be captured. The hierarchy could also be captured if an annotation is a sub-class of another annotation.
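One possible shape for such an entry is sketched below as Python dataclasses. The field names mirror the attributes listed above, but the classes themselves are hypothetical and do not represent the usage-slice schema. The `methodFullName` value is illustrative only, `lineNumber` is taken from the `#L46` anchor of the linked file, and `columnNumber` is left unset because it is not known here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AnnotationArg:
    name: str
    type: str
    values: list[str]


@dataclass
class AnnotationEntry:
    # camelCase field names mirror the attributes listed in the issue
    typeFullName: str
    methodFullName: str
    fileName: str
    lineNumber: int
    columnNumber: Optional[int] = None
    args: list[AnnotationArg] = field(default_factory=list)


entry = AnnotationEntry(
    typeFullName="org.springframework.web.bind.annotation.GetMapping",
    methodFullName="com.example.vulnspring.WebController.home",  # illustrative value
    fileName="src/main/java/com/example/vulnspring/WebController.java",
    lineNumber=46,  # from the #L46 anchor in the linked URL
    args=[AnnotationArg(name="value", type="java.util.List<String>", values=["/", "/home"])],
)
print(json.dumps(asdict(entry), indent=2))
```

A method with several annotations would carry a list of such entries, one per annotation.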

[Slice] Occurrences slices

Sometimes, computing and categorizing all four styles of usages is computationally intensive. We need a simpler and quicker slicing technique that uses the chen SBOM-based auto-tagger to list only the occurrences, i.e., the file location and line number of each usage.

This might be challenging for C/C++, since the usages slices are mandatory for creating an SBOM. So we might need to port parsedeps slicing to C/C++ to remove this dependency.

NOTICE: Atom version 2 would require Java 21 as minimum version

We're seeing tremendous performance improvements when switching to Java 21 virtual threads. Therefore, we have decided to make Java 21 the minimum version for v2.

Atom version 1 will remain available for Java 17 to 20 users, with limited support available for sponsors and enterprise users.

Unsolved symbol error on ppc64 arch

Failure: Unsolved symbol : java.lang.String

2023-09-01 08:25:23.526 ERROR AstCreator: Unsolved symbol exception caught in src/main/java/com/example/vulnspring/WebController.java
2023-09-01 08:25:23.568 ERROR AstCreator: Unsolved symbol exception caught in src/main/java/com/example/vulnspring/SessionFilter.java

Support for exporting data flow slice to Apache AGE and Neo4j formats

For Apache AGE, there is a CSV format for loading graphs:

https://age.apache.org/age-manual/master/intro/agload.html

Support the GraphML format for exporting the DataFlows slice. It must be usable with the filtering from #4.

https://neo4j.com/docs/apoc/current/import/graphml/

My preference would be GraphML, so that it can be imported directly using Python NetworkX.

https://networkx.org/documentation/stable/reference/readwrite/graphml.html

Finally, add a section to the README explaining the steps involved to export, import, and query the information.
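For the GraphML side of this proposal, the writer can stay small. The sketch below serializes a data-flow graph with Python's standard library. The input shape (nodes with `id` and `code`, edges as id pairs) is a hypothetical stand-in for the DataFlows slice, and `dataflow_to_graphml` is not an existing atom function. It declares a `code` key so that GraphML readers such as NetworkX's `read_graphml` can type the attribute.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def dataflow_to_graphml(nodes: list[dict], edges: list[tuple]) -> str:
    """Serialize nodes [{"id": ..., "code": ...}] and edges [(src, dst)] to GraphML."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Key declarations must precede the graph element
    ET.SubElement(root, "key", {"id": "code", "for": "node",
                                "attr.name": "code", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="dataflow", edgedefault="directed")
    for node in nodes:
        node_el = ET.SubElement(graph, "node", id=str(node["id"]))
        ET.SubElement(node_el, "data", key="code").text = str(node["code"])
    for source, target in edges:
        ET.SubElement(graph, "edge", source=str(source), target=str(target))
    # ElementTree escapes characters such as "<" in the code text
    return ET.tostring(root, encoding="unicode")


print(dataflow_to_graphml(
    [{"id": 1, "code": "request.getParameter(name)"}, {"id": 2, "code": "stmt.execute(q)"}],
    [(1, 2)],
))
```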

Bug: repotests are hanging while parsing juice-shop

[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous117 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous117: 109
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous118 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous118: 187
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous119 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous119: 70
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous120 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous120: 11
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous121 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous121: 24
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous122 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous122: 20
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous123 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous123: 20
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous124 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous124: 39
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous125:anonymous in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous125:anonymous: 36
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous125 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous125: 1
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous126 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous126: 19
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous127 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous127: 19
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous128 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous128: 19
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous129 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous129: 46
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous130 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous130: 5
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous131 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous131: 17
[INFO ] Calculating reaching definitions for: frontend/src/assets/private/three.js::program:anonymous132 in frontend/src/assets/private/three.js
[INFO ] Number of definitions for frontend/src/assets/private/three.js::program:anonymous132: 16

Sporadic "key not found" errors in the reaching-definitions pass

We need a solution or a workaround for this exception. So far, we have tried increasing the heap memory via JAVA_OPTS, which didn't help. Merely running the parsedeps command against a repository such as scipy or requests is enough to reproduce this.

atom parsedeps -l python -o /tmp/atom-deps-sIUOi7/app.atom --slice-outfile /tmp/atom-deps-sIUOi7/slices.json /home/almalinux/work/sandbox/scipy -Dlog4j.configurationFile=/tmp/atom-deps-sIUOi7/log4j2.xml
Data-flow overlay is not detected, applying now
Failure: java.util.NoSuchElementException: key not found: io.shiftleft.codepropertygraph.generated.nodes.Identifier[label=IDENTIFIER; id=2346266]
 2023-06-11 00:02:29.712 ERROR CpgPassBase: Pass io.joern.dataflowengineoss.passes.reachingdef.ReachingDefPass failed
java.util.NoSuchElementException: java.util.NoSuchElementException: key not found: io.shiftleft.codepropertygraph.generated.nodes.Identifier[label=IDENTIFIER; id=2346266]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
	at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:562) ~[?:?]
	at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:591) ~[?:?]
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:689) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:693) ~[?:?]
	at io.shiftleft.passes.NewStyleCpgPassBase.runWithBuilder(CpgPass.scala:152) ~[io.shiftleft.codepropertygraph_3-1.3.600.jar:1.3.600]
	at io.shiftleft.passes.ForkJoinParallelCpgPass.createApplySerializeAndStore(CpgPass.scala:74) ~[io.shiftleft.codepropertygraph_3-1.3.600.jar:1.3.600]
	at io.shiftleft.semanticcpg.layers.LayerCreator.runPass(LayerCreator.scala:53) ~[io.joern.semanticcpg_3-1.1.1742.jar:1.1.1742]
	at io.joern.dataflowengineoss.layers.dataflows.OssDataFlow.create$$anonfun$1(OssDataFlow.scala:31) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at scala.runtime.function.JProcedure1.apply(JProcedure1.java:15) ~[org.scala-lang.scala3-library_3-3.3.0.jar:3.3.0]
	at scala.runtime.function.JProcedure1.apply(JProcedure1.java:10) ~[org.scala-lang.scala3-library_3-3.3.0.jar:3.3.0]
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1300) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.joern.dataflowengineoss.layers.dataflows.OssDataFlow.create(OssDataFlow.scala:32) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at io.shiftleft.semanticcpg.layers.LayerCreator.run(LayerCreator.scala:32) ~[io.joern.semanticcpg_3-1.1.1742.jar:1.1.1742]
	at io.appthreat.atom.parsedeps.package$.parseDependencies(package.scala:32) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at io.appthreat.atom.Atom$.generateSlice$$anonfun$2(Atom.scala:250) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at scala.util.Using$.resource(Using.scala:261) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.appthreat.atom.Atom$.generateSlice(Atom.scala:250) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at io.appthreat.atom.Atom$.run$$anonfun$1$$anonfun$1$$anonfun$1(Atom.scala:114) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at scala.util.Either.flatMap(Either.scala:352) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.appthreat.atom.Atom$.run$$anonfun$1$$anonfun$1(Atom.scala:115) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at scala.util.Either.flatMap(Either.scala:352) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.appthreat.atom.Atom$.run$$anonfun$1(Atom.scala:115) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at scala.util.Either.flatMap(Either.scala:352) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.appthreat.atom.Atom$.run(Atom.scala:115) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at io.appthreat.atom.Atom$.run(Atom.scala:99) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at io.appthreat.atom.Atom$.main(Atom.scala:55) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
	at io.appthreat.atom.Atom.main(Atom.scala) ~[io.appthreat.atom-1.0.0.jar:1.0.0]
Caused by: java.util.NoSuchElementException: key not found: io.shiftleft.codepropertygraph.generated.nodes.Identifier[label=IDENTIFIER; id=2346266]
	at scala.collection.immutable.BitmapIndexedMapNode.apply(HashMap.scala:635) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at scala.collection.immutable.BitmapIndexedMapNode.apply(HashMap.scala:633) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at scala.collection.immutable.HashMap.apply(HashMap.scala:132) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.joern.dataflowengineoss.passes.reachingdef.ReachingDefFlowGraph.pred(ReachingDefProblem.scala:72) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at io.joern.dataflowengineoss.passes.reachingdef.ReachingDefFlowGraph.pred(ReachingDefProblem.scala:71) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at io.joern.dataflowengineoss.passes.reachingdef.DataFlowSolver.$anonfun$1(DataFlowSolver.scala:20) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at scala.collection.mutable.ListBuffer.flatMap(ListBuffer.scala:39) ~[org.scala-lang.scala-library-2.13.10.jar:?]
	at io.joern.dataflowengineoss.passes.reachingdef.DataFlowSolver.calculateMopSolutionForwards(DataFlowSolver.scala:33) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at io.joern.dataflowengineoss.passes.reachingdef.ReachingDefPass.runOnPart(ReachingDefPass.scala:31) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at io.joern.dataflowengineoss.passes.reachingdef.ReachingDefPass.runOnPart(ReachingDefPass.scala:23) ~[io.joern.dataflowengineoss_3-1.1.1742.jar:1.1.1742]
	at io.shiftleft.passes.NewStyleCpgPassBase$$anon$2.accept(CpgPass.scala:147) ~[io.shiftleft.codepropertygraph_3-1.3.600.jar:1.3.600]
	at io.shiftleft.passes.NewStyleCpgPassBase$$anon$2.accept(CpgPass.scala:146) ~[io.shiftleft.codepropertygraph_3-1.3.600.jar:1.3.600]
	at java.util.stream.ReduceOps$4ReducingSink.accept(ReduceOps.java:220) ~[?:?]
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960) ~[?:?]
	at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934) ~[?:?]
	at java.util.stream.AbstractTask.compute(AbstractTask.java:327) ~[?:?]
	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754) ~[?:?]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) ~[?:?]
	at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) ~[?:?]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) ~[?:?]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) ~[?:?]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) ~[?:?]

cc: @mpollmeier

Support for filtering the slices

We need a way of customizing the slicing operation to limit the data that gets written to the slices json.

Usage slices

  • Accept method filters to filter out methods such as constructors, getters, or setters based on a pattern
  • Accept parameter filters to filter methods based on their parameter types
  • Accept annotation filters to filter methods based on their annotations
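A minimal Python sketch of how the three usage-slice filters above could be combined. It assumes exclusion semantics (a method matching any filter is dropped) and a hypothetical method record with `name`, `paramTypes`, and `annotations` keys; `UsageFilter` and `filter_methods` are illustrative names, not part of atom. Java constructors are named `<init>` in Joern-style CPGs, which is why that pattern appears in the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class UsageFilter:
    """Hypothetical filter options; these names are not taken from atom."""
    # Regex patterns (full match) for method names to drop, e.g. constructors/getters/setters
    exclude_methods: list[str] = field(default_factory=list)
    # A method is dropped if any of its parameter types is in this set
    exclude_param_types: set[str] = field(default_factory=set)
    # A method is dropped if any of its annotations is in this set
    exclude_annotations: set[str] = field(default_factory=set)

    def keep(self, method: dict) -> bool:
        """Return True if the (assumed) method record passes every filter."""
        name = method.get("name", "")
        if any(re.fullmatch(p, name) for p in self.exclude_methods):
            return False
        if self.exclude_param_types & set(method.get("paramTypes", [])):
            return False
        if self.exclude_annotations & set(method.get("annotations", [])):
            return False
        return True


def filter_methods(methods: list[dict], flt: UsageFilter) -> list[dict]:
    """Keep only the method records that survive the filter."""
    return [m for m in methods if flt.keep(m)]
```

For example, `UsageFilter(exclude_methods=[r"<init>", r"get[A-Z]\w*", r"set[A-Z]\w*"])` drops constructors and bean-style accessors while keeping a method such as `handleRequest`. Full-match regexes are used so that a pattern like `get[A-Z]\w*` does not accidentally match `forget`.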

DataFlow slices

  • Limit dataflows to only those that start in an internal method
  • Limit dataflows to only those that end in an external method
  • Options to accept patterns for sources
  • Options to accept patterns for sinks
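The four dataflow options above could be expressed as a single predicate over one flow path. This is a sketch under assumptions: a path is a list of node dicts ordered source to sink, each with hypothetical `name` and `isExternal` keys, and "internal" simply means `isExternal` is false. `keep_flow` is an illustrative name, not an atom API.

```python
import re
from typing import Optional


def keep_flow(path: list[dict],
              source_pattern: Optional[str] = None,
              sink_pattern: Optional[str] = None,
              internal_source: bool = True,
              external_sink: bool = True) -> bool:
    """Decide whether one dataflow path (source first, sink last) is kept."""
    if not path:
        return False
    source, sink = path[0], path[-1]
    # Flow must begin in application code, not in a library
    if internal_source and source.get("isExternal", False):
        return False
    # Flow must end in a library (external) method
    if external_sink and not sink.get("isExternal", False):
        return False
    # Optional regex patterns on source and sink names (searched, not full-matched)
    if source_pattern and not re.search(source_pattern, source.get("name", "")):
        return False
    if sink_pattern and not re.search(sink_pattern, sink.get("name", "")):
        return False
    return True
```

Applying it is then a matter of `[p for p in paths if keep_flow(p, sink_pattern=r"query|exec")]`; keeping the filter as a pure function makes it easy to drive from a configuration file later.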

Common

  • Accept a YAML configuration file that supplies the parameters for the filtering operations
  • Add safeguards to prevent arbitrary remote code execution and exfiltration via queries

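Bug: Check whether CPG generation is working for C, JavaScript, and TypeScript

The c, juice, and ts CPG files are just 16K each, and their slices and usages files are only 35 and 63 bytes, which suggests that generation may not be working.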

-rw-r--r-- 1 runner docker  16K Jun  2 13:35 /tmp/c.cpg.bin
-rw-r--r-- 1 runner docker   35 Jun  2 13:35 /tmp/c.slices.json
-rw-r--r-- 1 runner docker   63 Jun  2 13:35 /tmp/c.usages.json
-rw-r--r-- 1 runner docker 544K Jun  2 13:35 /tmp/java.cpg.bin
-rw-r--r-- 1 runner docker 1.1M Jun  2 13:35 /tmp/java.slices.json
-rw-r--r-- 1 runner docker 147K Jun  2 13:35 /tmp/java.usages.json
-rw-r--r-- 1 runner docker  16K Jun  2 13:35 /tmp/juice.cpg.bin
-rw-r--r-- 1 runner docker   35 Jun  2 13:35 /tmp/juice.slices.json
-rw-r--r-- 1 runner docker   63 Jun  2 13:35 /tmp/juice.usages.json
-rw-r--r-- 1 runner docker 4.1M Jun  2 13:35 /tmp/py.cpg.bin
-rw-r--r-- 1 runner docker  13M Jun  2 13:35 /tmp/py.slices.json
-rw-r--r-- 1 runner docker 969K Jun  2 13:35 /tmp/py.usages.json
-rw-r--r-- 1 runner docker  16K Jun  2 13:35 /tmp/ts.cpg.bin
-rw-r--r-- 1 runner docker   35 Jun  2 13:35 /tmp/ts.slices.json
-rw-r--r-- 1 runner docker   63 Jun  2 13:35 /tmp/ts.usages.json
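A quick check like the following could run in CI right after generation and fail the job when outputs match the empty-looking pattern in the listing above. The thresholds are guesses derived from that listing (working CPGs were hundreds of KB or more, the suspect ones 16K), and `suspicious_outputs` is a hypothetical helper, not part of atom.

```python
from pathlib import Path


def suspicious_outputs(out_dir: str, min_cpg_bytes: int = 32 * 1024,
                       min_json_bytes: int = 100) -> list[str]:
    """Return names of *.cpg.bin and *.json files small enough to look empty.

    Thresholds are heuristic guesses, not documented limits.
    """
    flagged = []
    for path in sorted(Path(out_dir).iterdir()):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if path.name.endswith(".cpg.bin") and size < min_cpg_bytes:
            flagged.append(path.name)
        elif path.suffix == ".json" and size < min_json_bytes:
            flagged.append(path.name)
    return flagged
```

A size heuristic only catches the symptom; a stricter check would load each CPG and assert that it contains at least one method node, but that needs the CPG tooling rather than plain file stats.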

Develop mini-scripts to work with slices

  • From the usages JSON, extract only the external classes and methods along with their usage locations
  • From the dataflow slice JSON, extract only the CALL, IDENTIFIER, and PARAMETER nodes from the paths and display stack-trace-like output
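A sketch of both mini-scripts as plain Python functions. The input shapes are assumptions written into the docstrings (a Joern-style `objectSlices`/`usages`/`invokedCalls` layout for usages, and a `graph.nodes` plus `paths` layout for dataflows); the field names, including `isExternal`, should be checked against real atom output before use. The function names are hypothetical.

```python
from typing import Iterator

# Node labels to keep when printing a dataflow path
KEEP_LABELS = {"CALL", "IDENTIFIER", "PARAMETER"}


def external_usages(usages: dict) -> Iterator[tuple[str, str, int]]:
    """Yield (resolvedMethod, fileName, lineNumber) for external calls.

    ASSUMED shape: {"objectSlices": [{"fileName": str, "usages": [
        {"invokedCalls": [{"resolvedMethod": str, "isExternal": bool,
                           "lineNumber": int}]}]}]}
    """
    for obj in usages.get("objectSlices", []):
        for usage in obj.get("usages", []):
            for call in usage.get("invokedCalls", []):
                if call.get("isExternal"):
                    yield (call.get("resolvedMethod", ""),
                           obj.get("fileName", ""),
                           call.get("lineNumber", -1))


def flow_traces(slice_: dict) -> Iterator[str]:
    """Render each dataflow path as stack-trace-like text.

    ASSUMED shape: {"graph": {"nodes": [{"id": int, "label": str, "code": str,
                    "parentFileName": str, "lineNumber": int}]},
                    "paths": [[node_id, ...]]}
    """
    nodes = {n["id"]: n for n in slice_.get("graph", {}).get("nodes", [])}
    for path in slice_.get("paths", []):
        lines = []
        for node_id in path:
            node = nodes.get(node_id)
            # Skip unknown ids and node types other than CALL/IDENTIFIER/PARAMETER
            if node is None or node.get("label") not in KEEP_LABELS:
                continue
            where = f"{node.get('parentFileName', '?')}:{node.get('lineNumber', '?')}"
            lines.append(f"\tat {node.get('code', '')} ({where})")
        yield "\n".join(lines)
```

Typical use is `for trace in flow_traces(json.load(open("slices.json"))): print(trace, end="\n\n")`. Generators keep memory flat on large slice files such as the 13M Python slice in the listing.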

Create atom-samples repo

We need a new repo containing a collection of atoms, data-flow slices, and usage slices organized by language. We can then edit this repo's README to link to the samples repo.

Diff slice feature

Support for generating slices for Usages and DataFlows based on the diff between two CPGs.
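One simple approximation, and explicitly not the CPG-level diff the issue asks for, is to compare the slice records produced for two revisions and report what was added or removed. Records are keyed by their canonical JSON form; `diff_slices` is a hypothetical helper.

```python
import json
from typing import Iterable


def _key(record: dict) -> str:
    """Canonical key: identical records serialize identically regardless of key order."""
    return json.dumps(record, sort_keys=True)


def diff_slices(old: Iterable[dict], new: Iterable[dict]) -> dict[str, list[dict]]:
    """Return records only in `new` ("added") and only in `old` ("removed")."""
    old_map = {_key(r): r for r in old}
    new_map = {_key(r): r for r in new}
    return {
        "added": [new_map[k] for k in sorted(new_map.keys() - old_map.keys())],
        "removed": [old_map[k] for k in sorted(old_map.keys() - new_map.keys())],
    }
```

The weakness of whole-record keys is that an unrelated edit shifting line numbers makes every later slice look new; a diff computed on the CPGs themselves, or keys that exclude line and column fields, would avoid that.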

Formalize the data slices json format

We must formalize and confirm the v1 format for data slices for Usages and DataFlows.

  • Create a JSON Schema spec for validation, with examples
  • Generate HTML documentation from the spec, similar to the CPG spec

cc: @fabsx00

Support for custom slicer

Atom could accept a Scala script or a YAML config file containing a CPGQL query to generate custom slices in JSON format.

Usages slices exception for Vencord

https://github.com/Vendicated/Vencord

atom usages -o app.atom -l javascript --slice-outfile usages.json .
2023-08-23 10:58:34.624 ERROR CpgPassBase: Pass io.appthreat.atom.passes.SafeJSTypeRecovery failed
java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at java.util.concurrent.ForkJoinTask.reportExecutionException(ForkJoinTask.java:581) ~[?:?]
at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:993) ~[?:?]
at io.joern.x2cpg.passes.frontend.XTypeRecovery.$anonfun$2(XTypeRecovery.scala:135) ~[io.joern.x2cpg_3-2.0.56.jar:2.0.56]
at scala.collection.Iterator$$anon$9.next(Iterator.scala:584) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.reduceLeft(IterableOnce.scala:764) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.reduceLeft$(IterableOnce.scala:753) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1300) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.reduceLeftOption(IterableOnce.scala:805) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.reduceLeftOption$(IterableOnce.scala:805) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1300) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.reduceOption(IterableOnce.scala:739) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.reduceOption$(IterableOnce.scala:739) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1300) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at io.joern.x2cpg.passes.frontend.XTypeRecovery.run(XTypeRecovery.scala:136) ~[io.joern.x2cpg_3-2.0.56.jar:2.0.56]
at io.shiftleft.passes.CpgPass.runOnPart(CpgPass.scala:27) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.shiftleft.passes.NewStyleCpgPassBase.runWithBuilder(CpgPass.scala:134) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.shiftleft.passes.ForkJoinParallelCpgPass.createApplySerializeAndStore(CpgPass.scala:74) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.shiftleft.passes.NewStyleCpgPassBase.createAndApply(CpgPass.scala:124) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.joern.x2cpg.passes.frontend.XTypeRecoveryPass.run$$anonfun$2(XTypeRecovery.scala:77) ~[io.joern.x2cpg_3-2.0.56.jar:2.0.56]
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1300) ~[org.scala-lang.scala-library-2.13.10.jar:?]
at io.joern.x2cpg.passes.frontend.XTypeRecoveryPass.run(XTypeRecovery.scala:77) ~[io.joern.x2cpg_3-2.0.56.jar:2.0.56]
at io.shiftleft.passes.CpgPass.runOnPart(CpgPass.scala:27) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.shiftleft.passes.NewStyleCpgPassBase.runWithBuilder(CpgPass.scala:134) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.shiftleft.passes.ForkJoinParallelCpgPass.createApplySerializeAndStore(CpgPass.scala:74) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
at io.shiftleft.passes.NewStyleCpgPassBase.createAndApply(CpgPass.scala:124) ~[io.shiftleft.codepropertygraph_3-1.4.20.jar:1.4.20]
