Giter Club home page Giter Club logo

tree-of-ast's Introduction

Tree-of-AST

Tree-of-AST is a static-pseudocode analysing approach inspired by Tree of Thoughts: Deliberate Problem Solving with Large Language Models that can be used in both security-analysis and jobs debugging tracing backs, However, in Tree-of-AST we are creating a Tree-of-AST embed with both code generated by AST tools and LMs's thought. With a basic input point, we can use evaluation on both tree node probability and creating a sub-tree of LM's response

Chain-Of-thought && backgrounds

Inspired by Tree of Thoughts: Deliberate Problem Solving with Large Language Models, A solution combining the tree of Aspartate transferase and the tree of Chain-Of-thought. However, we are not using chain-of-thought but creating a Chain-of-attack in favor of construction of ToTs, including heuristically evaluating states;

AUTOGDB

ToT

Construction of ToT can be summarized by four questions mentioned in the paper:

  1. How to decompose the intermediate process into thought steps;
  2. How to generate potential thoughts from each state;
  3. How to heuristically evaluate states;
  4. What search algorithm to use.

Focused on the job of analyzing via AST, in this case specifically analyzing possible vulnerabilities, we can conclude that:

  1. Information will given ( Potential Vulnerability: Tools like bandit, Tree construction: AST )
  2. To generate possible K, choosing from Sampling, and Proposing, we will use Proposing since Proposing works better when the thought space is more constrained (e.g. each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication.
  3. Choosing from Vote and Value or using together, we choose COMBINING TWO BOTH. Since vulnerability analyzing jobs require both parallel judgment and future possible prediction, the final call will be considered with weight of 8:2 in 10.
  4. Depth-first search (DFS) since we are actually conducting analysis on an existing possible vulnerability.

The whole workflow should be like:

1. Static vulnerability existence detection (Bandit for instance)
2. AST Tree building from and analyzing (Finding xreferences and input)
3. Eval Tree nodes reversely using `Vote` and `Value` of external LMs
    * If node score > threshold, mark the current node and restart on previous doubtful node using BHF.
    * If no node score > threshold Until the final node is reached (t > T), mark as none.

4. Finally, export the marked branch, in further discussions, we will call it "Tree-of-AST"

Exploitations

For exploitations, we will use another approach based on ToT. To begin with, we will start on from tail-to-Head travaling the Tree-of-AST in order to proceed inverse operation of the Vuln-input. For each node we met in the original Tree-of-AST, we will call it a layer; For each layer we met, we will use LMs ( no-shot or few shot ) to generate a part of exploit, these exploits will be embed in this layer of Tree-of-AST.

Same as ToT goes, we will generate multiple nodes of exploits in a layer, however, we will only evaluate them in vote, value with lookahead simulations in order to avoid issues such as unsure type, after this round of travel. Thus the weight should be 2:8 now

In the exploitation process, we might face the possibility that we can insert arbitrary since in this layer this part have not encountered any limitation, in this case, we will leave this part with <arbitrary> tag to avoid future asserts since we are traveling from Tail-To-Head.

Additionally, if <arbitrary> encounted unspecificed assert, such as len(), we can modify it by adding a limitation property <arbitrary limit='length < 7'>.

Or we can re-travel the tree from head-to-tail, using ToT to reconstruct arbitrary

import os

class UserInput:
   def __init__(self,input) -> None:
       self.input = input
       
   def __setattr__(self, name, value):
       print(f'[!] {name} now is {value}')
       super().__setattr__(name, value)

def vuln(input=input('Your address: ')):
   # ANALYSIS-TOP: [<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>]

   user_input = UserInput(input=input).input

   # ANALYSIS: tytytyty<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>

   if 'tytytyty' not in user_input[:7]:
       exit(0)
   else:
       user_input = user_input[7:]

   # ANALYSIS: <arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>
   user_input = user_input.split(',')

   # ANALYSIS UNSURE TYPE, lookahead simulations
   # ANALYSIS: [<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>]
   user_input = user_input[3]
   
   # ANALYSIS: <arbitrary>retr0reglove<COMMAND>,
   #           retr0reg<arbitrary>love<COMMAND>
   if not 'retr0reg' in user_input:
       exit()

   # ANALYSIS-TAIL: <arbitrary>love<COMMAND>
   user_input = user_input.split('love')[1] 
   
   return os.system(user_input)

vuln()

Thought To Reality

Setting up AST nods

Think might be easier than doing, but in this case both is hard. To turn our ideal model in to reality acutally took lots of efforts. To begin with, we started on buiding a source-to-sink alike framework using Python AST with reverse-tracking. However, the AST do not support direct reverse-traveling, thus we will have to assign each node with a fwd, bck and id similar to glibc bidirectional linked list in dynamic memory allocation; while adding parents and children to remain the tree property of our AST.

class ToAfy(ast.NodeVisitor):
    
    """ Adding father and children attribution to every node so we can reverse travel the AST """
    
    def __init__(self, target_function=None, target_variable=None):
        self.parent = None
        self.last_id = 0
        self.last_node = None
        self.target_function = target_function
        self.target_variable = target_variable
        self.vulnerable_note = None

        
    def generic_visit(self, node):
        node.node_id = self.last_id
        self.last_id += 1

        # print(f'setting {node.id}')
        
        if self.last_node is not None:
            self.last_node.fwd = node
            node.bck = self.last_node
        else:
            node.bck = None
            
        node.fwd = None
        self.last_node = node
        # if node.bck and self.last_node:
        #     print(f'{node.id}: last_node\'s fwd {self.last_node.id}, note\'s bck {node.bck.id}')
        
        if not hasattr(node, 'parent'):
            node.parent = self.parent
        if not hasattr(node, 'children'):
            node.children = []

Additionally, we will have to locate the node that's representing our vulnerable code segment. this usually takes other round of positive directional traveling, nevertheless, if we sets the fwd, bck, id while try locating that vulnerability sink, we can manage to use only once of positive directional traveling of the AST.

    if isinstance(node, ast.Call):
            full_function_name = get_full_function_name(node.func)
            if full_function_name == self.target_function:
                for arg in node.args:
                    if isinstance(arg, ast.Name) and arg.id == self.target_variable:
                        # VULN
                        self.vulnerable_note = node

Finally wrap up setting parent nodes.

        self.parent = node
        current_last_node = self.last_node
        for child in ast.iter_child_nodes(node):
            node.children.append(child)
            self.visit(child)
        self.last_node = current_last_node 
        ## RESTORING self.last_node

tree-of-ast's People

Contributors

jettchent avatar nicolasperez19 avatar retr0reg avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.