scalpel's People

Contributors

4383, ashwinprasadme, b-gunawan, bevanlewis, billquan, cherryblossom000, cici0702, gspeiliu, jarvx, lilicoding, lstrn98, marc-goritschnig, mden29, michaelmior, samuelchassot, sapkotaruz11, simisimon, tbolg, tristanlatr

scalpel's Issues

Running type checker on the code

I've run the pytype checker on the code, and there are a few errors that should be looked into, IMO:

pytype run-test: commands[0] | pytype --keep-going scalpel
Computing dependencies
Analyzing 29 sources with 0 local dependencies
ninja: Entering directory `.pytype'
[1/29] check scalpel.core.func_call_visitor
[2/29] check scalpel.cfg.model
FAILED: Scalpel/.pytype/pyi/scalpel/cfg/model.pyi 
Scalpel/.tox/pytype/bin/python -m pytype.single --imports_info Scalpel/.pytype/imports/scalpel.cfg.model.imports --module-name scalpel.cfg.model --platform darwin -V 3.8 -o Scalpel/.pytype/pyi/scalpel/cfg/model.pyi --analyze-annotated --nofail --quick Scalpel/scalpel/cfg/model.py
File "Scalpel/scalpel/cfg/model.py", line 74, in strip_comment: Name 'token' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 79, in strip_comment: Name 'tokenize' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 90, in strip_comment: Name 'mod' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 91, in strip_comment: Name 'token' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 93, in strip_comment: Name 'mod' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 94, in strip_comment: Name 'tokenize' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 96, in strip_comment: Name 'mod' is not defined [name-error]
File "Scalpel/scalpel/cfg/model.py", line 98, in strip_comment: Name 'mod' is not defined [name-error]

For more details, see https://google.github.io/pytype/errors.html#name-error
[3/29] check scalpel.core.fun_def_visitor
[4/29] check scalpel.core.util
[5/29] check scalpel.core.vars_visitor
[6/29] check scalpel.core.class_visitor
[7/29] check scalpel.core.source_visitor
[8/29] check scalpel.typeinfer.utilities
FAILED: Scalpel/.pytype/pyi/scalpel/typeinfer/utilities.pyi 
Scalpel/.tox/pytype/bin/python -m pytype.single --imports_info Scalpel/.pytype/imports/scalpel.typeinfer.utilities.imports --module-name scalpel.typeinfer.utilities --platform darwin -V 3.8 -o Scalpel/.pytype/pyi/scalpel/typeinfer/utilities.pyi --analyze-annotated --nofail --quick Scalpel/scalpel/typeinfer/utilities.py
File "Scalpel/scalpel/typeinfer/utilities.py", line 270, in check_consistent_list_types: bad option 'None' in return type [bad-return-type]
           Expected: str
  Actually returned: Optional[Any]
File "Scalpel/scalpel/typeinfer/utilities.py", line 430, in <module>: Invalid type annotation '<instance of Callable>' for node [invalid-annotation]
  Not a type

For more details, see https://google.github.io/pytype/errors.html
[9/29] check scalpel.typeinfer.visitors
[10/29] check scalpel.typeinfer.__init__
[11/29] check scalpel.typeinfer.classes
FAILED: Scalpel/.pytype/pyi/scalpel/typeinfer/classes.pyi 
Scalpel/.tox/pytype/bin/python -m pytype.single --imports_info Scalpel/.pytype/imports/scalpel.typeinfer.classes.imports --module-name scalpel.typeinfer.classes --platform darwin -V 3.8 -o Scalpel/.pytype/pyi/scalpel/typeinfer/classes.pyi --analyze-annotated --nofail --quick Scalpel/scalpel/typeinfer/classes.py
File "Scalpel/scalpel/typeinfer/classes.py", line 31, in ScalpelVariable: Type annotation for type does not match type of assignment [annotation-type-mismatch]
  Annotation: str (Did you mean 'typing.Optional[str]'?)
  Assignment: None
File "Scalpel/scalpel/typeinfer/classes.py", line 35, in ScalpelVariable: Type annotation for called_methods does not match type of assignment [annotation-type-mismatch]
  Annotation: List[str] (Did you mean 'typing.Optional[List[str]]'?)
  Assignment: None
File "Scalpel/scalpel/typeinfer/classes.py", line 36, in ScalpelVariable: Type annotation for binary_operation does not match type of assignment [annotation-type-mismatch]
  Annotation: _ast.BinOp (Did you mean 'typing.Optional[_ast.BinOp]'?)
  Assignment: None
File "Scalpel/scalpel/typeinfer/classes.py", line 46, in ScalpelFunction: Type annotation for return_type does not match type of assignment [annotation-type-mismatch]
  Annotation: str (Did you mean 'typing.Optional[str]'?)
  Assignment: None
File "Scalpel/scalpel/typeinfer/classes.py", line 56, in ScalpelClass: Type annotation for inherits does not match type of assignment [annotation-type-mismatch]
  Annotation: List[str] (Did you mean 'typing.Optional[List[str]]'?)
  Assignment: None
File "Scalpel/scalpel/typeinfer/classes.py", line 65, in BinaryOperation: Invalid type annotation '<instance of Callable>' for left_ast_type [invalid-annotation]
  Not a type
File "Scalpel/scalpel/typeinfer/classes.py", line 67, in BinaryOperation: Invalid type annotation '<instance of Callable>' for right_ast_type [invalid-annotation]
  Not a type
File "Scalpel/scalpel/typeinfer/classes.py", line 68, in BinaryOperation: Invalid type annotation '<instance of Callable>' for operator [invalid-annotation]
  Not a type
File "Scalpel/scalpel/typeinfer/classes.py", line 69, in BinaryOperation: Type annotation for shared_type does not match type of assignment [annotation-type-mismatch]
  Annotation: str (Did you mean 'typing.Optional[str]'?)
  Assignment: None

For more details, see https://google.github.io/pytype/errors.html#import-error
[14/29] check scalpel.rewriter
FAILED: Scalpel/.pytype/pyi/scalpel/rewriter.pyi 
Scalpel/.tox/pytype/bin/python -m pytype.single --imports_info Scalpel/.pytype/imports/scalpel.rewriter.imports --module-name scalpel.rewriter --platform darwin -V 3.8 -o Scalpel/.pytype/pyi/scalpel/rewriter.pyi --analyze-annotated --nofail --quick Scalpel/scalpel/rewriter.py
File "Scalpel/scalpel/rewriter.py", line 53, in random_var_renaming: Invalid keyword argument skip_call_name to function scalpel.core.vars_visitor.get_vars [wrong-keyword-args]
         Expected: (node)
  Actually passed: (node, skip_call_name)
File "Scalpel/scalpel/rewriter.py", line 210, in insert_before: No attribute 'pattern' on ASTRewriter [attribute-error]
File "Scalpel/scalpel/rewriter.py", line 225, in insert_after: No attribute 'pattern' on ASTRewriter [attribute-error]
File "Scalpel/scalpel/rewriter.py", line 234, in remove: No attribute 'pattern' on ASTRewriter [attribute-error]
File "Scalpel/scalpel/rewriter.py", line 244, in replace: No attribute 'pattern' on ASTRewriter [attribute-error]

For more details, see https://google.github.io/pytype/errors.html
[15/29] check scalpel.SSA.alg
[16/29] check scalpel.import_graph.__init__
[17/29] check scalpel.__init__
[18/29] check scalpel.call_graph.__init__
[19/29] check scalpel.SSA.__init__
[20/29] check scalpel.core.kw_visitor
[21/29] check scalpel.core.__init__
[22/29] check scalpel.util
ninja: build stopped: cannot make progress due to previous errors.
Leaving directory '.pytype'

Tell me what you think.

Creating CFG fails

With the following source code from the repository Conditional_Density_Estimation, the creation of the control flow graph fails.

def initialize_models(model_dict, verbose=False, model_name_prefix=''):
    ''' make kartesian product of listed parameters per model '''
    model_configs = {}
    for model_key, conf_dict in model_dict.items():
        model_configs[model_key] = [dict(zip(conf_dict.keys(), value_tuple)) for value_tuple in
                                    list(itertools.product(*list(conf_dict.values())))]

    """ initialize models """
    configs_initialized = {}
    for model_key, model_conf_list in model_configs.items():
        configs_initialized[model_key] = []
        for i, conf in enumerate(model_conf_list):
            conf['name'] = model_name_prefix + model_key.replace(' ', '_') + '_%i' % i
            if verbose: print("instantiating ", conf['name'])
            """ remove estimator entry from dict to instantiate it"""
            estimator = conf.pop('estimator')
            configs_initialized[model_key].append(globals()[estimator](**conf))
    return configs_initialized

Here is the corresponding stack trace. I think the line configs_initialized[model_key].append(globals()[estimator](**conf)) causes the error.

  File "D:\GitHub\test\test_scalpel.py", line 91, in <module>
    main()
  File "D:\GitHub\test\test_scalpel.py", line 66, in main
    cfg = CFGBuilder().build_from_src(name="", src=code_str)
  File "C:\Users\ssimon\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\cfg\builder.py", line 128, in build_from_src
    return self.build(name, tree)
  ...
  File "C:\Users\ssimon\AppData\Local\Programs\Python\Python39\lib\ast.py", line 407, in visit
    return visitor(node)
  File "C:\Users\ssimon\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\func_call_visitor.py", line 91, in visit_Call
    call_info["params"] += [self.param2str(arg)]
  File "C:\Users\ssimon\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\func_call_visitor.py", line 47, in param2str
    return get_func(param)
  File "C:\Users\ssimon\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\func_call_visitor.py", line 39, in get_func
    raise Exception(str(type(node.func)))
Exception: <class 'ast.Subscript'>
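
The trace is consistent with the callee of the failing call being an ast.Subscript rather than a name or attribute. The following is a minimal sketch, not Scalpel's get_func, of a fallback that avoids raising on such callees. The helper call_name is hypothetical, and ast.unparse assumes Python 3.9 or newer.

```python
import ast


def call_name(node: ast.Call) -> str:
    """Return a readable name for the callee of a call node.

    Hypothetical helper: instead of raising on unexpected callee types,
    it falls back to the unparsed source of the callee expression.
    """
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    # Covers ast.Attribute, ast.Subscript, nested ast.Call, and so on.
    return ast.unparse(func)


tree = ast.parse("configs.append(globals()[estimator](**conf))")
calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
names = [call_name(c) for c in calls]
print(sorted(names))
```

Here the inner call globals()[estimator](**conf) is reported with its callee text instead of aborting the whole CFG construction.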

Bug in mnode.parse_func_body

I'm not sure if this is a valid bug. As I understand, parse_function_body() is used to return all vars and all calls in each function.

Test case:
image

Code to get the result:
image

Result (only regarding function AA):
{'assign_pairs': [{'var': {'name': 'x', 'lineno': 15, 'col_offset': 4, 'usage': 'store'}, 'calls': []}], 'other_calls': [[{'name': 'BB', 'lineno': 13, 'col_offset': 4, 'params': []}, {'name': 'CC', 'lineno': 14, 'col_offset': 4, 'params': []}], [], [{'name': 'BB', 'lineno': 13, 'col_offset': 4, 'params': []}], [{'name': 'CC', 'lineno': 14, 'col_offset': 4, 'params': []}]]}

Problem:
There will be a lot of repeated func call info in 'other_calls' (O(n^2)), and there's always one empty list in 'other_calls'.
Also, it seems that nested classes will be missed in the results.

The last PR #70 removed my changes

Hi, I just wanted to work on #42 and create a PR for this issue. However, I noticed that my changes from PR #68 and #69 are not in the main branch anymore. I found out that the last PR #70 removed my changes. Is this intended?

Constant propagation class: instance variables are not covered

Instance variables, as you can see in the code snippet below, are currently not covered by the constant propagation class. That is, when applying Scalpel, I cannot identify the values of self.x. However, I would expect Scalpel to be able to do this.

class Test:
    def __init__(self):
        self.x = 2

    def change_x(self):
        self.x = 5

There are two parts in the source code that need to be modified. First, Scalpel needs to cover the ast.Attribute object.

elif isinstance(targets[0], ast.Attribute):
    # TODO: resolve attributes
    pass

Here, we just need to add two lines:

left_name = ast.unparse(stmt.targets[0])
const_dict[left_name] = stmt.value  

Second, we need to modify the last code block of the method get_stmt_idents_ctx. Because the object and attribute parts of an ast.Attribute are joined by a dot (for example, self.x), we need to remove the check "." in r['name'] from the first if condition (Line 316).

ident_info = get_vars(visit_node)
for r in ident_info:
    if r['name'] is None or "." in r['name'] or "_hidden_" in r['name']:
        continue
    if r['usage'] == 'store':
        stored_idents.append(r['name'])
    else:
        loaded_idents.append(r['name'])
    if r['usage'] == 'del':
        del_set.append(r['name'])
return stored_idents, loaded_idents, []

With these two changes, Scalpel is able to identify all the values for self.x.
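
The idea of keying attribute targets by their unparsed name can be tried outside Scalpel. This is a standalone sketch using only the standard library; attribute_assignments is a hypothetical helper, ast.unparse assumes Python 3.9 or newer, and ast.literal_eval only works here because the assigned values are literals.

```python
import ast

SRC = '''
class Test:
    def __init__(self):
        self.x = 2

    def change_x(self):
        self.x = 5
'''


def attribute_assignments(src: str) -> dict:
    """Map each assigned attribute target (e.g. 'self.x') to its literal values."""
    values = {}
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Attribute):
            left_name = ast.unparse(node.targets[0])
            values.setdefault(left_name, []).append(ast.literal_eval(node.value))
    return values


result = attribute_assignments(SRC)
print(result)
```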

Is type inferencing supposed to be interprocedural?

Consider the following modified type_infer_example.py:

from os import getcwd

def my_function():
    x = "Current working directory: "
    return x + getcwd()

def main():
    y = my_function()

Running type_infer_tutorial.py on this file, I get the following output (slightly modified for clarity):

python type_infer_tutorial.py 
{'file': 'type_infer_example.py', 'line_number': 4, 'function': 'my_function', 'type': {'str'}}
{'file': 'type_infer_example.py', 'line_number': 5, 'variable': 'x', 'function': 'my_function', 'type': {'str'}}
{'file': 'type_infer_example.py', 'line_number': 9, 'variable': 'y', 'function': 'main', 'type': {'any'}}

Given that my_function has type str, should not y have type str as well instead of any? I am confused about two things:

  1. Is the type inference supposed to be interprocedural or intraprocedural?
  2. Shouldn't the entry point be a function rather than a file? Is the issue that the entry point is specified as the entire file, i.e., type_infer_example.py and that multiple functions are defined in that file?

Duplicate entry (parameter and variable) for function parameter during type inferencing

Consider the following modified type_infer_example.py:

from os import getcwd

def my_function(x):
    x = "Current working directory: "
    return x + getcwd()

Running type_infer_tutorial.py on this file, I get the following output (slightly modified for clarity):

python type_infer_tutorial.py
{'file': 'type_infer_example.py', 'line_number': 4, 'function': 'my_function', 'type': {'str'}}
{'file': 'type_infer_example.py', 'line_number': 4, 'parameter': 'x', 'function': 'my_function', 'type': {'any'}}
{'file': 'type_infer_example.py', 'line_number': 5, 'variable': 'x', 'function': 'my_function', 'type': {'str'}}

Above, there are two entries for x but, actually, there is only one x, i.e., the parameter x. In other words, there is no variable x.

AttributeErrors caused by vars visitor

I found again several problems caused by the vars visitor module when analyzing ML projects. To reproduce the errors, run the following example:

# AttributeError: 'int' object has no attribute '_fields'
code_str = """x = np.zeros(([Y.shape[0],X.shape[0]]))"""   

# AttributeError: 'Name' object has no attribute 'value'
# code_str = """z = np.array([x.test[i] for i in test])"""   

# AttributeError: 'Call' object has no attribute 'value'
# code_str = """best_tune = param_grid_list[tune_accs.index(max(tune_accs))]""" 

cfg = CFGBuilder().build_from_src(name="", src=code_str)

_, const_dict = SSA().compute_SSA(cfg)

Just change the code_str variable to see the different errors. There are probably more errors; I will try to add all the cases that I find.

If you need more information, just let me know.

Bug in scalpel.cfg.model.Block.get_calls()

Test case:
The test case "cfg_issue_case.py" is in test-cases (too large to fit in here), but I think the function fails in general cases (as long as it returns a result, i.e. not an empty string).
Error:
image

PyPI distribution

Hello!

First of all, thank you for the work you have put into this project. I was starting to build my own, but I am glad that I found Scalpel.

I want to integrate Scalpel within my project, but with only a source installation, it is hard to manage dependencies, and I would rather not ship Scalpel's code within my project.

Are you planning on releasing this project via PyPI?
If so, considering that there are already a few (unrelated) projects using the same name, I would suggest naming this distribution code-scalpel, python-scalpel, static-scalpel, or something along those lines.

SSA class: Handling of tuples

Currently the SSA class is not able to detect assignment to variables in a tuple.
For example, if we want to detect the value of the variable "a" in the following code example, the SSA-class does not provide any result.

a, b = 1, 2
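
One way such an assignment could be handled is to pair each name in the tuple target with the value at the same position. The sketch below uses only the standard library and is not Scalpel's SSA code; unpack_tuple_assign is a hypothetical helper, and it deliberately ignores starred targets and right-hand sides that are not literal tuples or lists.

```python
import ast


def unpack_tuple_assign(stmt: ast.Assign) -> dict:
    """Pair each name in a tuple target with the matching value node."""
    target, value = stmt.targets[0], stmt.value
    pairs = {}
    sequence_types = (ast.Tuple, ast.List)
    if isinstance(target, sequence_types) and isinstance(value, sequence_types):
        if len(target.elts) == len(value.elts):
            for name_node, value_node in zip(target.elts, value.elts):
                if isinstance(name_node, ast.Name):
                    pairs[name_node.id] = value_node
    return pairs


stmt = ast.parse("a, b = 1, 2").body[0]
consts = {name: ast.literal_eval(node) for name, node in unpack_tuple_assign(stmt).items()}
print(consts)
```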

Bug in vars visitor: Tuple object has no attribute value

I found another problem when I used scalpel for analyzing ML projects. To reproduce the problem, just run the example below.

code_str = """X = dataset.iloc[:, [3, 4]].values"""   

cfg = CFGBuilder().build_from_src(name="", src=code_str)

_, const_dict = SSA().compute_SSA(cfg)

Running this example leads to the following stack trace:

Traceback (most recent call last):
  File "D:\GitHub\test\test_scalpel.py", line 118, in <module>
    main()
  File "D:\GitHub\test\test_scalpel.py", line 92, in main
    _, const_dict = SSA().compute_SSA(cfg)
  File "..\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\SSA\const.py", line 107, in compute_SSA
    stored_idents, loaded_idents, func_names = self.get_stmt_idents_ctx(stmt, const_dict=tmp_const_dict)
  File "C:\Users\ssimon\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\SSA\const.py", line 309, in get_stmt_idents_ctx
    ident_info = get_vars(visit_node)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\vars_visitor.py", line 188, in get_vars
    visitor.visit(node)
  File "...\AppData\Local\Programs\Python\Python39\lib\ast.py", line 407, in visit
    return visitor(node)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\vars_visitor.py", line 176, in visit_Assign
    self.visit(node.value)
  File "...\AppData\Local\Programs\Python\Python39\lib\ast.py", line 407, in visit
    return visitor(node)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\vars_visitor.py", line 113, in visit_Attribute
    self.visit(node.value)
  File "...\AppData\Local\Programs\Python\Python39\lib\ast.py", line 407, in visit
    return visitor(node)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\vars_visitor.py", line 151, in visit_Subscript
    self.slicev(node.slice)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\scalpel\core\vars_visitor.py", line 145, in slicev
    self.visit(node.value)
AttributeError: 'Tuple' object has no attribute 'value'

I think it is probably a special case, which is currently not covered by the vars visitor module?

PS: Here is the original file from which I extracted the code line.
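
A likely background for this error: since Python 3.9, ast.Subscript.slice holds the index expression directly instead of wrapping it in ast.Index, so code that reads node.value on the slice fails for indices such as [:, [3, 4]]. The sketch below shows a version-tolerant accessor; slice_expr is a hypothetical helper and not part of Scalpel.

```python
import ast


def slice_expr(sub: ast.Subscript) -> ast.AST:
    """Return the index expression of a subscript on old and new Pythons."""
    node = sub.slice
    # Before Python 3.9, simple indices were wrapped in ast.Index(value=...).
    if node.__class__.__name__ == "Index":
        return node.value
    return node


tree = ast.parse("X = dataset.iloc[:, [3, 4]].values")
sub = next(n for n in ast.walk(tree) if isinstance(n, ast.Subscript))
index = slice_expr(sub)
kind = type(index).__name__
print(kind)
```

On Python 3.9 and newer the index is an ast.Tuple; on 3.8 it is an ast.ExtSlice. Neither has a value attribute, which matches the AttributeError in the trace.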

Bug in mnode.parse_import_stmts()

Again I'm not sure if this is expected.

Test case:
image

Result:
{'A': 'X.A', 'B': 'X.B', 'C2': 'X.C.C2', 'd': 'Y.D'}

Problem:
'from . import Test' is missing in the result.
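
For reference, from . import Test parses to an ast.ImportFrom node with module set to None and level set to 1, so a visitor that only reads node.module would skip it. The sketch below is an assumption-laden illustration: the input source is a guess reconstructed from the result above (the original test case was only posted as an image), and import_map is a hypothetical helper, not mnode.parse_import_stmts().

```python
import ast

# Assumed input, reconstructed from the reported result.
SRC = """
from X import A, B
from X.C import C2
from Y import D as d
from . import Test
"""


def import_map(src: str) -> dict:
    """Map each imported local name to a dotted origin, including relative imports."""
    mapping = {}
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.ImportFrom):
            # Relative imports have level >= 1 and may have module=None.
            prefix = "." * node.level + (node.module or "")
            sep = "" if prefix.endswith(".") else "."
            for alias in node.names:
                mapping[alias.asname or alias.name] = prefix + sep + alias.name
        elif isinstance(node, ast.Import):
            for alias in node.names:
                mapping[alias.asname or alias.name] = alias.name
    return mapping


mapping = import_map(SRC)
print(mapping)
```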

Type inferencing and library code

Consider the following example:

import tensorflow as tf

class SequentialModel(tf.keras.Model):
  def __init__(self, **kwargs):
    super(SequentialModel, self).__init__(**kwargs)

    self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))

    # Add a lot of small layers
    num_layers = 100
    self.my_layers = [tf.keras.layers.Dense(64, activation="relu") for n in range(num_layers)]
    self.dropout = tf.keras.layers.Dropout(0.2)
    self.dense_2 = tf.keras.layers.Dense(10)

  @tf.function
  def call(self, x):
    x = self.flatten(x)

    for layer in self.my_layers:
      x = layer(x)

    x = self.dropout(x)
    x = self.dense_2(x)

    return x

def main():
    input_data = tf.random.uniform([20, 28, 28])

    eager_model = SequentialModel()
    eager_model(input_data)

if __name__ == "__main__":
    main()

Running the type inferencing on this example, I get something similar to the following on be39267:

$ python type_infer_tutorial.py
{'file': 'graph_execution_time_comparison.py', 'line_number': 28, 'variable': 'input_data', 'function': 'main', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 30, 'variable': 'eager_model', 'function': 'main', 'type': {'callable'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 10, 'variable': 'num_layers', 'function': '__init__', 'type': {'int'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 16, 'parameter': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 17, 'variable': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 22, 'variable': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 23, 'variable': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 20, 'variable': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 10, 'variable': 'num_layers', 'function': '__init__', 'type': {'int'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 16, 'parameter': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 17, 'variable': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 22, 'variable': 'x', 'function': 'call', 'type': {'any'}}
{'file': 'graph_execution_time_comparison.py', 'line_number': 23, 'variable': 'x', 'function': 'call', 'type': {'any'}}

Question: is it possible to know that input_data is of type Tensor?

how run the tool

I cannot install the tool for type inference. When I use pip install, I get the following error:
Using cached scalpel-0.8.2.tar.gz (96 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\00\AppData\Local\Temp\pip-install-ussx3gzj\scalpel_0c4ee3a7afd74153a91022a0f8e95f23\setup.py", line 4, in
import scalpel
File "C:\Users\00\AppData\Local\Temp\pip-install-ussx3gzj\scalpel_0c4ee3a7afd74153a91022a0f8e95f23\scalpel\__init__.py", line 36, in
from constants import version
ImportError: cannot import name 'version' from 'constants' (C:\Users\00\AppData\Local\Programs\Python\Python39\lib\site-packages\constants.py)
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

After that, I downloaded the source code and also got an error:
Traceback (most recent call last):
File "C:\Users\00\Desktop\Hityper\Scalpel-master\Scalpel-master\ex.py", line 1, in
from scalpel.typeinfer.typeinfer import TypeInference
File "C:\Users\00\Desktop\Hityper\Scalpel-master\Scalpel-master\scalpel\typeinfer\typeinfer.py", line 14, in
from scalpel.typeinfer.analysers import (
File "C:\Users\00\Desktop\Hityper\Scalpel-master\Scalpel-master\scalpel\typeinfer\analysers.py", line 9, in
from typed_ast import ast3
ModuleNotFoundError: No module named 'typed_ast'

Fix TypeErrors

Raising a string is not a thing in Python; these lines will trigger a TypeError.

image

Notice the difference:

>>> raise ValueError()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError
>>> raise 'ValueError'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: exceptions must derive from BaseException

Running pytype on this project to fix the obvious TypeErrors would be a good thing until you develop your own type checker ;)
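
Statements of this kind can also be located with a few lines of standard-library code. This is a small sketch, not a replacement for a type checker; string_raises is a hypothetical helper and the sample source is made up for illustration.

```python
import ast

SAMPLE = '''
def check(kind):
    if kind == "a":
        raise ValueError("bad kind")
    raise "Unknown kind"
'''


def string_raises(src: str) -> list:
    """Return the line numbers of raise statements whose operand is a string literal."""
    lines = []
    for node in ast.walk(ast.parse(src)):
        if (
            isinstance(node, ast.Raise)
            and isinstance(node.exc, ast.Constant)
            and isinstance(node.exc.value, str)
        ):
            lines.append(node.lineno)
    return sorted(lines)


found = string_raises(SAMPLE)
print(found)
```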

AttributeError: Attribute/Subscript/Name object has no attribute func

Hi, the code line below causes an AttributeError in many ML projects that I analyze, for example Stellargraph, seq2seq-EVC, or mmdetection.

if type(node.func) is ast.Name:

Here are shortened stack traces for two files in the Stellargraph project.

stellargraph/stellargraph/layer/hinsage.py:

Traceback (most recent call last):
  File "/home/simisimon/github/digital-bauhaus/CfgNet/src/cfgnet/plugins/source_code/ml_plugin.py", line 107, in _parse_config_file
    self.cfg = Cfg(code_str=code_str)
  File "/home/simisimon/github/digital-bauhaus/CfgNet/src/cfgnet/utility/cfg.py", line 26, in __init__
    self.cfg: CFG = CFGBuilder().build_from_src(name="", src=code_str)
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/cfg/builder.py", line 128, in build_from_src
...
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 92, in visit_Call
    call_info["params"] += [self.param2str(arg)]
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 48, in param2str
    return get_func(param)
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 34, in get_func
    mid = get_func(node.func.value)
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 27, in get_func
    if type(node.func) is ast.Name:
AttributeError: 'Subscript' object has no attribute 'func'

stellargraph/stellargraph/layer/graphsage.py:

Traceback (most recent call last):
  File "/home/simisimon/github/digital-bauhaus/CfgNet/src/cfgnet/plugins/source_code/ml_plugin.py", line 107, in _parse_config_file
    self.cfg = Cfg(code_str=code_str)
  File "/home/simisimon/github/digital-bauhaus/CfgNet/src/cfgnet/utility/cfg.py", line 26, in __init__
    self.cfg: CFG = CFGBuilder().build_from_src(name="", src=code_str)
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/cfg/builder.py", line 128, in build_from_src
   ...
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 48, in param2str
    return get_func(param)
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 34, in get_func
    mid = get_func(node.func.value)
  File "/home/simisimon/.cache/pypoetry/virtualenvs/cfgnet--7Rsn6Rj-py3.9/lib/python3.9/site-packages/scalpel/core/func_call_visitor.py", line 27, in get_func
    if type(node.func) is ast.Name:
AttributeError: 'Attribute' object has no attribute 'func'

Constant propagation computes wrong values

Given the following test file, computing the constant values for x using Scalpel results in a dictionary with three key-value pairs.

x = "a"
x = "b"
x = 1
with open("test_file.py", "r", encoding="utf-8") as source:
    code_str = source.read()

cfg = CFGBuilder().build_from_src(name="", src=code_str)

_, const_dict = SSA().compute_SSA(cfg)

print(const_dict)
{('x', 0): <ast.Constant object at 0x0000021BC9EFA430>, ('x', 1): <ast.Constant object at 0x0000021BC9EFA430>, ('x', 2): <ast.Constant object at 0x0000021BC9EFA430>}

That is what I expect. However, unparsing the ast.Constant objects to get the actual values does not yield the correct values. Instead, I get the same value three times.

 for key, value in const_dict.items():
        print(key, ast.unparse(value))
('x', 0) 1
('x', 1) 1
('x', 2) 1

This is not what I expect. Do you agree or am I missing something here?
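
Note that the three entries printed above share one object address, so all three keys appear to reference the same ast.Constant node. For comparison, here is a minimal sketch of the expected result using only the standard library; versioned_constants is a hypothetical helper that numbers straight-line, module-level assignments and is not Scalpel's SSA algorithm.

```python
import ast


def versioned_constants(src: str) -> dict:
    """Give every top-level assignment to a plain name its own version number."""
    counters, result = {}, {}
    for stmt in ast.parse(src).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            name = stmt.targets[0].id
            version = counters.get(name, 0)
            result[(name, version)] = stmt.value
            counters[name] = version + 1
    return result


consts = versioned_constants('x = "a"\nx = "b"\nx = 1')
values = {key: ast.literal_eval(node) for key, node in consts.items()}
print(values)
```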

Fix for: Constant propagation for loop variables

As I already commented in #27, I think that the current solution unnecessarily complicates constant propagation for loop variables, since two artificial ast.Call objects (Line 239 and 241) are created, while the actual ast.Call object (Line 237) of the loop is not used at all.

if hasattr(stmt.target, "id"):
    left_name = stmt.target.id
    iter_value = stmt.iter
    # make a iter call
    iter_node = ast.Call(ast.Name("iter", ast.Load()), [stmt.iter], [])
    # make a next call
    next_call_node = ast.Call(ast.Name("next", ast.Load()), [iter_node], [])
    const_dict[left_name] = next_call_node

I would like to create a PR, but I do not have the permission. Therefore, I provide my suggested fix here:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     iter_value = stmt.iter 
     const_dict[left_name] = iter_value

or even better:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     const_dict[left_name] = stmt.iter

Corresponding test cases can look like this:

def test_in_range_loop():
    code_str = """for i in range(3):    x = i"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)
    
    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('i', 0)), ast.Call) 
    assert isinstance(const_dict.get(('x', 0)), ast.Name)


def test_for_each_loop():
    code_str = """for x in y:    z = x"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)
    
    assert const_dict
    assert len(const_dict) == 2
    # With the suggested fix, the loop variable maps to stmt.iter, which is the ast.Name `y` here.
    assert isinstance(const_dict.get(('x', 0)), ast.Name)
    assert isinstance(const_dict.get(('z', 0)), ast.Name)

Problem with typed-ast in analysers.py

Hello,
similar to a recent issue, I get an AttributeError (module 'typed_ast' has no attribute '_ast3') when calling infer_types() of the TypeInference module.
My environment is Ubuntu 20.04 and Python version 3.8. I did some digging and quickly found out that the typed-ast module does not support Python versions 3.8 and newer (this is mentioned in the introduction on GitHub).
They recommend using the native ast library as the parser also supports type comments for the newer versions.
I just wanted to raise the issue again in case someone runs into similar problems. I will test the type inference with a lower version.

Edit: I tested versions 3.6.5 and 3.7.0, the error also occurs there
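
The recommendation to use the native ast library can be illustrated as follows. From Python 3.8 on, ast.parse accepts type_comments=True and exposes the comment text on the node. This is only a sketch of the standard-library feature; it does not show how analysers.py would need to change, and it does not apply to the 3.6 and 3.7 interpreters mentioned in the edit.

```python
import ast

SRC = "x = []  # type: List[int]\n"

# type_comments=True makes the parser keep PEP 484 type comments (Python 3.8+).
tree = ast.parse(SRC, type_comments=True)
assign = tree.body[0]
comment = assign.type_comment
print(comment)
```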

Error in analyzers.py

When I try to run type_infer_tutorial.py in the examples, I got the error "AttributeError: module 'typed_ast' has no attribute '_ast3'" which is caused by typeinfer/analysers.py at line 150. I wonder whether this is a bug in the development? Thanks!

Documentation issue

In https://github.com/SMAT-Lab/Scalpel/wiki/Type-Inference#user-content-how-to-use-type-inference:

The first parameter of TypeInference is the desired name for the inference analyzer, and the second one is the path to a python file or the root folder of a python package.

which is referring to:

inferer = TypeInference(name='type_infer_example.py', entry_point='type_infer_example.py')

Isn't the first parameter the path to a Python file or the root folder of a python package and the second parameter an "entry point?" Are there multiple inference analyzers?

rewriter.random_var_renaming()

src="""
def func(x,y):
    z = 0.
    for i in range(x+y):
        z+=i
    return z
"""
output = """
def func(x, y):
    z = 0.0
    for i in _renamed_one(x + y):
        z += i
    return z
"""

range is changed to _renamed_one
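
One possible guard is to exclude built-in names before choosing rename candidates. The sketch below uses only the standard library; renamable_names is a hypothetical helper and not part of scalpel.rewriter, and it treats every non-builtin ast.Name and function argument as a candidate.

```python
import ast
import builtins

SRC = """
def func(x,y):
    z = 0.
    for i in range(x+y):
        z+=i
    return z
"""


def renamable_names(src: str) -> set:
    """Collect variable and argument names, skipping built-ins such as range."""
    builtin_names = set(dir(builtins))
    names = set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name) and node.id not in builtin_names:
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


candidates = renamable_names(SRC)
print(sorted(candidates))
```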

Get all cfgs of a file?

When calling cfg = CFGBuilder().build_from_file(name="test", filepath=path/to/file), the cfg for the corresponding file is created.
If the file contains classes and methods, the cfg contains functioncfgs and class_cfgs. Does it make sense to implement a method to get all cfgs of a file?

In my specific use case, I want to identify values of variables in the entire file. Hence, I always have to iterate through all functioncfgs and class_cfgs.
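
The iteration described above could be wrapped in a small recursive helper. This is only a sketch: it assumes that functioncfgs and class_cfgs are dict-like mappings whose values are nested CFG objects, and it uses SimpleNamespace stand-ins so that it runs without Scalpel installed. The helper name iter_all_cfgs is hypothetical.

```python
from types import SimpleNamespace


def iter_all_cfgs(cfg):
    """Yield cfg itself, then every nested function and class CFG."""
    yield cfg
    for attr in ("functioncfgs", "class_cfgs"):
        # Assumption: both attributes are mappings of nested CFG objects.
        for sub_cfg in getattr(cfg, attr, {}).values():
            yield from iter_all_cfgs(sub_cfg)


# Stand-in objects that mimic a module with one function and one class.
method = SimpleNamespace(name="method", functioncfgs={}, class_cfgs={})
klass = SimpleNamespace(name="Klass", functioncfgs={(1, "method"): method}, class_cfgs={})
func = SimpleNamespace(name="func", functioncfgs={}, class_cfgs={})
module = SimpleNamespace(
    name="test",
    functioncfgs={(1, "func"): func},
    class_cfgs={"Klass": klass},
)

names = [cfg.name for cfg in iter_all_cfgs(module)]
print(names)
```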

Constant Propagation: Variables that represent callables are not covered

In the following example, a callable is passed as a variable to a parameter:

def preprocessing(s):
    return s

v = TfidfVectorizer(analyzer=preprocessing)

Running Scalpel results in the following const_dict:

{('preprocessing', 0): None, ('v', 0): <ast.Call object at 0x0000016BCDE2AAC0>}

In this case, the function definition should be assigned to the variable preprocessing instead of None.
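
To illustrate the expected behaviour, here is a rough sketch using only the standard-library ast module, not Scalpel's constant propagation. It maps top-level function names to their definition nodes and resolves keyword arguments that refer to them. The variable names func_defs and callable_args are made up for this example.

```python
import ast

source = """
def preprocessing(s):
    return s

v = TfidfVectorizer(analyzer=preprocessing)
"""

tree = ast.parse(source)

# Map each top-level function name to its definition node.
func_defs = {
    node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)
}

# For every keyword argument that is a bare name, look up a matching def.
callable_args = {}
for node in ast.walk(tree):
    if isinstance(node, ast.Call):
        for kw in node.keywords:
            if isinstance(kw.value, ast.Name) and kw.value.id in func_defs:
                callable_args[kw.arg] = func_defs[kw.value.id]

print({arg: fn.name for arg, fn in callable_args.items()})
```

This only handles functions defined at module level and passed by keyword; reassigned or imported callables would need real data-flow analysis.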

Failed to run call graph analysis because of pycg.

Scalpel only supports pycg version 0.0.3, but the latest version of pycg is 0.0.5.

As we can see in scalpel/call_graph/pycg.py, it imports pycg from PyPI instead of scalpel/pycg:

from pycg.pycg import CallGraphGenerator
from pycg import formats

I run the CallGraphGenerator as follows:

$ ipy
Python 3.9.8 (main, Nov 10 2021, 03:48:35)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from scalpel.call_graph.pycg import CallGraphGenerator

In [2]: cg_generator = CallGraphGenerator(["__init__.py"], "bs4")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-b5da871ecbc4> in <module>
----> 1 cg_generator = CallGraphGenerator(["__init__.py"], "bs4")

TypeError: __init__() missing 2 required positional arguments: 'max_iter' and 'operation'

I set max_iter to its default value 1 and operation to call-graph. It then reports a new error:

Traceback (most recent call last):
  File "/Users/vancir/Documents/GitHub/Scalpel/examples/cg_tutorial.py", line 12, in <module>
    f.write(json.dumps(formatter.generate()))
  File "/opt/homebrew/lib/python3.9/site-packages/pycg-0.0.3-py3.9.egg/pycg/formats/simple.py", line 28, in generate
    output = self.cg_generator.output()
AttributeError: 'dict' object has no attribute 'output'
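
When debugging a mismatch like this, it can help to check which pycg distribution is actually installed. A minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed pycg version string, or None if pycg is not installed.
print(installed_version("pycg"))
```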

Constant propagation for loop variables

Consider the following code:

from sklearn.cluster import KMeans

for i in range(5):
    kmeans = KMeans(n_clusters=i)

I want to compute all possible values for the loop variable i. Using Scalpel results in the following const_dict:

{('i', 0): None, ('kmeans', 0): <ast.Call object at 0x0000015FB8F6EC10>}

Is this behavior intentional? I would not expect i to be None. Instead, I would expect the const_dict to contain all the values i can have in the loop.
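
For illustration, the expected result can be computed for the simple case of range() with literal arguments using only the standard-library ast module. This is a sketch of the desired outcome, not Scalpel's implementation, and the function name loop_var_values is hypothetical.

```python
import ast


def loop_var_values(source):
    """Map loop variables of `for <name> in range(<literals>)` to their values."""
    values = {}
    for node in ast.walk(ast.parse(source)):
        is_range_loop = (
            isinstance(node, ast.For)
            and isinstance(node.target, ast.Name)
            and isinstance(node.iter, ast.Call)
            and isinstance(node.iter.func, ast.Name)
            and node.iter.func.id == "range"
            and not node.iter.keywords
        )
        if not is_range_loop:
            continue
        try:
            # literal_eval only accepts constant expressions such as 5.
            args = [ast.literal_eval(arg) for arg in node.iter.args]
            values[node.target.id] = list(range(*args))
        except (ValueError, TypeError):
            # Non-literal or invalid arguments: leave the variable unresolved.
            continue
    return values


source = """
from sklearn.cluster import KMeans

for i in range(5):
    kmeans = KMeans(n_clusters=i)
"""

print(loop_var_values(source))
```

The source is only parsed, never executed, so sklearn does not need to be installed. Loops over non-literal iterables are skipped.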

Turn the user-guide into a sphinx documentation site

I believe it would be more useful for end users to have both the API documentation and the user guides in the same place, and to be able to link from the narrative docs to the API documentation.

If this is something that you'd like, I can work on turning the user guides into a sphinx project, using markdown. Then we'll be able to upload it to readthedocs very easily ;-)
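
A minimal conf.py sketch for such a project might look like the following. It assumes the myst-parser package is used for Markdown support and sphinx.ext.autodoc for the API pages; the project name and the exact extension list are placeholders rather than a final proposal.

```python
# conf.py (sketch): Sphinx configuration for Markdown user guides plus API docs.

project = "Scalpel"

extensions = [
    "sphinx.ext.autodoc",  # generate API documentation from docstrings
    "myst_parser",         # allow the narrative docs to be written in Markdown
]

# Accept both reStructuredText and Markdown source files.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```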

Tell me what you think,

rewriter bugs

I found some bugs in the rewriter functions:

- rewriter.random_var_renaming()

I use this API to rename variables; the following case produces incorrect output:

src = """
def func(x,y):
    z = 0.
    for i in range(x+y):
        z+=i
    return z
"""
output = """
def func(x, y):
    z = 0.0
    for i in range(x + _renamed_one):
        z += i
    return z
"""
output2 = """
def func(x, y):
    z = 0.0
    for i in x(x + y):
        z += i
    return z
"""

range is included in the var_name_set: ['z', 'i', 'range', 'x', 'y', 'z', 'i', 'z']
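
A possible fix is to collect only names that are actually bound in the code (assignment targets, loop targets and function parameters) and to drop anything that is a built-in. A minimal sketch with the standard-library ast module; the function name renamable_names is made up and is not Scalpel's API.

```python
import ast
import builtins


def renamable_names(source):
    """Return the sorted, de-duplicated names that are safe to rename."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)   # assignment and loop targets
        elif isinstance(node, ast.arg):
            names.add(node.arg)  # function parameters
    # Never offer built-ins such as range for renaming.
    return sorted(name for name in names if not hasattr(builtins, name))


src = """
def func(x,y):
    z = 0.
    for i in range(x+y):
        z+=i
    return z
"""

print(renamable_names(src))
```

Here range only appears in Load context, so it is never collected in the first place.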

- rewriter.unused_stmt_insertion()

src = """
def func(x,y):
    z = 0.
    for i in range(x+y):
        z+=i
    return z
"""
output = """
def func(x, y):
    z = 0.0
    print(this is an unused statement)
    for i in range(x + y):
        z += i
    return z
"""

The string passed to print() is missing its quotes.
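
One way to guarantee correct quoting is to build the statement as an AST node with a string constant and let ast.unparse (Python 3.9+) produce the source text. This is a sketch of the idea, not the rewriter's actual code.

```python
import ast

# Build `print("this is an unused statement")` as an AST node so that the
# string is emitted as a properly quoted literal.
unused_stmt = ast.Expr(
    value=ast.Call(
        func=ast.Name(id="print", ctx=ast.Load()),
        args=[ast.Constant(value="this is an unused statement")],
        keywords=[],
    )
)
ast.fix_missing_locations(unused_stmt)

generated = ast.unparse(unused_stmt)
print(generated)

# The generated code parses again, which the unquoted version does not.
ast.parse(generated)
```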

- rewriter.loop_exchange()

src = """
def func(x,y):
    z = 0.
    for i in range(x+y):
        z+=i
    return z
"""
output = """
def func(x, y):
    z = 0.0
    _iter_obj_4 = range(x + y)
    _counter_4 = 0
    _len_of_iter_4 = len(_iter_obj_4)
    while _counter_4 < _len_of_iter_4:
        z += _iter_obj_4
        _counter_4 += 1
    return z
"""

The line z += _iter_obj_4 is not correct; maybe _iter_obj_4 should be _counter_4.
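
For range() the counter and the element happen to coincide, but for other sequences the loop body needs the element at the current index. A sketch of an equivalent while loop, assuming the iterable supports len() and indexing (as range does); the variable names mirror the rewriter output but are otherwise arbitrary.

```python
def func(x, y):
    z = 0.0
    for i in range(x + y):
        z += i
    return z


def func_while(x, y):
    z = 0.0
    _iter_obj = range(x + y)
    _counter = 0
    _len_of_iter = len(_iter_obj)
    while _counter < _len_of_iter:
        # Recover the original loop variable from the iterable.
        i = _iter_obj[_counter]
        z += i
        _counter += 1
    return z


print(func(3, 4), func_while(3, 4))
```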

Error in SSA.const

inferred = infferer.get_types()

The SSA module raises an error when testing typeinfer; the error can be reproduced by running the Python file linked above.

The traceback is as follows:
Traceback (most recent call last):
  File "xxxx\Scalpel/tests/test_typeinfer_with_real_libs.py", line 155, in <module>
    test_pytest_case4()
  File "xxxx\Scalpel/tests/test_typeinfer_with_real_libs.py", line 135, in test_pytest_case4
    infferer.infer_types()
  File "xxxx\Scalpel\scalpel\typeinfer\typeinfer.py", line 128, in infer_types
    processed_file = self.process_file(node.source)
  File "xxxx\Scalpel\scalpel\typeinfer\typeinfer.py", line 382, in process_file
    return_visitor.visit(function_node)
  File "xxxx\AppData\Local\Programs\Python\Python38-32\Lib\ast.py", line 360, in visit
    return visitor(node)
  File "xxxx\Scalpel\scalpel\typeinfer\analysers.py", line 593, in visit_FunctionDef
    self.type_infer_CFG(node)
  File "xxxx\Scalpel\scalpel\typeinfer\analysers.py", line 721, in type_infer_CFG
    ssa_results, ident_const_dict = ssa_analyzer.compute_SSA(cfg)
  File "xxxx\Scalpel\scalpel\SSA\const.py", line 123, in compute_SSA
    stored_idents, loaded_idents, func_names = self.get_stmt_idents_ctx(stmt, const_dict=tmp_const_dict)
  File "xxxx\Scalpel\scalpel\SSA\const.py", line 208, in get_stmt_idents_ctx
    left_name = stmt.targets[0].id
AttributeError: 'AnnAssign' object has no attribute 'targets'
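
The error arises because ast.AnnAssign (and ast.AugAssign) expose a single target attribute, whereas ast.Assign exposes a targets list. A minimal sketch of a helper that handles all three; the name assigned_names is hypothetical and this is not the code in scalpel/SSA/const.py.

```python
import ast


def assigned_names(stmt):
    """Return the simple variable names bound by an assignment statement."""
    if isinstance(stmt, ast.Assign):
        targets = stmt.targets          # e.g. a = b = 1
    elif isinstance(stmt, (ast.AnnAssign, ast.AugAssign)):
        targets = [stmt.target]         # e.g. x: int = 1, or c += 1
    else:
        return []
    # Skip attribute, subscript and tuple targets in this sketch.
    return [t.id for t in targets if isinstance(t, ast.Name)]


for code in ("x: int = 1", "a = b = 2", "c += 1"):
    stmt = ast.parse(code).body[0]
    print(code, "->", assigned_names(stmt))
```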

Bug in SSA usage in typeinfer.analyser

inferer.infer_types()

The SSA part can raise a KeyError when running typeinfer; the error can be reproduced by running the script linked above.

The traceback is as follows:
Traceback (most recent call last):
  File "C:/Scalpel/tests/tmp_test_script_for_issue.py", line 5, in <module>
    inferer.infer_types()
  File "C:\Scalpel\scalpel\typeinfer\typeinfer.py", line 113, in infer_types
    processed_file = self.process_file(node.source)
  File "C:\Scalpel\scalpel\typeinfer\typeinfer.py", line 403, in process_file
    return_visitor.visit(function_node)
  File "C:\Anaconda3\Lib\ast.py", line 253, in visit
    return visitor(node)
  File "C:\Scalpel\scalpel\typeinfer\analysers.py", line 670, in visit_FunctionDef
    self.type_infer_CFG(node)
  File "C:\Scalpel\scalpel\typeinfer\analysers.py", line 832, in type_infer_CFG
    return_values = get_return_value(block)
  File "C:\Scalpel\scalpel\typeinfer\analysers.py", line 825, in get_return_value
    const_values.append(ident_const_dict[stmt.value.id, ident_no])
KeyError: ('count', 1)

How powerful is scalpel in terms of handling ambiguity?

Hi,

First, thanks for building this tool; it looks very helpful.

I'm one of the maintainers of pydoctor, and I'm looking into this library in order to make our ast builder more powerful, especially in understanding complex __all__ assignments and modifications. I have a few questions.

Can scalpel infer list modifications? If so,
what is the inference of the __all__ variable in such situations?

__all__ = ['i',]
from .mod import __all__ as a
import sys
__all__ = ['f', 'k']
if sys.version_info > (3,8):
    __all__.extend(a)

Does scalpel offer a way to treat version_info as if it had a specified value, and so successfully infer this without ambiguity?

Thanks

Error while pip installing

Hi,

When I try to run 'pip install Scalpel', I repeatedly get the error:

WARNING: Discarding https://files.pythonhosted.org/packages/66/46/bd7bbdf2a4a376c4c34288ddf45ca7779a0bd64c0e1aaf0952484634850f/scalpel-0.7.0.tar.gz#sha256=845c6b03de9b03a496718098deeea886b0e58b22608b172125257cfe8b40cdf6 (from https://pypi.org/simple/scalpel/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Using cached scalpel-0.6.1.tar.gz (32 kB)
Preparing metadata (setup.py) ... error
ERROR: Command errored out with exit status 1:
 command: /home/ec2-user/anaconda3/envs/vector/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-eoc0_wm0/scalpel_65cec16a055e471d90f4510e99e4b66c/setup.py'"'"'; __file__='"'"'/tmp/pip-install-eoc0_wm0/scalpel_65cec16a055e471d90f4510e99e4b66c/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-lhu2p5tv
     cwd: /tmp/pip-install-eoc0_wm0/scalpel_65cec16a055e471d90f4510e99e4b66c/
Complete output (7 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-eoc0_wm0/scalpel_65cec16a055e471d90f4510e99e4b66c/setup.py", line 4, in <module>
    import scalpel
  File "/tmp/pip-install-eoc0_wm0/scalpel_65cec16a055e471d90f4510e99e4b66c/scalpel/__init__.py", line 7, in <module>
    from constants import __version__
ModuleNotFoundError: No module named 'constants'
----------------------------------------

Installing from source doesn't work either.

Constant propagation for loops with counter does not work

I think this problem is related to #37. However, I believe this case is more complex.

Consider the following code snippet:

num_points = 300
C_range = np.geomspace(start=1e-7, stop=1e7, num=num_points)

for i, C_ in enumerate(C_range):
    lr_l2_C = LogisticRegression(penalty = 'l2', solver='liblinear', C=C_)

With Scalpel, I'm currently not able to get the values of the variable C_. Do you plan to handle such cases as well?
