Giter Club home page Giter Club logo

Comments (10)

yuleisui avatar yuleisui commented on May 29, 2024

Hi Oliver,

Thanks for your question. You case looks a bit complex.

If I understand correctly, do you want to track the value-flows of a particular pointer parameter at a callsite, say "* sp" in your example? Then, You may wish to refer to "getActualParmSVFGNode(const PAGNode* aparm,llvm::CallSite cs)".

If you want to track the value-flows of an address-taken object, say the target pointed to by "*sp" at callsite "log2(*sp,foo)". Then, you may wish to refer to "getActualINSVFGNodes(llvm::CallSite cs)". Any object that may be modified inside callee "log2" is put as a ActualIN SVFGNode before the callsite.

During the past few months, we've made SVF to support C++ programs, e.g., virtual calls. We tried to model as many C++ library calls as possible. However, when lowering a C++ program into a bitcode file, they are still things may not be modelled, especially some of the C++ internal classes, such as class string in your case. String is internal in C++, whose function bodies are not included in the a bc file. Any string operations e.g., =, +, <<, * are no longer simple assignment or dereference, rather, they are translated into a series of invocations to its (string) overloading functions.

You can see some C++ function names like:

line 624 in Logger.bc
"_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKNSt7__cxx1112basic_st"
corresponds to "<<" at line 39 in Logger.cpp

line 629 in Logger.bc
"_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEPKc"
corresponds to "<=" at line 40 in Logger.cpp

The above function calls are not modeled by SVF, so the value-flows of the address-taken objects pointed to by "*sp" may be missed. We will need to add the side-effects of those functions in file Util/ExtAPI.cpp after demangling their names (Util/CPPUtil.cpp).

For now, another way to achieve you goal is to change "string&" to be "char*&". Then the value-flows will be soundly generated.

from svf.

obraunsdorf avatar obraunsdorf commented on May 29, 2024

Thanks for the detailed help!!

Seems like getActualParmSVFGNode() provides the functionality I was looking for. However I can't traverse the value flow graph from there because the returned SVFGNode doesn't have any incoming or outgoing edges. Is this intended?

Fortunately, I managed to get the SVFGNode corresponding to a function argument using

for(const PAGNode* pagNode : pag->getCallSiteArgsList(callSite)) {
	const SVFGNode* svfgNode = svfg->getDefSVFGNode(pagNode);
}

From this SVFGNode I can traverse the Edges through svfgNode->getInEdges() / svfg->getOutEdges().

With the information from your answer about C++ internal classes, I was able to track back the value flow of *sp to the expected locations, simply by inserting
{"_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_", ExtAPI::EFT_A1R_A0R},
into Util/CPPUtil.cpp. If I understood your code correctly, this causes generation of a load and store edge from the right hand side to the left hand side of the std::string::operator=() (string assingment function). Am I right?
As changing std::string to char* is no option for my project, I guess I have to find all external functions (e.g stdlib functions) which are not linked into the LLVM bit code file and model their behaviour within your ExtAPI class.

Thinking a little bit further the following question comes in my mind:
Do you think the process of building the SVFG could be sped up by also summarizing the behaviour of non-library functions known by the user (and unknown at compile time of SVF)? Maybe the user could specify a list of known functions and their internal dependencies in the ExtAPI-fashion, so that llvm::StringMap<extf_type> info is extended dynamically within ExtAPI::init()?
I guess this would require a more generic fashion, since the ExtAPI is bounded to extf_t-Enum.

from svf.

yuleisui avatar yuleisui commented on May 29, 2024

A very good try!

  1. On getActualParmSVFGNode
    Yes, you can get ActualParmSVFGNode via "pag->getCallSiteArgsList(callSite)"
    For example, your callsite "logger.log2(*sp, foo);" has three arguments: "logger" (hidden this pointer), "*sp" and "foo".

  2. On SVFGNodes for top-level pointers and address-taken objects
    From my understanding, you want to track the value-flows of the object that "*p" points to. Is this your intention?
    If so, you may wish to use one of the following three ways to get an SVFGNode

  • use"getActualINSVFGNodes(llvm::CallSite cs)" at the callsite.
  • use getFormalINSVFGNodes(llvm::Function) at the entry of callee function "log2(std::string& msg, std::string& msg2)".
  • find a StoreSVFGNode or LoadSVFGNode which is accessing the object (this is a commonly used way).
    After you obtain the SVFGNode, you can iterative its incoming/outgoing edges on SVFG to track the value-flows. Remember that every IndirectSVFGEdge has a "PointsTo" field (representing a group of objects) to specify the value-flows of those objects (https://github.com/unsw-corg/SVF/wiki/Technical-documentation#322-svfgedge).
  1. On C++ internal classes
    We have just committed a new patch which summaries the side-effects of the C++ string APIs. Please see 53dafe6.
    We have also added an option ("-modelConsts") to enable analysis of constant strings for your case. I believe this will solve your problem.
    Please use the following command line to dump memory SSA and SVFG
wpa -fspta -modelConsts=true -stat=false  -dump-mssa -dump-svfg LoggerOO.ll

You will see two ActualINSVFGNodes and one ActualOUTSVFGNode. These are the ones you want to obtain.

CALMU(MR_6V_4)  pts{222 }
CALMU(MR_8V_2)  pts{226 }
  invoke void @_ZN6Logger4log2ERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_(%class.Logger* %logger, %"class.std::__cxx11::basic_string"* dereferenceable(32) %23, %"class.std::__cxx11::basic_string"* dereferenceable(32) %foo)
          to label %invoke.cont27 unwind label %lpad15, !dbg !722 
6V_5 = CALCHI(MR_6V_4)  pts{222 }
  1. On function summarization
    It is a good idea. In pointer analysis, function summarization (either external or internal) is an alternative approach to traditional "whole-program" analysis. Summarization is particularly useful for modular analysis of programs which heavily use frameworks and libraries. You can have a try, however, making summarization of every function may be error prone, because you have to understand the behavior of every app function.

from svf.

obraunsdorf avatar obraunsdorf commented on May 29, 2024

Hi Yulei,

sorry for this delayed reply! Unfortunately my time to work with SVF is limited at the moment.

The patch (53dafe6) you provided works great, thanks! I was able to get a big part of my work done after applying it.
Unimportant information: I had to add the following lines to ExtAPI to get the particular overload of std::string::operator=() that my code uses.

/// string operator=: operator= (string&& str)
+    {"_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_", ExtAPI::CPP_EFT_A0R_A1R}, // c++11

So it seems that there is more work to be done with the standard c++ lib. But I will try to cover this topic in another issue on a higher level.

Regarding points 1 and 2 of your last post:
My intention is to track the value flow of a particular parameter of a particular function X, no matter if the parameter is given by value, by pointer or by reference (which equals a pointer in LLVM IR if I'm correct). When tracking the value flow of this parameter I want to be able to tell that the value comes from
(a) the return value of another function that has been called before function X
(b) the output parameter i of another function that takes 0 <= i <= n arguments and that has been called before function X.
(c) the input parameter of the function that calls function X

So far I accomplished (a) and (b) but there are a few bugs due to my lack of understanding of SVFG and how it is built in all the corner cases.
Currently I replaced SVFGOPT (it confused me that it deleted nodes that I thought I need for my analysis) with unoptimized SVFG. Then for the particular function X, I iterate over FormalInSVFGNodes (as you said) to get a parameter node.

Question 1: The FormalINSVFGNodeSet seems to not guarantee that its order corresponds to parameter ordering. How can I know that the FormalINSVFGNode represents the i-th parameter of function X?
Related Question 1.1: FormalINSVFGNodeSet seems to contain more FormalINSVFGNodes. Only some of them have incoming Edges (which I use for traversal). Where do they come from and how can I know that I it does not correspond to a parameter of function X?

For (a) and (b): Sometimes while traversing SVFG I visit a FormalRetSVFGNode. Then it's obvious that value comes from the return value of another function. But sometimes I only get FormalOutSVFGNodes. It's obvious from which function it comes but it's hard to tell from which parameter (No. 0, 1, 2, ..., n) of the function it comes or if it actually comes from the return value of the function.
I used a little workaround(*) which searches for a StoreSVFGNode within the incoming edges of the FormalOutSVFGNode. Then I can use storeNode->getPAGDstNode()->getValue(), look for a llvm::LoadInstruction which uses this value, call loadInstuction->getPointerOperand() to get the llvm::Value which actually represents the formal parameter and finally cast it to llvm::Argument which provides me with getArgNo(). If the argument number is i=0 and the function has a return value then I know the FormalOutSVFGNode represents this return value.

Question 2: Is this workaround the way to go? In all corner cases? Or is there a simpler and reliable way to get the parameter No. i?
So far my workaround seems to work in all my test cases.

For (c): Very rarely while traversing SVFG I visit a FormalParmSVFGNode. Then it's easy to know from which input parameter of which function the value comes from. But most of the time I get a FormalINSVFGNode. As with FormalOUTSVFGNodes I cannot tell to which parameter number the node corresponds. And I cannot use the workaround(*) as I can't detect any StmtSVFGNode around.

Question 3: How can I know to which parameter number i the FormalINSVFGNode corresponds?
[This question seems to be solved, if question 1 is solved and vice versa]

Many thanks in advance!

from svf.

yuleisui avatar yuleisui commented on May 29, 2024

Hi Oliver,

We didn’t label an index (e.g., i) for an address-taken object o via i-th formal parameter p passing into the entry of a function. This is because the object o may not only refer to *p, but also **p or ***p, their indexes are the same, which will be confusing for users. In addition, another parameter e.g., q may also point to this object, so object o may not have a fixed index. Besides, if o is a global object passing indirectly as a FormalInSVFGNode via global pointers, which does not have an argument index.

My advice is to start the backtracking from a statement SVFGNode (as your tainted source) rather than a formal parameter on SVFG. It the can always reach a formal parameter if the object/pointer is passed directly or indirect via a parameter.

I have listed several examples for you to understand.

  • Example 1
f(p){
*p = q
} 

A DirectSVFGEdge (value-flow of p) from parameter p (FormParmSVFGNode) to *p=q (StoreSVFGNode)

  • Example 2
f(r){
q = *r
p = *q
} 

A DirectSVFGEdge (value-flow of r) from parameter r (FormParmSVFGNode) to q=*r (LoadSVFGNode), and another DirectSVFGEdge (value-flow of q) from q=*r (LoadSVFGNode) to p=*q (LoadSVFGNode)

  • Example 3
f(x){
*r = x  CHI(o)   // r points to o
p = *q   CHI(o)	  // q points to o
} 

A DirectSVFGEdge (value-flow of “x”) from parameter “x” (FormParmSVFGNode) to “*r=x” (StoreSVFGNode), and another IndirectSVFGEdge (value-flow of “o”) from “*r=x” (StoreSVFGNode) to “p=*q” (LoadSVFGNode)

  • Example 4
f(x){
r = *x    // r points to o
p = *r	  
} 

A DirectSVFGEdge (value-flow of x) from parameter x (FormParmSVFGNode) to *r=x (LoadSVFGNode), and another DirectSVFGEdge (value-flow of r) from *r=x (LoadSVFGNode) to p=*r (LoadSVFGNode).

  • Example 5
    The following case can't reach a formal parameter when backtracking from p=*r, because o is a global object.
caller:
CHI(o)    // AuctualInSVFGNode of o and o is global
f(y)

f(q){
CHI(o)    // FormalInSVFGNode of o
r = *x    // x is a global pointer that points to o
p = *r	  
} 

An IndirectSVFGEdge (value-flow of o) from the AuctualInSVFGNode to FormalInSVFGNode, and then to r=*x (LoadSVFGNode) via IndirectSVFGEdge (value-flow of o), and we have another DirectSVFGEdge (value-flow of r) from r=*x (LoadSVFGNode) to p=*r (LoadSVFGNode).

Please let me know if you have anything more want to discuss.

from svf.

yuleisui avatar yuleisui commented on May 29, 2024

Hi Oliver,

If possible, could you help us collect the summaries of some C++ library calls (as what you did for string operator=)?

It will benefit other users too.

Thanks

from svf.

obraunsdorf avatar obraunsdorf commented on May 29, 2024

Hi Yulei,

yes I will help you collecting the libc++ summaries. After I get my little example to work, I plan to analyze a real world c++ project with lots of c++ library calls. I will provide a git patch or pull request or whatever you prefer to commit the ExtAPI summaries.

But I am really confused at the moment. Your examples seem straight forward to me, except example 5. Where does x come from?(Just a typo and should be q? => I assume x is q for the following).
My Question is: what is the reason to add an Entry-CHI(o) in f(q)? Because there is a Ref of o (which is pointed to by q) in r = *q - a possible side effect, right? So why can't we get a connection between Entry-CHI and the FormalParm q? The are obviously related to each other.
=> Is there any way to get to the llvm::Argument from an Entry-CHI node or Call-CHI node in all corner cases?
If no, should I consider building a custom SVFG?

Another case: if my particular parameter is not a pointer, but just an i8 for example. If I understood you correctly there is no Actual/FormalParmSVFGNode for this parameter and no ActualIN/FormalINSVFGNode, right? Do I have to use PAG then to track its value?

from svf.

yuleisui avatar yuleisui commented on May 29, 2024

Oliver,

x in example 5 is a global pointer. q is intended to sit there. Let us assume it won't be used in the body of function f. Sorry about the confusion.

Please see the revised one below:

main(){
L1: *x = z;
STORE-CHI(o)   // x is a global pointer that points to o

CALL-CHI(o)    // AuctualInSVFGNode of o and o is global
L2: f(y);
}


f(q){
L3: ENTRY-CHI(o)    // FormalInSVFGNode of o

LOAD-MU(o)
L4: p = *x;    // x is a global pointer that points to o	  
} 

In this example, when you perform backtracking from "p=*r" (tainted source) to "*x=z", the traversed value-flows have nothing to do with parameter q.

L4 --o--> L3 --o--> L2 --o--> L1

We don't model value-flows (def-use chains) for no-pointer values, as they are explicit on LLVM SSA.

Please let me know if the above still doesn't answer your question.

from svf.

obraunsdorf avatar obraunsdorf commented on May 29, 2024

Ah okay, that makes sense. So there are definitly CHI and MU nodes, that do not correspond to function parameters. What about this example, where x and y are local allocations in the main function. Is the placement of the CHIs and MUs correct and can we relate them to their actual and formal parameters?

main() {
	x = alloc()
	y = alloc()
	a = *x
	b = *y

	CSCHI(a=>x)
	CSCHI(b=>y)
	functionA(a, b);
	CSMU(a=>y)
	CSMU(b=>x)

	CSCHI(a=>y)
	functionX(a)
	CSMU(a=>?)
}


functionA(*p, *q) {
	ENCHI(p=>x)
	ENCHI(q=>y)
	t = *p
	*p = *q
	*q = t;
	RETMU(p=>y)
	RETMU(q=>x)
}

from svf.

yuleisui avatar yuleisui commented on May 29, 2024

I believe you can use CollectPtsChain (lines 428-450) in MemRegion.cpp for your case.

You can apply CollectPtsChain for each parameter p to decide whether an object (at an ENTRY-CHI) is passed via p into a callee function.

from svf.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.