crete-dev's Introduction

CRETE: Versatile Binary-Level Concolic Testing

CRETE is a versatile binary-level concolic testing framework developed by the System Validation Lab at Portland State University. It can be applied to various kinds of software systems for test case generation and bug detection, including proprietary user-level programs, closed-source libraries, kernel modules, etc.

Highlights

  • Open and extensible architecture: totally decoupled concrete and symbolic execution from virtual machine
  • Standardized execution trace: llvm-based, self-contained, and composable
  • Binary-level analysis: no source code or debug information required
  • In-vivo analysis: use real full software stack and require no environment modeling
  • Compact execution trace: selective binary-level tracing based on Dynamic Taint Analysis
  • Trace/test selection and scheduling algorithms to improve effectiveness
  • Simple usage model: no expertise required and accessible for general users

Publications

Support

If you need help with CRETE, or want to discuss the project, you can open new GitHub issues, or contact the maintainer.

We also have a brief user manual, which contains build instructions, a running example, etc.

crete-dev's People

Contributors

likebreath, moralismercatus, zhenkun

crete-dev's Issues

NodeDriver threads' contention over member node_ causes starvation

The dispatch-communication thread and the run-node thread compete for node_ as a shared resource. Either thread can starve the other by repeatedly acquiring node_.

This is especially apparent when e.g., vm-node instances > 1. Because the run-node thread processes all VM instances before relinquishing node_, the issue is exacerbated given run-node's tendency to win the race for node_.

I implemented a fix on my fork: moralismercatus@e2774ac#diff-b5094f7d4fdca6fbd261c78e8b3d8c50

The idea is that when a transmission from dispatch is received, a flag is set. The run-node thread always checks if there's a pending transmission before acquiring node_. If true, then run-node thread waits until the transmission is complete before acquiring node_.

I don't consider it a perfect solution because a transmission could theoretically take considerable time to complete, and run-node waits idle during that period, but it works for my purposes and is certainly an improvement over the current behavior.
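The flag-based idea can be sketched as follows. This is an illustration only, not the code from the linked commit: TransmissionGate and all of its member names are hypothetical, and the sketch assumes the communication thread can mark the start and end of a transmission.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not part of CRETE. It records whether a dispatch
// transmission is in flight so that run-node can yield before taking node_.
class TransmissionGate {
public:
    // Called by the communication thread when a transmission arrives.
    void begin_transmission() {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_ = true;
    }

    // Called by the communication thread once the transmission is handled.
    void end_transmission() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            pending_ = false;
        }
        done_.notify_all();
    }

    bool transmission_pending() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return pending_;
    }

    // Called by the run-node thread before it acquires node_.
    // Blocks for as long as a transmission is pending.
    void wait_until_idle() {
        std::unique_lock<std::mutex> guard(mutex_);
        done_.wait(guard, [this] { return !pending_; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable done_;
    bool pending_ = false;
};
```

The run-node thread would call wait_until_idle() immediately before acquiring node_. As noted above, run-node still sits idle for the whole duration of a long transmission.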

CRETE hangs indefinitely if QEMU terminates before first trace is dumped

Consider the following scenario:

  1. vm-node starts QEMU.
  2. vm-node transmits the seed to QEMU.
  3. QEMU terminates/crashes before a trace is dumped.
  4. vm-node, via fault tolerance, logs the crash, restarts QEMU and proceeds as normal.

At this point, there are no more test cases and no traces from which to generate new test cases. This is the logical point at which dispatch should recognize that there is nothing more to do and terminate gracefully; however, dispatch instead hangs at this point indefinitely.

The reason for this is that the guard DispatchFSM_::is_target_expired returns false if the first trace has not yet been received.

A simple fix may be to use the first test case, instead of the first trace, as the condition in which to start checking if the target is expired. If the first test case is not a consistent source of indicating that a VM instance has started - because the first test may originate from a seed - then a consistent indicator should be the reception of guest data by dispatch. We can be sure of this because a VM instance must have been started in order for vm-node to get the data from the guest OS.
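A minimal sketch of the proposed guard, assuming dispatch can record when guest data has arrived. DispatchState and its fields are hypothetical stand-ins, and the expiry condition itself (no tests, no traces, nothing in flight) is an assumption rather than CRETE's actual logic.

```cpp
// Hypothetical stand-in for the state the guard inspects; not CRETE's types.
struct DispatchState {
    bool guest_data_received = false;  // set when vm-node forwards guest data
    unsigned tests_left = 0;
    unsigned traces_left = 0;
    unsigned work_in_flight = 0;       // tests or traces currently on a node
};

// Sketch of the proposed guard: start checking for expiry once guest data
// has arrived, instead of waiting for the first trace.
inline bool is_target_expired(const DispatchState& s) {
    if (!s.guest_data_received) {
        return false;  // no VM instance is known to have started yet
    }
    return s.tests_left == 0 && s.traces_left == 0 && s.work_in_flight == 0;
}
```

With this condition, the scenario above (QEMU crashing before any trace is dumped) would report the target as expired as soon as the seed test has been consumed.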

A more thorough fix would be to re-evaluate how CRETE determines when testing has completed.

[Test-pool] Assertion failure when using crete_make_concolic() out of crete config file

When crete_make_concolic() is invoked directly within the binary under test, an assertion fires, complaining that the parent test case cannot be found for the newly arriving test cases.

The reason is that the initial test case extracted from the configuration file is incorrect when crete_make_concolic() is used directly (rather than by specifying concolic variables in the crete config file).

Assertion `!crete_tci_is_current_block_symbolic()' failed

  • Found by Chris and Raghu
  • Description:
    During tracing in QEMU, a set of sanity checks is performed between the execution of each TB (translation block). One of these checks failed in the test with signmsg.
  • Cause:
    A PAGE fault exception caused by tracing on the translation context of QEMU-IR, which breaks the tracing flow and a sanity check property.
  • Fix:
    Avoid PAGE fault exception caused by tracing.

Incomplete reset to QemuFSM from vm-node

vm-node will reset the instance of QemuFSM if its flag::error is active. Relevant code:

if(vm->is_flag_active<flag::error>())
{
    using node::vm::fsm::QemuFSM;
    push(vm->error());
    std::cerr << "pushing error!\n";
    auto pwd = vm->pwd();
    vm.reset(new QemuFSM{}); // TODO: may leak. Can I do vm = std::make_shared<QemuFSM>()?
}

This reset is incomplete, because it creates a new instance of QemuFSM from scratch without keeping any useful information from the erred instance. For example, all tests and traces from the erred instance are thrown away.

A complete reset should keep all meaningful information, such as test cases and traces, from the erred instance.
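One way such a reset could look, as a sketch under assumptions: VmInstance is a hypothetical stand-in for QemuFSM that holds only tests and traces, and vm is assumed to be a std::shared_ptr. The real QemuFSM carries more state than this.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for QemuFSM; only the parts relevant here.
struct VmInstance {
    std::vector<std::string> tests;
    std::vector<std::string> traces;
    bool error = false;
};

// Replace an erred instance with a fresh one, carrying over pending work.
inline void reset_preserving_work(std::shared_ptr<VmInstance>& vm) {
    auto fresh = std::make_shared<VmInstance>();
    if (vm) {
        fresh->tests = std::move(vm->tests);
        fresh->traces = std::move(vm->traces);
    }
    vm = std::move(fresh);  // the erred instance is released here
}
```

If vm is indeed a std::shared_ptr, assigning the result of std::make_shared is valid, which would answer the question in the TODO comment.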

[vm-node] QEMU will fail if extra space in dispatch XML crete.vm.args

Consider dispatch.xml:

<crete>
  <vm>
    <args>-m 512  -nographic</args>
  </vm>
</crete>

It may be difficult to see, but there are two spaces between 512 and -nographic.

The parser in vm-node will interpret this as an additional argument because it uses a simple string split. QEMU will fail, as it is unable to recognize an empty string as a valid argument.

Solution:

Use the parsing algorithm used in svm-node. It specifically handles this problem. It is more robust generally. Regex would be the most complete solution.
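For illustration, a whitespace-tolerant split can be as small as the sketch below. This is not the svm-node algorithm (which is not shown here); split_args is a hypothetical name, and quoted arguments containing spaces are not handled.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split on runs of whitespace so repeated spaces never yield empty arguments.
// Limitation: quoted arguments that contain spaces are not supported.
inline std::vector<std::string> split_args(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> args;
    std::string token;
    while (in >> token) {  // operator>> skips any amount of leading whitespace
        args.push_back(token);
    }
    return args;
}
```

Applied to "-m 512  -nographic" (two spaces), this yields exactly three arguments: -m, 512, and -nographic.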

Use binary serialization for communication to improve efficiency

Text serialization is being used in several places of crete-run, vm-node, etc. For the sake of efficiency, they should be replaced by binary serialization.

A motivating example: transmitting the guest config file from crete-run to vm-node can take a long time when the guest config file is non-trivial (for example, when a concrete file of size 200K is given as the seed of a concolic file, the serialized config file can be over 800K).

By design, "crete/asio" provides a set of read/write communication functions using different serialization formats, and they should follow the same usage convention. However, it seems "write_serialized_binary() / read_serialized_binary()" cannot be used directly as a drop-in replacement for "write_serialized_text_xml() / read_serialized_text_xml()".

One concrete example: when replacing the "_text_xml" pair with the "_binary" pair in "RunnerFSM_::transmit_guest_data()" and QemuFSM_::receive_guest_info(), an exception is thrown while reading/de-serializing.

Exception info:

std::exception::what: std::exception
[crete::err::tag_msg*] = Dynamic exception type: std::length_error
std::exception::what: basic_string::resize

Support symbolic files with symbolic size

CRETE currently supports only symbolic files of fixed size. This support is limited, because file size commonly has an important effect on a program's behavior. Support for symbolic size is needed to enable CRETE to explore program branches that require different file sizes.

Interrupt breaks the crete_pre/crete_post order assertion

Background
While running crete-qemu either with GDB remote debugging, or in a pre-boot environment (e.g., BIOS), the sanity check assertion is raised:

https://github.com/moralismercatus/crete-dev/blob/master/front-end/qemu-2.3/runtime-dump/runtime-dump.cpp#L1856-L1860

Based on our analysis, the cause is essentially that CRETE expects TB execution to follow the flow:

for(;;)
  sigsetjmp();
  ...
  crete_pre_cpu_tb_exec();
  cpu_tb_exec();
  crete_post_cpu_tb_exec();

Interrupts can occur within cpu_tb_exec() which call cpu_loop_exit(). This function performs a siglongjmp() to the sigsetjmp() at the top of the loop, thus breaking the pre/post flow.
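The effect can be reproduced outside QEMU with a small stand-alone model. Everything in it is hypothetical: fake_cpu_tb_exec stands in for cpu_tb_exec(), the counters stand in for the pre/post hooks, and plain setjmp/longjmp stand in for sigsetjmp/siglongjmp.

```cpp
#include <csetjmp>

static std::jmp_buf loop_top;        // plays the role of the sigsetjmp() target
static int pre_calls = 0;            // stands in for the pre-exec hook
static int post_calls = 0;           // stands in for the post-exec hook
static int pending_interrupts = 1;   // one simulated interrupt

// Stand-in for cpu_tb_exec(): an "interrupt" jumps back to the loop top,
// the way cpu_loop_exit() does, so the post hook is skipped.
static void fake_cpu_tb_exec() {
    if (pending_interrupts > 0) {
        --pending_interrupts;
        std::longjmp(loop_top, 1);
    }
}

void run_tb_loop(int blocks) {
    volatile int executed = 0;  // volatile: modified between setjmp and longjmp
    setjmp(loop_top);
    while (executed < blocks) {
        ++pre_calls;
        fake_cpu_tb_exec();
        ++post_calls;
        executed = executed + 1;
    }
}
```

After run_tb_loop(2), pre_calls is 3 while post_calls is 2: the interrupted block ran the pre hook without a matching post hook, which is the imbalance that a strict pre/post ordering check would trip on.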

Reproducing
To reproduce, simply run crete-qemu under gdb, and single step e.g.,:

# Start CRETE in a suspended state (-S) and with remote debugging enabled (-s):
crete-qemu-2.3-system-x86_64 -m 128 -s -S &
# Start gdb:
gdb
# Within gdb, attach to QEMU and begin single stepping:
target remote localhost:1234
si
si
# Assertion will appear here!

Specs
Commit used: 71ab5f8
Also confirmed on latest from SVL-PSU/master.

PS
The nature of single stepping is interrupt-driven, which is why this remote debugging will reproduce the problem.

Remove recommendation for avoiding -j when building CRETE

Previously, there was a dependency issue (or some other issue) with the CRETE build that disallowed the use of -jN. Since the recent changes to the CMake files, I have been able to use -jN without issue (greatly reducing build times).

I'd like further confirmation on this. Once decided that the issue is no longer present, we can remove the recommendation against -j in the documents.

[vm-node] vm-node crash caused by race condition

vm-node has two long-running kernel threads: one for running the FSM and one for communicating with dispatch. The shared resource at issue is the "trace" folder, which is not protected by a lock. Most of the time, accesses to the shared resource are serialized as a result of the predefined workflow.

However, when the FSM thread needs to reset the FSM (e.g., as a result of VM failure), it removes the shared resource (the "trace" folder) without checking with the communication thread. If the communication thread is accessing the shared resource at the same time, for example while transmitting the "trace" to dispatch, an exception occurs, which causes vm-node to crash.

What is the purpose of taint engine vreg blacklist?

Hi @likebreath ,

I'm curious what is the purpose of the Analyzer::guest_vcpu_regs_black_list_? Defined here https://github.com/SVL-PSU/crete-dev/blob/master/front-end/qemu-2.3/runtime-dump/tci_analyzer.cpp#L854

The inquiry stems from an observation that, with single-step enabled (where one TB represents a single guest instruction), conditional branch TBs (jb, ja, je, jne, etc.) were never marked as tainted.

Here's a concrete example of what I mean:

cmp edx, ebx ; Marked as tainted.
jae 0xdeadbeef ; Not marked as tainted.

As the jae uses flags based on the tainted cmp, logically jae should be tainted as well.

In root causing, the situation seems related to the fact that the various virtual CPU registers responsible for tracking flag status (e.g., CPUX86State::cc_src) are blacklisted, or removed from the taint equation.
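A toy model makes the suspected mechanism concrete. This is not how CRETE's Analyzer is implemented: Instr, propagate, and the reduction of cmp to a single write to cc_src are all simplifying assumptions.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of register-level taint propagation.
struct Instr {
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Returns true if the instruction reads a tainted register, and propagates
// the result to its outputs. Registers in `blacklist` are never tracked.
inline bool propagate(const Instr& in, std::set<std::string>& tainted,
                      const std::set<std::string>& blacklist) {
    bool is_tainted = false;
    for (const auto& r : in.reads) {
        if (tainted.count(r)) is_tainted = true;
    }
    for (const auto& w : in.writes) {
        if (blacklist.count(w)) continue;  // taint is dropped here
        if (is_tainted) tainted.insert(w); else tainted.erase(w);
    }
    return is_tainted;
}
```

With cmp modeled as reading edx and ebx and writing cc_src, and jae as reading cc_src, a tainted edx marks cmp as tainted in both configurations, but jae is marked only when cc_src is absent from the blacklist.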

PS.
Unsurprisingly, disabling the blacklist led to an assertion: https://github.com/SVL-PSU/crete-dev/blob/master/front-end/qemu-2.3/runtime-dump/runtime-dump.cpp#L118

Thanks,

Potential Boost Version Conflict

CRETE now uses boost 1.59.0.

Compilation of CRETE may fail on a system that has a different version of boost installed.

Guest data is unconditionally sent from crete-run to crete-vm-node

Background

Guest data is needed only the first time a VM is created (it was assumed to be identical across all parallel VMs and VM reboots), so a conditional flag was added to VMNode's FSM to accept guest data only the first time.

Problem

crete-run unconditionally sends the guest data every time it starts, regardless of whether it's the first VM or not. This means that there's a break in the synchronization between crete-run and crete-vm-node when more than 1 VM is running (in theory, actual implementation may differ), or when a VM is restarted (caused by crash or other termination).

Solution

Since the feature was implemented as an optimization (to avoid redundantly transmitting guest data), maybe the simplest solution is to remove the feature. I don't believe it serves a necessary purpose.

Appending _p<#> to concolic name may clobber valid data

Here is the scenario:

I have a particular executable binary (may differ from typical data layout).

Two calls to crete_make_concolic are made (paraphrasing, not actual code):

crete_make_concolic( &buf1[0], sizeof( buf1 ), "name1" );
crete_make_concolic( &buf2[0], sizeof( buf2 ), "name2" );

Now, the way "name1" and "name2" are stored in memory is that they are adjacent with no padding between. E.g.,: "name1\0name2\0"

In the linked to code, https://github.com/SVL-PSU/crete-dev/blob/master/front-end/qemu-2.3/runtime-dump/custom-instructions.cpp#L124, crete_custom_instr_send_concolic_name appends a "_p<#>" to the end of each name. Assuming only one process is in play, it appends "_p1". After the first call to make_concolic, the data representing "name2" has been clobbered from "name2\0" to "p1\0e2\0".

Thus, the second call to make_concolic will claim the name given was "p1", rather than "name2", and the following assertion will be raised: https://github.com/SVL-PSU/crete-dev/blob/master/front-end/qemu-2.3/runtime-dump/custom-instructions.cpp#L119
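The clobbering can be reproduced in isolation. The sketch below is a stand-alone model, not CRETE code: the 16-byte pool mimics the adjacent layout described above, and suffixed_copy is a hypothetical alternative that appends to a private copy instead of the guest's string.

```cpp
#include <cstring>
#include <string>

// Reproduces the in-place append on two adjacent, NUL-terminated names.
inline std::string second_name_after_in_place_append() {
    char pool[16] = {};
    std::memcpy(pool, "name1\0name2", 12);  // 12 bytes, including the final NUL
    char* name1 = pool;
    char* name2 = pool + 6;
    std::strcat(name1, "_p1");  // writes '_', 'p', '1', '\0' over "\0nam"
    return std::string(name2);  // now reads "p1"
}

// Appending to a private copy leaves the original string untouched.
inline std::string suffixed_copy(const char* name, unsigned pid) {
    return std::string(name) + "_p" + std::to_string(pid);
}
```

The first function returns "p1", matching the behavior described above, while suffixed_copy("name1", 1) returns "name1_p1" without writing past the end of the source string.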

Redirect guest output to host

Redirect output of crete-run and the program under test to the host.

Presumably, the best place for this information to be logged is under the dispatch directory.

This feature should also be optionally disabled via dispatch configuration, as it constitutes overhead.

Node is unnecessarily locked during transmission of trace to Dispatch

Look at https://github.com/SVL-PSU/crete-dev/blob/71ab5f8f3c6116e1024219d6b93190102c956584/lib/include/crete/cluster/node_driver.h#L302:L323

Notice that lock's mutex is released upon function scope exit. It is held during transmission of the trace. I don't believe this is necessary. Rather, I believe:

 auto lock = node.acquire();
 auto trace = lock->pop_trace();

Should be:

 auto trace = node.acquire()->pop_trace();

Thus node is immediately released upon completion of the statement.
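The difference in lock lifetime can be checked with a small model. Node, Lock, and held() below are hypothetical and only imitate the acquire-and-arrow interface used above; held() is an observation aid for this single-threaded demo, not something a real node would expose.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class Node {
public:
    // Proxy that holds the node's mutex for as long as it is alive.
    class Lock {
    public:
        explicit Lock(Node& n) : node_(&n), guard_(n.mutex_) { n.held_ = true; }
        Lock(Lock&&) = default;
        ~Lock() { if (guard_.owns_lock()) node_->held_ = false; }
        Node* operator->() const { return node_; }
    private:
        Node* node_;
        std::unique_lock<std::mutex> guard_;
    };

    Lock acquire() { return Lock(*this); }
    void push_trace(std::string t) { traces_.push_back(std::move(t)); }
    std::string pop_trace() {  // precondition: at least one trace is queued
        std::string t = std::move(traces_.back());
        traces_.pop_back();
        return t;
    }
    bool held() const { return held_; }

private:
    std::mutex mutex_;
    bool held_ = false;
    std::vector<std::string> traces_;
};
```

With a named lock, held() stays true until the enclosing scope ends. With node.acquire()->pop_trace(), the temporary Lock is destroyed at the end of the full expression, so the mutex is already free by the time the trace is transmitted.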

Dispatch FSM Segfaults when no items listed for testing in distributed mode

When no element is listed, this transition row in DispatchFSM_ is faulty.

Note the Or_<is_first, ...> meaning that even if !have_next_target, the transition succeeds with subsequent code presuming that have_next_target.

One fix is to add have_next_target thusly: Or_<And_<is_first, have_next_target>, ...>

Not a complete solution because there is no transition representing the case Or_<And_<is_first, Not_<have_next_target>>, ...> which would likely need to transition to an error state.

Or, of course, another fix is to ensure that the presumption that the current transition makes always holds (that if is_first, then have_next_target is always true).

[vm-node] Deadlock happened while resetting

vm-node deadlocks when a reset that it initiates itself (as a result of a QEMU crash) and a reset from dispatch (as a result of finishing the current target test) happen at the same time.

This deadlock also blocks dispatch, because dispatch is waiting on communication with vm-node.

Travis-CI timeout for building CRETE (llvm-3.2) on Ubuntu 14.04

Building CRETE on Ubuntu 14.04 requires sudo privileges to revert the version of bison so that stp compiles [1]. When sudo is required, Travis-CI uses an isolated VM image to perform the build [2]. In this setup, Travis-CI is very slow, and compiling llvm-3.2 exceeds the 50-minute timeout [3].

Potential solutions are:

  1. Use external llvm-3.2 #4;
  2. Remove sudo requirement for building crete on ubuntu 14.04;

Use full Boost libs for guest for simplicity

Problem Statement

Presently, only a subset of Boost is provided to the guest for building. When any guest lib/util uses a feature of Boost not in the subset, or the version of Boost is upgraded, one must recreate the subset using correct tooling that sometimes has issues. This process is inconvenient.

Suggested Resolution

The original purpose for doing this was to keep the size down to a minimum, but, once all the "doc" folders have been recursively deleted, and only the required libraries to build are listed in the CMake file, the size is quite manageable.

As an additional benefit, this process reuses the host copy of Boost.

See https://github.com/moralismercatus/crete-dev/tree/2d6a025229a61a6bb1c07da401d06c30fd67dcb4/front-end/guest/lib/boost for inspiration.

Dispatch timer runs while VM image is being copied

The dispatch timer starts right after the first node connects. When running in distributed mode, the time spent copying images is also counted by the timer, although it should not be.

#10 covered some relevant discussion.

Ensure vm-node to crete-run port file is not reused

Scenario:

  1. vm-node starts the VM image
  2. The VM silently fails
  3. vm-node writes the port to file that the VM is expected to use
  4. vm-node waits indefinitely for crete-run to connect (also a separate issue)
  5. vm-node is restarted manually
  6. crete-run reads the old port file when the VM starts before vm-node has written a new one

Solution:

A fix should be simple. Just ensure the port file is removed before the VM image is started. This could be done in VMNodeFSM's ctor (or equivalent). A more robust solution is to follow the core guidelines R.1.

The present implementation is lacking sufficient resource (i.e., the port file) management:

// ...write port_file...
server->open_connection_wait();
fs::remove(port_file_path);

Clearly, this is inadequate. If open_connection_wait() throws, or vm-node is terminated between these two invocations, fs::remove() is not invoked, thus causing the same problem.
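A sketch of what R.1-style management of the port file could look like. PortFile is a hypothetical class, not existing CRETE code, and it uses std::remove and std::ofstream rather than the fs:: calls above so that it stays self-contained.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical RAII guard: writes the port file and removes it on scope exit,
// including when an exception propagates out of the scope.
class PortFile {
public:
    PortFile(std::string path, unsigned short port) : path_(std::move(path)) {
        std::remove(path_.c_str());  // drop any stale file from an earlier run
        std::ofstream out(path_);
        out << port << '\n';
    }
    PortFile(const PortFile&) = delete;
    PortFile& operator=(const PortFile&) = delete;
    ~PortFile() { std::remove(path_.c_str()); }

private:
    std::string path_;
};
```

Constructing a PortFile before server->open_connection_wait() would cover the exception path. A destructor does not run if vm-node is killed outright, which is why the constructor also removes any stale file left behind by a previous run.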

waiting for port connection

The host OS displays "Awaiting connection on 'ubuntu' on port 10012" after the three programs dispatch, vm-node, and svm-node are run. The guest OS is also running the crete-run command, which displays "Waiting for port". How do I complete the port connection so that the tests and traces run?

Bottleneck of transmitting trace from vm-node to dispatch

From recent experiments with CRETE, I found that the transmission of traces from vm-node to dispatch can be a bottleneck for the whole workflow of dispatch (crete-manager). Here is an example of dispatch's output that indicates this bottleneck.

  time (s)|  tests left| traces left|  1-[vm] tc/tr| 2-[svm] tc/tr|
      1932|      0/1844|      1/1401|         0/442|           0/0|

From the output above, there are 442 traces from vm-node waiting to be transmitted to dispatch, while the backend is idle because of no available traces to replay.

In the current implementation of dispatch, at most one trace can be transmitted from vm-node to dispatch per iteration of dispatch's FSM cycle.
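One possible direction is to move a bounded batch of traces per cycle rather than a single one. The sketch below only illustrates that idea: take_traces and max_batch are hypothetical, and traces are modeled as strings rather than CRETE's trace type.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Move up to max_batch traces out of the pending queue in one cycle.
inline std::vector<std::string> take_traces(std::deque<std::string>& pending,
                                            std::size_t max_batch) {
    std::vector<std::string> batch;
    while (!pending.empty() && batch.size() < max_batch) {
        batch.push_back(std::move(pending.front()));
        pending.pop_front();
    }
    return batch;
}
```

Bounding the batch keeps a single FSM cycle from being monopolized by trace transfer while still draining a backlog like the 442 traces shown above far faster than one per cycle.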

Improve utilities command line interfaces

Utilities such as crete-dispatch, crete-vm-node, and crete-svm-node could all use some improvement in their command line interfaces.

All:

  1. Don't report that an exception was thrown when, for example, --help or an incorrect option is used. This is simply fixed by throwing a special-purpose exception that is caught in a special-purpose catch block, e.g., something like crete::CmdLineArgException or crete::EarlyExitException.
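The special-purpose exception could be sketched like this. EarlyExitException follows the naming suggested above, but its shape, parse_args, run, the usage text, and the exit codes are all assumptions made for illustration.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type for exits that are not program errors.
struct EarlyExitException : std::runtime_error {
    EarlyExitException(const std::string& msg, int code)
        : std::runtime_error(msg), exit_code(code) {}
    int exit_code;
};

inline void parse_args(const std::vector<std::string>& args) {
    for (const auto& a : args) {
        if (a == "--help") throw EarlyExitException("usage: tool [--help]", 0);
        if (a.rfind("--", 0) == 0) throw EarlyExitException("unknown option: " + a, 2);
    }
}

inline int run(const std::vector<std::string>& args) {
    try {
        parse_args(args);
        return 0;
    } catch (const EarlyExitException& e) {
        // Expected early exit: print the message only, no exception report.
        (e.exit_code == 0 ? std::cout : std::cerr) << e.what() << '\n';
        return e.exit_code;
    } catch (const std::exception& e) {
        std::cerr << "unexpected error: " << e.what() << '\n';
        return 1;
    }
}
```

Because the dedicated catch block comes first, --help and bad options produce a plain message and an exit code, and only genuinely unexpected exceptions reach the general handler.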

crete-dispatch:

  1. Include the ability to give a timeout as a cmdline argument e.g., --time-out/-t. A commonly requested feature.

crete-vm-node:

  1. -n erroneously states "number of svm instances". It should be "... vm instances"
  2. Provide complementary --vm option to crete-svm-node's --svm option that allows the user to specify the path of VM to use. I believe the reason this option was omitted originally is because the VM is architecture dependent; however, if that behavior is still desired at the cmdline, I don't see why it can't be done as --vm-x86, --vm-x64 options. Note that crete-svm-node uses platform agnostic --translator.
  3. Change option --port-master to simply --port. Originally done to distinguish from --guest-port before ports for guest were automatically selected.

Add option to use external llvm/clang and boost

llvm/clang and boost are now all integrated into the compilation of CRETE. This is good in terms of automating the build process.

A related inconvenience is that deleting a CRETE build tree requires rebuilding llvm/clang and boost, which is very time consuming. A potential enhancement is to add an option to use external llvm/clang and boost.

A useful reference can be found here.

Allow symbolic files of size 0

@likebreath @UnseeingEye

Currently, when a file of size 0 is listed as a symbolic input file, CRETE raises an assertion. Presumably, the reasoning was that the user likely made a mistake.

However, there is nothing in the file system against files of size 0, and it would make CRETE consistent with the FS.

In our particular case, this restriction caused problems with our infrastructure which was automatically generating files.

As a mitigation, I changed the assertion to treat symbolic files of size 0 as concrete:

https://github.com/moralismercatus/crete-dev/blob/exciting/lib/include/crete/harness_config.h#L617

VMNode deadloops when QEMU fails to start

Problem Statement
When QEMU fails to start, an exception is thrown:

BOOST_THROW_EXCEPTION(VMException{} << err::process_exited{"pid_"});

At this point, VMNode's recovery mechanism will attempt a recovery and try again to the same effect, and so on indefinitely.

It should be noted that this deadloop is not encountered when QEMU terminates after VMNodeFSM has consumed a test case, because eventually test cases will be exhausted. In this case, however, no test case is consumed yet.

We have encountered this in two scenarios.

  1. The GUI for QEMU is having technical difficulties (such as can occur with Xming+Putty).
  2. The QEMU image has somehow been corrupted.

Solution
One solution is to throw a special exception designating that it originated in starting the VM, and therefore recovery should not be attempted.

See moralismercatus@13c966a. In essence, I added a new exception, VMNoRecoveryException, that is thrown from start_vm and transitions to the Terminate state instead of the Error state. In this way, VMNode will not attempt to reboot the VM. It's not a complete solution because, while the deadloop no longer occurs, for some reason CRETE does not terminate. Another issue here is that errors don't get propagated back to Dispatch with the Terminate state. A more thorough fix is needed.
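The routing between the two states can be sketched as below. This is a simplified model, not the code from the linked commit: the exception types here derive from std::runtime_error rather than CRETE's Boost-based exceptions, and NextState and run_vm_step are hypothetical.

```cpp
#include <stdexcept>
#include <string>

struct VMException : std::runtime_error {
    explicit VMException(const std::string& what) : std::runtime_error(what) {}
};

// Thrown when starting the VM fails, so that no recovery is attempted.
struct VMNoRecoveryException : VMException {
    explicit VMNoRecoveryException(const std::string& what) : VMException(what) {}
};

enum class NextState { Running, Error, Terminate };

// Runs one step and maps its outcome to the next FSM state.
template <typename Action>
NextState run_vm_step(Action&& action) {
    try {
        action();
        return NextState::Running;
    } catch (const VMNoRecoveryException&) {
        return NextState::Terminate;  // do not reboot the VM
    } catch (const VMException&) {
        return NextState::Error;      // the recovery mechanism may retry
    }
}
```

The more specific handler must come first, because a VMNoRecoveryException would otherwise be caught as a VMException and routed to the Error state.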

Make status report from crete-dispatch optional

The titular "status report" refers to the display of time, test cases, and traces, and vm-node/svm-node information.

When combining CRETE utilities with scripts or other utilities, continuous output can be problematic. In particular, the use of system("clear") seems to cause problems.

[vm-node] Deadlock caused by VMNode::reset() while executing QemuFSM_::connect_vm()

While QemuFSM_::connect_vm() is waiting in "server->open_connection_wait()", a reset of vm-node (signal 'packet_type::cluster_reset' from dispatch) causes a deadlock.

The deadlock happens while executing "vms_.clear();" in VMNode::reset(). It tries to destroy all QemuFSM_ instances within the vm-node, which in turn tries to destroy the async_task that is executing QemuFSM_::connect_vm().

Missing handling for crete-run crash within guest

When crete-run crashes within the guest, vm-node and dispatch are not aware of it. They do nothing but continue to wait for crete-run's finish signal (the "trace_ready" file).

[qemu/vm-node] QEMU does not terminate when parent vm-node terminates

Note: distributed mode specific.

When vm-node terminates for whatever reason (testing is done, ctrl+c, crash) its QEMU children go on living as daemons. In general, this behavior is undesirable, as future tests on the images corresponding to these daemons can cause image corruption.

For development purposes, such behavior may be desirable e.g., to observe output from crete-run; however, development mode should be of use in that case.

I don't believe an adequate solution is to rely on vm-node to kill its QEMU children. Only when vm-node exits gracefully do the proper destructors get called which can do the infanticide.

Potential solution:

A Linux specific solution is to call prctl(PR_SET_PDEATHSIG, SIGHUP); from within QEMU. The kernel will then notify QEMU when its parent has died.
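A minimal sketch of that call, for Linux only. die_with_parent is a hypothetical helper; the getppid() comparison is an added guard for the case where the parent has already exited before the call takes effect.

```cpp
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

// Linux only. Ask the kernel to send SIGHUP to this process when its parent
// dies. expected_parent is the parent pid recorded before the call; if the
// parent has already exited, getppid() no longer matches and the caller
// should exit on its own.
inline bool die_with_parent(pid_t expected_parent) {
    if (prctl(PR_SET_PDEATHSIG, SIGHUP) == -1) {
        return false;
    }
    return getppid() == expected_parent;
}
```

According to the prctl(2) man page, the setting is preserved across execve for ordinary binaries, so it may also be possible to set it in vm-node's child process between fork and exec instead of patching QEMU. The same page notes that the "parent" is the thread that created the process, which matters if vm-node launches QEMU from a short-lived thread.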
