shuumatsu / codeql-uboot
Home Page: https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)
License: MIT License
We created this course to help you quickly learn CodeQL, our query language and engine for code analysis. The goal is to find several remote code execution (RCE) vulnerabilities in the open-source software known as U-Boot, using CodeQL and its libraries for analyzing C/C++ code. To find the real vulnerabilities, you'll need to write a sequence of queries, making them more precise at each step of the course.
The goal is to find a set of 9 remote-code-execution vulnerabilities in the U-Boot boot loader. These vulnerabilities were originally discovered by GitHub Security Lab researchers and have since been fixed. An attacker positioned on the local network, or in control of a malicious NFS server, could potentially achieve remote code execution on the U-Boot-powered device. This was possible because the code read data from the network (that could be attacker-controlled) and passed it to the length parameter of a call to the memcpy function. When such a length parameter is not properly validated before use, it may lead to exploitable memory corruption vulnerabilities.
U-Boot contains hundreds of calls to both memcpy and libc functions that read data from the network. You can often recognize network data being acted upon through use of the ntohs (network to host short) and ntohl (network to host long) functions or macros. These swap the byte ordering for integer values that are received in network ordering to the host's native byte ordering (which is architecture dependent).
In this course, you will use CodeQL to find such calls. Many of those calls may actually be safe, so throughout this course you will refine your query to reduce the number of false positives, and finally track down the unsafe calls to memcpy that are influenced by remote input.
Upon completion of the course, you will have created a CodeQL query that is able to find variants of this common vulnerability pattern.
Bookmark these useful documentation links:
If you get stuck during this course and need some help, the best place to ask for help is on the GitHub Security Lab Slack. Request an invitation from the Security Lab Get Involved page and ask in the channel #codeql-writing. There are also sample solutions in the course repository, but please try to solve the tasks on your own first!
Hope this is exciting! Please close this issue now, then wait for the next set of instructions to appear in a comment below.
We want to identify integer values that are supplied from network data. A good way to spot those is to look for use of network ordering conversion macros such as ntohl, ntohll, and ntohs.
In the from section of the query, you declare some variables, and state the types of those variables. The type tells us what the possible values are for the variable.
In the previous query you were querying for values in the class Function to find functions in the source code. We have to query a different type to find macros in the source code instead. Can you guess its name?
NOTE: These network ordering conversion utilities can be macros or functions depending on the platform. In this course, we are looking at a Linux database, where they are macros.
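If you want to check your guess, here is a minimal sketch of one possible shape for this query. It assumes the class you are looking for is Macro from the standard C/C++ library, and it uses the built-in string predicate regexpMatch (which must match the whole name) to cover all three macros at once. The message string is only illustrative.

```ql
import cpp

// Macro represents macro definitions (#define ...) in the analyzed code.
from Macro m
// regexpMatch must match the entire name, so this accepts exactly
// ntohs, ntohl and ntohll.
where m.getName().regexpMatch("ntoh(s|l|ll)")
select m, "a macro that converts from network byte order"
```

Three separate name comparisons joined with or would work as well; the regular expression just keeps the where section short.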
In step 4, you wrote a query that finds the definitions of functions named memcpy in the codebase. Now, we want to find all the calls to memcpy in the codebase.
One way to do this is to declare two variables: one to represent functions, and one to represent function calls. Then you will have to create a relationship between these variables in the where section, so that they are restricted to only functions that are named memcpy, and calls to exactly those functions.
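The two-variable approach described above could be sketched as follows. This is one possible solution, not the only one; it relies on FunctionCall and its getTarget() predicate from the standard C/C++ library, and the variable names are arbitrary.

```ql
import cpp

from FunctionCall call, Function f
where
  // relate the two variables: the call must target this function
  call.getTarget() = f and
  // and the function must be named memcpy
  f.getName() = "memcpy"
select call, "a call to memcpy"
```

The same result can be had with a single variable by writing call.getTarget().getName() = "memcpy", at the cost of making the relationship between calls and functions less explicit.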
In step 5, you wrote a query that finds the definitions of macros named ntohs, ntohl and ntohll in the codebase. Now, we want to find all the invocations of these macros in the codebase.
This will be similar to what you did in step 6, where you created variables for functions and function calls, and restricted them to look for a particular function and its calls.
Note: A macro invocation is a place in the source code that calls a particular macro. This is comparable to how a function call is a place in the source code that calls a particular function.
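As a rough sketch of how this might look, the query below uses the MacroInvocation class and its getMacro() and getMacroName() predicates from the standard C/C++ library. Treat it as one possible answer; the message text is only illustrative.

```ql
import cpp

from MacroInvocation mi
// keep only invocations whose macro is one of ntohs, ntohl, ntohll
where mi.getMacro().getName().regexpMatch("ntoh(s|l|ll)")
select mi, "an invocation of " + mi.getMacroName()
```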
You will now run a simple CodeQL query, to understand its basic concepts and get familiar with your IDE.
Edit the file 3_function_definitions.ql with the following contents:
import cpp
from Function f
where f.getName() = "strlen"
select f, "a function named strlen"
Don't copy / paste this code, but instead type it slowly. You will see the CodeQL auto-complete suggestions in your IDE as you type.
After typing from and the first letters of Function, the IDE will propose a list of available classes from the CodeQL library for C/C++. This is a good way to discover what classes are available to represent standard patterns in the source code.
After typing where f. the IDE will propose a list of available predicates that you can call on the variable f.
Type the first letters of getName() to narrow down the list.
Run this query: Right-click on the query editor, then click CodeQL: Run Query.
Inspect the results appearing in the results panel. Click on the result hyperlinks to navigate to the corresponding locations in the U-Boot code. Do you understand what this query does? You probably guessed it! This query finds all functions with the name strlen.
Now it's time to submit your query. There are two ways to do that, and we'll explain both of them in the comments below. Once you have chosen your method, submit your answer!
Read carefully: you will need to follow the same steps to submit your answers to later steps. You can always come back to this issue later to check the submission instructions.
Great! You made it to the final step!
In step 9 we found expressions in the source code that are likely to have integers supplied from remote input, because they are being processed with invocations of ntohl, ntohll, or ntohs. These can be considered sources of remote input.
In step 6 we found calls to memcpy. These calls can be unsafe when their length arguments are controlled by a remote user. Their length arguments can be considered sinks: they should not receive user-controlled values without further validation.
Combining these pieces of information, we know that code is vulnerable if tainted data flows from a network integer source to a sink in the length argument of a memcpy call.
However, how do we know whether data from a particular source might reach a particular sink? This is known as data flow or taint tracking analysis. Given the number of results (hundreds of memcpy calls and a large number of macro invocations), it would be quite a lot of work to triage all these cases manually.
To make our triaging job easier, we will have CodeQL do this analysis for us.
You will now write a query to track the flow of tainted data from network-controlled integers to the memcpy length argument. As a result you will find 9 real vulnerabilities!
To achieve this, we’ll use the CodeQL taint tracking library. This library allows you to describe sources and sinks, and its predicate hasFlowPath holds when tainted data from a given source flows to a sink.
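A sketch of how the pieces could fit together is shown below. It assumes the class-based TaintTracking::Configuration API that hasFlowPath belongs to; newer CodeQL releases have moved to a different, module-based API, so this may need adapting to the version you have installed. The names NetworkByteSwap and Config, the configuration string, and the @id value are illustrative choices, not something the course prescribes.

```ql
/**
 * @name Network data used as memcpy length (sketch)
 * @kind path-problem
 * @id cpp/example/network-to-memcpy-length
 * @problem.severity warning
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

/** An expression produced by an ntohs, ntohl or ntohll macro invocation. */
class NetworkByteSwap extends Expr {
  NetworkByteSwap() {
    exists(MacroInvocation mi |
      mi.getMacroName().regexpMatch("ntoh(s|l|ll)") and
      this = mi.getExpr()
    )
  }
}

class Config extends TaintTracking::Configuration {
  // The string only needs to be unique among configurations.
  Config() { this = "NetworkToMemcpyLength" }

  // Sources: integers that were just converted from network byte order.
  override predicate isSource(DataFlow::Node source) {
    source.asExpr() instanceof NetworkByteSwap
  }

  // Sinks: the third argument (index 2, the length) of a memcpy call.
  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall call |
      call.getTarget().getName() = "memcpy" and
      sink.asExpr() = call.getArgument(2)
    )
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Network byte swap flows to memcpy length"
```

Because the query is declared as a path-problem, each result in the IDE can be expanded to show the steps from the byte-swap expression to the memcpy length argument, which makes triage much quicker than reading every call site.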
In the previous step, you found invocations of the macros we are interested in. Modify your query to find the top-level expressions these macro invocations expand to.
Note: An expression is a source code element that can have a value at runtime. Invoking a macro can bring various source code elements into scope, including expressions.
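One way to make this change, sketched under the assumption that you use the getExpr() predicate of MacroInvocation (which gives the top-level expression of the expansion, when there is one):

```ql
import cpp

from MacroInvocation mi
where mi.getMacro().getName().regexpMatch("ntoh(s|l|ll)")
// select the expression the invocation expands to, not the invocation
select mi.getExpr(), "expression produced by " + mi.getMacroName()
```

Invocations that do not expand to an expression simply produce no result row, because getExpr() has no value for them.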
Now let's analyze what you have written. A CodeQL query has the following basic structure:
import /* ... path to some CodeQL libraries ... */
from /* ... variable declarations ... */
where /* ... logical formulas that say something about the variables ... */
select /* ... expressions to output ... */
The from/where/select part is the query clause: it describes what we are trying to find in the source code.
Let's look closer at the query we wrote in the previous step.
import cpp
from Function f
where f.getName() = "strlen"
select f, "a function named strlen"
At the top of the query is import cpp. This is an import statement. It brings into scope the standard CodeQL library that models C/C++ code, allowing us to use its features in our query. We'll use this library in every query, and in later steps we'll also use some more specialized libraries.
In the from section, there is a declaration Function f. Here we declare a variable named f which has the type Function. Function is a class declared in the standard library (you can jump to the definition using F12). A class represents a collection of values, in this case the collection of all C/C++ functions in the source code.
Now look at the expression f.getName() in the where section. Here we call the predicate getName on the variable f of type Function. Predicates are the building blocks of a query: they express logical properties that we want to hold. Some predicates return results (like getName), and some predicates do not (they just assert that a property must be true).
So far your query finds all functions with the name strlen. It does this by asserting that the result of f.getName() is equal to the string "strlen".
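To illustrate the second kind of predicate, here is the same query written with hasName, a predicate from the standard C/C++ library that has no result and simply holds or does not hold. This is only a sketch for comparison; the course query above is equally valid.

```ql
import cpp

from Function f
// hasName returns nothing: it asserts that f is named "strlen"
where f.hasName("strlen")
select f, "a function named strlen"
```

Both versions should select the same functions; the difference is only whether the name is obtained as a value and compared, or asserted directly.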
In this step we will learn how to write our own CodeQL classes. This will help us make the logic of our query more readable, easier to reuse, and easier to refine.
We'd like to find the same results as in the previous step, i.e. the top-level expressions that correspond to the ntohl, ntohs and ntohll macro invocations. It would be useful if we could refer to all such expressions directly, just like we can use MacroInvocation from the standard library to refer to all macro invocations.
We will define a class to describe exactly this set of expressions, and use it in the last step of this course.
The Expr class is the set of all expressions, and we are interested in a more specific set of expressions, so the class we write will be a subclass of Expr.
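Such a class might be sketched as follows. The name NetworkByteSwap is an illustrative choice; the body of the characteristic predicate (the part that looks like a constructor) states which expressions belong to the class.

```ql
import cpp

/** An expression produced by an ntohs, ntohl or ntohll macro invocation. */
class NetworkByteSwap extends Expr {
  NetworkByteSwap() {
    // this expression is the top-level expression of some invocation
    // of one of the byte-order conversion macros
    exists(MacroInvocation mi |
      mi.getMacroName().regexpMatch("ntoh(s|l|ll)") and
      this = mi.getExpr()
    )
  }
}

from NetworkByteSwap n
select n, "Network byte swap"
```

With the class in place, the from section reads like the earlier queries over Function or MacroInvocation, and the logic that picks out the macros lives in one reusable definition.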
We will use the CodeQL extension for Visual Studio Code. You will take advantage of IDE features like auto-complete, contextual help and jump-to-definition.
Don't worry, you'll do this setup only once, and you'll be able to use it for future CodeQL development.
Follow the instructions below.