
linux-kernel-enriched-corpus's Introduction


1. Linux Kernel Enriched Corpus for Fuzzers

This document explains how to use and how to generate the enriched corpus provided in this repository.

For further questions, feel free to email Palash Oswal or Rohan Padhye.

1.1. Using Enriched corpus with Syzkaller

The latest copy of the corpus file, corpus.db, is available in this repository's releases and is updated daily.

Download it into the syzkaller workdir and start syzkaller. syz-manager loads corpus.db from its workdir on startup.

mkdir workdir
cd workdir
wget https://github.com/cmu-pasta/linux-kernel-enriched-corpus/releases/download/latest/corpus.db
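To start syzkaller on this workdir, syz-manager needs a JSON config whose "workdir" field points at the directory holding corpus.db. The following is a minimal sketch under stated assumptions, not a config shipped with this repository. It assumes a QEMU-based setup, uses field names from syzkaller's manager configuration documentation, and treats every path (kernel build, disk image, SSH key, syzkaller checkout) as a hypothetical placeholder. The VM sizing mirrors the 2 vCPU / 4 GB setup described under Results. Check the fields against your syzkaller version.

```python
import json
from pathlib import Path


def make_manager_config(workdir, kernel_obj, image, sshkey, syzkaller_dir,
                        vm_count=1):
    """Build a minimal syz-manager config dict for a QEMU setup.

    All paths are hypothetical placeholders; adjust them to your machine.
    """
    workdir = Path(workdir).resolve()
    return {
        "target": "linux/amd64",
        "http": "127.0.0.1:56741",
        # syz-manager loads corpus.db from this directory on startup.
        "workdir": str(workdir),
        "kernel_obj": str(kernel_obj),
        "image": str(image),
        "sshkey": str(sshkey),
        "syzkaller": str(syzkaller_dir),
        "procs": 8,  # parallel test processes per VM
        "type": "qemu",
        "vm": {
            "count": vm_count,  # 1 or 8 VMs in the Results experiments
            "kernel": str(Path(kernel_obj) / "arch/x86/boot/bzImage"),
            "cpu": 2,     # vCPUs per VM
            "mem": 4096,  # memory per VM in MB (4 GB)
        },
    }


def write_manager_config(config, path):
    """Write the config as JSON; warn if the workdir has no corpus.db yet."""
    corpus = Path(config["workdir"]) / "corpus.db"
    if not corpus.is_file():
        print(f"warning: {corpus} not found; syzkaller will start "
              "without the enriched corpus")
    Path(path).write_text(json.dumps(config, indent=2))
```

After writing the file, for example as my.cfg, syzkaller is started with syz-manager -config=my.cfg from the syzkaller checkout.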

1.2. Using Enriched corpus with HEALER

The corpus programs are stored in the files directory and can be imported directly into HEALER.

Clone this repository and copy files/* into HEALER's output/corpus/ directory. From within the HEALER working directory, run the following commands.

mkdir -p output/corpus
cp -vr <path/to/files>/* output/corpus/
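For scripted setups, the same copy can be done from Python. This is a minimal sketch, not part of this repository or of HEALER; the function name is hypothetical, and it assumes the layout described above (files/ in this repository, output/corpus/ under the HEALER working directory). Like the shell glob files/*, it skips hidden entries and copies the contents of files/ rather than the directory itself.

```python
import shutil
from pathlib import Path


def import_corpus_into_healer(files_dir, healer_dir):
    """Copy the contents of files/ into <healer_dir>/output/corpus/.

    Mirrors `mkdir -p output/corpus && cp -r files/* output/corpus/`.
    Returns the number of top-level entries copied.
    """
    src = Path(files_dir)
    dst = Path(healer_dir) / "output" / "corpus"
    dst.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    copied = 0
    for entry in sorted(src.iterdir()):
        if entry.name.startswith("."):
            continue  # the shell glob files/* does not match hidden entries
        target = dst / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(entry, target)
        copied += 1
    return copied
```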

1.3. Citing

Please use the following BibTeX to cite the enriched corpus.

@phdthesis{oswal2023improving,
author={Oswal,Palash B.},
year={2023},
title={Improving Linux Kernel Fuzzing},
journal={ProQuest Dissertations and Theses},
pages={43},
isbn={9798379515645},
language={English},
url={https://www.proquest.com/dissertations-theses/improving-linux-kernel-fuzzing/docview/2812311865/se-2},
} 

1.4. DIY

1.4.1. Fetching Corpus Manually

collect.py currently fetches the syz reproducers of all fixed upstream Linux kernel crashes reported by syzbot.

The script can be modified to fetch corpus programs for other kernel versions, or to fetch C reproducers instead of syz reproducers.
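The core of such a collector is pulling reproducer links out of syzbot's bug pages. The sketch below is not collect.py; it is a hypothetical helper that assumes syzbot links syz reproducers as /text?tag=ReproSyz&x=<id> and C reproducers as /text?tag=ReproC&x=<id>, which matches syzbot's URLs at the time of writing but may change. Downloading pages and walking the list of fixed bugs are left out; the sketch only parses HTML, and switching the tag is the "fetch C programs instead" modification mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

SYZBOT = "https://syzkaller.appspot.com"


class ReproLinkParser(HTMLParser):
    """Collect <a href> links whose query string contains tag=<wanted_tag>."""

    def __init__(self, wanted_tag="ReproSyz"):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        query = parse_qs(urlparse(href).query)
        if self.wanted_tag in query.get("tag", []):
            # Resolve relative links such as /text?tag=... against syzbot.
            self.links.append(urljoin(SYZBOT, href))


def repro_links(bug_page_html, c_programs=False):
    """Return absolute reproducer URLs found in a syzbot bug page.

    c_programs=True selects C reproducers (tag=ReproC) instead of
    syz reproducers (tag=ReproSyz).
    """
    parser = ReproLinkParser("ReproC" if c_programs else "ReproSyz")
    parser.feed(bug_page_html)
    return parser.links
```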

1.4.2. Generating corpus.db File

If you have a collection of syz programs that needs to be converted into a syzkaller-compatible corpus.db file, you can use the pack command of syz-db (tools/syz-db/syz-db.go in syzkaller).

An implementation of this is available in this repository's GitHub Actions workflow.
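A Python wrapper around that step might look like the sketch below. It is not the repository's workflow; it assumes a syz-db binary built from syzkaller's tools/syz-db, invoked as syz-db pack <dir> <corpus.db>. The optional -version=N flag is how the database version (such as the "version 0" noted under Corpus Files Available) would be selected; treat it as an assumption and confirm it against the usage message of your syz-db build.

```python
import shutil
import subprocess
from pathlib import Path


def syz_db_pack_cmd(programs_dir, out_db, syz_db="syz-db", version=None):
    """Build the argv for `syz-db [-version=N] pack <dir> <corpus.db>`."""
    cmd = [str(syz_db)]
    if version is not None:
        cmd.append(f"-version={version}")  # assumed flag; check `syz-db` usage
    cmd += ["pack", str(programs_dir), str(out_db)]
    return cmd


def pack_corpus(programs_dir, out_db, syz_db="syz-db", version=None):
    """Pack a directory of syz programs into a corpus.db file."""
    if shutil.which(str(syz_db)) is None:
        raise FileNotFoundError(f"syz-db binary not found: {syz_db}")
    subprocess.run(syz_db_pack_cmd(programs_dir, out_db, syz_db, version),
                   check=True)
    return Path(out_db)
```

For example, pack_corpus("files", "corpus.db", syz_db="bin/syz-db", version=0) would pack the programs in files/ the way the release corpus is described.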

1.5. Corpus Files Available

Up-to-date reproducers:

  1. corpus.db : Enriched corpus (syz-db version 0)
  2. ci-qemu-upstream-corpus.db : Corpus obtained from syz-ci (Google's syzbot) (latest syz-db version)
  3. enriched-ci-qemu-upstream-corpus.db : Enriched version of the corpus obtained from syzbot (syz-db version 0)

A detailed comparison of the three is provided in the research document. More documentation to follow.

1.6. Results

Experiments were performed by fuzzing with 1 instance (2 vCPUs, 4 GB RAM) for 24 hours. Corpus comparison experiments were performed with 8 such VMs.

System used: ThinkMate, Intel® Xeon® Gold 6226R.

Kernel versions tested: Linux v6.0.8 and v6.1.20.

1.6.1. Coverage over time

[Figure: coverage over time, 1 VM (2 vCPUs, 4 GB RAM), 24 hours]
[Figure: coverage over time, 8 VMs (2 vCPUs, 4 GB RAM each), 24 hours]

1.6.2. Unique Crashes over time

[Figure: unique crashes over time, 1 VM (2 vCPUs, 4 GB RAM), 24 hours]
[Figure: unique crashes over time, 8 VMs (2 vCPUs, 4 GB RAM each), 24 hours]

1.6.3. Total Crashes over time

[Figure: total crashes over time, 1 VM (2 vCPUs, 4 GB RAM), 24 hours]
[Figure: total crashes over time, 8 VMs (2 vCPUs, 4 GB RAM each), 24 hours]

1.6.4. CVEs

1.6.5. New Bugs Reported

1.6.6. More Bugs Discovered (including bugs found earlier than syzbot and bugs not found by syzbot)

Title | Found in (# instances) | Date of discovery | Branch (if found by syzbot) | New/Earlier
UBSAN: shift-out-of-bounds in ntfs_fill_super | 10 | 2/28/23 | 6.2.0 | Yes
UBSAN: shift-out-of-bounds in nilfs_load_super_block | 10 | 10/25/22 | net-6.1-rc3-1 | Yes
UBSAN: shift-out-of-bounds in dbAllocAG | 10 | 9/28/22 | 6.0.0-rc7 | Yes
KASAN: use-after-free Read in si470x_int_in_callback | 10 | N/A | regression | Yes
KASAN: use-after-free Read in run_unpack | 10 | N/A | new | Yes
KASAN: use-after-free Read in ntfs_trim_fs | 10 | N/A | new | Yes
KASAN: slab-out-of-bounds Read in hdr_find_e | 10 | N/A | new | Yes
KASAN: out-of-bounds Read in leaf_paste_entries | 10 | N/A | regression | Yes
KASAN: null-ptr-deref Write in f2fs_stop_discard_thread | 10 | N/A | new | Yes
KASAN: slab-out-of-bounds Read in ntfs_attr_find | 8 | N/A | new/regression | Yes
KASAN: use-after-free Read in em28xx_init_extension | 6 | 3/30/22 | 5.17.0-syzkaller- | Yes
KASAN: use-after-free Read in do_garbage_collect | 6 | 11/13/22 | 6.1.0-rc4-syzkaller | Yes
KASAN: slab-out-of-bounds Read in do_garbage_collect | 6 | 11/13/22 | 6.1.0-rc4-syzkaller | Yes
KASAN: use-after-free Read in cfusbl_device_notify | 5 | 11/12/22 | 5.18.0-rc1-syzkaller | Yes
KASAN: use-after-free Read in notifier_call_chain | 4 | 11/18/22 | 5.18.0-rc3- | Yes
KASAN: use-after-free Write in nr_release | 3 | N/A | regression | Yes
KASAN: use-after-free Read in task_work_run | 2 | N/A | new | Yes
KASAN: use-after-free Read in inode_cgwb_move_to_attached | 2 | N/A | new | Yes
KASAN: use-after-free Read in __fib6_clean_all | 2 | N/A | new | Yes
KASAN: use-after-free Read in tcp_retransmit_timer | 1 | N/A | regression | Yes
KASAN: use-after-free Read in nexthop_flush_dev | 1 | N/A | new | Yes
KASAN: use-after-free Read in lock_sock_nested | 1 | 9/1/20 | 5.9.0-rc3 | Yes
KASAN: slab-out-of-bounds Read in mi_enum_attr | 1 | N/A | new | Yes

linux-kernel-enriched-corpus's People

Contributors

github-actions[bot], oswalpalash


linux-kernel-enriched-corpus's Issues

Large repo size due to binary files change history

Thank you for your great work on this project; it is a useful tool for improving syzkaller-based bug detection!

However, there is a problem with the initial download of the repository: the history of auto-pushed binary corpora is now quite large. The corpora themselves are not huge, but because they are binary and are pushed to the repository frequently by a bot, a naively cloned repository is quite large (13 GB, about twice the size of the Linux kernel repository). This not only uses a lot of disk space (which may be limited on a syzkaller-based fuzzing machine) but also makes cloning slow. As the bot keeps pushing binary corpora, the problem is likely to get worse in the future.

My solution was to use the git clone --filter=blob:none option when cloning, which results in a much more manageable 581 MB repository.

I would suggest adding a blobless-clone recommendation to the DIY section of the README file.
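The blobless clone suggested in this issue can be scripted as in the sketch below. It is a hypothetical helper, not part of the repository; the clone URL is derived from the release URL used in section 1.1. With --filter=blob:none, git downloads the full commit and tree history but fetches file contents (blobs) lazily, only for the revisions that are actually checked out, which is why the old binary corpora no longer inflate the clone.

```python
import subprocess

REPO_URL = "https://github.com/cmu-pasta/linux-kernel-enriched-corpus.git"


def blobless_clone_cmd(dest="linux-kernel-enriched-corpus", url=REPO_URL):
    """argv for a partial (blobless) clone of the corpus repository."""
    return ["git", "clone", "--filter=blob:none", url, dest]


def blobless_clone(dest="linux-kernel-enriched-corpus", url=REPO_URL):
    """Run the blobless clone; blobs are fetched on demand afterwards."""
    subprocess.run(blobless_clone_cmd(dest, url), check=True)
```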
