sysprog21 / lkmpg

The Linux Kernel Module Programming Guide (updated for 5.0+ kernels)

Home Page: https://sysprog21.github.io/lkmpg/

License: Open Software License 3.0

lkmpg's Introduction

The Linux Kernel Module Programming Guide

This project keeps the Linux Kernel Module Programming Guide up to date, with working examples for recent 5.x and 6.x kernel versions. The guide has been around since 2001 and most copies of it on the web only describe old 2.6.x kernels.

The book can be freely accessed via https://sysprog21.github.io/lkmpg/ or as the latest PDF file. The original guide may be found at the Linux Documentation Project. You may also check other freely available programming books listed by The Free Ebook Foundation, or the Linux online books collected by The Online Books Page.

Getting Started

Summary

Get the latest source code from the GitHub page.
Install the prerequisites.
Generate PDF and/or HTML documents.

Step 1: Get the latest source code

Make sure you can run git with an Internet connection.

$ git clone https://github.com/sysprog21/lkmpg.git && cd lkmpg

Step 2: Install the prerequisites

To generate the book from source, TeXLive (MacTeX) is required.

For Ubuntu Linux, macOS, and other Unix-like systems, run the following command(s):

# Debian / Ubuntu
$ sudo apt install make texlive-full

# Arch / Manjaro
$ sudo pacman -S make texlive-binextra texlive-bin

# macOS
$ brew install mactex
$ sudo tlmgr update --self

Note that latexmk is required to generate the PDF, and it has probably been installed on your OS already. If not, please follow the installation guide.

On macOS, the Pygments package may not be pre-installed. If not, please refer to the installation guide before generating documents.

Alternatively, using Docker is recommended, as it guarantees the same dependencies as our GitHub Actions workflow. After installing Docker Engine on your machine, pull the Docker image twtug/lkmpg and run it in an isolated container.

# pull docker image and run it as container
$ docker pull twtug/lkmpg
$ docker run --rm -it -v $(pwd):/workdir twtug/lkmpg

nerdctl is a Docker-compatible command line tool for containerd, and you can replace the above docker commands with nerdctl counterparts.

Step 3: Generate PDF and/or HTML documents

Now we can build the document with the following commands:

$ make all              # Generate PDF document
$ make html             # Convert TeX to HTML
$ make clean            # Delete generated files

License

The Linux Kernel Module Programming Guide is a free book; you may reproduce and/or modify it under the terms of the Open Software License. Use of this work is governed by a copyleft license that can be found in the LICENSE file.

The complementary sample code is licensed under GNU GPL version 2, the same as the Linux kernel.

lkmpg's Issues

Add yocto chapter or buildroot

Introducing how to build custom Linux through Yocto and Buildroot.

Messages on console of hello-5.ko examples

In the example, running insmod or rmmod on hello-5.ko does not display any messages on the console as shown below. The pr_info messages are only visible through commands like dmesg or journalctl.

Simply removing the messages might not give the audience a good picture of this example. It may be better to add one more command, such as dmesg, to show the parameters for clarity.

$ sudo insmod hello-5.ko mystring="bebop" myintArray=-1
myshort is a short integer: 1
myint is an integer: 420
mylong is a long integer: 9999
mystring is a string: bebop
myintArray[0] = -1
myintArray[1] = 420
got 1 arguments for myintArray.

$ sudo rmmod hello-5
Goodbye, world 5

$ sudo insmod hello-5.ko mystring="supercalifragilisticexpialidocious" myintArray=-1,-1
myshort is a short integer: 1
myint is an integer: 420
mylong is a long integer: 9999
mystring is a string: supercalifragilisticexpialidocious
myintArray[0] = -1
myintArray[1] = -1
got 2 arguments for myintArray.

$ sudo rmmod hello-5
Goodbye, world 5

$ sudo insmod hello-5.ko mylong=hello
hello-5.o: invalid argument syntax for mylong: 'h'

[14.1 Tasklets] init function may be interrupted by tasklet

In 14.1 Tasklets:

Here is an example tasklet module. The tasklet_fn function runs for a few seconds and in the mean time execution of the example_tasklet_init function continues to the exit point.

static void tasklet_fn(unsigned long data) 
{ 
    pr_info("Example tasklet starts\n"); 
    mdelay(5000); 
    pr_info("Example tasklet ends\n"); 
} 
 
static DECLARE_TASKLET_OLD(mytask, tasklet_fn); 
 
static int example_tasklet_init(void) 
{ 
    pr_info("tasklet example init\n"); 
    tasklet_schedule(&mytask); 
    mdelay(200); 
    pr_info("Example tasklet init continues...\n"); 
    return 0; 
}

So with this example loaded dmesg should show:

tasklet example init
Example tasklet starts
Example tasklet init continues...
Example tasklet ends

However, when example_tasklet_init is in mdelay(200), the thread may be interrupted by the tasklet_fn since it's a softirq, thus the output becomes:

qiuhao@pc:~/tmp/lkmpg/examples$ sudo insmod example_tasklet.ko
qiuhao@pc:~/tmp/lkmpg/examples$ sudo dmesg | tail -n 4
[1293636.040724] tasklet example init
[1293636.043172] Example tasklet starts
[1293641.003428] Example tasklet ends
[1293641.198868] Example tasklet init continues...

Maybe we should fix the description "in the mean time..."?

Design new book covers

It has been 16 years since the last book cover for "The Linux Kernel Module Programming Guide" was designed. We should change the cover sooner or later.

Candidate cover:

[Question] Can forked file descriptions lead to race condition in chardev2.c?

In chardev2.c, Compare-And-Swap operation is used in device_open() to avoid concurrent access to the shared resources:

/* This is called whenever a process attempts to open the device file */ 
static int device_open(struct inode *inode, struct file *file) 
{ 
    pr_info("device_open(%p)\n", file); 
 
    /* We don't want to talk to two processes at the same time. */ 
    if (atomic_cmpxchg(&already_open, CDEV_NOT_USED, CDEV_EXCLUSIVE_OPEN)) 
        return -EBUSY; 
 
    try_module_get(THIS_MODULE); 
    return SUCCESS; 
}

But what if a process opens the device, then triggers a fork(2). Can these two processes have concurrent access to the device? (we called open(2) only once)

Thanks.
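To make the mechanism in question concrete, here is a minimal userspace sketch of the same compare-and-swap gate. It uses C11 atomics as a stand-in for the kernel's atomic_cmpxchg; the names model_open and model_release are hypothetical and only mirror the shape of device_open, they are not part of chardev2.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

enum {
    CDEV_NOT_USED = 0,
    CDEV_EXCLUSIVE_OPEN = 1,
};

/* Userspace stand-in for the driver's already_open flag. */
static atomic_int already_open = CDEV_NOT_USED;

/* Hypothetical model of device_open(): 0 on success, -EBUSY if claimed. */
static int model_open(void)
{
    int expected = CDEV_NOT_USED;

    /* Atomically flip NOT_USED -> EXCLUSIVE_OPEN; fails if someone else won. */
    if (!atomic_compare_exchange_strong(&already_open, &expected,
                                        CDEV_EXCLUSIVE_OPEN))
        return -EBUSY;
    return 0;
}

/* Hypothetical model of the release path: make the device available again. */
static void model_release(void)
{
    /* In this model, releasing a device that was never opened is a bug. */
    assert(atomic_load(&already_open) == CDEV_EXCLUSIVE_OPEN);
    atomic_store(&already_open, CDEV_NOT_USED);
}
```

Note that the gate is consulted only on the open path. Anything that duplicates an already open file description without calling open(2) again, such as fork(2) or dup(2), never passes through it, which is the situation the question describes.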

Correct the descriptions about `spin_lock_irqsave` and `spin_unlock_irqrestore`

__next__ at Reddit gave the feedback:

Interrupts can't happen during the lock as the main idea of using spin_lock_irq is to disable (on the logical core, let's use "local CPU" term for that) interrupts.

The whole idea of using spin_lock_irqsave instead of just spin_lock_irq is what is happening during the unlock. At the time spin_lock_irq() is used, interrupts could have been already disabled on the local CPU. And now, when we use spin_unlock_irq() it will re-enable interrupts again for the whole (local) CPU! Yes, it means that our code could have enabled interrupts while someone else (who disabled them before we took the lock) is not expecting that. Example to visualize:

1. Interrupts are disabled on a local CPU,
2. We use spin_lock_irq() -> interrupts disabled on a local CPU,
3. We use spin_unlock_irq() -> interrupts enabled on a local CPU

So in point 3. all interrupts were enabled (on the local CPU) and the one who disabled them in point 1. is not aware of that fact at all. We enabled the interrupts for him. _irqsave variants were introduced to solve this problem.

1. Interrupts are disabled on a local CPU,
2. We use spin_lock_irqsave(spinlock_t *lock, unsigned long flags) -> we know that interrupts were disabled at this point as we saved the current state of them in the flags parameter,
3. We use spin_unlock_irqrestore -> and at this point, we know based on the flags value, that the interrupts were disabled before _irqsave, so we just go back to the state before invoking spin_lock_irqsave instead of always re-enabling interrupts like in the case with spin_unlock_irq().
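The two sequences above can be replayed with a toy model. This is a sketch under strong assumptions: a single boolean stands in for the local CPU's interrupt-enable state, no real locking happens, and every toy_* name is invented for illustration and is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's interrupt-enable state; not kernel code. */
static bool irqs_enabled = true;

static void toy_lock_irq(void)
{
    irqs_enabled = false;
}

static void toy_unlock_irq(void)
{
    irqs_enabled = true; /* unconditional re-enable */
}

static unsigned long toy_lock_irqsave(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL; /* remember prior state */

    irqs_enabled = false;
    return flags;
}

static void toy_unlock_irqrestore(unsigned long flags)
{
    irqs_enabled = (flags != 0); /* go back to whatever was saved */
}

/* First sequence: the caller had interrupts off before taking the lock. */
static bool after_plain_pair(void)
{
    irqs_enabled = false; /* step 1: someone already disabled interrupts */
    toy_lock_irq();       /* step 2 */
    assert(!irqs_enabled);
    toy_unlock_irq();     /* step 3 */
    return irqs_enabled;
}

/* Second sequence: same starting point, using the save/restore variants. */
static bool after_irqsave_pair(void)
{
    unsigned long flags;

    irqs_enabled = false;
    flags = toy_lock_irqsave();
    assert(!irqs_enabled);
    toy_unlock_irqrestore(flags);
    return irqs_enabled;
}
```

In this model after_plain_pair() ends with interrupts enabled even though the caller had them disabled, while after_irqsave_pair() leaves them disabled, matching the feedback quoted above.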

Text does not scale on firefox

Book text doesn't scale in Firefox 90.0.2, nor in Chromium. This makes it pretty hard to read if you have bad eyesight.

`sys_open` is deprecated

Because sys_open has been replaced by sys_openat in practice, there is no way to test it.
When switching to another syscall, a print call such as pr_info causes a fatal page fault, which crashes the kernel.

Introduce User Mode Linux?

Is it beneficial to introduce UML as an alternative to running linux VM under linux?
PROS

Easy access to hostfs
Lightweight
Debuggable from host (attach to gdb)

CONS

Little documentation
I don't know enough of it ='(

Miss sample about inode_operations in `procfs3.c`

In the section "Manage /proc file with standard filesystem", there is some description of inode_operations, but the sample procfs3.c doesn't contain any code for it.

where to get the patch in syscall chapter?

You will need to get the complete sourcecode of this guide as a tarball in order to get the patch and the README.

where is the patch?

bug in print_string.c

my_tty = get_current_tty();
ttyops = my_tty->driver->ops;
if (my_tty) {

maybe it would be better to access my_tty fields after checking against null?
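The ordering the report asks for can be sketched in userspace with mock types. The mock_* structures and tty_ops_or_null below are hypothetical stand-ins, not the kernel's tty structures; the only point is that the pointer is tested before any field behind it is read.

```c
#include <stddef.h>

/* Mock types standing in for the tty structures; not the kernel's. */
struct mock_ops {
    int (*write)(const char *buf, int len);
};

struct mock_driver {
    const struct mock_ops *ops;
};

struct mock_tty {
    struct mock_driver *driver;
};

/* Return the ops table, or NULL when there is no tty to talk to. */
static const struct mock_ops *tty_ops_or_null(const struct mock_tty *tty)
{
    /* Check the pointer first; only then follow it. */
    if (!tty || !tty->driver)
        return NULL;
    return tty->driver->ops;
}
```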

Reopen the issue #160

I have to reopen #160 because of the following:
The f_op field used by the vfs_xx functions is a set of wrappers, e.g. proc_reg_file_ops, which wrap the function pointers in the struct file_operations or struct proc_ops passed to proc_create.

https://elixir.bootlin.com/linux/v5.15/source/fs/proc/inode.c#L644
https://elixir.bootlin.com/linux/v5.15/source/fs/proc/inode.c#L276

So the wrapper is always non-null even when the underlying op function is null, and the wrapper does not check for a null pointer (I think this weakness should be treated as a kind of bug in the Linux kernel).

The crash log has verified it:

[419136.210798] /proc/hello created
[419136.210807] /sys/kernel/hi: Init.
[419136.211108] Flywheel ready
[419136.409492] Turn the crank
[419136.409531] Flywheel spins up
[419136.425516] SymmetricKey: password123
[419136.425519] PlainText: THIS IS A PLAIN TEXT
[419159.628887] BUG: kernel NULL pointer dereference, address: 0000000000000000
[419159.628892] #PF: supervisor instruction fetch in kernel mode
[419159.628893] #PF: error_code(0x0010) - not-present page
[419159.628894] PGD 0 P4D 0 
[419159.628897] Oops: 0010 [#1] PREEMPT SMP PTI
[419159.628900] CPU: 2 PID: 419062 Comm: tail Kdump: loaded Tainted: P           OE     5.16.0+ #3
[419159.628902] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B85M Pro3, BIOS C1.20 07/03/2014
[419159.628903] RIP: 0010:0x0
[419159.628908] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[419159.628909] RSP: 0018:ffffc90003a43ec0 EFLAGS: 00010246
[419159.628911] RAX: 0000000000000000 RBX: ffff88803d839cc0 RCX: 0000000000000001
[419159.628912] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88830a589a00
[419159.628913] RBP: ffffc90003a43ed8 R08: 0000000000000008 R09: 0000000000000000
[419159.628914] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[419159.628915] R13: 0000000000000000 R14: ffffffffffffffea R15: ffff88830a589a00
[419159.628916] FS:  00007f965833e740(0000) GS:ffff88840f700000(0000) knlGS:0000000000000000
[419159.628918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[419159.628919] CR2: ffffffffffffffd6 CR3: 00000002e60d4005 CR4: 00000000001706e0
[419159.628921] Call Trace:
[419159.628923]  <TASK>
[419159.628924]  proc_reg_llseek+0x4e/0x80
[419159.628928]  ? __fdget_pos+0x17/0x50
[419159.628933]  ksys_lseek+0x84/0xc0
[419159.628937]  __x64_sys_lseek+0x18/0x20
[419159.628940]  do_syscall_64+0x5c/0xc0
[419159.628943]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[419159.628945] RIP: 0033:0x7f9658455acb
[419159.628947] Code: ff ff c3 0f 1f 40 00 48 8b 15 61 43 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 31 43 10 00 f7 d8
[419159.628948] RSP: 002b:00007ffe702e7fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
[419159.628950] RAX: ffffffffffffffda RBX: 0000563b374544a0 RCX: 00007f9658455acb
[419159.628952] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
[419159.628953] RBP: 00007ffe702e888b R08: 0000000000000001 R09: 0000563b37454440
[419159.628954] R10: 0000000000001000 R11: 0000000000000246 R12: 00007ffe702e888b
[419159.628955] R13: 0000000000000000 R14: 0000563b363123f9 R15: 0000000000000003
[419159.628957]  </TASK>
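The failure mode described in this report, a wrapper that is always present dispatching through a handler pointer that may be NULL, can be modelled in userspace. This is only an illustration of the pattern: model_proc_ops, model_reg_llseek and model_noop_llseek are invented names, and the guard shown is one possible defensive choice, not a description of what any kernel version does.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*lseek_fn)(long offset, int whence);

/* Invented stand-in for an ops table with an optional lseek handler. */
struct model_proc_ops {
    lseek_fn proc_lseek;
};

/* Wrapper that refuses to dispatch through a missing handler. */
static long model_reg_llseek(const struct model_proc_ops *ops,
                             long offset, int whence)
{
    if (!ops || !ops->proc_lseek)
        return -EINVAL; /* instead of jumping to address 0 */
    return ops->proc_lseek(offset, whence);
}

/* Trivial handler a module could supply so the pointer is never NULL. */
static long model_noop_llseek(long offset, int whence)
{
    (void)whence;
    return offset;
}
```

Either side can close the gap: the wrapper can check the pointer, or the module can always register some handler.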

missing sudo before `rmmod`

Chapter 0.4.5, page 14

# sudo insmod hello-5.ko mystring="bebop" myintArray=-1
myshort is a short integer: 1
myint is an integer: 20
mylong is a long integer: 9999
mystring is a string: bebop
myintArray is -1 and 420

# rmmod hello-5
Goodbye, world 5

Should it be sudo rmmod hello-5 for consistency?

Need not install dvipng in Archlinux

It works fine to generate the pdf file. The package texlive-bin from extra also contains a dvipng binary.

Use GitHub Actions for site building and format conversion.

There are a bunch of static site generators that create HTML pages from templates or components and a given content source. Besides, we can also automate, customize, and execute workflows right in the repository with GitHub Actions.

The OI Wiki Project is an example:

The contents are written in Markdown.
Then the build and build-pdf workflows are triggered whenever there is any push or pull request event.
- The build workflow runs mkdocs to create HTML pages from *.md files.
- The build-pdf workflow converts *.md files to *.tex files (by running this JavaScript script) and compiles them to output.pdf.

Besides, the site 《大家來學 LaTeX》 is another site created with Hugo, and it uses utteranc as the comment system. A possible solution would be to use pandoc to convert *.tex to *.md.

A better online version is published

Nice work by all the writers! Since I am going to write some Linux kernel modules, I found this new book, which explains the new infrastructure in the modern Linux kernel. But I found that the online version does not feel good enough. So I spent a week converting the LaTeX content into AsciiDoc, a simple markup language that I think is better. The project is based on Antora, a new writing platform similar to Sphinx. Here is an article that explains why I chose it.

The content is separated into multiple files, which I think is easier to maintain. I hope the LaTeX could also be separated into chapters if you like. One problem is that the GitHub Pages site is published under my personal domain; I don't know whether you would like to merge it, or create an organization and place this version under it.
Another problem is that the current version of Antora cannot generate PDF files, so the LaTeX version cannot be replaced. But the underlying AsciiDoc can target DocBook, which can be converted to PDF. So I think this can be resolved later.

Several race conditions in chardev.c?

I believe there are several race conditions in chardev.c. The msg buffer is shared by all instances of the open file. If multiple processes slowly read the same file, this will cause memory corruption. Also, it seems like open_device_cnt should be atomic.

For simplicity, the best solution is probably to use a constant message. Also, I'm not an expert in Linux kernel modules. If there are in fact no bugs in the code, I think the text should explain in a little detail why this is the case.

Collect Linux kernel module programming resources

Since this book was inactive for more than 13 years, the Linux world has changed a lot. We should learn from recent materials in order to revise the book.

Random sites:

Linux Device Drivers: Tutorial for Linux Driver Development
Kernel Module
Linux Driver Template: The driver uses following Linux facilities: module, platform driver, file operations (read/write, mmap, ioctl, blocking and nonblocking mode, polling), kfifo, completion, interrupt, tasklet, work, kthread, timer, simple misc device, multiple char devices, Device Model, configfs, UART 0x3f8, HW loopback, SW loopback, ftracer.
ldd3: Linux Device Drivers 3 examples updated to work with recent kernels
Advanced examples of Linux Device Drivers (LDD3) and detailed manual for running examples in QEMU
The Linux Kernel Programming Guide
Advanced Operating Systems and Virtualization
Linux Device Drivers Tutorial was inspired by Linux Kernel Module Programming Guide. It provides the examples about timer and HRT.
Linux Kernel Teaching
Linux Kernel and Driver Development Training
- source: bootlin/training-materials
Linux Device Drivers

Please reply if you found something useful for improving this book.

Reproducible TeX build environment using Docker

This is to centralize efforts of developing Github Actions and if needed a customized docker container for the purpose of building all assets (PDFs, HTMLs, etc).

PDF Building (#28)
Existing release fix (#29 ~~or softprops/action-gh-release#134~~)
HTML Building (#32 , ~~#20~~)

The `proc_ops` structure should be `file_operations`.

Chapter 0.6.1 introduces the proc_ops structure and says that it is defined in /usr/include/linux/fs.h, but fs.h has never defined this structure since kernel 2.6.12rc2, according to this:
https://github.com/torvalds/linux/blame/master/include/linux/fs.h#L2022
Based on the above, I think this structure should be file_operations.

I also found the proc_ops structure in this commit:
torvalds/linux@d56c0d4
It has been in place since kernel 5.6-rc1 to improve VFS performance.

Add website link to repository details.

Since we can access the book via https://sysprog21.github.io/lkmpg/, it would be nice to show that link in the repository details:

Just click the gear button in the top-right corner and enter the URL in the "website" field.

CI: Use static analysis tools to verify the code

Generally speaking, C programming language is not meant to be safety oriented, and we usually rely on extra static analysis tools to verify the code. For Linux kernel, Sparse is integrated into build systems. In addition, Cppcheck is a neat tool to perform static analysis.

We can extend the existing CI pipeline to verify these examples before the built kernel modules are inserted into Linux.

Reference: Assignment 3 - Software RAID

Small indentation problem in 0.4.4

This is a code snippet from hello-4.c. I think the line number should be right aligned, and code should be left aligned, but I'm not familiar with TeX

Really love this project and look forward to a fix.

Avoid using `try_module_get(THIS_MODULE)`

There are various deprecated functions/macros used in LKMPG. For example, try_module_get(THIS_MODULE) is regarded as unsafe.

See https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put

Set up CI infrastructure to automate Linux kernel module validation

At present, CI only checks the coding style. It is far from complete validation. We shall set up the CI infrastructure which is able to perform:

Build all example Linux modules within Ubuntu Linux 20.04 (kernel 5.4+);
Attempt to do "insmod" and "rmmod";
(Optionally) check the kernel messages and validate;

See https://github.com/kernelci/kernelci-core as an example.

Follow Linux kernel coding style

We should follow the Linux kernel coding style as far as possible. The major difference in LKMPG is the use of 4 spaces for indentation rather than tabs, because we want to stick to a compact layout for source listings. However, the other rules should be enforced, e.g., CAPITALIZED macro names are appreciated, but macros resembling functions may be named in lower case. There are some example files that violate the rules, such as examples/chardev2.c. We should replace the use of CamelCase with plain lowercase variables.

Mention PDF download information in generated HTML

At present, HTML is rendered via make4ht. However, it lacks the information about PDF downloading.

The expected output:

Change the HTML template and/or setting to add the PDF download hyperlink.
Mention PDF download only in generated HTML. The body of PDF document should not contain the download link though.

Improve readability

Hi, first of all, great write-up. I do want to mention however that the theme of the site could be improved for readability. With popularity of dark mode, choosing dark colors could be the obvious first step, however it's not best for everyone of course. I recommend opening the website in reading mode via firefox or other browser, they offer different colors and formatting to make it more accessible for everyone, including those with reading problems. I did a quick few changes via inspector to see how it would look otherwise. The font is not variable so I couldn't set the text to be semi-bold (to make it look less wiry in certain situations). Using a more readable font like manrope could be great for accessibility.

Here is what I got by using colors from my website as well as increasing font-size by 10% (to 1.1em) of parent as well as increasing header4 size by 20% (1.2em) as well as content width by 10rem: https://i.imgur.com/NBCK4XU.png

A change of font could improve this, here are some example using firefox reading mode:

Dark version: https://i.imgur.com/ZQMXLZX.png
Sepia: https://i.imgur.com/5VGR3tJ.png
Dark with more content width (still within decent eye travel): https://i.imgur.com/R4C0n9I.png

I would love to create a Pull Request to help, however I am not sure where is the source of css for this project, I am unfamiliar with the process used to build this page. I tried running a lighthouse Audit via web.dev/measure but it gives a network error on your page. I ran it via Edge devtools instead and while the score is overall pretty good, including for accessibility, it does fail somewhat on contrast ratios.

Themes are mostly subjective of course but it seems like we are moving into dark mode era, so perhaps it's something to consider, but at the very least using some kind of sepia colors or a theme like Solarized could vastly improve the experience for folk with very bright screens, people working at night, people with sensitive eyes, etc. Something in-between light and dark mode is always a good balance assuming good contrast ratios are maintained.

The page is almost missing meta description which in turn would give it a worse pageRank score and will fail to display embeds that can summarise the content of the page. This is mostly an SEO issue but worth noting.

Make use of GCC Static Analysis

Since gcc version 10, a new -fanalyzer option is implemented as a static analysis pass to identify various problems at compile-time, rather than at runtime.
See:

We might make use of GCC static analysis to detect the potential problems in LKM. A possible way is to add the following into Makefile:

KBUILD_CFLAGS += \
    -Wall -Wundef \
    -Werror=strict-prototypes -Werror=implicit-function-declaration \
    -Werror=implicit-int \
    -fanalyzer -std=gnu89

The package xparse should be included in the TeX file

Macros defined with \NewDocumentCommand are robust and the "LaTeX 3" team marked it as "stable". The command \samplec is defined with it in lib/codeblock.tex and is reported as an "error" by the pdflatex engine because we use \NewDocumentCommand without \usepackage{xparse}.

This may cause things to work incorrectly when compiling documents on several platforms.

Update the timer API

According to this commit, the timer API has been changed.
You can see the current version (5.14) here.
LWN article: Improving the kernel timers API.

Refine doc & code

Move the confusing note about struct proc_ops replacing struct file_operations on Linux >= 5.6:

lkmpg/lkmpg.tex

Lines 957 to 990 in 637e707

Since Linux v5.6, the \cpp|proc_ops| structure was introduced to replace the use of the \cpp|file_operations| structure when registering proc handlers.

\subsection{The file structure}
\label{sec:file_struct}

Each device is represented in the kernel by a file structure, which is defined in \src{include/linux/fs.h}.
Be aware that a file is a kernel level structure and never appears in a user space program.
It is not the same thing as a \cpp|FILE|, which is defined by glibc and would never appear in a kernel space function.
Also, its name is a bit misleading; it represents an abstract open `file', not a file on a disk, which is represented by a structure named \cpp|inode|.

An instance of struct file is commonly named \cpp|filp|.
You'll also see it referred to as a struct file object.
Resist the temptation.

Go ahead and look at the definition of file.
Most of the entries you see, like struct dentry are not used by device drivers, and you can ignore them.
This is because drivers do not fill file directly; they only use structures contained in file which are created elsewhere.

\subsection{Registering A Device}
\label{sec:register_device}

As discussed earlier, char devices are accessed through device files, usually located in \verb|/dev|.
This is by convention. When writing a driver, it is OK to put the device file in your current directory.
Just make sure you place it in \verb|/dev| for a production driver.
The major number tells you which driver handles which device file.
The minor number is used only by the driver itself to differentiate which device it is operating on, just in case the driver handles more than one device.

Adding a driver to your system means registering it with the kernel.
This is synonymous with assigning it a major number during the module's initialization.
You do this by using the \cpp|register_chrdev| function, defined by \src{include/linux/fs.h}.

\begin{code}
int register_chrdev(unsigned int major, const char *name, struct file_operations *fops);
\end{code}

struct proc_ops replaces struct file_operations only when creating proc-related entries, which are declared in <linux/proc_fs.h>.
In other words, registering a device is not affected and still uses struct file_operations. It is confusing to have this note above the device registration code (for me, I spent plenty of time searching for how to register a device using struct proc_ops).
I highly recommend moving this note down to the next code example, where the replacement actually takes place!

lkmpg/examples/chardev.c

Lines 25 to 39 in 637e707

#define BUF_LEN 80 /* Max length of the message from the device */

/* Global variables are declared as static, so are global within the file. */

static int major; /* major number assigned to our device driver */

enum {
    CDEV_NOT_USED = 0,
    CDEV_EXCLUSIVE_OPEN = 1,
};

/* Is device open? Used to prevent multiple access to device */
static atomic_t already_open = ATOMIC_INIT(CDEV_NOT_USED);

static char msg[BUF_LEN]; /* The msg the device will give when asked */

The msg char array should be declared with length of BUF_LEN + 1 since BUF_LEN is max length of message.
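A small userspace sketch of the sizing argument, assuming BUF_LEN counts message characters and excludes the terminating NUL. The helper set_msg is hypothetical and is not taken from chardev.c.

```c
#include <stddef.h>
#include <string.h>

#define BUF_LEN 80 /* max message length, not counting the terminator */

/* One extra byte so a full-length message still has room for its NUL. */
static char msg[BUF_LEN + 1];

/* Store at most BUF_LEN characters and always terminate the string. */
static size_t set_msg(const char *src)
{
    size_t n = strlen(src);

    if (n > BUF_LEN)
        n = BUF_LEN; /* truncate overlong input */
    memcpy(msg, src, n);
    msg[n] = '\0'; /* index n is at most BUF_LEN, which is in bounds */
    return n;
}
```

With a buffer of exactly BUF_LEN bytes, the write to msg[n] would land one past the end whenever n equals BUF_LEN.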

lkmpg/examples/chardev.c

Line 152 in 637e707

return -EINVAL;

In device_write, it is preferable to use EOPNOTSUPP instead of EINVAL, since the former reports an unsupported operation while the latter reports an invalid argument.

Improve the read/write function code in procfs{1-3}.c:

lkmpg/examples/procfs3.c

Lines 23 to 40 in 637e707

static ssize_t procfs_read(struct file *filp, char __user *buffer,
                           size_t length, loff_t *offset)
{
    static int finished = 0;

    if (finished) {
        pr_debug("procfs_read: END\n");
        finished = 0;
        return 0;
    }
    finished = 1;

    if (copy_to_user(buffer, procfs_buffer, procfs_buffer_size))
        return -EFAULT;

    pr_debug("procfs_read: read %lu bytes\n", procfs_buffer_size);
    return procfs_buffer_size;
}

lkmpg/examples/procfs3.c

Lines 41 to 53 in 637e707

static ssize_t procfs_write(struct file *file, const char __user *buffer,
                            size_t len, loff_t *off)
{
    if (len > PROCFS_MAX_SIZE)
        procfs_buffer_size = PROCFS_MAX_SIZE;
    else
        procfs_buffer_size = len;

    if (copy_from_user(procfs_buffer, buffer, procfs_buffer_size))
        return -EFAULT;

    pr_debug("procfs_write: write %lu bytes\n", procfs_buffer_size);
    return procfs_buffer_size;
}

This code is a little weird and far from decent: the loff_t * parameter is incorrectly ignored, so the behavior is quite different from a normal read/write. I've reimplemented the read/write code; here is the full code:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/uaccess.h>
#include <linux/version.h>
// #include <linux/slab.h>  // kmalloc etc.

/* Header */
#define do_sslice(news, s, n) for(int i=0;i<n;++i) { news[i]=s[i]; };news[n]='\0';

#define proc_name "hello"
#define PROC_BUF_CAP (size_t)1024

static ssize_t proc_read(struct file *filp, char __user *buffer, size_t size,
                         loff_t *offset);
static ssize_t proc_write(struct file *filp, const char __user *buffer,
                          size_t size, loff_t *offset);
static int proc_open(struct inode *, struct file*);
static int proc_close(struct inode *, struct file*);


/* Source */

#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0)
/// @brief Since Linux 5.6, or else use file_operations on <linux/fs.h> instead
/// for older version
static struct proc_ops proc_ops = {
    .proc_read = proc_read,
    .proc_write = proc_write,
    .proc_open = proc_open,
    .proc_release = proc_close,
};
#else
#include <linux/fs.h>
static struct file_operations proc_ops = {
    .read = proc_read,
    .write = proc_write,
};
#endif

static struct proc_dir_entry *proc_file;
static char proc_buf[PROC_BUF_CAP + 1];
static char helper_buf[PROC_BUF_CAP + 1];


static ssize_t proc_read(struct file *filp, char __user *buffer, size_t size,
                         loff_t *offset) {

    static char *s = "ALOHO!";
    size_t slen = strlen(s) + 1;

    size_t actual_read;

    if (*offset >= slen) {
        pr_info("copy to user finished.\n");
        return 0;
    }

    if (copy_to_user(buffer, s+*offset,
                     (actual_read = min(size, (size_t)(slen - *offset))))) {
        pr_err("copy to user failed!\n");
        return -EFAULT;
    }

    pr_info("read /proc/%s\n", filp->f_path.dentry->d_name.name);
    *offset += actual_read;

    return actual_read;
}


static ssize_t proc_write(struct file *filp, const char __user *buffer,
                          size_t size, loff_t *offset) {
    size_t actual_write;

    if (filp->f_flags & O_APPEND) {
        return -EOPNOTSUPP;
    }

    if (*offset >= PROC_BUF_CAP) {
        pr_info("write to end.\n");
        return 0;
    }

    /* Clamp only after the bounds check, so the subtraction cannot wrap. */
    actual_write = min(size, PROC_BUF_CAP - (size_t)*offset);

    if (copy_from_user(proc_buf + *offset, buffer, actual_write)) {
        pr_err("copy from user failed!\n");
        return -EFAULT;
    }

    *offset += actual_write;

    /* Copy what has been written so far into a NUL-terminated helper buffer. */
    do_sslice(helper_buf, proc_buf, *offset);
    pr_info("write to %s\n", helper_buf);

    return actual_write;
}


static int proc_open(struct inode *inode, struct file* filp) {
    /* Refuse the open if a module reference cannot be taken. */
    if (!try_module_get(THIS_MODULE))
        return -ENODEV;
    return 0;
}


static int proc_close(struct inode *inode, struct file* filp) {
    module_put(THIS_MODULE);
    return 0;
}


static int __init proc_init(void) {
    proc_file = proc_create(proc_name, 0644, NULL, &proc_ops);
    if (!proc_file) {
        /* Nothing was created, so there is nothing to remove here. */
        pr_alert("Error: couldn't init /proc/%s\n", proc_name);
        return -ENOMEM;
    }

    pr_info("/proc/%s\n", proc_name);
    return 0;
}

static void __exit proc_exit(void) {
    proc_remove(proc_file);
    pr_info("/proc/%s removed\n", proc_name);
}

module_init(proc_init);
module_exit(proc_exit);

MODULE_LICENSE("GPL");

It needs a more modern C standard than C90; for example, ccflags-y += -std=gnu17.

MODULE_SUPPORTED_DEVICE removed

In section 0.4.4 Licensing and Module Documentation the code of hello-4.c uses

MODULE_LICENSE("GPL"); 
MODULE_AUTHOR("LKMPG"); 
MODULE_DESCRIPTION("A sample driver"); 
MODULE_SUPPORTED_DEVICE("testdevice");

where MODULE_SUPPORTED_DEVICE has been removed upstream since March 17, 2021.

[11.1 Sleep] `tail` crashes the kernel for lack of a `proc_lseek` implementation

This section demonstrates the example using tail.

lkmpg/lkmpg.tex

Lines 1587 to 1608 in 637e707

\begin{verbatim}
$ sudo insmod sleep.ko
$ cat_nonblock /proc/sleep
Last input:
$ tail -f /proc/sleep &
Last input:
Last input:
Last input:
Last input:
Last input:
Last input:
Last input:
tail: /proc/sleep: file truncated
[1] 6540
$ cat_nonblock /proc/sleep
Open would block
$ kill %1
[1]+ Terminated tail -f /proc/sleep
$ cat_nonblock /proc/sleep
Last input:
$
\end{verbatim}

tail seeks within the file to set the offset, which invokes proc_lseek; however, our example does not supply a non-null function pointer for it.

lkmpg/examples/sleep.c

Lines 166 to 180 in 637e707

#ifdef HAVE_PROC_OPS
static const struct proc_ops file_ops_4_our_proc_file = {
    .proc_read = module_output, /* "read" from the file */
    .proc_write = module_input, /* "write" to the file */
    .proc_open = module_open, /* called when the /proc file is opened */
    .proc_release = module_close, /* called when it's closed */
};
#else
static const struct file_operations file_ops_4_our_proc_file = {
    .read = module_output,
    .write = module_input,
    .open = module_open,
    .release = module_close,
};
#endif

This simply crashes the kernel AGAIN!

To fix it, a stub handler that reports the operation as unsupported is enough:

static loff_t proc_lseek(struct file *filp, loff_t offset, int whence) {
    return -EOPNOTSUPP;
}
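
The failure described in this issue is an indirect call through a function pointer that was never set. The following is a minimal userspace sketch of that situation, not kernel code: `demo_ops`, `demo_seek_unsupported`, and `demo_dispatch_seek` are hypothetical names invented for illustration, and the `-ESPIPE` fallback is an assumption of this sketch rather than a statement about what any kernel version does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table; only the seek hook is modelled. */
struct demo_ops {
    long (*seek)(long offset, int whence);
};

/* Stub handler in the spirit of the fix above: report "not supported". */
static long demo_seek_unsupported(long offset, int whence)
{
    (void)offset;
    (void)whence;
    return -EOPNOTSUPP;
}

/* Defensive dispatcher: check the hook before calling through it. */
static long demo_dispatch_seek(const struct demo_ops *ops, long offset,
                               int whence)
{
    if (!ops->seek)
        return -ESPIPE; /* assumed fallback for this sketch only */
    return ops->seek(offset, whence);
}
```

With a dispatcher that lacks the NULL check, a table without a seek hook would be called through a null pointer; supplying a stub, as proposed above, avoids depending on how the caller handles a missing hook.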

Update Document References

While looking at the examples and attempting to look up the reference docs they mention, I have found it difficult to track these docs down. A particular example is "linux/drivers/char/Makefile", referenced in section 0.4.2. I am unable to find any reference to this location. Is this an anachronism?

In general, the reference style gives a relative path but no point of reference in the filesystem. This could easily be fixed with actual HTML links, or by adding instructions early in the setup guide to install the relevant kernel doc packages, along with a sentence that gives a point of reference.

Thanks for the resource.

Update Acknowledgements

The current "Acknowledgements" section is not up to date. We can generate the new list of contributions by looking up git log. The command git shortlog examples lkmpg.tex works out of the box.

However, we might need an extra table to map GitHub accounts to names in Pinyin.


Show GitHub button in rendered HTML

At present, readers can reach the GitHub link only at the very end of the rendered HTML. It would be great to add a GitHub button at the top right of the page, so that readers can visit this repository and promote it through further actions.

Incoherent example code in chapter 9

This concerns the example code chardev2.c, chardev.h, and ioctl.c in chapter 9. ioctl.c is not connected to chardev2.c or chardev.h, yet the surrounding text and the comments in the header file say that chardev.h should be included by both chardev2.c and ioctl.c.

In lkmpg version 2.4.0 (2003), ioctl.c is a program that uses ioctl to control the device file /dev/char_dev. I am not sure whether the current ioctl.c is newly added code or refers to different code.
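
For context, a shared header is what normally ties a driver and its userspace test program together: both sides include the same ioctl command definitions. Below is a minimal sketch of such a header on Linux. The `DEMO_*` names and the magic value are hypothetical and are not taken from the guide's chardev.h.

```c
#include <linux/ioctl.h>

/* Hypothetical magic number and commands; not the guide's actual values. */
#define DEMO_IOC_MAGIC 'k'
#define DEMO_IOCTL_SET_MSG _IOW(DEMO_IOC_MAGIC, 0, char *)
#define DEMO_IOCTL_GET_MSG _IOR(DEMO_IOC_MAGIC, 1, char *)

/* Helper either side could use to check that a command belongs to us. */
static int demo_is_our_ioctl(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == DEMO_IOC_MAGIC;
}
```

If the driver and the userspace program each define these numbers separately, they can drift apart silently, which is why the header comments expect both files to include it.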

Modular LaTeX Documents.

As the work grows, the *.tex file can become unwieldy and confusing, especially when writing a full-length book. It is good practice to split it into several files.

LaTeX makes this very easy with the commands \input{filename} and \include{filename}. A template for a modular LaTeX document can be structured as follows:

.
├── back
│   ├── appendix-*.tex
│   ├── references.bib
│   └── ...
├── contents
│   ├── chapter-*.tex
│   └── ...
├── figures
│   ├── chapter-*
│   │   ├── *.[png/jpg/pdf]
│   │   └── ...
│   └── ...
├── main.tex
├── [style.tex]
├── [template.cls]
├── ...
├── LICENSE
├── Makefile
└── README.md


Follow the steps of Linux Kernel Module Cheat

Linux Kernel Module Cheat is an ideal environment for experimenting with Linux kernel modules by means of system emulators such as QEMU. We should follow its steps for a better developer experience (DX).

Evaluate smatch for static analysis

smatch is built on sparse. We might introduce smatch for static analysis.
See https://lwn.net/Articles/696624/

Device-tree support

Would it be possible to get a section that talks about device-trees and how they interact with kernel modules? That would be a great addition for the folks that work on embedded systems.

[Question] Why might an unsigned long argument be an obstacle to control-flow integrity?

In 13.2 Flashing keyboard LEDs, the guide states: "Furthermore, the function prototype with unsigned long argument may be an obstacle to the control-flow integrity."

struct timer_list { 
    unsigned long expires; 
    void (*function)(unsigned long);  // <---
    unsigned long data; 
    u32 flags; 
    /* ... */ 
};

Could you clarify why this argument prevents the control-flow integrity check? As far as I know, hardware-assisted CFI such as Intel's CET uses a shadow stack against ROP and IBT against JOP/COP, and neither involves function parameters.
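
One possible reading of the quoted sentence, offered here as an assumption rather than as the guide's stated rationale: compiler-based forward-edge CFI schemes typically check that the target of an indirect call has the expected prototype, and callbacks that pass a pointer through an `unsigned long` tend to rely on casts that weaken such a check. The kernel's `timer_setup()` interface instead hands the callback the timer itself. The userspace sketch below imitates that pattern; `demo_timer`, `demo_led_state`, and `demo_container_of` are hypothetical names, not kernel definitions.

```c
#include <stddef.h>

/* Userspace re-creation of the container_of() idea. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature timer: the callback receives the timer itself. */
struct demo_timer {
    unsigned long expires;
    void (*function)(struct demo_timer *);
};

/* Hypothetical driver state that embeds its timer. */
struct demo_led_state {
    int blink_count;
    struct demo_timer timer;
};

/* Typed callback: recovers the enclosing state without an integer cast. */
static void demo_blink(struct demo_timer *t)
{
    struct demo_led_state *st =
        demo_container_of(t, struct demo_led_state, timer);

    st->blink_count++;
}

/* Fire a timer: one indirect call, always through the same prototype. */
static void demo_fire(struct demo_timer *t)
{
    t->function(t);
}
```

Every callback here has the same pointer-typed signature, so no function or argument casts are needed at registration time.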

Build under Raspberry Pi

Building under Raspberry Pi fails with errors such as "unknown sequence, macro...".

Any suggestions?

Adopt some examples from Linux Device Drivers Tutorial

Linux Device Drivers Tutorial was inspired by Linux Kernel Module Programming Guide. It provides examples for timers and HRT.

We might adapt the existing examples to improve the usage of Linux timer subsystem.

Fix the warnings raised by Sparse

Sparse is a semantic checker for C programs; it can be used to find a number of potential problems with kernel code. There are various warnings raised by Sparse.

Steps:

Make sure sparse is installed in advance. See https://www.kernel.org/doc/html/latest/dev-tools/sparse.html
Run the command:

make -C /lib/modules/`uname -r`/build M=`pwd` modules C=1

Warnings: (partial)

examples/stop.c:8:21: warning: non-ANSI function declaration of function 'cleanup_module'
examples/chardev.c:126:9: warning: incorrect type in argument 1 (different address spaces)
examples/chardev.c:126:9:    expected void const volatile [noderef] __user *ptr
examples/chardev.c:126:9:    got char *
examples/chardev.c:126:9: warning: incorrect type in assignment (different address spaces)
examples/chardev.c:126:9:    expected void [noderef] __user *__ptr_pu
examples/chardev.c:126:9:    got char *
examples/chardev.c:37:13: warning: incorrect type in initializer (incompatible argument 2 (different address spaces))
examples/chardev.c:37:13:    expected long ( *read )( ... )
examples/chardev.c:37:13:    got long ( * )( ... )
examples/chardev.c:38:14: warning: incorrect type in initializer (incompatible argument 2 (different address spaces))
examples/chardev.c:38:14:    expected long ( *write )( ... )
examples/chardev.c:38:14:    got long ( * )( ... )


[Question] What is `insmod ’ing`?

A question came up while I was reading section 5.2: what does insmod ’ing mean? I searched for it on Google but did not find anything :(. I'm not sure whether this is a slip of the pen.

5.2 Functions available to modules
Kernel modules are different here, too. In the hello world example, you might have noticed that we used a function, pr_info() but did not include a standard I/O library. That is because modules are object files whose symbols get resolved upon insmod ’ing. The definition for the symbols comes from the kernel itself; the only external functions you can use are the ones provided by the kernel. If you’re curious about what symbols have been exported by your kernel, take a look at /proc/kallsyms.