Netconsole is a powerful Linux kernel debugging tool. The dmesg output from a machine under test is transferred over an ethernet link (via UDP packets) to another machine. That means that you can see the debugging messages from the test machine on the screen of another machine. Netconsole isn’t good for debugging early kernel panics, but it is very useful if your new kernel driver hangs your system.
I used it to debug an oops in the xHCI driver that was caused by a NULL pointer access in a kernel linked list — I should have used list_empty(). It took four hours to get netconsole working, even with three people who were clueful about Linux. (A big thank you goes out to Jamey Sharp and Josh Triplett for their help with this.)
At the time, there was no good tutorial that talked about all the basics and gotchas, so I decided to create one. This tutorial walks you through configuring both machines to be on the same network subnet, configuring the target machine to listen to UDP packets from the source, and configuring the source to send the kernel debugging messages over UDP.
UPDATE: My latest scripts for setting up Netconsole are here.
First, you need to have some tools installed. You’ll need netcat, ping, and (optionally) wireshark. You’ll also need to have netconsole compiled as a module on the source box. Netconsole has to be a module so you can load it after you get the system set up.
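Before going further, it can save time to confirm the tools are actually on the PATH. The following is a minimal sketch, not part of the original setup: the helper name `missing_tools` is made up for illustration, and it assumes netcat is installed under the name `nc`.

```shell
# Print the name of every listed command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# wireshark is optional, so only netcat and ping are checked here.
missing=$(missing_tools nc ping)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all required tools found"
fi
```

Run it on both machines; anything it lists needs to be installed before you continue.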
Make sure the ethernet driver for both machines supports netpoll. I had an out-of-tree ethernet driver for my eeepc 1000, and it took us a good hour to figure out why we couldn’t see the UDP packets from the test box. Also make sure that NetworkManager isn’t running on either system. NetworkManager detects the ethernet link between the two computers and then tries to do DHCP. This is not what you want, so make sure to stop NetworkManager on both boxes.
In this section, I’ll refer to the computer under test that is generating the dmesg output as the “source” machine. The computer that receives the debugging messages is called the “target” machine.
1. Configure the source machine to answer to the IP address of 10.0.0.2:
`# ip addr add 10.0.0.2/8 dev eth0`
The ethernet device that follows the `dev` label may be different on your system. Use `/sbin/ifconfig` to figure out what ethernet devices are available on your system.
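The /8 was a bit of magic to me at first. The ip manpage says “The ADDRESS may be followed by a slash and a decimal number which encodes the network prefix length.” The prefix length is the number of leading bits of the address that identify the network, so /8 puts every 10.x.x.x address on the same subnet, and 10.0.0.1 and 10.0.0.2 are treated as directly reachable over this link. If you leave the prefix off, `ip` assumes /32 (a network containing only this one address), no route to the other machine is set up, and the second machine won’t be able to get on the network.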
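To make the prefix length concrete: the bits that are not part of the prefix are what is left over for individual machines. A small sketch of that arithmetic (the function name `addresses_in_prefix` is made up for illustration):

```shell
# Number of IPv4 addresses covered by a network with the given prefix length.
addresses_in_prefix() {
    echo $(( 1 << (32 - $1) ))
}

addresses_in_prefix 8    # 10.0.0.0/8  (10.0.0.0 to 10.255.255.255): prints 16777216
addresses_in_prefix 24   # 10.0.0.0/24 (10.0.0.0 to 10.0.0.255):     prints 256
addresses_in_prefix 32   # a single address:                          prints 1
```

For a two-machine link, a narrower prefix such as /24 would work just as well, as long as both addresses fall inside it.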
2. Configure the target machine to answer to the IP address of 10.0.0.1:
`# ip addr add 10.0.0.1/8 dev eth0`
3. Verify that the two computers can talk to each other with ping. On the source, type:
`$ ping 10.0.0.1`
You should see no dropped packets. Then double check the other direction. On the target, type:
`$ ping 10.0.0.2`
If you have issues with either step, something is wrong with the network configuration. Wireshark is a helpful tool to debug this. Wireshark can show you all the packets flowing across the network (since it puts the NIC into promiscuous mode).
4. Use netcat to tell the target machine to listen on port 6666:
`$ nc -u -l -p 6666`
6666 is the default port that netconsole will send UDP packets to. You might want to redirect this output into a file, and run `tail -f` on that file in another window. If you redirect the output, you won’t lose data when your screen history buffer fills.
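One way to keep a copy on disk while still watching the messages live is to pipe the listener through `tee`. This is a sketch under assumptions: the helper name `log_console` and the log file name are made up, and a `printf` stands in for the netcat listener so the example runs anywhere.

```shell
# Copy everything from stdin to the screen and append it to the named log file.
log_console() {
    tee -a "$1"
}

# Stand-in for the netcat listener: one line goes to the screen and to the file.
log_file=$(mktemp)
printf 'example kernel message\n' | log_console "$log_file"
```

On the target you would run something like `nc -u -l -p 6666 | log_console netconsole.log`. Netcat variants differ in their flags; some (the OpenBSD one, for example) expect the port without `-p`, as in `nc -u -l 6666`, so check your `nc` manpage if the listener refuses to start.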
5. Start netconsole on the source machine:
`# modprobe netconsole netconsole=@/eth0,@10.0.0.1/`
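The netconsole module takes an argument that describes both ends of the connection. The kernel’s netconsole documentation gives its form as `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`, where every field except the target IP address is optional.
Here we’re telling netconsole to send messages out the eth0 device, to the IP address 10.0.0.1. Since we left the target port empty, it uses the default of 6666.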
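The argument string is easy to mistype, so here is a small sketch that assembles it from its parts. The helper name `netconsole_arg` is made up for illustration, the MAC address in the second call is a placeholder, and the field layout in the comment is taken from the kernel's netconsole documentation.

```shell
# Build a netconsole argument of the form
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Usage: netconsole_arg DEV TARGET_IP [TARGET_PORT] [TARGET_MAC]
netconsole_arg() {
    dev=$1
    target_ip=$2
    target_port=${3:-}   # empty: netconsole uses its default port, 6666
    target_mac=${4:-}    # empty: netconsole uses the broadcast MAC address
    printf '@/%s,%s@%s/%s\n' "$dev" "$target_port" "$target_ip" "$target_mac"
}

netconsole_arg eth0 10.0.0.1                          # prints @/eth0,@10.0.0.1/
netconsole_arg eth0 10.0.0.1 6666 00:11:22:33:44:55   # placeholder MAC address
```

You could then load the module with `modprobe netconsole netconsole="$(netconsole_arg eth0 10.0.0.1)"`.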
At this point, you should be able to see output from netcat. If you don’t, use wireshark to debug the system.