Troubleshooting Linux networking problems
It's easy to determine that a problem you're encountering is a network problem -- if your computer can't communicate with other computers, something is wrong on the network. Finding the source of the problem can be harder, though. Begin by analyzing the chain of elements involved in network communication.
If your host needs to communicate with another host in the network, the following conditions need to be met:
- The network card is installed and available in the operating system, i.e., the correct driver is loaded.
- The network card has an IP address assigned to it.
- The computer can communicate with other hosts in the same network.
- The computer can communicate with other hosts in other networks.
- The computer can communicate with other hosts using their host names.
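Because each condition in this list depends on the one before it, it makes sense to test them in order and stop at the first failure. The following is only a minimal sketch of that idea in shell. The function name first_failing_check and its "label:command" argument format are invented for this illustration, and the interface name and addresses in the usage comment are placeholders that you would replace with your own.

```shell
# Hypothetical helper: run checks in order and report the first one that fails.
# Each argument has the form "label:command"; the command is run with sh -c.
first_failing_check() {
    for check in "$@"; do
        label=${check%%:*}   # text before the first colon
        cmd=${check#*:}      # text after the first colon
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            echo "$label"
            return 1
        fi
    done
    echo "none"
    return 0
}

# Example call (eth0, 192.168.1.1 and www.example.com are placeholders):
#   first_failing_check \
#       "interface present:test -d /sys/class/net/eth0" \
#       "IP address:ip -4 addr show dev eth0 | grep -q inet" \
#       "local network:ping -c 1 192.168.1.1" \
#       "name resolution:getent hosts www.example.com"
```

The function prints the label of the first check that fails, or none if every check succeeds, which tells you which link in the chain to investigate first.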
Troubleshooting network driver issues
To communicate with other computers on the network, your computer needs a network interface. Linux sets up this interface in a few well-defined steps. During system boot, the kernel probes the available buses and finds the network card, typically on the PCI bus. Next, it determines which driver is needed to address the network card, and if that driver is available, it loads it. Following that, the udev daemon (udevd), which is started in the initial boot phase of your computer, sets up the network device for you. On a simple computer with only one network interface, this will typically be the eth0 device, but as you will read later, other interfaces can also be used. Once the interface is available, the next stage can begin, in which the network card gets an IP address.
As just discussed, loading the driver for the network card correctly involves three steps:
- The kernel probes the PCI bus.
- Based on the information it finds on the PCI bus, a driver is loaded.
- Udev sets up the network interface (eth0, for example), which you need to actually use the network card.
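A quick way to see whether these steps completed is to look at which interfaces the kernel currently knows about; on Linux they appear as entries under /sys/class/net. The sketch below is an illustration only: the function name list_interfaces is made up, and the optional directory argument exists just so the function can be tried against a test directory.

```shell
# Hypothetical helper: print the name of every entry in /sys/class/net
# (or in the directory given as the first argument).
list_interfaces() {
    net_dir=${1:-/sys/class/net}
    for entry in "$net_dir"/*; do
        [ -e "$entry" ] || continue   # skip the literal pattern if nothing matched
        basename "$entry"
    done
}
```

If the interface you expect (eth0 in the example above) is missing from the output, the problem lies in one of the three steps listed here, not in the IP configuration.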
To fix network card problems, begin by determining whether the network card was really found on the PCI bus. To do that, use the lspci command. Here is an example of lspci output:
JBO:~ # lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:07.7 System peripheral: VMware Inc Virtual Machine Communication Interface (rev 10)
00:0f.0 VGA compatible controller: VMware Inc Abstract SVGA II Adapter
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
02:00.0 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB
02:01.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)
02:02.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 02)
02:03.0 USB Controller: VMware Inc Abstract USB2 EHCI Controller
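Here, an Ethernet network card is found at PCI address 02:01.0. The network card is an AMD 79c970. The text between square brackets (PCnet32 LANCE) is part of the device name as listed in the PCI ID database; in this case it also hints at the kernel module that is needed to address this network card, pcnet32. To see the kernel driver that is actually bound to each device, you can use lspci -k.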
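On a machine with many PCI devices, it can help to filter the lspci output down to the network cards. The following sketch assumes the "Ethernet controller:" class label shown in the output above; the function name ethernet_addresses is invented for this example.

```shell
# Hypothetical helper: read lspci output on standard input and print the
# PCI address (first column) of every line labeled "Ethernet controller:".
ethernet_addresses() {
    awk '/Ethernet controller:/ { print $1 }'
}

# Example call:
#   lspci | ethernet_addresses
# An address found this way can be passed to lspci -k -s <address> to show
# the kernel driver in use for that one device.
```

Wireless cards are usually listed under a different class label (such as "Network controller:"), so the pattern would need to be widened to catch them.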
The next step is to check the hardware configuration as reflected in the /sys tree. Every PCI device has its configuration stored there. For the network card in this example, it is stored in the directory /sys/bus/pci/devices/0000:02:01.0, which reflects the address of the device on the PCI bus. Here is an example of the contents of this directory:
JBO:/sys/bus/pci/devices/0000:02:01.0 # ls -l
-rw-r--r-- 1 root root 4096 Oct 18 07:08 broken_parity_status
-r--r--r-- 1 root root 4096 Oct 17 07:50 class
-rw-r--r-- 1 root root 256 Oct 17 07:50 config
-r--r--r-- 1 root root 4096 Oct 17 07:50 device
lrwxrwxrwx 1 root root 0 Oct 17 07:51 driver -> ../../../../bus/pci/drivers/pcnet32
-rw------- 1 root root 4096 Oct 18 07:08 enable
lrwxrwxrwx 1 root root 0 Oct 18 07:08 firmware_node ->
-r--r--r-- 1 root root 4096 Oct 17 07:50 irq
-r--r--r-- 1 root root 4096 Oct 18 07:08 local_cpulist
-r--r--r-- 1 root root 4096 Oct 18 07:08 local_cpus
-r--r--r-- 1 root root 4096 Oct 17 07:53 modalias
-rw-r--r-- 1 root root 4096 Oct 18 07:08 msi_bus
drwxr-xr-x 3 root root 0 Oct 17 07:50 net
-r--r--r-- 1 root root 4096 Oct 18 07:08 numa_node
drwxr-xr-x 2 root root 0 Oct 18 07:08 power
-r--r--r-- 1 root root 4096 Oct 17 07:50 resource
-rw------- 1 root root 128 Oct 18 07:08 resource0
-r-------- 1 root root 65536 Oct 18 07:08 rom
lrwxrwxrwx 1 root root 0 Oct 17 07:50 subsystem -> ../../../../bus/pci
-r--r--r-- 1 root root 4096 Oct 17 07:51 subsystem_device
-r--r--r-- 1 root root 4096 Oct 17 07:51 subsystem_vendor
-rw-r--r-- 1 root root 4096 Oct 17 07:51 uevent
-r--r--r-- 1 root root 4096 Oct 17 07:50 vendor
The most interesting item for troubleshooting is the symbolic link to the driver directory. In this example, it points to the pcnet32 driver, and from the information that lspci provided, we know this is the correct driver.
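Instead of reading the whole directory listing, you can resolve just the driver link. The sketch below is an illustration, not a standard tool: the function name pci_driver_name is made up, and it simply reports the last path component of the driver symbolic link in the device directory you pass in.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device,
# given its sysfs directory (for example /sys/bus/pci/devices/0000:02:01.0).
pci_driver_name() {
    dev_dir=$1
    if [ -L "$dev_dir/driver" ]; then
        # The link target ends in the driver name, e.g. .../drivers/pcnet32
        basename "$(readlink "$dev_dir/driver")"
    else
        echo "no driver bound"
        return 1
    fi
}
```

For the device in this example, pci_driver_name /sys/bus/pci/devices/0000:02:01.0 would be expected to print pcnet32. If the link is missing, no driver has claimed the card, which points to a missing or failing kernel module.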
In most cases, the driver that Linux installs will work fine. In some cases it doesn't. When configuring a Dell server with a Broadcom network card, I have seen severe problems, where a ping command that used a jumbo frame was capable of causing a kernel panic. One of the first things to suspect in that case is the kernel driver for the network card. A good troubleshooting approach is to start by finding out which version of the driver you are using. You can accomplish this by using the modinfo command on the driver itself. Here is an example of modinfo on the pcnet32 driver:
JBO:/ # modinfo pcnet32
description: Driver for PCnet32 and PCnetPCI based ethercards
author: Thomas Bogendoerfer
vermagic: 22.214.171.124-5-pae SMP mod_unload modversions 586
parm: debug:pcnet32 debug level (int)
parm: max_interrupt_work:pcnet32 maximum events handled per interrupt (int)
parm: rx_copybreak:pcnet32 copy breakpoint for copy-only-tiny-frames (int)
parm: tx_start_pt:pcnet32 transmit start point (0-3) (int)
parm: pcnet32vlb:pcnet32 Vesa local bus (VLB) support (0/1) (int)
parm: options:pcnet32 initial option setting(s) (0-15) (array of int)
parm: full_duplex:pcnet32 full duplex setting(s) (1) (array of int)
parm: homepna:pcnet32 mode for 79C978 cards (1 for HomePNA, 0 for Ethernet, default Ethernet (array of int)
The information that modinfo shows differs from module to module. If a version number is included, check whether an updated version of the driver is available and, if so, download and install it.
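To pick a single field such as the version out of this output, you can filter it. The sketch below assumes the "field: value" layout shown above; the function name modinfo_field is invented for this illustration. The modinfo command also has a -F option that prints one field directly (for example, modinfo -F version pcnet32).

```shell
# Hypothetical helper: read modinfo output on standard input and print the
# value of the first line whose field name matches the first argument.
modinfo_field() {
    field=$1
    awk -v key="$field:" '$1 == key { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Example call:
#   modinfo pcnet32 | modinfo_field version
```

If the module does not report the requested field, as is the case for the version field in the example output above, the function prints nothing.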
With some hardware, you should also check what kind of module is used. If the module is open source, it is generally fine, as open source modules are thoroughly checked by the Linux community. If the module is proprietary, there may be incompatibilities between the kernel and the particular module. When such a module is loaded, your kernel is flagged as "tainted." A tainted kernel is a kernel that has some modules loaded that are not controlled by the Linux kernel community, or that has run into certain other conditions, such as a forced module load. To find out if this is the case on your system, check the contents of the /proc/sys/kernel/tainted file. If this file contains 0, the kernel is not tainted and no proprietary modules are loaded. Any other value is a bitmask of taint flags; if the lowest bit is set (the value is odd, such as 1), a proprietary module is loaded, and you may be able to fix the situation by replacing the proprietary module with an open source module.
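The value in /proc/sys/kernel/tainted is a bitmask, and its lowest bit is the one that signals a proprietary module. The following sketch decodes only that bit and lumps all other flags together; the function name taint_summary and its three output messages are made up for this example.

```shell
# Hypothetical helper: interpret a value read from /proc/sys/kernel/tainted.
# Only bit 0 (proprietary module loaded) is decoded individually.
taint_summary() {
    value=$1
    if [ $((value & 1)) -ne 0 ]; then
        echo "proprietary module loaded"
    elif [ "$value" -ne 0 ]; then
        echo "tainted for another reason"
    else
        echo "not tainted"
    fi
}

# Example call:
#   taint_summary "$(cat /proc/sys/kernel/tainted)"
```

Testing the bit, not comparing the whole value with 1, matters because several taint flags can be set at the same time.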
The information in this article should help you fix driver-related issues. In the next article in this series, you'll learn how to troubleshoot problems that are related to the IP address configuration on your server.
ABOUT THE AUTHOR: Sander van Vugt is an author and independent technical trainer, specializing in Linux since 1994. Vugt is also a technical consultant for high-availability (HA) clustering and performance optimization, as well as an expert on SLED 10 administration.
30 Nov 2009