Start here when things go wrong on your Linux system

If you’ve been using an operating system for a long time, you’ve probably encountered some strange behavior. When it comes to computers, strange is usually not welcome. The longer you run a particular OS install without a reinstall, the more likely you are to see at least a few quirks: programs that freeze, a cooling fan that suddenly spins up, or any number of other oddities.

For the commercial desktop operating systems with huge installation bases, it is easy to find support in the form of troubleshooting and documentation pages from the manufacturer (OEM) or the OS developer. For Linux, however, such resources are not always available. Even when they are, they don’t always give consistent guidance from one distribution to the next, and they can’t be guaranteed to account for the user’s specific hardware.

In this piece, I’ll offer a few routes you can take to investigate suspicious behavior on your Linux system. This order of diagnosis is neither definitive nor rigid. I don’t claim to know everything you need to do to find out what went wrong on your Linux system, and even if I did, it would make for an epic poem of an article.

It is quite possible that not every procedure will apply to the problem in question. My goal, however, is to bring up enough tests that you at least have a starting point. Conveniently, all but one of them serve you equally well on desktop or server Linux, because they are command-line tools.

What follows proceeds from higher to lower layers of abstraction, namely from the application level down to the OS level. Without further ado, let’s dig in.

Browser off task? Open its task manager

Browsers have become so robust and so central to the desktop computing experience that they now have their own OS-style process managers. These tools let users see which open web pages are using system resources, and how much.

If your web browser is the main program running when resource spikes or slowdowns occur, its process manager is invaluable. It gives a clearer picture than your operating system’s process manager because the browser knows which of its constituent processes are driving which web pages.

Each browser has its own way of getting to its task manager. In Firefox and Chrome, you can reach the Task Manager from their respective menus at the top right. Chromium and close derivatives (like Chrome) also let you press Shift + Escape to open the tool. Once you’ve opened the Task Manager, you can sort processes by CPU or memory usage to determine what’s taxing either one. Finally, you can end a browser process that is hogging your computer’s hardware.

Take it from the top

If your browser isn’t the star of the show, you probably want to see all the processes your system is juggling. The best way to do that is to open your terminal and use the top command. Essentially, it’s a task manager for Unix-like systems (like Linux). It lets you view CPU usage, memory usage, and much more for each running process. As you would expect, you can also sort by these stats. Any out-of-control processes can be killed from within top.

But if you think top is your average task manager, think again. You can sort by any available metric, including run time and “niceness” (basically process priority). Oh yes, there is process priority. You can also choose to display processes as a tree, to indicate which processes spawned others. Best of all, you can search for any text string, a feature missing from many competing operating system task managers.
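
For a quick, non-interactive look, top can also print a single snapshot and exit. The sketch below assumes the procps-ng version of top found on most Linux distributions; top_snapshot is a hypothetical helper name, and the field names accepted by -o may differ on other implementations.

```shell
# Hypothetical helper: print one batch-mode snapshot of top, sorted by a
# chosen field (default %CPU) and trimmed to the first N lines (default 20).
# Assumes procps-ng top, where -b is batch mode, -n is the number of
# iterations, and -o names the sort field.
top_snapshot() {
    top -b -n 1 -o "${1:-%CPU}" | head -n "${2:-20}"
}
```

Calling top_snapshot %MEM 15 would show the summary header followed by the heaviest memory users. Inside an interactive session, procps-ng top binds P and M to sorting by CPU and memory, V to the tree view, L to locating a string, and k to killing a process.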

Overview of open files

If you suspect that the problem is not CPU or memory consumption but unusual disk I/O, break out lsof. It’s a tool that I love but don’t use nearly enough. This CLI command lists all files that are currently open. In other words, it lets users view every file that is being read from or written to.

The lsof command has powerful options, too numerous to cover in detail, for limiting the types of files it lists. One of my favorites is the “-u” flag, which includes or excludes files according to the user whose processes have them open. If you have a set of obscure processes (perhaps spotted in top), you can use the “-p” flag to look up just those processes (by PID) and see the files they’re working on.

My favorite way to make short work of lsof’s output is to pipe it into grep and see what I can find. This way I can search for any pattern present, be it a user, path, or anything else I can think of.
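
As a rough sketch, the filters described above could be wrapped like this. The function names are hypothetical, chosen only for illustration; the -u and -p flags and the leading ^ for exclusion are standard lsof options.

```shell
# Hypothetical helpers around lsof; only the flags themselves are lsof's own.

# Files opened by one user's processes (defaults to the current user).
files_by_user() {
    lsof -u "${1:-$(id -un)}"
}

# Files opened by everyone except the given user; a leading ^ negates.
files_not_by_user() {
    lsof -u "^$1"
}

# Files held by specific processes, given as a comma-separated PID list.
files_by_pid() {
    lsof -p "$1"
}

# Pipe the full listing through grep to search for any pattern.
files_matching() {
    lsof 2>/dev/null | grep -- "$1"
}
```

For example, files_by_pid 1234,5678 would list what those two processes have open, and files_matching /var/log would show anything touching that path. The PIDs and path here are placeholders.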

Don’t mind if I decode

Looking for an overview of all the hardware on your system? Look no further than dmidecode.

Running dmidecode in the shell with superuser privileges will print a summary of your system hardware, listing the make, model, and specifications of the hardware your operating system resides on. This is especially useful if you’re into a more do-it-yourself flavor of Linux, or if you’re trying to make unusual hardware functional.

For example, if you need to install a non-default kernel module, running dmidecode will inform you which device the system detects, and thus which module to add.
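
The full dmidecode dump is long, so it often helps to ask for one section or one value at a time. A minimal sketch, assuming sudo is available for the required privileges; hw_section and hw_string are hypothetical helper names, while -t and -s are dmidecode’s own options.

```shell
# Hypothetical helpers around dmidecode, which needs superuser privileges.

# Print one section of the DMI table by type keyword (default: system).
# Other keywords include bios, baseboard, processor, and memory.
hw_section() {
    sudo dmidecode -t "${1:-system}"
}

# Print a single value by string keyword, e.g. system-manufacturer
# or system-product-name.
hw_string() {
    sudo dmidecode -s "$1"
}
```

Running hw_section memory would list the installed memory modules, and hw_string system-product-name would print just the model name.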

Linux is not a destination, it is a process

When things really start to get hairy, you can start digging into the lower-level workings of your system.

The first on our deep dive is the /proc directory. Unlike typical directories that persist between startups with static content, /proc is dynamically populated during startup with information read from the kernel and hardware, continuously updated during operation, and disappears on shutdown. Since everything here is treated as a file, users only need to read the files to see what has been written to them.

I could certainly stand to be better acquainted with what’s here, but poking around has yielded some interesting finds. For example, you can see the mounting options for all your physical drives. You can also get counts for failed kernel operations such as hangs and panics. You can even view all the hardware drivers loaded at startup.

To give a more concrete example: I once found myself dumping /proc/scsi/device_info to check why an inserted SCSI interface was not detected. You may have to get a little creative with /proc, but it won’t disappoint if you do.
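
Because /proc entries read like ordinary files, exploring them takes nothing more than cat. The sketch below uses a hypothetical wrapper, show_proc, that simply reports when an entry is missing, since which files exist depends on the kernel and its configuration.

```shell
# Hypothetical helper: print a /proc entry if it is readable,
# otherwise note on stderr that this system does not expose it.
show_proc() {
    if [ -r "/proc/$1" ]; then
        cat "/proc/$1"
    else
        echo "/proc/$1 is not available on this system" >&2
    fi
}

show_proc version   # kernel version and build information
show_proc mounts    # mounted filesystems with their mount options
show_proc modules   # kernel modules currently loaded
```

The same pattern works for any other entry, such as the scsi/device_info file mentioned above.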

Get the ‘dmesg’

Speaking of the kernel, you can find out exactly what it’s up to by running dmesg with superuser privileges. This will output the kernel log to your console in chronological order from boot. If the kernel ever tried to work with some hardware and came up short, it will have recorded the failure here.

While you probably won’t have to resort to dmesg very often, it’s a command every Linux user should know, purely for the speed with which it lets you figure out hardware problems. It’s also the command forum regulars will ask you to run, so they can learn what they need to point you in the right direction.
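
The kernel log can run to thousands of lines, so filtering helps. A minimal sketch, assuming the util-linux version of dmesg and sudo for privileges; kernel_problems and kernel_grep are hypothetical helper names.

```shell
# Hypothetical helpers around dmesg (util-linux).

# Show only errors and warnings, with human-readable timestamps (-T).
# Note that -T timestamps can drift after suspend and resume.
kernel_problems() {
    sudo dmesg -T --level=err,warn
}

# Search the whole kernel log, case-insensitively, for a pattern
# such as a device or driver name.
kernel_grep() {
    sudo dmesg -T | grep -i -- "$1"
}
```

For instance, kernel_grep usb would pull out every line mentioning USB, which is often the first thing to check after plugging in a device that fails to appear.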

Linux is packed with all kinds of great system diagnostic tools, but when something goes wrong on your system, you probably won’t go wrong starting with the ones above.