A surprisingly useful tool: lsof

I usually look for problems in code or in a system using logs or monitoring metrics, displayed on nice dashboards with a simple, intuitive interface. But if for some reason the data never reaches the dashboard, or a service's logs are unavailable, debugging gets harder. Such problems are rare nowadays, but they do happen. That is why it is still valuable to know the tools that help you understand what is wrong with a particular process on a particular machine.

When I debug something that has no logs or metrics, I connect to the remote machine over ssh. Of course, this approach has its limits, it is not that simple, and it does not fit the fashionable DevOps trends or the modern practices you can read about on the Internet, but it suits me surprisingly well when I need to assess a situation quickly.

This is, in effect, much like using print statements when debugging programs. I should say right away that I am not an SRE or an operations engineer. My main line of work is development.

Sometimes I have to deploy the code I wrote and debug it when something goes wrong. Almost always, when I land on a system that is new to me, the hardest part is finding things. For example, finding out which port a process is listening on. Or, more often, finding out which file a certain daemon writes its logs to. And even when I manage to answer these questions with a bunch of calls to ps, pstree and ls, and a great many calls to grep, the "answers" I find are often either useless or wrong.

If this were a talk by Raymond Hettinger, a core developer of CPython, this would be the moment when the audience expects the phrase: "There must be a better way."

And there is indeed such a way. The tool I constantly use to find what I need on a system is called lsof.

The lsof utility (its name is pronounced el-soff, although some prefer liss-off or even el-es-o-eff) is an incredibly useful command that lists all open files (LiSt Open Files).

The lsof command is especially good for finding things because in Unix-like systems everything is a file. It is a surprisingly versatile debugging tool that can easily replace ps, netstat, and several other utilities.
lsof options
A veteran SRE, who had been doing this job for decades before the term "SRE" appeared, once told me: "I stopped studying lsof options as soon as I had learned all the ones I needed. Learn the most important ones, and that will be everything you will ever need."

The utility has a wide range of options.

lsof - list open files
lsof [-?abChKlnNOPRtUvVX] [-A A] [-c c] [+c c] [+|-d d] [+|-D D] [+|-e s] [+|-f [cfgGn]] [-F [f]] [-g [s]] [-i [i]] [-k k] [+|-L [l]] [+|-m m] [+|-M] [-o [o]] [-p s] [+|-r [t[m<fmt>]]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-z [z]] [-Z [Z]] [--] [names]
If you want to learn them all, the man page will help you. Here I would like to talk about the ones I usually use.
▍ Option -u
The -u option lists the files opened by a specific user. The following example shows how to find out how many files the user cindy has open.

$ lsof -u cindy | wc -l
As a rule, putting a "^" (caret) in front of an option's argument negates it, so files matching that argument are excluded from the output. Here, for example, is how to find the number of files held open by all users except cindy.

$ lsof -u^cindy | wc -l
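One detail worth keeping in mind with these counts: lsof prints a header line first, so piping straight into wc -l counts one line too many. A minimal sketch of a helper that skips the header follows; the function name count_open_files is made up for this illustration, and the user name is only an example.

```shell
# count_open_files USER: number of open-file entries lsof reports for USER.
# The first line of lsof output is a column header, so it is skipped.
# (count_open_files is a hypothetical helper name, not part of lsof.)
count_open_files() {
    lsof -u "$1" 2>/dev/null | awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Usage:
#   count_open_files cindy
#   count_open_files '^cindy'   # everyone except cindy; quoted for safety
```

Without root privileges lsof may not be able to inspect processes that belong to other users, so the numbers can come out lower than expected.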
▍ Option -U
The -U option lists all Unix domain socket files.

$ lsof -U | head -5
init 1 root 7u unix 0xffff88086a171f80 0t0 24598 @/com/ubuntu/upstart
init 1 root 9u unix 0xffff88046a22b480 0t0 22701 socket
init 1 root 10u unix 0xffff88086a351180 0t0 39003 @/com/ubuntu/upstart
init 1 root 11u unix 0xffff880469006580 0t0 16510 @/com/ubuntu/upstart
▍ Option -c
The -c option lists the files held open by processes running commands whose names begin with the specified characters. For example, this command shows the first 15 lines of output for all Python processes running on the machine.

$ lsof -cpython | head -15
python2.7 16905 root cwd DIR 9,1 4096 271589387 /home/cindy/sourcebox
python2.7 16905 root rtd DIR 9,1 4096 2048 /
python2.7 16905 root txt REG 9,1 3345416 268757001 /usr/bin/python2.7
python2.7 16905 root mem REG 9,1 11152 1610852447 /usr/lib/python2.7/lib-dynload/
python2.7 16905 root mem REG 9,1 101240 1610899495 /lib/x86_64-linux-gnu/
python2.7 16905 root mem REG 9,1 22952 1610899509 /lib/x86_64-linux-gnu/
python2.7 16905 root mem REG 9,1 47712 1610899515 /lib/x86_64-linux-gnu/
python2.7 16905 root mem REG 9,1 33448 1610852462 /usr/lib/python2.7/lib-dynload/
python2.7 16905 root mem REG 9,1 54064 1610852477 /usr/lib/python2.7/lib-dynload/
python2.7 16905 root mem REG 9,1 18936 1610619044 /lib/x86_64-linux-gnu/
python2.7 16905 root mem REG 9,1 30944 1207967802 /usr/lib/x86_64-linux-gnu/
python2.7 16905 root mem REG 9,1 136232 1610852472 /usr/lib/python2.7/lib-dynload/
python2.7 16905 root mem REG 9,1 77752 1610852454 /usr/lib/python2.7/lib-dynload/
python2.7 16905 root mem REG 9,1 387256 1610620979 /lib/x86_64-linux-gnu/
Here is another interesting example. Suppose a number of Python 2.7 and Python 3.6 processes are running, and you need to find out which files are open by the Python processes that are not Python 2.7. You can do it like this:

$ lsof -cpython -c^python2.7 | head -10
python 20017 root cwd DIR 9,1 4096 2048 /
python 20017 root rtd DIR 9,1 4096 2048 /
python 20017 root txt REG 9,1 3345416 268757001 /usr/bin/python2.7
python 20017 root mem REG 9,1 11152 1610852447 /usr/lib/python2.7/lib-dynload/
python 20017 root mem REG 9,1 6256 805552236 /usr/lib/python2.7/dist-packages/
python 20017 root mem REG 9,1 14768 805552237 /usr/lib/python2.7/dist-packages/
python 20017 root mem REG 9,1 10592 805451779 /usr/lib/python2.7/dist-packages/Crypto/Util/
python 20017 root mem REG 9,1 11176 1744859170 /usr/lib/python2.7/dist-packages/Crypto/Cipher/
python 20017 root mem REG 9,1 23560 1744859162 /usr/lib/python2.7/dist-packages/Crypto/Cipher/
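When several processes match a -c prefix, it can help to see how many entries each of them has. A rough sketch, assuming the default column layout in which the PID is the second field; the helper name fds_per_pid is invented for this example.

```shell
# fds_per_pid PREFIX: for every process whose command name starts with
# PREFIX, print "PID COUNT", where COUNT is the number of lines lsof
# reports for that PID (cwd, txt and mem entries included).
# (fds_per_pid is a hypothetical helper name, not part of lsof.)
fds_per_pid() {
    lsof -c "$1" 2>/dev/null |
        awk 'NR > 1 { n[$2]++ } END { for (p in n) print p, n[p] }' |
        sort -n
}

# Usage:
#   fds_per_pid python
```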
▍ Option +d
The +d option shows which files and folders are open in a given directory (but not in its subdirectories).

$ lsof +d /usr/bin | head -4
circusd 1351 root txt REG 9,1 3345416 268757001 /usr/bin/python2.7
docker 1363 root txt REG 9,1 19605520 270753792 /usr/bin/docker
runsvdir 1597 root txt REG 9,1 17144 272310314 /usr/bin/runsvdir
▍ Option -d
The -d option is probably one of those I use most often. It is second only to the -p option. It takes a comma-separated list of file descriptors to include in the output or exclude from it. Here is what the documentation says about it:

The list is an exclusion set if all entries in the set begin with "^". It is an inclusion set if no entry begins with "^". Mixed lists are not permitted.

A range of file descriptor numbers may be in the set as long as neither member is empty, both members are numbers, and the ending member is larger than the starting one, for example "0-7" or "3-10".

Ranges may be specified for exclusion if they have the "^" prefix; for example, "^0-7" excludes all descriptors from 0 through 7.

Multiple file descriptor numbers are joined into a single ORed set before taking part in AND option selection.

When there are both exclusion and inclusion members in the set, lsof reports an error and exits with a non-zero return code.
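To make the quoted rules more concrete, here is a small sketch that combines -d with -a and -p. The two function names are made up for this illustration; the descriptor numbers rely on the usual convention that 0, 1 and 2 are stdin, stdout and stderr.

```shell
# log_targets PID: show what stdout (fd 1) and stderr (fd 2) of PID
# point at. -a ANDs the -p and -d selections; without it lsof ORs them.
# (log_targets and beyond_std_streams are hypothetical helper names.)
log_targets() {
    lsof -a -p "$1" -d 1,2
}

# beyond_std_streams PID: every entry for PID except descriptors 0-2.
# The exclusion set is quoted so the shell never interprets the "^".
beyond_std_streams() {
    lsof -a -p "$1" -d '^0-2'
}

# Usage:
#   log_targets 16905
#   beyond_std_streams 16905
```

The first form is handy when the question is simply "where does this daemon write its output".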
▍ Option -p
I cannot remember the last time I used lsof without the -p option. It lists all files opened by the process whose PID is given on the command line.

For example, here is what the output looks like on Ubuntu for the process with PID 1.

Output of the lsof command that was called with the -p option in Ubuntu

And this is what it looks like on my MacBook Air.

Output of the lsof command that was called with the -p option on the MacBook Air
▍ Option -P
The -P option suppresses the conversion of port numbers to port names for network files. It is useful when port name lookup is not working properly.

It can be combined with another option, -n, which suppresses the conversion of network numbers to host names for network files. That one is useful when host name lookup is not working properly.

Suppressing both conversions can sometimes make lsof run faster.
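A typical place where -n and -P pay off is the "which port is this thing listening on" question from the beginning of the article. A minimal sketch, assuming the -i and -s [p:s] options from the synopsis above; the function name listening_tcp is invented for this example.

```shell
# listening_tcp: list TCP sockets in the LISTEN state, with numeric
# addresses (-n) and numeric ports (-P) so no name lookups are done.
# (listening_tcp is a hypothetical helper name, not part of lsof.)
listening_tcp() {
    lsof -nP -iTCP -sTCP:LISTEN
}

# Usage:
#   listening_tcp
#   listening_tcp | grep circusd
```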
▍ Option -i
The -i option allows you to display information about files whose Internet addresses correspond to the specified address. If you do not specify addresses when calling the command, this option allows you to display information about all Internet sockets and network files.

With lsof you can, for example, look at the TCP connections opened by the Slack or Dropbox client. Out of curiosity, try counting how many connections are opened by Chrome tabs, each of which is a separate process. Let's look at the connections opened by Slack:

$ lsof -i -a -u $USER | grep Slack
Displays information about the connections that Slack opened

And here is what lsof reports about the TCP sockets opened by the Dropbox client:

Displays the connection information that Dropbox opened

lsof can also show information about UDP connections, using the command lsof -iUDP.

Displays information about UDP connections

Using the command lsof -i 6, you can list the open IPv6 connections.

Displays information about IPv6 connections
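For the "how many connections does each program keep open" experiment, a short sketch along these lines may help. It assumes the default column layout in which the command name is the first field; both function names are made up for this illustration.

```shell
# connections_per_command: count established TCP connections per
# command name and print "COMMAND COUNT", sorted by command name.
# (Both helper names below are hypothetical, not part of lsof.)
connections_per_command() {
    lsof -nP -iTCP -sTCP:ESTABLISHED 2>/dev/null |
        awk 'NR > 1 { n[$1]++ } END { for (c in n) print c, n[c] }' |
        sort
}

# port_users PORT: list Internet files whose address uses PORT.
port_users() {
    lsof -nP -i":$1"
}

# Usage:
#   connections_per_command
#   port_users 8080
```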
▍ Option -t
The -t option suppresses all output except process IDs. I often use it when I want to pipe the list of PIDs to another command, mostly kill -9.

$ lsof -t /var/log/dummy_svc.log
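A cautious sketch of that pattern is shown below. It sends the default SIGTERM rather than -9, which is a deliberate substitution so that processes get a chance to clean up; the function name kill_holders is invented for this example.

```shell
# kill_holders FILE: send SIGTERM to every process that has FILE open.
# lsof -t prints bare PIDs, one per line, and exits non-zero when it
# finds nothing, hence the "|| true".
# (kill_holders is a hypothetical helper name, not part of lsof.)
kill_holders() {
    pids=$(lsof -t "$1" 2>/dev/null) || true
    if [ -n "$pids" ]; then
        # Intentionally unquoted: each PID becomes its own argument.
        kill $pids
    fi
}

# Usage:
#   kill_holders /var/log/dummy_svc.log
```

It is worth running lsof -t on its own first and reading the PID list before handing it to kill.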
Combining options
Normally, lsof combines the results of several options following the logical OR principle. If you specify the -a option, the results are combined with a logical AND.

Of course, there are a few exceptions to this rule, and as usual it is best to check the documentation, but in a nutshell it works like this:

List options are normally ORed; that is, specifying the -i option without an address together with the -u foo option produces a list of all network files OR files belonging to processes owned by the user "foo". There are a few exceptions to this rule:

User name or user ID (UID) with a "^" sign (negation), specified with the -u option;

The process identifier (PID) with a "^" sign (negation), specified with the -p option;

Process group (PGID) with a "^" sign (negation), specified with the -g option;

The name of the command with a "^" sign (negation), specified with the -c option;

The names of TCP or UDP protocol states with a "^" sign (negation), specified with the -s [p:s] option.

Since all of these are exclusions, they are applied without ORing or ANDing and take effect before any other selection criteria are applied.

The -a option can be used to AND the selections. For example, specifying the -a, -U and -u foo options lists only the UNIX socket files that belong to processes owned by the user "foo".
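The difference between the two modes is easiest to see side by side. A minimal sketch, using the user name from the quoted documentation as a placeholder; the function names are made up for this illustration.

```shell
# net_or_user USER: network files OR files of processes owned by USER
# (the default ORed behaviour).
# (All three helper names are hypothetical, not part of lsof.)
net_or_user() {
    lsof -i -u "$1"
}

# net_and_user USER: only network files of processes owned by USER.
net_and_user() {
    lsof -a -i -u "$1"
}

# unix_sockets_of USER: only UNIX socket files of processes owned by USER.
unix_sockets_of() {
    lsof -a -U -u "$1"
}

# Usage:
#   net_or_user foo | wc -l
#   net_and_user foo | wc -l    # usually a much smaller number
```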
The story of a big victory
Perhaps I am exaggerating a bit here and the "victory" was not that big, but when it happened, lsof was very helpful.

A couple of weeks ago I needed to bring up an instance of a new service in a test environment. The test service was not hooked up to the production monitoring infrastructure. I was trying to find out why the freshly launched process did not register itself with Consul, which meant other services could not discover it. "Well, I don't know what's wrong, but I'll look at the logs," I thought. When something does not work as expected, I look at the logs of the service I am trying to fix, and in most cases they point straight at the root of the problem.

The service in question was started by circus, a process and socket manager. Logs for processes running under circus are stored in a special place on the host; let's call it /var/log/circusd. Newer services on the host were launched by another manager, s6, which writes logs to a different location. Then there are also the logs generated by socklog/svlogd, which, again, live somewhere else. In short, there was no shortage of logs, and the main problem was to find out which file my crashing process was writing its logs to.

Since I knew that the process I was trying to figure out was running under circus, running tail on /var/log/circusd/whatever_tab_completion_suggested should have let me see the stdout and stderr streams of this process. However, reading the log told me absolutely nothing. It quickly became clear that I was reading the wrong log file, and indeed, on closer examination it turned out that there were two files in /var/log/circusd: stage-svcname-stderr.log and staging-svcname.stderr.log. I had used the Tab key to autocomplete the file name, and the file that was picked automatically was not the one I needed.

One way to find out which file the process was actually logging to would have been the command lsof -l filename, which displays every process that has the given file open. It turned out that no running process was associated with the log file I had been viewing with tail, which meant that this file could safely be deleted.
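The same check can be run over a whole log directory. A rough sketch, assuming the logs end in .log as in the file names above; the function name unused_logs is invented for this example.

```shell
# unused_logs DIR: print every *.log file in DIR that no process
# currently has open, according to lsof -t (which prints only PIDs).
# (unused_logs is a hypothetical helper name, not part of lsof.)
unused_logs() {
    for f in "$1"/*.log; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if [ -z "$(lsof -t "$f" 2>/dev/null)" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage:
#   unused_logs /var/log/circusd
```

Without root privileges lsof may not see files opened by other users' processes, so a file reported here is only a candidate for deletion, not a certainty.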

Looking at the other file immediately showed why the process was crashing (circus kept restarting it after each crash, which led to an endless crash-restart loop).
The more I use lsof, the more other tools it replaces and the more useful things it lets me learn. I hope lsof now has a chance to be useful to you as well.

Dear readers! Which lesser-known Linux command-line tools do you use most often?
KlauS 20 october 2017, 15:22