Commit 315c4e45 authored by Thorsten Leemhuis, committed by Jonathan Corbet

docs: reporting-issues.rst: explain how to decode stack traces

Replace placeholder text about decoding stack traces with a section that
properly describes what a typical user should do these days. To make
it work for them, add a paragraph in an earlier section to ensure
people build their kernels with everything that's needed to decode stack
traces later.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20210215172857.382285-1-linux@leemhuis.info
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 69218034
@@ -154,8 +154,8 @@ After these preparations you'll now enter the main part:
 that hear about it for the first time. And if you learned something in this
 process, consider searching again for existing reports about the issue.
-* If the failure includes a stack dump, like an Oops does, consider decoding
-it to find the offending line of code.
+* If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
+decoding the kernel log to find the line of code that triggered the error.
 * If your problem is a regression, try to narrow down when the issue was
 introduced as much as possible.
@@ -869,6 +869,19 @@ pick up the configuration of your current kernel and then tries to adjust it
 somewhat for your system. That does not make the resulting kernel any better,
 but quicker to compile.
+Note: If you are dealing with a panic, Oops, warning, or BUG from the kernel,
+please try to enable CONFIG_KALLSYMS when configuring your kernel.
+Additionally, enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO, too; the
+latter is the relevant one of those two, but can only be reached if you enable
+the former. Be aware CONFIG_DEBUG_INFO increases the storage space required to
+build a kernel by quite a bit. But that's worth it, as these options will allow
+you later to pinpoint the exact line of code that triggers your issue. The
+section 'Decode failure messages' below explains this in more detail.
+But keep in mind: Always keep a record of the issue encountered in case it is
+hard to reproduce. Sending an undecoded report is better than not reporting
+the issue at all.
 Check 'taint' flag
 ------------------
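To illustrate the configuration note added in this hunk: a minimal sketch, not part of the patch, of a shell function that checks whether a kernel configuration file enables the three options the note names. The function name `check_decode_options` is hypothetical, and the sketch assumes the options appear as `CONFIG_...=y` lines, as they do in a typical `.config`.

```shell
# Hypothetical helper (not from the patch): report which of the options
# named in the note are built in (=y) in the given kernel .config file.
check_decode_options() {
    config_file="$1"
    missing=0
    for opt in CONFIG_KALLSYMS CONFIG_DEBUG_KERNEL CONFIG_DEBUG_INFO; do
        # Assumption: an enabled option shows up as a line "CONFIG_FOO=y".
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    # Non-zero exit status when at least one option is not enabled.
    return "$missing"
}
```

Called as `check_decode_options .config` in the build directory, it lists each option and returns a non-zero status if any of them is not enabled.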
@@ -923,31 +936,55 @@ instead you can join.
 Decode failure messages
 -----------------------
-.. note::
-FIXME: The text in this section is a placeholder for now and quite similar to
-the old text found in 'Documentation/admin-guide/reporting-bugs.rst'
-currently. It and the document it references are known to be outdated and
-thus need to be revisited. Thus consider this note a request for help: if you
-are familiar with this topic, please write a few lines that would fit here.
-Alternatively, simply outline the current situation roughly to the main
-authors of this document (see intro), as they might be able to write
-something then.
-This section in the end should answer questions like "when is this actually
-needed", "what .config options to ideally set earlier to make this step easy
-or unnecessary?" (likely CONFIG_UNWINDER_ORC when it's available, otherwise
-CONFIG_UNWINDER_FRAME_POINTER; but is there anything else needed?).
-..
+*If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
+decoding the kernel log to find the line of code that triggered the error.*
+When the kernel detects an internal problem, it will log some information about
+the executed code. This makes it possible to pinpoint the exact line in the
+source code that triggered the issue and shows how it was called. But that only
+works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
+your kernel. If you did so, consider to decode the information from the
+kernel's log. That will make it a lot easier to understand what lead to the
+'panic', 'Oops', 'warning', or 'BUG', which increases the chances that someone
+can provide a fix.
+Decoding can be done with a script you find in the Linux source tree. If you
+are running a kernel you compiled yourself earlier, call it like this::
+[user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
+If you are running a packaged vanilla kernel, you will likely have to install
+the corresponding packages with debug symbols. Then call the script (which you
+might need to get from the Linux sources if your distro does not package it)
+like this::
+[user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh \
+/usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux /usr/src/kernels/5.10.10-4.1.x86_64/
+The script will work on log lines like the following, which show the address of
+the code the kernel was executing when the error occurred::
+[ 68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
+Once decoded, these lines will look like this::
+[ 68.387301] RIP: 0010:test_module_init (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
+In this case the executed code was built from the file
+'~/linux-5.10.5/test-module/test-module.c' and the error occurred by the
+instructions found in line '16'.
-*If the failure includes a stack dump, like an Oops does, consider decoding
-it to find the offending line of code.*
+The script will similarly decode the addresses mentioned in the section
+starting with 'Call trace', which show the path to the function where the
+problem occurred. Additionally, the script will show the assembler output for
+the code section the kernel was executing.
-When the kernel detects an error, it will print a stack dump that allows to
-identify the exact line of code where the issue happens. But that information
-sometimes needs to get decoded to be readable, which is explained in
-admin-guide/bug-hunting.rst.
+Note, if you can't get this to work, simply skip this step and mention the
+reason for it in the report. If you're lucky, it might not be needed. And if it
+is, someone might help you to get things going. Also be aware this is just one
+of several ways to decode kernel stack traces. Sometimes different steps will
+be required to retrieve the relevant details. Don't worry about that, if that's
+needed in your case, developers will tell you what to do.
 Special care for regressions
......
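The undecoded log line quoted in the patch uses the kernel's `symbol+offset/size [module]` notation: the offset into the function, the function's total size, and the module it belongs to, if any. A small sketch, not part of the patch, that splits such a line into those parts; the function name `parse_rip_line` and the output format are made up for illustration, and the pattern assumes the `RIP:` prefix used on x86.

```shell
# Hypothetical helper (not from the patch): split an undecoded
# "RIP: <segment>:<symbol>+<offset>/<size> [<module>]" log line into its parts.
parse_rip_line() {
    printf '%s\n' "$1" | sed -n -E \
        's/.*RIP: [0-9a-f]+:([A-Za-z0-9_.]+)\+(0x[0-9a-f]+)\/(0x[0-9a-f]+)( \[([A-Za-z0-9_]+)\])?.*/symbol=\1 offset=\2 size=\3 module=\5/p'
}

# Example with the log line quoted in the patch.
parse_rip_line '[ 68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]'
```

The symbol name and offset identify where in the function the kernel was executing; `decode_stacktrace.sh` resolves the same information to a source file and line when debug information is available.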