Kan Liang [Tue, 8 Oct 2019 15:50:02 +0000 (08:50 -0700)]
x86/cpu: Add Comet Lake to the Intel CPU models header
[ Upstream commit
8d7c6ac3b2371eb1cbc9925a88f4d10efff374de ]
Comet Lake is the new 10th Gen Intel processor. Add two new CPU model
numbers to the Intel family list.
The CPU model numbers are not published in the SDM yet but they come
from an authoritative internal source.
[ bp: Touch up commit message. ]
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: ak@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yunfeng Ye [Sun, 29 Sep 2019 04:44:17 +0000 (12:44 +0800)]
arm64: armv8_deprecated: Checking return value for memory allocation
[ Upstream commit
3e7c93bd04edfb0cae7dad1215544c9350254b8f ]
There are no return value checking when using kzalloc() and kcalloc() for
memory allocation. so add it.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Austin Kim [Tue, 3 Sep 2019 03:30:19 +0000 (12:30 +0900)]
btrfs: silence maybe-uninitialized warning in clone_range
[ Upstream commit
431d39887d6273d6d84edf3c2eab09f4200e788a ]
GCC throws warning message as below:
‘clone_src_i_size’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
^
fs/btrfs/send.c:5088:6: note: ‘clone_src_i_size’ was declared here
u64 clone_src_i_size;
^
The clone_src_i_size is only used as call-by-reference
in a call to get_inode_info().
Silence the warning by initializing clone_src_i_size to 0.
Note that the warning is a false positive and reported by older versions
of GCC (eg. 7.x) but not eg 9.x. As there have been numerous people, the
patch is applied. Setting clone_src_i_size to 0 does not otherwise make
sense and would not do any action in case the code changes in the future.
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jia-Ju Bai [Mon, 7 Oct 2019 00:57:57 +0000 (17:57 -0700)]
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
[ Upstream commit
2abb7d3b12d007c30193f48bebed781009bebdd2 ]
In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
to check whether inode_alloc is NULL:
if (inode_alloc)
When inode_alloc is NULL, it is used on line 287:
ocfs2_inode_lock(inode_alloc, &bh, 0);
ocfs2_inode_lock_full_nested(inode, ...)
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
Thus, a possible null-pointer dereference may occur.
To fix this bug, inode_alloc is checked on line 286.
This bug is found by a static analysis tool STCheck written by us.
Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jia-Ju Bai [Mon, 7 Oct 2019 00:57:54 +0000 (17:57 -0700)]
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
[ Upstream commit
583fee3e12df0e6f1f66f063b989d8e7fed0e65a ]
In ocfs2_write_end_nolock(), there are an if statement on lines 1976,
2047 and 2058, to check whether handle is NULL:
if (handle)
When handle is NULL, it is used on line 2045:
ocfs2_update_inode_fsync_trans(handle, inode, 1);
oi->i_sync_tid = handle->h_transaction->t_tid;
Thus, a possible null-pointer dereference may occur.
To fix this bug, handle is checked before calling
ocfs2_update_inode_fsync_trans().
This bug is found by a static analysis tool STCheck written by us.
Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jia-Ju Bai [Mon, 7 Oct 2019 00:57:50 +0000 (17:57 -0700)]
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
[ Upstream commit
56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d ]
In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:
if (loc->xl_entry)
When loc->xl_entry is NULL, it is used on line 2158:
ocfs2_xa_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
and line 2164:
ocfs2_xa_add_namevalue(loc, xi);
loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
loc->xl_entry->xe_name_len = xi->xi_name_len;
Thus, possible null-pointer dereferences may occur.
To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.
These bugs are found by a static analysis tool STCheck written by us.
[akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jia Guo [Mon, 7 Oct 2019 00:57:47 +0000 (17:57 -0700)]
ocfs2: clear zero in unaligned direct IO
[ Upstream commit
7a243c82ea527cd1da47381ad9cd646844f3b693 ]
Unused portion of a part-written fs-block-sized block is not set to zero
in unaligned append direct write.This can lead to serious data
inconsistencies.
Ocfs2 manage disk with cluster size(for example, 1M), part-written in
one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
VFS(function dio_zero_block) doesn't do the cleaning because bh's state
is not set to NEW in function ocfs2_dio_wr_get_block when we write a
WRITTEN cluster. For example, the cluster size is 1M, file size is 8k
and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
contain dirty data.
We have to deal with two cases:
1.The starting position of direct write is outside the file.
2.The starting position of direct write is located in the file.
We need set bh's state to NEW in the first case. In the second case, we
need mapped twice because bh's state of area out file should be set to
NEW while area in file not.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Boris Ostrovsky [Mon, 30 Sep 2019 20:44:41 +0000 (16:44 -0400)]
x86/xen: Return from panic notifier
[ Upstream commit
c6875f3aacf2a5a913205accddabf0bfb75cac76 ]
Currently execution of panic() continues until Xen's panic notifier
(xen_panic_event()) is called at which point we make a hypercall that
never returns.
This means that any notifier that is supposed to be called later as
well as significant part of panic() code (such as pstore writes from
kmsg_dump()) is never executed.
There is no reason for xen_panic_event() to be this last point in
execution since panic()'s emergency_restart() will call into
xen_emergency_restart() from where we can perform our hypercall.
Nevertheless, we will provide xen_legacy_crash boot option that will
preserve original behavior during crash. This option could be used,
for example, if running kernel dumper (which happens after panic
notifiers) is undesirable.
Reported-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Vincent Chen [Mon, 23 Sep 2019 00:45:16 +0000 (08:45 +0800)]
riscv: Correct the handling of unexpected ebreak in do_trap_break()
[ Upstream commit
8bb0daef64e5a92db63ad1d3bbf9e280a7b3612a ]
For the kernel space, all ebreak instructions are determined at compile
time because the kernel space debugging module is currently unsupported.
Hence, it should be treated as a bug if an ebreak instruction which does
not belong to BUG_TRAP_TYPE_WARN or BUG_TRAP_TYPE_BUG is executed in
kernel space. For the userspace, debugging module or user problem may
intentionally insert an ebreak instruction to trigger a SIGTRAP signal.
To approach the above two situations, the do_trap_break() will direct
the BUG_TRAP_TYPE_NONE ebreak exception issued in kernel space to die()
and will send a SIGTRAP to the trapped process only when the ebreak is
in userspace.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: fixed checkpatch issue]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Vincent Chen [Mon, 23 Sep 2019 00:45:15 +0000 (08:45 +0800)]
riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
[ Upstream commit
e0c0fc18f10d5080cddde0e81505fd3e952c20c4 ]
On RISC-V, when the kernel runs code on behalf of a user thread, and the
kernel executes a WARN() or WARN_ON(), the user thread will be sent
a bogus SIGTRAP. Fix the RISC-V kernel code to not send a SIGTRAP when
a WARN()/WARN_ON() is executed.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[paul.walmsley@sifive.com: fixed subject]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Vincent Chen [Mon, 23 Sep 2019 00:45:14 +0000 (08:45 +0800)]
riscv: avoid kernel hangs when trapped in BUG()
[ Upstream commit
8b04825ed205da38754f86f4c07ea8600d8c2a65 ]
When the CONFIG_GENERIC_BUG is disabled by disabling CONFIG_BUG, if a
kernel thread is trapped by BUG(), the whole system will be in the
loop that infinitely handles the ebreak exception instead of entering the
die function. To fix this problem, the do_trap_break() will always call
the die() to deal with the break exception as the type of break is
BUG_TRAP_TYPE_BUG.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Thomas Bogendoerfer [Sun, 6 Oct 2019 13:12:32 +0000 (15:12 +0200)]
MIPS: include: Mark __cmpxchg as __always_inline
[ Upstream commit
88356d09904bc606182c625575237269aeece22e ]
Commit
ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In cace of cmpxchg this would cause to reference function
__cmpxchg_called_with_bad_pointer, which is a error case
for catching bugs and will not happen for correct code, if
__cmpxchg is inlined.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
[paul.burton@mips.com: s/__cmpxchd/__cmpxchg in subject]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Dave Young [Wed, 2 Oct 2019 16:59:04 +0000 (18:59 +0200)]
efi/x86: Do not clean dummy variable in kexec path
[ Upstream commit
2ecb7402cfc7f22764e7bbc80790e66eadb20560 ]
kexec reboot fails randomly in UEFI based KVM guest. The firmware
just resets while calling efi_delete_dummy_variable(); Unfortunately
I don't know how to debug the firmware, it is also possible a potential
problem on real hardware as well although nobody reproduced it.
The intention of the efi_delete_dummy_variable is to trigger garbage collection
when entering virtual mode. But SetVirtualAddressMap can only run once
for each physical reboot, thus kexec_enter_virtual_mode() is not necessarily
a good place to clean a dummy object.
Drop the efi_delete_dummy_variable so that kexec reboot can work.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Matthew Garrett <mjg59@google.com>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Link: https://lkml.kernel.org/r/20191002165904.8819-8-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Lukas Wunner [Wed, 2 Oct 2019 16:58:58 +0000 (18:58 +0200)]
efi/cper: Fix endianness of PCIe class code
[ Upstream commit
6fb9367a15d1a126d222d738b2702c7958594a5f ]
The CPER parser assumes that the class code is big endian, but at least
on this edk2-derived Intel Purley platform it's little endian:
efi: EFI v2.50 by EDK II BIOS ID:PLYDCRB1.86B.0119.R05.
1701181843
DMI: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0119.R05.
1701181843 01/18/2017
{1}[Hardware Error]: device_id: 0000:5d:00.0
{1}[Hardware Error]: slot: 0
{1}[Hardware Error]: secondary_bus: 0x5e
{1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2030
{1}[Hardware Error]: class_code: 000406
^^^^^^ (should be 060400)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Link: https://lkml.kernel.org/r/20191002165904.8819-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Will Deacon [Fri, 4 Oct 2019 14:44:45 +0000 (15:44 +0100)]
arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally
[ Upstream commit
c71e88c437962c1ec43d4d23a0ebf4c9cf9bee0d ]
KBUILD_CPPFLAGS is defined differently depending on whether the main
compiler is clang or not. This means that it is not possible to build
the compat vDSO with GCC if the rest of the kernel is built with clang.
Define VDSO_CPPFLAGS directly to break this dependency and allow a clang
kernel to build a compat vDSO with GCC:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabihf- CC=clang \
COMPATCC=arm-linux-gnueabihf-gcc
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Will Deacon [Fri, 4 Oct 2019 13:08:13 +0000 (14:08 +0100)]
arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG
[ Upstream commit
24ee01a927bfe56c66429ec4b1df6955a814adc8 ]
Rather than force the use of GCC for the compat cross-compiler, instead
extract the target from CROSS_COMPILE_COMPAT and pass it to clang if the
main compiler is clang.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Adam Ford [Sun, 6 Oct 2019 16:33:12 +0000 (11:33 -0500)]
serial: 8250_omap: Fix gpio check for auto RTS/CTS
[ Upstream commit
fc64f7abbef2dae7ee4c94702fb3cf9a2be5431a ]
There are two checks to see if the manual gpio is configured, but
these the check is seeing if the structure is NULL instead it
should check to see if there are CTS and/or RTS pins defined.
This patch uses checks for those individual pins instead of
checking for the structure itself to restore auto RTS/CTS.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20191006163314.23191-2-aford173@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Adam Ford [Sun, 6 Oct 2019 16:33:11 +0000 (11:33 -0500)]
serial: mctrl_gpio: Check for NULL pointer
[ Upstream commit
37e3ab00e4734acc15d96b2926aab55c894f4d9c ]
When using mctrl_gpio_to_gpiod, it dereferences gpios into a single
requested GPIO. This dereferencing can break if gpios is NULL,
so this patch adds a NULL check before dereferencing it. If
gpios is NULL, this function will also return NULL.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Yegor Yefremov <yegorslists@googlemail.com>
Link: https://lore.kernel.org/r/20191006163314.23191-1-aford173@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Vincenzo Frascino [Thu, 3 Oct 2019 17:48:34 +0000 (18:48 +0100)]
arm64: vdso32: Detect binutils support for dmb ishld
[ Upstream commit
0df2c90eba60791148cee1823c0bf5fc66e3465c ]
Older versions of binutils (prior to 2.24) do not support the "ISHLD"
option for memory barrier instructions, which leads to a build failure
when assembling the vdso32 library.
Add a compilation time mechanism that detects if binutils supports those
instructions and configure the kernel accordingly.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Vincenzo Frascino [Thu, 3 Oct 2019 17:48:33 +0000 (18:48 +0100)]
arm64: vdso32: Fix broken compat vDSO build warnings
[ Upstream commit
e0de01aafc3dd7b73308106b056ead2d48391905 ]
The .config file and the generated include/config/auto.conf can
end up out of sync after a set of commands since
CONFIG_CROSS_COMPILE_COMPAT_VDSO is not updated correctly.
The sequence can be reproduced as follows:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
[...]
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- menuconfig
[set CONFIG_CROSS_COMPILE_COMPAT_VDSO="arm-linux-gnueabihf-"]
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
Which results in:
arch/arm64/Makefile:62: CROSS_COMPILE_COMPAT not defined or empty,
the compat vDSO will not be built
even though the compat vDSO has been built:
$ file arch/arm64/kernel/vdso32/vdso.so
arch/arm64/kernel/vdso32/vdso.so: ELF 32-bit LSB pie executable, ARM,
EABI5 version 1 (SYSV), dynamically linked,
BuildID[sha1]=
c67f6c786f2d2d6f86c71f708595594aa25247f6, stripped
A similar case that involves changing the configuration parameter
multiple times can be reconducted to the same family of problems.
Remove the use of CONFIG_CROSS_COMPILE_COMPAT_VDSO altogether and
instead rely on the cross-compiler prefix coming from the environment
via CROSS_COMPILE_COMPAT, much like we do for the rest of the kernel.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Austin Kim [Tue, 1 Oct 2019 07:34:13 +0000 (16:34 +0900)]
fs: cifs: mute -Wunused-const-variable message
[ Upstream commit
dd19c106a36690b47bb1acc68372f2b472b495b8 ]
After 'Initial git repository build' commit,
'mapping_table_ERRHRD' variable has not been used.
So 'mapping_table_ERRHRD' const variable could be removed
to mute below warning message:
fs/cifs/netmisc.c:120:40: warning: unused variable 'mapping_table_ERRHRD' [-Wunused-const-variable]
static const struct smb_to_posix_error mapping_table_ERRHRD[] = {
^
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Thierry Reding [Wed, 2 Oct 2019 12:28:23 +0000 (14:28 +0200)]
gpio: max77620: Use correct unit for debounce times
[ Upstream commit
fffa6af94894126994a7600c6f6f09b892e89fa9 ]
The gpiod_set_debounce() function takes the debounce time in
microseconds. Adjust the switch/case values in the MAX77620 GPIO to use
the correct unit.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20191002122825.3948322-1-thierry.reding@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jason Gunthorpe [Tue, 1 Oct 2019 15:38:21 +0000 (12:38 -0300)]
RDMA/mlx5: Add missing synchronize_srcu() for MW cases
[ Upstream commit
0417791536ae1e28d7f0418f1d20048ec4d3c6cf ]
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.
Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.
This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.
Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()
Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jason Gunthorpe [Tue, 1 Oct 2019 15:38:19 +0000 (12:38 -0300)]
RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
[ Upstream commit
aa116b810ac9077a263ed8679fb4d595f180e0eb ]
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:
CPU0 CPU1
dereg_mr()
mlx5_ib_advise_mr_prefetch()
srcu_read_lock()
num_pending_prefetch_inc()
if (!live)
live = 0
atomic_read() == 0
// skip flush_workqueue()
atomic_inc()
queue_work();
srcu_read_unlock()
WARN_ON(atomic_read()) // Fails
Swap the order so that the synchronize_srcu() prevents this.
Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jason Gunthorpe [Tue, 1 Oct 2019 15:38:16 +0000 (12:38 -0300)]
RDMA/mlx5: Do not allow rereg of a ODP MR
[ Upstream commit
880505cfef1d086d18b59d2920eb2160429ffa1f ]
This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Leon Romanovsky [Wed, 2 Oct 2019 11:56:27 +0000 (14:56 +0300)]
RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
[ Upstream commit
594e6c5d41ed2471ab0b90f3f0b66cdf618b7ac9 ]
Properly unwind QP counter rebinding in case of failure.
Trying to rebind the counter after unbiding it is not going to work
reliably, move the unbind to the end so it doesn't have to be unwound.
Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink")
Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jack Morgenstein [Mon, 16 Sep 2019 07:11:51 +0000 (10:11 +0300)]
RDMA/cm: Fix memory leak in cm_add/remove_one
[ Upstream commit
94635c36f3854934a46d9e812e028d4721bbb0e6 ]
In the process of moving the debug counters sysfs entries, the commit
mentioned below eliminated the cm_infiniband sysfs directory.
This sysfs directory was tied to the cm_port object allocated in procedure
cm_add_one().
Before the commit below, this cm_port object was freed via a call to
kobject_put(port->kobj) in procedure cm_remove_port_fs().
Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated.
This, however, meant that kfree was never called for the cm_port buffers.
Fix this by adding explicit kfree(port) calls to functions cm_add_one()
and cm_remove_one().
Note: the kfree call in the first chunk below (in the cm_add_one error
flow) fixes an old, undetected memory leak.
Fixes: c87e65cfb97c ("RDMA/cm: Move debug counters to be under relevant IB device")
Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christophe JAILLET [Sun, 18 Aug 2019 09:10:44 +0000 (11:10 +0200)]
RDMA/core: Fix an error handling path in 'res_get_common_doit()'
[ Upstream commit
ab59ca3eb4e7059727df85eee68bda169d26c8f8 ]
According to surrounding error paths, it is likely that 'goto err_get;' is
expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be
missing.
Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback")
Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Navid Emamdoost [Wed, 25 Sep 2019 15:27:41 +0000 (10:27 -0500)]
misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
[ Upstream commit
fc739a058d99c9297ef6bfd923b809d85855b9a9 ]
In fastrpc_dma_buf_attach if dma_get_sgtable fails the allocated memory
for a should be released.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Link: https://lore.kernel.org/r/20190925152742.16258-1-navid.emamdoost@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Randy Dunlap [Tue, 1 Oct 2019 02:15:12 +0000 (19:15 -0700)]
tty: n_hdlc: fix build on SPARC
[ Upstream commit
47a7e5e97d4edd7b14974d34f0e5a5560fad2915 ]
Fix tty driver build on SPARC by not using __exitdata.
It appears that SPARC does not support section .exit.data.
Fixes these build errors:
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 063246641d4a ("format-security: move static strings to const")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/675e7bd9-955b-3ff3-1101-a973b58b5b75@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christoph Hellwig [Tue, 10 Sep 2019 05:59:23 +0000 (07:59 +0200)]
serial/sifive: select SERIAL_EARLYCON
[ Upstream commit
7e2a165de5a52003d10a611ee3884cdb5c44e8cd ]
The sifive serial driver implements earlycon support, but unless
another driver is built in that supports earlycon support it won't
be usable. Explicitly select SERIAL_EARLYCON instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20190910055923.28384-1-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christophe JAILLET [Tue, 10 Sep 2019 04:17:02 +0000 (06:17 +0200)]
tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
[ Upstream commit
5080d127127ac5b610b57900774d9559ae55e817 ]
'exit' functions should be marked as __exit, not __init.
Fixes: c10b13325ced ("tty: serial: Add RDA8810PL UART driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20190910041702.7357-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christophe JAILLET [Tue, 10 Sep 2019 04:11:29 +0000 (06:11 +0200)]
tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
[ Upstream commit
6264dab6efd6069f0387efb078a9960b5642377b ]
'exit' functions should be marked as __exit, not __init.
Fixes: fc60a8b675bd ("tty: serial: owl: Implement console driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20190910041129.6978-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
James Morse [Wed, 2 Oct 2019 09:49:35 +0000 (10:49 +0100)]
arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #
1542419
[ Upstream commit
dd8a1f13488438c6c220b7cafa500baaf21a6e53 ]
CPUs affected by Neoverse-N1 #
1542419 may execute a stale instruction if
it was recently modified. The affected sequence requires freshly written
instructions to be executable before a branch to them is updated.
There are very few places in the kernel that modify executable text,
all but one come with sufficient synchronisation:
* The module loader's flush_module_icache() calls flush_icache_range(),
which does a kick_all_cpus_sync()
* bpf_int_jit_compile() calls flush_icache_range().
* Kprobes calls aarch64_insn_patch_text(), which does its work in
stop_machine().
* static keys and ftrace both patch between nops and branches to
existing kernel code (not generated code).
The affected sequence is the interaction between ftrace and modules.
The module PLT is cleaned using __flush_icache_range() as the trampoline
shouldn't be executable until we update the branch to it.
Drop the double-underscore so that this path runs kick_all_cpus_sync()
too.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
James Morse [Thu, 3 Oct 2019 17:01:27 +0000 (18:01 +0100)]
arm64: Fix incorrect irqflag restore for priority masking for compat
[ Upstream commit
f46f27a576cc3b1e3d45ea50bc06287aa46b04b2 ]
Commit
bd82d4bd2188 ("arm64: Fix incorrect irqflag restore for priority
masking") added a macro to the entry.S call paths that leave the
PSTATE.I bit set. This tells the pPNMI masking logic that interrupts
are masked by the CPU, not by the PMR. This value is read back by
local_daif_save().
Commit
bd82d4bd2188 added this call to el0_svc, as el0_svc_handler
is called with interrupts masked. el0_svc_compat was missed, but should
be covered in the same way as both of these paths end up in
el0_svc_common(), which expects to unmask interrupts.
Fixes: bd82d4bd2188 ("arm64: Fix incorrect irqflag restore for priority masking")
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Julien Grall [Thu, 3 Oct 2019 11:12:08 +0000 (12:12 +0100)]
arm64: cpufeature: Effectively expose FRINT capability to userspace
[ Upstream commit
7230f7e99fecc684180322b056fad3853d1029d3 ]
The HWCAP framework will detect a new capability based on the sanitized
version of the ID registers.
Sanitization is based on a whitelist, so any field not described will end
up to be zeroed.
At the moment, ID_AA64ISAR1_EL1.FRINTTS is not described in
ftr_id_aa64isar1. This means the field will be zeroed and therefore the
userspace will not be able to see the HWCAP even if the hardware
supports the feature.
This can be fixed by describing the field in ftr_id_aa64isar1.
Fixes: ca9503fc9e98 ("arm64: Expose FRINT capabilities to userspace")
Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: mark.brown@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ZhangXiaoxu [Thu, 26 Sep 2019 06:29:38 +0000 (14:29 +0800)]
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
[ Upstream commit
33ea5aaa87cdae0f9af4d6b7ee4f650a1a36fd1d ]
When xfstests testing, there are some WARNING as below:
WARNING: CPU: 0 PID: 6235 at fs/nfs/inode.c:122 nfs_clear_inode+0x9c/0xd8
Modules linked in:
CPU: 0 PID: 6235 Comm: umount.nfs
Hardware name: linux,dummy-virt (DT)
pstate:
60000005 (nZCv daif -PAN -UAO)
pc : nfs_clear_inode+0x9c/0xd8
lr : nfs_evict_inode+0x60/0x78
sp :
fffffc000f68fc00
x29:
fffffc000f68fc00 x28:
fffffe00c53155c0
x27:
fffffe00c5315000 x26:
fffffc0009a63748
x25:
fffffc000f68fd18 x24:
fffffc000bfaaf40
x23:
fffffc000936d3c0 x22:
fffffe00c4ff5e20
x21:
fffffc000bfaaf40 x20:
fffffe00c4ff5d10
x19:
fffffc000c056000 x18:
000000000000003c
x17:
0000000000000000 x16:
0000000000000000
x15:
0000000000000040 x14:
0000000000000228
x13:
fffffc000c3a2000 x12:
0000000000000045
x11:
0000000000000000 x10:
0000000000000000
x9 :
0000000000000000 x8 :
0000000000000000
x7 :
0000000000000000 x6 :
fffffc00084b027c
x5 :
fffffc0009a64000 x4 :
fffffe00c0e77400
x3 :
fffffc000c0563a8 x2 :
fffffffffffffffb
x1 :
000000000000764e x0 :
0000000000000001
Call trace:
nfs_clear_inode+0x9c/0xd8
nfs_evict_inode+0x60/0x78
evict+0x108/0x380
dispose_list+0x70/0xa0
evict_inodes+0x194/0x210
generic_shutdown_super+0xb0/0x220
nfs_kill_super+0x40/0x88
deactivate_locked_super+0xb4/0x120
deactivate_super+0x144/0x160
cleanup_mnt+0x98/0x148
__cleanup_mnt+0x38/0x50
task_work_run+0x114/0x160
do_notify_resume+0x2f8/0x308
work_pending+0x8/0x14
The nrequest should be increased/decreased only if PG_INODE_REF flag
was setted.
But in the nfs_inode_remove_request function, it maybe decrease when
no PG_INODE_REF flag, this maybe lead nrequests count error.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kees Cook [Thu, 19 Sep 2019 18:06:44 +0000 (11:06 -0700)]
selftests/kselftest/runner.sh: Add 45 second timeout per test
[ Upstream commit
852c8cbf34d3b3130a05c38064dd98614f97d3a8 ]
Commit
a745f7af3cbd ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.
To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".
Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Cristian Marussi [Thu, 26 Sep 2019 17:52:19 +0000 (18:52 +0100)]
kselftest: exclude failed TARGETS from runlist
[ Upstream commit
131b30c94fbc0adb15f911609884dd39dada8f00 ]
A TARGET which failed to be built/installed should not be included in the
runlist generated inside the run_kselftest.sh script.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Dexuan Cui [Tue, 20 Aug 2019 02:56:34 +0000 (02:56 +0000)]
HID: hyperv: Use in-place iterator API in the channel callback
[ Upstream commit
6a297c90efa68b2864483193b8bfb0d19478600c ]
Simplify the ring buffer handling with the in-place API.
Also avoid the dynamic allocation and the memory leak in the channel
callback function.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Bart Van Assche [Mon, 30 Sep 2019 23:16:54 +0000 (16:16 -0700)]
RDMA/iwcm: Fix a lock inversion issue
[ Upstream commit
b66f31efbdad95ec274345721d99d1d835e6de01 ]
This patch fixes the lock inversion complaint:
============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]
but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/u16:6/171:
#0:
00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
#1:
000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
#2:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
dump_stack+0x8a/0xd6
__lock_acquire.cold+0xe1/0x24d
lock_acquire+0x106/0x240
__mutex_lock+0x12e/0xcb0
mutex_lock_nested+0x1f/0x30
rdma_destroy_id+0x78/0x4a0 [rdma_cm]
iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
cm_work_handler+0xe62/0x1100 [iw_cm]
process_one_work+0x56d/0xac0
worker_thread+0x7a/0x5d0
kthread+0x1bc/0x210
ret_from_fork+0x24/0x30
This is not a bug as there are actually two lock classes here.
Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Potnuri Bharat Teja [Mon, 30 Sep 2019 07:41:19 +0000 (13:11 +0530)]
RDMA/iw_cxgb4: fix SRQ access from dump_qp()
[ Upstream commit
91724c1e5afe45b64970036170659726e7dc5cff ]
dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used
by the application. This patch matches the QPID before dumping them. Also
removes unwanted SRQ id addition to QP id xarray.
Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Navid Emamdoost [Wed, 25 Sep 2019 14:45:42 +0000 (09:45 -0500)]
RDMA/hfi1: Prevent memory leak in sdma_init
[ Upstream commit
34b3be18a04ecdc610aae4c48e5d1b799d8689f6 ]
In sdma_init if rhashtable_init fails the allocated memory for
tmp_sdma_rht should be released.
Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable")
Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Krishnamraju Eraparaju [Mon, 23 Sep 2019 10:11:12 +0000 (15:41 +0530)]
RDMA/siw: Fix serialization issue in write_space()
[ Upstream commit
df791c54d627bae53c9be3be40a69594c55de487 ]
In siw_qp_llp_write_space(), 'sock' members should be accessed with
sk_callback_lock held, otherwise, it could race with
siw_sk_restore_upcalls(). And this could cause "NULL deref" panic. Below
panic is due to the NULL cep returned from sk_to_cep(sk):
Call Trace:
<IRQ> siw_qp_llp_write_space+0x11/0x40 [siw]
tcp_check_space+0x4c/0xf0
tcp_rcv_established+0x52b/0x630
tcp_v4_do_rcv+0xf4/0x1e0
tcp_v4_rcv+0x9b8/0xab0
ip_protocol_deliver_rcu+0x2c/0x1c0
ip_local_deliver_finish+0x44/0x50
ip_local_deliver+0x6b/0xf0
? ip_protocol_deliver_rcu+0x1c0/0x1c0
ip_rcv+0x52/0xd0
? ip_rcv_finish_core.isra.14+0x390/0x390
__netif_receive_skb_one_core+0x83/0xa0
netif_receive_skb_internal+0x73/0xb0
napi_gro_frags+0x1ff/0x2b0
t4_ethrx_handler+0x4a7/0x740 [cxgb4]
process_responses+0x2c9/0x590 [cxgb4]
? t4_sge_intr_msix+0x1d/0x30 [cxgb4]
? handle_irq_event_percpu+0x51/0x70
? handle_irq_event+0x41/0x60
? handle_edge_irq+0x97/0x1a0
napi_rx_handler+0x14/0xe0 [cxgb4]
net_rx_action+0x2af/0x410
__do_softirq+0xda/0x2a8
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq+0x50/0x60
__local_bh_enable_ip+0x50/0x60
ip_finish_output2+0x18f/0x520
ip_output+0x6e/0xf0
? __ip_finish_output+0x1f0/0x1f0
__ip_queue_xmit+0x14f/0x3d0
? __slab_alloc+0x4b/0x58
__tcp_transmit_skb+0x57d/0xa60
tcp_write_xmit+0x23b/0xfd0
__tcp_push_pending_frames+0x2e/0xf0
tcp_sendmsg_locked+0x939/0xd50
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x57/0x80
siw_tx_hdt+0x894/0xb20 [siw]
? find_busiest_group+0x3e/0x5b0
? common_interrupt+0xa/0xf
? common_interrupt+0xa/0xf
? common_interrupt+0xa/0xf
siw_qp_sq_process+0xf1/0xe60 [siw]
? __wake_up_common_lock+0x87/0xc0
siw_sq_resume+0x33/0xe0 [siw]
siw_run_sq+0xac/0x190 [siw]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? siw_sq_resume+0xe0/0xe0 [siw]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Connor Kuehl [Fri, 27 Sep 2019 21:44:15 +0000 (14:44 -0700)]
staging: rtl8188eu: fix null dereference when kzalloc fails
[ Upstream commit
955c1532a34305f2f780b47f0c40cc7c65500810 ]
If kzalloc() returns NULL, the error path doesn't stop the flow of
control from entering rtw_hal_read_chip_version() which dereferences the
null pointer. Fix this by adding a 'goto' to the error path to more
gracefully handle the issue and avoid proceeding with initialization
steps that we're no longer prepared to handle.
Also update the debug message to be more consistent with the other debug
messages in this function.
Addresses-Coverity: ("Dereference after null check")
Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com>
Link: https://lore.kernel.org/r/20190927214415.899-1-connor.kuehl@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 19:04:21 +0000 (16:04 -0300)]
perf annotate: Don't return -1 for error when doing BPF disassembly
[ Upstream commit
11aad897f6d1a28eae3b7e5b293647c522d65819 ]
Return errno when open_memstream() fails and add two new speciall error
codes for when an invalid, non BPF file or one without BTF is passed to
symbol__disassemble_bpf(), so that its callers can rely on
symbol__strerror_disassemble() to convert that to a human readable error
message that can help figure out what is wrong, with hints even.
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-usevw9r2gcipfcrbpaueurw0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:53:33 +0000 (15:53 -0300)]
perf annotate: Return appropriate error code for allocation failures
[ Upstream commit
16ed3c1e91159e28b02f11f71ff4ce4cbc6f99e4 ]
We should return errno or the annotation extra range understood by
symbol__strerror_disassemble() instead of -1, fix it, returning ENOMEM
instead.
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-8of1cmj3rz0mppfcshc9bbqq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:48:12 +0000 (15:48 -0300)]
perf annotate: Fix arch specific ->init() failure errors
[ Upstream commit
42d7a9107d83223a5fcecc6732d626a6c074cbc2 ]
They are called from symbol__annotate() and to propagate errors that can
help understand the problem make them return what
symbol__strerror_disassemble() known, i.e. errno codes and other
annotation specific errors in a special, out of errnos, range.
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-pqx7srcv7tixgid251aeboj6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:44:13 +0000 (15:44 -0300)]
perf annotate: Propagate the symbol__annotate() error return
[ Upstream commit
211f493b611eef012841f795166c38ec7528738d ]
We were just returning -1 in symbol__annotate() when symbol__annotate()
failed, propagate its error as it is used later to pass to
symbol__strerror_disassemble() to present a error message to the user,
that in some cases were getting:
"Invalid -1 error code"
Fix it to propagate the error.
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-0tj89rs9g7nbcyd5skadlvuu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:11:47 +0000 (15:11 -0300)]
perf annotate: Fix the signedness of failure returns
[ Upstream commit
28f4417c3333940b242af03d90214f713bbef232 ]
Callers of symbol__annotate() expect a errno value or some other
extended error value range in symbol__strerror_disassemble() to
convert to a proper error string, fix it when propagating a failure to
find the arch specific annotation routines via arch__find(arch_name).
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-o0k6dw7cas0vvmjjvgsyvu1i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 18:06:01 +0000 (15:06 -0300)]
perf annotate: Propagate perf_env__arch() error
[ Upstream commit
a66fa0619a0ae3585ef09e9c33ecfb5c7c6cb72b ]
The callers of symbol__annotate2() use symbol__strerror_disassemble() to
convert its failure returns into a human readable string, so
propagate error values from functions it calls, starting with
perf_env__arch() that when fails the right thing to do is to look at
'errno' to see why its possible call to uname() failed.
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-it5d83kyusfhb1q1b0l4pxzs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Mon, 30 Sep 2019 13:55:34 +0000 (10:55 -0300)]
perf tools: Propagate get_cpuid() error
[ Upstream commit
f67001a4a08eb124197ed4376941e1da9cf94b42 ]
For consistency, propagate the exact cause for get_cpuid() to have
failed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9ig269f7ktnhh99g4l15vpu2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Andi Kleen [Fri, 27 Sep 2019 23:35:45 +0000 (16:35 -0700)]
perf jevents: Fix period for Intel fixed counters
[ Upstream commit
6bdfd9f118bd59cf0f85d3bf4b72b586adea17c1 ]
The Intel fixed counters use a special table to override the JSON
information.
During this override the period information from the JSON file got
dropped, which results in inst_retired.any and similar running with
frequency mode instead of a period.
Just specify the expected period in the table.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Andi Kleen [Fri, 27 Sep 2019 23:35:44 +0000 (16:35 -0700)]
perf script brstackinsn: Fix recovery from LBR/binary mismatch
[ Upstream commit
e98df280bc2a499fd41d7f9e2d6733884de69902 ]
When the LBR data and the instructions in a binary do not match the loop
printing instructions could get confused and print a long stream of
bogus <bad> instructions.
The problem was that if the instruction decoder cannot decode an
instruction it ilen wasn't initialized, so the loop going through the
basic block would continue with the previous value.
Harden the code to avoid such problems:
- Make sure ilen is always freshly initialized and is 0 for bad
instructions.
- Do not overrun the code buffer while printing instructions
- Print a warning message if the final jump is not on an instruction
boundary.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20190927233546.11533-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Steve MacLean [Sat, 28 Sep 2019 01:39:00 +0000 (01:39 +0000)]
perf map: Fix overlapped map handling
[ Upstream commit
ee212d6ea20887c0ef352be8563ca13dbf965906 ]
Whenever an mmap/mmap2 event occurs, the map tree must be updated to add a new
entry. If a new map overlaps a previous map, the overlapped section of the
previous map is effectively unmapped, but the non-overlapping sections are
still valid.
maps__fixup_overlappings() is responsible for creating any new map entries from
the previously overlapped map. It optionally creates a before and an after map.
When creating the after map the existing code failed to adjust the map.pgoff.
This meant the new after map would incorrectly calculate the file offset
for the ip. This results in incorrect symbol name resolution for any ip in the
after region.
Make maps__fixup_overlappings() correctly populate map.pgoff.
Add an assert that new mapping matches old mapping at the beginning of
the after map.
Committer-testing:
Validated correct parsing of libcoreclr.so symbols from .NET Core 3.0 preview9
(which didn't strip symbols).
Preparation:
~/dotnet3.0-preview9/dotnet new webapi -o perfSymbol
cd perfSymbol
~/dotnet3.0-preview9/dotnet publish
perf record ~/dotnet3.0-preview9/dotnet \
bin/Debug/netcoreapp3.0/publish/perfSymbol.dll
^C
Before:
perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
grep libcoreclr.so | head -n 4
dotnet 1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615726000(0x768000) @ 0 08:02
5510620 765057155]: \
r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615974000(0x1000) @ 0x24e000 08:02
5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615c42000(0x1000) @ 0x51c000 08:02
5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.705249: 250000 cpu-clock: \
7fe6159a1f99 [unknown] \
(.../3.0.0-preview9-19423-09/libcoreclr.so)
After:
perf script --show-mmap-events 2>&1 | grep -e MMAP -e unknown |\
grep libcoreclr.so | head -n 4
dotnet 1907 373352.698780: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615726000(0x768000) @ 0 08:02
5510620 765057155]: \
r-xp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701091: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615974000(0x1000) @ 0x24e000 08:02
5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
dotnet 1907 373352.701241: PERF_RECORD_MMAP2 1907/1907: \
[0x7fe615c42000(0x1000) @ 0x51c000 08:02
5510620 765057155]: \
rwxp .../3.0.0-preview9-19423-09/libcoreclr.so
All the [unknown] symbols were resolved.
Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
Tested-by: Brian Robbins <brianrob@microsoft.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: John Keeping <john@metanate.com>
Cc: John Salem <josalem@microsoft.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom McDonald <thomas.mcdonald@microsoft.com>
Link: http://lore.kernel.org/lkml/BN8PR21MB136270949F22A6A02335C238F7800@BN8PR21MB1362.namprd21.prod.outlook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ian Rogers [Wed, 25 Sep 2019 19:59:24 +0000 (12:59 -0700)]
perf tests: Avoid raising SEGV using an obvious NULL dereference
[ Upstream commit
e3e2cf3d5b1fe800b032e14c0fdcd9a6fb20cf3b ]
An optimized build such as:
make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-O3
will turn the dereference operation into a ud2 instruction, raising a
SIGILL rather than a SIGSEGV. Use raise(..) for correctness and clarity.
Similar issues were addressed in Numfor Mbiziwo-Tiapo's patch:
https://lkml.org/lkml/2019/7/8/1234
Committer testing:
Before:
[root@quaco ~]# perf test hooks
55: perf hooks : Ok
[root@quaco ~]# perf test -v hooks
55: perf hooks :
--- start ---
test child forked, pid 17092
SIGSEGV is observed as expected, try to recover.
Fatal error (SEGFAULT) in perf hook 'test'
test child finished with 0
---- end ----
perf hooks: Ok
[root@quaco ~]#
After:
[root@quaco ~]# perf test hooks
55: perf hooks : Ok
[root@quaco ~]# perf test -v hooks
55: perf hooks :
--- start ---
test child forked, pid 17909
SIGSEGV is observed as expected, try to recover.
Fatal error (SEGFAULT) in perf hook 'test'
test child finished with 0
---- end ----
perf hooks: Ok
[root@quaco ~]#
Fixes: a074865e60ed ("perf tools: Introduce perf hooks")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20190925195924.152834-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ian Rogers [Wed, 25 Sep 2019 19:59:23 +0000 (12:59 -0700)]
libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature
[ Upstream commit
4b0b2b096da9d296e0e5668cdfba8613bd6f5bc8 ]
Unconditionally defining _FORTIFY_SOURCE can break tools that don't work
with it, such as memory sanitizers:
https://github.com/google/sanitizers/wiki/AddressSanitizer#faq
Fixes: 4b6ab94eabe4 ("perf subcmd: Create subcmd library")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20190925195924.152834-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascal Bouwmann [Thu, 29 Aug 2019 05:29:41 +0000 (07:29 +0200)]
iio: fix center temperature of bmc150-accel-core
[ Upstream commit
6c59a962e081df6d8fe43325bbfabec57e0d4751 ]
The center temperature of the supported devices stored in the constant
BMC150_ACCEL_TEMP_CENTER_VAL is not 24 degrees but 23 degrees.
It seems that some datasheets were inconsistent on this value leading
to the error. For most usecases will only make minor difference so
not queued for stable.
Signed-off-by: Pascal Bouwmann <bouwmann@tau-tec.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Remi Pommarel [Sun, 1 Sep 2019 10:54:10 +0000 (12:54 +0200)]
iio: adc: meson_saradc: Fix memory allocation order
[ Upstream commit
de10ac47597e7a3596b27631d0d5ce5f48d2c099 ]
meson_saradc's irq handler uses priv->regmap so make sure that it is
allocated before the irq get enabled.
This also fixes crash when CONFIG_DEBUG_SHIRQ is enabled, as device
managed resources are freed in the inverted order they had been
allocated, priv->regmap was freed before the spurious fake irq that
CONFIG_DEBUG_SHIRQ adds called the handler.
Fixes: 3af109131b7eb8 ("iio: adc: meson-saradc: switch from polling to interrupt mode")
Reported-by: Elie Roudninski <xademax@gmail.com>
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Elie ROUDNINSKI <xademax@gmail.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Qu Wenruo [Mon, 14 Oct 2019 06:34:51 +0000 (14:34 +0800)]
btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
[ Upstream commit
8702ba9396bf7bbae2ab93c94acd4bd37cfa4f09 ]
[Background]
Btrfs qgroup uses two types of reserved space for METADATA space,
PERTRANS and PREALLOC.
PERTRANS is metadata space reserved for each transaction started by
btrfs_start_transaction().
While PREALLOC is for delalloc, where we reserve space before joining a
transaction, and finally it will be converted to PERTRANS after the
writeback is done.
[Inconsistency]
However there is inconsistency in how we handle PREALLOC metadata space.
The most obvious one is:
In btrfs_buffered_write():
btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
We always free qgroup PREALLOC meta space.
While in btrfs_truncate_block():
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
We only free qgroup PREALLOC meta space when something went wrong.
[The Correct Behavior]
The correct behavior should be the one in btrfs_buffered_write(), we
should always free PREALLOC metadata space.
The reason is, the btrfs_delalloc_* mechanism works by:
- Reserve metadata first, even it's not necessary
In btrfs_delalloc_reserve_metadata()
- Free the unused metadata space
Normally in:
btrfs_delalloc_release_extents()
|- btrfs_inode_rsv_release()
Here we do calculation on whether we should release or not.
E.g. for 64K buffered write, the metadata rsv works like:
/* The first page */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=0
total: num_bytes=calc_inode_reservations()
/* The first page caused one outstanding extent, thus needs metadata
rsv */
/* The 2nd page */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=calc_inode_reservations()
total: not changed
/* The 2nd page doesn't cause new outstanding extent, needs no new meta
rsv, so we free what we have reserved */
/* The 3rd~16th pages */
reserve_meta: num_bytes=calc_inode_reservations()
free_meta: num_bytes=calc_inode_reservations()
total: not changed (still space for one outstanding extent)
This means, if btrfs_delalloc_release_extents() determines to free some
space, then those space should be freed NOW.
So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
than btrfs_qgroup_convert_reserved_meta().
The good news is:
- The callers are not that hot
The hottest caller is in btrfs_buffered_write(), which is already
fixed by commit
336a8bb8e36a ("btrfs: Fix wrong
btrfs_delalloc_release_extents parameter"). Thus it's not that
easy to cause false EDQUOT.
- The trans commit in advance for qgroup would hide the bug
Since commit
f5fef4593653 ("btrfs: qgroup: Make qgroup async transaction
commit more aggressive"), when btrfs qgroup metadata free space is slow,
it will try to commit transaction and free the wrongly converted
PERTRANS space, so it's not that easy to hit such bug.
[FIX]
So to fix the problem, remove the @qgroup_free parameter for
btrfs_delalloc_release_extents(), and always pass true to
btrfs_inode_rsv_release().
Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Filipe Manana [Thu, 4 Jul 2019 15:24:19 +0000 (16:24 +0100)]
Btrfs: fix inode cache block reserve leak on failure to allocate data space
[ Upstream commit
29d47d00e0ae61668ee0c5d90bef2893c8abbafa ]
If we failed to allocate the data extent(s) for the inode space cache, we
were bailing out without releasing the previously reserved metadata. This
was triggering the following warnings when unmounting a filesystem:
$ cat -n fs/btrfs/inode.c
(...)
9268 void btrfs_destroy_inode(struct inode *inode)
9269 {
(...)
9276 WARN_ON(BTRFS_I(inode)->block_rsv.reserved);
9277 WARN_ON(BTRFS_I(inode)->block_rsv.size);
(...)
9281 WARN_ON(BTRFS_I(inode)->csum_bytes);
9282 WARN_ON(BTRFS_I(inode)->defrag_bytes);
(...)
Several fstests test cases triggered this often, such as generic/083,
generic/102, generic/172, generic/269 and generic/300 at least, producing
stack traces like the following in dmesg/syslog:
[82039.079546] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9276 btrfs_destroy_inode+0x203/0x270 [btrfs]
(...)
[82039.081543] CPU: 2 PID: 13167 Comm: umount Tainted: G W 5.2.0-rc4-btrfs-next-50 #1
[82039.081912] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[82039.082673] RIP: 0010:btrfs_destroy_inode+0x203/0x270 [btrfs]
(...)
[82039.083913] RSP: 0018:
ffffac0b426a7d30 EFLAGS:
00010206
[82039.084320] RAX:
ffff8ddf77691158 RBX:
ffff8dde29b34660 RCX:
0000000000000002
[82039.084736] RDX:
0000000000000000 RSI:
0000000000000001 RDI:
ffff8dde29b34660
[82039.085156] RBP:
ffff8ddf5fbec000 R08:
0000000000000000 R09:
0000000000000000
[82039.085578] R10:
ffffac0b426a7c90 R11:
ffffffffb9aad768 R12:
ffffac0b426a7db0
[82039.086000] R13:
ffff8ddf5fbec0a0 R14:
dead000000000100 R15:
0000000000000000
[82039.086416] FS:
00007f8db96d12c0(0000) GS:
ffff8de036b00000(0000) knlGS:
0000000000000000
[82039.086837] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[82039.087253] CR2:
0000000001416108 CR3:
00000002315cc001 CR4:
00000000003606e0
[82039.087672] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[82039.088089] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[82039.088504] Call Trace:
[82039.088918] destroy_inode+0x3b/0x70
[82039.089340] btrfs_free_fs_root+0x16/0xa0 [btrfs]
[82039.089768] btrfs_free_fs_roots+0xd8/0x160 [btrfs]
[82039.090183] ? wait_for_completion+0x65/0x1a0
[82039.090607] close_ctree+0x172/0x370 [btrfs]
[82039.091021] generic_shutdown_super+0x6c/0x110
[82039.091427] kill_anon_super+0xe/0x30
[82039.091832] btrfs_kill_super+0x12/0xa0 [btrfs]
[82039.092233] deactivate_locked_super+0x3a/0x70
[82039.092636] cleanup_mnt+0x3b/0x80
[82039.093039] task_work_run+0x93/0xc0
[82039.093457] exit_to_usermode_loop+0xfa/0x100
[82039.093856] do_syscall_64+0x162/0x1d0
[82039.094244] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[82039.094634] RIP: 0033:0x7f8db8fbab37
(...)
[82039.095876] RSP: 002b:
00007ffdce35b468 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
[82039.096290] RAX:
0000000000000000 RBX:
0000560d20b00060 RCX:
00007f8db8fbab37
[82039.096700] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
0000560d20b00240
[82039.097110] RBP:
0000560d20b00240 R08:
0000560d20b00270 R09:
0000000000000015
[82039.097522] R10:
00000000000006b4 R11:
0000000000000246 R12:
00007f8db94bce64
[82039.097937] R13:
0000000000000000 R14:
0000000000000000 R15:
00007ffdce35b6f0
[82039.098350] irq event stamp: 0
[82039.098750] hardirqs last enabled at (0): [<
0000000000000000>] 0x0
[82039.099150] hardirqs last disabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.099545] softirqs last enabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.099925] softirqs last disabled at (0): [<
0000000000000000>] 0x0
[82039.100292] ---[ end trace
f2521afa616ddccc ]---
[82039.100707] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9277 btrfs_destroy_inode+0x1ac/0x270 [btrfs]
(...)
[82039.103050] CPU: 2 PID: 13167 Comm: umount Tainted: G W 5.2.0-rc4-btrfs-next-50 #1
[82039.103428] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[82039.104203] RIP: 0010:btrfs_destroy_inode+0x1ac/0x270 [btrfs]
(...)
[82039.105461] RSP: 0018:
ffffac0b426a7d30 EFLAGS:
00010206
[82039.105866] RAX:
ffff8ddf77691158 RBX:
ffff8dde29b34660 RCX:
0000000000000002
[82039.106270] RDX:
0000000000000000 RSI:
0000000000000001 RDI:
ffff8dde29b34660
[82039.106673] RBP:
ffff8ddf5fbec000 R08:
0000000000000000 R09:
0000000000000000
[82039.107078] R10:
ffffac0b426a7c90 R11:
ffffffffb9aad768 R12:
ffffac0b426a7db0
[82039.107487] R13:
ffff8ddf5fbec0a0 R14:
dead000000000100 R15:
0000000000000000
[82039.107894] FS:
00007f8db96d12c0(0000) GS:
ffff8de036b00000(0000) knlGS:
0000000000000000
[82039.108309] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[82039.108723] CR2:
0000000001416108 CR3:
00000002315cc001 CR4:
00000000003606e0
[82039.109146] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[82039.109567] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[82039.109989] Call Trace:
[82039.110405] destroy_inode+0x3b/0x70
[82039.110830] btrfs_free_fs_root+0x16/0xa0 [btrfs]
[82039.111257] btrfs_free_fs_roots+0xd8/0x160 [btrfs]
[82039.111675] ? wait_for_completion+0x65/0x1a0
[82039.112101] close_ctree+0x172/0x370 [btrfs]
[82039.112519] generic_shutdown_super+0x6c/0x110
[82039.112988] kill_anon_super+0xe/0x30
[82039.113439] btrfs_kill_super+0x12/0xa0 [btrfs]
[82039.113861] deactivate_locked_super+0x3a/0x70
[82039.114278] cleanup_mnt+0x3b/0x80
[82039.114685] task_work_run+0x93/0xc0
[82039.115083] exit_to_usermode_loop+0xfa/0x100
[82039.115476] do_syscall_64+0x162/0x1d0
[82039.115863] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[82039.116254] RIP: 0033:0x7f8db8fbab37
(...)
[82039.117463] RSP: 002b:
00007ffdce35b468 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
[82039.117882] RAX:
0000000000000000 RBX:
0000560d20b00060 RCX:
00007f8db8fbab37
[82039.118330] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
0000560d20b00240
[82039.118743] RBP:
0000560d20b00240 R08:
0000560d20b00270 R09:
0000000000000015
[82039.119159] R10:
00000000000006b4 R11:
0000000000000246 R12:
00007f8db94bce64
[82039.119574] R13:
0000000000000000 R14:
0000000000000000 R15:
00007ffdce35b6f0
[82039.119987] irq event stamp: 0
[82039.120387] hardirqs last enabled at (0): [<
0000000000000000>] 0x0
[82039.120787] hardirqs last disabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.121182] softirqs last enabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.121563] softirqs last disabled at (0): [<
0000000000000000>] 0x0
[82039.121933] ---[ end trace
f2521afa616ddccd ]---
[82039.122353] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9278 btrfs_destroy_inode+0x1bc/0x270 [btrfs]
(...)
[82039.124606] CPU: 2 PID: 13167 Comm: umount Tainted: G W 5.2.0-rc4-btrfs-next-50 #1
[82039.125008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[82039.125801] RIP: 0010:btrfs_destroy_inode+0x1bc/0x270 [btrfs]
(...)
[82039.126998] RSP: 0018:
ffffac0b426a7d30 EFLAGS:
00010202
[82039.127399] RAX:
ffff8ddf77691158 RBX:
ffff8dde29b34660 RCX:
0000000000000002
[82039.127803] RDX:
0000000000000001 RSI:
0000000000000001 RDI:
ffff8dde29b34660
[82039.128206] RBP:
ffff8ddf5fbec000 R08:
0000000000000000 R09:
0000000000000000
[82039.128611] R10:
ffffac0b426a7c90 R11:
ffffffffb9aad768 R12:
ffffac0b426a7db0
[82039.129020] R13:
ffff8ddf5fbec0a0 R14:
dead000000000100 R15:
0000000000000000
[82039.129428] FS:
00007f8db96d12c0(0000) GS:
ffff8de036b00000(0000) knlGS:
0000000000000000
[82039.129846] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[82039.130261] CR2:
0000000001416108 CR3:
00000002315cc001 CR4:
00000000003606e0
[82039.130684] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[82039.131142] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[82039.131561] Call Trace:
[82039.131990] destroy_inode+0x3b/0x70
[82039.132417] btrfs_free_fs_root+0x16/0xa0 [btrfs]
[82039.132844] btrfs_free_fs_roots+0xd8/0x160 [btrfs]
[82039.133262] ? wait_for_completion+0x65/0x1a0
[82039.133688] close_ctree+0x172/0x370 [btrfs]
[82039.134157] generic_shutdown_super+0x6c/0x110
[82039.134575] kill_anon_super+0xe/0x30
[82039.134997] btrfs_kill_super+0x12/0xa0 [btrfs]
[82039.135415] deactivate_locked_super+0x3a/0x70
[82039.135832] cleanup_mnt+0x3b/0x80
[82039.136239] task_work_run+0x93/0xc0
[82039.136637] exit_to_usermode_loop+0xfa/0x100
[82039.137029] do_syscall_64+0x162/0x1d0
[82039.137418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[82039.137812] RIP: 0033:0x7f8db8fbab37
(...)
[82039.139059] RSP: 002b:
00007ffdce35b468 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
[82039.139475] RAX:
0000000000000000 RBX:
0000560d20b00060 RCX:
00007f8db8fbab37
[82039.139890] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
0000560d20b00240
[82039.140302] RBP:
0000560d20b00240 R08:
0000560d20b00270 R09:
0000000000000015
[82039.140719] R10:
00000000000006b4 R11:
0000000000000246 R12:
00007f8db94bce64
[82039.141138] R13:
0000000000000000 R14:
0000000000000000 R15:
00007ffdce35b6f0
[82039.141597] irq event stamp: 0
[82039.142043] hardirqs last enabled at (0): [<
0000000000000000>] 0x0
[82039.142443] hardirqs last disabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.142839] softirqs last enabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.143220] softirqs last disabled at (0): [<
0000000000000000>] 0x0
[82039.143588] ---[ end trace
f2521afa616ddcce ]---
[82039.167472] WARNING: CPU: 3 PID: 13167 at fs/btrfs/extent-tree.c:10120 btrfs_free_block_groups+0x30d/0x460 [btrfs]
(...)
[82039.173800] CPU: 3 PID: 13167 Comm: umount Tainted: G W 5.2.0-rc4-btrfs-next-50 #1
[82039.174847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[82039.177031] RIP: 0010:btrfs_free_block_groups+0x30d/0x460 [btrfs]
(...)
[82039.180397] RSP: 0018:
ffffac0b426a7dd8 EFLAGS:
00010206
[82039.181574] RAX:
ffff8de010a1db40 RBX:
ffff8de010a1db40 RCX:
0000000000170014
[82039.182711] RDX:
ffff8ddff4380040 RSI:
ffff8de010a1da58 RDI:
0000000000000246
[82039.183817] RBP:
ffff8ddf5fbec000 R08:
0000000000000000 R09:
0000000000000000
[82039.184925] R10:
ffff8de036404380 R11:
ffffffffb8a5ea00 R12:
ffff8de010a1b2b8
[82039.186090] R13:
ffff8de010a1b2b8 R14:
0000000000000000 R15:
dead000000000100
[82039.187208] FS:
00007f8db96d12c0(0000) GS:
ffff8de036b80000(0000) knlGS:
0000000000000000
[82039.188345] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[82039.189481] CR2:
00007fb044005170 CR3:
00000002315cc006 CR4:
00000000003606e0
[82039.190674] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[82039.191829] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[82039.192978] Call Trace:
[82039.194160] close_ctree+0x19a/0x370 [btrfs]
[82039.195315] generic_shutdown_super+0x6c/0x110
[82039.196486] kill_anon_super+0xe/0x30
[82039.197645] btrfs_kill_super+0x12/0xa0 [btrfs]
[82039.198696] deactivate_locked_super+0x3a/0x70
[82039.199619] cleanup_mnt+0x3b/0x80
[82039.200559] task_work_run+0x93/0xc0
[82039.201505] exit_to_usermode_loop+0xfa/0x100
[82039.202436] do_syscall_64+0x162/0x1d0
[82039.203339] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[82039.204091] RIP: 0033:0x7f8db8fbab37
(...)
[82039.206360] RSP: 002b:
00007ffdce35b468 EFLAGS:
00000246 ORIG_RAX:
00000000000000a6
[82039.207132] RAX:
0000000000000000 RBX:
0000560d20b00060 RCX:
00007f8db8fbab37
[82039.207906] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
0000560d20b00240
[82039.208621] RBP:
0000560d20b00240 R08:
0000560d20b00270 R09:
0000000000000015
[82039.209285] R10:
00000000000006b4 R11:
0000000000000246 R12:
00007f8db94bce64
[82039.209984] R13:
0000000000000000 R14:
0000000000000000 R15:
00007ffdce35b6f0
[82039.210642] irq event stamp: 0
[82039.211306] hardirqs last enabled at (0): [<
0000000000000000>] 0x0
[82039.211971] hardirqs last disabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.212643] softirqs last enabled at (0): [<
ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
[82039.213304] softirqs last disabled at (0): [<
0000000000000000>] 0x0
[82039.213875] ---[ end trace
f2521afa616ddccf ]---
Fix this by releasing the reserved metadata on failure to allocate data
extent(s) for the inode cache.
Fixes: 69fe2d75dd91d0 ("btrfs: make the delalloc block rsv per inode")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mikulas Patocka [Wed, 2 Oct 2019 10:15:53 +0000 (06:15 -0400)]
dm snapshot: rework COW throttling to fix deadlock
[ Upstream commit
b21555786f18cd77f2311ad89074533109ae3ffa ]
Commit
721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.
The implementation of this throttling mechanism is prone to a deadlock:
1. One or more threads write to the origin device causing COW, which is
performed by kcopyd.
2. At some point some of these threads might reach the s->cow_count
semaphore limit and block in down(&s->cow_count), holding a read lock
on _origins_lock.
3. Someone tries to acquire a write lock on _origins_lock, e.g.,
snapshot_ctr(), which blocks because the threads at step (2) already
hold a read lock on it.
4. A COW operation completes and kcopyd runs dm-snapshot's completion
callback, which ends up calling pending_complete().
pending_complete() tries to resubmit any deferred origin bios. This
requires acquiring a read lock on _origins_lock, which blocks.
This happens because the read-write semaphore implementation gives
priority to writers, meaning that as soon as a writer tries to enter
the critical section, no readers will be allowed in, until all
writers have completed their work.
So, pending_complete() waits for the writer at step (3) to acquire
and release the lock. This writer waits for the readers at step (2)
to release the read lock and those readers wait for
pending_complete() (the kcopyd thread) to signal the s->cow_count
semaphore: DEADLOCK.
The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html
Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.
Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Tested-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on:
4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mikulas Patocka [Wed, 2 Oct 2019 10:14:17 +0000 (06:14 -0400)]
dm snapshot: introduce account_start_copy() and account_end_copy()
[ Upstream commit
a2f83e8b0c82c9500421a26c49eb198b25fcdea3 ]
This simple refactoring moves code for modifying the semaphore cow_count
into separate functions to prepare for changes that will extend these
methods to provide for a more sophisticated mechanism for COW
throttling.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jens Axboe [Thu, 17 Oct 2019 15:20:46 +0000 (09:20 -0600)]
io_uring: fix up O_NONBLOCK handling for sockets
[ Upstream commit
491381ce07ca57f68c49c79a8a43da5b60749e32 ]
We've got two issues with the non-regular file handling for non-blocking
IO:
1) We don't want to re-do a short read in full for a non-regular file,
as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
we need to punt to async context even if the file is opened as
non-blocking. Otherwise the caller always gets -EAGAIN.
Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.
Cc: stable@vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Greg Kroah-Hartman [Tue, 29 Oct 2019 08:22:48 +0000 (09:22 +0100)]
Linux 5.3.8
Greg KH [Tue, 1 Oct 2019 16:56:11 +0000 (18:56 +0200)]
RDMA/cxgb4: Do not dma memory off of the stack
commit
3840c5b78803b2b6cc1ff820100a74a092c40cbb upstream.
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing. On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.
Resolve this by allocating the memory for a message off of the heap
instead of the stack. kmalloc() always will give us a proper memory
location that DMA will work correctly from.
Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Tue, 15 Oct 2019 15:49:27 +0000 (08:49 -0700)]
blk-rq-qos: fix first node deletion of rq_qos_del()
commit
307f4065b9d7c1e887e8bdfb2487e4638559fea1 upstream.
rq_qos_del() incorrectly assigns the node being deleted to the head if
it was the first on the list in the !prev path. Fix it by iterating
with ** instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chris Goldsworthy [Sun, 20 Oct 2019 01:57:24 +0000 (18:57 -0700)]
of: reserved_mem: add missing of_node_put() for proper ref-counting
commit
5dba51754b04a941a1064f584e7a7f607df3f9bc upstream.
Commit
d698a388146c ("of: reserved-memory: ignore disabled memory-region
nodes") added an early return in of_reserved_mem_device_init_by_idx(), but
didn't call of_node_put() on a device_node whose ref-count was incremented
in the call to of_parse_phandle() preceding the early exit.
Fixes: d698a388146c ("of: reserved-memory: ignore disabled memory-region nodes")
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: stable@vger.kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Viresh Kumar [Thu, 10 Oct 2019 10:25:33 +0000 (15:55 +0530)]
opp: of: drop incorrect lockdep_assert_held()
commit
f2edbb6699b0bc6e4f789846b99007200546c6c2 upstream.
_find_opp_of_np() doesn't traverse the list of OPP tables but instead
just the entries within an OPP table and so only requires to lock the
OPP table itself.
The lockdep_assert_held() was added there by mistake and isn't really
required.
Fixes: 5d6d106fa455 ("OPP: Populate required opp tables from "required-opps" property")
Cc: v5.0+ <stable@vger.kernel.org> # v5.0+
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Mon, 14 Oct 2019 11:25:00 +0000 (13:25 +0200)]
PCI: PM: Fix pci_power_up()
commit
45144d42f299455911cc29366656c7324a3a7c97 upstream.
There is an arbitrary difference between the system resume and
runtime resume code paths for PCI devices regarding the delay to
apply when switching the devices from D3cold to D0.
Namely, pci_restore_standard_config() used in the runtime resume
code path calls pci_set_power_state() which in turn invokes
__pci_start_power_transition() to power up the device through the
platform firmware and that function applies the transition delay
(as per PCI Express Base Specification Revision 2.0, Section 6.6.1).
However, pci_pm_default_resume_early() used in the system resume
code path calls pci_power_up() which doesn't apply the delay at
all and that causes issues to occur during resume from
suspend-to-idle on some systems where the delay is required.
Since there is no reason for that difference to exist, modify
pci_power_up() to follow pci_set_power_state() more closely and
invoke __pci_start_power_transition() from there to call the
platform firmware to power up the device (in case that's necessary).
Fixes: db288c9c5f9d ("PCI / PM: restore the original behavior of pci_set_power_state()")
Reported-by: Daniel Drake <drake@endlessm.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Link: https://lore.kernel.org/linux-pm/CAD8Lp44TYxrMgPLkHCqF9hv6smEurMXvmmvmtyFhZ6Q4SE+dig@mail.gmail.com/T/#m21be74af263c6a34f36e0fc5c77c5449d9406925
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Juergen Gross [Fri, 18 Oct 2019 07:45:49 +0000 (09:45 +0200)]
xen/netback: fix error path of xenvif_connect_data()
commit
3d5c1a037d37392a6859afbde49be5ba6a70a6b3 upstream.
xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().
Remove the superfluous module_put().
Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <stable@vger.kernel.org> # 3.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Layton [Thu, 26 Sep 2019 20:05:11 +0000 (16:05 -0400)]
ceph: just skip unrecognized info in ceph_reply_info_extra
commit
1d3f87233e26362fc3d4e59f0f31a71b570f90b9 upstream.
In the future, we're going to want to extend the ceph_reply_info_extra
for create replies. Currently though, the kernel code doesn't accept an
extra blob that is larger than the expected data.
Change the code to skip over any unrecognized fields at the end of the
extra blob, rather than returning -EIO.
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 8 Oct 2019 23:29:10 +0000 (01:29 +0200)]
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
commit
65650b35133ff20f0c9ef0abd5c3c66dbce3ae57 upstream.
It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.
The latter issue has been present since commit
45975c7d21a1 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit
90de2a4aa9f3 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.
Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.
Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christophe JAILLET [Sat, 5 Oct 2019 11:21:01 +0000 (13:21 +0200)]
memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
commit
28c9fac09ab0147158db0baeec630407a5e9b892 upstream.
If 'jmb38x_ms_count_slots()' returns 0, we must undo the previous
'pci_request_regions()' call.
Goto 'err_out_int' to fix it.
Fixes: 60fdd931d577 ("memstick: add support for JMicron jmb38x MemoryStick host controller")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kurz [Fri, 27 Sep 2019 11:53:43 +0000 (13:53 +0200)]
KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
commit
12ade69c1eb9958b13374edf5ef742ea20ccffde upstream.
Connecting a vCPU to a XIVE KVM device means establishing a 1:1
association between a vCPU id and the offset (VP id) of a VP
structure within a fixed size block of VPs. We currently try to
enforce the 1:1 relationship by checking that a vCPU with the
same id isn't already connected. This is good but unfortunately
not enough because we don't map VP ids to raw vCPU ids but to
packed vCPU ids, and the packing function kvmppc_pack_vcpu_id()
isn't bijective by design. We got away with it because QEMU passes
vCPU ids that fit well in the packing pattern. But nothing prevents
userspace to come up with a forged vCPU id resulting in a packed id
collision which causes the KVM device to associate two vCPUs to the
same VP. This greatly confuses the irq layer and ultimately crashes
the kernel, as shown below.
Example: a guest with 1 guest thread per core, a core stride of
8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to
inject at some point an invalid vCPU id 348, which is the packed
version of itself and 2392, we get:
genirq: Flags mismatch irq 199.
00010000 (kvm-2-2392) vs.
00010000 (kvm-2-348)
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
Call Trace:
[
c000003f7f9937e0] [
c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable)
[
c000003f7f993820] [
c0000000001cb480] __setup_irq+0xa70/0xad0
[
c000003f7f9938d0] [
c0000000001cb75c] request_threaded_irq+0x13c/0x260
[
c000003f7f993940] [
c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm]
[
c000003f7f9939d0] [
c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[
c000003f7f993ac0] [
c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[
c000003f7f993b90] [
c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[
c000003f7f993d00] [
c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[
c000003f7f993db0] [
c000000000484d44] ksys_ioctl+0x104/0x120
[
c000003f7f993e00] [
c000000000484d88] sys_ioctl+0x28/0x80
[
c000003f7f993e20] [
c00000000000b278] system_call+0x5c/0x68
xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348'
WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:
c00000000053b0cc LR:
c00000000053b0c8 CTR:
c0000000000ba3b0
REGS:
c000003f7f9934b0 TRAP: 0700 Not tainted (5.3.0-xive-nr-servers-5.3-gku+)
MSR:
9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:
48228222 XER:
20040000
CFAR:
c000000000131a50 IRQMASK: 0
GPR00:
c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057
GPR04:
0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf
GPR08:
0000000000000007 0000000000000007 0000000000000001 9000000000001033
GPR12:
0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16:
000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20:
000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006
GPR24:
c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48
GPR28:
0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0
NIP [
c00000000053b0cc] remove_proc_entry+0x1ec/0x200
LR [
c00000000053b0c8] remove_proc_entry+0x1e8/0x200
Call Trace:
[
c000003f7f993740] [
c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable)
[
c000003f7f9937e0] [
c0000000001d3654] unregister_irq_proc+0x114/0x150
[
c000003f7f993880] [
c0000000001c6284] free_desc+0x54/0xb0
[
c000003f7f9938c0] [
c0000000001c65ec] irq_free_descs+0xac/0x100
[
c000003f7f993910] [
c0000000001d1ff8] irq_dispose_mapping+0x68/0x80
[
c000003f7f993940] [
c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm]
[
c000003f7f9939d0] [
c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[
c000003f7f993ac0] [
c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[
c000003f7f993b90] [
c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[
c000003f7f993d00] [
c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[
c000003f7f993db0] [
c000000000484d44] ksys_ioctl+0x104/0x120
[
c000003f7f993e00] [
c000000000484d88] sys_ioctl+0x28/0x80
[
c000003f7f993e20] [
c00000000000b278] system_call+0x5c/0x68
Instruction dump:
2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378
3884f080 3863b948 4bbf6925 60000000 <
0fe00000>
4bffff7c fba10088 4bbf6e41
---[ end trace
b925b67a74a1d8d1 ]---
BUG: Kernel NULL pointer dereference at 0x00000010
Faulting instruction address: 0xc00800000d44fc04
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G W 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:
c00800000d44fc04 LR:
c00800000d44fc00 CTR:
c0000000001cd970
REGS:
c000003f7f9938e0 TRAP: 0300 Tainted: G W (5.3.0-xive-nr-servers-5.3-gku+)
MSR:
9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:
24228882 XER:
20040000
CFAR:
c0000000001cd9ac DAR:
0000000000000010 DSISR:
40000000 IRQMASK: 0
GPR00:
c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000
GPR04:
00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8
GPR08:
0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd
GPR12:
0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16:
000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20:
000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8
GPR24:
0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000
GPR28:
c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90
NIP [
c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm]
LR [
c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm]
Call Trace:
[
c000003f7f993b70] [
c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable)
[
c000003f7f993bd0] [
c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm]
[
c000003f7f993c30] [
c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm]
[
c000003f7f993c70] [
c00000000046730c] __fput+0xec/0x320
[
c000003f7f993cd0] [
c000000000164ae0] task_work_run+0x150/0x1c0
[
c000003f7f993d30] [
c000000000025034] do_notify_resume+0x304/0x440
[
c000003f7f993e20] [
c00000000000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c
2f890000 419effb8 4800821d e8410018 <
e9230010>
e9490008 9b2a0039 7c0004ac
---[ end trace
b925b67a74a1d8d2 ]---
Kernel panic - not syncing: Fatal exception
This affects both XIVE and XICS-on-XIVE devices since the beginning.
Check the VP id instead of the vCPU id when a new vCPU is connected.
The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu()
is moved after the check to avoid the need for rollback.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Thu, 17 Oct 2019 02:38:37 +0000 (10:38 +0800)]
btrfs: tracepoints: Fix bad entry members of qgroup events
commit
1b2442b4ae0f234daeadd90e153b466332c466d8 upstream.
[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:
qgroup_meta_reserve:
9c7f6acc-b342-4037-bc47-
7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
qgroup_meta_reserve:
9c7f6acc-b342-4037-bc47-
7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2
The @type can be completely garbage, as DATA type is not possible for
trace_qgroup_meta_reserve() trace event.
[CAUSE]
Ther are several problems related to qgroup trace events:
- Unassigned entry member
Member entry::type of trace_qgroup_update_reserve() and
trace_qgourp_meta_reserve() is not assigned
- Redundant entry member
Member entry::type is completely useless in
trace_qgroup_meta_convert()
Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Thu, 17 Oct 2019 02:38:36 +0000 (10:38 +0800)]
btrfs: tracepoints: Fix wrong parameter order for qgroup events
commit
fd2b007eaec898564e269d1f478a2da0380ecf51 upstream.
[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:
qgroup_meta_reserve:
9c7f6acc-b342-4037-bc47-
7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
The diff should always be alinged to sector size (4k), so there is
definitely something wrong.
[CAUSE]
For the wrong @diff, it's caused by wrong parameter order.
The correct parameters are:
struct btrfs_root, s64 diff, int type.
However the parameters used are:
struct btrfs_root, int type, s64 diff.
Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Wed, 16 Oct 2019 15:28:52 +0000 (16:28 +0100)]
Btrfs: check for the full sync flag while holding the inode lock during fsync
commit
ba0b084ac309283db6e329785c1dc4f45fdbd379 upstream.
We were checking for the full fsync flag in the inode before locking the
inode, which is racy, since at that that time it might not be set but
after we acquire the inode lock some other task set it. One case where
this can happen is on a system low on memory and some concurrent task
failed to allocate an extent map and therefore set the full sync flag on
the inode, to force the next fsync to work in full mode.
A consequence of missing the full fsync flag set is hitting the problems
fixed by commit
0c713cbab620 ("Btrfs: fix race between ranged fsync and
writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
of weird inconsistencies after replaying a log due to file extents items
representing ranges that overlap.
So just move the check such that it's done after locking the inode and
before starting writeback again.
Fixes: 0c713cbab620 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Tue, 15 Oct 2019 09:54:39 +0000 (10:54 +0100)]
Btrfs: fix qgroup double free after failure to reserve metadata for delalloc
commit
c7967fc1499beb9b70bb9d33525fb0b384af8883 upstream.
If we fail to reserve metadata for delalloc operations we end up releasing
the previously reserved qgroup amount twice, once explicitly under the
'out_qgroup' label by calling btrfs_qgroup_free_meta_prealloc() and once
again, under label 'out_fail', by calling btrfs_inode_rsv_release() with a
value of 'true' for its 'qgroup_free' argument, which results in
btrfs_qgroup_free_meta_prealloc() being called again, so we end up having
a double free.
Also if we fail to reserve the necessary qgroup amount, we jump to the
label 'out_fail', which calls btrfs_inode_rsv_release() and that in turns
calls btrfs_qgroup_free_meta_prealloc(), even though we weren't able to
reserve any qgroup amount. So we freed some amount we never reserved.
So fix this by removing the call to btrfs_inode_rsv_release() in the
failure path, since it's not necessary at all as we haven't changed the
inode's block reserve in any way at this point.
Fixes: c8eaeac7b73434 ("btrfs: reserve delalloc metadata differently")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Sterba [Sat, 12 Oct 2019 16:42:10 +0000 (18:42 +0200)]
btrfs: don't needlessly create extent-refs kernel thread
commit
80ed4548d0711d15ca51be5dee0ff813051cfc90 upstream.
The patch
32b593bfcb58 ("Btrfs: remove no longer used function to run
delayed refs asynchronously") removed the async delayed refs but the
thread has been created, without any use. Remove it to avoid resource
consumption.
Fixes: 32b593bfcb58 ("Btrfs: remove no longer used function to run delayed refs asynchronously")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Wed, 9 Oct 2019 16:43:45 +0000 (17:43 +0100)]
Btrfs: add missing extents release on file extent cluster relocation error
commit
44db1216efe37bf670f8d1019cdc41658d84baf5 upstream.
If we error out when finding a page at relocate_file_extent_cluster(), we
need to release the outstanding extents counter on the relocation inode,
set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise
the inode's block reserve size can never decrease to zero and metadata
space is leaked. Therefore add a call to btrfs_delalloc_release_extents()
in case we can't find the target page.
Fixes: 8b62f87bad9c ("Btrfs: rework outstanding_extents")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Thu, 10 Oct 2019 02:39:26 +0000 (10:39 +0800)]
btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
commit
4b654acdae850f48b8250b9a578a4eaa518c7a6f upstream.
In btrfs_read_block_groups(), if we have an invalid block group which
has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS
feature, we error out without freeing the block group cache.
This patch will add the missing btrfs_put_block_group() to prevent
memory leak.
Note for stable backports: the file to patch in versions <= 5.3 is
fs/btrfs/extent-tree.c
Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patrick Williams [Tue, 1 Oct 2019 15:51:38 +0000 (10:51 -0500)]
pinctrl: armada-37xx: swap polarity on LED group
commit
b835d6953009dc350d61402a854b5a7178d8c615 upstream.
The configuration registers for the LED group have inverted
polarity, which puts the GPIO into open-drain state when used in
GPIO mode. Switch to '0' for GPIO and '1' for LED modes.
Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support for Armada 37xx")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001155154.99710-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patrick Williams [Tue, 1 Oct 2019 15:46:31 +0000 (10:46 -0500)]
pinctrl: armada-37xx: fix control of pins 32 and up
commit
20504fa1d2ffd5d03cdd9dc9c9dd4ed4579b97ef upstream.
The 37xx configuration registers are only 32 bits long, so
pins 32-35 spill over into the next register. The calculation
for the register address was done, but the bitmask was not, so
any configuration to pin 32 or above resulted in a bitmask that
overflowed and performed no action.
Fix the register / offset calculation to also adjust the offset.
Fixes: 5715092a458c ("pinctrl: armada-37xx: Add gpio support")
Signed-off-by: Patrick Williams <alpawi@amazon.com>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191001154634.96165-1-alpawi@amazon.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Torokhov [Tue, 24 Sep 2019 02:49:58 +0000 (19:49 -0700)]
pinctrl: cherryview: restore Strago DMI workaround for all versions
commit
260996c30f4f3a732f45045e3e0efe27017615e4 upstream.
This is essentially a revert of:
e3f72b749da2 pinctrl: cherryview: fix Strago DMI workaround
86c5dd6860a6 pinctrl: cherryview: limit Strago DMI workarounds to version 1.0
because even with 1.1 versions of BIOS there are some pins that are
configured as interrupts but not claimed by any driver, and they
sometimes fire up and result in interrupt storms that cause touchpad
stop functioning and other issues.
Given that we are unlikely to qualify another firmware version for a
while it is better to keep the workaround active on all Strago boards.
Reported-by: Alex Levin <levinale@chromium.org>
Fixes: 86c5dd6860a6 ("pinctrl: cherryview: limit Strago DMI workarounds to version 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Alex Levin <levinale@chromium.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Roman Kagan [Thu, 10 Oct 2019 12:33:05 +0000 (12:33 +0000)]
x86/hyperv: Make vapic support x2apic mode
commit
e211288b72f15259da86eed6eca680758dbe9e74 upstream.
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.
However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.
As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.
According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors. The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.
Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Christopherson [Tue, 1 Oct 2019 20:50:19 +0000 (13:50 -0700)]
x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
commit
7a22e03b0c02988e91003c505b34d752a51de344 upstream.
Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit. The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE. If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.
smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
BUG: kernel NULL pointer dereference, address:
0000000000000008
RIP: 0010:x2apic_dead_cpu+0x1a/0x30
Call Trace:
cpuhp_invoke_callback+0x9a/0x580
_cpu_up+0x10d/0x140
do_cpu_up+0x69/0xb0
smp_init+0x63/0xa9
kernel_init_freeable+0xd7/0x229
? rest_init+0xa0/0xa0
kernel_init+0xa/0x100
ret_from_fork+0x35/0x40
Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Wahl [Tue, 24 Sep 2019 21:03:55 +0000 (16:03 -0500)]
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
commit
2aa85f246c181b1fa89f27e8e20c5636426be624 upstream.
Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.
Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.
Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR. But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses. So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.
The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.
This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).
The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.
[ bp: Touchups. ]
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Sun, 15 Sep 2019 14:17:45 +0000 (15:17 +0100)]
irqchip/sifive-plic: Switch to fasteoi flow
commit
bb0fed1c60cccbe4063b455a7228818395dac86e upstream.
The SiFive PLIC interrupt controller seems to have all the HW
features to support the fasteoi flow, but the driver seems to be
stuck in a distant past. Bring it into the 21st century.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Palmer Dabbelt <palmer@sifive.com> (QEMU Boot)
Tested-by: Darius Rad <darius@bluespec.com> (on 2 HW PLIC implementations)
Tested-by: Paul Walmsley <paul.walmsley@sifive.com> (HiFive Unleashed)
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8636gxskmj.wl-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mikulas Patocka [Wed, 16 Oct 2019 13:21:50 +0000 (09:21 -0400)]
dm cache: fix bugs when a GFP_NOWAIT allocation fails
commit
13bd677a472d534bf100bab2713efc3f9e3f5978 upstream.
GFP_NOWAIT allocation can fail anytime - it doesn't wait for memory being
available and it fails if the mempool is exhausted and there is not enough
memory.
If we go down this path:
map_bio -> mg_start -> alloc_migration -> mempool_alloc(GFP_NOWAIT)
we can see that map_bio() doesn't check the return value of mg_start(),
and the bio is leaked.
If we go down this path:
map_bio -> mg_start -> mg_lock_writes -> alloc_prison_cell ->
dm_bio_prison_alloc_cell_v2 -> mempool_alloc(GFP_NOWAIT) ->
mg_lock_writes -> mg_complete
the bio is ended with an error - it is unacceptable because it could
cause filesystem corruption if the machine ran out of memory
temporarily.
Change GFP_NOWAIT to GFP_NOIO, so that the mempool code will properly
wait until memory becomes available. mempool_alloc with GFP_NOIO can't
fail, so remove the code paths that deal with allocation failure.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Williams [Mon, 21 Oct 2019 16:29:20 +0000 (09:29 -0700)]
fs/dax: Fix pmd vs pte conflict detection
commit
6370740e5f8ef12de7f9a9bf48a0393d202cd827 upstream.
Users reported a v5.3 performance regression and inability to establish
huge page mappings. A revised version of the ndctl "dax.sh" huge page
unit test identifies commit
23c84eb78375 "dax: Fix missed wakeup with
PMD faults" as the source.
Update get_unlocked_entry() to check for NULL entries before checking
the entry order, otherwise NULL is misinterpreted as a present pte
conflict. The 'order' check needs to happen before the locked check as
an unlocked entry at the wrong order must fallback to lookup the correct
order.
Reported-by: Jeff Smits <jeff.smits@intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prateek Sood [Tue, 15 Oct 2019 06:17:25 +0000 (11:47 +0530)]
tracing: Fix race in perf_trace_buf initialization
commit
6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.
A race condition exists while initialiazing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().
CPU0 CPU1
perf_trace_init()
mutex_lock(&event_mutex)
perf_trace_event_init()
perf_trace_event_reg()
total_ref_count == 0
buf = alloc_percpu()
perf_trace_buf[i] = buf
tp_event->class->reg() //fails perf_kprobe_init()
goto fail perf_trace_event_init()
perf_trace_event_reg()
fail:
total_ref_count == 0
total_ref_count == 0
buf = alloc_percpu()
perf_trace_buf[i] = buf
tp_event->class->reg()
total_ref_count++
free_percpu(perf_trace_buf[i])
perf_trace_buf[i] = NULL
Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
fix this race.
The race caused the following bug:
Unable to handle kernel paging request at virtual address
0000003106f2003c
Mem abort info:
ESR = 0x96000045
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000045
CM = 0, WnR = 1
user pgtable: 4k pages, 39-bit VAs, pgdp =
ffffffc034b9b000
[
0000003106f2003c] pgd=
0000000000000000, pud=
0000000000000000
Internal error: Oops:
96000045 [#1] PREEMPT SMP
Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
pstate:
80400005 (Nzcv daif +PAN -UAO)
pc : __memset+0x20/0x1ac
lr : memset+0x3c/0x50
sp :
ffffffc09319fc50
__memset+0x20/0x1ac
perf_trace_buf_alloc+0x140/0x1a0
perf_trace_sys_enter+0x158/0x310
syscall_trace_enter+0x348/0x7c0
el0_svc_common+0x11c/0x368
el0_svc_handler+0x12c/0x198
el0_svc+0x8/0xc
Ramdumps showed the following:
total_ref_count = 3
perf_trace_buf = (
0x0 -> NULL,
0x0 -> NULL,
0x0 -> NULL,
0x0 -> NULL)
Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org
Cc: stable@vger.kernel.org
Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Shishkin [Tue, 22 Oct 2019 07:39:40 +0000 (10:39 +0300)]
perf/aux: Fix AUX output stopping
commit
f3a519e4add93b7b31a6616f0b09635ff2e6a159 upstream.
Commit:
8a58ddae2379 ("perf/core: Fix exclusive events' grouping")
allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:
Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pavel Shilovsky [Wed, 23 Oct 2019 22:37:19 +0000 (15:37 -0700)]
CIFS: Fix use after free of file info structures
commit
1a67c415965752879e2e9fad407bc44fc7f25f23 upstream.
Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.
Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chuhong Yuan [Mon, 14 Oct 2019 07:15:31 +0000 (15:15 +0800)]
cifs: Fix missed free operations
commit
783bf7b8b641167fb6f3f4f787f60ae62bad41b3 upstream.
cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.
CC: Stable <stable@vger.kernel.org>
Fixes: aa081859b10c ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Roberto Bergantinos Corpas [Mon, 14 Oct 2019 08:59:23 +0000 (10:59 +0200)]
CIFS: avoid using MID 0xFFFF
commit
03d9a9fe3f3aec508e485dd3dcfa1e99933b4bdb upstream.
According to MS-CIFS specification MID 0xFFFF should not be used by the
CIFS client, but we actually do. Besides, this has proven to cause races
leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
MID is a 2 byte value easy to reach in CurrentMid which may conflict with
an oplock break notification request coming from server
Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Fri, 13 Sep 2019 09:57:50 +0000 (10:57 +0100)]
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
commit
603afdc9438ac546181e843f807253d75d3dbc45 upstream.
Allow the user to select the workaround for TX2-219, and update
the silicon-errata.rst file to reflect this.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Tue, 9 Apr 2019 15:26:21 +0000 (16:26 +0100)]
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
commit
93916beb70143c46bf1d2bacf814be3a124b253b upstream.
It appears that the only case where we need to apply the TX2_219_TVM
mitigation is when the core is in SMT mode. So let's condition the
enabling on detecting a CPU whose MPIDR_EL1.Aff0 is non-zero.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Tue, 9 Apr 2019 15:22:24 +0000 (16:22 +0100)]
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
commit
9405447ef79bc93101373e130f72e9e6cbf17dbb upstream.
As a PRFM instruction racing against a TTBR update can have undesirable
effects on TX2, NOP-out such PRFM on cores that are affected by
the TX2-219 erratum.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Thu, 7 Feb 2019 16:01:21 +0000 (16:01 +0000)]
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
commit
d3ec3a08fa700c8b46abb137dce4e2514a6f9668 upstream.
In order to workaround the TX2-219 erratum, it is necessary to trap
TTBRx_EL1 accesses to EL2. This is done by setting HCR_EL2.TVM on
guest entry, which has the side effect of trapping all the other
VM-related sysregs as well.
To minimize the overhead, a fast path is used so that we don't
have to go all the way back to the main sysreg handling code,
unless the rest of the hypervisor expects to see these accesses.
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>