Jason A. Donenfeld [Tue, 8 Feb 2022 11:40:14 +0000 (12:40 +0100)]
random: inline leaves of rand_initialize()
commit
8566417221fcec51346ec164e920dacb979c6b5f upstream.
This is a preparatory commit for the following one. We simply inline the
various functions that rand_initialize() calls that have no other
callers. The compiler was doing this anyway before. Doing this will
allow us to reorganize this after. We can then move the trust_cpu and
parse_trust_cpu definitions a bit closer to where they're actually used,
which makes the code easier to read.
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Sun, 6 Feb 2022 22:51:41 +0000 (23:51 +0100)]
random: get rid of secondary crngs
commit
a9412d510ab9a9ba411fea612903631d2e1f1601 upstream.
As the comment said, this is indeed a "hack". Since it was introduced,
it's been a constant state machine nightmare, with lots of subtle early
boot issues and a wildly complex set of machinery to keep everything in
sync. Rather than continuing to play whack-a-mole with this approach,
this commit simply removes it entirely. This commit is preparation for
"random: use simpler fast key erasure flow on per-cpu keys" in this
series, which introduces a simpler (and faster) mechanism to accomplish
the same thing.
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Tue, 8 Feb 2022 11:18:33 +0000 (12:18 +0100)]
random: use RDSEED instead of RDRAND in entropy extraction
commit
28f425e573e906a4c15f8392cc2b1561ef448595 upstream.
When /dev/random was directly connected with entropy extraction, without
any expansion stage, extract_buf() was called for every 10 bytes of data
read from /dev/random. For that reason, RDRAND was used rather than
RDSEED. At the same time, crng_reseed() was still only called every 5
minutes, so there RDSEED made sense.
Those olden days were also a time when the entropy collector did not use
a cryptographic hash function, which meant most bets were off in terms
of real preimage resistance. For that reason too it didn't matter
_that_ much whether RDSEED was mixed in before or after entropy
extraction; both choices were sort of bad.
But now we have a cryptographic hash function at work, and with that we
get real preimage resistance. We also now only call extract_entropy()
every 5 minutes, rather than every 10 bytes. This allows us to do two
important things.
First, we can switch to using RDSEED in extract_entropy(), as Dominik
suggested. Second, we can ensure that RDSEED input always goes into the
cryptographic hash function with other things before being used
directly. This eliminates a category of attacks in which the CPU knows
the current state of the crng and knows that we're going to xor RDSEED
into it, and so it computes a malicious RDSEED. By going through our
hash function, it would require the CPU to compute a preimage on the
fly, which isn't going to happen.
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dominik Brodowski [Sat, 5 Feb 2022 10:34:57 +0000 (11:34 +0100)]
random: fix locking in crng_fast_load()
commit
7c2fe2b32bf76441ff5b7a425b384e5f75aa530a upstream.
crng_init is protected by primary_crng->lock, so keep holding that lock
when incrementing crng_init from 0 to 1 in crng_fast_load(). The call to
pr_notice() can wait until the lock is released; this code path cannot
be reached twice, as crng_fast_load() aborts early if crng_init > 0.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Fri, 28 Jan 2022 22:29:45 +0000 (23:29 +0100)]
random: remove batched entropy locking
commit
77760fd7f7ae3dfd03668204e708d1568d75447d upstream.
Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. At the same time, we
can't quite do this on PREEMPT_RT, where we still want spinlocks-as-
mutexes semantics. So we use a local_lock_t, which provides the right
behavior for each. Because this is a per-cpu lock, that generation
counter is still doing the necessary CPU-to-CPU communication.
This should improve performance a bit. It will also fix the linked splat
that Jonathan received with a PROVE_RAW_LOCK_NESTING=y.
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Biggers [Fri, 4 Feb 2022 22:17:33 +0000 (14:17 -0800)]
random: remove use_input_pool parameter from crng_reseed()
commit
5d58ea3a31cc98b9fa563f6921d3d043bf0103d1 upstream.
The primary_crng is always reseeded from the input_pool, while the NUMA
crngs are always reseeded from the primary_crng. Remove the redundant
'use_input_pool' parameter from crng_reseed() and just directly check
whether the crng is the primary_crng.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Fri, 4 Feb 2022 00:45:53 +0000 (01:45 +0100)]
random: make credit_entropy_bits() always safe
commit
a49c010e61e1938be851f5e49ac219d49b704103 upstream.
This is called from various hwgenerator drivers, so rather than having
one "safe" version for userspace and one "unsafe" version for the
kernel, just make everything safe; the checks are cheap and sensible to
have anyway.
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Sat, 5 Feb 2022 13:00:58 +0000 (14:00 +0100)]
random: always wake up entropy writers after extraction
commit
489c7fc44b5740d377e8cfdbf0851036e493af00 upstream.
Now that POOL_BITS == POOL_MIN_BITS, we must unconditionally wake up
entropy writers after every extraction. Therefore there's no point of
write_wakeup_threshold, so we can move it to the dustbin of unused
compatibility sysctls. While we're at it, we can fix a small comparison
where we were waking up after <= min rather than < min.
Cc: Theodore Ts'o <tytso@mit.edu>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Thu, 3 Feb 2022 12:28:06 +0000 (13:28 +0100)]
random: use linear min-entropy accumulation crediting
commit
c570449094844527577c5c914140222cb1893e3f upstream.
30e37ec516ae ("random: account for entropy loss due to overwrites")
assumed that adding new entropy to the LFSR pool probabilistically
cancelled out old entropy there, so entropy was credited asymptotically,
approximating Shannon entropy of independent sources (rather than a
stronger min-entropy notion) using 1/8th fractional bits and replacing
a constant 2-2/√𝑒 term (~0.786938) with 3/4 (0.75) to slightly
underestimate it. This wasn't superb, but it was perhaps better than
nothing, so that's what was done. Which entropy specifically was being
cancelled out and how much precisely each time is hard to tell, though
as I showed with the attack code in my previous commit, a motivated
adversary with sufficient information can actually cancel out
everything.
Since we're no longer using an LFSR for entropy accumulation, this
probabilistic cancellation is no longer relevant. Rather, we're now
using a computational hash function as the accumulator and we've
switched to working in the random oracle model, from which we can now
revisit the question of min-entropy accumulation, which is done in
detail in <https://eprint.iacr.org/2019/198>.
Consider a long input bit string that is built by concatenating various
smaller independent input bit strings. Each one of these inputs has a
designated min-entropy, which is what we're passing to
credit_entropy_bits(h). When we pass the concatenation of these to a
random oracle, it means that an adversary trying to receive back the
same reply as us would need to become certain about each part of the
concatenated bit string we passed in, which means becoming certain about
all of those h values. That means we can estimate the accumulation by
simply adding up the h values in calls to credit_entropy_bits(h);
there's no probabilistic cancellation at play like there was said to be
for the LFSR. Incidentally, this is also what other entropy accumulators
based on computational hash functions do as well.
So this commit replaces credit_entropy_bits(h) with essentially `total =
min(POOL_BITS, total + h)`, done with a cmpxchg loop as before.
What if we're wrong and the above is nonsense? It's not, but let's
assume we don't want the actual _behavior_ of the code to change much.
Currently that behavior is not extracting from the input pool until it
has 128 bits of entropy in it. With the old algorithm, we'd hit that
magic 128 number after roughly 256 calls to credit_entropy_bits(1). So,
we can retain more or less the old behavior by waiting to extract from
the input pool until it hits 256 bits of entropy using the new code. For
people concerned about this change, it means that there's not that much
practical behavioral change. And for folks actually trying to model
the behavior rigorously, it means that we have an even higher margin
against attacks.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Wed, 2 Feb 2022 12:30:03 +0000 (13:30 +0100)]
random: simplify entropy debiting
commit
9c07f57869e90140080cfc282cc628d123e27704 upstream.
Our pool is 256 bits, and we only ever use all of it or don't use it at
all, which is decided by whether or not it has at least 128 bits in it.
So we can drastically simplify the accounting and cmpxchg loop to do
exactly this. While we're at it, we move the minimum bit size into a
constant so it can be shared between the two places where it matters.
The reason we want any of this is for the case in which an attacker has
compromised the current state, and then bruteforces small amounts of
entropy added to it. By demanding a particular minimum amount of entropy
be present before reseeding, we make that bruteforcing difficult.
Note that this rationale no longer includes anything about /dev/random
blocking at the right moment, since /dev/random no longer blocks (except
for at ~boot), but rather uses the crng. In a former life, /dev/random
was different and therefore required a more nuanced account(), but this
is no longer.
Behaviorally, nothing changes here. This is just a simplification of
the code.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Sun, 16 Jan 2022 13:23:10 +0000 (14:23 +0100)]
random: use computational hash for entropy extraction
commit
6e8ec2552c7d13991148e551e3325a624d73fac6 upstream.
The current 4096-bit LFSR used for entropy collection had a few
desirable attributes for the context in which it was created. For
example, the state was huge, which meant that /dev/random would be able
to output quite a bit of accumulated entropy before blocking. It was
also, in its time, quite fast at accumulating entropy byte-by-byte,
which matters given the varying contexts in which mix_pool_bytes() is
called. And its diffusion was relatively high, which meant that changes
would ripple across several words of state rather quickly.
However, it also suffers from a few security vulnerabilities. In
particular, inputs learned by an attacker can be undone, but moreover,
if the state of the pool leaks, its contents can be controlled and
entirely zeroed out. I've demonstrated this attack with this SMT2
script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
a matter of seconds on a single core of my laptop, resulting in little
proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.
For basically all recent formal models of RNGs, these attacks represent
a significant cryptographic flaw. But how does this manifest
practically? If an attacker has access to the system to such a degree
that he can learn the internal state of the RNG, arguably there are
other lower hanging vulnerabilities -- side-channel, infoleak, or
otherwise -- that might have higher priority. On the other hand, seed
files are frequently used on systems that have a hard time generating
much entropy on their own, and these seed files, being files, often leak
or are duplicated and distributed accidentally, or are even seeded over
the Internet intentionally, where their contents might be recorded or
tampered with. Seen this way, an otherwise quasi-implausible
vulnerability is a bit more practical than initially thought.
Another aspect of the current mix_pool_bytes() function is that, while
its performance was arguably competitive for the time in which it was
created, it's no longer considered so. This patch improves performance
significantly: on a high-end CPU, an i7-11850H, it improves performance
of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
improves performance by 103%.
This commit replaces the LFSR of mix_pool_bytes() with a straight-
forward cryptographic hash function, BLAKE2s, which is already in use
for pool extraction. Universal hashing with a secret seed was considered
too, something along the lines of <https://eprint.iacr.org/2013/338>,
but the requirement for a secret seed makes for a chicken & egg problem.
Instead we go with a formally proven scheme using a computational hash
function, described in sections 5.1, 6.4, and B.1.8 of
<https://eprint.iacr.org/2019/198>.
BLAKE2s outputs 256 bits, which should give us an appropriate amount of
min-entropy accumulation, and a wide enough margin of collision
resistance against active attacks. mix_pool_bytes() becomes a simple
call to blake2s_update(), for accumulation, while the extraction step
becomes a blake2s_final() to generate a seed, with which we can then do
a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
back as an init key for subsequent blake2s_update()s, and the rest we
produce to the caller. This then is provided to our CRNG like usual. In
that expansion step, we make opportunistic use of 32 bytes of RDRAND
output, just as before. We also always reseed the crng with 32 bytes,
unconditionally, or not at all, rather than sometimes with 16 as before,
as we don't win anything by limiting beyond the 16 byte threshold.
Going for a hash function as an entropy collector is a conservative,
proven approach. The result of all this is a much simpler and much less
bespoke construction than what's there now, which not only plugs a
vulnerability but also improves performance considerably.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paolo Bonzini [Fri, 20 May 2022 17:48:11 +0000 (13:48 -0400)]
KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
commit
9f46c187e2e680ecd9de7983e4d081c3391acc76 upstream.
With shadow paging enabled, the INVPCID instruction results in a call
to kvm_mmu_invpcid_gva. If INVPCID is executed with CR0.PG=0, the
invlpg callback is not set and the result is a NULL pointer dereference.
Fix it trivially by checking for mmu->invlpg before every call.
There are other possibilities:
- check for CR0.PG, because KVM (like all Intel processors after P5)
flushes guest TLB on CR0.PG changes so that INVPCID/INVLPG are a
nop with paging disabled
- check for EFER.LMA, because KVM syncs and flushes when switching
MMU contexts outside of 64-bit mode
All of these are tricky, go for the simple solution. This is CVE-2022-1789.
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[fix conflict due to missing
b9e5603c2a3accbadfec570ac501a54431a6bdba]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Basavaraj Natikar [Mon, 9 May 2022 13:20:20 +0000 (18:50 +0530)]
HID: amd_sfh: Add support for sensor discovery
commit
b5d7f43e97dabfa04a4be5ff027ce7da119332be upstream.
Sensor discovery status fails in case of broken sensors or
platform not supported. Hence disable driver on failure
of sensor discovery.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 25 May 2022 12:43:10 +0000 (14:43 +0200)]
Linux 5.17.11
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mat Martineau [Tue, 17 May 2022 18:02:12 +0000 (11:02 -0700)]
mptcp: Do TCP fallback on early DSS checksum failure
commit
ae66fb2ba6c3dcaf8b9612b65aa949a1a4bed150 upstream.
RFC 8684 section 3.7 describes several opportunities for a MPTCP
connection to "fall back" to regular TCP early in the connection
process, before it has been confirmed that MPTCP options can be
successfully propagated on all SYN, SYN/ACK, and data packets. If a peer
acknowledges the first received data packet with a regular TCP header
(no MPTCP options), fallback is allowed.
If the recipient of that first data packet finds a MPTCP DSS checksum
error, this provides an opportunity to fail gracefully with a TCP
fallback rather than resetting the connection (as might happen if a
checksum failure were detected later).
This commit modifies the checksum failure code to attempt fallback on
the initial subflow of a MPTCP connection, only if it's a failure in the
first data mapping. In cases where the peer initiates the connection,
requests checksums, is the first to send data, and the peer is sending
incorrect checksums (see
https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows
the connection to proceed as TCP rather than reset.
Fixes: dd8bcd1768ff ("mptcp: validate the data checksum")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[mathew.j.martineau: backport: Resolved bitfield conflict in protocol.h]
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 25 May 2022 07:59:15 +0000 (09:59 +0200)]
Linux 5.17.10
Link: https://lore.kernel.org/r/20220523165830.581652127@linuxfoundation.org
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Zan Aziz <zanaziz313@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Fenil Jain<fkjainco@gmail.com>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Fox Chen <foxhlchen@gmail.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Khalid Masum <khalid.masum.92@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells [Sat, 21 May 2022 07:18:28 +0000 (08:18 +0100)]
afs: Fix afs_getattr() to refetch file status if callback break occurred
[ Upstream commit
2aeb8c86d49967552394d5e723f87454cb53f501 ]
If a callback break occurs (change notification), afs_getattr() needs to
issue an FS.FetchStatus RPC operation to update the status of the file
being examined by the stat-family of system calls.
Fix afs_getattr() to do this if AFS_VNODE_CB_PROMISED has been cleared
on a vnode by a callback break. Skip this if AT_STATX_DONT_SYNC is set.
This can be tested by appending to a file on one AFS client and then
using "stat -L" to examine its length on a machine running kafs. This
can also be watched through tracing on the kafs machine. The callback
break is seen:
kworker/1:1-46 [001] ..... 978.910812: afs_cb_call: c=
0000005f YFSCB.CallBack
kworker/1:1-46 [001] ...1. 978.910829: afs_cb_break: 100058:23b4c:
242d2c2 b=2 s=1 break-cb
kworker/1:1-46 [001] ..... 978.911062: afs_call_done: c=
0000005f ret=0 ab=0 [
0000000082994ead]
And then the stat command generated no traffic if unpatched, but with
this change a call to fetch the status can be observed:
stat-4471 [000] ..... 986.744122: afs_make_fs_call: c=
000000ab 100058:023b4c:
242d2c2 YFS.FetchStatus
stat-4471 [000] ..... 986.745578: afs_call_done: c=
000000ab ret=0 ab=0 [
0000000087fc8c84]
Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
Tested-by: kafs-testing+fedora34_64checkkafs-build-496@auristor.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216010
Link: https://lore.kernel.org/r/165308359800.162686.14122417881564420962.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yang Yingliang [Sat, 14 May 2022 02:31:47 +0000 (10:31 +0800)]
i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe()
[ Upstream commit
a2537c98a8a3b57002e54a262d180b9490bc7190 ]
Fix the missing clk_disable_unprepare() before return
from mtk_i2c_probe() in the error handling case.
Fixes: d04913ec5f89 ("i2c: mt7621: Add MediaTek MT7621/7628/7688 I2C driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jae Hyun Yoo [Tue, 29 Mar 2022 17:39:28 +0000 (10:39 -0700)]
dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group
commit
a29c96a4053dc3c1d39353b61089882f81c6b23d upstream.
FWQSPID is not a group of FWSPID so remove it.
Fixes: 7488838f2315 ("dt-bindings: pinctrl: aspeed: Document AST2600 pinmux")
Signed-off-by: Jae Hyun Yoo <quic_jaehyoo@quicinc.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20220329173932.2588289-4-quic_jaehyoo@quicinc.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marek Vasut [Wed, 18 May 2022 21:28:32 +0000 (14:28 -0700)]
Input: ili210x - fix reset timing
commit
e4920d42ce0e9c8aafb7f64b6d9d4ae02161e51e upstream.
According to Ilitek "231x & ILI251x Programming Guide" Version: 2.30
"2.1. Power Sequence", "T4 Chip Reset and discharge time" is minimum
10ms and "T2 Chip initial time" is maximum 150ms. Adjust the reset
timings such that T4 is 12ms and T2 is 160ms to fit those figures.
This prevents sporadic touch controller start up failures when some
systems with at least ILI251x controller boot, without this patch
the systems sometimes fail to communicate with the touch controller.
Fixes: 201f3c803544c ("Input: ili210x - add reset GPIO support")
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20220518204901.93534-1-marex@denx.de
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shreyas K K [Thu, 12 May 2022 11:01:34 +0000 (16:31 +0530)]
arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs
[ Upstream commit
51f559d66527e238f9a5f82027bff499784d4eac ]
Add KRYO4XX gold/big cores to the list of CPUs that need the
repeat TLBI workaround. Apply this to the affected
KRYO4XX cores (rcpe to rfpe).
The variant and revision bits are implementation defined and are
different from the their Cortex CPU counterparts on which they are
based on, i.e., (r0p0 to r3p0) is equivalent to (rcpe to rfpe).
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Reviewed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Link: https://lore.kernel.org/r/20220512110134.12179-1-quic_shrekk@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Grant Grundler [Tue, 10 May 2022 02:28:26 +0000 (19:28 -0700)]
net: atlantic: verify hw_head_ lies within TX buffer ring
[ Upstream commit
2120b7f4d128433ad8c5f503a9584deba0684901 ]
Bounds check hw_head index provided by NIC to verify it lies
within the TX buffer ring.
Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Grant Grundler [Tue, 10 May 2022 02:28:25 +0000 (19:28 -0700)]
net: atlantic: add check for MAX_SKB_FRAGS
[ Upstream commit
6aecbba12b5c90b26dc062af3b9de8c4b3a2f19f ]
Enforce that the CPU can not get stuck in an infinite loop.
Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Grant Grundler [Tue, 10 May 2022 02:28:24 +0000 (19:28 -0700)]
net: atlantic: reduce scope of is_rsc_complete
[ Upstream commit
79784d77ebbd3ec516b7a5ce555d979fb7946202 ]
Don't defer handling the err case outside the loop. That's pointless.
And since is_rsc_complete is only used inside this loop, declare
it inside the loop to reduce it's scope.
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Grant Grundler [Tue, 10 May 2022 02:28:23 +0000 (19:28 -0700)]
net: atlantic: fix "frag[0] not initialized"
[ Upstream commit
62e0ae0f4020250f961cf8d0103a4621be74e077 ]
In aq_ring_rx_clean(), if buff->is_eop is not set AND
buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to
buff->len and skb_add_rx_frag(xxx, *0*, ...) is not called.
The loop following this code starts calling skb_add_rx_frag() starting
with i=1 and thus frag[0] is never initialized. Since i is initialized
to zero at the top of the primary loop, we can just reference and
post-increment i instead of hardcoding the 0 when calling
skb_add_rx_frag() the first time.
Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yang Yingliang [Tue, 10 May 2022 03:13:16 +0000 (11:13 +0800)]
net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
[ Upstream commit
0807ce0b010418a191e0e4009803b2d74c3245d5 ]
Switch to using pcim_enable_device() to avoid missing pci_disable_device().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220510031316.1780409-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yang Yingliang [Fri, 6 May 2022 09:42:50 +0000 (17:42 +0800)]
ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one()
[ Upstream commit
51ca86b4c9c7c75f5630fa0dbe5f8f0bd98e3c3e ]
Fix the missing pci_disable_device() before return
from tulip_init_one() in the error handling case.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220506094250.3630615-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Johannes Berg [Fri, 6 May 2022 08:21:38 +0000 (10:21 +0200)]
nl80211: fix locking in nl80211_set_tx_bitrate_mask()
[ Upstream commit
f971e1887fdb3ab500c9bebf4b98f62d49a20655 ]
This accesses the wdev's chandef etc., so cannot safely
be used without holding the lock.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20220506102136.06b7205419e6.I2a87c05fbd8bc5e565e84d190d4cfd2e92695a90@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Lina Wang [Thu, 5 May 2022 05:48:49 +0000 (13:48 +0800)]
net: fix wrong network header length
[ Upstream commit
cf3ab8d4a797960b4be20565abb3bcd227b18a68 ]
When clatd starts with ebpf offloaing, and NETIF_F_GRO_FRAGLIST is enable,
several skbs are gathered in skb_shinfo(skb)->frag_list. The first skb's
ipv6 header will be changed to ipv4 after bpf_skb_proto_6_to_4,
network_header\transport_header\mac_header have been updated as ipv4 acts,
but other skbs in frag_list didnot update anything, just ipv6 packets.
udp_queue_rcv_skb will call skb_segment_list to traverse other skbs in
frag_list and make sure right udp payload is delivered to user space.
Unfortunately, other skbs in frag_list who are still ipv6 packets are
updated like the first skb and will have wrong transport header length.
e.g.before bpf_skb_proto_6_to_4,the first skb and other skbs in frag_list
has the same network_header(24)& transport_header(64), after
bpf_skb_proto_6_to_4, ipv6 protocol has been changed to ipv4, the first
skb's network_header is 44,transport_header is 64, other skbs in frag_list
didnot change.After skb_segment_list, the other skbs in frag_list has
different network_header(24) and transport_header(44), so there will be 20
bytes different from original,that is difference between ipv6 header and
ipv4 header. Just change transport_header to be the same with original.
Actually, there are two solutions to fix it, one is traversing all skbs
and changing every skb header in bpf_skb_proto_6_to_4, the other is
modifying frag_list skb's header in skb_segment_list. Considering
efficiency, adopt the second one--- when the first skb and other skbs in
frag_list has different network_header length, restore them to make sure
right udp payload is delivered to user space.
Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eric Yang [Sat, 19 Mar 2022 20:34:24 +0000 (16:34 -0400)]
drm/amd/display: undo clearing of z10 related function pointers
[ Upstream commit
9b9bd3f640640f94272a461b2dfe558f91b322c5 ]
[Why]
Z10 and S0i3 have some shared path. Previous code clean up ,
incorrectly removed these pointers, which breaks s0i3 restore
[How]
Do not clear the function pointers based on Z10 disable.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Maximilian Luz [Fri, 29 Apr 2022 18:00:49 +0000 (20:00 +0200)]
platform/surface: gpe: Add support for Surface Pro 8
[ Upstream commit
ed13d4ac57474d959c40fd05d8860e2b1607becb ]
The new Surface Pro 8 uses GPEs for lid events as well. Add an entry for
that so that the lid can be used to wake the device. Note that this is a
device with a keyboard type-cover, where this acts as the "lid".
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20220429180049.1282447-1-luzmaximilian@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Prarit Bhargava [Fri, 29 Apr 2022 12:23:22 +0000 (08:23 -0400)]
platform/x86/intel: Fix 'rmmod pmt_telemetry' panic
[ Upstream commit
2cdfa0c20d58da3757054797c2974c967035926a ]
'rmmod pmt_telemetry' panics with:
BUG: kernel NULL pointer dereference, address:
0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 1697 Comm: rmmod Tainted: G S W -------- --- 5.18.0-rc4 #3
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.3056.B00.
2201310233 01/31/2022
RIP: 0010:device_del+0x1b/0x3d0
Code: e8 1a d9 e9 ff e9 58 ff ff ff 48 8b 08 eb dc 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d af 80 00 00 00 53 48 89 fb 48 83 ec 18 <4c> 8b 67 40 48 89 ef 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31
RSP: 0018:
ffffb520415cfd60 EFLAGS:
00010286
RAX:
0000000000000070 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000001 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
0000000000000080 R08:
ffffffffffffffff R09:
ffffb520415cfd78
R10:
0000000000000002 R11:
ffffb520415cfd78 R12:
0000000000000000
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
FS:
00007f7e198e5740(0000) GS:
ffff905c9f700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000040 CR3:
000000010782a005 CR4:
0000000000770ee0
PKRU:
55555554
Call Trace:
<TASK>
? __xa_erase+0x53/0xb0
device_unregister+0x13/0x50
intel_pmt_dev_destroy+0x34/0x60 [pmt_class]
pmt_telem_remove+0x40/0x50 [pmt_telemetry]
auxiliary_bus_remove+0x18/0x30
device_release_driver_internal+0xc1/0x150
driver_detach+0x44/0x90
bus_remove_driver+0x74/0xd0
auxiliary_driver_unregister+0x12/0x20
pmt_telem_exit+0xc/0xe4a [pmt_telemetry]
__x64_sys_delete_module+0x13a/0x250
? syscall_trace_enter.isra.19+0x11e/0x1a0
do_syscall_64+0x58/0x80
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x67/0x80
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x67/0x80
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x67/0x80
? exc_page_fault+0x64/0x140
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7e1803a05b
Code: 73 01 c3 48 8b 0d 2d 4e 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 4d 38 00 f7 d8 64 89 01 48
The probe function, pmt_telem_probe(), adds an entry for devices even if
they have not been initialized. This results in the array of initialized
devices containing both initialized and uninitialized entries. This
causes a panic in the remove function, pmt_telem_remove() which expects
the array to only contain initialized entries.
Only use an entry when a device is initialized.
Cc: "David E. Box" <david.e.box@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <markgross@kernel.org>
Cc: platform-driver-x86@vger.kernel.org
Signed-off-by: David Arcari <darcari@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220429122322.2550003-1-prarit@redhat.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mark Pearson [Mon, 2 May 2022 19:12:00 +0000 (15:12 -0400)]
platform/x86: thinkpad_acpi: Correct dual fan probe
[ Upstream commit
aa2fef6f40e6ccc22e932b36898f260f0e5a021a ]
There was an issue with the dual fan probe whereby the probe was
failing as it assuming that second_fan support was not available.
Corrected the logic so the probe works correctly. Cleaned up so
quirks only used if 2nd fan not detected.
Tested on X1 Carbon 10 (2 fans), X1 Carbon 9 (2 fans) and T490 (1 fan)
Signed-off-by: Mark Pearson <markpearson@lenovo.com>
Link: https://lore.kernel.org/r/20220502191200.63470-1-markpearson@lenovo.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mario Limonciello [Fri, 29 Apr 2022 03:05:00 +0000 (22:05 -0500)]
platform/x86: thinkpad_acpi: Convert btusb DMI list to quirks
[ Upstream commit
c25d7f32e3e209462cd82e6e93e66b72dbb2308f ]
DMI matching in thinkpad_acpi happens local to a function meaning
quirks can only match that function.
Future changes to thinkpad_acpi may need to quirk other code, so
change this to use a quirk infrastructure.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mark Pearson <markpearson@lenvo.com>
Link: https://lore.kernel.org/r/20220429030501.1909-2-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Daniel Vetter [Thu, 5 May 2022 22:04:13 +0000 (00:04 +0200)]
fbdev: Prevent possible use-after-free in fb_release()
[ Upstream commit
89bfd4017e58faaf70411555e7f508495114e90b ]
Most fbdev drivers have issues with the fb_info lifetime, because call to
framebuffer_release() from their driver's .remove callback, rather than
doing from fbops.fb_destroy callback.
Doing that will destroy the fb_info too early, while references to it may
still exist, leading to a use-after-free error.
To prevent this, check the fb_info reference counter when attempting to
kfree the data structure in framebuffer_release(). That will leak it but
at least will prevent the mentioned error.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220505220413.365977-1-javierm@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Javier Martinez Canillas [Wed, 4 May 2022 11:59:17 +0000 (13:59 +0200)]
Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"
[ Upstream commit
135332f34ba2662bc1e32b5c612e06a8cc41a053 ]
This reverts commit
aafa025c76dcc7d1a8c8f0bdefcbe4eb480b2f6a. That commit
attempted to fix a NULL pointer dereference, caused by the struct fb_info
associated with a framebuffer device to not longer be valid when the file
descriptor was closed.
The issue was exposed by commit
27599aacbaef ("fbdev: Hot-unplug firmware
fb devices on forced removal"), which added a new path that goes through
the struct device removal instead of directly unregistering the fb.
Most fbdev drivers have issues with the fb_info lifetime, because call to
framebuffer_release() from their driver's .remove callback, rather than
doing from fbops.fb_destroy callback. This meant that due to this switch,
the fb_info was now destroyed too early, while references still existed,
while before it was simply leaked.
The patch we're reverting here reinstated that leak, hence "fixed" the
regression. But the proper solution is to fix the drivers to not release
the fb_info too soon.
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220504115917.758787-1-javierm@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nicolas Dichtel [Wed, 4 May 2022 09:07:39 +0000 (11:07 +0200)]
selftests: add ping test with ping_group_range tuned
[ Upstream commit
e71b7f1f44d3d88c677769c85ef0171caf9fc89f ]
The 'ping' utility is able to manage two kind of sockets (raw or icmp),
depending on the sysctl ping_group_range. By default, ping_group_range is
set to '1 0', which forces ping to use an ip raw socket.
Let's replay the ping tests by allowing 'ping' to use the ip icmp socket.
After the previous patch, ipv4 tests results are the same with both kinds
of socket. For ipv6, there are a lot a new failures (the previous patch
fixes only two cases).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kieran Frewen [Wed, 20 Apr 2022 04:13:20 +0000 (04:13 +0000)]
cfg80211: retrieve S1G operating channel number
[ Upstream commit
e847ffe2d146cfd52980ca688d84358e024a6e70 ]
When retrieving the S1G channel number from IEs, we should retrieve
the operating channel instead of the primary channel. The S1G operation
element specifies the main channel of operation as the oper channel,
unlike for HT and HE which specify their main channel of operation as
the primary channel.
Signed-off-by: Kieran Frewen <kieran.frewen@morsemicro.com>
Signed-off-by: Bassem Dawood <bassem@morsemicro.com>
Link: https://lore.kernel.org/r/20220420041321.3788789-1-kieran.frewen@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kieran Frewen [Wed, 20 Apr 2022 04:13:21 +0000 (04:13 +0000)]
nl80211: validate S1G channel width
[ Upstream commit
5d087aa759eb82b8208411913f6c2158bd85abc0 ]
Validate the S1G channel width input by user to ensure it matches
that of the requested channel
Signed-off-by: Kieran Frewen <kieran.frewen@morsemicro.com>
Signed-off-by: Bassem Dawood <bassem@morsemicro.com>
Link: https://lore.kernel.org/r/20220420041321.3788789-2-kieran.frewen@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Felix Fietkau [Wed, 20 Apr 2022 10:50:38 +0000 (12:50 +0200)]
mac80211: fix rx reordering with non explicit / psmp ack policy
[ Upstream commit
5e469ed9764d4722c59562da13120bd2dc6834c5 ]
When the QoS ack policy was set to non explicit / psmp ack, frames are treated
as not being part of a BA session, which causes extra latency on reordering.
Fix this by only bypassing reordering for packets with no-ack policy
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20220420105038.36443-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Gleb Chesnokov [Fri, 15 Apr 2022 12:42:29 +0000 (12:42 +0000)]
scsi: qla2xxx: Fix missed DMA unmap for aborted commands
[ Upstream commit
26f9ce53817a8fd84b69a73473a7de852a24c897 ]
Aborting commands that have already been sent to the firmware can
cause BUG in qlt_free_cmd(): BUG_ON(cmd->sg_mapped)
For instance:
- Command passes rdx_to_xfer state, maps sgl, sends to the firmware
- Reset occurs, qla2xxx performs ISP error recovery, aborts the command
- Target stack calls qlt_abort_cmd() and then qlt_free_cmd()
- BUG_ON(cmd->sg_mapped) in qlt_free_cmd() occurs because sgl was not
unmapped
Thus, unmap sgl in qlt_abort_cmd() for commands with the aborted flag set.
Link: https://lore.kernel.org/r/AS8PR10MB4952D545F84B6B1DFD39EC1E9DEE9@AS8PR10MB4952.EURPRD10.PROD.OUTLOOK.COM
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Gleb Chesnokov <Chesnokov.G@raidix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Brian Bunker [Mon, 2 May 2022 15:09:17 +0000 (08:09 -0700)]
scsi: scsi_dh_alua: Properly handle the ALUA transitioning state
[ Upstream commit
6056a92ceb2a7705d61df7ec5370548e96aee258 ]
The handling of the ALUA transitioning state is currently broken. When a
target goes into this state, it is expected that the target is allowed to
stay in this state for the implicit transition timeout without a path
failure. The handler has this logic, but it gets skipped currently.
When the target transitions, there is in-flight I/O from the initiator. The
first of these responses from the target will be a unit attention letting
the initiator know that the ALUA state has changed. The remaining
in-flight I/Os, before the initiator finds out that the portal state has
changed, will return not ready, ALUA state is transitioning. The portal
state will change to SCSI_ACCESS_STATE_TRANSITIONING. This will lead to all
new I/O immediately failing the path unexpectedly. The path failure happens
in less than a second instead of the expected successes until the
transition timer is exceeded.
Allow I/Os to continue while the path is in the ALUA transitioning
state. The handler already takes care of a target that stays in the
transitioning state for too long by changing the state to ALUA state
standby once the transition timeout is exceeded at which point the path
will fail.
Link: https://lore.kernel.org/r/CAHZQxy+4sTPz9+pY3=7VJH+CLUJsDct81KtnR2be8ycN5mhqTg@mail.gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Krishna Kant <krishna.kant@purestorage.com>
Acked-by: Seamus Connor <sconnor@purestorage.com>
Signed-off-by: Brian Bunker <brian@purestorage.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Athira Rajeev [Wed, 11 May 2022 11:54:38 +0000 (17:24 +0530)]
perf test bpf: Skip test if clang is not present
[ Upstream commit
8994e97be3eb3c3a7b59d6223018ffab8c272e2d ]
Perf BPF filter test fails in environment where "clang" is not
installed.
Test failure logs:
<<>>
42: BPF filter :
42.1: Basic BPF filtering : Skip
42.2: BPF pinning : FAILED!
42.3: BPF prologue generation : FAILED!
<<>>
Enabling verbose option provided debug logs which says clang/llvm needs
to be installed. Snippet of verbose logs:
<<>>
42.2: BPF pinning :
--- start ---
test child forked, pid 61423
ERROR: unable to find clang.
Hint: Try to install latest clang/llvm to support BPF.
Check your $PATH
<<logs_here>>
Failed to compile test case: 'Basic BPF llvm compile'
Unable to get BPF object, fix kbuild first
test child finished with -1
---- end ----
BPF filter subtest 2: FAILED!
<<>>
Here subtests, "BPF pinning" and "BPF prologue generation" failed and
logs shows clang/llvm is needed. After installing clang, testcase
passes.
Reason on why subtest failure happens though logs has proper debug
information:
Main function __test__bpf calls test_llvm__fetch_bpf_obj by
passing 4th argument as true ( 4th arguments maps to parameter
"force" in test_llvm__fetch_bpf_obj ). But this will cause
test_llvm__fetch_bpf_obj to skip the check for clang/llvm.
Snippet of code part which checks for clang based on
parameter "force" in test_llvm__fetch_bpf_obj:
<<>>
if (!force && (!llvm_param.user_set_param &&
<<>>
Since force is set to "false", test won't get skipped and fails to
compile test case. The BPF code compilation needs clang, So pass the
fourth argument as "false" and also skip the test if reason for return
is "TEST_SKIP"
After the patch:
<<>>
42: BPF filter :
42.1: Basic BPF filtering : Skip
42.2: BPF pinning : Skip
42.3: BPF prologue generation : Skip
<<>>
Fixes: ba1fae431e74bb42 ("perf test: Add 'perf test BPF'")
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lore.kernel.org/r/20220511115438.84032-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Thomas Richter [Fri, 20 May 2022 08:11:58 +0000 (10:11 +0200)]
perf bench numa: Address compiler error on s390
[ Upstream commit
f8ac1c478424a9a14669b8cef7389b1e14e5229d ]
The compilation on s390 results in this error:
# make DEBUG=y bench/numa.o
...
bench/numa.c: In function ‘__bench_numa’:
bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated
writing between 1 and 11 bytes into a region of size between
10 and 20 [-Werror=format-truncation=]
1749 | snprintf(tname, sizeof(tname), "process%d:thread%d", p, t);
^~
...
bench/numa.c:1749:64: note: directive argument in the range
[-
2147483647,
2147483646]
...
#
The maximum length of the %d replacement is 11 characters because of the
negative sign. Therefore extend the array by two more characters.
Output after:
# make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o
-rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o
#
Fixes: 3aff8ba0a4c9c919 ("perf bench numa: Avoid possible truncation when using snprintf()")
Suggested-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kan Liang [Wed, 18 May 2022 14:51:25 +0000 (07:51 -0700)]
perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
[ Upstream commit
01b28e4a58152e8906eeb5f1b55a0c404c48c7c8 ]
The X86 specific arch__intr_reg_mask() is to check whether the kernel
and hardware can collect XMM registers. But it doesn't work on some
hybrid platform.
Without the patch on ADL-N:
$ perf record -I?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
R11 R12 R13 R14 R15
The config of the test event doesn't contain the PMU information. The
kernel may fail to initialize it on the correct hybrid PMU and return
the wrong non-supported information.
Add the PMU information into the config for the hybrid platform. The
same register set is supported among different hybrid PMUs. Checking
the first available one is good enough.
With the patch on ADL-N:
$ perf record -I?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9
XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
Fixes: 6466ec14aaf44ff1 ("perf regs x86: Add X86 specific arch__intr_reg_mask()")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518145125.1494156-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Athira Rajeev [Fri, 20 May 2022 10:12:36 +0000 (15:42 +0530)]
perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpc
[ Upstream commit
451ed8058c69a3fee29fa9e2967a4e22a221fe75 ]
"perf all PMU test" picks the input events from "perf list --raw-dump
pmu" list and runs "perf stat -e" for each of the event in the list. In
case of powerpc, the PowerVM environment supports events from hv_24x7
and hv_gpci PMU which is of example format like below:
- hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
- hv_gpci/event,partition_id=?/
The value for "?" needs to be filled in depending on system and
respective event. CPM_ADJUNCT_INST needs have core value and domain
value. hv_gpci event needs partition_id. Similarly, there are other
events for hv_24x7 and hv_gpci having "?" in event format. Hence skip
these events on powerpc platform since values like partition_id, domain
is specific to system and event.
Fixes: 3d5ac9effcc640d5 ("perf test: Workload test of all PMUs")
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520101236.17249-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Uwe Kleine-König [Wed, 11 May 2022 07:58:56 +0000 (09:58 +0200)]
gpio: mvebu/pwm: Refuse requests with inverted polarity
[ Upstream commit
3ecb10175b1f776f076553c24e2689e42953fef5 ]
The driver doesn't take struct pwm_state::polarity into account when
configuring the hardware, so refuse requests for inverted polarity.
Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Haibo Chen [Wed, 11 May 2022 02:15:04 +0000 (10:15 +0800)]
gpio: gpio-vf610: do not touch other bits when set the target bit
[ Upstream commit
9bf3ac466faa83d51a8fe9212131701e58fdef74 ]
For gpio controller contain register PDDR, when set one target bit,
current logic will clear all other bits, this is wrong. Use operator
'|=' to fix it.
Fixes: 659d8a62311f ("gpio: vf610: add imx7ulp support")
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ian Rogers [Thu, 19 May 2022 03:20:01 +0000 (20:20 -0700)]
perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
[ Upstream commit
92d579ea3279aa87392b862df5810f0a7e30fcc6 ]
Stat events can come from disk and so need a degree of validation. They
contain a CPU which needs looking up via CPU map to access a counter.
Add the CPU to index translation, alongside validity checking.
Discussion thread:
https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/
Fixes: 7ac0089d138f80dc ("perf evsel: Pass cpu not cpu map index to synthesize")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Suggested-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220519032005.1273691-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnaldo Carvalho de Melo [Fri, 20 May 2022 00:25:12 +0000 (21:25 -0300)]
perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
[ Upstream commit
0ae065a5d265bc5ada13e350015458e0c5e5c351 ]
Avi Kivity reported a problem where the __weak
btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being
used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in
turn called back to btf__load_from_kernel_by_id(), resulting in an
endless loop.
Fix this by adding a feature test to check if
btf__load_from_kernel_by_id() is available when building perf with
LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old
btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id()
since at that time it didn't exist at all.
Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC
where we don't have btf__load_from_kernel_by_id() and thus its feature
test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID:
$ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output
test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’:
test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration]
6 | return btf__load_from_kernel_by_id(
20151128, NULL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
$
$ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id
00000000005ba180 T btf__load_from_kernel_by_id
$
$ objdump --disassemble=btf__load_from_kernel_by_id -S /tmp/build/perf-urgent/perf
/tmp/build/perf-urgent/perf: file format elf64-x86-64
<SNIP>
00000000005ba180 <btf__load_from_kernel_by_id>:
#include "record.h"
#include "util/synthetic-events.h"
#ifndef HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID
struct btf *btf__load_from_kernel_by_id(__u32 id)
{
5ba180: 55 push %rbp
5ba181: 48 89 e5 mov %rsp,%rbp
5ba184: 48 83 ec 10 sub $0x10,%rsp
5ba188: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
5ba18f: 00 00
5ba191: 48 89 45 f8 mov %rax,-0x8(%rbp)
5ba195: 31 c0 xor %eax,%eax
struct btf *btf;
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
int err = btf__get_from_id(id, &btf);
5ba197: 48 8d 75 f0 lea -0x10(%rbp),%rsi
5ba19b: e8 a0 57 e5 ff call 40f940 <btf__get_from_id@plt>
5ba1a0: 89 c2 mov %eax,%edx
#pragma GCC diagnostic pop
return err ? ERR_PTR(err) : btf;
5ba1a2: 48 98 cltq
5ba1a4: 85 d2 test %edx,%edx
5ba1a6: 48 0f 44 45 f0 cmove -0x10(%rbp),%rax
}
<SNIP>
Fixes: 218e7b775d368f38 ("perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions")
Reported-by: Avi Kivity <avi@scylladb.com>
Link: https://lore.kernel.org/linux-perf-users/f0add43b-3de5-20c5-22c4-70aff4af959f@scylladb.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/linux-perf-users/YobjjFOblY4Xvwo7@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Aaron Lewis [Tue, 17 May 2022 05:12:36 +0000 (05:12 +0000)]
kvm: x86/pmu: Fix the compare function used by the pmu event filter
[ Upstream commit
4ac19ead0dfbabd8e0bfc731f507cfb0b95d6c99 ]
When returning from the compare function the u64 is truncated to an
int. This results in a loss of the high nybble[1] in the event select
and its sign if that nybble is in use. Switch from using a result that
can end up being truncated to a result that can only be: 1, 0, -1.
[1] bits 35:32 in the event select register and bits 11:8 in the event
select.
Fixes: 7ff775aca48ad ("KVM: x86/pmu: Use binary search to check filtered events")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20220517051238.
2566934-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Daejun Park [Thu, 19 May 2022 06:05:29 +0000 (15:05 +0900)]
scsi: ufs: core: Fix referencing invalid rsp field
[ Upstream commit
d5d92b64408443e113b9742f8f1c35278910dd4d ]
Fix referencing sense data when it is invalid. When the length of the data
segment is 0, there is no valid information in the rsp field, so
ufshpb_rsp_upiu() is returned without additional operation.
Link: https://lore.kernel.org/r/252651381.41652940482659.JavaMail.epsvc@epcpadp4
Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read")
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Krzysztof Kozlowski [Thu, 7 Apr 2022 19:38:56 +0000 (21:38 +0200)]
riscv: dts: sifive: fu540-c000: align dma node name with dtschema
[ Upstream commit
b17410182b6f98191fbf7f42d3b4a78512769d29 ]
Fixes dtbs_check warnings like:
dma@
3000000: $nodename:0: 'dma@
3000000' does not match '^dma-controller(@.*)?$'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20220407193856.18223-1-krzysztof.kozlowski@linaro.org
Fixes: c5ab54e9945b ("riscv: dts: add support for PDMA device of HiFive Unleashed Rev A00")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Andrew Lunn [Wed, 18 May 2022 00:58:40 +0000 (02:58 +0200)]
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
[ Upstream commit
fbb3abdf2223cd0dfc07de85fe5a43ba7f435bdf ]
It is possible to stack bridges on top of each other. Consider the
following which makes use of an Ethernet switch:
br1
/ \
/ \
/ \
br0.11 wlan0
|
br0
/ | \
p1 p2 p3
br0 is offloaded to the switch. Above br0 is a vlan interface, for
vlan 11. This vlan interface is then a slave of br1. br1 also has a
wireless interface as a slave. This setup trunks wireless lan traffic
over the copper network inside a VLAN.
A frame received on p1 which is passed up to the bridge has the
skb->offload_fwd_mark flag set to true, indicating that the switch has
dealt with forwarding the frame out ports p2 and p3 as needed. This
flag instructs the software bridge it does not need to pass the frame
back down again. However, the flag is not getting reset when the frame
is passed upwards. As a result br1 sees the flag, wrongly interprets
it, and fails to forward the frame to wlan0.
When passing a frame upwards, clear the flag. This is the Rx
equivalent of br_switchdev_frame_unmark() in br_dev_xmit().
Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eli Cohen [Mon, 16 May 2022 08:47:35 +0000 (11:47 +0300)]
vdpa/mlx5: Use consistent RQT size
[ Upstream commit
acde3929492bcb9ceb0df1270230c422b1013798 ]
The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:
Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
hold. This will emit errors and the driver state is compromised.
To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.
In addition to that, we take into consideration the max capability of RQT
entries early when the device is added so we don't need to take consider
it when creating the RQT.
Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
max_vas / 2 and make the code clearer.
Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ritaro Takenaka [Tue, 17 May 2022 10:55:30 +0000 (12:55 +0200)]
netfilter: flowtable: move dst_check to packet path
[ Upstream commit
2738d9d963bd1f06d5114c2b4fa5771a95703991 ]
Fixes sporadic IPv6 packet loss when flow offloading is enabled.
IPv6 route GC and flowtable GC are not synchronized.
When dst_cache becomes stale and a packet passes through the flow before
the flowtable GC teardowns it, the packet can be dropped.
So, it is necessary to check dst every time in packet path.
Fixes: 227e1e4d0d6c ("netfilter: nf_flowtable: skip device lookup from interface index")
Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pablo Neira Ayuso [Fri, 18 Mar 2022 12:11:24 +0000 (13:11 +0100)]
netfilter: flowtable: pass flowtable to nf_flow_table_iterate()
[ Upstream commit
217cff36e885627c41a14e803fc44f9cbc945767 ]
The flowtable object is already passed as argument to
nf_flow_table_iterate(), do use not data pointer to pass flowtable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pablo Neira Ayuso [Tue, 17 May 2022 08:44:14 +0000 (10:44 +0200)]
netfilter: flowtable: fix TCP flow teardown
[ Upstream commit
e5eaac2beb54f0a16ff851125082d9faeb475572 ]
This patch addresses three possible problems:
1. ct gc may race to undo the timeout adjustment of the packet path, leaving
the conntrack entry in place with the internal offload timeout (one day).
2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
timeout is reached before the flow offload del.
3. tcp ct is always set to ESTABLISHED with a very long timeout
in flow offload teardown/delete even though the state might be already
CLOSED. Also as a remark we cannot assume that the FIN or RST packet
is hitting flow table teardown as the packet might get bumped to the
slow path in nftables.
This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.
Moreover, teturn the connection's ownership to conntrack upon teardown
by clearing the offload flag and fixing the established timeout value.
The flow table GC thread will asynchonrnously free the flow table and
hardware offload entries.
Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
which is also misleading since the flow is back to classic conntrack
path.
If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().
This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss or bump packets to slow path, we do not know
what will happen there while we are still in SYN_RECV, this patch
postpones offload up to the next packet which also aligns to the
existing behaviour in tc-ct.
flow_offload_teardown() does not reset the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
path might have already update the state to CLOSE/FIN.
Joint work with Oz and Sven.
Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kevin Mitchell [Tue, 17 May 2022 18:01:05 +0000 (11:01 -0700)]
igb: skip phy status check where unavailable
[ Upstream commit
942d2ad5d2e0df758a645ddfadffde2795322728 ]
igb_read_phy_reg() will silently return, leaving phy_data untouched, if
hw->ops.read_reg isn't set. Depending on the uninitialized value of
phy_data, this led to the phy status check either succeeding immediately
or looping continuously for 2 seconds before emitting a noisy err-level
timeout. This message went out to the console even though there was no
actual problem.
Instead, first check if there is read_reg function pointer. If not,
proceed without trying to check the phy status register.
Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition")
Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Abeni [Tue, 17 May 2022 18:02:11 +0000 (11:02 -0700)]
mptcp: fix checksum byte order
[ Upstream commit
ba2c89e0ea74a904d5231643245753d77422e7f5 ]
The MPTCP code typecasts the checksum value to u16 and
then converts it to big endian while storing the value into
the MPTCP option.
As a result, the wire encoding for little endian host is
wrong, and that causes interoperabilty interoperability
issues with other implementation or host with different endianness.
Address the issue writing in the packet the unmodified __sum16 value.
MPTCP checksum is disabled by default, interoperating with systems
with bad mptcp-level csum encoding should cause fallback to TCP.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275
Fixes: c5b39e26d003 ("mptcp: send out checksum for DSS")
Fixes: 390b95a5fb84 ("mptcp: receive checksum for DSS")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Daniel Thompson [Mon, 23 May 2022 18:11:02 +0000 (19:11 +0100)]
lockdown: also lock down previous kgdb use
commit
eadb2f47a3ced5c64b23b90fd2a3463f63726066 upstream.
KGDB and KDB allow read and write access to kernel memory, and thus
should be restricted during lockdown. An attacker with access to a
serial port (for example, via a hypervisor console, which some cloud
vendors provide over the network) could trigger the debugger so it is
important that the debugger respect the lockdown mode when/if it is
triggered.
Fix this by integrating lockdown into kdb's existing permissions
mechanism. Unfortunately kgdb does not have any permissions mechanism
(although it certainly could be added later) so, for now, kgdb is simply
and brutally disabled by immediately exiting the gdb stub without taking
any action.
For lockdowns established early in the boot (e.g. the normal case) then
this should be fine but on systems where kgdb has set breakpoints before
the lockdown is enacted than "bad things" will happen.
CVE: CVE-2022-21499
Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ard Biesheuvel [Wed, 20 Apr 2022 08:46:17 +0000 (09:46 +0100)]
ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2
[ Upstream commit
3cfb3019979666bdf33a1010147363cf05e0f17b ]
In Thumb2, 'b . + 4' produces a branch instruction that uses a narrow
encoding, and so it does not jump to the following instruction as
expected. So use W(b) instead.
Fixes: 6c7cb60bff7a ("ARM: fix Thumb2 regression with Spectre BHB")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ard Biesheuvel [Wed, 20 Apr 2022 08:44:51 +0000 (09:44 +0100)]
ARM: 9196/1: spectre-bhb: enable for Cortex-A15
[ Upstream commit
0dc14aa94ccd8ba35eb17a0f9b123d1566efd39e ]
The Spectre-BHB mitigations were inadvertently left disabled for
Cortex-A15, due to the fact that cpu_v7_bugs_init() is not called in
that case. So fix that.
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jiasheng Jiang [Tue, 17 May 2022 09:42:31 +0000 (17:42 +0800)]
net: af_key: add check for pfkey_broadcast in function pfkey_process
[ Upstream commit
4dc2a5a8f6754492180741facf2a8787f2c415d7 ]
If skb_clone() returns null pointer, pfkey_broadcast() will
return error.
Therefore, it should be better to check the return value of
pfkey_broadcast() and return error if fails.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Shay Drory [Mon, 4 Apr 2022 07:47:36 +0000 (10:47 +0300)]
net/mlx5: Drain fw_reset when removing device
[ Upstream commit
16d42d313350946f4b9a8b74a13c99f0461a6572 ]
In case fw sync reset is called in parallel to device removal, device
might stuck in the following deadlock:
CPU 0 CPU 1
----- -----
remove_one
uninit_one (locks intf_state_mutex)
mlx5_sync_reset_now_event()
work in fw_reset->wq.
mlx5_enter_error_state()
mutex_lock (intf_state_mutex)
cleanup_once
fw_reset_cleanup()
destroy_workqueue(fw_reset->wq)
Drain the fw_reset WQ, and make sure no new work is being queued, before
entering uninit_one().
The Drain is done before devlink_unregister() since fw_reset, in some
flows, is using devlink API devlink_remote_reload_actions_performed().
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Gal Pressman [Wed, 13 Apr 2022 12:50:42 +0000 (15:50 +0300)]
net/mlx5e: Remove HW-GRO from reported features
[ Upstream commit
6bbd723035badafe4a8eb17ccdecd96eae7a96d5 ]
We got reports of certain HW-GRO flows causing kernel call traces, which
might be related to firmware. To be on the safe side, disable the
feature for now and re-enable it once a driver/firmware fix is found.
Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Maxim Mikityanskiy [Tue, 12 Apr 2022 15:54:26 +0000 (18:54 +0300)]
net/mlx5e: Properly block HW GRO when XDP is enabled
[ Upstream commit
b0617e7b35001c92c8fa777e1a095d3e693813df ]
HW GRO is incompatible and mutually exclusive with XDP and XSK. However,
the needed checks are only made when enabling XDP. If HW GRO is enabled
when XDP is already active, the command will succeed, and XDP will be
skipped in the data path, although still enabled.
This commit fixes the bug by checking the XDP and XSK status in
mlx5e_fix_features and disabling HW GRO if XDP is enabled.
Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Maxim Mikityanskiy [Tue, 12 Apr 2022 15:37:03 +0000 (18:37 +0300)]
net/mlx5e: Properly block LRO when XDP is enabled
[ Upstream commit
cf6e34c8c22fba66bd21244b95ea47e235f68974 ]
LRO is incompatible and mutually exclusive with XDP. However, the needed
checks are only made when enabling XDP. If LRO is enabled when XDP is
already active, the command will succeed, and XDP will be skipped in the
data path, although still enabled.
This commit fixes the bug by checking the XDP status in
mlx5e_fix_features and disabling LRO if XDP is enabled.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Aya Levin [Mon, 11 Apr 2022 14:29:08 +0000 (17:29 +0300)]
net/mlx5e: Block rx-gro-hw feature in switchdev mode
[ Upstream commit
15a5078cab30d7aa02ad14bfadebf247d95fc239 ]
When the driver is in switchdev mode and rx-gro-hw is set, the RQ needs
special CQE handling. Till then, block setting of rx-gro-hw feature in
switchdev mode, to avoid failure while setting the feature due to
failure while opening the RQ.
Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yevgeny Kliteynik [Sun, 3 Apr 2022 20:18:10 +0000 (23:18 +0300)]
net/mlx5: DR, Ignore modify TTL on RX if device doesn't support it
[ Upstream commit
785d7ed295513bd3374095304b7034fd65c123b0 ]
When modifying TTL, packet's csum has to be recalculated.
Due to HW issue in ConnectX-5, csum recalculation for modify
TTL on RX is supported through a work-around that is specifically
enabled by configuration.
If the work-around isn't enabled, rather than adding an unsupported
action the modify TTL action on RX should be ignored.
Ignoring modify TTL action might result in zero actions, so in such
cases we will not convert the match STE to modify STE, as it is done
by FW in DMFS.
This patch fixes an issue where modify TTL action was ignored both
on RX and TX instead of only on RX.
Fixes: 4ff725e1d4ad ("net/mlx5: DR, Ignore modify TTL if device doesn't support it")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Shay Drory [Wed, 9 Mar 2022 12:45:58 +0000 (14:45 +0200)]
net/mlx5: Initialize flow steering during driver probe
[ Upstream commit
b33886971dbc4a86d1ec5369a2aaefc60a7cd72d ]
Currently, software objects of flow steering are created and destroyed
during reload flow. In case a device is unloaded, the following error
is printed during grace period:
mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95):
Driver is in error state. Unloading
As a solution to fix use-after-free bugs, where we try to access
these objects, when reading the value of flow_steering_mode devlink
param[1], let's split flow steering creation and destruction into two
routines:
* init and cleanup: memory, cache, and pools allocation/free.
* create and destroy: namespaces initialization and cleanup.
While at it, re-order the cleanup function to mirror the init function.
[1]
Kasan trace:
[ 385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] Read of size 4 at addr
ffff888104b79308 by task bash/291
[ 385.119849 ]
[ 385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ #2
[ 385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[ 385.119849 ] Call Trace:
[ 385.119849 ] <TASK>
[ 385.119849 ] dump_stack_lvl+0x6e/0x91
[ 385.119849 ] print_address_description.constprop.0+0x1f/0x160
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] kasan_report.cold+0x83/0xdf
[ 385.119849 ] ? devlink_param_notify+0x20/0x190
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] devlink_nl_param_fill+0x18a/0xa50
[ 385.119849 ] ? _raw_spin_lock_irqsave+0x8d/0xe0
[ 385.119849 ] ? devlink_flash_update_timeout_notify+0xf0/0xf0
[ 385.119849 ] ? __wake_up_common+0x4b/0x1e0
[ 385.119849 ] ? preempt_count_sub+0x14/0xc0
[ 385.119849 ] ? _raw_spin_unlock_irqrestore+0x28/0x40
[ 385.119849 ] ? __wake_up_common_lock+0xe3/0x140
[ 385.119849 ] ? __wake_up_common+0x1e0/0x1e0
[ 385.119849 ] ? __sanitizer_cov_trace_const_cmp8+0x27/0x80
[ 385.119849 ] ? __rcu_read_unlock+0x48/0x70
[ 385.119849 ] ? kasan_unpoison+0x23/0x50
[ 385.119849 ] ? __kasan_slab_alloc+0x2c/0x80
[ 385.119849 ] ? memset+0x20/0x40
[ 385.119849 ] ? __sanitizer_cov_trace_const_cmp4+0x25/0x80
[ 385.119849 ] devlink_param_notify+0xce/0x190
[ 385.119849 ] devlink_unregister+0x92/0x2b0
[ 385.119849 ] remove_one+0x41/0x140
[ 385.119849 ] pci_device_remove+0x68/0x140
[ 385.119849 ] ? pcibios_free_irq+0x10/0x10
[ 385.119849 ] __device_release_driver+0x294/0x3f0
[ 385.119849 ] device_driver_detach+0x82/0x130
[ 385.119849 ] unbind_store+0x193/0x1b0
[ 385.119849 ] ? subsys_interface_unregister+0x270/0x270
[ 385.119849 ] drv_attr_store+0x4e/0x70
[ 385.119849 ] ? drv_attr_show+0x60/0x60
[ 385.119849 ] sysfs_kf_write+0xa7/0xc0
[ 385.119849 ] kernfs_fop_write_iter+0x23a/0x2f0
[ 385.119849 ] ? sysfs_kf_bin_read+0x160/0x160
[ 385.119849 ] new_sync_write+0x311/0x430
[ 385.119849 ] ? new_sync_read+0x480/0x480
[ 385.119849 ] ? _raw_spin_lock+0x87/0xe0
[ 385.119849 ] ? __sanitizer_cov_trace_cmp4+0x25/0x80
[ 385.119849 ] ? security_file_permission+0x94/0xa0
[ 385.119849 ] vfs_write+0x4c7/0x590
[ 385.119849 ] ksys_write+0xf6/0x1e0
[ 385.119849 ] ? __x64_sys_read+0x50/0x50
[ 385.119849 ] ? fpregs_assert_state_consistent+0x99/0xa0
[ 385.119849 ] do_syscall_64+0x3d/0x90
[ 385.119849 ] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 385.119849 ] RIP: 0033:0x7fc36ef38504
[ 385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f
80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 385.119849 ] RSP: 002b:
00007ffde0ff3d08 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
[ 385.119849 ] RAX:
ffffffffffffffda RBX:
000000000000000c RCX:
00007fc36ef38504
[ 385.119849 ] RDX:
000000000000000c RSI:
00007fc370521040 RDI:
0000000000000001
[ 385.119849 ] RBP:
00007fc370521040 R08:
00007fc36f00b8c0 R09:
00007fc36ee4b740
[ 385.119849 ] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fc36f00a760
[ 385.119849 ] R13:
000000000000000c R14:
00007fc36f005760 R15:
000000000000000c
[ 385.119849 ] </TASK>
[ 385.119849 ]
[ 385.119849 ] Allocated by task 65:
[ 385.119849 ] kasan_save_stack+0x1e/0x40
[ 385.119849 ] __kasan_kmalloc+0x81/0xa0
[ 385.119849 ] mlx5_init_fs+0x11b/0x1160
[ 385.119849 ] mlx5_load+0x13c/0x220
[ 385.119849 ] mlx5_load_one+0xda/0x160
[ 385.119849 ] mlx5_recover_device+0xb8/0x100
[ 385.119849 ] mlx5_health_try_recover+0x2f9/0x3a1
[ 385.119849 ] devlink_health_reporter_recover+0x75/0x100
[ 385.119849 ] devlink_health_report+0x26c/0x4b0
[ 385.275909 ] mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0
[ 385.275909 ] process_one_work+0x520/0x970
[ 385.275909 ] worker_thread+0x378/0x950
[ 385.275909 ] kthread+0x1bb/0x200
[ 385.275909 ] ret_from_fork+0x1f/0x30
[ 385.275909 ]
[ 385.275909 ] Freed by task 65:
[ 385.275909 ] kasan_save_stack+0x1e/0x40
[ 385.275909 ] kasan_set_track+0x21/0x30
[ 385.275909 ] kasan_set_free_info+0x20/0x30
[ 385.275909 ] __kasan_slab_free+0xfc/0x140
[ 385.275909 ] kfree+0xa5/0x3b0
[ 385.275909 ] mlx5_unload+0x2e/0xb0
[ 385.275909 ] mlx5_unload_one+0x86/0xb0
[ 385.275909 ] mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf
[ 385.275909 ] process_one_work+0x520/0x970
[ 385.275909 ] worker_thread+0x378/0x950
[ 385.275909 ] kthread+0x1bb/0x200
[ 385.275909 ] ret_from_fork+0x1f/0x30
[ 385.275909 ]
[ 385.275909 ] The buggy address belongs to the object at
ffff888104b79300
[ 385.275909 ] which belongs to the cache kmalloc-128 of size 128
[ 385.275909 ] The buggy address is located 8 bytes inside of
[ 385.275909 ] 128-byte region [
ffff888104b79300,
ffff888104b79380)
[ 385.275909 ] The buggy address belongs to the page:
[ 385.275909 ] page:
00000000de44dd39 refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x104b78
[ 385.275909 ] head:
00000000de44dd39 order:1 compound_mapcount:0
[ 385.275909 ] flags: 0x8000000000010200(slab|head|zone=2)
[ 385.275909 ] raw:
8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0
[ 385.275909 ] raw:
0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 385.275909 ] page dumped because: kasan: bad access detected
[ 385.275909 ]
[ 385.275909 ] Memory state around the buggy address:
[ 385.275909 ]
ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
[ 385.275909 ]
ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 385.275909 ] >
ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 385.275909 ] ^
[ 385.275909 ]
ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 385.275909 ]
ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 385.275909 ]]
Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Maor Dickman [Mon, 21 Mar 2022 08:07:44 +0000 (10:07 +0200)]
net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table
[ Upstream commit
2c5fc6cd269ad3476da99dad02521d2af4a8e906 ]
In order to support multiple destination FTEs with SW steering
FW table is created with single FTE with multiple actions and
SW steering rule forward to it. When creating this table, flow
source isn't set according to the original FTE.
Fix this by passing the original FTE flow source to the created
FW table.
Fixes: 34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Duoming Zhou [Tue, 17 May 2022 01:25:30 +0000 (09:25 +0800)]
NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc
[ Upstream commit
23dd4581350d4ffa23d58976ec46408f8f4c1e16 ]
There are sleep in atomic context bugs when the request to secure
element of st-nci is timeout. The root cause is that nci_skb_alloc
with GFP_KERNEL parameter is called in st_nci_se_wt_timeout which is
a timer handler. The call paths that could trigger bugs are shown below:
(interrupt context 1)
st_nci_se_wt_timeout
nci_hci_send_event
nci_hci_send_data
nci_skb_alloc(..., GFP_KERNEL) //may sleep
(interrupt context 2)
st_nci_se_wt_timeout
nci_hci_send_event
nci_hci_send_data
nci_send_data
nci_queue_tx_data_frags
nci_skb_alloc(..., GFP_KERNEL) //may sleep
This patch changes allocation mode of nci_skb_alloc from GFP_KERNEL to
GFP_ATOMIC in order to prevent atomic context sleeping. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.
Fixes: ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220517012530.75714-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christophe JAILLET [Sun, 15 May 2022 18:07:02 +0000 (20:07 +0200)]
net/qla3xxx: Fix a test in ql_reset_work()
[ Upstream commit
5361448e45fac6fb96738df748229432a62d78b6 ]
test_bit() tests if one bit is set or not.
Here the logic seems to check of bit QL_RESET_PER_SCSI (i.e. 4) OR bit
QL_RESET_START (i.e. 3) is set.
In fact, it checks if bit 7 (4 | 3 = 7) is set, that is to say
QL_ADAPTER_UP.
This looks harmless, because this bit is likely be set, and when the
ql_reset_work() delayed work is scheduled in ql3xxx_isr() (the only place
that schedule this work), QL_RESET_START or QL_RESET_PER_SCSI is set.
This has been spotted by smatch.
Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/80e73e33f390001d9c0140ffa9baddf6466a41a2.1652637337.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Codrin Ciubotariu [Wed, 13 Apr 2022 07:13:18 +0000 (10:13 +0300)]
clk: at91: generated: consider range when calculating best rate
[ Upstream commit
d0031e6fbed955ff8d5f5bbc8fe7382482559cec ]
clk_generated_best_diff() helps in finding the parent and the divisor to
compute a rate closest to the required one. However, it doesn't take into
account the request's range for the new rate. Make sure the new rate
is within the required range.
Fixes: 8a8f4bf0c480 ("clk: at91: clk-generated: create function to find best_diff")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Link: https://lore.kernel.org/r/20220413071318.244912-1-codrin.ciubotariu@microchip.com
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Michal Wilczynski [Sun, 8 May 2022 23:33:48 +0000 (19:33 -0400)]
ice: Fix interrupt moderation settings getting cleared
[ Upstream commit
bf13502ed5f941b0777b3fd1e24dac5d93f3886c ]
Adaptive-rx and Adaptive-tx are interrupt moderation settings
that can be enabled/disabled using ethtool:
ethtool -C ethX adaptive-rx on/off adaptive-tx on/off
Unfortunately those settings are getting cleared after
changing number of queues, or in ethtool world 'channels':
ethtool -L ethX rx 1 tx 1
Clearing was happening due to introduction of bit fields
in ice_ring_container struct. This way only itr_setting
bits were rebuilt during ice_vsi_rebuild_set_coalesce().
Introduce an anonymous struct of bitfields and create a
union to refer to them as a single variable.
This way variable can be easily saved and restored.
Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Paul Greenwalt [Thu, 28 Apr 2022 21:11:42 +0000 (14:11 -0700)]
ice: fix possible under reporting of ethtool Tx and Rx statistics
[ Upstream commit
31b6298fd8e29effe9ed6b77351ac5969be56ce0 ]
The hardware statistics counters are not cleared during resets so the
drivers first access is to initialize the baseline and then subsequent
reads are for reporting the counters. The statistics counters are read
during the watchdog subtask when the interface is up. If the baseline
is not initialized before the interface is up, then there can be a brief
window in which some traffic can be transmitted/received before the
initial baseline reading takes place.
Directly initialize ethtool statistics in driver open so the baseline will
be initialized when the interface is up, and any dropped packets
incremented before the interface is up won't be reported.
Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arkadiusz Kubalewski [Thu, 28 Apr 2022 08:33:50 +0000 (10:33 +0200)]
ice: fix crash when writing timestamp on RX rings
[ Upstream commit
4503cc7fdf9a84cd631b0cb8ecb3c9b1bdbf3594 ]
Do not allow to write timestamps on RX rings if PF is being configured.
When PF is being configured RX rings can be freed or rebuilt. If at the
same time timestamps are updated, the kernel will crash by dereferencing
null RX ring pointer.
PID: 1449 TASK:
ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51"
#0 [
ff1966a94a713bb0] machine_kexec at
ffffffff9d05a0be
#1 [
ff1966a94a713c08] __crash_kexec at
ffffffff9d192e9d
#2 [
ff1966a94a713cd0] crash_kexec at
ffffffff9d1941bd
#3 [
ff1966a94a713ce8] oops_end at
ffffffff9d01bd54
#4 [
ff1966a94a713d08] no_context at
ffffffff9d06bda4
#5 [
ff1966a94a713d60] __bad_area_nosemaphore at
ffffffff9d06c10c
#6 [
ff1966a94a713da8] do_page_fault at
ffffffff9d06cae4
#7 [
ff1966a94a713de0] page_fault at
ffffffff9da0107e
[exception RIP: ice_ptp_update_cached_phctime+91]
RIP:
ffffffffc076db8b RSP:
ff1966a94a713e98 RFLAGS:
00010246
RAX:
16e3db9c6b7ccae4 RBX:
ff187d269dd3c180 RCX:
ff187d269cd4d018
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ff187d269cfcc644 R8:
ff187d339b9641b0 R9:
0000000000000000
R10:
0000000000000002 R11:
0000000000000000 R12:
ff187d269cfcc648
R13:
ffffffff9f128784 R14:
ffffffff9d101b70 R15:
ff187d269cfcc640
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
#8 [
ff1966a94a713ea0] ice_ptp_periodic_work at
ffffffffc076dbef [ice]
#9 [
ff1966a94a713ee0] kthread_worker_fn at
ffffffff9d101c1b
#10 [
ff1966a94a713f10] kthread at
ffffffff9d101b4d
#11 [
ff1966a94a713f50] ret_from_fork at
ffffffff9da0023f
Fixes: 77a781155a65 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Dave Cain <dcain@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mark Rutland [Mon, 16 May 2022 16:07:35 +0000 (17:07 +0100)]
arm64: kexec: load from kimage prior to clobbering
[ Upstream commit
eb3d8ea3e1f03f4b0b72d8f5ed9eb7c3165862e8 ]
In arm64_relocate_new_kernel() we load some fields out of the kimage
structure after relocation has occurred. As the kimage structure isn't
allocated to be relocation-safe, it may be clobbered during relocation,
and we may load junk values out of the structure.
Due to this, kexec may fail when the kimage allocation happens to fall
within a PA range that an object will be relocated to. This has been
observed to occur for regular kexec on a QEMU TCG 'virt' machine with
2GiB of RAM, where the PA range of the new kernel image overlaps the
kimage structure.
Avoid this by ensuring we load all values from the kimage structure
prior to relocation.
I've tested this atop v5.16 and v5.18-rc6.
Fixes: 878fdbd70486 ("arm64: kexec: pass kimage as the only argument to relocation function")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20220516160735.731404-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Zixuan Fu [Sat, 14 May 2022 05:07:11 +0000 (13:07 +0800)]
net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup()
[ Upstream commit
edf410cb74dc612fd47ef5be319c5a0bcd6e6ccd ]
In vmxnet3_rq_create(), when dma_alloc_coherent() fails,
vmxnet3_rq_destroy() is called. It sets rq->rx_ring[i].base to NULL. Then
vmxnet3_rq_create() returns an error to its callers mxnet3_rq_create_all()
-> vmxnet3_change_mtu(). Then vmxnet3_change_mtu() calls
vmxnet3_force_close() -> dev_close() in error handling code. And the driver
calls vmxnet3_close() -> vmxnet3_quiesce_dev() -> vmxnet3_rq_cleanup_all()
-> vmxnet3_rq_cleanup(). In vmxnet3_rq_cleanup(),
rq->rx_ring[ring_idx].base is accessed, but this variable is NULL, causing
a NULL pointer dereference.
To fix this possible bug, an if statement is added to check whether
rq->rx_ring[0].base is NULL in vmxnet3_rq_cleanup() and exit early if so.
The error log in our fault-injection testing is shown as follows:
[ 65.220135] BUG: kernel NULL pointer dereference, address:
0000000000000008
...
[ 65.222633] RIP: 0010:vmxnet3_rq_cleanup_all+0x396/0x4e0 [vmxnet3]
...
[ 65.227977] Call Trace:
...
[ 65.228262] vmxnet3_quiesce_dev+0x80f/0x8a0 [vmxnet3]
[ 65.228580] vmxnet3_close+0x2c4/0x3f0 [vmxnet3]
[ 65.228866] __dev_close_many+0x288/0x350
[ 65.229607] dev_close_many+0xa4/0x480
[ 65.231124] dev_close+0x138/0x230
[ 65.231933] vmxnet3_force_close+0x1f0/0x240 [vmxnet3]
[ 65.232248] vmxnet3_change_mtu+0x75d/0x920 [vmxnet3]
...
Fixes: d1a890fa37f27 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
Link: https://lore.kernel.org/r/20220514050711.2636709-1-r33s3n6@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Zixuan Fu [Sat, 14 May 2022 05:06:56 +0000 (13:06 +0800)]
net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf()
[ Upstream commit
9e7fef9521e73ca8afd7da9e58c14654b02dfad8 ]
In vmxnet3_rq_alloc_rx_buf(), when dma_map_single() fails, rbi->skb is
freed immediately. Similarly, in another branch, when dma_map_page() fails,
rbi->page is also freed. In the two cases, vmxnet3_rq_alloc_rx_buf()
returns an error to its callers vmxnet3_rq_init() -> vmxnet3_rq_init_all()
-> vmxnet3_activate_dev(). Then vmxnet3_activate_dev() calls
vmxnet3_rq_cleanup_all() in error handling code, and rbi->skb or rbi->page
are freed again in vmxnet3_rq_cleanup_all(), causing use-after-free bugs.
To fix these possible bugs, rbi->skb and rbi->page should be cleared after
they are freed.
The error log in our fault-injection testing is shown as follows:
[ 14.319016] BUG: KASAN: use-after-free in consume_skb+0x2f/0x150
...
[ 14.321586] Call Trace:
...
[ 14.325357] consume_skb+0x2f/0x150
[ 14.325671] vmxnet3_rq_cleanup_all+0x33a/0x4e0 [vmxnet3]
[ 14.326150] vmxnet3_activate_dev+0xb9d/0x2ca0 [vmxnet3]
[ 14.326616] vmxnet3_open+0x387/0x470 [vmxnet3]
...
[ 14.361675] Allocated by task 351:
...
[ 14.362688] __netdev_alloc_skb+0x1b3/0x6f0
[ 14.362960] vmxnet3_rq_alloc_rx_buf+0x1b0/0x8d0 [vmxnet3]
[ 14.363317] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
[ 14.363661] vmxnet3_open+0x387/0x470 [vmxnet3]
...
[ 14.367309]
[ 14.367412] Freed by task 351:
...
[ 14.368932] __dev_kfree_skb_any+0xd2/0xe0
[ 14.369193] vmxnet3_rq_alloc_rx_buf+0x71e/0x8d0 [vmxnet3]
[ 14.369544] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
[ 14.369883] vmxnet3_open+0x387/0x470 [vmxnet3]
[ 14.370174] __dev_open+0x28a/0x420
[ 14.370399] __dev_change_flags+0x192/0x590
[ 14.370667] dev_change_flags+0x7a/0x180
[ 14.370919] do_setlink+0xb28/0x3570
[ 14.371150] rtnl_newlink+0x1160/0x1740
[ 14.371399] rtnetlink_rcv_msg+0x5bf/0xa50
[ 14.371661] netlink_rcv_skb+0x1cd/0x3e0
[ 14.371913] netlink_unicast+0x5dc/0x840
[ 14.372169] netlink_sendmsg+0x856/0xc40
[ 14.372420] ____sys_sendmsg+0x8a7/0x8d0
[ 14.372673] __sys_sendmsg+0x1c2/0x270
[ 14.372914] do_syscall_64+0x41/0x90
[ 14.373145] entry_SYSCALL_64_after_hwframe+0x44/0xae
...
Fixes: 5738a09d58d5a ("vmxnet3: fix checks for dma mapping errors")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
Link: https://lore.kernel.org/r/20220514050656.2636588-1-r33s3n6@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Christophe JAILLET [Sun, 15 May 2022 17:01:56 +0000 (19:01 +0200)]
net: systemport: Fix an error handling path in bcm_sysport_probe()
[ Upstream commit
ef6b1cd11962aec21c58d137006ab122dbc8d6fd ]
if devm_clk_get_optional() fails, we still need to go through the error
handling path.
Add the missing goto.
Fixes: 6328a126896ea ("net: systemport: Manage Wake-on-LAN clock")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/99d70634a81c229885ae9e4ee69b2035749f7edc.1652634040.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Horatiu Vultur [Fri, 13 May 2022 18:00:30 +0000 (20:00 +0200)]
net: lan966x: Fix assignment of the MAC address
[ Upstream commit
af8ca6eaa9b24a90484218e356f959a94bff22fa ]
The following two scenarios were failing for lan966x.
1. If the port had the address X and then trying to assign the same
address, then the HW was just removing this address because first it
tries to learn new address and then delete the old one. As they are
the same the HW remove it.
2. If the port eth0 was assigned the same address as one of the other
ports eth1 then when assigning back the address to eth0 then the HW
was deleting the address of eth1.
The case 1. is fixed by checking if the port has already the same
address while case 2. is fixed by checking if the address is used by any
other port.
Fixes: e18aba8941b40b ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220513180030.3076793-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pali Rohár [Sun, 15 May 2022 12:58:15 +0000 (14:58 +0200)]
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
[ Upstream commit
a3b69dd0ad6265c29c4b6fb381cd76fb3bebdf8c ]
This reverts commit
1571d67dc190e50c6c56e8f88cdc39f7cc53166e.
This commit broke support for setting interrupt affinity. It looks like
that it is related to the chained IRQ handler. Revert this commit until
issue with setting interrupt affinity is fixed.
Fixes: 1571d67dc190 ("PCI: aardvark: Rewrite IRQ code to chained IRQ handler")
Link: https://lore.kernel.org/r/20220515125815.30157-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jonathan Lemon [Fri, 13 May 2022 22:52:31 +0000 (15:52 -0700)]
ptp: ocp: have adjtime handle negative delta_ns correctly
[ Upstream commit
da2172a9bfec858ceeb0271b9d444378490398c8 ]
delta_ns is a s64, but it was being passed ptp_ocp_adjtime_coarse
as an u64. Also, it turns out that timespec64_add_ns() only handles
positive values, so perform the math with set_normalized_timespec().
Fixes: 90f8f4c0e3ce ("ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments")
Suggested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/20220513225231.1412-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Felix Fietkau [Mon, 9 May 2022 12:26:16 +0000 (14:26 +0200)]
netfilter: nft_flow_offload: fix offload with pppoe + vlan
[ Upstream commit
2456074935003b66c40f78df6adfc722435d43ea ]
When running a combination of PPPoE on top of a VLAN, we need to set
info->outdev to the PPPoE device, otherwise PPPoE encap is skipped
during software offload.
Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Felix Fietkau [Mon, 9 May 2022 12:26:15 +0000 (14:26 +0200)]
net: fix dev_fill_forward_path with pppoe + bridge
[ Upstream commit
cf2df74e202d81b09f09d84c2d8903e0e87e9274 ]
When calling dev_fill_forward_path on a pppoe device, the provided destination
address is invalid. In order for the bridge fdb lookup to succeed, the pppoe
code needs to update ctx->daddr to the correct value.
Fix this by storing the address inside struct net_device_path_ctx
Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Felix Fietkau [Mon, 9 May 2022 12:26:14 +0000 (14:26 +0200)]
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
[ Upstream commit
45ca3e61999e9a30ca2b7cfbf9da8a9f8d13be31 ]
The dst entry does not contain a valid hardware address, so skip the lookup
in order to avoid running into errors here.
The proper hardware address is filled in from nft_dev_path_info
Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Felix Fietkau [Mon, 9 May 2022 12:26:13 +0000 (14:26 +0200)]
netfilter: flowtable: fix excessive hw offload attempts after failure
[ Upstream commit
396ef64113a8ba01c46315d67a99db8dde3eef51 ]
If a flow cannot be offloaded, the code currently repeatedly tries again as
quickly as possible, which can significantly increase system load.
Fix this by limiting flow timeout update and hardware offload retry to once
per second.
Fixes: c07531c01d82 ("netfilter: flowtable: Remove redundant hw refresh bit")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Abeni [Fri, 13 May 2022 09:27:06 +0000 (11:27 +0200)]
net/sched: act_pedit: sanitize shift argument before usage
[ Upstream commit
4d42d54a7d6aa6d29221d3fd4f2ae9503e94f011 ]
syzbot was able to trigger an Out-of-Bound on the pedit action:
UBSAN: shift-out-of-bounds in net/sched/act_pedit.c:238:43
shift exponent
1400735974 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 3606 Comm: syz-executor151 Not tainted
5.18.0-rc5-syzkaller-00165-g810c2f0a3f86 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
ubsan_epilogue+0xb/0x50 lib/ubsan.c:151
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x187 lib/ubsan.c:322
tcf_pedit_init.cold+0x1a/0x1f net/sched/act_pedit.c:238
tcf_action_init_1+0x414/0x690 net/sched/act_api.c:1367
tcf_action_init+0x530/0x8d0 net/sched/act_api.c:1432
tcf_action_add+0xf9/0x480 net/sched/act_api.c:1956
tc_ctl_action+0x346/0x470 net/sched/act_api.c:2015
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5993
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fe36e9e1b59
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007ffef796fe88 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fe36e9e1b59
RDX:
0000000000000000 RSI:
0000000020000300 RDI:
0000000000000003
RBP:
00007fe36e9a5d00 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe36e9a5d90
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
</TASK>
The 'shift' field is not validated, and any value above 31 will
trigger out-of-bounds. The issue predates the git history, but
syzbot was able to trigger it only after the commit mentioned in
the fixes tag, and this change only applies on top of such commit.
Address the issue bounding the 'shift' value to the maximum allowed
by the relevant operator.
Reported-and-tested-by: syzbot+8ed8fc4c57e9dcf23ca6@syzkaller.appspotmail.com
Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Eyal Birger [Fri, 13 May 2022 20:34:02 +0000 (23:34 +0300)]
xfrm: fix "disable_policy" flag use when arriving from different devices
[ Upstream commit
e6175a2ed1f18bf2f649625bf725e07adcfa6a28 ]
In IPv4 setting the "disable_policy" flag on a device means no policy
should be enforced for traffic originating from the device. This was
implemented by seting the DST_NOPOLICY flag in the dst based on the
originating device.
However, dsts are cached in nexthops regardless of the originating
devices, in which case, the DST_NOPOLICY flag value may be incorrect.
Consider the following setup:
+------------------------------+
| ROUTER |
+-------------+ | +-----------------+ |
| ipsec src |----|-|ipsec0 | |
+-------------+ | |disable_policy=0 | +----+ |
| +-----------------+ |eth1|-|-----
+-------------+ | +-----------------+ +----+ |
| noipsec src |----|-|eth0 | |
+-------------+ | |disable_policy=1 | |
| +-----------------+ |
+------------------------------+
Where ROUTER has a default route towards eth1.
dst entries for traffic arriving from eth0 would have DST_NOPOLICY
and would be cached and therefore can be reused by traffic originating
from ipsec0, skipping policy check.
Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead
of the DST in IN/FWD IPv4 policy checks.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Nicolas Dichtel [Mon, 14 Mar 2022 10:38:22 +0000 (11:38 +0100)]
xfrm: rework default policy structure
[ Upstream commit
b58b1f563ab78955d37e9e43e02790a85c66ac05 ]
This is a follow up of commit
f8d858e607b2 ("xfrm: make user policy API
complete"). The goal is to align userland API to the internal structures.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Umesh Nerlige Ramappa [Tue, 26 Apr 2022 00:30:45 +0000 (17:30 -0700)]
i915/guc/reset: Make __guc_reset_context aware of guilty engines
[ Upstream commit
89e96d822bd51f7afe2d3e95a34099480b5c3d55 ]
There are 2 ways an engine can get reset in i915 and the method of reset
affects how KMD labels a context as guilty/innocent.
(1) GuC initiated engine-reset: GuC resets a hung engine and notifies
KMD. The context that hung on the engine is marked guilty and all other
contexts are innocent. The innocent contexts are resubmitted.
(2) GT based reset: When an engine heartbeat fails to tick, KMD
initiates a gt/chip reset. All active contexts are marked as guilty and
discarded.
In order to correctly mark the contexts as guilty/innocent, pass a mask
of engines that were reset to __guc_reset_context.
Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220426003045.3929439-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit
303760aa914b7f5ac9602dbb4b471a2ad52eeb3e)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Harini Katakam [Thu, 12 May 2022 17:19:00 +0000 (22:49 +0530)]
net: macb: Increment rx bd head after allocating skb and buffer
[ Upstream commit
9500acc631dbb8b73166e25700e656b11f6007b6 ]
In gem_rx_refill rx_prepared_head is incremented at the beginning of
the while loop preparing the skb and data buffers. If the skb or data
buffer allocation fails, this BD will be unusable BDs until the head
loops back to the same BD (and obviously buffer allocation succeeds).
In the unlikely event that there's a string of allocation failures,
there will be an equal number of unusable BDs and an inconsistent RX
BD chain. Hence increment the head at the end of the while loop to be
clean.
Fixes: 4df95131ea80 ("net/macb: change RX path for GEM")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220512171900.32593-1-harini.katakam@xilinx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Abeni [Thu, 12 May 2022 23:26:41 +0000 (16:26 -0700)]
mptcp: fix subflow accounting on close
[ Upstream commit
95d686517884a403412b000361cee2b08b2ed1e6 ]
If the PM closes a fully established MPJ subflow or the subflow
creation errors out in it's early stage the subflows counter is
not bumped accordingly.
This change adds the missing accounting, additionally taking care
of updating accordingly the 'accept_subflow' flag.
Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Bart Van Assche [Fri, 13 May 2022 17:13:07 +0000 (10:13 -0700)]
block/mq-deadline: Set the fifo_time member also if inserting at head
[ Upstream commit
725f22a1477c9c15aa67ad3af96fe28ec4fe72d2 ]
Before commit
322cff70d46c the fifo_time member of requests on a dispatch
list was not used. Commit
322cff70d46c introduces code that reads the
fifo_time member of requests on dispatch lists. Hence this patch that sets
the fifo_time member when adding a request to a dispatch list.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Fixes: 322cff70d46c ("block/mq-deadline: Prioritize high-priority requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220513171307.32564-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Alex Elder [Thu, 12 May 2022 15:10:32 +0000 (10:10 -0500)]
net: ipa: record proper RX transaction count
[ Upstream commit
d8290cbe1111105f92f0c8ab455bec8bf98d0630 ]
Each time we are notified that some number of transactions on an RX
channel has completed, we record the number of bytes that have been
transferred since the previous notification. We also track the
number of transactions completed, but that is not currently being
calculated correctly; we're currently counting the number of such
notifications, but each notification can represent many transaction
completions. Fix this.
Fixes: 650d1603825d8 ("soc: qcom: ipa: the generic software interface")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Alex Elder [Thu, 12 May 2022 15:10:31 +0000 (10:10 -0500)]
net: ipa: certain dropped packets aren't accounted for
[ Upstream commit
30b338ff7998b6ed7a90815870cd5db725f87168 ]
If an RX endpoint receives packets containing status headers, and a
packet in the buffer is not dropped, ipa_endpoint_skb_copy() is
responsible for wrapping the packet data in an SKB and forwarding it
to ipa_modem_skb_rx() for further processing.
If ipa_endpoint_skb_copy() gets a null pointer from build_skb(), it
just returns early. But in the process it doesn't record that as a
dropped packet in the network device statistics.
Instead, call ipa_modem_skb_rx() whether or not the SKB pointer is
NULL; that function ensures the statistics are properly updated.
Fixes: 1b65bbcc9a710 ("net: ipa: skip SKB copy if no netdev")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Randy Dunlap [Sat, 30 Apr 2022 19:33:18 +0000 (12:33 -0700)]
ALSA: hda - fix unused Realtek function when PM is not enabled
[ Upstream commit
c3d9ca93f1e3bd3d1adfc4479a12c82fed424c87 ]
When CONFIG_PM is not enabled, alc_shutup() is not needed,
so move it inside the #ifdef CONFIG_PM guard.
Also drop some contiguous #endif / #ifdef CONFIG_PM for simplicity.
Fixes this build warning:
sound/pci/hda/patch_realtek.c:886:20: warning: unused function 'alc_shutup'
Fixes: 08c189f2c552 ("ALSA: hda - Use generic parser codes for Realtek driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20220430193318.29024-1-rdunlap@infradead.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Mattijs Korpershoek [Tue, 26 Apr 2022 12:57:14 +0000 (14:57 +0200)]
pinctrl: mediatek: mt8365: fix IES control pins
[ Upstream commit
f680058f406863b55ac226d1c157701939c63db4 ]
IES26 (BIT 16 of IES1_CFG_ADDR) controls the following pads:
- PAD_I2S_DATA_IN (GPIO114)
- PAD_I2S_LRCK (GPIO115)
- PAD_I2S_BCK (GPIO116)
The pinctrl table is wrong since it lists pins 114 to 112.
Update the table with the correct values.
Fixes: e94d8b6fb83a ("pinctrl: mediatek: add support for mt8365 SoC")
Reported-by: Youngmin Han <Youngmin.Han@geappliances.com>
Signed-off-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
Link: https://lore.kernel.org/r/20220426125714.298907-1-mkorpershoek@baylibre.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>