Xu Yu [Fri, 29 Apr 2022 06:14:43 +0000 (23:14 -0700)]
mm/huge_memory: do not overkill when splitting huge_zero_page
Kernel panic when injecting memory_failure for the global huge_zero_page,
when CONFIG_DEBUG_VM is enabled, as follows.
Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
page:
00000000fb053fc3 refcount:2 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x109e00
head:
00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
raw:
017fffc000010001 0000000000000000 dead000000000122 0000000000000000
raw:
0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2499!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS
3288b3c 04/01/2014
RIP: 0010:split_huge_page_to_list+0x66a/0x880
Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
RSP: 0018:
ffffc90000dcbdf8 EFLAGS:
00010246
RAX:
000000000000003c RBX:
0000000000000001 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ffffffff823e4c4f RDI:
00000000ffffffff
RBP:
ffff88843fffdb40 R08:
0000000000000000 R09:
00000000fffeffff
R10:
ffffc90000dcbc48 R11:
ffffffff82d68448 R12:
ffffea0004278000
R13:
ffffffff823c6203 R14:
0000000000109ff9 R15:
ffffea000427fe40
FS:
00007fc375a26740(0000) GS:
ffff88842fd80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fc3757c9290 CR3:
0000000102174006 CR4:
00000000003706e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
try_to_split_thp_page+0x3a/0x130
memory_failure+0x128/0x800
madvise_inject_error.cold+0x8b/0xa1
__x64_sys_madvise+0x54/0x60
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc3754f8bf9
Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
RSP: 002b:
00007ffeda93a1d8 EFLAGS:
00000217 ORIG_RAX:
000000000000001c
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fc3754f8bf9
RDX:
0000000000000064 RSI:
0000000000003000 RDI:
0000000020ff9000
RBP:
00007ffeda93a200 R08:
0000000000000000 R09:
0000000000000000
R10:
00000000ffffffff R11:
0000000000000217 R12:
0000000000400490
R13:
00007ffeda93a2e0 R14:
0000000000000000 R15:
0000000000000000
We think that raising BUG is overkilling for splitting huge_zero_page, the
huge_zero_page can't be met from normal paths other than memory failure,
but memory failure is a valid caller. So we tend to replace the BUG to
WARN + returning -EBUSY, and thus the panic above won't happen again.
Link: https://lkml.kernel.org/r/f35f8b97377d5d3ede1bc5ac3114da888c57cbce.1651052574.git.xuyu@linux.alibaba.com
Fixes: d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xu Yu [Fri, 29 Apr 2022 06:14:43 +0000 (23:14 -0700)]
Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
Patch series "mm/memory-failure: rework fix on huge_zero_page splitting".
This patch (of 2):
This reverts commit
d173d5417fb67411e623d394aab986d847e47dad.
The commit
d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in
memory_failure()") explicitly skips huge_zero_page in memory_failure(), in
order to avoid triggering VM_BUG_ON_PAGE on huge_zero_page in
split_huge_page_to_list().
This works, but Yang Shi thinks that,
Raising BUG is overkilling for splitting huge_zero_page. The
huge_zero_page can't be met from normal paths other than memory
failure, but memory failure is a valid caller. So I tend to replace
the BUG to WARN + returning -EBUSY. If we don't care about the
reason code in memory failure, we don't have to touch memory
failure.
And for the issue that huge_zero_page will be set PG_has_hwpoisoned,
Yang Shi comments that,
The anonymous page fault doesn't check if the page is poisoned or
not since it typically gets a fresh allocated page and assumes the
poisoned page (isolated successfully) can't be reallocated again.
But huge zero page and base zero page are reused every time. So no
matter what fix we pick, the issue is always there.
Finally, Yang, David, Anshuman and Naoya all agree to fix the bug, i.e.,
to split huge_zero_page, in split_huge_page_to_list().
This reverts the commit
d173d5417fb6 ("mm/memory-failure.c: skip
huge_zero_page in memory_failure()"), and the original bug will be fixed
by the next patch.
Link: https://lkml.kernel.org/r/872cefb182ba1dd686b0e7db1e6b2ebe5a4fff87.1651039624.git.xuyu@linux.alibaba.com
Fixes: d173d5417fb6 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wen Gong [Wed, 27 Apr 2022 11:16:19 +0000 (14:16 +0300)]
ath11k: reduce the wait time of 11d scan and hw scan while add interface
(cherry picked from commit
1f682dc9fb3790aa7ec27d3d122ff32b1eda1365 in wireless-next)
Currently ath11k will wait 11d scan complete while add interface in
ath11k_mac_op_add_interface(), when system resume without enable
wowlan, ath11k_mac_op_add_interface() is called for each resume, thus
it increase the resume time of system. And ath11k_mac_op_hw_scan()
after ath11k_mac_op_add_interface() also needs some time cost because
the previous 11d scan need more than 5 seconds when 6 GHz is enabled,
then the scan started event will indicated to ath11k after the 11d
scan completed.
While 11d scan/hw scan is running in firmware, if ath11k update channel
list to firmware by WMI_SCAN_CHAN_LIST_CMDID, then firmware will cancel
the current scan which is running, it lead the scan failed. The patch
commit
9dcf6808b253 ("ath11k: add 11d scan offload support") used
finish_11d_scan/finish_11d_ch_list/pending_11d to synchronize the 11d
scan/hw scan/channel list between ath11k/firmware/mac80211 and to avoid
the scan fail.
Add wait operation before ath11k update channel list, function
ath11k_reg_update_chan_list() will wait until the current 11d scan/hw
scan completed. And remove the wait operation of start 11d scan and
waiting channel list complete in hw scan. After these changes, resume
time cost reduce about 5 seconds and also hw scan time cost reduced
obviously, and scan failed not seen.
The 11d scan is sent to firmware only one time for each interface added
in mac.c, and it is moved after the 1st hw scan because 11d scan will
cost some time and thus leads the AP scan result update to UI delay.
Currently priority of ath11k's hw scan is WMI_SCAN_PRIORITY_LOW, and
priority of 11d scan in firmware is WMI_SCAN_PRIORITY_MEDIUM, then the
11d scan which sent after hw scan will cancel the hw scan in firmware,
so change the priority to WMI_SCAN_PRIORITY_MEDIUM for the hw scan which
is in front of the 11d scan, thus it will not happen scan cancel in
firmware.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: 9dcf6808b253 ("ath11k: add 11d scan offload support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215777
Cc: <stable@vger.kernel.org>
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220328035832.14122-1-quic_wgong@quicinc.com
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220427111619.9758-1-kvalo@kernel.org
Linus Torvalds [Fri, 29 Apr 2022 01:00:34 +0000 (18:00 -0700)]
Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Another relatively quiet week, amdgpu leads the way, some i915 display
fixes, and a single sunxi fix.
amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix
amdkfd:
- GWS fix
- GWS support for CRIU
i915:
- Fix #5284: Backlight control regression on XMG Core 15 e21
- Fix black display plane on Acer One AO532h
- Two smaller display fixes
sunxi:
- Single fix removing applying PHYS_OFFSET twice"
* tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
drm/amd/pm: fix the deadlock issue observed on SI
drm/amd/display: Fix memory leak in dcn21_clock_source_create
drm/amdgpu: don't runtime suspend if there are displays attached (v3)
drm/amdkfd: CRIU add support for GWS queues
drm/amdkfd: Fix GWS queue count
drm/sun4i: Remove obsolete references to PHYS_OFFSET
drm/i915/fbc: Consult hw.crtc instead of uapi.crtc
drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses
drm/i915: Check EDID for HDR static metadata when choosing blc
drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
Dave Airlie [Fri, 29 Apr 2022 00:27:04 +0000 (10:27 +1000)]
Merge tag 'amd-drm-fixes-5.18-2022-04-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.18-2022-04-27:
amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix
amdkfd:
- GWS fix
- GWS support for CRIU
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
Dave Airlie [Fri, 29 Apr 2022 00:17:46 +0000 (10:17 +1000)]
Merge tag 'drm-intel-fixes-2022-04-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix #5284: Backlight control regression on XMG Core 15 e21
- Fix black display plane on Acer One AO532h
- Two smaller display fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
Dave Airlie [Fri, 29 Apr 2022 00:02:04 +0000 (10:02 +1000)]
Merge tag 'drm-misc-fixes-2022-04-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.18-rc5:
- Single fix removing applying PHYS_OFFSET twice in sunxi.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
Kurt Kanzenbach [Thu, 28 Apr 2022 06:24:32 +0000 (08:24 +0200)]
timekeeping: Mark NMI safe time accessors as notrace
Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are
used in tracing to retrieve timestamps, so they should not recurse.
Fixes: 4498e7467e9e ("time: Parametrize all tk_fast_mono users")
Fixes: f09cb9a1808e ("time: Introduce tk_fast_raw")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220426175338.3807ca4f@gandalf.local.home/
Link: https://lore.kernel.org/r/20220428062432.61063-1-kurt@linutronix.de
Linus Torvalds [Thu, 28 Apr 2022 19:34:50 +0000 (12:34 -0700)]
Merge tag 'net-5.18-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...
Joseph Ravichandran [Thu, 28 Apr 2022 16:57:52 +0000 (12:57 -0400)]
io_uring: fix uninitialized field in rw io_kiocb
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.
Fixes: 3e08773c3841 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 28 Apr 2022 18:57:00 +0000 (11:57 -0700)]
Merge tag 'thermal-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
an attr.show callback prototype in a driver.
Specifics:
- Stop warning about deprecation of the userspace thermal governor
and cooling device status interface, because there are cases in
which user space has to drive thermal management with the help of
them (Daniel Lezcano)
- Fix attr.show callback prototype in the int340x thermal driver
(Kees Cook)"
* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/governor: Remove deprecated information
Revert "thermal/core: Deprecate changing cooling device state from userspace"
thermal: int340x: Fix attr.show callback prototype
Linus Torvalds [Thu, 28 Apr 2022 18:50:21 +0000 (11:50 -0700)]
Merge tag 'pm-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix up recent intel_idle driver changes and fix some ARM cpufreq
driver issues.
Specifics:
- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov,
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo).
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"
* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Linus Torvalds [Thu, 28 Apr 2022 18:37:20 +0000 (11:37 -0700)]
Merge tag 'acpi-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael WysockiL
"These fix up the ACPI processor driver after a change made during the
5.16 cycle that inadvertently broke falling back to shallower C-states
when C3 cannot be used.
Specifics:
- Make the ACPI processor driver avoid falling back to C3 type of
C-states when C3 cannot be requested (Ville Syrjälä)
- Revert a quirk that is not necessary any more after fixing the
underlying issue properly (Ville Syrjälä)"
* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
ACPI: processor: idle: Avoid falling back to C3 type C-states
Linus Torvalds [Thu, 28 Apr 2022 18:13:00 +0000 (11:13 -0700)]
Merge tag 'platform-drivers-x86-v5.18-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- asus-wmi bug-fixes
- intel-sdsu bug-fixes
- build (warning) fixes
- couple of hw-id additions"
* tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: pmc/core: change pmc_lpm_modes to static
platform/x86/intel/sdsi: Fix bug in multi packet reads
platform/x86/intel/sdsi: Poll on ready bit for writes
platform/x86/intel/sdsi: Handle leaky bucket
platform/x86: intel-uncore-freq: Prevent driver loading in guests
platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
platform/x86: dell-laptop: Add quirk entry for Latitude 7520
platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
Linus Torvalds [Thu, 28 Apr 2022 18:07:49 +0000 (11:07 -0700)]
Merge tag 'regulator-fix-v5.18-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A minor fix for the DT binding documentation of the rt5190a driver"
* tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: Revise the rt5190a buck/ldo description
Pengcheng Yang [Tue, 26 Apr 2022 10:03:39 +0000 (18:03 +0800)]
tcp: fix F-RTO may not work correctly when receiving DSACK
Currently DSACK is regarded as a dupack, which may cause
F-RTO to incorrectly enter "loss was real" when receiving
DSACK.
Packetdrill to demonstrate:
// Enable F-RTO and TLP
0 `sysctl -q net.ipv4.tcp_frto=2`
0 `sysctl -q net.ipv4.tcp_early_retrans=3`
0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`
// Establish a connection
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// RTT 10ms, RTO 210ms
+.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.01 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
// Send 2 data segments
+0 write(4, ..., 2000) = 2000
+0 > P. 1:2001(2000) ack 1
// TLP
+.022 > P. 1001:2001(1000) ack 1
// Continue to send 8 data segments
+0 write(4, ..., 10000) = 10000
+0 > P. 2001:10001(8000) ack 1
// RTO
+.188 > . 1:1001(1000) ack 1
// The original data is acked and new data is sent(F-RTO step 2.b)
+0 < . 1:1(0) ack 2001 win 257
+0 > P. 10001:12001(2000) ack 1
// D-SACK caused by TLP is regarded as a dupack, this results in
// the incorrect judgment of "loss was real"(F-RTO step 3.a)
+.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>
// Never-retransmitted data(3001:4001) are acked and
// expect to switch to open state(F-RTO step 3.b)
+0 < . 1:1(0) ack 4001 win 257
+0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%
Fixes: e33099f96d99 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 28 Apr 2022 16:55:59 +0000 (09:55 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix incorrect TCP connection tracking window reset for non-syn
packets, from Florian Westphal.
2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.
3) Fix nft_socket from the output path, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_socket: only do sk lookups when indev is available
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
====================
Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:50:29 +0000 (09:50 -0700)]
Merge tag 'gfs2-v5.18-rc4-fix2' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- No short reads or writes upon glock contention
* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No short reads or writes upon glock contention
Dany Madden [Wed, 27 Apr 2022 23:51:46 +0000 (18:51 -0500)]
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
This reverts commit
723ad916134784b317b72f3f6cf0f7ba774e5dae
When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.
Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 27 Apr 2022 20:30:17 +0000 (23:30 +0300)]
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
The Time-Specified Departure feature is indeed mutually exclusive with
TX IP checksumming in ENETC, but TX checksumming in itself is broken and
was removed from this driver in commit
82728b91f124 ("enetc: Remove Tx
checksumming offload code").
The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply
with software TSO's expectations, and still did the checksumming in
software by calling skb_checksum_help(). So there isn't any restriction
for the Time-Specified Departure feature.
However, enetc_setup_tc_txtime() doesn't understand that, and blindly
looks for NETIF_F_CSUM_MASK.
Instead of checking for things which can literally never happen in the
current code base, just remove the check and let the driver offload
tc-etf qdiscs.
Fixes: acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Wed, 27 Apr 2022 17:31:52 +0000 (10:31 -0700)]
ixgbe: ensure IPsec VF<->PF compatibility
The VF driver can forward any IPsec flags and such makes the function
is not extendable and prone to backward/forward incompatibility.
If new software runs on VF, it won't know that PF configured something
completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.
Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:37:56 +0000 (09:37 -0700)]
Merge tag 'xfs-5.18-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Dave Chinner:
- define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
- remove redundant XFS fields from MAINTAINERS
- fix inode buffer locking order regression
* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reorder iunlink remove operation in xfs_ifree
MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
xfs: convert buffer flags to unsigned.
Florian Fainelli [Wed, 27 Apr 2022 16:36:06 +0000 (09:36 -0700)]
MAINTAINERS: Update BNXT entry with firmware files
There appears to be a maintainer gap for BNXT TEE firmware files which
causes some patches to be missed. Update the entry for the BNXT Ethernet
controller with its companion firmware files.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220427163606.126154-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trond Myklebust [Thu, 28 Apr 2022 15:08:13 +0000 (11:08 -0400)]
SUNRPC: Don't leak sockets in xs_local_connect()
If there is still a closed socket associated with the transport, then we
need to trigger an autoclose before we can set up a new connection.
Reported-by: wanghai (M) <wanghai38@huawei.com>
Fixes: f00432063db1 ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:51:24 +0000 (16:51 +0200)]
Merge branch 'thermal-int340x'
Merge a fix for the attr.show callback prototype in the int340x thermal
driver (Kees Cook).
* thermal-int340x:
thermal: int340x: Fix attr.show callback prototype
Greg Kroah-Hartman [Thu, 28 Apr 2022 14:35:55 +0000 (16:35 +0200)]
Merge tag 'iio-fixes-for-5.18a' of https://git./linux/kernel/git/jic23/iio
Pull set of IIO fixes for 5.18 from Jonathan Cameron:
"1st set of IIO fixes for the 5.18 cycle
ad3552r:
- Fix a bug with error codes being stored in unsigned local variable.
- Fix IS_ERR when value is either NULL or not rather than ERR_PTR
ad5446
- Fix shifting of read_raw value.
ad5592r
- Fix missing return value being set for a fwnode property read.
ad7280a
- Wrong variable being used to set thresholds.
admv8818
- Kconfig dependency fix.
ak8975
- Missing regulator disable in error path.
bmi160
- Disable regulators in an error path.
dac5571
- Fix chip id detection for devices with OF bindings.
inv_icm42600
- Handle a case of a missing I2C NACK during initially configuration.
ltc2688
- Fix voltage scaling where integer part was written twice and
decimal part not at all.
scd4x
- Handle error before using value.
sx9310
- Device property parsing against indio_dev->dev.of_node which
hasn't been set yet.
sx9324
- Fix hardware gain related maths.
- Wrong defaults for precharge internal resistance register."
* tag 'iio-fixes-for-5.18a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: imu: inv_icm42600: Fix I2C init possible nack
iio: dac: ltc2688: fix voltage scale read
iio:dac:ad3552r: Fix an IS_ERR() vs NULL check
iio: sx9324: Fix default precharge internal resistance register
iio: dac: ad5446: Fix read_raw not returning set value
iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on()
iio:proximity:sx9324: Fix hardware gain read/write
iio:proximity:sx_common: Fix device property parsing on DT systems
iio: adc:
ad7280a: Fix wrong variable used when setting thresholds.
iio:filter:admv8818: select REGMAP_SPI for ADMV8818
iio: dac: ad5592r: Fix the missing return value.
iio: dac:
dac5571: Fix chip id detection for OF devices
iio:imu:bmi160: disable regulator in error path
iio: scd4x: check return of scd4x_write_and_fetch
iio: dac: ad3552r: fix signedness bug in ad3552r_reset()
Florian Westphal [Thu, 28 Apr 2022 07:39:21 +0000 (09:39 +0200)]
netfilter: nft_socket: only do sk lookups when indev is available
Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.
Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result. Same is true for the forward chain, so restrict the use
to prerouting, input and output.
Use in output work if a socket is already attached to the skb.
Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:09:50 +0000 (16:09 +0200)]
Merge branch 'pm-cpuidle'
Merge cpuidle fixes for 5.18-rc5:
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy).
* pm-cpuidle:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
Namhyung Kim [Sat, 16 Apr 2022 00:40:48 +0000 (17:40 -0700)]
perf symbol: Remove arch__symbols__fixup_end()
Now the generic code can handle kallsyms fixup properly so no need to
keep the arch-functions anymore.
Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Sat, 16 Apr 2022 00:40:47 +0000 (17:40 -0700)]
perf symbol: Update symbols__fixup_end()
Now arch-specific functions all do the same thing. When it fixes the
symbol address it needs to check the boundary between the kernel image
and modules. For the last symbol in the previous region, it cannot
know the exact size as it's discarded already. Thus it just uses a
small page size (4096) and rounds it up like the last symbol.
Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Sat, 16 Apr 2022 00:40:46 +0000 (17:40 -0700)]
perf symbol: Pass is_kallsyms to symbols__fixup_end()
The symbol fixup is necessary for symbols in kallsyms since they don't
have size info. So we use the next symbol's address to calculate the
size. Now it's also used for user binaries because sometimes they miss
size for hand-written asm functions.
There's a arch-specific function to handle kallsyms differently but
currently it cannot distinguish kallsyms from others. Pass this
information explicitly to handle it properly. Note that those arch
functions will be moved to the generic function so I didn't added it to
the arch-functions.
Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Timothy Hayes [Thu, 21 Apr 2022 16:52:05 +0000 (17:52 +0100)]
perf test: Add perf_event_attr test for Arm SPE
Adds a perf_event_attr test for Arm SPE in which the presence of
physical addresses are checked when SPE unit is run with pa_enable=1.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220421165205.117662-4-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Timothy Hayes [Thu, 21 Apr 2022 16:52:04 +0000 (17:52 +0100)]
perf arm-spe: Fix SPE events with phys addresses
This patch corrects a bug whereby SPE collection is invoked with
pa_enable=1 but synthesized events fail to show physical addresses.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220421165205.117662-3-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Timothy Hayes [Thu, 21 Apr 2022 16:52:03 +0000 (17:52 +0100)]
perf arm-spe: Fix addresses of synthesized SPE events
This patch corrects a bug whereby synthesized events from SPE
samples are missing virtual addresses.
Fixes: 54f7815efef7fad9 ("perf arm-spe: Fill address info for samples")
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220421165205.117662-2-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 28 Apr 2022 09:31:09 +0000 (12:31 +0300)]
perf intel-pt: Fix timeless decoding with perf.data directory
Intel PT does not capture data in separate directories, so do not
use separate directory processing because it doesn't work for
timeless decoding. It also looks like it doesn't support one_mmap
handling.
Example:
Before:
# perf record --kcore -a -e intel_pt/tsc=0/k sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.799 MB perf.data ]
# perf script --itrace=bep | head
#
After:
# perf script --itrace=bep | head
perf 21073 [000] psb: psb offs: 0
ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms])
perf 21073 [000] cbr: cbr: 45 freq: 4505 MHz (161%)
ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k: 0 [unknown] ([unknown]) =>
ffffffffaa68faf6 native_write_msr+0x6 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa68faf8 native_write_msr+0x8 ([kernel.kallsyms]) =>
ffffffffaa61aab0 pt_config_start+0x60 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa61aabd pt_config_start+0x6d ([kernel.kallsyms]) =>
ffffffffaa61b8ad pt_event_start+0x27d ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa61b8bb pt_event_start+0x28b ([kernel.kallsyms]) =>
ffffffffaa61ba60 pt_event_add+0x40 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa61ba76 pt_event_add+0x56 ([kernel.kallsyms]) =>
ffffffffaa880e86 event_sched_in+0xc6 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa880e9b event_sched_in+0xdb ([kernel.kallsyms]) =>
ffffffffaa880ea5 event_sched_in+0xe5 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa880eba event_sched_in+0xfa ([kernel.kallsyms]) =>
ffffffffaa880f96 event_sched_in+0x1d6 ([kernel.kallsyms])
perf 21073 [000] 1 branches:k:
ffffffffaa880fc8 event_sched_in+0x208 ([kernel.kallsyms]) =>
ffffffffaa880ec0 event_sched_in+0x100 ([kernel.kallsyms])
Fixes: bb6be405c4a2a5 ("perf session: Load data directory files for analysis")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220428093109.274641-1-adrian.hunter@intel.com
Cc: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linux-kernel@vger.kernel.org
Andreas Gruenbacher [Thu, 28 Apr 2022 12:51:33 +0000 (14:51 +0200)]
gfs2: No short reads or writes upon glock contention
Commit
00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers. When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.
As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.
Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Catalin Marinas [Mon, 25 Apr 2022 15:18:33 +0000 (16:18 +0100)]
elf: Fix the arm64 MTE ELF segment name and value
Unfortunately, the name/value choice for the MTE ELF segment type
(PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by
PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI
(https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst).
Update the ELF segment type value to LOPROC+2 and also change the define
to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The
AArch64 ELF ABI document is updating accordingly (segment type not
previously mentioned in the document).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 761b9b366cec ("elf: Introduce the ARM MTE ELF segment type")
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>
Link: https://lore.kernel.org/r/20220425151833.2603830-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Yang Yingliang [Mon, 25 Apr 2022 09:08:26 +0000 (17:08 +0800)]
iommu/dart: check return value after calling platform_get_resource()
It will cause null-ptr-deref in resource_size(), if platform_get_resource()
returns NULL, move calling resource_size() after devm_ioremap_resource() that
will check 'res' to avoid null-ptr-deref.
And use devm_platform_get_and_ioremap_resource() to simplify code.
Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20220425090826.2532165-1-yangyingliang@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Thu, 28 Apr 2022 09:33:40 +0000 (11:33 +0200)]
Merge tag 'arm-smmu-fixes' of git://git./linux/kernel/git/will/linux into iommu/fixes
Arm SMMU fixes for 5.18
- Fix off-by-one in SMMUv3 SVA TLB invalidation
- Disable large mappings to workaround nvidia erratum
Lu Baolu [Sat, 23 Apr 2022 08:23:30 +0000 (16:23 +0800)]
iommu/vt-d: Drop stop marker messages
The page fault handling framework in the IOMMU core explicitly states
that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must
discard them before reporting faults. This handles Stop Marker messages
in prq_event_thread() before reporting events to the core.
The VT-d driver explicitly drains the pending page requests when a CPU
page table (represented by a mm struct) is unbound from a PASID according
to the procedures defined in the VT-d spec. The Stop Marker messages do
not need a response. Hence, it is safe to drop the Stop Marker messages
silently if any of them is found in the page request queue.
Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220421113558.3504874-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220423082330.3897867-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
David Stevens [Sun, 10 Apr 2022 01:35:33 +0000 (09:35 +0800)]
iommu/vt-d: Calculate mask for non-aligned flushes
Calculate the appropriate mask for non-size-aligned page selective
invalidation. Since psi uses the mask value to mask out the lower order
bits of the target address, properly flushing the iotlb requires using a
mask value such that [pfn, pfn+pages) all lie within the flushed
size-aligned region. This is not normally an issue because iova.c
always allocates iovas that are aligned to their size. However, iovas
which come from other sources (e.g. userspace via VFIO) may not be
aligned.
To properly flush the IOTLB, both the start and end pfns need to be
equal after applying the mask. That means that the most efficient mask
to use is the index of the lowest bit that is equal where all higher
bits are also equal. For example, if pfn=0x17f and pages=3, then
end_pfn=0x181, so the smallest mask we can use is 8. Any differences
above the highest bit of pages are due to carrying, so by xnor'ing pfn
and end_pfn and then masking out the lower order bits based on pages, we
get 0xffffff00, where the first set bit is the mask we want to use.
Fixes: 6fe1010d6d9c ("vfio/type1: DMA unmap chunking")
Cc: stable@vger.kernel.org
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220401022430.1262215-1-stevensd@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220410013533.3959168-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Paolo Abeni [Thu, 28 Apr 2022 08:18:51 +0000 (10:18 +0200)]
Merge tag 'for-net-2022-04-27' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression causing some HCI events to be discarded when they
shouldn't.
* tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
====================
Link: https://lore.kernel.org/r/20220427234031.1257281-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
David Jeffery [Wed, 27 Apr 2022 18:32:50 +0000 (14:32 -0400)]
scsi: target: pscsi: Set SCF_TREAT_READ_AS_NORMAL flag only if there is valid data
With tape devices, the SCF_TREAT_READ_AS_NORMAL flag is used by the target
subsystem to mark commands which have both data to return as well as sense
data. But with pscsi, SCF_TREAT_READ_AS_NORMAL can be set even if there is
no data to return. The SCF_TREAT_READ_AS_NORMAL flag causes the target core
to call iscsit data-in callbacks even if there is no data, which iscsit
does not support. This results in iscsit going into an error state
requiring recovery and being unable to complete the command to the
initiator.
This issue can be resolved by fixing pscsi to only set
SCF_TREAT_READ_AS_NORMAL if there is valid data to return alongside the
sense data.
Link: https://lore.kernel.org/r/20220427183250.291881-1-djeffery@redhat.com
Fixes: bd81372065fa ("scsi: target: transport should handle st FM/EOM/ILI reads")
Reported-by: Scott Hamilton <scott.hamilton@atos.net>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Yang Yingliang [Tue, 26 Apr 2022 12:52:31 +0000 (20:52 +0800)]
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
Put device node in error path in fec_enet_init_stop_mode().
Fixes: 8a448bf832af ("net: ethernet: fec: move GPR register offset and bit into DT")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220426125231.375688-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Manish Chopra [Tue, 26 Apr 2022 15:39:13 +0000 (08:39 -0700)]
bnx2x: fix napi API usage sequence
While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.
System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor:
168e14e4 [ 3222.583557] EEH: PCI cmd/status register:
00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00:
00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10:
10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20:
00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00:
13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10:
00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20:
00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30:
00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor:
168e14e4 [ 3222.584491] EEH: PCI cmd/status register:
00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00:
00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10:
10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20:
00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00:
13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10:
00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20:
00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30:
00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload
Uninterruptible tasks
=====================
crash> ps | grep UN
213 2 11
c000000004c89e00 UN 0.0 0 0 [eehd]
215 2 0
c000000004c80000 UN 0.0 0 0
[kworker/0:2]
2196 1 28
c000000004504f00 UN 0.1 15936 11136 wickedd
4287 1 9
c00000020d076800 UN 0.0 4032 3008 agetty
4289 1 20
c00000020d056680 UN 0.0 7232 3840 agetty
32423 2 26
c00000020038c580 UN 0.0 0 0
[kworker/26:3]
32871 4241 27
c0000002609ddd00 UN 0.1 18624 11648 sshd
32920 10130 16
c00000027284a100 UN 0.1 48512 12608 sendmail
33092 32987 0
c000000205218b00 UN 0.1 48512 12608 sendmail
33154 4567 16
c000000260e51780 UN 0.1 48832 12864 pickup
33209 4241 36
c000000270cb6500 UN 0.1 18624 11712 sshd
33473 33283 0
c000000205211480 UN 0.1 48512 12672 sendmail
33531 4241 37
c00000023c902780 UN 0.1 18624 11648 sshd
EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213 TASK:
c000000004c89e00 CPU: 11 COMMAND: "eehd"
#0 [
c000000004d477e0] __schedule at
c000000000c70808
#1 [
c000000004d478b0] schedule at
c000000000c70ee0
#2 [
c000000004d478e0] schedule_timeout at
c000000000c76dec
#3 [
c000000004d479c0] msleep at
c0000000002120cc
#4 [
c000000004d479f0] napi_disable at
c000000000a06448
^^^^^^^^^^^^^^^^
#5 [
c000000004d47a30] bnx2x_netif_stop at
c0080000018dba94 [bnx2x]
#6 [
c000000004d47a60] bnx2x_io_slot_reset at
c0080000018a551c [bnx2x]
#7 [
c000000004d47b20] eeh_report_reset at
c00000000004c9bc
#8 [
c000000004d47b90] eeh_pe_report at
c00000000004d1a8
#9 [
c000000004d47c40] eeh_handle_normal_event at
c00000000004da64
And the sleeping source code
============================
crash> dis -ls
c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702
6697 {
6698 might_sleep();
6699 set_bit(NAPI_STATE_DISABLE, &n->state);
6700
6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702 msleep(1);
6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
6704 msleep(1);
6705
6706 hrtimer_cancel(&n->timer);
6707
6708 clear_bit(NAPI_STATE_DISABLE, &n->state);
6709 }
EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:
bnx2x_io_error_detected()
+-> bnx2x_eeh_nic_unload()
+-> bnx2x_del_all_napi()
+-> __netif_napi_del()
bnx2x_io_slot_reset()
+-> bnx2x_netif_stop()
+-> bnx2x_napi_disable()
+->napi_disable()
Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.
Fixes: 7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 26 Apr 2022 15:49:49 +0000 (18:49 +0300)]
tls: Skip tls_append_frag on zero copy size
Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.
If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.
This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 27 Apr 2022 22:18:39 +0000 (15:18 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2022-04-27
We've added 5 non-merge commits during the last 20 day(s) which contain
a total of 6 files changed, 34 insertions(+), 12 deletions(-).
The main changes are:
1) Fix xsk sockets when rx and tx are separately bound to the same umem, also
fix xsk copy mode combined with busy poll, from Maciej Fijalkowski.
2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered
a crash due to invalid metadata_dst access, from Eyal Birger.
3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen.
4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki.
(Masami & Steven preferred this small fix to be routed via bpf tree given it's
follow-up fix to Masami's rethook work that went via bpf earlier, too.)
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Fix possible crash when multiple sockets are created
kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
bpf: Fix release of page_pool in BPF_PROG_RUN in test runner
xsk: Fix l2fwd for copy mode + busy poll combo
====================
Link: https://lore.kernel.org/r/20220427212748.9576-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Zyngier [Thu, 21 Apr 2022 14:38:10 +0000 (15:38 +0100)]
KVM: arm64: Inject exception on out-of-IPA-range translation fault
When taking a translation fault for an IPA that is outside of
the range defined by the hypervisor (between the HW PARange and
the IPA range), we stupidly treat it as an IO and forward the access
to userspace. Of course, userspace can't do much with it, and things
end badly.
Arguably, the guest is braindead, but we should at least catch the
case and inject an exception.
Check the faulting IPA against:
- the sanitised PARange: inject an address size fault
- the IPA size: inject an abort
Reported-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Prike Liang [Tue, 19 Apr 2022 09:22:34 +0000 (17:22 +0800)]
drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
Without MMHUB clock gating being enabled then MMHUB will not disconnect
from DF and will result in DF C-state entry can't be accessed during S2idle
suspend, and eventually s0ix entry will be blocked.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 8 Apr 2022 11:51:34 +0000 (19:51 +0800)]
drm/amd/pm: fix the deadlock issue observed on SI
The adev->pm.mutx is already held at the beginning of
amdgpu_dpm_compute_clocks/amdgpu_dpm_enable_uvd/amdgpu_dpm_enable_vce.
But on their calling path, amdgpu_display_bandwidth_update will be
called and thus its sub functions amdgpu_dpm_get_sclk/mclk. They
will then try to acquire the same adev->pm.mutex and deadlock will
occur.
By placing amdgpu_display_bandwidth_update outside of adev->pm.mutex
protection(considering logically they do not need such protection) and
restructuring the call flow accordingly, we can eliminate the deadlock
issue. This comes with no real logics change.
Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Link: https://lore.kernel.org/all/9e689fea-6c69-f4b0-8dee-32c4cf7d8f9c@molgen.mpg.de/
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1957
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Miaoqian Lin [Thu, 21 Apr 2022 09:03:09 +0000 (17:03 +0800)]
drm/amd/display: Fix memory leak in dcn21_clock_source_create
When dcn20_clk_src_construct() fails, we need to release clk_src.
Fixes: 6f4e6361c3ff ("drm/amd/display: Add Renoir resource (v2)")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Dec 2021 22:26:24 +0000 (17:26 -0500)]
drm/amdgpu: don't runtime suspend if there are displays attached (v3)
We normally runtime suspend when there are displays attached if they
are in the DPMS off state, however, if something wakes the GPU
we send a hotplug event on resume (in case any displays were connected
while the GPU was in suspend) which can cause userspace to light
up the displays again soon after they were turned off.
Prior to
commit
087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."),
the driver took a runtime pm reference when the fbdev emulation was
enabled because we didn't implement proper shadowing support for
vram access when the device was off so the device never runtime
suspended when there was a console bound. Once that commit landed,
we now utilize the core fb helper implementation which properly
handles the emulation, so runtime pm now suspends in cases where it did
not before. Ultimately, we need to sort out why runtime suspend in not
working in this case for some users, but this should restore similar
behavior to before.
v2: move check into runtime_suspend
v3: wake ups -> wakeups in comment, retain pm_runtime behavior in
runtime_idle callback
Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/
Tested-by: Michele Ballabio <ballabio.m@gmail.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
David Yat Sin [Wed, 13 Apr 2022 15:37:53 +0000 (11:37 -0400)]
drm/amdkfd: CRIU add support for GWS queues
Add support to checkpoint/restore GWS (Global Wave Sync) queues.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Mon, 18 Apr 2022 15:55:58 +0000 (11:55 -0400)]
drm/amdkfd: Fix GWS queue count
dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated
each time the queue gets evicted.
Fixes: b8020b0304c8 ("drm/amdkfd: Enable over-subscription with >1 GWS queue")
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alexandru Elisei [Mon, 25 Apr 2022 14:55:30 +0000 (15:55 +0100)]
KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
kvm->arch.arm_pmu is set when userspace attempts to set the first PMU
attribute. As certain attributes are mandatory, arm_pmu ends up always
being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU.
However, this only happens if the VCPU has the PMU feature. If the VCPU
doesn't have the feature bit set, kvm->arch.arm_pmu will be left
uninitialized and equal to NULL.
KVM doesn't do ID register emulation for 32-bit guests and accesses to the
PMU registers aren't gated by the pmu_visibility() function. This is done
to prevent injecting unexpected undefined exceptions in guests which have
detected the presence of a hardware PMU. But even though the VCPU feature
is missing, KVM still attempts to emulate certain aspects of the PMU when
PMU registers are accessed. This leads to a NULL pointer dereference like
this one, which happens on an odroid-c4 board when running the
kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU
feature being set:
[ 454.402699] Unable to handle kernel NULL pointer dereference at virtual address
0000000000000150
[ 454.405865] Mem abort info:
[ 454.408596] ESR = 0x96000004
[ 454.411638] EC = 0x25: DABT (current EL), IL = 32 bits
[ 454.416901] SET = 0, FnV = 0
[ 454.419909] EA = 0, S1PTW = 0
[ 454.423010] FSC = 0x04: level 0 translation fault
[ 454.427841] Data abort info:
[ 454.430687] ISV = 0, ISS = 0x00000004
[ 454.434484] CM = 0, WnR = 0
[ 454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=
000000000c924000
[ 454.443800] [
0000000000000150] pgd=
0000000000000000, p4d=
0000000000000000
[ 454.450528] Internal error: Oops:
96000004 [#1] PREEMPT SMP
[ 454.456036] Modules linked in:
[ 454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113
[ 454.465697] Hardware name: Hardkernel ODROID-C4 (DT)
[ 454.470612] pstate:
60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74
[ 454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80
[ 454.487775] sp :
ffff80000a9839c0
[ 454.491050] x29:
ffff80000a9839c0 x28:
ffff000000a83a00 x27:
0000000000000000
[ 454.498127] x26:
0000000000000000 x25:
0000000000000000 x24:
ffff00000a510000
[ 454.505198] x23:
ffff000000a83a00 x22:
ffff000003b01000 x21:
0000000000000000
[ 454.512271] x20:
000000000000001f x19:
00000000000003ff x18:
0000000000000000
[ 454.519343] x17:
000000008003fe98 x16:
0000000000000000 x15:
0000000000000000
[ 454.526416] x14:
0000000000000000 x13:
0000000000000000 x12:
0000000000000000
[ 454.533489] x11:
000000008003fdbc x10:
0000000000009d20 x9 :
000000000000001b
[ 454.540561] x8 :
0000000000000000 x7 :
0000000000000d00 x6 :
0000000000009d00
[ 454.547633] x5 :
0000000000000037 x4 :
0000000000009d00 x3 :
0d09000000000000
[ 454.554705] x2 :
000000000000001f x1 :
0000000000000000 x0 :
0000000000000000
[ 454.561779] Call trace:
[ 454.564191] kvm_pmu_event_mask.isra.0+0x14/0x74
[ 454.568764] kvm_pmu_set_counter_event_type+0x2c/0x80
[ 454.573766] access_pmu_evtyper+0x128/0x170
[ 454.577905] perform_access+0x34/0x80
[ 454.581527] kvm_handle_cp_32+0x13c/0x160
[ 454.585495] kvm_handle_cp15_32+0x1c/0x30
[ 454.589462] handle_exit+0x70/0x180
[ 454.592912] kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0
[ 454.597485] kvm_vcpu_ioctl+0x23c/0x940
[ 454.601280] __arm64_sys_ioctl+0xa8/0xf0
[ 454.605160] invoke_syscall+0x48/0x114
[ 454.608869] el0_svc_common.constprop.0+0xd4/0xfc
[ 454.613527] do_el0_svc+0x28/0x90
[ 454.616803] el0_svc+0x34/0xb0
[ 454.619822] el0t_64_sync_handler+0xa4/0x130
[ 454.624049] el0t_64_sync+0x18c/0x190
[ 454.627675] Code:
a9be7bfd 910003fd f9000bf3 52807ff3 (
b9415001)
[ 454.633714] ---[ end trace
0000000000000000 ]---
In this particular case, Linux hasn't detected the presence of a hardware
PMU because the PMU node is missing from the DTB, so userspace would have
been unable to set the VCPU PMU feature even if it attempted it. What
happens is that the 32-bit guest reads ID_DFR0, which advertises the
presence of the PMU, and when it tries to program a counter, it triggers
the NULL pointer dereference because kvm->arch.arm_pmu is NULL.
kvm-arch.arm_pmu was introduced by commit
46b187821472 ("KVM: arm64:
Keep a per-VM pointer to the default PMU"). Until that commit, this
error would be triggered instead:
[ 73.388140] ------------[ cut here ]------------
[ 73.388189] Unknown PMU version 0
[ 73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.399821] Modules linked in:
[ 73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114
[ 73.409132] Hardware name: Hardkernel ODROID-C4 (DT)
[ 73.414048] pstate:
60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.430779] sp :
ffff80000a8db9b0
[ 73.434055] x29:
ffff80000a8db9b0 x28:
ffff000000dbaac0 x27:
0000000000000000
[ 73.441131] x26:
ffff000000dbaac0 x25:
00000000c600000d x24:
0000000000180720
[ 73.448203] x23:
ffff800009ffbe10 x22:
ffff00000b612000 x21:
0000000000000000
[ 73.455276] x20:
000000000000001f x19:
0000000000000000 x18:
ffffffffffffffff
[ 73.462348] x17:
000000008003fe98 x16:
0000000000000000 x15:
0720072007200720
[ 73.469420] x14:
0720072007200720 x13:
ffff800009d32488 x12:
00000000000004e6
[ 73.476493] x11:
00000000000001a2 x10:
ffff800009d32488 x9 :
ffff800009d32488
[ 73.483565] x8 :
00000000ffffefff x7 :
ffff800009d8a488 x6 :
ffff800009d8a488
[ 73.490638] x5 :
ffff0000f461a9d8 x4 :
0000000000000000 x3 :
0000000000000001
[ 73.497710] x2 :
0000000000000000 x1 :
0000000000000000 x0 :
ffff000000dbaac0
[ 73.504784] Call trace:
[ 73.507195] kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.511768] kvm_pmu_set_counter_event_type+0x2c/0x80
[ 73.516770] access_pmu_evtyper+0x128/0x16c
[ 73.520910] perform_access+0x34/0x80
[ 73.524532] kvm_handle_cp_32+0x13c/0x160
[ 73.528500] kvm_handle_cp15_32+0x1c/0x30
[ 73.532467] handle_exit+0x70/0x180
[ 73.535917] kvm_arch_vcpu_ioctl_run+0x20c/0x6e0
[ 73.540489] kvm_vcpu_ioctl+0x2b8/0x9e0
[ 73.544283] __arm64_sys_ioctl+0xa8/0xf0
[ 73.548165] invoke_syscall+0x48/0x114
[ 73.551874] el0_svc_common.constprop.0+0xd4/0xfc
[ 73.556531] do_el0_svc+0x28/0x90
[ 73.559808] el0_svc+0x28/0x80
[ 73.562826] el0t_64_sync_handler+0xa4/0x130
[ 73.567054] el0t_64_sync+0x1a0/0x1a4
[ 73.570676] ---[ end trace
0000000000000000 ]---
[ 73.575382] kvm: pmu event creation failed -2
The root cause remains the same: kvm->arch.pmuver was never set to
something sensible because the VCPU feature itself was never set.
The odroid-c4 is somewhat of a special case, because Linux doesn't probe
the PMU. But the above errors can easily be reproduced on any hardware,
with or without a PMU driver, as long as userspace doesn't set the PMU
feature.
Work around the fact that KVM advertises a PMU even when the VCPU feature
is not set by gating all PMU emulation on the feature. The guest can still
access the registers without KVM injecting an undefined exception.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425145530.723858-1-alexandru.elisei@arm.com
Will Deacon [Wed, 27 Apr 2022 17:13:32 +0000 (18:13 +0100)]
KVM: arm64: Handle host stage-2 faults from 32-bit EL0
When pKVM is enabled, host memory accesses are translated by an identity
mapping at stage-2, which is populated lazily in response to synchronous
exceptions from 64-bit EL1 and EL0.
Extend this handling to cover exceptions originating from 32-bit EL0 as
well. Although these are very unlikely to occur in practice, as the
kernel typically ensures that user pages are initialised before mapping
them in, drivers could still map previously untouched device pages into
userspace and expect things to work rather than panic the system.
Cc: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220427171332.13635-1-will@kernel.org
Linus Torvalds [Wed, 27 Apr 2022 20:44:37 +0000 (13:44 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"Two patches.
Subsystems affected by this patch series: mm/kasan and mm/debug"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
docs: vm/page_owner: use literal blocks for param description
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
Akira Yokosawa [Wed, 27 Apr 2022 19:41:59 +0000 (12:41 -0700)]
docs: vm/page_owner: use literal blocks for param description
Sphinx generates hard-to-read lists of parameters at the bottom of the
page. Fix them by putting literal-block markers of "::" in front of
them.
Link: https://lkml.kernel.org/r/cfd3bcc0-b51d-0c68-c065-ca1c4c202447@gmail.com
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Fixes: 57f2b54a9379 ("Documentation/vm/page_owner.rst: update the documentation")
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alex Shi <seakeel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zqiang [Wed, 27 Apr 2022 19:41:56 +0000 (12:41 -0700)]
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/
destroy(). The kasan_quarantine_remove_cache() call is protected by
cpuslock in kmem_cache_destroy() to ensure serialization with
kasan_cpu_offline().
However the kasan_quarantine_remove_cache() call is not protected by
cpuslock in kmem_cache_shrink(). When a CPU is going offline and cache
shrink occurs at same time, the cpu_quarantine may be corrupted by
interrupt (per_cpu_remove_cache operation).
So add a cpu_quarantine offline flags check in per_cpu_remove_cache().
[akpm@linux-foundation.org: add comment, per Zqiang]
Link: https://lkml.kernel.org/r/20220414025925.2423818-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Filipe Manana [Thu, 21 Apr 2022 10:01:22 +0000 (11:01 +0100)]
btrfs: skip compression property for anything other than files and dirs
The compression property only has effect on regular files and directories
(so that it's propagated to files and subdirectories created inside a
directory). For any other inode type (symlink, fifo, device, socket),
it's pointless to set the compression property because it does nothing
and ends up unnecessarily wasting leaf space due to the pointless xattr
(75 or 76 bytes, depending on the compression value). Symlinks in
particular are very common (for example, I have almost 10k symlinks under
/etc, /usr and /var alone) and therefore it's worth to avoid wasting
leaf space with the compression xattr.
For example, the compression property can end up on a symlink or character
device implicitly, through inheritance from a parent directory
$ mkdir /mnt/testdir
$ btrfs property set /mnt/testdir compression lzo
$ ln -s yadayada /mnt/testdir/lnk
$ mknod /mnt/testdir/dev c 0 0
Or explicitly like this:
$ ln -s yadayda /mnt/lnk
$ setfattr -h -n btrfs.compression -v lzo /mnt/lnk
So skip the compression property on inodes that are neither a regular
file nor a directory.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 21 Apr 2022 10:03:09 +0000 (11:03 +0100)]
btrfs: do not BUG_ON() on failure to update inode when setting xattr
We are doing a BUG_ON() if we fail to update an inode after setting (or
clearing) a xattr, but there's really no reason to not instead simply
abort the transaction and return the error to the caller. This should be
a rare error because we have previously reserved enough metadata space to
update the inode and the delayed inode should have already been setup, so
an -ENOSPC or -ENOMEM, which are the possible errors, are very unlikely to
happen.
So replace the BUG_ON()s with a transaction abort.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 21 Apr 2022 09:56:39 +0000 (10:56 +0100)]
btrfs: always log symlinks in full mode
On Linux, empty symlinks are invalid, and attempting to create one with
the system call symlink(2) results in an -ENOENT error and this is
explicitly documented in the man page.
If we rename a symlink that was created in the current transaction and its
parent directory was logged before, we actually end up logging the symlink
without logging its content, which is stored in an inline extent. That
means that after a power failure we can end up with an empty symlink,
having no content and an i_size of 0 bytes.
It can be easily reproduced like this:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/testdir
$ sync
# Create a file inside the directory and fsync the directory.
$ touch /mnt/testdir/foo
$ xfs_io -c "fsync" /mnt/testdir
# Create a symlink inside the directory and then rename the symlink.
$ ln -s /mnt/testdir/foo /mnt/testdir/bar
$ mv /mnt/testdir/bar /mnt/testdir/baz
# Now fsync again the directory, this persist the log tree.
$ xfs_io -c "fsync" /mnt/testdir
<power failure>
$ mount /dev/sdc /mnt
$ stat -c %s /mnt/testdir/baz
0
$ readlink /mnt/testdir/baz
$
Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that
their content is also logged.
A test case for fstests will follow.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chung-Chiang Cheng [Fri, 15 Apr 2022 08:04:06 +0000 (16:04 +0800)]
btrfs: do not allow compression on nodatacow files
Compression and nodatacow are mutually exclusive. A similar issue was
fixed by commit
f37c563bab429 ("btrfs: add missing check for nocow and
compression inode flags"). Besides ioctl, there is another way to
enable/disable/reset compression directly via xattr. The following
steps will result in a invalid combination.
$ touch bar
$ chattr +C bar
$ lsattr bar
---------------C-- bar
$ setfattr -n btrfs.compression -v zstd bar
$ lsattr bar
--------c------C-- bar
To align with the logic in check_fsflags, nocompress will also be
unacceptable after this patch, to prevent mix any compression-related
options with nodatacow.
$ touch bar
$ chattr +C bar
$ lsattr bar
---------------C-- bar
$ setfattr -n btrfs.compression -v zstd bar
setfattr: bar: Invalid argument
$ setfattr -n btrfs.compression -v no bar
setfattr: bar: Invalid argument
When both compression and nodatacow are enabled, then
btrfs_run_delalloc_range prefers nodatacow and no compression happens.
Reported-by: Jayce Lin <jaycelin@synology.com>
CC: stable@vger.kernel.org # 5.10.x: e6f9d6964802: btrfs: export a helper for compression hard check
CC: stable@vger.kernel.org # 5.10.x
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chung-Chiang Cheng [Fri, 15 Apr 2022 08:04:05 +0000 (16:04 +0800)]
btrfs: export a helper for compression hard check
inode_can_compress will be used outside of inode.c to check the
availability of setting compression flag by xattr. This patch moves
this function as an internal helper and renames it to
btrfs_inode_can_compress.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tejun Heo [Wed, 27 Apr 2022 19:49:12 +0000 (09:49 -1000)]
Revert "block: inherit request start time from bio for BLK_CGROUP"
This reverts commit
0006707723233cb2a9a23ca19fc3d0864835704c. It has a
couple problems:
* bio_issue_time() is stored in bio->bi_issue truncated to 51 bits. This
overflows in slightly over 26 days. Setting rq->io_start_time_ns with it
means that io duration calculation would yield >26days after 26 days of
uptime. This, for example, confuses kyber making it cause high IO
latencies.
* rq->io_start_time_ns should record the time that the IO is issued to the
device so that on-device latency can be measured. However,
bio_issue_time() is set before the bio goes through the rq-qos controllers
(wbt, iolatency, iocost), so when the bio gets throttled in any of the
mechanisms, the measured latencies make no sense - on-device latencies end
up higher than request-alloc-to-completion latencies.
We'll need a smarter way to avoid calling ktime_get_ns() repeatedly
back-to-back. For now, let's revert the commit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.16+
Link: https://lore.kernel.org/r/YmmeOLfo5lzc+8yI@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Artem Bityutskiy [Wed, 27 Apr 2022 06:08:53 +0000 (09:08 +0300)]
intel_idle: Fix SPR C6 optimization
The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.
This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.
This patch fixes the issue by eliminating the problematic 'return'
statement.
Fixes: 3a9cf77b60dc ("intel_idle: add core C6 optimization for SPR")
Suggested-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Artem Bityutskiy [Wed, 27 Apr 2022 06:08:52 +0000 (09:08 +0300)]
intel_idle: Fix the 'preferred_cstates' module parameter
Problem description.
When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs. However, we enable it only on one CPU.
Fix description.
The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.
Tested on a 2-socket SPR system. I verified the following combinations:
* C1E promotion enabled and disabled in BIOS.
* Booted with and without the 'intel_idle.preferred_cstates=4' kernel
argument.
In all 4 cases C1E promotion was correctly set on all CPUs.
Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).
Fixes: da0e58c038e6 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rob Herring [Tue, 26 Apr 2022 13:35:08 +0000 (08:35 -0500)]
dt-bindings: leds-mt6360: Drop redundant 'unevaluatedProperties'
The binding has both 'unevaluatedProperties: false' and
'additionalProperties: false' which is redundant. 'additionalProperties'
is the stricter of the two, so drop 'unevaluatedProperties'.
Fixes: e05cab34e417 ("dt-bindings: leds: Add bindings for MT6360 LED")
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220426133508.1849580-1-robh@kernel.org
Krzysztof Kozlowski [Wed, 27 Apr 2022 06:58:02 +0000 (08:58 +0200)]
dt-bindings: ufs: cdns,ufshc: Add power-domains
The Cadence UFS controller can be part of power domain (as it is in
example DTS of TI J721e UFS Host Controller Glue), so allow such
property.
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220427065802.110402-1-krzysztof.kozlowski@linaro.org
Rafael J. Wysocki [Wed, 27 Apr 2022 18:20:03 +0000 (20:20 +0200)]
Merge tag 'cpufreq-arm-fixes-5.18-rc5' of git://git./linux/kernel/git/vireshk/pm
Pull ARM cpufreq fixes for 5.18-rc5 from Viresh Kumar:
"- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov and
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo)."
* tag 'cpufreq-arm-fixes-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Mikulas Patocka [Wed, 27 Apr 2022 15:26:40 +0000 (11:26 -0400)]
hex2bin: fix access beyond string end
If we pass too short string to "hex2bin" (and the string size without
the terminating NUL character is even), "hex2bin" reads one byte after
the terminating NUL character. This patch fixes it.
Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on
error - so we can't just return the variable "hi" or "lo" on error.
This inconsistency may be fixed in the next merge window, but for the
purpose of fixing this bug, we just preserve the existing behavior and
return -1 and -EINVAL.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes: b78049831ffe ("lib: add error checking to hex2bin")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Mon, 25 Apr 2022 12:07:48 +0000 (08:07 -0400)]
hex2bin: make the function hex_to_bin constant-time
The function hex2bin is used to load cryptographic keys into device
mapper targets dm-crypt and dm-integrity. It should take constant time
independent on the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural convert channels.
This patch changes the function hex_to_bin so that it contains no
branches and no memory accesses.
Note that this shouldn't cause performance degradation because the size
of the new function is the same as the size of the old function (on
x86-64) - and the new function causes no branch misprediction penalties.
I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are
no branches in the generated code.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shin'ichiro Kawasaki [Tue, 12 Apr 2022 07:56:36 +0000 (16:56 +0900)]
bus: fsl-mc-msi: Fix MSI descriptor mutex lock for msi_first_desc()
Commit
e8604b1447b4 introduced a call to the helper function
msi_first_desc(), which needs MSI descriptor mutex lock before
call. However, the required mutex lock was not added. This results in
lockdep assertion:
WARNING: CPU: 4 PID: 119 at kernel/irq/msi.c:274 msi_first_desc+0xd0/0x10c
msi_first_desc+0xd0/0x10c
fsl_mc_msi_domain_alloc_irqs+0x7c/0xc0
fsl_mc_populate_irq_pool+0x80/0x3cc
Fix this by adding the mutex lock and unlock around the function call.
Fixes: e8604b1447b4 ("bus: fsl-mc-msi: Simplify MSI descriptor handling")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220412075636.755454-1-shinichiro.kawasaki@wdc.com
Minchan Kim [Wed, 27 Apr 2022 17:21:51 +0000 (10:21 -0700)]
kernfs: fix NULL dereferencing in kernfs_remove
kernfs_remove supported NULL kernfs_node param to bail out but revent
per-fs lock change introduced regression that dereferencing the
param without NULL check so kernel goes crash.
This patch checks the NULL kernfs_node in kernfs_remove and if so,
just return.
Quote from bug report by Jirka
```
The bug is triggered by running NAS Parallel benchmark suite on
SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
log:
[ 247.035564] BUG: kernel NULL pointer dereference, address:
0000000000000008
[ 247.036009] #PF: supervisor read access in kernel mode
[ 247.036009] #PF: error_code(0x0000) - not-present page
[ 247.036009] PGD 0 P4D 0
[ 247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
[ 247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
5.16.
0393c3714081a53795bbff0e985d24146def6f57f+ #16
[ 247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
2.0b 03/07/2018
[ 247.058060] RIP: 0010:kernfs_remove+0x8/0x50
[ 247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
c4 60
[ 247.058060] RSP: 0018:
ffffbbfa48a27e48 EFLAGS:
00010246
[ 247.058060] RAX:
0000000000000001 RBX:
ffffffff89e31f98 RCX:
0000000080200018
[ 247.058060] RDX:
0000000080200019 RSI:
fffff6760786c900 RDI:
0000000000000000
[ 247.058060] RBP:
ffffffff89e31f98 R08:
ffff926b61b24d00 R09:
0000000080200018
[ 247.122048] R10:
ffff926b61b24d00 R11:
ffff926a8040c000 R12:
ffff927bd09a2000
[ 247.122048] R13:
ffffffff89e31fa0 R14:
dead000000000122 R15:
dead000000000100
[ 247.122048] FS:
00007f01be0a8c40(0000) GS:
ffff926fa8e40000(0000)
knlGS:
0000000000000000
[ 247.122048] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 247.122048] CR2:
0000000000000008 CR3:
00000001145c6003 CR4:
00000000007706e0
[ 247.122048] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 247.122048] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 247.122048] PKRU:
55555554
[ 247.122048] Call Trace:
[ 247.122048] <TASK>
[ 247.122048] rdt_kill_sb+0x29d/0x350
[ 247.122048] deactivate_locked_super+0x36/0xa0
[ 247.122048] cleanup_mnt+0x131/0x190
[ 247.122048] task_work_run+0x5c/0x90
[ 247.122048] exit_to_user_mode_prepare+0x229/0x230
[ 247.122048] syscall_exit_to_user_mode+0x18/0x40
[ 247.122048] do_syscall_64+0x48/0x90
[ 247.122048] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 247.122048] RIP: 0033:0x7f01be2d735b
```
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
Link: https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/
Fixes: 393c3714081a (kernfs: switch global kernfs_rwsem lock to per-fs lock)
Cc: stable@vger.kernel.org
Reported-by: Jirka Hladky <jhladky@redhat.com>
Tested-by: Jirka Hladky <jhladky@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20220427172152.3505364-1-minchan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Wed, 27 Apr 2022 17:30:29 +0000 (10:30 -0700)]
Merge tag 'zonefs-5.18-rc5' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
"Two fixes for rc5:
- Fix inode initialization to make sure that the inode flags are all
cleared.
- Use zone reset operation instead of close to make sure that the
zone of an empty sequential file in never in an active state after
closing the file"
* tag 'zonefs-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Fix management of open zones
zonefs: Clear inode information flags on inode creation
Kuogee Hsieh [Tue, 26 Apr 2022 21:12:14 +0000 (14:12 -0700)]
drm/msm/dp: remove fail safe mode related code
Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.
However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.
After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0
but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8
-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20
-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8
-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8
Changes in v2:
-- re text commit title
-- remove all fail safe mode
Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes
Changes in v5:
-- to=dianders@chromium.org
Changes in v6:
-- fix Fixes commit ID
Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Linus Torvalds [Wed, 27 Apr 2022 17:14:52 +0000 (10:14 -0700)]
Merge tag 'mtd/fixes-for-5.18-rc5' of git://git./linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"Core fix:
- Fix a possible data corruption of the 'part' field in mtd_info
Rawnand fixes:
- Fix the check on the return value of wait_for_completion_timeout
- Fix wrong ECC parameters for mt7622
- Fix a possible memory corruption that might panic in the Qcom
driver"
* tag 'mtd/fixes-for-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: qcom: fix memory corruption that causes panic
mtd: fix 'part' field data corruption in mtd_info
mtd: rawnand: Fix return value check of wait_for_completion_timeout
mtd: rawnand: fix ecc parameters for mt7622
Jakub Kicinski [Tue, 26 Apr 2022 17:57:23 +0000 (10:57 -0700)]
Add Eric Dumazet to networking maintainers
Welcome Eric!
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/20220426175723.417614-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Tue, 26 Apr 2022 20:41:05 +0000 (23:41 +0300)]
floppy: disable FDRAWCMD by default
Minh Yuan reported a concurrency use-after-free issue in the floppy code
between raw_cmd_ioctl and seek_interrupt.
[ It turns out this has been around, and that others have reported the
KASAN splats over the years, but Minh Yuan had a reproducer for it and
so gets primary credit for reporting it for this fix - Linus ]
The problem is, this driver tends to break very easily and nowadays,
nobody is expected to use FDRAWCMD anyway since it was used to
manipulate non-standard formats. The risk of breaking the driver is
higher than the risk presented by this race, and accessing the device
requires privileges anyway.
Let's just add a config option to completely disable this ioctl and
leave it disabled by default. Distros shouldn't use it, and only those
running on antique hardware might need to enable it.
Link: https://lore.kernel.org/all/000000000000b71cdd05d703f6bf@google.com/
Link: https://lore.kernel.org/lkml/CAKcFiNC=MfYVW-Jt9A3=FPJpTwCD2PL_ULNCpsCVE5s8ZeBQgQ@mail.gmail.com
Link: https://lore.kernel.org/all/CAEAjamu1FRhz6StCe_55XY5s389ZP_xmCF69k987En+1z53=eg@mail.gmail.com
Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Reported-by: syzbot+8e8958586909d62b6840@syzkaller.appspotmail.com
Reported-by: cruise k <cruise4k@gmail.com>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Tested-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tom Rix [Sat, 23 Apr 2022 12:30:48 +0000 (08:30 -0400)]
platform/x86/intel: pmc/core: change pmc_lpm_modes to static
Sparse reports this issue
core.c: note: in included file:
core.h:239:12: warning: symbol 'pmc_lpm_modes' was not declared. Should it be static?
Global variables should not be defined in headers. This only works
because core.h is only included by core.c. Single file use
variables should be static, so change its storage-class specifier
to static.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220423123048.591405-1-trix@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
David E. Box [Wed, 20 Apr 2022 15:56:22 +0000 (08:56 -0700)]
platform/x86/intel/sdsi: Fix bug in multi packet reads
Fix bug that added an offset to the mailbox addr during multi-packet
reads. Did not affect current ABI since it doesn't support multi-packet
transactions.
Fixes: 2546c6000430 ("platform/x86: Add Intel Software Defined Silicon driver")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220420155622.1763633-4-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
David E. Box [Wed, 20 Apr 2022 15:56:21 +0000 (08:56 -0700)]
platform/x86/intel/sdsi: Poll on ready bit for writes
Due to change in firmware flow, update mailbox writes to poll on ready bit
instead of run_busy bit. This change makes the polling method consistent
for both writes and reads, which also uses the ready bit.
Fixes: 2546c6000430 ("platform/x86: Add Intel Software Defined Silicon driver")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220420155622.1763633-3-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
David E. Box [Wed, 20 Apr 2022 15:56:20 +0000 (08:56 -0700)]
platform/x86/intel/sdsi: Handle leaky bucket
To prevent an agent from indefinitely holding the mailbox firmware has
implemented a leaky bucket algorithm. Repeated access to the mailbox may
now incur a delay of up to 2.1 seconds. Add a retry loop that tries for
up to 2.5 seconds to acquire the mailbox.
Fixes: 2546c6000430 ("platform/x86: Add Intel Software Defined Silicon driver")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220420155622.1763633-2-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Srinivas Pandruvada [Wed, 27 Apr 2022 10:03:04 +0000 (03:03 -0700)]
platform/x86: intel-uncore-freq: Prevent driver loading in guests
Loading this driver in guests results in unchecked MSR access error for
MSR 0x620.
There is no use of reading and modifying package/die scope uncore MSRs
in guests. So check for CPU feature X86_FEATURE_HYPERVISOR to prevent
loading of this driver in guests.
Fixes: dbce412a7733 ("platform/x86/intel-uncore-freq: Split common and enumeration part")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215870
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220427100304.2562990-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Darryn Anton Jordan [Thu, 14 Apr 2022 14:24:43 +0000 (16:24 +0200)]
platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
This works on my system.
Signed-off-by: Darryn Anton Jordan <darrynjordan@icloud.com>
Acked-by: Thomas Weißschuh <thomas@weissschuh.net>
Link: https://lore.kernel.org/r/Ylguq87YG+9L3foV@hark
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Gabriele Mazzotta [Tue, 26 Apr 2022 12:08:27 +0000 (14:08 +0200)]
platform/x86: dell-laptop: Add quirk entry for Latitude 7520
The Latitude 7520 supports AC timeouts, but it has no KBD_LED_AC_TOKEN
and so changes to stop_timeout appear to have no effect if the laptop
is plugged in.
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Acked-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20220426120827.12363-1-gabriele.mzt@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Hans de Goede [Wed, 27 Apr 2022 11:49:56 +0000 (13:49 +0200)]
platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
Before this commit fan_curve_check_present() was trying to not cause
the probe to fail on devices without fan curve control by testing for
known error codes returned by asus_wmi_evaluate_method_buf().
Checking for ENODATA or ENODEV, with the latter being returned by this
function when an ACPI integer with a value of ASUS_WMI_UNSUPPORTED_METHOD
is returned. But for other ACPI integer returns this function just returns
them as is, including the ASUS_WMI_DSTS_UNKNOWN_BIT value of 2.
On the Asus U36SD ASUS_WMI_DSTS_UNKNOWN_BIT gets returned, leading to:
asus-nb-wmi: probe of asus-nb-wmi failed with error 2
Instead of playing whack a mole with error codes here, simply treat all
errors as there not being any fan curves, fixing the driver no longer
loading on the Asus U36SD laptop.
Fixes: e3d13da7f77d ("platform/x86: asus-wmi: Fix regression when probing for fan curve control")
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2079125
Cc: Luke D. Jones <luke@ljones.dev>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220427114956.332919-1-hdegoede@redhat.com
Dan Carpenter [Wed, 13 Apr 2022 07:37:44 +0000 (10:37 +0300)]
platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
This code tests for if the obj->buffer.length is larger than the buffer
but then it just does the memcpy() anyway.
Fixes: 0f0ac158d28f ("platform/x86: asus-wmi: Add support for custom fan curves")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220413073744.GB8812@kili
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tejun Heo [Wed, 27 Apr 2022 05:01:01 +0000 (19:01 -1000)]
iocost: don't reset the inuse weight of under-weighted debtors
When an iocg is in debt, its inuse weight is owned by debt handling and
should stay at 1. This invariant was broken when determining the amount of
surpluses at the beginning of donation calculation - when an iocg's
hierarchical weight is too low, the iocg is excluded from donation
calculation and its inuse is reset to its active regardless of its
indebtedness, triggering warnings like the following:
WARNING: CPU: 5 PID: 0 at block/blk-iocost.c:1416 iocg_kick_waitq+0x392/0x3a0
...
RIP: 0010:iocg_kick_waitq+0x392/0x3a0
Code: 00 00 be ff ff ff ff 48 89 4d a8 e8 98 b2 70 00 48 8b 4d a8 85 c0 0f 85 4a fe ff ff 0f 0b e9 43 fe ff ff 0f 0b e9 4d fe ff ff <0f> 0b e9 50 fe ff ff e8 a2 ae 70 00 66 90 0f 1f 44 00 00 55 48 89
RSP: 0018:
ffffc90000200d08 EFLAGS:
00010016
...
<IRQ>
ioc_timer_fn+0x2e0/0x1470
call_timer_fn+0xa1/0x2c0
...
As this happens only when an iocg's hierarchical weight is negligible, its
impact likely is limited to triggering the warnings. Fix it by skipping
resetting inuse of under-weighted debtors.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rik van Riel <riel@surriel.com>
Fixes: c421a3eb2e27 ("blk-iocost: revamp debt handling")
Cc: stable@vger.kernel.org # v5.10+
Link: https://lore.kernel.org/r/YmjODd4aif9BzFuO@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Volodymyr Mytnyk [Wed, 27 Apr 2022 11:09:00 +0000 (14:09 +0300)]
netfilter: conntrack: fix udp offload timeout sysctl
`nf_flowtable_udp_timeout` sysctl option is available only
if CONFIG_NFT_FLOW_OFFLOAD enabled. But infra for this flow
offload UDP timeout was added under CONFIG_NF_FLOW_TABLE
config option. So, if you have CONFIG_NFT_FLOW_OFFLOAD
disabled and CONFIG_NF_FLOW_TABLE enabled, the
`nf_flowtable_udp_timeout` is not present in sysfs.
Please note, that TCP flow offload timeout sysctl option
is present even CONFIG_NFT_FLOW_OFFLOAD is disabled.
I suppose it was a typo in commit that adds UDP flow offload
timeout and CONFIG_NF_FLOW_TABLE should be used instead.
Fixes: 975c57504da1 ("netfilter: conntrack: Introduce udp offload timeout configuration")
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 25 Apr 2022 09:47:11 +0000 (11:47 +0200)]
netfilter: nf_conntrack_tcp: re-init for syn packets only
Jaco Kroon reported tcp problems that Eric Dumazet and Neal Cardwell
pinpointed to nf_conntrack tcp_in_window() bug.
tcp trace shows following sequence:
I > R Flags [S], seq
3451342529, win 62580, options [.. tfo [|tcp]>
R > I Flags [S.], seq
2699962254, ack
3451342530, win 65535, options [..]
R > I Flags [P.], seq 1:89, ack 1, [..]
Note 3rd ACK is from responder to initiator so following branch is taken:
} else if (((state->state == TCP_CONNTRACK_SYN_SENT
&& dir == IP_CT_DIR_ORIGINAL)
|| (state->state == TCP_CONNTRACK_SYN_RECV
&& dir == IP_CT_DIR_REPLY))
&& after(end, sender->td_end)) {
... because state == TCP_CONNTRACK_SYN_RECV and dir is REPLY.
This causes the scaling factor to be reset to 0: window scale option
is only present in syn(ack) packets. This in turn makes nf_conntrack
mark valid packets as out-of-window.
This was always broken, it exists even in original commit where
window tracking was added to ip_conntrack (nf_conntrack predecessor)
in 2.6.9-rc1 kernel.
Restrict to 'tcph->syn', just like the 3rd condtional added in
commit
82b72cb94666 ("netfilter: conntrack: re-init state for retransmitted syn-ack").
Upon closer look, those conditionals/branches can be merged:
Because earlier checks prevent syn-ack from showing up in
original direction, the 'dir' checks in the conditional quoted above are
redundant, remove them. Return early for pure syn retransmitted in reply
direction (simultaneous open).
Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Jaco Kroon <jaco@uls.co.za>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ajit Kumar Pandey [Tue, 26 Apr 2022 18:33:57 +0000 (13:33 -0500)]
ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback
We are accessing "desc->ops" in sof_pci_probe without checking "desc"
pointer. This results in NULL pointer exception if pci_id->driver_data
i.e desc pointer isn't defined in sof device probe:
BUG: kernel NULL pointer dereference, address:
0000000000000060
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:sof_pci_probe+0x1e/0x17f [snd_sof_pci]
Code: Unable to access opcode bytes at RIP 0xffffffffc043dff4.
RSP: 0018:
ffffac4b03b9b8d8 EFLAGS:
00010246
Add NULL pointer check for sof_dev_desc pointer to avoid such exception.
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Ajit Kumar Pandey <AjitKumar.Pandey@amd.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20220426183357.102155-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Zev Weiss [Wed, 27 Apr 2022 03:51:09 +0000 (20:51 -0700)]
hwmon: (pmbus) delta-ahe50dc-fan: work around hardware quirk
CLEAR_FAULTS commands can apparently sometimes trigger catastrophic
power output glitches on the ahe-50dc, so block them from being sent
at all.
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220427035109.3819-1-zev@bewilderbeest.net
Fixes: d387d88ed045 ("hwmon: (pmbus) Add Delta AHE-50DC fan control module driver")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Sven Schnelle [Mon, 25 Apr 2022 12:17:42 +0000 (14:17 +0200)]
s390: disable -Warray-bounds
gcc-12 shows a lot of array bound warnings on s390. This is caused
by the S390_lowcore macro which uses a hardcoded address of 0.
Wrapping that with absolute_pointer() works, but gcc no longer knows
that a 12 bit displacement is sufficient to access lowcore. So it
emits instructions like 'lghi %r1,0; l %rx,xxx(%r1)' instead of a
single load/store instruction. As s390 stores variables often
read/written in lowcore, this is considered problematic. Therefore
disable -Warray-bounds on s390 for gcc-12 for the time being, until
there is a better solution.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/yt9dzgkelelc.fsf@linux.ibm.com
Link: https://lore.kernel.org/r/20220422134308.1613610-1-svens@linux.ibm.com
Link: https://lore.kernel.org/r/20220425121742.3222133-1-svens@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
David S. Miller [Wed, 27 Apr 2022 09:58:39 +0000 (10:58 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net
-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-04-26
This series contains updates to ice driver only.
Ivan Vecera removes races related to VF message processing by changing
mutex_trylock() call to mutex_lock() and moving additional operations
to occur under mutex.
Petr Oros increases wait time after firmware flash as current time is
not sufficient.
Jake resolves a use-after-free issue for mailbox snapshot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Wed, 27 Apr 2022 01:34:57 +0000 (19:34 -0600)]
io_uring: check reserved fields for recv/recvmsg
We should check unused fields for non-zero and -EINVAL if they are set,
making it consistent with other opcodes.
Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 27 Apr 2022 01:34:11 +0000 (19:34 -0600)]
io_uring: check reserved fields for send/sendmsg
We should check unused fields for non-zero and -EINVAL if they are set,
making it consistent with other opcodes.
Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Martin Blumenstingl [Mon, 25 Apr 2022 15:20:27 +0000 (17:20 +0200)]
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
Commit
4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining
GSWIP_MII_CFG bits") added all known bits in the GSWIP_MII_CFGp
register. It helped bring this register into a well-defined state so the
driver has to rely less on the bootloader to do things right.
Unfortunately it also sets the GSWIP_MII_CFG_RMII_CLK bit without any
possibility to configure it. Upon further testing it turns out that all
boards which are supported by the GSWIP driver in OpenWrt which use an
RMII PHY have a dedicated oscillator on the board which provides the
50MHz RMII reference clock.
Don't set the GSWIP_MII_CFG_RMII_CLK bit (but keep the code which always
clears it) to fix support for the Fritz!Box 7362 SL in OpenWrt. This is
a board with two Atheros AR8030 RMII PHYs. With the "RMII clock" bit set
the MAC also generates the RMII reference clock whose signal then
conflicts with the signal from the oscillator on the board. This results
in a constant cycle of the PHY detecting link up/down (and as a result
of that: the two ports using the AR8030 PHYs are not working).
At the time of writing this patch there's no known board where the MAC
(GSWIP) has to generate the RMII reference clock. If needed this can be
implemented in future by providing a device-tree flag so the
GSWIP_MII_CFG_RMII_CLK bit can be toggled per port.
Fixes: 4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits")
Tested-by: Jan Hoffmann <jan@3e8.eu>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://lore.kernel.org/r/20220425152027.2220750-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sebastian Andrzej Siewior [Mon, 25 Apr 2022 16:39:46 +0000 (18:39 +0200)]
net: Use this_cpu_inc() to increment net->core_stats
The macro dev_core_stats_##FIELD##_inc() disables preemption and invokes
netdev_core_stats_alloc() to return a per-CPU pointer.
netdev_core_stats_alloc() will allocate memory on its first invocation
which breaks on PREEMPT_RT because it requires non-atomic context for
memory allocation.
This can be avoided by enabling preemption in netdev_core_stats_alloc()
assuming the caller always disables preemption.
It might be better to replace local_inc() with this_cpu_inc() now that
dev_core_stats_##FIELD##_inc() gained a preempt-disable section and does
not rely on already disabled preemption. This results in less
instructions on x86-64:
local_inc:
| incl %gs:__preempt_count(%rip) # __preempt_count
| movq 488(%rdi), %rax # _1->core_stats, _22
| testq %rax, %rax # _22
| je .L585 #,
| add %gs:this_cpu_off(%rip), %rax # this_cpu_off, tcp_ptr__
| .L586:
| testq %rax, %rax # _27
| je .L587 #,
| incq (%rax) # _6->a.counter
| .L587:
| decl %gs:__preempt_count(%rip) # __preempt_count
this_cpu_inc(), this patch:
| movq 488(%rdi), %rax # _1->core_stats, _5
| testq %rax, %rax # _5
| je .L591 #,
| .L585:
| incq %gs:(%rax) # _18->rx_dropped
Use unsigned long as type for the counter. Use this_cpu_inc() to
increment the counter. Use a plain read of the counter.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/YmbO0pxgtKpCw4SY@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 26 Apr 2022 23:34:11 +0000 (16:34 -0700)]
Merge tag 'pinctrl-v5.18-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Fix some register offsets on Intel Alderlake
- Fix the order the UFS and SDC pins on Qualcomm SM6350
- Fix a build error in Mediatek Moore.
- Fix a pin function table in the Sunplus SP7021.
- Fix some Kconfig and static keywords on the Samsung Tesla FSD SoC.
- Fix up the EOI function for edge triggered IRQs and keep the block
clock enabled for level IRQs in the STM32 driver.
- Fix some bits and order in the Rockchip RK3308 driver.
- Handle the errorpath in the Pistachio driver probe() properly.
* tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: pistachio: fix use of irq_of_parse_and_map()
pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested
pinctrl: rockchip: sort the rk3308_mux_recalced_data entries
pinctrl: rockchip: fix RK3308 pinmux bits
pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI
pinctrl: Fix an error in pin-function table of SP7021
pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config
pinctrl: mediatek: moore: Fix build error
pinctrl: qcom: sm6350: fix order of UFS & SDC pins
pinctrl: alderlake: Fix register offsets for ADL-N variant
pinctrl: samsung: staticize fsd_pin_ctrl