io-wq: fix race between worker exiting and activating free worker
author Jens Axboe <axboe@kernel.dk>
Tue, 3 Aug 2021 15:14:35 +0000 (09:14 -0600)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 12 Aug 2021 11:32:23 +0000 (13:32 +0200)
commit 6ed983ea4a12a95c5e8845354a66d6f866d8cc76
tree 3220d38b2b733cf635bf4833f446ce46244eb15d
parent 08ed8d676c942ab52b5fd62320d276615f73beb0

commit 83d6c39310b6d11199179f6384c2b0a415389597 upstream.

Nadav correctly reports that we have a race between a worker exiting
and new work being queued. This can lead to work being queued behind
an existing worker that could be sleeping on an event before it can
run to completion, and hence introduce potentially large latency gaps
if we hit this race condition:

cpu0                                    cpu1
----                                    ----
                                        io_wqe_worker()
                                        schedule_timeout()
                                         // timed out
io_wqe_enqueue()
io_wqe_wake_worker()
// work_flags & IO_WQ_WORK_CONCURRENT
io_wqe_activate_free_worker()
                                         io_worker_exit()

Fix this by having the exiting worker go through the normal decrement
of a running worker, which will spawn a new one if needed.
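
The idea in the paragraph above can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions (no locking, no real threads), and every name in it (struct pool, spawn_worker, dec_running, worker_exit) is hypothetical rather than taken from fs/io-wq.c. It shows an exiting worker using the same "running count dropped" path as any other worker, so queued work is never left without someone to run it.

```c
#include <assert.h>

/* Hypothetical userspace model; names do not match fs/io-wq.c. */
struct pool {
    int nr_running;   /* workers currently counted as running */
    int nr_workers;   /* workers that exist at all */
    int pending;      /* queued work items nobody has picked up yet */
};

/* Stand-in for creating a worker thread (assumption: always succeeds). */
static void spawn_worker(struct pool *p)
{
    p->nr_workers++;
    p->nr_running++;
}

/*
 * Normal "a running worker went away" path: if this was the last
 * running worker and work is still queued, start a replacement.
 */
static void dec_running(struct pool *p)
{
    assert(p->nr_running > 0);
    if (--p->nr_running == 0 && p->pending > 0)
        spawn_worker(p);
}

/*
 * A worker that timed out and is exiting. Instead of silently dropping
 * the running count, it goes through dec_running() like everyone else.
 */
static void worker_exit(struct pool *p)
{
    p->nr_workers--;
    dec_running(p);
}
```

In this model, if the last worker exits while one item is pending, the pool ends up with one fresh worker again. With nothing pending it simply drains to zero workers.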

The free worker activation is modified to only return success if we
were able to find a sleeping worker - if not, we keep looking through
the list. If we fail, we create a new worker as per usual.
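
The activation rule described above can be sketched the same way. Again this is a simplified userspace model, not the kernel code: the worker states, try_wake, activate_free_worker and wake_worker are all hypothetical names, and the free list is a plain singly linked list with no RCU or reference counting. The point it illustrates is that activation reports success only when a worker that was actually asleep has been woken, and that the caller falls back to creating a worker otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker states for this model only. */
enum wstate { W_SLEEPING, W_EXITING, W_RUNNING };

struct worker {
    enum wstate state;
    struct worker *next;   /* next entry on the free list */
};

/* Stand-in for a wake-up call: succeeds only if the target was asleep. */
static bool try_wake(struct worker *w)
{
    if (w->state != W_SLEEPING)
        return false;
    w->state = W_RUNNING;
    return true;
}

/* Keep walking the free list until a sleeping worker is actually woken. */
static bool activate_free_worker(struct worker *free_list)
{
    for (struct worker *w = free_list; w != NULL; w = w->next) {
        if (try_wake(w))
            return true;
    }
    return false;
}

/* Returns true if no free worker could be woken and a new one was needed. */
static bool wake_worker(struct worker *free_list, int *nr_created)
{
    assert(nr_created != NULL);
    if (activate_free_worker(free_list))
        return false;
    (*nr_created)++;   /* stand-in for creating a new worker */
    return true;
}
```

With a free list whose first entry is already exiting and whose second entry is asleep, the model skips the first and wakes the second. Under a "first entry wins" rule the exiting worker would have been treated as a successful activation, which is the situation the commit message describes.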

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/BFF746C0-FEDE-4646-A253-3021C57C26C9@gmail.com/
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Tested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fs/io-wq.c