git.fsl.cs.sunysb.edu Git - wrapfs-3.4.y.git/commit

author	Nate Dailey <nate.dailey@stratus.com>
	Mon, 29 Feb 2016 15:43:58 +0000 (10:43 -0500)
committer	Zefan Li <lizefan@huawei.com>
	Wed, 27 Apr 2016 10:55:30 +0000 (18:55 +0800)
commit	9237baa5c61ef9e11d8a71d02d73b53d8a2b7d01
tree	2e1459c858e864535f0f1434bdc9cb8abe9652c1
parent	4b7e6b747c90c912340b5f3f3c876d81d60cf273

raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang

commit ccfc7bf1f09d6190ef86693ddc761d5fe3fa47cb upstream.

If raid1d is handling a mix of read and write errors, handle_read_error's
call to freeze_array can get stuck.

This can happen because, though the bio_end_io_list is initially drained,
writes can be added to it via handle_write_finished as the retry_list
is processed. These writes contribute to nr_pending but are not included
in nr_queued.

If a later entry on the retry_list triggers a call to handle_read_error,
freeze_array hangs waiting for nr_pending == nr_queued+extra. The writes
on the bio_end_io_list aren't included in nr_queued so the condition will
never be satisfied.

To prevent the hang, include bio_end_io_list writes in nr_queued.
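
The counter arithmetic behind the hang can be sketched with a toy model.
This is not the kernel code: struct conf_model, pop_retry, park_write,
freeze_can_proceed and simulate are hypothetical names made up for
illustration, and the model assumes raid1d decrements nr_queued each time
it takes an entry off the retry_list and that handle_read_error calls
freeze_array with extra = 1. Only the counter names and the wait condition
come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the counters discussed above (not the real struct). */
struct conf_model {
    int nr_pending;      /* requests still in flight */
    int nr_queued;       /* requests parked for raid1d to handle */
    int end_io_list_len; /* writes sitting on bio_end_io_list */
};

/* The condition freeze_array waits for: nr_pending == nr_queued + extra. */
static bool freeze_can_proceed(const struct conf_model *c, int extra)
{
    return c->nr_pending == c->nr_queued + extra;
}

/* Assumption: raid1d drops nr_queued when it takes an entry off retry_list. */
static void pop_retry(struct conf_model *c)
{
    assert(c->nr_queued > 0);
    c->nr_queued--;
}

/*
 * Model of handle_write_finished parking a write on bio_end_io_list.
 * count_in_queued == false models the old behaviour,
 * count_in_queued == true models the behaviour with this fix.
 */
static void park_write(struct conf_model *c, bool count_in_queued)
{
    c->end_io_list_len++;
    if (count_in_queued)
        c->nr_queued++;
}

/* Returns true if the modelled freeze_array call could make progress. */
static bool simulate(bool count_in_queued)
{
    /* One failed write and one failed read, both pending and on retry_list. */
    struct conf_model c = { .nr_pending = 2, .nr_queued = 2, .end_io_list_len = 0 };

    pop_retry(&c);                    /* raid1d picks up the write ...        */
    park_write(&c, count_in_queued);  /* ... and moves it to bio_end_io_list  */

    pop_retry(&c);                    /* raid1d picks up the read error       */
    return freeze_can_proceed(&c, 1); /* handle_read_error -> freeze_array    */
}
```

In this model simulate(false) leaves nr_pending at 2 and nr_queued at 0, so
the wait condition with extra = 1 is never met, while simulate(true) leaves
nr_queued at 1 and the condition holds.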

There's probably a better way to handle decrementing nr_queued, but this
seemed like the safest way to avoid breaking surrounding code.

I'm happy to supply the script I used to repro this hang.

Fixes: 55ce74d4bfe1b ("md/raid1: ensure device failure recorded before write request returns.")
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>