powerpc/eeh: Fix deadlock handling dead PHB
authorSam Bobroff <sbobroff@linux.ibm.com>
Fri, 7 Feb 2020 04:57:31 +0000 (15:57 +1100)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 28 Feb 2020 16:22:17 +0000 (17:22 +0100)
commit d4f194ed9eb9841a8f978710e4d24296f791a85b upstream.

Recovering a dead PHB can currently cause a deadlock as the PCI
rescan/remove lock is taken twice.

This is caused as part of an existing bug in
eeh_handle_special_event(). The pe is processed while traversing the
PHBs even though the pe is unrelated to the loop. This causes the pe
to be, incorrectly, processed more than once.

Untangling this section can move the pe processing out of the loop and
also outside the locked section, correcting both problems.

Fixes: 2e25505147b8 ("powerpc/eeh: Fix crash when edev->pdev changes")
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Tested-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0547e82dbf90ee0729a2979a8cac5c91665c621f.1581051445.git.sbobroff@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arch/powerpc/kernel/eeh_driver.c

index 2fb166928e91b791afb98e68a22b8656475a4fc6..4fd7efdf2a53a765271bd9b7d63011b48aed3f52 100644 (file)
@@ -1200,6 +1200,17 @@ void eeh_handle_special_event(void)
                        eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
                        eeh_handle_normal_event(pe);
                } else {
+                       eeh_for_each_pe(pe, tmp_pe)
+                               eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
+                                       edev->mode &= ~EEH_DEV_NO_HANDLER;
+
+                       /* Notify all devices to be down */
+                       eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
+                       eeh_set_channel_state(pe, pci_channel_io_perm_failure);
+                       eeh_pe_report(
+                               "error_detected(permanent failure)", pe,
+                               eeh_report_failure, NULL);
+
                        pci_lock_rescan_remove();
                        list_for_each_entry(hose, &hose_list, list_node) {
                                phb_pe = eeh_phb_pe_get(hose);
@@ -1208,16 +1219,6 @@ void eeh_handle_special_event(void)
                                    (phb_pe->state & EEH_PE_RECOVERING))
                                        continue;
 
-                               eeh_for_each_pe(pe, tmp_pe)
-                                       eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
-                                               edev->mode &= ~EEH_DEV_NO_HANDLER;
-
-                               /* Notify all devices to be down */
-                               eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
-                               eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-                               eeh_pe_report(
-                                       "error_detected(permanent failure)", pe,
-                                       eeh_report_failure, NULL);
                                bus = eeh_pe_bus_get(phb_pe);
                                if (!bus) {
                                        pr_err("%s: Cannot find PCI bus for "