Naoya Horiguchi [Fri, 8 Sep 2017 23:10:46 +0000 (16:10 -0700)]
mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
commit
eee4818baac0f2b37848fdf90e4b16430dc536ac upstream.
_PAGE_PSE is used to distinguish between a truly non-present
(_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
should be treated as present.
But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
cause confusion between one of those PMDs undergoing a THP split, and a
soft-dirty PMD. Dropping _PAGE_PSE check in pmd_present() does not work
well, because it can hurt optimization of tlb handling in thp split.
Thus, we need to move the bit.
In the current kernel, bits 1-4 are not used in non-present format since
commit
00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work
around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7
is used as reserved (always clear), so please don't use it for other
purpose.
Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: Bit 9 may be reserved for PAGE_BIT_NUMA here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Dave Hansen [Fri, 8 Jul 2016 00:19:11 +0000 (17:19 -0700)]
x86/mm: Move swap offset/type up in PTE to work around erratum
commit
00839ee3b299303c6a5e26a0a2485427a3afcbbf upstream.
This erratum can result in Accessed/Dirty getting set by the hardware
when we do not expect them to be (on !Present PTEs).
Instead of trying to fix them up after this happens, we just
allow the bits to get set and try to ignore them. We do this by
shifting the layout of the bits we use for swap offset/type in
our 64-bit PTEs.
It looks like this:
bitnrs: | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0|
names: | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
before: | OFFSET (9-63) |0|X|X| TYPE(1-5) |0|
after: | OFFSET (14-63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0|
Note that D was already a don't care (X) even before. We just
move TYPE up and turn its old spot (which could be hit by the
A bit) into all don't cares.
We take 5 bits away from the offset, but that still leaves us
with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
I think that's probably fine for the moment. We could
theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
doesn't gain us anything.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: Bit 9 may be reserved for PAGE_BIT_NUMA, which
no longer exists upstream. Adjust the bit numbers accordingly,
incorporating commit
ace7fab7a6cd "x86/mm: Fix swap entry comment and
macro".]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Andi Kleen [Wed, 13 Jun 2018 22:48:21 +0000 (15:48 -0700)]
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
commit
50896e180c6aa3a9c61a26ced99e15d602666a4c upstream.
L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
speculates on PTE entries which do not have the PRESENT bit set, if the
content of the resulting physical address is available in the L1D cache.
The OS side mitigation makes sure that a !PRESENT PTE entry points to a
physical address outside the actually existing and cachable memory
space. This is achieved by inverting the upper bits of the PTE. Due to the
address space limitations this only works for 64bit and 32bit PAE kernels,
but not for 32bit non PAE.
This mitigation applies to both host and guest kernels, but in case of a
64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
the PAE address space (44bit) is not enough if the host has more than 43
bits of populated memory address space, because the speculation treats the
PTE content as a physical host address bypassing EPT.
The host (hypervisor) protects itself against the guest by flushing L1D as
needed, but pages inside the guest are not protected against attacks from
other processes inside the same guest.
For the guest the inverted PTE mask has to match the host to provide the
full protection for all pages the host could possibly map into the
guest. The hosts populated address space is not known to the guest, so the
mask must cover the possible maximal host address space, i.e. 52 bit.
On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
the mask to be below what the host could possible use for physical pages.
The L1TF PROT_NONE protection code uses the PTE masks to determine which
bits to invert to make sure the higher bits are set for unmapped entries to
prevent L1TF speculation attacks against EPT inside guests.
In order to invert all bits that could be used by the host, increase
__PHYSICAL_PAGE_SHIFT to 52 to match 64bit.
The real limit for a 32bit PAE kernel is still 44 bits because all Linux
PTEs are created from unsigned long PFNs, so they cannot be higher than 44
bits on a 32bit kernel. So these extra PFN bits should be never set. The
only users of this macro are using it to look at PTEs, so it's safe.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 17 Feb 2015 00:00:18 +0000 (16:00 -0800)]
powerpc: drop _PAGE_FILE and pte_file()-related helpers
commit
780fc5642f59b6c6e2b05794de60b2d2ad5f040e upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:25 +0000 (14:11 -0800)]
xtensa: drop _PAGE_FILE and pte_file()-related helpers
commit
d9ecee281b8f89da6d3203be62802eda991e37cc upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:22 +0000 (14:11 -0800)]
x86: drop _PAGE_FILE and pte_file()-related helpers
commit
0a191362058391878cc2a4d4ccddcd8223eb4f79 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:20 +0000 (14:11 -0800)]
unicore32: drop pte_file()-related helpers
commit
40171798fe11a6dc1d963058b097b2c4c9d34a9c upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:17 +0000 (14:11 -0800)]
um: drop _PAGE_FILE and pte_file()-related helpers
commit
3513006a5691ae3629eef9ddef0b71a47c40dfbc upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:14 +0000 (14:11 -0800)]
tile: drop pte_file()-related helpers
commit
eb12f4872a3845a8803f689646dea5b92a30aff7 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:12 +0000 (14:11 -0800)]
sparc: drop pte_file()-related helpers
commit
6a8c4820895cf1dd2a128aef67ce079ba6eded80 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:09 +0000 (14:11 -0800)]
sh: drop _PAGE_FILE and pte_file()-related helpers
commit
8b70beac99466b6d164de9fe647b3567e6f17e3a upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:06 +0000 (14:11 -0800)]
score: drop _PAGE_FILE and pte_file()-related helpers
commit
917e401ea75478d4f4575bc8b0ef3d14ecf9ef69 upstream.
We've replaced remap_file_pages(2) implementation with emulation.
Nobody creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:04 +0000 (14:11 -0800)]
s390: drop pte_file()-related helpers
commit
6e76d4b20bf6b514408ab5bd07f4a76723259b64 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:11:01 +0000 (14:11 -0800)]
parisc: drop _PAGE_FILE and pte_file()-related helpers
commit
8d55da810f1fabcf1d4c0bbc46205e5f2c0fa84b upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:58 +0000 (14:10 -0800)]
openrisc: drop _PAGE_FILE and pte_file()-related helpers
commit
3824e3cf7e865b2ff0b71de23b16e332fe6a853a upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:53 +0000 (14:10 -0800)]
mn10300: drop _PAGE_FILE and pte_file()-related helpers
commit
6bf63a8ccb1dccd6ab81bc8bc46863493629cdb8 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increases the number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:50 +0000 (14:10 -0800)]
mips: drop _PAGE_FILE and pte_file()-related helpers
commit
b32da82e28ce90bff4e371fc15d2816fa3175bb0 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: Deleted definitions are slightly different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:47 +0000 (14:10 -0800)]
microblaze: drop _PAGE_FILE and pte_file()-related helpers
commit
937fa39fb22fea1c1d8ca9e5f31c452b91ac7239 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:45 +0000 (14:10 -0800)]
metag: drop _PAGE_FILE and pte_file()-related helpers
commit
22f9bf3950f20d24198791685f2dccac2c4ef38a upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:41 +0000 (14:10 -0800)]
m68k: drop _PAGE_FILE and pte_file()-related helpers
commit
1eeda0abf4425c91e7ce3ca32f1908c3a51bf84e upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:39 +0000 (14:10 -0800)]
m32r: drop _PAGE_FILE and pte_file()-related helpers
commit
406b16e26d0996516c8d1641008a7d326bf282d6 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:36 +0000 (14:10 -0800)]
ia64: drop _PAGE_FILE and pte_file()-related helpers
commit
636a002b704e0a36cefb5f4cf0293fab858fc46c upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:33 +0000 (14:10 -0800)]
hexagon: drop _PAGE_FILE and pte_file()-related helpers
commit
d99f95e6522db22192c331c75de182023a49fbcc upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:31 +0000 (14:10 -0800)]
frv: drop _PAGE_FILE and pte_file()-related helpers
commit
ca5bfa7b390017f053d7581bc701518b87bc3d43 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also increase number of bits availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:28 +0000 (14:10 -0800)]
cris: drop _PAGE_FILE and pte_file()-related helpers
commit
103f3d9a26df944f4c29de190d72dfbf913c71af upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:25 +0000 (14:10 -0800)]
c6x: drop pte_file()
commit
f5b45de9b00eb53d11ada85c61e4ea1c31ab8218 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:23 +0000 (14:10 -0800)]
blackfin: drop pte_file()
commit
2bc6ff14d46745a7728ed4ed90c5e0edca91f52e upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Steven Miao <realmz6@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:20 +0000 (14:10 -0800)]
avr32: drop _PAGE_FILE and pte_file()-related helpers
commit
7a7d2db4b8b3505a3195178619ffcc80985c4be1 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:17 +0000 (14:10 -0800)]
arm: drop L_PTE_FILE and pte_file()-related helpers
commit
b007ea798f5c568d3f464d37288220ef570f062c upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also adjust __SWP_TYPE_SHIFT, effectively increase size of
possible swap file to 128G.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:15 +0000 (14:10 -0800)]
arm64: drop PTE_FILE and pte_file()-related helpers
commit
9b3e661e58b90b0c2d5c2168c23408f1e59e9e35 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
This patch also adjust __SWP_TYPE_SHIFT and increase number of bits
availble for swap offset.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:12 +0000 (14:10 -0800)]
arc: drop _PAGE_FILE and pte_file()-related helpers
commit
18747151308f9e0fb63766057957617ec4afa190 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:09 +0000 (14:10 -0800)]
alpha: drop _PAGE_FILE and pte_file()-related helpers
commit
b816157a5366550c5ee29a6431ba1abb88721266 upstream.
We've replaced remap_file_pages(2) implementation with emulation. Nobody
creates non-linear mapping anymore.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:07 +0000 (14:10 -0800)]
asm-generic: drop unused pte_file* helpers
commit
5064c8e19dc215afae8ffae95570e7f22062d49c upstream.
All users are gone.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:04 +0000 (14:10 -0800)]
mm: remove rest usage of VM_NONLINEAR and pte_file()
commit
0661a33611fca12570cba48d9344ce68834ee86c upstream.
One bit in ->vm_flags is unused now!
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: Drop changes in mm/debug.c]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:02 +0000 (14:10 -0800)]
mm: replace vma->sharead.linear with vma->shared
commit
ac51b934f3912582d3c897c6c4d09b32ea57b2c7 upstream.
After removing vma->shared.nonlinear we have only one member of
vma->shared union, which doesn't make much sense.
This patch drops the union and move struct vma->shared.linear to
vma->shared.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:59 +0000 (14:09 -0800)]
rmap: drop support of non-linear mappings
commit
27ba0644ea9dfe6e7693abc85837b60e40583b96 upstream.
We don't create non-linear mappings anymore. Let's drop code which
handles them in rmap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Deleted code is slightly different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:57 +0000 (14:09 -0800)]
proc: drop handling non-linear mappings
commit
1da4b35b001481df99a6dcab12d5d39a876f7056 upstream.
We have to handle non-linear mappings for /proc/PID/{smaps,clear_refs}
which is unused now. Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Deleted code is slightly different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:54 +0000 (14:09 -0800)]
mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub
commit
d83a08db5ba6072caa658745881f4baa9bad6a08 upstream.
Nobody uses it anymore.
[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Deleted code is slightly different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:51 +0000 (14:09 -0800)]
mm: drop support of non-linear mapping from fault codepath
commit
9b4bdd2ffab9557ac43af7dff02e7dab1c8c58bd upstream.
We don't create non-linear mappings anymore. Let's drop code which
handles them on page fault.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Deleted code is slightly different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:49 +0000 (14:09 -0800)]
mm: drop support of non-linear mapping from unmap/zap codepath
commit
8a5f14a23177061ec11daeaa3d09d0765d785c47 upstream.
We have remap_file_pages(2) emulation in -mm tree for few release cycles
and we plan to have it mainline in v3.20. This patchset removes rest of
VM_NONLINEAR infrastructure.
Patches 1-8 take care about generic code. They are pretty
straight-forward and can be applied without other of patches.
Rest patches removes pte_file()-related stuff from architecture-specific
code. It usually frees up one bit in non-present pte. I've tried to reuse
that bit for swap offset, where I was able to figure out how to do that.
For obvious reason I cannot test all that arch-specific code and would
like to see acks from maintainers.
In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial
kernel code. That's too much for functionality nobody uses.
Tested-by: Felipe Balbi <balbi@ti.com>
This patch (of 38):
We don't create non-linear mappings anymore. Let's drop code which
handles them on unmap/zap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Wed, 17 Feb 2016 21:11:15 +0000 (13:11 -0800)]
mm: fix regression in remap_file_pages() emulation
commit
48f7df329474b49d83d0dffec1b6186647f11976 upstream.
Grazvydas Ignotas has reported a regression in remap_file_pages()
emulation.
Testcase:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define SIZE (4096 * 3)
int main(int argc, char **argv)
{
unsigned long *p;
long i;
p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
if (remap_file_pages(p, 4096, 0, 1, 0)) {
perror("remap_file_pages");
return -1;
}
if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
perror("remap_file_pages");
return -1;
}
assert(p[0] == 1);
munmap(p, SIZE);
return 0;
}
The second remap_file_pages() fails with -EINVAL.
The reason is that remap_file_pages() emulation assumes that the target
vma covers whole area we want to over map. That assumption is broken by
first remap_file_pages() call: it split the area into two vma.
The solution is to check next adjacent vmas, if they map the same file
with the same flags.
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:46 +0000 (14:09 -0800)]
mm: replace remap_file_pages() syscall with emulation
commit
c8d78c1823f46519473949d33f0d1d33fe21ea16 upstream.
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.
Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.
Let's drop it and get rid of all these special-cased code.
The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.
I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on. I've checked Debian code search and source of all
packages in ALT Linux. No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.
There are few basic tests in LTP for the syscall. They work just fine
with emulation.
To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.
The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to
1100000 to avoid -ENOMEM.
Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x
I believe we can live with that.
Test case:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define MB (1024UL * 1024)
#define SIZE (4096 * MB)
int main(int argc, char **argv)
{
unsigned long *p;
long i, pass;
for (pass = 0; pass < 10; pass++) {
p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
for (i = 0; i < SIZE / 4096; i++) {
if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
perror("remap_file_pages");
return -1;
}
}
for (i = SIZE / 4096 - 1; i >= 0; i--)
assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
munmap(p, SIZE);
}
return 0;
}
[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Deleted code is slightly different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ben Hutchings [Thu, 27 Sep 2018 20:31:31 +0000 (21:31 +0100)]
x86/cpufeatures: Show KAISER in cpuinfo
I noticed that in the upstream kernel PTI is exposed in /proc/cpuinfo
and in the 4.4 and 4.9 stable branches KAISER is exposed similarly,
but for some reason the backport to 3.16 hid it. Change to match the
other branches.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Juergen Gross [Thu, 21 Jun 2018 08:43:31 +0000 (10:43 +0200)]
x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths
commit
74899d92e66663dc7671a8017b3146dcd4735f3b upstream.
Commit:
1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD")
... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence.
speculative_store_bypass_ht_init() needs to be called on each CPU for
PV guests, too.
Reported-by: Brian Woods <brian.woods@amd.com>
Tested-by: Brian Woods <brian.woods@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: xen-devel@lists.xenproject.org
Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD")
Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sun, 12 Aug 2018 18:41:45 +0000 (20:41 +0200)]
KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
commit
024d83cadc6b2af027e473720f3c3da97496c318 upstream.
Mikhail reported the following lockdep splat:
WARNING: possible irq lock inversion dependency detected
CPU 0/KVM/10284 just changed the state of lock:
000000000d538a88 (&st->lock){+...}, at:
speculative_store_bypass_update+0x10b/0x170
but this lock was taken by another, HARDIRQ-safe lock
in the past:
(&(&sighand->siglock)->rlock){-.-.}
and interrupts could create inverse lock ordering between them.
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&st->lock);
local_irq_disable();
lock(&(&sighand->siglock)->rlock);
lock(&st->lock);
<Interrupt>
lock(&(&sighand->siglock)->rlock);
*** DEADLOCK ***
The code path which connects those locks is:
speculative_store_bypass_update()
ssb_prctl_set()
do_seccomp()
do_syscall_64()
In svm_vcpu_run() speculative_store_bypass_update() is called with
interupts enabled via x86_virt_spec_ctrl_set_guest/host().
This is actually a false positive, because GIF=0 so interrupts are
disabled even if IF=1; however, we can easily move the invocations of
x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to
cure it, and it's a good idea to keep the GIF=0/IF=1 area as small
and self-contained as possible.
Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Mon, 21 May 2018 21:54:49 +0000 (17:54 -0400)]
KVM/VMX: Expose SSBD properly to guests
commit
0aa48468d00959c8a37cd3ac727284f4f7359151 upstream.
The X86_FEATURE_SSBD is an synthetic CPU feature - that is
it bit location has no relevance to the real CPUID 0x7.EBX[31]
bit position. For that we need the new CPU feature name.
Fixes: 52817587e706 ("x86/cpufeatures: Disentangle SSBD enumeration")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20180521215449.26423-2-konrad.wilk@oracle.com
[bwh: Backported to 3.16:
- Fix guest_cpuid_has_spec_ctrl() as well
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 17 May 2018 03:18:09 +0000 (23:18 -0400)]
x86/bugs: Rename SSBD_NO to SSB_NO
commit
240da953fcc6a9008c92fae5b1f727ee5ed167ab upstream.
The "336996 Speculative Execution Side Channel Mitigations" from
May defines this as SSB_NO, hence lets sync-up.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tom Lendacky [Thu, 10 May 2018 20:06:39 +0000 (22:06 +0200)]
KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
commit
bc226f07dcd3c9ef0b7f6236fe356ea4a9cb4769 upstream.
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM. This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.
[ tglx: Folded the migration fixup from Paolo Bonzini ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- There is no SMM support or cpu_has_high_real_mode_segbase operation
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Thu, 10 May 2018 18:42:48 +0000 (20:42 +0200)]
x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
commit
47c61b3955cf712cadfc25635bf9bc174af030ea upstream.
Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to
x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or
X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl
argument to check whether the state must be modified on the host. The
update reuses speculative_store_bypass_update() so the ZEN-specific sibling
coordination can be reused.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sat, 12 May 2018 18:10:00 +0000 (20:10 +0200)]
x86/bugs: Rework spec_ctrl base and mask logic
commit
be6fcb5478e95bb1c91f489121238deb3abca46a upstream.
x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value
which are not to be modified. However the implementation is not really used
and the bitmask was inverted to make a check easier, which was removed in
"x86/bugs: Remove x86_spec_ctrl_set()"
Aside of that it is missing the STIBP bit if it is supported by the
platform, so if the mask would be used in x86_virt_spec_ctrl() then it
would prevent a guest from setting STIBP.
Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to
sanitize the value which is supplied by the guest.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sat, 12 May 2018 18:53:14 +0000 (20:53 +0200)]
x86/bugs: Remove x86_spec_ctrl_set()
commit
4b59bdb569453a60b752b274ca61f009e37f4dae upstream.
x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there
provide no real value as both call sites can just write x86_spec_ctrl_base
to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra
masking or checking.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sat, 12 May 2018 18:49:16 +0000 (20:49 +0200)]
x86/bugs: Expose x86_spec_ctrl_base directly
commit
fa8ac4988249c38476f6ad678a4848a736373403 upstream.
x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR.
x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to
prevent modification to that variable. Though the variable is read only
after init and globaly visible already.
Remove the function and export the variable instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Borislav Petkov [Fri, 11 May 2018 22:14:51 +0000 (00:14 +0200)]
x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
commit
cc69b34989210f067b2c51d5539b5f96ebcc3a01 upstream.
Function bodies are very similar and are going to grow more almost
identical code. Add a bool arg to determine whether SPEC_CTRL is being set
for the guest or restored to the host.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Thu, 10 May 2018 18:31:44 +0000 (20:31 +0200)]
x86/speculation: Rework speculative_store_bypass_update()
commit
0270be3e34efb05a88bc4c422572ece038ef3608 upstream.
The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse
speculative_store_bypass_update() to avoid code duplication. Add an
argument for supplying a thread info (TIF) value and create a wrapper
speculative_store_bypass_update_current() which is used at the existing
call site.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tom Lendacky [Thu, 17 May 2018 15:09:18 +0000 (17:09 +0200)]
x86/speculation: Add virtualized speculative store bypass disable support
commit
11fb0683493b2da112cd64c9dada221b52463bf7 upstream.
Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.
Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- This CPUID word is feature word 11
- Adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Wed, 9 May 2018 21:01:01 +0000 (23:01 +0200)]
x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
commit
ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream.
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.
Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.
Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Wed, 9 May 2018 19:53:09 +0000 (21:53 +0200)]
x86/speculation: Handle HT correctly on AMD
commit
1f50ddb4f4189243c05926b842dc1a0332195f31 upstream.
The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when
hyperthreading is enabled the SSBD bit toggle needs to take both cores into
account. Otherwise the following situation can happen:
CPU0 CPU1
disable SSB
disable SSB
enable SSB <- Enables it for the Core, i.e. for CPU0 as well
So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled
again.
On Intel the SSBD control is per core as well, but the synchronization
logic is implemented behind the per thread SPEC_CTRL MSR. It works like
this:
CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL
i.e. if one of the threads enables a mitigation then this affects both and
the mitigation is only disabled in the core when both threads disabled it.
Add the necessary synchronization logic for AMD family 17H. Unfortunately
that requires a spinlock to serialize the access to the MSR, but the locks
are only shared between siblings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- s/topology_sibling_cpumask/topology_thread_cpumask/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Thu, 10 May 2018 14:26:00 +0000 (16:26 +0200)]
x86/cpufeatures: Add FEATURE_ZEN
commit
d1035d971829dcf80e8686ccde26f94b0a069472 upstream.
Add a ZEN feature bit so family-dependent static_cpu_has() optimizations
can be built for ZEN.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Use the next available bit number in CPU feature word 7
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Thu, 10 May 2018 18:21:36 +0000 (20:21 +0200)]
x86/cpufeatures: Disentangle SSBD enumeration
commit
52817587e706686fcdb27f14c1b000c92f266c96 upstream.
The SSBD enumeration is similarly to the other bits magically shared
between Intel and AMD though the mechanisms are different.
Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific
features or family dependent setup.
Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is
controlled via MSR_SPEC_CTRL and fix up the usage sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Use the next available bit number in CPU feature word 7
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Thu, 10 May 2018 17:13:18 +0000 (19:13 +0200)]
x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
commit
7eb8956a7fec3c1f0abc2a5517dada99ccc8a961 upstream.
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on
Intel and implied by IBRS or STIBP support on AMD. That's just confusing
and in case an AMD CPU has IBRS not supported because the underlying
problem has been fixed but has another bit valid in the SPEC_CTRL MSR,
the thing falls apart.
Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the
availability on both Intel and AMD.
While at it replace the boot_cpu_has() checks with static_cpu_has() where
possible. This prevents late microcode loading from exposing SPEC_CTRL, but
late loading is already very limited as it does not reevaluate the
mitigation options and other bits and pieces. Having static_cpu_has() is
the simplest and least fragile solution.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Use the next available bit number in CPU feature word 7
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Borislav Petkov [Wed, 2 May 2018 16:15:14 +0000 (18:15 +0200)]
x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
commit
e7c587da125291db39ddf1f49b18e5970adbac17 upstream.
Intel and AMD have different CPUID bits hence for those use synthetic bits
which get set on the respective vendor's in init_speculation_control(). So
that debacles like what the commit message of
c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")
talks about don't happen anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Use the next available bit numbers in CPU feature word 7
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Fri, 11 May 2018 13:21:01 +0000 (15:21 +0200)]
KVM: SVM: Move spec control call after restore of GS
commit
15e6c22fd8e5a42c5ed6d487b7c9fe44c2517765 upstream.
svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but
before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current'
to determine the host SSBD state of the thread. 'current' is GS based, but
host GS is not yet restored and the access causes a triple fault.
Move the call after the host GS restore.
Fixes: 885f82bfbc6f x86/process: Allow runtime control of Speculative Store Bypass
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jim Mattson [Sun, 13 May 2018 21:33:57 +0000 (17:33 -0400)]
x86/cpu: Make alternative_msr_write work for 32-bit code
commit
5f2b745f5e1304f438f9b2cd03ebc8120b6e0d3b upstream.
Cast val and (val >> 32) to (u32), so that they fit in a
general-purpose register in both 32-bit and 64-bit code.
[ tglx: Made it u32 instead of uintptr_t ]
Fixes: c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Fri, 11 May 2018 20:50:35 +0000 (16:50 -0400)]
x86/bugs: Fix the parameters alignment and missing void
commit
ffed645e3be0e32f8e9ab068d257aee8d0fe8eec upstream.
Fixes: 7bb4d366c ("x86/bugs: Make cpu_show_common() static")
Fixes: 24f7fc83b ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jiri Kosina [Thu, 10 May 2018 20:47:32 +0000 (22:47 +0200)]
x86/bugs: Make cpu_show_common() static
commit
7bb4d366cba992904bffa4820d24e70a3de93e76 upstream.
cpu_show_common() is not used outside of arch/x86/kernel/cpu/bugs.c, so
make it static.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jiri Kosina [Thu, 10 May 2018 20:47:18 +0000 (22:47 +0200)]
x86/bugs: Fix __ssb_select_mitigation() return type
commit
d66d8ff3d21667b41eddbe86b35ab411e40d8c5f upstream.
__ssb_select_mitigation() returns one of the members of enum ssb_mitigation,
not ssb_mitigation_cmd; fix the prototype to reflect that.
Fixes: 24f7fc83b9204 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Borislav Petkov [Tue, 8 May 2018 13:43:45 +0000 (15:43 +0200)]
Documentation/spec_ctrl: Do some minor cleanups
commit
dd0792699c4058e63c0715d9a7c2d40226fcdddc upstream.
Fix some typos, improve formulations, end sentences with a fullstop.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Wed, 9 May 2018 19:41:38 +0000 (21:41 +0200)]
proc: Use underscores for SSBD in 'status'
commit
e96f46ee8587607a828f783daa6eb5b44d25004d upstream.
The style for the 'status' file is CamelCase or this. _.
Fixes: fae1fa0fc ("proc: Provide details on speculation flaw mitigations")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Wed, 9 May 2018 19:41:38 +0000 (21:41 +0200)]
x86/bugs: Rename _RDS to _SSBD
commit
9f65fb29374ee37856dbad847b4e121aab72b510 upstream.
Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2]
as SSBD (Speculative Store Bypass Disable).
Hence changing it.
It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name
is going to be. Following the rename it would be SSBD_NO but that rolls out
to Speculative Store Bypass Disable No.
Also fixed the missing space in X86_FEATURE_AMD_SSBD.
[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16:
- Update guest_cpuid_has_spec_ctrl() rather than vmx_{get,set}_msr()
- Update _TIF_WORK_MASK and _TIF_ALLWORK_MASK
- Adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kees Cook [Thu, 3 May 2018 21:37:54 +0000 (14:37 -0700)]
x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass
commit
f21b53b20c754021935ea43364dbf53778eeba32 upstream.
Unless explicitly opted out of, anything running under seccomp will have
SSB mitigations enabled. Choosing the "prctl" mode will disable this.
[ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Fri, 4 May 2018 13:12:06 +0000 (15:12 +0200)]
seccomp: Move speculation migitation control to arch code
commit
8bf37d8c067bb7eb8e7c381bdadf9bd89182b6bc upstream.
The migitation control is simpler to implement in architecture code as it
avoids the extra function call to check the mode. Aside of that having an
explicit seccomp enabled mode in the architecture mitigations would require
even more workarounds.
Move it into architecture code and provide a weak function in the seccomp
code. Remove the 'which' argument as this allows the architecture to decide
which mitigations are relevant for seccomp.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kees Cook [Thu, 3 May 2018 21:56:12 +0000 (14:56 -0700)]
seccomp: Add filter flag to opt-out of SSB mitigation
commit
00a02d0c502a06d15e07b857f8ff921e3e402675 upstream.
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16:
- We don't support SECCOMP_FILTER_FLAG_TSYNC or SECCOMP_FILTER_FLAG_LOG
- Drop selftest changes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Fri, 4 May 2018 07:40:03 +0000 (09:40 +0200)]
seccomp: Use PR_SPEC_FORCE_DISABLE
commit
b849a812f7eb92e96d1c8239b06581b2cfd8b275 upstream.
Use PR_SPEC_FORCE_DISABLE in seccomp() because seccomp does not allow to
widen restrictions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Thu, 3 May 2018 20:09:15 +0000 (22:09 +0200)]
prctl: Add force disable speculation
commit
356e4bfff2c5489e016fdb925adbf12a1e3950ee upstream.
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kees Cook [Tue, 1 May 2018 22:07:31 +0000 (15:07 -0700)]
seccomp: Enable speculation flaw mitigations
commit
5c3070890d06ff82eecb808d02d2ca39169533ef upstream.
When speculation flaw mitigations are opt-in (via prctl), using seccomp
will automatically opt-in to these protections, since using seccomp
indicates at least some level of sandboxing is desired.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16:
- Apply to current task
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kees Cook [Tue, 1 May 2018 22:31:45 +0000 (15:31 -0700)]
proc: Provide details on speculation flaw mitigations
commit
fae1fa0fc6cca8beee3ab8ed71d54f9a78fa3f64 upstream.
As done with seccomp and no_new_privs, also show speculation flaw
mitigation state in /proc/$pid/status.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kees Cook [Tue, 1 May 2018 22:19:04 +0000 (15:19 -0700)]
nospec: Allow getting/setting on non-current task
commit
7bbf1373e228840bb0295a2ca26d548ef37f448e upstream.
Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.
This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sun, 29 Apr 2018 13:26:40 +0000 (15:26 +0200)]
x86/speculation: Add prctl for Speculative Store Bypass mitigation
commit
a73ec77ee17ec556fe7f165d00314cb7c047b1ac upstream.
Add prctl based control for Speculative Store Bypass mitigation and make it
the default mitigation for Intel and AMD.
Andi Kleen provided the following rationale (slightly redacted):
There are multiple levels of impact of Speculative Store Bypass:
1) JITed sandbox.
It cannot invoke system calls, but can do PRIME+PROBE and may have call
interfaces to other code
2) Native code process.
No protection inside the process at this level.
3) Kernel.
4) Between processes.
The prctl tries to protect against case (1) doing attacks.
If the untrusted code can do random system calls then control is already
lost in a much worse way. So there needs to be system call protection in
some way (using a JIT not allowing them or seccomp). Or rather if the
process can subvert its environment somehow to do the prctl it can already
execute arbitrary code, which is much worse than SSB.
To put it differently, the point of the prctl is to not allow JITed code
to read data it shouldn't read from its JITed sandbox. If it already has
escaped its sandbox then it can already read everything it wants in its
address space, and do much worse.
The ability to control Speculative Store Bypass allows to enable the
protection selectively without affecting overall system performance.
Based on an initial patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sun, 29 Apr 2018 13:21:42 +0000 (15:21 +0200)]
x86/process: Allow runtime control of Speculative Store Bypass
commit
885f82bfbc6fefb6664ea27965c3ab9ac4194b8c upstream.
The Speculative Store Bypass vulnerability can be mitigated with the
Reduced Data Speculation (RDS) feature. To allow finer grained control of
this eventually expensive mitigation a per task mitigation control is
required.
Add a new TIF_RDS flag and put it into the group of TIF flags which are
evaluated for mismatch in switch_to(). If these bits differ in the previous
and the next task, then the slow path function __switch_to_xtra() is
invoked. Implement the TIF_RDS dependent mitigation control in the slow
path.
If the prctl for controlling Speculative Store Bypass is disabled or no
task uses the prctl then there is no overhead in the switch_to() fast
path.
Update the KVM related speculation control functions to take TID_RDS into
account as well.
Based on a patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.16:
- Exclude _TIF_RDS from _TIF_WORK_MASK and _TIF_ALLWORK_MASK
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sun, 29 Apr 2018 13:20:11 +0000 (15:20 +0200)]
prctl: Add speculation control prctls
commit
b617cfc858161140d69cc0b5cc211996b557a1c7 upstream.
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:
Bit Define Description
0 PR_SPEC_PRCTL Mitigation can be controlled per task by
PR_SET_SPECULATION_CTRL
1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
disabled
2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
enabled
If all bits are 0 the CPU is not affected by the speculation misfeature.
If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
The common return values are:
EINVAL prctl is not implemented by the architecture or the unused prctl()
arguments are not 0
ENODEV arg2 is selecting a not supported speculation misfeature
PR_SET_SPECULATION_CTRL has these additional return values:
ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO prctl control of the selected speculation misfeature is disabled
The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.
Based on an initial patch from Tim Chen and mostly rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.16:
- Add the documentation directly under Documentation/ since there is
no userspace-api subdirectory or reST index
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Gleixner [Sun, 29 Apr 2018 13:01:37 +0000 (15:01 +0200)]
x86/speculation: Create spec-ctrl.h to avoid include hell
commit
28a2775217b17208811fa43a9e96bd1fdf417b86 upstream.
Having everything in nospec-branch.h creates a hell of dependencies when
adding the prctl based switching mechanism. Move everything which is not
required in nospec-branch.h to spec-ctrl.h and fix up the includes in the
relevant files.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:25 +0000 (22:04 -0400)]
x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest
commit
da39556f66f5cfe8f9c989206974f1cb16ca5d7c upstream.
Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various
combinations of SPEC_CTRL MSR values.
The handling of the MSR (to take into account the host value of SPEC_CTRL
Bit(2)) is taken care of in patch:
KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[dwmw2: Handle 4.9 guest CPUID differences, rename
guest_cpu_has_ibrs() → guest_cpu_has_spec_ctrl()]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:24 +0000 (22:04 -0400)]
x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested
commit
764f3c21588a059cd783c6ba0734d4db2d72822d upstream.
AMD does not need the Speculative Store Bypass mitigation to be enabled.
The parameters for this are already available and can be done via MSR
C001_1020. Each family uses a different bit in that MSR for this.
[ tglx: Expose the bit mask via a variable and move the actual MSR fiddling
into the bugs code as that's the right thing to do and also required
to prepare for dynamic enable/disable ]
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- Renumber the feature bit
- We don't have __ro_after_init
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:23 +0000 (22:04 -0400)]
x86/bugs: Whitelist allowed SPEC_CTRL MSR values
commit
1115a859f33276fe8afb31c60cf9d8e657872558 upstream.
Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the
future (or in fact use different MSRs for the same functionality).
As such a run-time mechanism is required to whitelist the appropriate MSR
values.
[ tglx: Made the variable __ro_after_init ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- We don't have __ro_after_init
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:22 +0000 (22:04 -0400)]
x86/bugs/intel: Set proper CPU features and setup RDS
commit
772439717dbf703b39990be58d8d4e3e4ad0598a upstream.
Intel CPUs expose methods to:
- Detect whether RDS capability is available via CPUID.7.0.EDX[31],
- The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS.
- MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS.
With that in mind if spec_store_bypass_disable=[auto,on] is selected set at
boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it.
Note that this does not fix the KVM case where the SPEC_CTRL is exposed to
guests which can muck with it, see patch titled :
KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS.
And for the firmware (IBRS to be set), see patch titled:
x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits
[ tglx: Distangled it from the intel implementation and kept the call order ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:21 +0000 (22:04 -0400)]
x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation
commit
24f7fc83b9204d20f878c57cb77d261ae825e033 upstream.
Contemporary high performance processors use a common industry-wide
optimization known as "Speculative Store Bypass" in which loads from
addresses to which a recent store has occurred may (speculatively) see an
older value. Intel refers to this feature as "Memory Disambiguation" which
is part of their "Smart Memory Access" capability.
Memory Disambiguation can expose a cache side-channel attack against such
speculatively read values. An attacker can create exploit code that allows
them to read memory outside of a sandbox environment (for example,
malicious JavaScript in a web page), or to perform more complex attacks
against code running within the same privilege level, e.g. via the stack.
As a first step to mitigate against such attacks, provide two boot command
line control knobs:
nospec_store_bypass_disable
spec_store_bypass_disable=[off,auto,on]
By default affected x86 processors will power on with Speculative
Store Bypass enabled. Hence the provided kernel parameters are written
from the point of view of whether to enable a mitigation or not.
The parameters are as follows:
- auto - Kernel detects whether your CPU model contains an implementation
of Speculative Store Bypass and picks the most appropriate
mitigation.
- on - disable Speculative Store Bypass
- off - enable Speculative Store Bypass
[ tglx: Reordered the checks so that the whole evaluation is not done
when the CPU does not support RDS ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- Renumber the feature bit
- Adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Sat, 28 Apr 2018 20:34:17 +0000 (22:34 +0200)]
x86/cpufeatures: Add X86_FEATURE_RDS
commit
0cc5fa00b0a88dad140b4e5c2cead9951ad36822 upstream.
Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU
supports Reduced Data Speculation.
[ tglx: Split it out from a later patch ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- This CPUID word is feature word 10
- Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:20 +0000 (22:04 -0400)]
x86/bugs: Expose /sys/../spec_store_bypass
commit
c456442cd3a59eeb1d60293c26cbe2ff2c4e42cf upstream.
Add the sysfs file for the new vulerability. It does not do much except
show the words 'Vulnerable' for recent x86 cores.
Intel cores prior to family 6 are known not to be vulnerable, and so are
some Atoms and some Xeon Phi.
It assumes that older Cyrix, Centaur, etc. cores are immune.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- Renumber X86_BUG_SPEC_STORE_BYPASS
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:19 +0000 (22:04 -0400)]
x86/bugs, KVM: Support the combination of guest and host IBRS
commit
5cf687548705412da47c9cec342fd952d71ed3d5 upstream.
A guest may modify the SPEC_CTRL MSR from the value used by the
kernel. Since the kernel doesn't use IBRS, this means a value of zero is
what is needed in the host.
But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to
the other bits as reserved so the kernel should respect the boot time
SPEC_CTRL value and use that.
This allows to deal with future extensions to the SPEC_CTRL interface if
any at all.
Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any
difference as paravirt will over-write the callq *0xfff.. with the wrmsrl
assembler code.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:18 +0000 (22:04 -0400)]
x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
commit
1b86883ccb8d5d9506529d42dbe1a5257cb30b18 upstream.
The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all
the other bits as reserved. The Intel SDM glossary defines reserved as
implementation specific - aka unknown.
As such at bootup this must be taken it into account and proper masking for
the bits in use applied.
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Made x86_spec_ctrl_base __ro_after_init ]
Suggested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- We don't have __ro_after_init
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:17 +0000 (22:04 -0400)]
x86/bugs: Concentrate bug reporting into a separate function
commit
d1059518b4789cabe34bb4b714d07e6089c82ca1 upstream.
Those SysFS functions have a similar preamble, as such make common
code to handle them.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: s/X86_FEATURE_PTI/X86_FEATURE_KAISER/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Konrad Rzeszutek Wilk [Thu, 26 Apr 2018 02:04:16 +0000 (22:04 -0400)]
x86/bugs: Concentrate bug detection into a separate function
commit
4a28bfe3267b68e22c663ac26185aa16c9b879ef upstream.
Combine the various logic which goes through all those
x86_cpu_id matching structures in one function.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Linus Torvalds [Tue, 1 May 2018 13:55:51 +0000 (15:55 +0200)]
x86/nospec: Simplify alternative_msr_write()
commit
1aa7a5735a41418d8e01fa7c9565eb2657e2ea3f upstream.
The macro is not type safe and I did look for why that "g" constraint for
the asm doesn't work: it's because the asm is more fundamentally wrong.
It does
movl %[val], %%eax
but "val" isn't a 32-bit value, so then gcc will pass it in a register,
and generate code like
movl %rsi, %eax
and gas will complain about a nonsensical 'mov' instruction (it's moving a
64-bit register to a 32-bit one).
Passing it through memory will just hide the real bug - gcc still thinks
the memory location is 64-bit, but the "movl" will only load the first 32
bits and it all happens to work because x86 is little-endian.
Convert it to a type safe inline function with a little trick which hands
the feature into the ALTERNATIVE macro.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ben Hutchings [Tue, 25 Sep 2018 22:47:37 +0000 (23:47 +0100)]
Linux 3.16.58
Linus Torvalds [Thu, 13 Sep 2018 09:57:48 +0000 (23:57 -1000)]
mm: get rid of vmacache_flush_all() entirely
commit
7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream.
Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too. It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit. That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.
So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too. Win-win.
[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
also just goes away entirely with this ]
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: drop changes to mm debug code]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Paolo Bonzini [Tue, 5 May 2015 10:08:55 +0000 (12:08 +0200)]
KVM: x86: introduce num_emulated_msrs
commit
62ef68bb4d00f1a662e487f3fc44ce8521c416aa upstream.
We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if
the host CPU does not support SMM virtualization. Introduce the
logic to do that, and also move paravirt MSRs to emulated_msrs for
simplicity and to get rid of KVM_SAVE_MSRS_BEGIN.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Piotr Luc [Wed, 12 Oct 2016 18:05:20 +0000 (20:05 +0200)]
x86/cpu/intel: Add Knights Mill to Intel family
commit
0047f59834e5947d45f34f5f12eb330d158f700b upstream.
Add CPUID of Knights Mill (KNM) processor to Intel family list.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161012180520.30976-1-piotr.luc@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Borislav Petkov [Thu, 7 Sep 2017 17:08:21 +0000 (19:08 +0200)]
x86/cpu/AMD: Fix erratum 1076 (CPB bit)
commit
f7f3dc00f61261cdc9ccd8b886f21bc4dffd6fd9 upstream.
CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Change the added case into an if-statement
- s/x86_stepping/x86_mask/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kyle Huey [Tue, 14 Feb 2017 08:11:03 +0000 (00:11 -0800)]
x86/process: Correct and optimize TIF_BLOCKSTEP switch
commit
b9894a2f5bd18b1691cb6872c9afe32b148d0132 upstream.
The debug control MSR is "highly magical" as the blockstep bit can be
cleared by hardware under not well documented circumstances.
So a task switch relying on the bit set by the previous task (according to
the previous tasks thread flags) can trip over this and not update the flag
for the next task.
To fix this its required to handle DEBUGCTLMSR_BTF when either the previous
or the next or both tasks have the TIF_BLOCKSTEP flag set.
While at it avoid branching within the TIF_BLOCKSTEP case and evaluating
boot_cpu_data twice in kernels without CONFIG_X86_DEBUGCTLMSR.
x86_64: arch/x86/kernel/process.o
text data bss dec hex
3024 8577 16 11617 2d61 Before
3008 8577 16 11601 2d51 After
i386: No change
[ tglx: Made the shift value explicit, use a local variable to make the
code readable and massaged changelog]
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20170214081104.9244-3-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kyle Huey [Tue, 14 Feb 2017 08:11:02 +0000 (00:11 -0800)]
x86/process: Optimize TIF checks in __switch_to_xtra()
commit
af8b3cd3934ec60f4c2a420d19a9d416554f140b upstream.
Help the compiler to avoid reevaluating the thread flags for each checked
bit by reordering the bit checks and providing an explicit xor for
evaluation.
With default defconfigs for each arch,
x86_64: arch/x86/kernel/process.o
text data bss dec hex
3056 8577 16 11649 2d81 Before
3024 8577 16 11617 2d61 After
i386: arch/x86/kernel/process.o
text data bss dec hex
2957 8673 8 11638 2d76 Before
2925 8673 8 11606 2d56 After
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20170214081104.9244-2-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16:
- We don't do refresh_tr_limit() here
- Use ACCESS_ONCE() instead of READ_ONCE()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>