Jeff Mahoney [Fri, 16 Mar 2018 18:36:27 +0000 (14:36 -0400)]
btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
[ Upstream commit
8a5a916d9a35e13576d79cc16e24611821b13e34 ]
While running btrfs/011, I hit the following lockdep splat.
This is the important bit:
pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
The percpu_counter_init call in btrfs_alloc_subvolume_writers
uses GFP_KERNEL, which we can't do during transaction commit.
This switches it to GFP_NOFS.
========================================================
WARNING: possible irq lock inversion dependency detected
4.12.14-kvmsmall #8 Tainted: G W
--------------------------------------------------------
kswapd0/50 just changed the state of lock:
(&delayed_node->mutex){+.+.-.}, at: [<
ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
(pcpu_alloc_mutex){+.+.+.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcpu_alloc_mutex);
local_irq_disable();
lock(&delayed_node->mutex);
lock(&found->groups_sem);
<Interrupt>
lock(&delayed_node->mutex);
*** DEADLOCK ***
2 locks held by kswapd0/50:
#0: (shrinker_rwsem){++++..}, at: [<
ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
#1: (&type->s_umount_key#30){+++++.}, at: [<
ffffffff8126dec6>] trylock_super+0x16/0x50
the shortest dependencies between 2nd lock and 1st lock:
-> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
HARDIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
kmem_cache_init_late+0x42/0x75
start_kernel+0x343/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
SOFTIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
kmem_cache_init_late+0x42/0x75
start_kernel+0x343/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
RECLAIM_FS-ON-W at:
__kmalloc+0x47/0x310
pcpu_extend_area_map+0x2b/0xc0
pcpu_alloc+0x3ec/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
__do_tune_cpucache+0x2c/0x220
do_tune_cpucache+0x26/0xc0
enable_cpucache+0x6d/0xf0
__kmem_cache_create+0x1bf/0x390
create_cache+0xba/0x1b0
kmem_cache_create+0x1f8/0x2b0
ksm_init+0x6f/0x19d
do_one_initcall+0x50/0x1b0
kernel_init_freeable+0x201/0x289
kernel_init+0xa/0x100
ret_from_fork+0x3a/0x50
INITIAL USE at:
__mutex_lock+0x4e/0x8c0
pcpu_alloc+0x1ac/0x5e0
alloc_kmem_cache_cpus.isra.70+0x25/0xa0
setup_cpu_cache+0x2f/0x1f0
__kmem_cache_create+0x1bf/0x390
create_boot_cache+0x8b/0xb1
kmem_cache_init+0xa1/0x19e
start_kernel+0x270/0x4cb
x86_64_start_kernel+0x127/0x134
secondary_startup_64+0xa5/0xb0
}
... key at: [<
ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
... acquired at:
pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
transaction_kthread+0x176/0x1b0 [btrfs]
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
-> (&fs_info->commit_root_sem){++++..} ops:
1566382 {
HARDIRQ-ON-W at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
HARDIRQ-ON-R at:
down_read+0x35/0x90
caching_thread+0x57/0x560 [btrfs]
normal_work_helper+0x1c0/0x5e0 [btrfs]
process_one_work+0x1e0/0x5c0
worker_thread+0x44/0x390
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
SOFTIRQ-ON-W at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-R at:
down_read+0x35/0x90
caching_thread+0x57/0x560 [btrfs]
normal_work_helper+0x1c0/0x5e0 [btrfs]
process_one_work+0x1e0/0x5c0
worker_thread+0x44/0x390
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
INITIAL USE at:
down_write+0x3e/0xa0
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
cow_file_range.isra.66+0x133/0x470 [btrfs]
run_delalloc_range+0x121/0x410 [btrfs]
writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
__extent_writepage+0x19a/0x360 [btrfs]
extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
extent_writepages+0x4d/0x60 [btrfs]
do_writepages+0x1a/0x70
__filemap_fdatawrite_range+0xa7/0xe0
btrfs_rename+0x5ee/0xdb0 [btrfs]
vfs_rename+0x52a/0x7e0
SyS_rename+0x351/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<
ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
... acquired at:
cache_block_group+0x287/0x420 [btrfs]
find_free_extent+0x106c/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
btrfs_create_tree+0xbb/0x2a0 [btrfs]
btrfs_create_uuid_tree+0x37/0x140 [btrfs]
open_ctree+0x23c0/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
-> (&found->groups_sem){++++..} ops:
2134587 {
HARDIRQ-ON-W at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
HARDIRQ-ON-R at:
down_read+0x35/0x90
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
open_ctree+0x207b/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-W at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-R at:
down_read+0x35/0x90
btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
open_ctree+0x207b/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
INITIAL USE at:
down_write+0x3e/0xa0
__link_block_group+0x34/0x130 [btrfs]
btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
open_ctree+0x2054/0x2660 [btrfs]
btrfs_mount+0xd36/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
btrfs_mount+0x18c/0xf90 [btrfs]
mount_fs+0x3a/0x160
vfs_kern_mount+0x66/0x150
do_mount+0x1c1/0xcc0
SyS_mount+0x7e/0xd0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<
ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
... acquired at:
find_free_extent+0xcb4/0x12d0 [btrfs]
btrfs_reserve_extent+0xd8/0x170 [btrfs]
btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
__btrfs_cow_block+0x110/0x5b0 [btrfs]
btrfs_cow_block+0xd7/0x290 [btrfs]
btrfs_search_slot+0x1f6/0x960 [btrfs]
btrfs_lookup_inode+0x2a/0x90 [btrfs]
__btrfs_update_delayed_inode+0x65/0x210 [btrfs]
btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
evict+0xc4/0x190
__dentry_kill+0xbf/0x170
dput+0x2ae/0x2f0
SyS_rename+0x2a6/0x3b0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
-> (&delayed_node->mutex){+.+.-.} ops:
5580204 {
HARDIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
SOFTIRQ-ON-W at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
IN-RECLAIM_FS-W at:
__mutex_lock+0x4e/0x8c0
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
INITIAL USE at:
__mutex_lock+0x4e/0x8c0
btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
btrfs_update_inode+0x83/0x110 [btrfs]
btrfs_dirty_inode+0x62/0xe0 [btrfs]
touch_atime+0x8c/0xb0
do_generic_file_read+0x818/0xb10
__vfs_read+0xdc/0x150
vfs_read+0x8a/0x130
SyS_read+0x45/0xa0
do_syscall_64+0x79/0x1e0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<
ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
... acquired at:
__lock_acquire+0x264/0x11c0
lock_acquire+0xbd/0x1e0
__mutex_lock+0x4e/0x8c0
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
ret_from_fork+0x3a/0x50
stack backtrace:
CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x78/0xb7
print_irq_inversion_bug.part.38+0x19f/0x1aa
check_usage_forwards+0x102/0x120
? ret_from_fork+0x3a/0x50
? check_usage_backwards+0x110/0x110
mark_lock+0x16c/0x270
__lock_acquire+0x264/0x11c0
? pagevec_lookup_entries+0x1a/0x30
? truncate_inode_pages_range+0x2b3/0x7f0
lock_acquire+0xbd/0x1e0
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
__mutex_lock+0x4e/0x8c0
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
__btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
btrfs_evict_inode+0x22c/0x6a0 [btrfs]
evict+0xc4/0x190
dispose_list+0x35/0x50
prune_icache_sb+0x42/0x50
super_cache_scan+0x139/0x190
shrink_slab+0x262/0x5b0
shrink_node+0x2eb/0x2f0
kswapd+0x2eb/0x890
kthread+0x102/0x140
? mem_cgroup_shrink_node+0x2c0/0x2c0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x3a/0x50
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Mon, 26 Mar 2018 22:59:12 +0000 (23:59 +0100)]
Btrfs: fix copy_items() return value when logging an inode
[ Upstream commit
8434ec46c6e3232cebc25a910363b29f5c617820 ]
When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the caller passed was
released and is now different from what is was before, and the caller
expects a return value of 0 to mean both success and that the path
has not changed, while a return value of 1 means both success and
signals the caller that it can not reuse the path, it has to perform
another tree search.
Even though this is a case that should not be triggered on normal
circumstances or very rare at least, its consequences can be very
unpredictable (especially when replaying a log tree).
Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qu Wenruo [Tue, 27 Mar 2018 12:44:18 +0000 (20:44 +0800)]
btrfs: tests/qgroup: Fix wrong tree backref level
[ Upstream commit
3c0efdf03b2d127f0e40e30db4e7aa0429b1b79a ]
The extent tree of the test fs is like the following:
BTRFS info (device (null)): leaf
16327509003777336587 total ptrs 1 free space 3919
item 0 key (4096 168 4096) itemoff 3944 itemsize 51
extent refs 1 gen 1 flags 2
tree block key (
68719476736 0 0) level 1
^^^^^^^
ref#0: tree block backref root 5
And it's using an empty tree for fs tree, so there is no way that its
level can be 1.
For REAL (created by mkfs) fs tree backref with no skinny metadata, the
result should look like:
item 3 key (
30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
refs 1 gen 4 flags TREE_BLOCK
tree block key (256 INODE_ITEM 0) level 0
^^^^^^^
tree block backref root 5
Fix the level to 0, so it won't break later tree level checker.
Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Piggin [Mon, 26 Mar 2018 15:01:16 +0000 (01:01 +1000)]
powerpc/64s: sreset panic if there is no debugger or crash dump handlers
[ Upstream commit
d40b6768e45bd9213139b2d91d30c7692b6007b1 ]
system_reset_exception does most of its own crash handling now,
invoking the debugger or crash dumps if they are registered. If not,
then it goes through to die() to print stack traces, and then is
supposed to panic (according to comments).
However after die() prints oopses, it does its own handling which
doesn't allow system_reset_exception to panic (e.g., it may just
kill the current process). This patch causes sreset exceptions to
return from die after it prints messages but before acting.
This also stops die from invoking the debugger on 0x100 crashes.
system_reset_exception similarly calls the debugger. It had been
thought this was harmless (because if the debugger was disabled,
neither call would fire, and if it was enabled the first call
would return). However in some cases like xmon 'X' command, the
debugger returns 0, which currently causes it to be entered
again (first in system_reset_exception, then in die), which is
confusing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scott Branden [Sat, 31 Mar 2018 17:54:09 +0000 (13:54 -0400)]
bnxt_en: fix clear flags in ethtool reset handling
[ Upstream commit
2373d8d6a7932d28b8e31ea2a70bf6c002d97ac8 ]
Clear flags when reset command processed successfully for components
specified.
Fixes: 6502ad5963a5 ("bnxt_en: Add ETH_RESET_AP support")
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Sun, 1 Apr 2018 17:26:29 +0000 (10:26 -0700)]
net: bgmac: Correctly annotate register space
[ Upstream commit
16a1c0646e55c3345bce8e4edfc06ad119d27c04 ]
All the members: base, idm_base and nicpm_base should be annotated with
__iomem since they are pointers to register space. This fixes a bunch of
sparse reported warnings.
Fixes: f6a95a24957a ("net: ethernet: bgmac: Add platform device support")
Fixes: dd5c5d037f5e ("net: ethernet: bgmac: add NS2 support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Fainelli [Sun, 1 Apr 2018 17:26:30 +0000 (10:26 -0700)]
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
[ Upstream commit
60d6e6f0b9e422dd01aeda39257ee0428e5e2a3f ]
bgmac_dma_tx_ring_free() assigns the ctl1 word which is a litle endian
32-bit word without using proper accessors, fix this, and because a
length cannot be negative, use unsigned int while at it.
Fixes: 9cde94506eac ("bgmac: implement scatter/gather support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Fri, 16 Mar 2018 12:10:15 +0000 (15:10 +0300)]
platform/x86: dell-smbios: Fix memory leaks in build_tokens_sysfs()
[ Upstream commit
0e5b09b165510e2ea5c526e962c4edadd849ef4c ]
We're freeing "value_name" which is NULL, so that's a no-op, but we
intended to free "location_name" instead. And then we don't free the
names in token_location_attrs[0] and token_value_attrs[0].
Fixes: 33b9ca1e53b4 ("platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrea Parri [Fri, 9 Mar 2018 12:13:20 +0000 (13:13 +0100)]
riscv/spinlock: Strengthen implementations with fences
[ Upstream commit
0123f4d76ca63b7b895f40089be0ce4809e392d8 ]
Current implementations map locking operations using .rl and .aq
annotations. However, this mapping is unsound w.r.t. the kernel
memory consistency model (LKMM) [1]:
Referring to the "unlock-lock-read-ordering" test reported below,
Daniel wrote:
"I think an RCpc interpretation of .aq and .rl would in fact
allow the two normal loads in P1 to be reordered [...]
The intuition would be that the amoswap.w.aq can forward from
the amoswap.w.rl while that's still in the store buffer, and
then the lw x3,0(x4) can also perform while the amoswap.w.rl
is still in the store buffer, all before the l1 x1,0(x2)
executes. That's not forbidden unless the amoswaps are RCsc,
unless I'm missing something.
Likewise even if the unlock()/lock() is between two stores.
A control dependency might originate from the load part of
the amoswap.w.aq, but there still would have to be something
to ensure that this load part in fact performs after the store
part of the amoswap.w.rl performs globally, and that's not
automatic under RCpc."
Simulation of the RISC-V memory consistency model confirmed this
expectation.
In order to "synchronize" LKMM and RISC-V's implementation, this
commit strengthens the implementations of the locking operations
by replacing .rl and .aq with the use of ("lightweigth") fences,
resp., "fence rw, w" and "fence r , rw".
C unlock-lock-read-ordering
{}
/* s initially owned by P1 */
P0(int *x, int *y)
{
WRITE_ONCE(*x, 1);
smp_wmb();
WRITE_ONCE(*y, 1);
}
P1(int *x, int *y, spinlock_t *s)
{
int r0;
int r1;
r0 = READ_ONCE(*y);
spin_unlock(s);
spin_lock(s);
r1 = READ_ONCE(*x);
}
exists (1:r0=1 /\ 1:r1=0)
[1] https://marc.info/?l=linux-kernel&m=
151930201102853&w=2
https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
https://marc.info/?l=linux-kernel&m=
151633436614259&w=2
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David S. Miller [Tue, 3 Apr 2018 15:24:35 +0000 (08:24 -0700)]
sparc64: Make atomic_xchg() an inline function rather than a macro.
[ Upstream commit
d13864b68e41c11e4231de90cf358658f6ecea45 ]
This avoids a lot of -Wunused warnings such as:
====================
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
^~~~
kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
atomic_xchg(&kgdb_active, cpu);
^~~~~~~~~~~
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Thu, 29 Mar 2018 16:53:32 +0000 (18:53 +0200)]
dmaengine: rcar-dmac: Fix too early/late system suspend/resume callbacks
[ Upstream commit
73dcc666d6bd0db56cd556010f93d8f04c1cc70c ]
If serial console wake-up is enabled ("echo enabled >
/sys/.../ttySC0/power/wakeup"), and any serial input is received while
the system is suspended, serial port input no longer works after system
resume.
Note that:
1) The system can still be woken up using the serial console,
2) Serial port input keeps working if the system is woken up in some
other way (e.g. Wake-on-LAN or gpio-keys), and no serial input was
received while suspended.
To fix this, replace SET_LATE_SYSTEM_SLEEP_PM_OPS() by
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(), as the callbacks installed by the
former happen too early resp. late in the suspend resp. resume process.
Reported-by: RVC test team via Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: 1131b0a4af911de5 ("dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Howells [Wed, 4 Apr 2018 12:41:26 +0000 (13:41 +0100)]
fscache: Fix hanging wait on page discarded by writeback
[ Upstream commit
2c98425720233ae3e135add0c7e869b32913502f ]
If the fscache asynchronous write operation elects to discard a page that's
pending storage to the cache because the page would be over the store limit
then it needs to wake the page as someone may be waiting on completion of
the write.
The problem is that the store limit may be updated by a different
asynchronous operation - and so may miss the write - and that the store
limit may not even get updated until later by the netfs.
Fix the kernel hang by making fscache_write_op() mark as written any pages
that are over the limit.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Graf [Tue, 3 Apr 2018 22:19:35 +0000 (00:19 +0200)]
lan78xx: Connect phy early
[ Upstream commit
92571a1aae40d291158d16e7142637908220f470 ]
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:
Unable to handle kernel NULL pointer dereference at virtual address
0000039c
pgd =
ffff800035b30000
[
0000039c] *pgd=
0000000000000000
Internal error: Oops:
96000004 [#1] SMP
Modules linked in: [...]
Supported: Yes
CPU: 3 PID: 638 Comm: wickedd Tainted: G E 4.12.14-0-default #1
Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
task:
ffff800035e74180 task.stack:
ffff800036718000
PC is at phy_ethtool_ksettings_get+0x20/0x98
LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
pc : [<
ffff0000086f7f30>] lr : [<
ffff000000dcca84>] pstate:
20000005
sp :
ffff80003671bb20
x29:
ffff80003671bb20 x28:
ffff800035e74180
x27:
ffff000008912000 x26:
000000000000001d
x25:
0000000000000124 x24:
ffff000008f74d00
x23:
0000004000114809 x22:
0000000000000000
x21:
ffff80003671bbd0 x20:
0000000000000000
x19:
ffff80003671bbd0 x18:
000000000000040d
x17:
0000000000000001 x16:
0000000000000000
x15:
0000000000000000 x14:
ffffffffffffffff
x13:
0000000000000000 x12:
0000000000000020
x11:
0101010101010101 x10:
fefefefefefefeff
x9 :
7f7f7f7f7f7f7f7f x8 :
fefefeff31677364
x7 :
0000000080808080 x6 :
ffff80003671bc9c
x5 :
ffff80003671b9f8 x4 :
ffff80002c296190
x3 :
0000000000000000 x2 :
0000000000000000
x1 :
ffff80003671bbd0 x0 :
ffff80003671bc00
Process wickedd (pid: 638, stack limit = 0xffff800036718000)
Call trace:
Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
b9e0:
ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
ba00:
ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
ba20:
fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
ba40:
0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
ba60:
0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
ba80:
0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
baa0:
ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
bac0:
ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
bae0:
ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
bb00:
0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
[<
ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
[<
ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
[<
ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
[<
ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
[<
ffff0000087e5008>] dev_ioctl+0x400/0x630
[<
ffff00000879dd00>] sock_do_ioctl+0x70/0x88
[<
ffff00000879f5f8>] sock_ioctl+0x208/0x368
[<
ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
[<
ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
bec0:
0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
bee0:
0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
bf00:
000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
bf20:
0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
bf40:
0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
bf60:
0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
bf80:
0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
bfa0:
0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
bfc0:
0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
bfe0:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.
The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.
With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sean Christopherson [Fri, 23 Mar 2018 16:34:00 +0000 (09:34 -0700)]
KVM: VMX: raise internal error for exception during invalid protected mode state
[ Upstream commit
add5ff7a216ee545a214013f26d1ef2f44a9c9f8 ]
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
an exception in Protected Mode while emulating guest due to invalid
guest state. Unlike Big RM, KVM doesn't support emulating exceptions
in PM, i.e. PM exceptions are always injected via the VMCS. Because
we will never do VMRESUME due to emulation_required, the exception is
never realized and we'll keep emulating the faulting instruction over
and over until we receive a signal.
Exit to userspace iff there is a pending exception, i.e. don't exit
simply on a requested event. The purpose of this check and exit is to
aid in debugging a guest that is in all likelihood already doomed.
Invalid guest state in PM is extremely limited in normal operation,
e.g. it generally only occurs for a few instructions early in BIOS,
and any exception at this time is all but guaranteed to be fatal.
Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
handled/emulated, while checking for vectored interrupts, e.g. INTR
and NMI, without hitting false positives would add a fair amount of
complexity for almost no benefit (getting hit by lightning seems
more likely than encountering this specific scenario).
Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
exception via the VMCS and emulation_required is true.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sai Praneeth [Wed, 4 Apr 2018 19:34:19 +0000 (12:34 -0700)]
x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
[ Upstream commit
162ee5a8ab49be40d253f90e94aef712470a3a24 ]
Linus reported the following boot warning:
WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
[...]
Call Trace:
switch_mm_irqs_off+0x267/0x590
switch_mm+0xe/0x20
efi_switch_mm+0x3e/0x50
efi_enter_virtual_mode+0x43f/0x4da
start_kernel+0x3bf/0x458
secondary_startup_64+0xa5/0xb0
... after merging:
03781e40890c: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled,
build_cr3_noflush() (called via switch_mm()) does a sanity check to see
if X86_FEATURE_PCID is set.
Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to
perform the check but this_cpu_has() works only after SMP is initialized
(i.e. per cpu cpu_info's should be populated) and this happens to be very
late in the boot process (during rest_init()).
As efi_runtime_services() are called during (early) kernel boot time
and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
time. As suggested by Dave Hansen, this should be OK because all CPU's have
same capabilities on x86.
With this change the warning is fixed.
( Dave also suggested that we put a warning in this_cpu_has() if it's used
early in the boot process. This is still work in progress as it affects
MCE. )
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Lee Chun-Yi <jlee@suse.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davidlohr Bueso [Mon, 2 Apr 2018 16:49:54 +0000 (09:49 -0700)]
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
[ Upstream commit
d29a20645d5e929aa7e8616f28e5d8e1c49263ec ]
While running rt-tests' pi_stress program I got the following splat:
rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
[...]
<IRQ>
enqueue_top_rt_rq+0xf4/0x150
? cpufreq_dbs_governor_start+0x170/0x170
sched_rt_rq_enqueue+0x65/0x80
sched_rt_period_timer+0x156/0x360
? sched_rt_rq_enqueue+0x80/0x80
__hrtimer_run_queues+0xfa/0x260
hrtimer_interrupt+0xcb/0x220
smp_apic_timer_interrupt+0x62/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
[...]
do_idle+0x183/0x1e0
cpu_startup_entry+0x5f/0x70
start_secondary+0x192/0x1d0
secondary_startup_64+0xa5/0xb0
We can get rid of it be the "traditional" means of adding an
update_rq_clock() call after acquiring the rq->lock in
do_sched_rt_period_timer().
The case for the RT task throttling (which this workload also hits)
can be ignored in that the skip_update call is actually bogus and
quite the contrary (the request bits are removed/reverted).
By setting RQCF_UPDATED we really don't care if the skip is happening
or not and will therefore make the assert_clock_updated() check happy.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Piggin [Thu, 5 Apr 2018 06:10:00 +0000 (16:10 +1000)]
powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
[ Upstream commit
c1b25a17d24925b0961c319cfc3fd7e1dc778914 ]
POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
because it does not go through the subcore restore.
Have POWER9 restore it in core restore.
Fixes: ee97b6b99f42 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jun Piao [Thu, 5 Apr 2018 23:18:48 +0000 (16:18 -0700)]
ocfs2/dlm: don't handle migrate lockres if already in shutdown
[ Upstream commit
bb34f24c7d2c98d0c81838a7700e6068325b17a0 ]
We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain. At last other nodes will get stuck into infinite loop when
requsting lock from us.
The problem is caused by concurrency umount between nodes. Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target. So N2 will continue sending lockres to N1 even though
N1 has left domain.
N1 N2 (owner)
touch file
access the file,
and get pr lock
begin leave domain and
pick up N1 as new owner
begin leave domain and
migrate all lockres done
begin migrate lockres to N1
end leave domain, but
the lockres left
unexpectedly, because
migrate task has passed
[piaojun@huawei.com: v3]
Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mikhail Malygin [Mon, 2 Apr 2018 09:26:59 +0000 (12:26 +0300)]
IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
[ Upstream commit
efc365e7290d040fbd43f60b0e97653489a739d4 ]
On ppc64le arch rxe_add command causes oops in kernel log:
[ 92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
[ 92.499710] SMP NR_CPUS=2048 NUMA pSeries
[ 92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
a2xxx(E)
[ 92.499832] scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[ 92.499834] Supported: No, Unsupported modules are loaded
[ 92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G OE NX 4.4.120-ttln.17-default #1
[ 92.499841] task:
c0000000afe8a490 ti:
c0000000beba8000 task.ti:
c0000000beba8000
[ 92.499842] NIP:
c00000000008ba3c LR:
c000000000027644 CTR:
c00000000008ba10
[ 92.499844] REGS:
c0000000bebab750 TRAP: 0300 Tainted: G OE NX (4.4.120-ttln.17-default)
[ 92.499850] MSR:
8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR:
28424428 XER:
20000000
[ 92.499871] CFAR:
0000000000002424 DAR:
0000000000000208 DSISR:
40000000 SOFTE: 1
GPR00:
c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
GPR04:
d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
GPR08:
000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
GPR12:
c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
GPR16:
0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
GPR20:
c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
GPR24:
c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
GPR28:
d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
[ 92.499881] NIP [
c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
[ 92.499885] LR [
c000000000027644] dma_get_required_mask+0x44/0xac
[ 92.499886] Call Trace:
[ 92.499891] [
c0000000bebab9d0] [
c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
[ 92.499894] [
c0000000bebaba10] [
c000000000027644] dma_get_required_mask+0x44/0xac
[ 92.499904] [
c0000000bebaba30] [
d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
[ 92.499910] [
c0000000bebabab0] [
d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
[ 92.499915] [
c0000000bebabb30] [
d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
[ 92.499921] [
c0000000bebabb60] [
d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
[ 92.499924] [
c0000000bebabbf0] [
c0000000000e78c0] param_attr_store+0xa0/0x180
[ 92.499927] [
c0000000bebabc70] [
c0000000000e6448] module_attr_store+0x48/0x70
[ 92.499932] [
c0000000bebabc90] [
c000000000391f60] sysfs_kf_write+0x70/0xb0
[ 92.499935] [
c0000000bebabcb0] [
c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
[ 92.499939] [
c0000000bebabd00] [
c0000000002e22ac] __vfs_write+0x4c/0x1d0
[ 92.499942] [
c0000000bebabd90] [
c0000000002e2f94] vfs_write+0xc4/0x200
[ 92.499945] [
c0000000bebabde0] [
c0000000002e488c] SyS_write+0x6c/0x110
[ 92.499948] [
c0000000bebabe30] [
c000000000009384] system_call+0x38/0xe4
[ 92.499949] Instruction dump:
[ 92.499954]
4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
[ 92.499958]
fbc1fff0 fbe1fff8 f8010010 f821ffc1 <
e9230208>
7c7e1b78 2fa90000 419e0078
[ 92.499962] ---[ end trace
bed077e15eb420cf ]---
It fails in dma_get_required_mask, that has ppc-specific implementation,
and fail if provided device argument is NULL
Signed-off-by: Mikhail Malygin <mikhail@malygin.me>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nikolay Borisov [Thu, 5 Apr 2018 07:40:15 +0000 (10:40 +0300)]
btrfs: Fix possible softlock on single core machines
[ Upstream commit
1e1c50a929bc9e49bc3f9935b92450d9e69f8158 ]
do_chunk_alloc implements a loop checking whether there is a pending
chunk allocation and if so causes the caller do loop. Generally this
loop is executed only once, however testing with btrfs/072 on a single
core vm machines uncovered an extreme case where the system could loop
indefinitely. This is due to a missing cond_resched when loop which
doesn't give a chance to the previous chunk allocator finish its job.
The fix is to simply add the missing cond_resched.
Fixes: 6d74119f1a3e ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Mon, 2 Apr 2018 17:59:47 +0000 (01:59 +0800)]
Btrfs: fix NULL pointer dereference in log_dir_items
[ Upstream commit
80c0b4210a963e31529e15bf90519708ec947596 ]
0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
returned, path->nodes[0] could be NULL, log_dir_items lacks such a
check for <0 and we may run into a null pointer dereference panic.
Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Mon, 2 Apr 2018 17:59:48 +0000 (01:59 +0800)]
Btrfs: bail out on error during replay_dir_deletes
[ Upstream commit
b98def7ca6e152ee55e36863dddf6f41f12d1dc6 ]
If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
to bail out, otherwise @ret would be forced to be 0 after 'break;' and
the caller won't be aware of it.
Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yang Shi [Thu, 5 Apr 2018 23:22:35 +0000 (16:22 -0700)]
mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one()
[ Upstream commit
f0849ac0b8e072073ec5fcc7fadd05a77434364e ]
For PTE-mapped THP, the compound THP has not been split to normal 4K
pages yet, the whole THP is considered referenced if any one of sub page
is referenced.
When walking PTE-mapped THP by pvmw, all relevant PTEs will be checked
to retrieve referenced bit. But, the current code just returns the
result of the last PTE. If the last PTE has not referenced, the
referenced flag will be cleared.
Just set referenced when ptep{pmdp}_clear_young_notify() returns true.
Link: http://lkml.kernel.org/r/1518212451-87134-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Huang Ying [Thu, 5 Apr 2018 23:23:20 +0000 (16:23 -0700)]
mm: fix races between address_space dereference and free in page_evicatable
[ Upstream commit
e92bb4dd9673945179b1fc738c9817dd91bfb629 ]
When page_mapping() is called and the mapping is dereferenced in
page_evicatable() through shrink_active_list(), it is possible for the
inode to be truncated and the embedded address space to be freed at the
same time. This may lead to the following race.
CPU1 CPU2
truncate(inode) shrink_active_list()
... page_evictable(page)
truncate_inode_page(mapping, page);
delete_from_page_cache(page)
spin_lock_irqsave(&mapping->tree_lock, flags);
__delete_from_page_cache(page, NULL)
page_cache_tree_delete(..)
... mapping = page_mapping(page);
page->mapping = NULL;
...
spin_unlock_irqrestore(&mapping->tree_lock, flags);
page_cache_free_page(mapping, page)
put_page(page)
if (put_page_testzero(page)) -> false
- inode now has no pages and can be freed including embedded address_space
mapping_unevictable(mapping)
test_bit(AS_UNEVICTABLE, &mapping->flags);
- we've dereferenced mapping which is potentially already free.
Similar race exists between swap cache freeing and page_evicatable()
too.
The address_space in inode and swap cache will be freed after a RCU
grace period. So the races are fixed via enclosing the page_mapping()
and address_space usage in rcu_read_lock/unlock(). Some comments are
added in code to make it clear what is protected by the RCU read lock.
Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Claudio Imbrenda [Thu, 5 Apr 2018 23:25:41 +0000 (16:25 -0700)]
mm/ksm: fix interaction with THP
[ Upstream commit
77da2ba0648a4fd52e5ff97b8b2b8dd312aec4b0 ]
This patch fixes a corner case for KSM. When two pages belong or
belonged to the same transparent hugepage, and they should be merged,
KSM fails to split the page, and therefore no merging happens.
This bug can be reproduced by:
* making sure ksm is running (in case disabling ksmtuned)
* enabling transparent hugepages
* allocating a THP-aligned 1-THP-sized buffer
e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
* filling it with the same values
e.g. memset(p, 42, 1<<21)
* performing madvise to make it mergeable
e.g. madvise(p, 1<<21, MADV_MERGEABLE)
* waiting for KSM to perform a few scans
The expected outcome is that the all the pages get merged (1 shared and
the rest sharing); the actual outcome is that no pages get merged (1
unshared and the rest volatile)
The reason of this behaviour is that we increase the reference count
once for both pages we want to merge, but if they belong to the same
hugepage (or compound page), the reference counter used in both cases is
the one of the head of the compound page. This means that
split_huge_page will find a value of the reference counter too high and
will fail.
This patch solves this problem by testing if the two pages to merge
belong to the same hugepage when attempting to merge them. If so, the
hugepage is split safely. This means that the hugepage is not split if
not necessary.
Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.com
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Co-authored-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Wed, 4 Apr 2018 10:40:07 +0000 (12:40 +0200)]
genirq/affinity: Don't return with empty affinity masks on error
[ Upstream commit
0211e12dd0a5385ecffd3557bc570dbad7fcf245 ]
When the allocation of node_to_possible_cpumask fails, then
irq_create_affinity_masks() returns with a pointer to the empty affinity
masks array, which will cause malfunction.
Reorder the allocations so the masks array allocation comes last and every
failure path returns NULL.
Fixes: 9a0ef98e186d ("genirq/affinity: Assign vectors to all present CPUs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Falcon [Fri, 6 Apr 2018 23:37:03 +0000 (18:37 -0500)]
ibmvnic: Zero used TX descriptor counter on reset
[ Upstream commit
41f714672f93608751dbd2fa2291d476a8ff0150 ]
The counter that tracks used TX descriptors pending completion
needs to be zeroed as part of a device reset. This change fixes
a bug causing transmit queues to be stopped unnecessarily and in
some cases a transmit queue stall and timeout reset. If the counter
is not reset, the remaining descriptors will not be "removed",
effectively reducing queue capacity. If the queue is over half full,
it will cause the queue to stall if stopped.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Esben Haabendal [Sun, 8 Apr 2018 20:17:01 +0000 (22:17 +0200)]
dp83640: Ensure against premature access to PHY registers after reset
[ Upstream commit
76327a35caabd1a932e83d6a42b967aa08584e5d ]
The datasheet specifies a 3uS pause after performing a software
reset. The default implementation of genphy_soft_reset() does not
provide this, so implement soft_reset with the needed pause.
Signed-off-by: Esben Haabendal <eha@deif.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sandipan Das [Wed, 4 Apr 2018 18:04:18 +0000 (23:34 +0530)]
perf clang: Add support for recent clang versions
[ Upstream commit
7854e499f33fd9c7e63288692ffb754d9b1d02fd ]
The clang API calls used by perf have changed in recent releases and
builds succeed with libclang-3.9 only. This introduces compatibility
with libclang-4.0 and above.
Without this patch, we will see the following compilation errors with
libclang-4.0+:
util/c++/clang.cpp: In function ‘clang::CompilerInvocation* perf::createCompilerInvocation(llvm::opt::ArgStringList, llvm::StringRef&, clang::DiagnosticsEngine&)’:
util/c++/clang.cpp:62:33: error: ‘IK_C’ was not declared in this scope
Opts.Inputs.emplace_back(Path, IK_C);
^~~~
util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::Module> perf::getModuleFromSource(llvm::opt::ArgStringList, llvm::StringRef, llvm::IntrusiveRefCntPtr<clang::vfs::FileSystem>)’:
util/c++/clang.cpp:75:26: error: no matching function for call to ‘clang::CompilerInstance::setInvocation(clang::CompilerInvocation*)’
Clang.setInvocation(&*CI);
^
In file included from util/c++/clang.cpp:14:0:
/usr/include/clang/Frontend/CompilerInstance.h:231:8: note: candidate: void clang::CompilerInstance::setInvocation(std::shared_ptr<clang::CompilerInvocation>)
void setInvocation(std::shared_ptr<CompilerInvocation> Value);
^~~~~~~~~~~~~
Committer testing:
Tested on Fedora 27 after installing the clang-devel and llvm-devel
packages, versions:
# rpm -qa | egrep llvm\|clang
llvm-5.0.1-6.fc27.x86_64
clang-libs-5.0.1-5.fc27.x86_64
clang-5.0.1-5.fc27.x86_64
clang-tools-extra-5.0.1-5.fc27.x86_64
llvm-libs-5.0.1-6.fc27.x86_64
llvm-devel-5.0.1-6.fc27.x86_64
clang-devel-5.0.1-5.fc27.x86_64
#
Make sure you don't have some older version lying around in /usr/local,
etc, then:
$ make LIBCLANGLLVM=1 -C tools/perf install-bin
And in the end perf will be linked agains these libraries:
# ldd ~/bin/perf | egrep -i llvm\|clang
libclangAST.so.5 => /lib64/libclangAST.so.5 (0x00007f8bb2eb4000)
libclangBasic.so.5 => /lib64/libclangBasic.so.5 (0x00007f8bb29e3000)
libclangCodeGen.so.5 => /lib64/libclangCodeGen.so.5 (0x00007f8bb23f7000)
libclangDriver.so.5 => /lib64/libclangDriver.so.5 (0x00007f8bb2060000)
libclangFrontend.so.5 => /lib64/libclangFrontend.so.5 (0x00007f8bb1d06000)
libclangLex.so.5 => /lib64/libclangLex.so.5 (0x00007f8bb1a3e000)
libclangTooling.so.5 => /lib64/libclangTooling.so.5 (0x00007f8bb17d4000)
libclangEdit.so.5 => /lib64/libclangEdit.so.5 (0x00007f8bb15c5000)
libclangSema.so.5 => /lib64/libclangSema.so.5 (0x00007f8bb0cc9000)
libclangAnalysis.so.5 => /lib64/libclangAnalysis.so.5 (0x00007f8bb0a23000)
libclangParse.so.5 => /lib64/libclangParse.so.5 (0x00007f8bb0725000)
libclangSerialization.so.5 => /lib64/libclangSerialization.so.5 (0x00007f8bb039a000)
libLLVM-5.0.so => /lib64/libLLVM-5.0.so (0x00007f8bace98000)
libclangASTMatchers.so.5 => /lib64/../lib64/libclangASTMatchers.so.5 (0x00007f8bab735000)
libclangFormat.so.5 => /lib64/../lib64/libclangFormat.so.5 (0x00007f8bab4b2000)
libclangRewrite.so.5 => /lib64/../lib64/libclangRewrite.so.5 (0x00007f8bab2a1000)
libclangToolingCore.so.5 => /lib64/../lib64/libclangToolingCore.so.5 (0x00007f8bab08e000)
#
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Fixes: 00b86691c77c ("perf clang: Add builtin clang support ant test case")
Link: http://lkml.kernel.org/r/20180404180419.19056-2-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sandipan Das [Wed, 4 Apr 2018 18:04:17 +0000 (23:34 +0530)]
perf tools: Fix perf builds with clang support
[ Upstream commit
c2fb54a183cfe77c6fdc9d71e2d5299c1c302a6e ]
For libclang, some distro packages provide static libraries (.a) while
some provide shared libraries (.so). Currently, perf code can only be
linked with static libraries. This makes perf build possible for both
cases.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Fixes: d58ac0bf8d1e ("perf build: Add clang and llvm compile and linking support")
Link: http://lkml.kernel.org/r/20180404180419.19056-1-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anshuman Khandual [Thu, 29 Mar 2018 06:23:37 +0000 (11:53 +0530)]
powerpc/fscr: Enable interrupts earlier before calling get_user()
[ Upstream commit
709b973c844c0b4d115ac3a227a2e5a68722c912 ]
The function get_user() can sleep while trying to fetch instruction
from user address space and causes the following warning from the
scheduler.
BUG: sleeping function called from invalid context
Though interrupts get enabled back but it happens bit later after
get_user() is called. This change moves enabling these interrupts
earlier covering the function get_user(). While at this, lets check
for kernel mode and crash as this interrupt should not have been
triggered from the kernel context.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shunyong Yang [Fri, 6 Apr 2018 02:43:49 +0000 (10:43 +0800)]
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
[ Upstream commit
8913315e9459b146e5888ab5138e10daa061b885 ]
When multiple CPUs are related in one cpufreq policy, the first online
CPU will be chosen by default to handle cpufreq operations. Let's take
cpu0 and cpu1 as an example.
When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf
capabilities should be initialized. Otherwise, perf capabilities are 0s
and speed change can not take effect.
This patch copies perf capabilities of the first online CPU to other
shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Carlos Maiolino [Wed, 11 Apr 2018 05:39:04 +0000 (22:39 -0700)]
Force log to disk before reading the AGF during a fstrim
[ Upstream commit
8c81dd46ef3c416b3b95e3020fb90dbd44e6140b ]
Forcing the log to disk after reading the agf is wrong, we might be
calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
This can cause a deadlock when racing a fstrim with a filesystem
shutdown.
The deadlock has been identified due a miscalculation bug in device-mapper
dm-thin, which returns lack of space to its users earlier than the device itself
really runs out of space, changing the device-mapper volume into an error state.
The problem happened while filling the filesystem with a single file,
triggering the bug in device-mapper, consequently causing an IO error
and shutting down the filesystem.
If such file is removed, and fstrim executed before the XFS finishes the
shut down process, the fstrim process will end up holding the buffer
lock, and going to sleep on the cil wait queue.
At this point, the shut down process will try to wake up all the threads
waiting on the cil wait queue, but for this, it will try to hold the
same buffer log already held my the fstrim, locking up the filesystem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jens Axboe [Wed, 11 Apr 2018 17:26:09 +0000 (11:26 -0600)]
sr: get/drop reference to device in revalidate and check_events
[ Upstream commit
2d097c50212e137e7b53ffe3b37561153eeba87d ]
We can't just use scsi_cd() to get the scsi_cd structure, we have
to grab a live reference to the device. For both callbacks, we're
not inside an open where we already hold a reference to the device.
This fixes device removal/addition under concurrent device access,
which otherwise could result in the below oops.
NULL pointer dereference at
0000000000000010
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
sr 12:0:0:0: [sr2] scsi-1 drive
scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl
sr 12:0:0:0: Attached scsi CD-ROM sr2
sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc
sr 12:0:0:0: Attached scsi generic sg7 type 5
igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif]
CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
RSP: 0018:
ffff883ff357bb58 EFLAGS:
00010292
RAX:
ffffffffa00b07d0 RBX:
ffff883ff3058000 RCX:
ffff883ff357bb66
RDX:
0000000000000003 RSI:
0000000000007530 RDI:
ffff881fea631000
RBP:
0000000000000000 R08:
ffff881fe4d38400 R09:
0000000000000000
R10:
0000000000000000 R11:
00000000000001b6 R12:
000000000800005d
R13:
000000000800005d R14:
ffff883ffd9b3790 R15:
0000000000000000
FS:
00007f7dc8e6d8c0(0000) GS:
ffff883fff340000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000010 CR3:
0000003ffda98005 CR4:
00000000003606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
? __invalidate_device+0x48/0x60
check_disk_change+0x4c/0x60
sr_block_open+0x16/0xd0 [sr_mod]
__blkdev_get+0xb9/0x450
? iget5_locked+0x1c0/0x1e0
blkdev_get+0x11e/0x320
? bdget+0x11d/0x150
? _raw_spin_unlock+0xa/0x20
? bd_acquire+0xc0/0xc0
do_dentry_open+0x1b0/0x320
? inode_permission+0x24/0xc0
path_openat+0x4e6/0x1420
? cpumask_any_but+0x1f/0x40
? flush_tlb_mm_range+0xa0/0x120
do_filp_open+0x8c/0xf0
? __seccomp_filter+0x28/0x230
? _raw_spin_unlock+0xa/0x20
? __handle_mm_fault+0x7d6/0x9b0
? list_lru_add+0xa8/0xc0
? _raw_spin_unlock+0xa/0x20
? __alloc_fd+0xaf/0x160
? do_sys_open+0x1a6/0x230
do_sys_open+0x1a6/0x230
do_syscall_64+0x5a/0x100
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xidong Wang [Tue, 10 Apr 2018 23:29:34 +0000 (16:29 -0700)]
z3fold: fix memory leak
[ Upstream commit
1ec6995d1290bfb87cc3a51f0836c889e857cef9 ]
In z3fold_create_pool(), the memory allocated by __alloc_percpu() is not
released on the error path that pool->compact_wq , which holds the
return value of create_singlethread_workqueue(), is NULL. This will
result in a memory leak bug.
[akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
Signed-off-by: Xidong Wang <wangxidong_97@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tom Abraham [Tue, 10 Apr 2018 23:29:48 +0000 (16:29 -0700)]
swap: divide-by-zero when zero length swap file on ssd
[ Upstream commit
a06ad633a37c64a0cd4c229fc605cee8725d376e ]
Calling swapon() on a zero length swap file on SSD can lead to a
divide-by-zero.
Although creating such files isn't possible with mkswap and they woud be
considered invalid, it would be better for the swapon code to be more
robust and handle this condition gracefully (return -EINVAL).
Especially since the fix is small and straightforward.
To help with wear leveling on SSD, the swapon syscall calculates a
random position in the swap file using modulo p->highest_bit, which is
set to maxpages - 1 in read_swap_header.
If the swap file is zero length, read_swap_header sets maxpages=1 and
last_page=0, resulting in p->highest_bit=0 and we divide-by-zero when we
modulo p->highest_bit in swapon syscall.
This can be prevented by having read_swap_header return zero if
last_page is zero.
Link: http://lkml.kernel.org/r/5AC747C1020000A7001FA82C@prv-mh.provo.novell.com
Signed-off-by: Thomas Abraham <tabraham@suse.com>
Reported-by: <Mark.Landis@Teradata.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Konovalov [Tue, 10 Apr 2018 23:30:31 +0000 (16:30 -0700)]
kasan, slub: fix handling of kasan_slab_free hook
[ Upstream commit
c3895391df385c6628638f014c87e16f5e2efd45 ]
The kasan_slab_free hook's return value denotes whether the reuse of a
slab object must be delayed (e.g. when the object is put into memory
qurantine).
The current way SLUB handles this hook is by ignoring its return value
and hardcoding checks similar (but not exactly the same) to the ones
performed in kasan_slab_free, which is prone to making mistakes.
The main difference between the hardcoded checks and the ones in
kasan_slab_free is whether we want to perform a free in case when an
invalid-free or a double-free was detected (we don't).
This patch changes the way SLUB handles this by:
1. taking into account the return value of kasan_slab_free for each of
the objects, that are being freed;
2. reconstructing the freelist of objects to exclude the ones, whose
reuse must be delayed.
[andreyknvl@google.com: eliminate unnecessary branch in slab_free]
Link: http://lkml.kernel.org/r/a62759a2545fddf69b0c034547212ca1eb1b3ce2.1520359686.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/083f58501e54731203801d899632d76175868e97.1519400992.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Konovalov [Tue, 10 Apr 2018 23:30:35 +0000 (16:30 -0700)]
kasan: fix invalid-free test crashing the kernel
[ Upstream commit
91c93ed07f04f5b32a30321d522d8ca9504745bf ]
When an invalid-free is triggered by one of the KASAN tests, the object
doesn't actually get freed. This later leads to a BUG failure in
kmem_cache_destroy that checks that there are no allocated objects in
the cache that is being destroyed.
Fix this by calling kmem_cache_free with the proper object address after
the call that triggers invalid-free.
Link: http://lkml.kernel.org/r/286eaefc0a6c3fa9b83b87e7d6dc0fbb5b5c9926.1519924383.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Chris Mason <clm@fb.com>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Cc: Kostya Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Danilo Krummrich [Tue, 10 Apr 2018 23:31:38 +0000 (16:31 -0700)]
fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
[ Upstream commit
a0b0d1c345d0317efe594df268feb5ccc99f651e ]
proc_sys_link_fill_cache() does not take currently unregistering sysctl
tables into account, which might result into a page fault in
sysctl_follow_link() - add a check to fix it.
This bug has been present since v3.4.
Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Smart [Thu, 12 Apr 2018 15:16:15 +0000 (09:16 -0600)]
nvme: expand nvmf_check_if_ready checks
[ Upstream commit
bb06ec31452fb2da1594f88035c2ecea4e0652f4 ]
The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine allows a lot of cases to fail ios during windows
of reset or re-connection. In cases where there are not multi-path
options present, the error goes back to the callee - the filesystem
or application. Not good.
The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports now call the routine directly. The routine is now a
fabrics routine rather than an inline function.
The routine now looks at controller state to decide the action to
take. Some states mandate io failure. Others define the condition where
a command can be accepted. When the decision is unclear, a generic
queue-or-reject check is made to look for failfast or multipath ios and
only fails the io if it is so marked. Otherwise, the io will be queued
and wait for the controller state to resolve.
Admin commands issued via ioctl share a live admin queue with commands
from the transport for controller init. The ioctls could be intermixed
with the initialization commands. It's possible for the ioctl cmd to
be issued prior to the controller being enabled. To block this, the
ioctl admin commands need to be distinguished from admin commands used
for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to
reflect this division and set it on ioctls requests. As the
nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
ensure that commands allocated by the ioctl path (actually anything
in core.c) preps the nvme_req(req) before starting the io. This will
preserve the USERCMD flag during execution and/or retry.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sriharsha Basavapatna [Wed, 11 Apr 2018 15:50:15 +0000 (11:50 -0400)]
bnxt_en: Ignore src port field in decap filter nodes
[ Upstream commit
479ca3bf91da971fcefc003cf5773e8d7db24794 ]
The driver currently uses src port field (along with other fields) in the
decap tunnel key, while looking up and adding tunnel nodes. This leads to
redundant cfa_decap_filter_alloc() requests to the FW and flow-miss in the
flow engine. Fix this by ignoring the src port field in decap tunnel nodes.
Fixes: f484f6782e01 ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dave Hansen [Fri, 6 Apr 2018 20:55:14 +0000 (13:55 -0700)]
x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
[ Upstream commit
639d6aafe437a7464399d2a77d006049053df06f ]
__ro_after_init data gets stuck in the .rodata section. That's normally
fine because the kernel itself manages the R/W properties.
But, if we run __change_page_attr() on an area which is __ro_after_init,
the .rodata checks will trigger and force the area to be immediately
read-only, even if it is early-ish in boot. This caused problems when
trying to clear the _PAGE_GLOBAL bit for these area in the PTI code:
it cleared _PAGE_GLOBAL like I asked, but also took it up on itself
to clear _PAGE_RW. The kernel then oopses the next time it wrote to
a __ro_after_init data structure.
To fix this, add the kernel_set_to_readonly check, just like we have
for kernel text, just a few lines below in this function.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joerg Roedel [Wed, 11 Apr 2018 15:24:38 +0000 (17:24 +0200)]
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
[ Upstream commit
e3e288121408c3abeed5af60b87b95c847143845 ]
The pmd_set_huge() and pud_set_huge() functions are used from
the generic ioremap() code to establish large mappings where this
is possible.
But the generic ioremap() code does not check whether the
PMD/PUD entries are already populated with a non-leaf entry,
so that any page-table pages these entries point to will be
lost.
Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
BUG_ON() in vmalloc_sync_one() when PMD entries are synced
from swapper_pg_dir to the current page-table. This happens
because the PMD entry from swapper_pg_dir was promoted to a
huge-page entry while the current PGD still contains the
non-leaf entry. Because both entries are present and point
to a different page, the BUG_ON() triggers.
This was actually triggered with pti-x32 enabled in a KVM
virtual machine by the graphics driver.
A real and better fix for that would be to improve the
page-table handling in the generic ioremap() code. But that is
out-of-scope for this patch-set and left for later work.
Reported-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <llong@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Thu, 5 Apr 2018 21:55:12 +0000 (22:55 +0100)]
Btrfs: fix loss of prealloc extents past i_size after fsync log replay
[ Upstream commit
471d557afed155b85da237ec46c549f443eeb5de ]
Currently if we allocate extents beyond an inode's i_size (through the
fallocate system call) and then fsync the file, we log the extents but
after a power failure we replay them and then immediately drop them.
This behaviour happens since about 2009, commit
c71bf099abdd ("Btrfs:
Avoid orphan inodes cleanup while replaying log"), because it marks
the inode as an orphan instead of dropping any extents beyond i_size
before replaying logged extents, so after the log replay, and while
the mount operation is still ongoing, we find the inode marked as an
orphan and then perform a truncation (drop extents beyond the inode's
i_size). Because the processing of orphan inodes is still done
right after replaying the log and before the mount operation finishes,
the intention of that commit does not make any sense (at least as
of today). However reverting that behaviour is not enough, because
we can not simply discard all extents beyond i_size and then replay
logged extents, because we risk dropping extents beyond i_size created
in past transactions, for example:
add prealloc extent beyond i_size
fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
transaction commit
add another prealloc extent beyond i_size
fsync - triggers the fast fsync path
power failure
In that scenario, we would drop the first extent and then replay the
second one. To fix this just make sure that all prealloc extents
beyond i_size are logged, and if we find too many (which is far from
a common case), fallback to a full transaction commit (like we do when
logging regular extents in the fast fsync path).
Trivial reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
$ sync
$ xfs_io -c "falloc -k 256K 1M" /mnt/foo
$ xfs_io -c "fsync" /mnt/foo
<power failure>
# mount to replay log
$ mount /dev/sdb /mnt
# at this point the file only has one extent, at offset 0, size 256K
A test case for fstests follows soon, covering multiple scenarios that
involve adding prealloc extents with previous shrinking truncates and
without such truncates.
Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Fri, 30 Mar 2018 22:11:56 +0000 (06:11 +0800)]
Btrfs: clean up resources during umount after trans is aborted
[ Upstream commit
af7227338135d2f1b1552bf9a6d43e02dcba10b9 ]
Currently if some fatal errors occur, like all IO get -EIO, resources
would be cleaned up when
a) transaction is being committed or
b) BTRFS_FS_STATE_ERROR is set
However, in some rare cases, resources may be left alone after transaction
gets aborted and umount may run into some ASSERT(), e.g.
ASSERT(list_empty(&block_group->dirty_list));
For case a), in btrfs_commit_transaciton(), there're several places at the
beginning where we just call btrfs_end_transaction() without cleaning up
resources. For case b), it is possible that the trans handle doesn't have
any dirty stuff, then only trans hanlde is marked as aborted while
BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
all resources won't stay in memory after umount.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Thumshirn [Thu, 12 Apr 2018 15:16:06 +0000 (09:16 -0600)]
nvme: don't send keep-alives to the discovery controller
[ Upstream commit
74c6c71530847808d4e3be7b205719270efee80c ]
NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
Command Support" Figure 31 "Discovery Controller – Admin Commands"
explicitly listst all commands but "Get Log Page" and "Identify" as
reserved, but NetApp report the Linux host is sending Keep Alive
commands to the discovery controller, which is a violation of the
Spec.
We're already checking for discovery controllers when configuring the
keep alive timeout but when creating a discovery controller we're not
hard wiring the keep alive timeout to 0 and thus remain on
NVME_DEFAULT_KATO for the discovery controller.
This can be easily remproduced when issuing a direct connect to the
discovery susbsystem using:
'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 07bfcd09a288 ("nvme-fabrics: add a generic NVMe over Fabrics library")
Reported-by: Martin George <marting@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jean Delvare [Fri, 13 Apr 2018 13:37:59 +0000 (15:37 +0200)]
firmware: dmi_scan: Fix UUID length safety check
[ Upstream commit
90fe6f8ff00a07641ca893d64f75ca22ce77cca2 ]
The test which ensures that the DMI type 1 structure is long enough
to hold the UUID is off by one. It would fail if the structure is
exactly 24 bytes long, while that's sufficient to hold the UUID.
I don't expect this bug to cause problem in practice because all
implementations I have seen had length 8, 25 or 27 bytes, in line
with the SMBIOS specifications. But let's fix it still.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: a814c3597a6b ("firmware: dmi_scan: Check DMI structure length")
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rich Felker [Fri, 16 Mar 2018 00:01:36 +0000 (20:01 -0400)]
sh: fix debug trap failure to process signals before return to user
[ Upstream commit
96a598996f6ac518ac79839ecbb17c91af91f4f7 ]
When responding to a debug trap (breakpoint) in userspace, the
kernel's trap handler raised SIGTRAP but returned from the trap via a
code path that ignored pending signals, resulting in an infinite loop
re-executing the trapping instruction.
Signed-off-by: Rich Felker <dalias@libc.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascal Roeleven [Fri, 20 Apr 2018 10:21:12 +0000 (12:21 +0200)]
ARM: dts: sun4i: Fix incorrect clocks for displays
commit
590b0c0cfc6162aeebbf43eaafb9753b56df1532 upstream.
Some displays on sun4i devices wouldn't properly stay on unless
'clk_ignore_unused' is used.
Change the duplicate clocks to the probably intended ones.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pascal Roeleven <dev@pascalroeleven.nl>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Vrabel [Fri, 18 May 2018 15:55:46 +0000 (16:55 +0100)]
x86/kvm: fix LAPIC timer drift when guest uses periodic mode
commit
d8f2f498d9ed0c5010bc1bbc1146f94c8bf9f8cc upstream.
Since 4.10, commit
8003c9ae204e (KVM: LAPIC: add APIC Timer
periodic/oneshot mode VMX preemption timer support), guests using
periodic LAPIC timers (such as FreeBSD 8.4) would see their timers
drift significantly over time.
Differences in the underlying clocks and numerical errors means the
periods of the two timers (hv and sw) are not the same. This
difference will accumulate with every expiry resulting in a large
error between the hv and sw timer.
This means the sw timer may be running slow when compared to the hv
timer. When the timer is switched from hv to sw, the now active sw
timer will expire late. The guest VCPU is reentered and it switches to
using the hv timer. This timer catches up, injecting multiple IRQs
into the guest (of which the guest only sees one as it does not get to
run until the hv timer has caught up) and thus the guest's timer rate
is low (and becomes increasing slower over time as the sw timer lags
further and further behind).
I believe a similar problem would occur if the hv timer is the slower
one, but I have not observed this.
Fix this by synchronizing the deadlines for both timers to the same
time source on every tick. This prevents the errors from accumulating.
Fixes: 8003c9ae204e21204e49816c5ea629357e283b06
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: David Vrabel <david.vrabel@nutanix.com>
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jim Mattson [Wed, 9 May 2018 21:29:35 +0000 (14:29 -0700)]
kvm: x86: IA32_ARCH_CAPABILITIES is always supported
commit
1eaafe91a0df4157521b6417b3dd8430bf5f52f0 upstream.
If there is a possibility that a VM may migrate to a Skylake host,
then the hypervisor should report IA32_ARCH_CAPABILITIES.RSBA[bit 2]
as being set (future work, of course). This implies that
CPUID.(EAX=7,ECX=0):EDX.ARCH_CAPABILITIES[bit 29] should be
set. Therefore, kvm should report this CPUID bit as being supported
whether or not the host supports it. Userspace is still free to clear
the bit if it chooses.
For more information on RSBA, see Intel's white paper, "Retpoline: A
Branch Target Injection Mitigation" (Document Number 337131-001),
currently available at https://bugzilla.kernel.org/show_bug.cgi?id=199511.
Since the IA32_ARCH_CAPABILITIES MSR is emulated in kvm, there is no
dependency on hardware support for this feature.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fixes: 28c1c9fabf48 ("KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES")
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Huang [Tue, 1 May 2018 14:49:54 +0000 (09:49 -0500)]
KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
commit
c4d2188206bafa177ea58e9a25b952baa0bf7712 upstream.
The CPUID bits of OSXSAVE (function=0x1) and OSPKE (func=0x7, leaf=0x0)
allows user apps to detect if OS has set CR4.OSXSAVE or CR4.PKE. KVM is
supposed to update these CPUID bits when CR4 is updated. Current KVM
code doesn't handle some special cases when updates come from emulator.
Here is one example:
Step 1: guest boots
Step 2: guest OS enables XSAVE ==> CR4.OSXSAVE=1 and CPUID.OSXSAVE=1
Step 3: guest hot reboot ==> QEMU reset CR4 to 0, but CPUID.OSXAVE==1
Step 4: guest os checks CPUID.OSXAVE, detects 1, then executes xgetbv
Step 4 above will cause an #UD and guest crash because guest OS hasn't
turned on OSXAVE yet. This patch solves the problem by comparing the the
old_cr4 with cr4. If the related bits have been changed,
kvm_update_cpuid() needs to be called.
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Hildenbrand [Wed, 9 May 2018 14:12:17 +0000 (16:12 +0200)]
KVM: s390: vsie: fix < 8k check for the itdba
commit
f4a551b72358facbbe5714248dff78404272feee upstream.
By missing an "L", we might detect some addresses to be <8k,
although they are not.
e.g. for itdba =
100001fff
!(gpa & ~0x1fffU) -> 1
!(gpa & ~0x1fffUL) -> 0
So we would report a SIE validity intercept although everything is fine.
Fixes: 166ecb3 ("KVM: s390: vsie: support transactional execution")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Konrad Rzeszutek Wilk [Mon, 21 May 2018 21:54:49 +0000 (17:54 -0400)]
KVM/VMX: Expose SSBD properly to guests
commit
0aa48468d00959c8a37cd3ac727284f4f7359151 upstream.
The X86_FEATURE_SSBD is an synthetic CPU feature - that is
it bit location has no relevance to the real CPUID 0x7.EBX[31]
bit position. For that we need the new CPU feature name.
Fixes: 52817587e706 ("x86/cpufeatures: Disentangle SSBD enumeration")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20180521215449.26423-2-konrad.wilk@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 22 May 2018 11:02:17 +0000 (13:02 +0200)]
PM / core: Fix direct_complete handling for devices with no callbacks
commit
c62ec4610c40bcc44f2d3d5ed1c312737279e2f3 upstream.
Commit
08810a4119aa (PM / core: Add NEVER_SKIP and SMART_PREPARE
driver flags) inadvertently prevented the power.direct_complete flag
from being set for devices without PM callbacks and with disabled
runtime PM which also prevents power.direct_complete from being set
for their parents. That led to problems including a resume crash on
HP ZBook 14u.
Restore the previous behavior by causing power.direct_complete to be
set for those devices again, but do that in a more direct way to
avoid overlooking that case in the future.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199693
Fixes: 08810a4119aa (PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags)
Reported-by: Thomas Martitz <kugel@rockbox.org>
Tested-by: Thomas Martitz <kugel@rockbox.org>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gustavo A. R. Silva [Fri, 25 May 2018 21:47:57 +0000 (14:47 -0700)]
kernel/sys.c: fix potential Spectre v1 issue
commit
23d6aef74da86a33fa6bb75f79565e0a16ee97c2 upstream.
`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
Fix this by sanitizing *resource* before using it to index
current->signal->rlim
Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=
152449131114778&w=2
Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Hildenbrand [Fri, 25 May 2018 21:48:11 +0000 (14:48 -0700)]
kasan: fix memory hotplug during boot
commit
3f1959721558a976aaf9c2024d5bc884e6411bf7 upstream.
Using module_init() is wrong. E.g. ACPI adds and onlines memory before
our memory notifier gets registered.
This makes sure that ACPI memory detected during boot up will not result
in a kernel crash.
Easily reproducible with QEMU, just specify a DIMM when starting up.
Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
Fixes: 786a8959912e ("kasan: disable memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Hildenbrand [Fri, 25 May 2018 21:48:08 +0000 (14:48 -0700)]
kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
commit
ed1596f9ab958dd156a66c9ff1029d3761c1786a upstream.
We have to free memory again when we cancel onlining, otherwise a later
onlining attempt will fail.
Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Ryabinin [Fri, 25 May 2018 21:47:38 +0000 (14:47 -0700)]
mm/kasan: don't vfree() nonexistent vm_area
commit
0f901dcbc31f88ae41a2aaa365f7802b5d520a28 upstream.
KASAN uses different routines to map shadow for hot added memory and
memory obtained in boot process. Attempt to offline memory onlined by
normal boot process leads to this:
Trying to vfree() nonexistent vm area (
000000005d3b34b9)
WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
Call Trace:
kasan_mem_notifier+0xad/0xb9
notifier_call_chain+0x166/0x260
__blocking_notifier_call_chain+0xdb/0x140
__offline_pages+0x96a/0xb10
memory_subsys_offline+0x76/0xc0
device_offline+0xb8/0x120
store_mem_state+0xfa/0x120
kernfs_fop_write+0x1d5/0x320
__vfs_write+0xd4/0x530
vfs_write+0x105/0x340
SyS_write+0xb0/0x140
Obviously we can't call vfree() to free memory that wasn't allocated via
vmalloc(). Use find_vm_area() to see if we can call vfree().
Unfortunately it's a bit tricky to properly unmap and free shadow
allocated during boot, so we'll have to keep it. If memory will come
online again that shadow will be reused.
Matthew asked: how can you call vfree() on something that isn't a
vmalloc address?
vfree() is able to free any address returned by
__vmalloc_node_range(). And __vmalloc_node_range() gives you any
address you ask. It doesn't have to be an address in [VMALLOC_START,
VMALLOC_END] range.
That's also how the module_alloc()/module_memfree() works on
architectures that have designated area for modules.
[aryabinin@virtuozzo.com: improve comments]
Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
[akpm@linux-foundation.org: fix typos, reflow comment]
Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davidlohr Bueso [Fri, 25 May 2018 21:47:30 +0000 (14:47 -0700)]
ipc/shm: fix shmat() nil address after round-down when remapping
commit
8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for. Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil. As
of this patch, such cases will return -EINVAL.
Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davidlohr Bueso [Fri, 25 May 2018 21:47:27 +0000 (14:47 -0700)]
Revert "ipc/shm: Fix shmat mmap nil-page protection"
commit
a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/
20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
Commit
95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED. However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.
For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matthew Wilcox [Fri, 25 May 2018 21:47:24 +0000 (14:47 -0700)]
idr: fix invalid ptr dereference on item delete
commit
7a4deea1aa8bddfed4ef1b35fc2b6732563d8ad5 upstream.
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen. This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id >= 64.
Roman said:
The syzcaller test boils down to opening /dev/kvm, creating an
eventfd, and calling a couple of KVM ioctls. None of this requires
superuser. And the result is dereferencing an uninitialized pointer
which is likely a crash. The specific path caught by syzbot is via
KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
other user-triggerable paths, so cc:stable is probably justified.
Matthew added:
We have around 250 calls to idr_remove() in the kernel today. Many of
them pass an ID which is embedded in the object they're removing, so
they're safe. Picking a few likely candidates:
drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
drivers/atm/nicstar.c could be taken down by a handcrafted packet
Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafał Miłecki [Tue, 8 May 2018 09:31:04 +0000 (11:31 +0200)]
bcma: fix buffer size caused crash in bcma_core_mips_print_irq()
commit
361de091a4b97aa9081d304d742f80d486ab7125 upstream.
Used buffer wasn't big enough to hold whole strings. Example output of
this function is:
[ 0.180892] bcma: bus0: core 0x0800, irq: 2(S)* 3 4 5 6 D I
[ 0.180948] bcma: bus0: core 0x0812, irq: 2(S) 3* 4 5 6 D I
[ 0.180998] bcma: bus0: core 0x082d, irq: 2(S) 3 4* 5 6 D I
[ 0.181046] bcma: bus0: core 0x082c, irq: 2(S) 3 4 5 6 D I*
which means we need to store up to 24 chars.
Fixes: 758f7e06063a8 ("bcma: Use bcma_debug and not pr_cont in MIPS driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jens Axboe [Mon, 21 May 2018 18:21:14 +0000 (12:21 -0600)]
sr: pass down correctly sized SCSI sense buffer
commit
f7068114d45ec55996b9040e98111afa56e010fe upstream.
We're casting the CDROM layer request_sense to the SCSI sense
buffer, but the former is 64 bytes and the latter is 96 bytes.
As we generally allocate these on the stack, we end up blowing
up the stack.
Fix this by wrapping the scsi_execute() call with a properly
sized sense buffer, and copying back the bits for the CDROM
layer.
Cc: stable@vger.kernel.org
Reported-by: Piotr Gabriel Kosinski <pg.kosinski@gmail.com>
Reported-by: Daniel Shapira <daniel@twistlock.com>
Tested-by: Kees Cook <keescook@chromium.org>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lidong Chen [Tue, 8 May 2018 08:50:16 +0000 (16:50 +0800)]
IB/umem: Use the correct mm during ib_umem_release
commit
8e907ed4882714fd13cfe670681fc6cb5284c780 upstream.
User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.
If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.
Instead of using threads to locate the mm, use the overall tgid from the
ib_ucontext struct instead. This matches the behavior of ODP and
disassociate in handling the mm of the process that called ibv_reg_mr.
Cc: <stable@vger.kernel.org>
Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael J. Ruhl [Wed, 2 May 2018 13:42:51 +0000 (06:42 -0700)]
IB/hfi1: Use after free race condition in send context error path
commit
f9e76ca3771bf23d2142a81a88ddd8f31f5c4c03 upstream.
A pio send egress error can occur when the PSM library attempts to
to send a bad packet. That issue is still being investigated.
The pio error interrupt handler then attempts to progress the recovery
of the errored pio send context.
Code inspection reveals that the handling lacks the necessary locking
if that recovery interleaves with a PSM close of the "context" object
contains the pio send context.
The lack of the locking can cause the recovery to access the already
freed pio send context object and incorrectly deduce that the pio
send context is actually a kernel pio send context as shown by the
NULL deref stack below:
[<
ffffffff8143d78c>] _dev_info+0x6c/0x90
[<
ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
[<
ffffffff816ab124>] ? __schedule+0x424/0x9b0
[<
ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
[<
ffffffff810aa3ba>] process_one_work+0x17a/0x440
[<
ffffffff810ab086>] worker_thread+0x126/0x3c0
[<
ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
[<
ffffffff810b252f>] kthread+0xcf/0xe0
[<
ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[<
ffffffff816b8798>] ret_from_fork+0x58/0x90
[<
ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
This is the best case scenario and other scenarios can corrupt the
already freed memory.
Fix by adding the necessary locking in the pio send context error
handler.
Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Neuling [Fri, 18 May 2018 01:37:42 +0000 (11:37 +1000)]
powerpc/64s: Clear PCR on boot
commit
faf37c44a105f3608115785f17cbbf3500f8bc71 upstream.
Clear the PCR (Processor Compatibility Register) on boot to ensure we
are not running in a compatibility mode.
We've seen this cause problems when a crash (and kdump) occurs while
running compat mode guests. The kdump kernel then runs with the PCR
set and causes problems. The symptom in the kdump kernel (also seen in
petitboot after fast-reboot) is early userspace programs taking
sigills on newer instructions (seen in libc).
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason A. Donenfeld [Fri, 27 Apr 2018 22:42:52 +0000 (00:42 +0200)]
arm64: export tishift functions to modules
commit
255845fc43a3aaf806852a1d3bc89bff1411ebe3 upstream.
Otherwise modules that use these arithmetic operations will fail to
link. We accomplish this with the usual EXPORT_SYMBOL, which on most
architectures goes in the .S file but the ARM64 maintainers prefer that
insead it goes into arm64ksyms.
While we're at it, we also fix this up to use SPDX, and I personally
choose to relicense this as GPL2||BSD so that these symbols don't need
to be export_symbol_gpl, so all modules can use the routines, since
these are important general purpose compiler-generated function calls.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: PaX Team <pageexec@freemail.hu>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Will Deacon [Mon, 21 May 2018 16:44:57 +0000 (17:44 +0100)]
arm64: lse: Add early clobbers to some input/output asm operands
commit
32c3fa7cdf0c4a3eb8405fc3e13398de019e828b upstream.
For LSE atomics that read and write a register operand, we need to
ensure that these operands are annotated as "early clobber" if the
register is written before all of the input operands have been consumed.
Failure to do so can result in the compiler allocating the same register
to both operands, leading to splats such as:
Unable to handle kernel paging request at virtual address
11111122222221
[...]
x1 :
1111111122222222 x0 :
1111111122222221
Process swapper/0 (pid: 1, stack limit = 0x000000008209f908)
Call trace:
test_atomic64+0x1360/0x155c
where x0 has been allocated as both the value to be stored and also the
atomic_t pointer.
This patch adds the missing clobbers.
Cc: <stable@vger.kernel.org>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Reported-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Hellstrom [Wed, 23 May 2018 14:11:24 +0000 (16:11 +0200)]
drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
commit
938ae7259c908ad031da35d551da297640bb640c upstream.
Depending on whether the kernel is compiled with frame-pointer or not,
the temporary memory location used for the bp parameter in these macros
is referenced relative to the stack pointer or the frame pointer.
Hence we can never reference that parameter when we've modified either
the stack pointer or the frame pointer, because then the compiler would
generate an incorrect stack reference.
Fix this by pushing the temporary memory parameter on a known location on
the stack before modifying the stack- and frame pointers.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joe Jin [Thu, 17 May 2018 19:33:28 +0000 (12:33 -0700)]
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
commit
4855c92dbb7b3b85c23e88ab7ca04f99b9677b41 upstream.
When run raidconfig from Dom0 we found that the Xen DMA heap is reduced,
but Dom Heap is increased by the same size. Tracing raidconfig we found
that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
to apply memory. If the memory allocated by Dom0 is not in the DMA area,
it will exchange memory with Xen to meet the requiment. Later drivers
call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
it prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.
This issue introduced by commit
6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".
Signed-off-by: Joe Jin <joe.jin@oracle.com>
Tested-by: John Sobecki <john.sobecki@oracle.com>
Reviewed-by: Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sudip Mukherjee [Sat, 19 May 2018 21:29:36 +0000 (22:29 +0100)]
libata: blacklist Micron 500IT SSD with MU01 firmware
commit
136d769e0b3475d71350aa3648a116a6ee7a8f6c upstream.
While whitelisting Micron M500DC drives, the tweaked blacklist entry
enabled queued TRIM from M500IT variants also. But these do not support
queued TRIM. And while using those SSDs with the latest kernel we have
seen errors and even the partition table getting corrupted.
Some part from the dmesg:
[ 6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
[ 6.727390] ata1.00:
117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 6.741026] ata1.00: supports DRM functions and may not be fully accessible
[ 6.759887] ata1.00: configured for UDMA/133
[ 6.762256] scsi 0:0:0:0: Direct-Access ATA Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
and then for the error:
[ 120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
[ 120.860338] ata1.00: irq_stat 0x40000008
[ 120.860342] ata1.00: failed command: SEND FPDMA QUEUED
[ 120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
[ 120.860353] ata1.00: status: { DRDY }
[ 120.860543] ata1: hard resetting link
[ 121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 121.166376] ata1.00: supports DRM functions and may not be fully accessible
[ 121.186238] ata1.00: supports DRM functions and may not be fully accessible
[ 121.204445] ata1.00: configured for UDMA/133
[ 121.204454] ata1.00: device reported invalid CHS sector 0
[ 121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[ 121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
[ 121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
[ 121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
[ 121.204559] print_req_error: I/O error, dev sda, sector 272512
After few reboots with these errors, and the SSD is corrupted.
After blacklisting it, the errors are not seen and the SSD does not get
corrupted any more.
Fixes: 243918be6393 ("libata: Do not blacklist Micron M500DC")
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Tue, 8 May 2018 21:21:56 +0000 (14:21 -0700)]
libata: Blacklist some Sandisk SSDs for NCQ
commit
322579dcc865b94b47345ad1b6002ad167f85405 upstream.
Sandisk SSDs SD7SN6S256G and SD8SN8U256G are regularly locking up
regularly under sustained moderate load with NCQ enabled. Blacklist
for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mika Westerberg [Thu, 24 May 2018 08:12:16 +0000 (11:12 +0300)]
ahci: Add PCI ID for Cannon Lake PCH-LP AHCI
commit
4544e403eb25552aed7f0ee181a7a506b8800403 upstream.
This one should be using the default LPM policy for mobile chipsets so
add the PCI ID to the driver list of supported revices.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Corneliu Doban [Fri, 18 May 2018 22:03:57 +0000 (15:03 -0700)]
mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
commit
3de06d5a1f05c11c94cbb68af14dbfa7fb81d78b upstream.
The SDHCI_QUIRK2_HOST_OFF_CARD_ON is needed for the driver to
properly reset the host controller (reset all) on initialization
after exiting deep sleep.
Signed-off-by: Corneliu Doban <corneliu.doban@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Srinath Mannam <srinath.mannam@broadcom.com>
Fixes: c833e92bbb60 ("mmc: sdhci-iproc: support standard byte register accesses")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Corneliu Doban [Fri, 18 May 2018 22:03:56 +0000 (15:03 -0700)]
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
commit
5f651b870485ee60f5abbbd85195a6852978894a upstream.
When the host controller accepts only 32bit writes, the value of the
16bit TRANSFER_MODE register, that has the same 32bit address as the
16bit COMMAND register, needs to be saved and it will be written
in a 32bit write together with the command as this will trigger the
host to send the command on the SD interface.
When sending the tuning command, TRANSFER_MODE is written and then
sdhci_set_transfer_mode reads it back to clear AUTO_CMD12 bit and
write it again resulting in wrong value to be written because the
initial write value was saved in a shadow and the read-back returned
a wrong value, from the register.
Fix sdhci_iproc_readw to return the saved value of TRANSFER_MODE
when a saved value exist.
Same fix for read of BLOCK_SIZE and BLOCK_COUNT registers, that are
saved for a different reason, although a scenario that will cause the
mentioned problem on this registers is not probable.
Fixes: b580c52d58d9 ("mmc: sdhci-iproc: add IPROC SDHCI driver")
Signed-off-by: Corneliu Doban <corneliu.doban@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Srinath Mannam [Fri, 18 May 2018 22:03:55 +0000 (15:03 -0700)]
mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
commit
4c94238f37af87a2165c3fb491b4a8b50e90649c upstream.
Remove hard coded mmc cap 1.8v from platform data as it is board specific.
The 1.8v DDR mmc caps can be enabled using DTS property for those
boards that support it.
Fixes: b17b4ab8ce38 ("mmc: sdhci-iproc: define MMC caps in platform data")
Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathieu Malaterre [Wed, 16 May 2018 19:20:20 +0000 (21:20 +0200)]
mmc: block: propagate correct returned value in mmc_rpmb_ioctl
commit
b25b750df99bcba29317d3f9d9f93c4ec58890e6 upstream.
In commit
97548575bef3 ("mmc: block: Convert RPMB to a character device") a
new function `mmc_rpmb_ioctl` was added. The final return is simply
returning a value of `0` instead of propagating the correct return code.
Discovered during a compilation with W=1, silence the following gcc warning
drivers/mmc/core/block.c:2470:6: warning: variable ‘ret’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: 97548575bef3 ("mmc: block: Convert RPMB to a character device")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Fri, 4 May 2018 12:23:01 +0000 (08:23 -0400)]
do d_instantiate/unlock_new_inode combinations safely
commit
1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex. Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.
Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode(). All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.
Cc: stable@kernel.org # 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Hutchings [Thu, 17 May 2018 21:34:39 +0000 (22:34 +0100)]
ALSA: timer: Fix pause event notification
commit
3ae180972564846e6d794e3615e1ab0a1e6c4ef9 upstream.
Commit
f65e0d299807 ("ALSA: timer: Call notifier in the same spinlock")
combined the start/continue and stop/pause functions, and in doing so
changed the event code for the pause case to SNDRV_TIMER_EVENT_CONTINUE.
Change it back to SNDRV_TIMER_EVENT_PAUSE.
Fixes: f65e0d299807 ("ALSA: timer: Call notifier in the same spinlock")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Omar Sandoval [Tue, 22 May 2018 16:47:58 +0000 (09:47 -0700)]
Btrfs: fix error handling in btrfs_truncate()
commit
d50147381aa0c9725d63a677c138c47f55d6d3bc upstream.
Jun Wu at Facebook reported that an internal service was seeing a return
value of 1 from ftruncate() on Btrfs in some cases. This is coming from
the NEED_TRUNCATE_BLOCK return value from btrfs_truncate_inode_items().
btrfs_truncate() uses two variables for error handling, ret and err.
When btrfs_truncate_inode_items() returns non-zero, we set err to the
return value. However, NEED_TRUNCATE_BLOCK is not an error. Make sure we
only set err if ret is an error (i.e., negative).
To reproduce the issue: mount a filesystem with -o compress-force=zstd
and the following program will encounter return value of 1 from
ftruncate:
int main(void) {
char buf[256] = { 0 };
int ret;
int fd;
fd = open("test", O_CREAT | O_WRONLY | O_TRUNC, 0666);
if (fd == -1) {
perror("open");
return EXIT_FAILURE;
}
if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
perror("write");
close(fd);
return EXIT_FAILURE;
}
if (fsync(fd) == -1) {
perror("fsync");
close(fd);
return EXIT_FAILURE;
}
ret = ftruncate(fd, 128);
if (ret) {
printf("ftruncate() returned %d\n", ret);
close(fd);
return EXIT_FAILURE;
}
close(fd);
return EXIT_SUCCESS;
}
Fixes: ddfae63cc8e0 ("btrfs: move btrfs_truncate_block out of trans handle")
CC: stable@vger.kernel.org # 4.15+
Reported-by: Jun Wu <quark@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Sun, 20 May 2018 20:46:23 +0000 (16:46 -0400)]
aio: fix io_destroy(2) vs. lookup_ioctx() race
commit
baf10564fbb66ea222cae66fbff11c444590ffd9 upstream.
kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As the result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in
a6d7cff472e ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.
Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.
In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().
Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.
The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.
Cc: stable@kernel.org
Fixes: a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dave Chinner [Fri, 11 May 2018 01:20:57 +0000 (11:20 +1000)]
fs: don't scan the inode cache before SB_BORN is set
commit
79f546a696bff2590169fb5684e23d65f4d9f591 upstream.
We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land. It produces
an oops down this path during the failed mount:
radix_tree_gang_lookup_tag+0xc4/0x130
xfs_perag_get_tag+0x37/0xf0
xfs_reclaim_inodes_count+0x32/0x40
xfs_fs_nr_cached_objects+0x11/0x20
super_cache_count+0x35/0xc0
shrink_slab.part.66+0xb1/0x370
shrink_node+0x7e/0x1a0
try_to_free_pages+0x199/0x470
__alloc_pages_slowpath+0x3a1/0xd20
__alloc_pages_nodemask+0x1c3/0x200
cache_grow_begin+0x20b/0x2e0
fallback_alloc+0x160/0x200
kmem_cache_alloc+0x111/0x4e0
The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does it's setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.
To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.
Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Wed, 25 Apr 2018 14:28:38 +0000 (10:28 -0400)]
fix breakage caused by d_find_alias() semantics change
commit
b127125d9db23e4856156a7c909a3c8e18b69f99 upstream.
"VFS: don't keep disconnected dentries on d_anon" had a non-trivial
side-effect - d_unhashed() now returns true for those dentries,
making d_find_alias() skip them altogether. For most of its callers
that's fine - we really want a connected alias there. However,
there is a codepath where we relied upon picking such aliases
if nothing else could be found - selinux delayed initialization
of contexts for inodes on already mounted filesystems used to
rely upon that.
Cc: stable@kernel.org # f1ee616214cb "VFS: don't keep disconnected dentries on d_anon"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Sun, 6 May 2018 16:15:20 +0000 (12:15 -0400)]
affs_lookup(): close a race with affs_remove_link()
commit
30da870ce4a4e007c901858a96e9e394a1daa74a upstream.
we unlock the directory hash too early - if we are looking at secondary
link and primary (in another directory) gets removed just as we unlock,
we could have the old primary moved in place of the secondary, leaving
us to look into freed entry (and leaving our dentry with ->d_fsdata
pointing to a freed entry).
Cc: stable@vger.kernel.org # 2.4.4+
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Mon, 14 May 2018 17:23:50 +0000 (18:23 +0100)]
KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
commit
ba3696e94d9d590d9a7e55f68e81c25dba515191 upstream.
Trivial fix to spelling mistake in debugfs_entries text.
Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kernel-janitors@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maciej W. Rozycki [Mon, 14 May 2018 15:49:43 +0000 (16:49 +0100)]
MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
commit
9a3a92ccfe3620743d4ae57c987dc8e9c5f88996 upstream.
Check the TIF_32BIT_FPREGS task setting of the tracee rather than the
tracer in determining the layout of floating-point general registers in
the floating-point context, correcting access to odd-numbered registers
for o32 tracees where the setting disagrees between the two processes.
Fixes: 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries")
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maciej W. Rozycki [Mon, 30 Apr 2018 14:56:47 +0000 (15:56 +0100)]
MIPS: ptrace: Expose FIR register through FP regset
commit
71e909c0cdad28a1df1fa14442929e68615dee45 upstream.
Correct commit
7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
and expose the FIR register using the unused 4 bytes at the end of the
NT_PRFPREG regset. Without that register included clients cannot use
the PTRACE_GETREGSET request to retrieve the complete FPU register set
and have to resort to one of the older interfaces, either PTRACE_PEEKUSR
or PTRACE_GETFPREGS, to retrieve the missing piece of data. Also the
register is irreversibly missing from core dumps.
This register is architecturally hardwired and read-only so the write
path does not matter. Ignore data supplied on writes then.
Fixes: 7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.13+
Patchwork: https://patchwork.linux-mips.org/patch/19273/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Cercueil [Wed, 28 Mar 2018 15:38:12 +0000 (17:38 +0200)]
MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4770
commit
c60128ce97674fd05adb8b5ae79eb6745a03192e upstream.
The debug definitions were missing for MACH_JZ4770, resulting in a build
failure when DEBUG_ZBOOT was set.
Since the UART addresses are the same across all Ingenic SoCs, we just
use a #ifdef CONFIG_MACH_INGENIC instead of checking for individual
Ingenic SoCs.
Additionally, I added a #define for the UART0 address in-code and
dropped the <asm/mach-jz4740/base.h> include, for the reason that this
include file is slowly being phased out as the whole platform is being
moved to devicetree.
Fixes: 9be5f3e92ed5 ("MIPS: ingenic: Initial JZ4770 support")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.16
Patchwork: https://patchwork.linux-mips.org/patch/18957/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Thu, 26 Apr 2018 23:28:34 +0000 (09:28 +1000)]
MIPS: c-r4k: Fix data corruption related to cache coherence
commit
55a2aa08b3af519a9693f99cdf7fa6d8b62d9f65 upstream.
When DMA will be performed to a MIPS32 1004K CPS, the L1-cache for the
range needs to be flushed and invalidated first.
The code currently takes one of two approaches.
1/ If the range is less than the size of the dcache, then HIT type
requests flush/invalidate cache lines for the particular addresses.
HIT-type requests a globalised by the CPS so this is safe on SMP.
2/ If the range is larger than the size of dcache, then INDEX type
requests flush/invalidate the whole cache. INDEX type requests affect
the local cache only. CPS does not propagate them in any way. So this
invalidation is not safe on SMP CPS systems.
Data corruption due to '2' can quite easily be demonstrated by
repeatedly "echo 3 > /proc/sys/vm/drop_caches" and then sha1sum a file
that is several times the size of available memory. Dropping caches
means that large contiguous extents (large than dcache) are more likely.
This was not a problem before Linux-4.8 because option 2 was never used
if CONFIG_MIPS_CPS was defined. The commit which removed that apparently
didn't appreciate the full consequence of the change.
We could, in theory, globalize the INDEX based flush by sending an IPI
to other cores. These cache invalidation routines can be called with
interrupts disabled and synchronous IPI require interrupts to be
enabled. Asynchronous IPI may not trigger writeback soon enough. So we
cannot use IPI in practice.
We can already test if IPI would be needed for an INDEX operation with
r4k_op_needs_ipi(R4K_INDEX). If this is true then we mustn't try the
INDEX approach as we cannot use IPI. If this is false (e.g. when there
is only one core and hence one L1 cache) then it is safe to use the
INDEX approach without IPI.
This patch avoids options 2 if r4k_op_needs_ipi(R4K_INDEX), and so
eliminates the corruption.
Fixes: c00ab4896ed5 ("MIPS: Remove cpu_has_safe_index_cacheops")
Signed-off-by: NeilBrown <neil@brown.name>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.8+
Patchwork: https://patchwork.linux-mips.org/patch/19259/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Wed, 25 Apr 2018 21:10:36 +0000 (23:10 +0200)]
MIPS: xilfpga: Actually include FDT in fitImage
commit
947bc875116042d5375446aa29bc1073c2d38977 upstream.
Commit
b35565bb16a5 ("MIPS: generic: Add support for MIPSfpga") added
and its.S file for xilfpga but forgot to add it to
arch/mips/generic/Platform so it is never used.
Fixes: b35565bb16a5 ("MIPS: generic: Add support for MIPSfpga")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.15+
Patchwork: https://patchwork.linux-mips.org/patch/19245/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Wed, 25 Apr 2018 21:10:35 +0000 (23:10 +0200)]
MIPS: xilfpga: Stop generating useless dtb.o
commit
a5a92abbce56c41ff121db41a33b9c0a0ff39365 upstream.
A dtb.o is generated from nexys4ddr.dts but this is never used since it
has been moved to mips/generic with commit
b35565bb16a5 ("MIPS: generic:
Add support for MIPSfpga").
Fixes: b35565bb16a5 ("MIPS: generic: Add support for MIPSfpga")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.15+
Patchwork: https://patchwork.linux-mips.org/patch/19244/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 25 May 2018 14:46:20 +0000 (16:46 +0200)]
Linux 4.16.12
James Hogan [Tue, 16 Jan 2018 14:45:21 +0000 (14:45 +0000)]
rtc: goldfish: Add missing MODULE_LICENSE
[ Upstream commit
82d632b85eb89f97051530f556cb49ee1c04bde7 ]
Fix the following warning in MIPS allmodconfig by adding a
MODULE_LICENSE() at the end of rtc-goldfish.c, based on the file header
comment which says GNU General Public License version 2:
WARNING: modpost: missing MODULE_LICENSE() in drivers/rtc/rtc-goldfish.o
Fixes: f22d9cdcb5eb ("rtc: goldfish: Add RTC driver for Android emulator")
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Miodrag Dinic <miodrag.dinic@mips.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: linux-rtc@vger.kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Mon, 12 Feb 2018 22:47:49 +0000 (23:47 +0100)]
rtc: rp5c01: fix possible race condition
[ Upstream commit
bcdd559268039d8340d38fa58668393596e29fdc ]
The probe function is not allowed to fail after registering the RTC because
the following may happen:
CPU0: CPU1:
sys_load_module()
do_init_module()
do_one_initcall()
cmos_do_probe()
rtc_device_register()
__register_chrdev()
cdev->owner = struct module*
open("/dev/rtc0")
rtc_device_unregister()
module_put()
free_module()
module_free(mod->module_core)
/* struct module *module is now
freed */
chrdev_open()
spin_lock(cdev_lock)
cdev_get()
try_module_get()
module_is_live()
/* dereferences already
freed struct module* */
Switch to devm_rtc_allocate_device/rtc_register_device to register the rtc
as late as possible.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Thu, 15 Feb 2018 19:36:14 +0000 (19:36 +0000)]
rtc: tx4939: avoid unintended sign extension on a 24 bit shift
[ Upstream commit
347876ad47b9923ce26e686173bbf46581802ffa ]
The shifting of buf[5] by 24 bits to the left will be promoted to
a 32 bit signed int and then sign-extended to an unsigned long. If
the top bit of buf[5] is set then all then all the upper bits sec
end up as also being set because of the sign-extension. Fix this by
casting buf[5] to an unsigned long before the shift.
Detected by CoverityScan, CID#
1465292 ("Unintended sign extension")
Fixes: 0e1492330cd2 ("rtc: add rtc-tx4939 driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Sun, 25 Feb 2018 20:14:31 +0000 (21:14 +0100)]
rtc: m41t80: fix race conditions
[ Upstream commit
10d0c768cc6d581523d673b9d1b54213f8a5eb24 ]
The IRQ is requested before the struct rtc is allocated and registered, but
this struct is used in the IRQ handler, leading to:
Unable to handle kernel NULL pointer dereference at virtual address
0000017c
pgd =
a38a2f9b
[
0000017c] *pgd=
00000000
Internal error: Oops: 5 [#1] ARM
Modules linked in:
CPU: 0 PID: 613 Comm: irq/48-m41t80 Not tainted 4.16.0-rc1+ #42
Hardware name: Atmel SAMA5
PC is at mutex_lock+0x14/0x38
LR is at m41t80_handle_irq+0x1c/0x9c
pc : [<
c06e864c>] lr : [<
c04b70f0>] psr:
20000013
sp :
dec73f30 ip :
00000000 fp :
dec56d98
r10:
df437cf0 r9 :
c0a03008 r8 :
c0145ffc
r7 :
df5c4300 r6 :
dec568d0 r5 :
df593000 r4 :
0000017c
r3 :
df592800 r2 :
60000013 r1 :
df593000 r0 :
0000017c
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control:
10c53c7d Table:
20004059 DAC:
00000051
Process irq/48-m41t80 (pid: 613, stack limit = 0xb52d091e)
Stack: (0xdec73f30 to 0xdec74000)
3f20:
dec56840 df5c4300 00000001 df5c4300
3f40:
c0145ffc c0146018 dec56840 ffffe000 00000001 c0146290 dec567c0 00000000
3f60:
c0146084 ed7c9a62 c014615c dec56d80 dec567c0 00000000 dec72000 dec56840
3f80:
c014615c c012ffc0 dec72000 dec567c0 c012fe80 00000000 00000000 00000000
3fa0:
00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
3fc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0:
00000000 00000000 00000000 00000000 00000013 00000000 29282726 2d2c2b2a
[<
c06e864c>] (mutex_lock) from [<
c04b70f0>] (m41t80_handle_irq+0x1c/0x9c)
[<
c04b70f0>] (m41t80_handle_irq) from [<
c0146018>] (irq_thread_fn+0x1c/0x54)
[<
c0146018>] (irq_thread_fn) from [<
c0146290>] (irq_thread+0x134/0x1c0)
[<
c0146290>] (irq_thread) from [<
c012ffc0>] (kthread+0x140/0x148)
[<
c012ffc0>] (kthread) from [<
c01010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xdec73fb0 to 0xdec73ff8)
3fa0:
00000000 00000000 00000000 00000000
3fc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0:
00000000 00000000 00000000 00000000 00000013 00000000
Code:
e3c33d7f e3c3303f f5d0f000 e593300c (
e1901f9f)
---[ end trace
22b027302eb7c604 ]---
genirq: exiting task "irq/48-m41t80" (613) is an active IRQ thread (irq 48)
Also, there is another possible race condition. The probe function is not
allowed to fail after the RTC is registered because the following may
happen:
CPU0: CPU1:
sys_load_module()
do_init_module()
do_one_initcall()
cmos_do_probe()
rtc_device_register()
__register_chrdev()
cdev->owner = struct module*
open("/dev/rtc0")
rtc_device_unregister()
module_put()
free_module()
module_free(mod->module_core)
/* struct module *module is now
freed */
chrdev_open()
spin_lock(cdev_lock)
cdev_get()
try_module_get()
module_is_live()
/* dereferences already
freed struct module* */
Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
before requesting the IRQ and register it as late as possible.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Wed, 21 Feb 2018 10:57:05 +0000 (11:57 +0100)]
rtc: rk808: fix possible race condition
[ Upstream commit
201fac95e799c3d0304ec724d555e1251b9f6e84 ]
The probe function is not allowed to fail after registering the RTC because
the following may happen:
CPU0: CPU1:
sys_load_module()
do_init_module()
do_one_initcall()
cmos_do_probe()
rtc_device_register()
__register_chrdev()
cdev->owner = struct module*
open("/dev/rtc0")
rtc_device_unregister()
module_put()
free_module()
module_free(mod->module_core)
/* struct module *module is now
freed */
chrdev_open()
spin_lock(cdev_lock)
cdev_get()
try_module_get()
module_is_live()
/* dereferences already
freed struct module* */
Switch to devm_rtc_allocate_device/rtc_register_device to register the rtc
as late as possible.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Thu, 8 Mar 2018 22:27:31 +0000 (23:27 +0100)]
rtc: hctosys: Ensure system time doesn't overflow time_t
[ Upstream commit
b3a5ac42ab18b7d1a8f2f072ca0ee76a3b754a43 ]
On 32bit platforms, time_t is still a signed 32bit long. If it is
overflowed, userspace and the kernel cant agree on the current system time.
This causes multiple issues, in particular with systemd:
https://github.com/systemd/systemd/issues/1143
A good workaround is to simply avoid using hctosys which is something I
greatly encourage as the time is better set by userspace.
However, many distribution enable it and use systemd which is rendering the
system unusable in case the RTC holds a date after 2038 (and more so after
2106). Many drivers have workaround for this case and they should be
eliminated so there is only one place left to fix when userspace is able to
cope with dates after the 31bit overflow.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bryan O'Donoghue [Wed, 28 Mar 2018 19:14:05 +0000 (20:14 +0100)]
rtc: snvs: Fix usage of snvs_rtc_enable
[ Upstream commit
1485991c024603b2fb4ae77beb7a0d741128a48e ]
commit
179a502f8c46 ("rtc: snvs: add Freescale rtc-snvs driver") introduces
the SNVS RTC driver with a function snvs_rtc_enable().
snvs_rtc_enable() can return an error on the enable path however this
driver does not currently trap that failure on the probe() path and
consequently if enabling the RTC fails we encounter a later error spinning
forever in rtc_write_sync_lp().
[ 36.093481] [<
c010d630>] (__irq_svc) from [<
c0c2e9ec>] (_raw_spin_unlock_irqrestore+0x34/0x44)
[ 36.102122] [<
c0c2e9ec>] (_raw_spin_unlock_irqrestore) from [<
c072e32c>] (regmap_read+0x4c/0x5c)
[ 36.110938] [<
c072e32c>] (regmap_read) from [<
c085d0f4>] (rtc_write_sync_lp+0x6c/0x98)
[ 36.118881] [<
c085d0f4>] (rtc_write_sync_lp) from [<
c085d160>] (snvs_rtc_alarm_irq_enable+0x40/0x4c)
[ 36.128041] [<
c085d160>] (snvs_rtc_alarm_irq_enable) from [<
c08567b4>] (rtc_timer_do_work+0xd8/0x1a8)
[ 36.137291] [<
c08567b4>] (rtc_timer_do_work) from [<
c01441b8>] (process_one_work+0x28c/0x76c)
[ 36.145840] [<
c01441b8>] (process_one_work) from [<
c01446cc>] (worker_thread+0x34/0x58c)
[ 36.153961] [<
c01446cc>] (worker_thread) from [<
c014aee4>] (kthread+0x138/0x150)
[ 36.161388] [<
c014aee4>] (kthread) from [<
c0107e14>] (ret_from_fork+0x14/0x20)
[ 36.168635] rcu_sched kthread starved for 2602 jiffies! g496 c495 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
[ 36.178564] rcu_sched R running task 0 8 2 0x00000000
[ 36.185664] [<
c0c288b0>] (__schedule) from [<
c0c29134>] (schedule+0x3c/0xa0)
[ 36.192739] [<
c0c29134>] (schedule) from [<
c0c2db80>] (schedule_timeout+0x78/0x4e0)
[ 36.200422] [<
c0c2db80>] (schedule_timeout) from [<
c01a7ab0>] (rcu_gp_kthread+0x648/0x1864)
[ 36.208800] [<
c01a7ab0>] (rcu_gp_kthread) from [<
c014aee4>] (kthread+0x138/0x150)
[ 36.216309] [<
c014aee4>] (kthread) from [<
c0107e14>] (ret_from_fork+0x14/0x20)
This patch fixes by parsing the result of rtc_write_sync_lp() and
propagating both in the probe and elsewhere. If the RTC doesn't start we
don't proceed loading the driver and don't get into this loop mess later
on.
Fixes: 179a502f8c46 ("rtc: snvs: add Freescale rtc-snvs driver")
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>