Paul Gortmaker [Wed, 16 Jan 2013 21:55:31 +0000 (16:55 -0500)]
Linux 2.6.34.14
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Matthew Garrett [Wed, 6 Jul 2011 20:52:37 +0000 (16:52 -0400)]
x86: Don't use the EFI reboot method by default
commit
f70e957cda22d309c769805cbb932407a5232219 upstream.
Testing suggests that at least some Lenovos and some Intels will
fail to reboot via EFI, attempting to jump to an unmapped
physical address. In the long run we could handle this by
providing a page table with a 1:1 mapping of physical addresses,
but for now it's probably just easier to assume that ACPI or
legacy methods will be present and reboot via those.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[PG: in 2.6.34, file is x86/platform/efi/efi.c --> x86/kernel/efi.c]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Richard Weinberger [Mon, 23 May 2011 22:18:05 +0000 (00:18 +0200)]
x86: Get rid of asmregparm
commit
1b4ac2a935aaf194241a2f4165d6407ba9650e1a upstream.
As UML does no longer need asmregparm we can remove it.
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: namhyung@gmail.com
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/%3C1306189085-29896-1-git-send-email-richard%40nod.at%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Richard Weinberger [Mon, 23 May 2011 20:51:33 +0000 (22:51 +0200)]
um: Use RWSEM_GENERIC_SPINLOCK on x86
commit
3a3679078aed2c451ebc32836bbd3b8219a65e01 upstream.
Commit d12337 (rwsem: Remove redundant asmregparm annotation)
broke rwsem on UML.
As we cannot compile UML with -mregparm=3 and keeping asmregparm only
for UML is inadequate the easiest solution is using RWSEM_GENERIC_SPINLOCK.
Thanks to Thomas Gleixner for the idea.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: user-mode-linux-devel@lists.sourceforge.net
Link: http://lkml.kernel.org/r/%3C1306183893-26655-1-git-send-email-richard%40nod.at%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Thomas Gleixner [Wed, 26 Jan 2011 20:32:01 +0000 (21:32 +0100)]
rwsem: Remove redundant asmregparm annotation
commit
d123375425d7df4b6081a631fc1203fceafa59b2 upstream.
Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
compiling the kernel already with -mregparm=3. So the annotation of
the rwsem functions is redundant. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
LKML-Reference: <alpine.LFD.2.00.
1101262130450.31804@localhost6.localdomain6>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[PG: fixes compile errors when using newer gcc on 2.6.34 baseline]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Chanho Min [Thu, 5 Jan 2012 11:00:19 +0000 (20:00 +0900)]
sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW
commit
cb297a3e433dbdcf7ad81e0564e7b804c941ff0d upstream.
This issue happens under the following conditions:
1. preemption is off
2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
3. RT scheduling class
4. SMP system
Sequence is as follows:
1.suppose current task is A. start schedule()
2.task A is enqueued pushable task at the entry of schedule()
__schedule
prev = rq->curr;
...
put_prev_task
put_prev_task_rt
enqueue_pushable_task
4.pick the task B as next task.
next = pick_next_task(rq);
3.rq->curr set to task B and context_switch is started.
rq->curr = next;
4.At the entry of context_swtich, release this cpu's rq->lock.
context_switch
prepare_task_switch
prepare_lock_switch
raw_spin_unlock_irq(&rq->lock);
5.Shortly after rq->lock is released, interrupt is occurred and start IRQ context
6.try_to_wake_up() which called by ISR acquires rq->lock
try_to_wake_up
ttwu_remote
rq = __task_rq_lock(p)
ttwu_do_wakeup(rq, p, wake_flags);
task_woken_rt
7.push_rt_task picks the task A which is enqueued before.
task_woken_rt
push_rt_tasks(rq)
next_task = pick_next_pushable_task(rq)
8.At find_lock_lowest_rq(), If double_lock_balance() returns 0,
lowest_rq can be the remote rq.
(But,If preemption is on, double_lock_balance always return 1 and it
does't happen.)
push_rt_task
find_lock_lowest_rq
if (double_lock_balance(rq, lowest_rq))..
9.find_lock_lowest_rq return the available rq. task A is migrated to
the remote cpu/rq.
push_rt_task
...
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
activate_task(lowest_rq, next_task, 0);
10. But, task A is on irq context at this cpu.
So, task A is scheduled by two cpus at the same time until restore from IRQ.
Task A's stack is corrupted.
To fix it, don't migrate an RT task if it's still running.
Signed-off-by: Chanho Min <chanho.min@lge.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[PG: in 2.6.34, sched/rt.c is just sched_rt.c]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eric W. Biederman [Sat, 29 Jan 2011 14:57:22 +0000 (14:57 +0000)]
net: Fix ip link add netns oops
commit
13ad17745c2cbd437d9e24b2d97393e0be11c439 upstream.
Ed Swierk <eswierk@bigswitch.com> writes:
> On 2.6.35.7
> ip link add link eth0 netns 9999 type macvlan
> where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang:
> [10663.821898] BUG: unable to handle kernel NULL pointer dereference at
000000000000006d
> [10663.821917] IP: [<
ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
> [10663.821933] PGD
1d3927067 PUD
22f5c5067 PMD 0
> [10663.821944] Oops: 0000 [#1] SMP
> [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
> [10663.821959] CPU 3
> [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class
> [10663.822155]
> [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO
> [10663.822167] RIP: 0010:[<
ffffffff8149c2fa>] [<
ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
> [10663.822177] RSP: 0018:
ffff88014aebf7b8 EFLAGS:
00010286
> [10663.822182] RAX:
00000000fffffff4 RBX:
ffff8801ad900800 RCX:
0000000000000000
> [10663.822187] RDX:
ffff880000000000 RSI:
0000000000000000 RDI:
ffff88014ad63000
> [10663.822191] RBP:
ffff88014aebf808 R08:
0000000000000041 R09:
0000000000000041
> [10663.822196] R10:
0000000000000000 R11:
dead000000200200 R12:
ffff88014aebf818
> [10663.822201] R13:
fffffffffffffffd R14:
ffff88014aebf918 R15:
ffff88014ad62000
> [10663.822207] FS:
00007f00c487f700(0000) GS:
ffff880001f80000(0000) knlGS:
0000000000000000
> [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
> [10663.822216] CR2:
000000000000006d CR3:
0000000231f19000 CR4:
00000000000026e0
> [10663.822221] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
> [10663.822226] DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
> [10663.822231] Process ip (pid: 6000, threadinfo
ffff88014aebe000, task
ffff88014afb16e0)
> [10663.822236] Stack:
> [10663.822240]
ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6
> [10663.822251] <0>
0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818
> [10663.822265] <0>
ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413
> [10663.822281] Call Trace:
> [10663.822290] [<
ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0
> [10663.822298] [<
ffffffff8149c413>] dev_alloc_name+0x43/0x90
> [10663.822307] [<
ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0
> [10663.822314] [<
ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570
> [10663.822321] [<
ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570
> [10663.822332] [<
ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20
> [10663.822339] [<
ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290
> [10663.822346] [<
ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290
> [10663.822354] [<
ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0
> [10663.822360] [<
ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40
> [10663.822367] [<
ffffffff814c223e>] netlink_unicast+0x2de/0x2f0
> [10663.822374] [<
ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0
> [10663.822383] [<
ffffffff81488533>] sock_sendmsg+0xf3/0x120
> [10663.822391] [<
ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
> [10663.822400] [<
ffffffff81168656>] ? __d_lookup+0x136/0x150
> [10663.822406] [<
ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
> [10663.822414] [<
ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80
> [10663.822422] [<
ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110
> [10663.822429] [<
ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70
> [10663.822435] [<
ffffffff81493308>] ? verify_iovec+0x88/0xe0
> [10663.822442] [<
ffffffff81489020>] sys_sendmsg+0x240/0x3a0
> [10663.822450] [<
ffffffff8111e2a9>] ? __do_fault+0x479/0x560
> [10663.822457] [<
ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
> [10663.822465] [<
ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150
> [10663.822473] [<
ffffffff8158d76e>] ? do_page_fault+0x15e/0x350
> [10663.822482] [<
ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
> [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55
> [10663.822618] RIP [<
ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
> [10663.822627] RSP <
ffff88014aebf7b8>
> [10663.822631] CR2:
000000000000006d
> [10663.822636] ---[ end trace
3dfd6c3ad5327ca7 ]---
This bug was introduced in:
commit
81adee47dfb608df3ad0b91d230fb3cef75f0060
Author: Eric W. Biederman <ebiederm@aristanetworks.com>
Date: Sun Nov 8 00:53:51 2009 -0800
net: Support specifying the network namespace upon device creation.
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.
We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call. To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.
In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Where apparently I forgot to add error handling to the path where we create
a new network device in a new network namespace, and pass in an invalid pid.
Reported-by: Ed Swierk <eswierk@bigswitch.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Will Deacon [Fri, 10 Aug 2012 14:22:09 +0000 (15:22 +0100)]
mutex: Place lock in contended state after fastpath_lock failure
commit
0bce9c46bf3b15f485d82d7e81dabed6ebcc24b1 upstream.
ARM recently moved to asm-generic/mutex-xchg.h for its mutex
implementation after the previous implementation was found to be missing
some crucial memory barriers. However, this has revealed some problems
running hackbench on SMP platforms due to the way in which the
MUTEX_SPIN_ON_OWNER code operates.
The symptoms are that a bunch of hackbench tasks are left waiting on an
unlocked mutex and therefore never get woken up to claim it. This boils
down to the following sequence of events:
Task A Task B Task C Lock value
0 1
1 lock() 0
2 lock() 0
3 spin(A) 0
4 unlock() 1
5 lock() 0
6 cmpxchg(1,0) 0
7 contended() -1
8 lock() 0
9 spin(C) 0
10 unlock() 1
11 cmpxchg(1,0) 0
12 unlock() 1
At this point, the lock is unlocked, but Task B is in an uninterruptible
sleep with nobody to wake it up.
This patch fixes the problem by ensuring we put the lock into the
contended state if we fail to acquire it on the fastpath, ensuring that
any blocked waiters are woken up when the mutex is released.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-6e9lrw2avczr0617fzl5vqb8@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Stanislaw Gruszka [Wed, 8 Aug 2012 09:27:15 +0000 (11:27 +0200)]
sched: fix divide by zero at {thread_group,task}_times
commit
bea6832cc8c4a0a9a65dd17da6aaa657fe27bc3e upstream.
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.
This problem can be triggered in practice on very long lived processes:
PID: 2331 TASK:
ffff880472814b00 CPU: 2 COMMAND: "oraagent.bin"
#0 [
ffff880472a51b70] machine_kexec at
ffffffff8103214b
#1 [
ffff880472a51bd0] crash_kexec at
ffffffff810b91c2
#2 [
ffff880472a51ca0] oops_end at
ffffffff814f0b00
#3 [
ffff880472a51cd0] die at
ffffffff8100f26b
#4 [
ffff880472a51d00] do_trap at
ffffffff814f03f4
#5 [
ffff880472a51d60] do_divide_error at
ffffffff8100cfff
#6 [
ffff880472a51e00] divide_error at
ffffffff8100be7b
[exception RIP: thread_group_times+0x56]
RIP:
ffffffff81056a16 RSP:
ffff880472a51eb8 RFLAGS:
00010046
RAX:
bc3572c9fe12d194 RBX:
ffff880874150800 RCX:
0000000110266fad
RDX:
0000000000000000 RSI:
ffff880472a51eb8 RDI:
001038ae7d9633dc
RBP:
ffff880472a51ef8 R8:
00000000b10a3a64 R9:
ffff880874150800
R10:
00007fcba27ab680 R11:
0000000000000202 R12:
ffff880472a51f08
R13:
ffff880472a51f10 R14:
0000000000000000 R15:
0000000000000007
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
#7 [
ffff880472a51f00] do_sys_times at
ffffffff8108845d
#8 [
ffff880472a51f40] sys_times at
ffffffff81088524
#9 [
ffff880472a51f80] system_call_fastpath at
ffffffff8100b0f2
RIP:
0000003808caac3a RSP:
00007fcba27ab6d8 RFLAGS:
00000202
RAX:
0000000000000064 RBX:
ffffffff8100b0f2 RCX:
0000000000000000
RDX:
00007fcba27ab6e0 RSI:
000000000076d58e RDI:
00007fcba27ab6e0
RBP:
00007fcba27ab700 R8:
0000000000000020 R9:
000000000000091b
R10:
00007fcba27ab680 R11:
0000000000000202 R12:
00007fff9ca41940
R13:
0000000000000000 R14:
00007fcba27ac9c0 R15:
00007fff9ca41940
ORIG_RAX:
0000000000000064 CS: 0033 SS: 002b
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[PG: sched/core.c is just sched.c in 2.6.34; also the do_div() on
__force u32 isn't explicitly seen since that is in v3.3-rc1~191^2~11]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Peter Zijlstra [Tue, 15 Mar 2011 13:37:10 +0000 (14:37 +0100)]
perf: Fix tear-down of inherited group events
commit
38b435b16c36b0d863efcf3f07b34a6fac9873fd upstream.
When destroying inherited events, we need to destroy groups too,
otherwise the event iteration in perf_event_exit_task_context() will
miss group siblings and we leak events with all the consequences.
Reported-and-tested-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1300196470.2203.61.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Peter Zijlstra [Thu, 27 May 2010 13:47:49 +0000 (15:47 +0200)]
perf_events: Fix races in group composition
commit
8a49542c0554af7d0073aac0ee73ee65b807ef34 upstream.
Group siblings don't pin each-other or the parent, so when we destroy
events we must make sure to clean up all cross referencing pointers.
In particular, for destruction of a group leader we must be able to
find all its siblings and remove their reference to it.
This means that detaching an event from its context must not detach it
from the group, otherwise we can end up failing to clear all pointers.
Solve this by clearly separating the attachment to a context and
attachment to a group, and keep the group composed until we destroy
the events.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jan Kiszka [Wed, 14 Dec 2011 18:25:13 +0000 (19:25 +0100)]
KVM: x86: Prevent starting PIT timers in the absence of irqchip support
commit
0924ab2cfa98b1ece26c033d696651fd62896c69 upstream.
User space may create the PIT and forgets about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000128
IP: [<
ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
[<
ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm]
[<
ffffffff81071431>] process_one_work+0x111/0x4d0
[<
ffffffff81071bb2>] worker_thread+0x152/0x340
[<
ffffffff81075c8e>] kthread+0x7e/0x90
[<
ffffffff815a4474>] kernel_thread_helper+0x4/0x10
Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jacek Luczak [Thu, 19 May 2011 09:55:13 +0000 (09:55 +0000)]
SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict()
commit
c182f90bc1f22ce5039b8722e45621d5f96862c2 upstream.
During the sctp_close() call, we do not use rcu primitives to
destroy the address list attached to the endpoint. At the same
time, we do the removal of addresses from this list before
attempting to remove the socket from the port hash
As a result, it is possible for another process to find the socket
in the port hash that is in the process of being closed. It then
proceeds to traverse the address list to find the conflict, only
to have that address list suddenly disappear without rcu() critical
section.
Fix issue by closing address list removal inside RCU critical
section.
Race can result in a kernel crash with general protection fault or
kernel NULL pointer dereference:
kernel: general protection fault: 0000 [#1] SMP
kernel: RIP: 0010:[<
ffffffffa02f3dde>] [<
ffffffffa02f3dde>] sctp_bind_addr_conflict+0x64/0x82 [sctp]
kernel: Call Trace:
kernel: [<
ffffffffa02f415f>] ? sctp_get_port_local+0x17b/0x2a3 [sctp]
kernel: [<
ffffffffa02f3d45>] ? sctp_bind_addr_match+0x33/0x68 [sctp]
kernel: [<
ffffffffa02f4416>] ? sctp_do_bind+0xd3/0x141 [sctp]
kernel: [<
ffffffffa02f5030>] ? sctp_bindx_add+0x4d/0x8e [sctp]
kernel: [<
ffffffffa02f5183>] ? sctp_setsockopt_bindx+0x112/0x4a4 [sctp]
kernel: [<
ffffffff81089e82>] ? generic_file_aio_write+0x7f/0x9b
kernel: [<
ffffffffa02f763e>] ? sctp_setsockopt+0x14f/0xfee [sctp]
kernel: [<
ffffffff810c11fb>] ? do_sync_write+0xab/0xeb
kernel: [<
ffffffff810e82ab>] ? fsnotify+0x239/0x282
kernel: [<
ffffffff810c2462>] ? alloc_file+0x18/0xb1
kernel: [<
ffffffff8134a0b1>] ? compat_sys_setsockopt+0x1a5/0x1d9
kernel: [<
ffffffff8134aaf1>] ? compat_sys_socketcall+0x143/0x1a4
kernel: [<
ffffffff810467dc>] ? sysenter_dispatch+0x7/0x32
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Thomas Graf [Thu, 7 Jul 2011 00:28:35 +0000 (00:28 +0000)]
sctp: Enforce retransmission limit during shutdown
commit
f8d9605243280f1870dd2c6c37a735b925c15f3c upstream.
When initiating a graceful shutdown while having data chunks
on the retransmission queue with a peer which is in zero
window mode the shutdown is never completed because the
retransmission error count is reset periodically by the
following two rules:
- Do not timeout association while doing zero window probe.
- Reset overall error count when a heartbeat request has
been acknowledged.
The graceful shutdown will wait for all outstanding TSN to
be acknowledged before sending the SHUTDOWN request. This
never happens due to the peer's zero window not acknowledging
the continuously retransmitted data chunks. Although the
error counter is incremented for each failed retransmission,
the receiving of the SACK announcing the zero window clears
the error count again immediately. Also heartbeat requests
continue to be sent periodically. The peer acknowledges these
requests causing the error counter to be reset as well.
This patch changes behaviour to only reset the overall error
counter for the above rules while not in shutdown. After
reaching the maximum number of retransmission attempts, the
T5 shutdown guard timer is scheduled to give the receiver
some additional time to recover. The timer is stopped as soon
as the receiver acknowledges any data.
The issue can be easily reproduced by establishing a sctp
association over the loopback device, constantly queueing
data at the sender while not reading any at the receiver.
Wait for the window to reach zero, then initiate a shutdown
by killing both processes simultaneously. The association
will never be freed and the chunks on the retransmission
queue will be retransmitted indefinitely.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Thomas Graf [Fri, 8 Jul 2011 04:37:46 +0000 (04:37 +0000)]
sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket
commit
cd4fcc704f30f2064ab30b5300d44d431e46db50 upstream.
Trigger user ABORT if application closes a socket which has data
queued on the socket receive queue or chunks waiting on the
reassembly or ordering queue as this would imply data being lost
which defeats the point of a graceful shutdown.
This behavior is already practiced in TCP.
We do not check the input queue because that would mean to parse
all chunks on it to look for unacknowledged data which seems too
much of an effort. Control chunks or duplicated chunks may also
be in the input queue and should not be stopping a graceful
shutdown.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Neil Horman [Mon, 16 Jul 2012 09:13:51 +0000 (09:13 +0000)]
sctp: Fix list corruption resulting from freeing an association on a list
commit
2eebc1e188e9e45886ee00662519849339884d6d upstream.
A few days ago Dave Jones reported this oops:
[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]
ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo
ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]
ffff880147c03a10
[22766.387140]
ffffffffa169f2b6
[22766.387140]
ffff88013ed95728
[22766.387143]
0000000000000002
[22766.387143]
0000000000000000
[22766.387143]
ffff880003fad062
[22766.387144]
ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145] <IRQ>
[22766.387150] [<
ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154] [<
ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [<
ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [<
ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [<
ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166] [<
ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [<
ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [<
ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [<
ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172] [<
ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174] [<
ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176] [<
ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178] [<
ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [<
ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [<
ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183] [<
ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185] [<
ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187] [<
ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218] [<
ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [<
ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [<
ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [<
ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270] [<
ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273] [<
ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275] [<
ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278] [<
ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279] [<
ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281] [<
ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283] [<
ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283] <EOI>
[22766.387284]
[22766.387285] [<
ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [<
ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP <
ffff880147c039b0>
[22766.387142]
ffffffffa16ab120
[22766.599537] ---[ end trace
3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.
Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance. What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table. the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.
I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete. That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.
I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.
I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising). Given the trace above
however, I think its likely that this is what we hit.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Wei Yongjun [Thu, 31 Mar 2011 23:42:55 +0000 (23:42 +0000)]
sctp: malloc enough room for asconf-ack chunk
commit
2cab86bee8e7f353e6ac8c15b3eb906643497644 upstream.
Sometime the ASCONF_ACK parameters can equal to the fourfold of
ASCONF parameters, this only happend in some special case:
ASCONF parameter is :
Unrecognized Parameter (4 bytes)
ASCONF_ACK parameter should be:
Error Cause Indication parameter (8 bytes header)
+ Error Cause (4 bytes header)
+ Unrecognized Parameter (4bytes)
Four 4bytes Unrecognized Parameters in ASCONF chunk will cause panic.
Pid: 0, comm: swapper Not tainted 2.6.38-next+ #22 Bochs Bochs
EIP: 0060:[<
c0717eae>] EFLAGS:
00010246 CPU: 0
EIP is at skb_put+0x60/0x70
EAX:
00000077 EBX:
c09060e2 ECX:
dec1dc30 EDX:
c09469c0
ESI:
00000000 EDI:
de3c8d40 EBP:
dec1dc58 ESP:
dec1dc2c
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=
dec1c000 task=
c09aef20 task.ti=
c0980000)
Stack:
c09469c0 e1894fa4 00000044 00000004 de3c8d00 de3c8d00 de3c8d44 de3c8d40
c09060e2 de25dd80 de3c8d40 dec1dc7c e1894fa4 dec1dcb0 00000040 00000004
00000000 00000800 00000004 00000004 dec1dce0 e1895a2b dec1dcb4 de25d960
Call Trace:
[<
e1894fa4>] ? sctp_addto_chunk+0x4e/0x89 [sctp]
[<
e1894fa4>] sctp_addto_chunk+0x4e/0x89 [sctp]
[<
e1895a2b>] sctp_process_asconf+0x32f/0x3d1 [sctp]
[<
e188d554>] sctp_sf_do_asconf+0xf8/0x173 [sctp]
[<
e1890b02>] sctp_do_sm+0xb8/0x159 [sctp]
[<
e18a2248>] ? sctp_cname+0x0/0x52 [sctp]
[<
e189392d>] sctp_assoc_bh_rcv+0xac/0xe3 [sctp]
[<
e1897d76>] sctp_inq_push+0x2d/0x30 [sctp]
[<
e18a21b2>] sctp_rcv+0x7a7/0x83d [sctp]
[<
c077a95c>] ? ipv4_confirm+0x118/0x125
[<
c073a970>] ? nf_iterate+0x34/0x62
[<
c074789d>] ? ip_local_deliver_finish+0x0/0x194
[<
c074789d>] ? ip_local_deliver_finish+0x0/0x194
[<
c0747992>] ip_local_deliver_finish+0xf5/0x194
[<
c074789d>] ? ip_local_deliver_finish+0x0/0x194
[<
c0747a6e>] NF_HOOK.clone.1+0x3d/0x44
[<
c0747ab3>] ip_local_deliver+0x3e/0x44
[<
c074789d>] ? ip_local_deliver_finish+0x0/0x194
[<
c074775c>] ip_rcv_finish+0x29f/0x2c7
[<
c07474bd>] ? ip_rcv_finish+0x0/0x2c7
[<
c0747a6e>] NF_HOOK.clone.1+0x3d/0x44
[<
c0747cae>] ip_rcv+0x1f5/0x233
[<
c07474bd>] ? ip_rcv_finish+0x0/0x2c7
[<
c071dce3>] __netif_receive_skb+0x310/0x336
[<
c07221f3>] netif_receive_skb+0x4b/0x51
[<
e0a4ed3d>] cp_rx_poll+0x1e7/0x29c [8139cp]
[<
c072275e>] net_rx_action+0x65/0x13a
[<
c0445a54>] __do_softirq+0xa1/0x149
[<
c04459b3>] ? __do_softirq+0x0/0x149
<IRQ>
[<
c0445891>] ? irq_exit+0x37/0x72
[<
c040a7e9>] ? do_IRQ+0x81/0x95
[<
c07b3670>] ? common_interrupt+0x30/0x38
[<
c0428058>] ? native_safe_halt+0xa/0xc
[<
c040f5d7>] ? default_idle+0x58/0x92
[<
c0408fb0>] ? cpu_idle+0x96/0xb2
[<
c0797989>] ? rest_init+0x5d/0x5f
[<
c09fd90c>] ? start_kernel+0x34b/0x350
[<
c09fd0cb>] ? i386_start_kernel+0xba/0xc1
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jan Kara [Tue, 10 Jul 2012 15:58:04 +0000 (17:58 +0200)]
udf: Improve table length check to avoid possible overflow
commit
57b9655d01ef057a523e810d29c37ac09b80eead upstream.
When a partition table length is corrupted to be close to 1 << 32, the
check for its length may overflow on 32-bit systems and we will think
the length is valid. Later on the kernel can crash trying to read beyond
end of buffer. Fix the check to avoid possible overflow.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jan Kara [Wed, 27 Jun 2012 18:20:22 +0000 (20:20 +0200)]
udf: Avoid run away loop when partition table length is corrupted
commit
adee11b2085bee90bd8f4f52123ffb07882d6256 upstream.
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.
Signed-off-by: Jan Kara <jack@suse.cz>
[PG: in 2.6.34 udf_err() is called udf_error()]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jan Kara [Wed, 27 Jun 2012 19:23:07 +0000 (21:23 +0200)]
udf: Fortify loading of sparing table
commit
1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream.
Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.
Signed-off-by: Jan Kara <jack@suse.cz>
[PG: in 2.6.34 udf_err() is called udf_error()]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Krzysztof Hałasa [Mon, 12 Dec 2011 13:51:00 +0000 (14:51 +0100)]
USB: cdc-acm: add IDs for Motorola H24 HSPA USB module.
commit
6abff5dc4d5a2c90e597137ce8987e7fd439259b upstream.
Add USB IDs for Motorola H24 HSPA USB module.
Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Andrea Arcangeli [Wed, 14 Dec 2011 02:41:15 +0000 (21:41 -0500)]
ext4: avoid hangs in ext4_da_should_update_i_disksize()
commit
ea51d132dbf9b00063169c1159bee253d9649224 upstream.
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.
gdb> bt
#0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
#2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
#3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
#4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
xffff88001e26be40) at mm/filemap.c:2600
#5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
zed out>, pos=<value optimized out>) at mm/filemap.c:2632
#6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
t fs/ext4/file.c:136
#7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
ppos=0xffff88001e26bf48) at fs/read_write.c:406
#8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
000, pos=0xffff88001e26bf48) at fs/read_write.c:435
#9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
4000) at fs/read_write.c:487
#10 <signal handler called>
#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
#12 0x0000000000000000 in ?? ()
gdb> print offset
$22 = 0xffffffffffffffff
gdb> print idx
$23 = 0xffffffff
gdb> print inode->i_blkbits
$24 = 0xc
gdb> up
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
2512 if (ext4_da_should_update_i_disksize(page, end)) {
gdb> print start
$25 = 0x0
gdb> print end
$26 = 0xffffffffffffffff
gdb> print pos
$27 = 0x108000
gdb> print new_i_size
$28 = 0x108000
gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
$29 = 0xd9000
gdb> down
2467 for (i = 0; i < idx; i++)
gdb> print i
$30 = 0xd44acbee
This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Robert Richter [Mon, 12 Dec 2011 23:40:35 +0000 (00:40 +0100)]
oprofile, x86: Fix nmi-unsafe callgraph support
commit
a0e3e70243f5b270bc3eca718f0a9fa5e6b8262e upstream.
Backport for stable kernel v2.6.32.y to v2.6.36.y.
Current oprofile's x86 callgraph support may trigger page faults
throwing the BUG_ON(in_nmi()) message below. This patch fixes this by
using the same nmi-safe copy-from-user code as in perf.
------------[ cut here ]------------
kernel BUG at .../arch/x86/kernel/traps.c:436!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:07:00.0/0000:08:04.0/net/eth0/broadcast
CPU 5
Modules linked in:
Pid: 8611, comm: opcontrol Not tainted
2.6.39-00007-gfe47ae7 #1 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<
ffffffff813e8e35>] [<
ffffffff813e8e35>] do_nmi+0x22/0x1ee
RSP: 0000:
ffff88042fd47f28 EFLAGS:
00010002
RAX:
ffff88042c0a7fd8 RBX:
0000000000000001 RCX:
00000000c0000101
RDX:
00000000ffff8804 RSI:
ffffffffffffffff RDI:
ffff88042fd47f58
RBP:
ffff88042fd47f48 R08:
0000000000000004 R09:
0000000000001484
R10:
0000000000000001 R11:
0000000000000000 R12:
ffff88042fd47f58
R13:
0000000000000000 R14:
ffff88042fd47d98 R15:
0000000000000020
FS:
00007fca25e56700(0000) GS:
ffff88042fd40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000074 CR3:
000000042d28b000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process opcontrol (pid: 8611, threadinfo
ffff88042c0a6000, task
ffff88042c532310)
Stack:
0000000000000000 0000000000000001 ffff88042c0a7fd8 0000000000000000
ffff88042fd47de8 ffffffff813e897a 0000000000000020 ffff88042fd47d98
0000000000000000 ffff88042c0a7fd8 ffff88042fd47de8 0000000000000074
Call Trace:
<NMI>
[<
ffffffff813e897a>] nmi+0x1a/0x20
[<
ffffffff813f08ab>] ? bad_to_user+0x25/0x771
<<EOE>>
Code: ff 59 5b 41 5c 41 5d c9 c3 55 65 48 8b 04 25 88 b5 00 00 48 89 e5 41 55 41 54 49 89 fc 53 48 83 ec 08 f6 80 47 e0 ff ff 04 74 04 <0f> 0b eb fe 81 80 44 e0 ff ff 00 00 01 04 65 ff 04 25 c4 0f 01
RIP [<
ffffffff813e8e35>] do_nmi+0x22/0x1ee
RSP <
ffff88042fd47f28>
---[ end trace
ed6752185092104b ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 8611, comm: opcontrol Tainted: G D
2.6.39-00007-gfe47ae7 #1
Call Trace:
<NMI> [<
ffffffff813e5e0a>] panic+0x8c/0x188
[<
ffffffff813e915c>] oops_end+0x81/0x8e
[<
ffffffff8100403d>] die+0x55/0x5e
[<
ffffffff813e8c45>] do_trap+0x11c/0x12b
[<
ffffffff810023c8>] do_invalid_op+0x91/0x9a
[<
ffffffff813e8e35>] ? do_nmi+0x22/0x1ee
[<
ffffffff8131e6fa>] ? oprofile_add_sample+0x83/0x95
[<
ffffffff81321670>] ? op_amd_check_ctrs+0x4f/0x2cf
[<
ffffffff813ee4d5>] invalid_op+0x15/0x20
[<
ffffffff813e8e35>] ? do_nmi+0x22/0x1ee
[<
ffffffff813e8e7a>] ? do_nmi+0x67/0x1ee
[<
ffffffff813e897a>] nmi+0x1a/0x20
[<
ffffffff813f08ab>] ? bad_to_user+0x25/0x771
<<EOE>>
Cc: John Lumby <johnlumby@hotmail.com>
Cc: Maynard Johnson <maynardj@us.ibm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Xiao Guangrong [Sun, 22 Aug 2010 11:08:57 +0000 (19:08 +0800)]
export __get_user_pages_fast() function
commit
45888a0c6edc305495b6bd72a30e66bc40b324c6 upstream.
This function is used by KVM to pin process's page in the atomic context.
Define the 'weak' function to avoid other architecture not support it
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Phillip Lougher [Wed, 2 Nov 2011 20:38:01 +0000 (13:38 -0700)]
hfs: fix hfs_find_init() sb->ext_tree NULL ptr oops
commit
434a964daa14b9db083ce20404a4a2add54d037a upstream.
Clement Lecigne reports a filesystem which causes a kernel oops in
hfs_find_init() trying to dereference sb->ext_tree which is NULL.
This proves to be because the filesystem has a corrupted MDB extent
record, where the extents file does not fit into the first three extents
in the file record (the first blocks).
In hfs_get_block() when looking up the blocks for the extent file
(HFS_EXT_CNID), it fails the first blocks special case, and falls
through to the extent code (which ultimately calls hfs_find_init())
which is in the process of being initialised.
Hfs avoids this scenario by always having the extents b-tree fitting
into the first blocks (the extents B-tree can't have overflow extents).
The fix is to check at mount time that the B-tree fits into first
blocks, i.e. fail if HFS_I(inode)->alloc_blocks >=
HFS_I(inode)->first_blocks
Note, the existing commit
47f365eb57573 ("hfs: fix oops on mount with
corrupted btree extent records") becomes subsumed into this as a special
case, but only for the extents B-tree (HFS_EXT_CNID), it is perfectly
acceptable for the catalog B-Tree file to grow beyond three extents,
with the remaining extent descriptors in the extents overfow.
This fixes CVE-2011-2203
Reported-by: Clement LECIGNE <clement.lecigne@netasq.com>
Signed-off-by: Phillip Lougher <plougher@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Linus Torvalds [Tue, 20 Sep 2011 00:04:37 +0000 (17:04 -0700)]
Make TASKSTATS require root access
commit
1a51410abe7d0ee4b1d112780f46df87d3621043 upstream.
Ok, this isn't optimal, since it means that 'iotop' needs admin
capabilities, and we may have to work on this some more. But at the
same time it is very much not acceptable to let anybody just read
anybody elses IO statistics quite at this level.
Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative
to checking the capabilities by hand.
Reported-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eryu Guan [Tue, 1 Nov 2011 23:04:59 +0000 (19:04 -0400)]
jbd/jbd2: validate sb->s_first in journal_get_superblock()
commit
8762202dd0d6e46854f786bdb6fb3780a1625efe upstream.
I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
image has s_first = 0 in journal superblock, and the 0 is passed to
journal->j_head in journal_reset(), then to blocknr in
cleanup_journal_tail(), in the end the J_ASSERT failed.
So validate s_first after reading journal superblock from disk in
journal_get_superblock() to ensure s_first is valid.
The following script could reproduce it:
fstype=ext3
blocksize=1024
img=$fstype.img
offset=0
found=0
magic="c0 3b 39 98"
dd if=/dev/zero of=$img bs=1M count=8
mkfs -t $fstype -b $blocksize -F $img
filesize=`stat -c %s $img`
while [ $offset -lt $filesize ]
do
if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
echo "Found journal: $offset"
found=1
break
fi
offset=`echo "$offset+$blocksize" | bc`
done
if [ $found -ne 1 ];then
echo "Magic \"$magic\" not found"
exit 1
fi
dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
mkdir -p ./mnt
mount -o loop $img ./mnt
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Linus Torvalds [Tue, 13 Dec 2011 06:06:55 +0000 (22:06 -0800)]
linux/log2.h: Fix rounddown_pow_of_two(1)
commit
13c07b0286d340275f2d97adf085cecda37ede37 upstream.
Exactly like roundup_pow_of_two(1), the rounddown version was buggy for
the case of a compile-time constant '1' argument. Probably because it
originated from the same code, sharing history with the roundup version
from before the bugfix (for that one, see commit
1a06a52ee1b0: "Fix
roundup_pow_of_two(1)").
However, unlike the roundup version, the fix for rounddown is to just
remove the broken special case entirely. It's simply not needed - the
generic code
1UL << ilog2(n)
does the right thing for the constant '1' argment too. The only reason
roundup needed that special case was because rounding up does so by
subtracting one from the argument (and then adding one to the result)
causing the obvious problems with "ilog2(0)".
But rounddown doesn't do any of that, since ilog2() naturally truncates
(ie "rounds down") to the right rounded down value. And without the
ilog2(0) case, there's no reason for the special case that had the wrong
value.
tl;dr: rounddown_pow_of_two(1) should be 1, not 0.
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tushar Gohad [Thu, 28 Jul 2011 10:36:20 +0000 (10:36 +0000)]
xfrm: Fix key lengths for rfc3686(ctr(aes))
commit
4203223a1aed862b4445fdcd260d6139603a51d9 upstream.
Fix the min and max bit lengths for AES-CTR (RFC3686) keys.
The number of bits in key spec is the key length (128/256)
plus 32 bits of nonce.
This change takes care of the "Invalid key length" errors
reported by setkey when specifying 288 bit keys for aes-ctr.
Signed-off-by: Tushar Gohad <tgohad@mvista.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tejun Heo [Fri, 18 Nov 2011 18:55:35 +0000 (10:55 -0800)]
percpu: fix chunk range calculation
commit
a855b84c3d8c73220d4d3cd392a7bee7c83de70e upstream.
Percpu allocator recorded the cpus which map to the first and last
units in pcpu_first/last_unit_cpu respectively and used them to
determine the address range of a chunk - e.g. it assumed that the
first unit has the lowest address in a chunk while the last unit has
the highest address.
This simply isn't true. Groups in a chunk can have arbitrary positive
or negative offsets from the previous one and there is no guarantee
that the first unit occupies the lowest offset while the last one the
highest.
Fix it by actually comparing unit offsets to determine cpus occupying
the lowest and highest offsets. Also, rename pcu_first/last_unit_cpu
to pcpu_low/high_unit_cpu to avoid confusion.
The chunk address range is used to flush cache on vmalloc area
map/unmap and decide whether a given address is in the first chunk by
per_cpu_ptr_to_phys() and the bug was discovered by invalid
per_cpu_ptr_to_phys() translation for crash_note.
Kudos to Dave Young for tracking down the problem.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
LKML-Reference: <
4EC21F67.10905@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tejun Heo [Fri, 18 Jun 2010 09:44:31 +0000 (11:44 +0200)]
percpu: fix first chunk match in per_cpu_ptr_to_phys()
commit
9983b6f0cf8263e51bcf4c8a9dc0c1ef175b3c60 upstream.
per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
to the first_chunk or not by just matching the address against the
address range of the base unit (unit0, used by cpu0). When an adress
from another cpu was passed in, it will always determine that the
address doesn't belong to the first chunk even when it does. This
makes the function return a bogus physical address which may lead to
crash.
This problem was discovered by Cliff Wickman while investigating a
crash during kdump on a SGI UV system.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Cliff Wickman <cpw@sgi.com>
Tested-by: Cliff Wickman <cpw@sgi.com>
[PG: for 2.6.34, diffstat differs slightly due to a trivial indenting
difference, and 34 does not have the _maybe_unused annotation to delete]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Robert Richter [Thu, 26 May 2011 16:39:35 +0000 (18:39 +0200)]
oprofile: Fix locking dependency in sync_start()
commit
130c5ce716c9bfd1c2a2ec840a746eb7ff9ce1e6 upstream.
This fixes the A->B/B->A locking dependency, see the warning below.
The function task_exit_notify() is called with (task_exit_notifier)
.rwsem set and then calls sync_buffer() which locks buffer_mutex. In
sync_start() the buffer_mutex was set to prevent notifier functions to
be started before sync_start() is finished. But when registering the
notifier, (task_exit_notifier).rwsem is locked too, but now in
different order than in sync_buffer(). In theory this causes a locking
dependency, what does not occur in practice since task_exit_notify()
is always called after the notifier is registered which means the lock
is already released.
However, after checking the notifier functions it turned out the
buffer_mutex in sync_start() is unnecessary. This is because
sync_buffer() may be called from the notifiers even if sync_start()
did not finish yet, the buffers are already allocated but empty. No
need to protect this with the mutex.
So we fix this theoretical locking dependency by removing buffer_mutex
in sync_start(). This is similar to the implementation before commit:
750d857 oprofile: fix crash when accessing freed task structs
which introduced the locking dependency.
Lockdep warning:
oprofiled/4447 is trying to acquire lock:
(buffer_mutex){+.+...}, at: [<
ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
but task is already holding lock:
((task_exit_notifier).rwsem){++++..}, at: [<
ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 ((task_exit_notifier).rwsem){++++..}:
[<
ffffffff8106557f>] lock_acquire+0xf8/0x11e
[<
ffffffff81463a2b>] down_write+0x44/0x67
[<
ffffffff810581c0>] blocking_notifier_chain_register+0x52/0x8b
[<
ffffffff8105a6ac>] profile_event_register+0x2d/0x2f
[<
ffffffffa00013c1>] sync_start+0x47/0xc6 [oprofile]
[<
ffffffffa00001bb>] oprofile_setup+0x60/0xa5 [oprofile]
[<
ffffffffa00014e3>] event_buffer_open+0x59/0x8c [oprofile]
[<
ffffffff810cd3b9>] __dentry_open+0x1eb/0x308
[<
ffffffff810cd59d>] nameidata_to_filp+0x60/0x67
[<
ffffffff810daad6>] do_last+0x5be/0x6b2
[<
ffffffff810dbc33>] path_openat+0xc7/0x360
[<
ffffffff810dbfc5>] do_filp_open+0x3d/0x8c
[<
ffffffff810ccfd2>] do_sys_open+0x110/0x1a9
[<
ffffffff810cd09e>] sys_open+0x20/0x22
[<
ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
-> #0 (buffer_mutex){+.+...}:
[<
ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
[<
ffffffff8106557f>] lock_acquire+0xf8/0x11e
[<
ffffffff814634f0>] mutex_lock_nested+0x63/0x309
[<
ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
[<
ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
[<
ffffffff81467b96>] notifier_call_chain+0x37/0x63
[<
ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
[<
ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
[<
ffffffff8105a718>] profile_task_exit+0x1a/0x1c
[<
ffffffff81039e8f>] do_exit+0x2a/0x6fc
[<
ffffffff8103a5e4>] do_group_exit+0x83/0xae
[<
ffffffff8103a626>] sys_exit_group+0x17/0x1b
[<
ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
1 lock held by oprofiled/4447:
#0: ((task_exit_notifier).rwsem){++++..}, at: [<
ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67
stack backtrace:
Pid: 4447, comm: oprofiled Not tainted
2.6.39-00007-gcf4d8d4 #10
Call Trace:
[<
ffffffff81063193>] print_circular_bug+0xae/0xbc
[<
ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
[<
ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
[<
ffffffff8106557f>] lock_acquire+0xf8/0x11e
[<
ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
[<
ffffffff81062627>] ? mark_lock+0x42f/0x552
[<
ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
[<
ffffffff814634f0>] mutex_lock_nested+0x63/0x309
[<
ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
[<
ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
[<
ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67
[<
ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67
[<
ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
[<
ffffffff81467b96>] notifier_call_chain+0x37/0x63
[<
ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
[<
ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
[<
ffffffff8105a718>] profile_task_exit+0x1a/0x1c
[<
ffffffff81039e8f>] do_exit+0x2a/0x6fc
[<
ffffffff81465031>] ? retint_swapgs+0xe/0x13
[<
ffffffff8103a5e4>] do_group_exit+0x83/0xae
[<
ffffffff8103a626>] sys_exit_group+0x17/0x1b
[<
ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Carl Love <carll@us.ibm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Robert Richter [Thu, 26 May 2011 16:22:54 +0000 (18:22 +0200)]
oprofile: Free potentially owned tasks in case of errors
commit
6ac6519b93065625119a347be1cbcc1b89edb773 upstream.
After registering the task free notifier we possibly have tasks in our
dying_tasks list. Free them after unregistering the notifier in case
of an error.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Hans Verkuil [Mon, 14 Nov 2011 18:20:49 +0000 (19:20 +0100)]
ARM: davinci: dm646x evm: wrong register used in setup_vpif_input_channel_mode
commit
83713fc9373be2e943f82e9d36213708c6b0050e upstream.
The function setup_vpif_input_channel_mode() used the VSCLKDIS register
instead of VIDCLKCTL. This meant that when in HD mode videoport channel 0
used a different clock from channel 1.
Clearly a copy-and-paste error.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Manjunath Hadli <manjunath.hadli@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Takashi Iwai [Fri, 2 Dec 2011 14:29:12 +0000 (15:29 +0100)]
ALSA: hda/realtek - Fix Oops in alc_mux_select()
commit
cce4aa378a049f4275416ee6302dd24f37b289df upstream.
When no imux is available (e.g. a single capture source),
alc_auto_init_input_src() may trigger an Oops due to the access to -1.
Add a proper zero-check to avoid it.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[PG: in mainline,
21268961d3 rewrites and creates alc_mux_select, but the
code that needed the check still existed prior to that in alc_mux_enum_put]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
David Dillow [Fri, 2 Dec 2011 04:26:53 +0000 (23:26 -0500)]
ALSA: sis7019 - give slow codecs more time to reset
commit
fc084e0b930d546872ab23667052499f7daf0fed upstream.
There are some AC97 codec and board combinations that have been observed
to take a very long time to respond after the cold reset has completed.
In one case, more than 350 ms was required. To allow users to have sound
on those platforms, we'll wait up to 500ms for the codec to become
ready.
As a board may have multiple codecs, with some faster than others to
reset, we add a module parameter to inform the driver which codecs
should be present.
Reported-by: KotCzarny <tjosko@yahoo.com>
Signed-off-by: David Dillow <dave@thedillows.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Thomas Gleixner [Fri, 2 Dec 2011 11:34:16 +0000 (12:34 +0100)]
tick-broadcast: Stop active broadcast device when replacing it
commit
c1be84309c58b1e7c6d626e28fba41a22b364c3d upstream.
When a better rated broadcast device is installed, then the current
active device is not disabled, which results in two running broadcast
devices.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ido Yariv [Thu, 1 Dec 2011 11:55:08 +0000 (13:55 +0200)]
genirq: Fix race condition when stopping the irq thread
commit
550acb19269d65f32e9ac4ddb26c2b2070e37f1c upstream.
In irq_wait_for_interrupt(), the should_stop member is verified before
setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
In case kthread_stop sets should_stop and wakes up the process after
should_stop is checked by the irq thread but before the task's state
is changed, the irq thread might never exit:
kthread_stop irq_wait_for_interrupt
------------ ----------------------
...
... while (!kthread_should_stop()) {
kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);
...
set_current_state(TASK_INTERRUPTIBLE);
...
schedule();
}
Fix this by checking if the thread should stop after modifying the
task's state.
[ tglx: Simplified it a bit ]
Signed-off-by: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Robert Richter [Mon, 10 Oct 2011 14:21:10 +0000 (16:21 +0200)]
oprofile, x86: Fix crash when unloading module (nmi timer mode)
commit
97f7f8189fe54e3cfe324ef9ad35064f3d2d3bff upstream.
If oprofile uses the nmi timer interrupt there is a crash while
unloading the module. The bug can be triggered with oprofile build as
module and kernel parameter nolapic set. This patch fixes this.
oprofile: using NMI timer interrupt.
BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
IP: [<
ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
PGD
42dbca067 PUD
41da6a067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
CPU 5
Modules linked in: oprofile(-) [last unloaded: oprofile]
Pid: 2518, comm: modprobe Not tainted
3.1.0-rc7-00019-gb2fb49d #19 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<
ffffffff8123c226>] [<
ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
RSP: 0018:
ffff88041ef71e98 EFLAGS:
00010296
RAX:
0000000000000000 RBX:
ffffffffa0017100 RCX:
dead000000200200
RDX:
0000000000000000 RSI:
dead000000100100 RDI:
ffffffff8178c620
RBP:
ffff88041ef71ea8 R08:
0000000000000001 R09:
0000000000000082
R10:
0000000000000000 R11:
ffff88041ef71de8 R12:
0000000000000080
R13:
fffffffffffffff5 R14:
0000000000000001 R15:
0000000000610210
FS:
00007fc902f20700(0000) GS:
ffff88042fd40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
0000000000000008 CR3:
000000041cdb6000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process modprobe (pid: 2518, threadinfo
ffff88041ef70000, task
ffff88041d348040)
Stack:
ffff88041ef71eb8 ffffffffa0017790 ffff88041ef71eb8 ffffffffa0013532
ffff88041ef71ec8 ffffffffa00132d6 ffff88041ef71ed8 ffffffffa00159b2
ffff88041ef71f78 ffffffff81073115 656c69666f72706f 0000000000610200
Call Trace:
[<
ffffffffa0013532>] op_nmi_exit+0x15/0x17 [oprofile]
[<
ffffffffa00132d6>] oprofile_arch_exit+0xe/0x10 [oprofile]
[<
ffffffffa00159b2>] oprofile_exit+0x1e/0x20 [oprofile]
[<
ffffffff81073115>] sys_delete_module+0x1c3/0x22f
[<
ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<
ffffffff8148070b>] system_call_fastpath+0x16/0x1b
Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
RIP [<
ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
RSP <
ffff88041ef71e98>
CR2:
0000000000000008
---[ end trace
43a541a52956b7b0 ]---
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Bjorn Helgaas [Sun, 25 Sep 2011 21:29:00 +0000 (15:29 -0600)]
x86/mpparse: Account for bus types other than ISA and PCI
commit
9e6866686bdf2dcf3aeb0838076237ede532dcc8 upstream.
In commit
f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit
and 64-bit versions of MP_bus_info were rearranged to match each
other better. Unfortunately it introduced a regression: prior
to that change we used to always set the mp_bus_not_pci bit,
then clear it if we found a PCI bus. After it, we set
mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave
it alone otherwise.
In the cases of ISA and PCI, there's not much difference. But
ISA is not the only non-PCI bus, so it's better to always set
mp_bus_not_pci and clear it only for PCI.
Without this change, Dan's Dell PowerEdge 4200 panics on boot
with a log indicating interrupt routing trouble unless the
"noapic" option is supplied. With this change, the machine
boots reliably without "noapic".
Fixes http://bugs.debian.org/586494
Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dan McGrath <troubledaemon@gmail.com>
Cc: Alexey Starikovskiy <aystarik@gmail.com>
[jrnieder@gmail.com: clarified commit message]
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Salman Qazi [Tue, 15 Nov 2011 22:12:06 +0000 (14:12 -0800)]
sched, x86: Avoid unnecessary overflow in sched_clock
commit
4cecf6d401a01d054afc1e5f605bcbfe553cb9b9 upstream.
(Added the missing signed-off-by line)
In hundreds of days, the __cycles_2_ns calculation in sched_clock
has an overflow. cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.
We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.
Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Hannes Reinecke [Wed, 9 Nov 2011 07:39:24 +0000 (08:39 +0100)]
Silencing 'killing requests for dead queue'
commit
745718132c3c7cac98a622b610e239dcd5217f71 upstream.
When we tear down a device we try to flush all outstanding
commands in scsi_free_queue(). However the check in
scsi_request_fn() is imperfect as it only signals that
we _might start_ aborting commands, not that we've actually
aborted some.
So move the printk inside the scsi_kill_request function,
this will also give us a hint about which commands are aborted.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Qinglin Ye [Wed, 23 Nov 2011 15:39:32 +0000 (23:39 +0800)]
USB: usb-storage: unusual_devs entry for Kingston DT 101 G2
commit
cec28a5428793b6bc64e56687fb239759d6da74e upstream.
Kingston DT 101 G2 replies a wrong tag while transporting, add an
unusal_devs entry to ignore the tag validation.
Signed-off-by: Qinglin Ye <yestyle@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Veli-Pekka Peltola [Thu, 24 Nov 2011 20:08:56 +0000 (22:08 +0200)]
usb: option: add SIMCom SIM5218
commit
ec0cd94d881ca89cc9fb61d00d0f4b2b52e605b3 upstream.
Tested with SIM5218EVB-KIT evaluation kit.
Signed-off-by: Veli-Pekka Peltola <veli-pekka.peltola@bluegiga.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Marcin Kościelnicki [Wed, 30 Nov 2011 16:01:04 +0000 (17:01 +0100)]
usb: ftdi_sio: add PID for Propox ISPcable III
commit
307369b0ca06b27b511b61714e335ddfccf19c4f upstream.
Signed-off-by: Marcin Kościelnicki <koriakin@0x04.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Dan Carpenter [Tue, 22 Nov 2011 07:28:31 +0000 (10:28 +0300)]
USB: whci-hcd: fix endian conversion in qset_clear()
commit
8746c83d538cab273d335acb2be226d096f4a5af upstream.
qset->qh.link is an __le64 field and we should be using cpu_to_le64()
to fill it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Federico Vaga [Sat, 29 Oct 2011 07:47:39 +0000 (09:47 +0200)]
Staging: comedi: fix signal handling in read and write
commit
6a9ce6b654e491981f6ef7e214cbd4f63e033848 upstream.
After sleeping on a wait queue, signal_pending(current) should be
checked (not before sleeping).
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Bernd Porr [Tue, 8 Nov 2011 21:23:03 +0000 (21:23 +0000)]
staging: comedi: fix oops for USB DAQ devices.
commit
3ffab428f40849ed5f21bcfd7285bdef7902f9ca upstream.
This fixes kernel oops when an USB DAQ device is plugged out while it's
communicating with the userspace software.
Signed-off-by: Bernd Porr <berndporr@f2s.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Bart Westgeest [Tue, 1 Nov 2011 19:01:28 +0000 (15:01 -0400)]
staging: usbip: bugfix for deadlock
commit
438957f8d4a84daa7fa5be6978ad5897a2e9e5e5 upstream.
Interrupts must be disabled prior to calling usb_hcd_unlink_urb_from_ep.
If interrupts are not disabled, it can potentially lead to a deadlock.
The deadlock is readily reproduceable on a slower (ARM based) device
such as the TI Pandaboard.
Signed-off-by: Bart Westgeest <bart@elbrys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliad Peller [Thu, 24 Nov 2011 16:13:56 +0000 (18:13 +0200)]
nl80211: fix MAC address validation
commit
e007b857e88097c96c45620bf3b04a4e309053d1 upstream.
MAC addresses have a fixed length. The current
policy allows passing < ETH_ALEN bytes, which
might result in reading beyond the buffer.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Michael Büsch [Wed, 16 Nov 2011 22:55:46 +0000 (23:55 +0100)]
p54spi: Fix workqueue deadlock
commit
2d1618170eb493d18f66f2ac03775409a6fb97c6 upstream.
priv->work must not be synced while priv->mutex is locked, because
the mutex is taken in the work handler.
Move cancel_work_sync down to after the device shutdown code.
This is safe, because the work handler checks fw_state and bails out
early in case of a race.
Signed-off-by: Michael Buesch <m@bues.ch>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Michael Büsch [Wed, 16 Nov 2011 22:48:31 +0000 (23:48 +0100)]
p54spi: Add missing spin_lock_init
commit
32d3a3922d617a5a685a5e2d24b20d0e88f192a9 upstream.
The tx_lock is not initialized properly. Add spin_lock_init().
Signed-off-by: Michael Buesch <m@bues.ch>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Hector Palacios [Mon, 14 Nov 2011 10:15:25 +0000 (11:15 +0100)]
timekeeping: add arch_offset hook to ktime_get functions
commit
d004e024058a0eaca097513ce62cbcf978913e0a upstream.
ktime_get and ktime_get_ts were calling timekeeping_get_ns()
but later they were not calling arch_gettimeoffset() so architectures
using this mechanism returned 0 ns when calling these functions.
This happened for example when running Busybox's ping which calls
syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually
calls ktime_get. As a result the returned ping travel time was zero.
Signed-off-by: Hector Palacios <hector.palacios@digi.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Trond Myklebust [Tue, 22 Nov 2011 12:44:28 +0000 (14:44 +0200)]
SUNRPC: Ensure we return EAGAIN in xs_nospace if congestion is cleared
commit
24ca9a847791fd53d9b217330b15f3c285827a18 upstream.
By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail
to find evidence of socket congestion, we are making the RPC engine believe
that the message was incorrectly sent and so it disconnects the socket
instead of just retrying.
The bug appears to have been introduced by commit
5e3771ce2d6a69e10fcc870cdf226d121d868491 (SUNRPC: Ensure that xs_nospace
return values are propagated).
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tim Blechmann [Tue, 22 Nov 2011 10:15:45 +0000 (11:15 +0100)]
ALSA: lx6464es - fix device communication via command bus
commit
a29878553a9a7b4c06f93c7e383527cf014d4ceb upstream.
commit
6175ddf06b6172046a329e3abfd9c901a43efd2e optimized the mem*io
functions that have been used to send commands to the device. these
optimizations somehow corrupted the communication with the lx6464es,
that resulted the device to be unusable with kernels after 2.6.33.
this patch emulates the memcpy_*_io functions via a loop to avoid these
problems.
Signed-off-by: Tim Blechmann <tim@klingt.org>
LKML-Reference: <
4ECB5257.
4040600@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Will Deacon [Mon, 14 Nov 2011 16:24:58 +0000 (17:24 +0100)]
ARM: 7161/1: errata: no automatic store buffer drain
commit
11ed0ba1754841316d4095478944300acf19acc3 upstream.
This patch implements a workaround for PL310 erratum 769419. On
revisions of the PL310 prior to r3p2, the Store Buffer does not
automatically drain. This can cause normal, non-cacheable writes to be
retained when the memory system is idle, leading to suboptimal I/O
performance for drivers using coherent DMA.
This patch adds an optional wmb() call to the cpu_idle loop. On systems
with an outer cache, this causes an explicit flush of the store buffer.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Bjorn Helgaas [Tue, 23 Aug 2011 16:16:43 +0000 (10:16 -0600)]
PCI hotplug: shpchp: don't blindly claim non-AMD 0x7450 device IDs
commit
4cac2eb158c6da0c761689345c6cc5df788a6292 upstream.
Previously we claimed device ID 0x7450, regardless of the vendor, which is
clearly wrong. Now we'll claim that device ID only for AMD.
I suspect this was just a typo in the original code, but it's possible this
change will break shpchp on non-7450 AMD bridges. If so, we'll have to fix
them as we find them.
Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863
Reported-by: Ralf Jung <ralfjung-e@gmx.de>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tyler Hicks [Wed, 23 Nov 2011 17:31:24 +0000 (11:31 -0600)]
eCryptfs: Extend array bounds for all filename chars
commit
0f751e641a71157aa584c2a2e22fda52b52b8a56 upstream.
From mhalcrow's original commit message:
Characters with ASCII values greater than the size of
filename_rev_map[] are valid filename characters.
ecryptfs_decode_from_filename() will access kernel memory beyond
that array, and ecryptfs_parse_tag_70_packet() will then decrypt
those characters. The attacker, using the FNEK of the crafted file,
can then re-encrypt the characters to reveal the kernel memory past
the end of the filename_rev_map[] array. I expect low security
impact since this array is statically allocated in the text area,
and the amount of memory past the array that is accessible is
limited by the largest possible ASCII filename character.
This patch solves the issue reported by mhalcrow but with an
implementation suggested by Linus to simply extend the length of
filename_rev_map[] to 256. Characters greater than 0x7A are mapped to
0x00, which is how invalid characters less than 0x7A were previously
being handled.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jeffrey (Sheng-Hui) Chu [Wed, 23 Nov 2011 10:33:07 +0000 (11:33 +0100)]
i2c-algo-bit: Generate correct i2c address sequence for 10-bit target
commit
cc6bcf7d2ec2234e7b41770185e4dc826390185e upstream.
The wrong bits were put on the wire, fix that.
This fixes kernel bug #42562.
Signed-off-by: Sheng-Hui J. Chu <jeffchu@broadcom.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suresh Siddha [Wed, 24 Oct 2012 22:24:30 +0000 (15:24 -0700)]
x86, ioapic: initialize nr_ioapic_registers early in mp_register_ioapic()
Lin Bao reported that one of the HP platforms failed to boot
2.6.32 kernel, when the BIOS enabled interrupt-remapping and
x2apic before handing over the control to the Linux kernel.
During boot, Linux kernel masks all the interrupt sources
(8259, IO-APIC RTE's), setup the interrupt-remapping hardware
with the OS controlled table and unmasks the 8259 interrupts
but not the IO-APIC RTE's (as the newly setup interrupt-remapping
table and the IO-APIC RTE's are not yet programmed by the kernel).
Shortly after this, IO-APIC RTE's and the interrupt-remapping table
entries are programmed based on the ACPI tables etc. So the
expectation is that any interrupt during this window will be dropped
and not see the intermediate configuration.
In the reported problematic case, BIOS has configured the IO-APIC
in virtual wire-B mode. Between the window of the kernel setting up
new interrupt-remapping table and the IO-APIC RTE's are properly
configured, an interrupt gets routed by the IO-APIC RTE (setup
by the virtual wire-B configuration) and sees the empty
interrupt-remapping table entry, resulting in vt-d fault causing
the platform to generate NMI. And the OS panics on this unexpected NMI.
This problem doesn't happen with more recent kernels and closer
look at the 2.6.32 kernel shows that the code which masks
the IO-APIC RTE's is not working as expected as the nr_ioapic_registers
for each IO-APIC is not yet initialized at this point. In the later
kernels we initialize nr_ioapic_registers much before and
everything works as expected.
For 2.6.[32..34] kernels, fix this issue by initializing
nr_ioapic_registers early in mp_register_ioapic()
[ Relevant upstream commit info:
commit
7716a5c4ff5f1f3dc5e9edcab125cbf7fceef0af
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Tue Mar 30 01:07:12 2010 -0700
x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.
As the upstream commit depends on quite a few prior commits
and some followup fixes in the mainline, we just picked
the smallest relevant hunk for fixing the issue at hand.
Problematic platform uses ACPI for IO-APIC, VT-d enumeration etc
and this hunk only touches the ACPI based platforms.
nr_ioapic_reigsters initialization in enable_IO_APIC() is still
retained, so that other configurations like legacy MPS table based
enumeration etc works with no change.
]
Reported-and-tested-by: Zhang, Lin-Bao <linbao.zhang@hp.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Xi Wang [Mon, 12 Dec 2011 21:55:52 +0000 (21:55 +0000)]
xfs: fix acl count validation in xfs_acl_from_disk()
commit
093019cf1b18dd31b2c3b77acce4e000e2cbc9ce upstream.
Commit
fa8b18ed didn't prevent the integer overflow and possible
memory corruption. "count" can go negative and bypass the check.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
[PG: in 2.6.34, xfs still had "linux-2.6" as a path component.]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Christoph Hellwig [Sun, 20 Nov 2011 15:35:32 +0000 (15:35 +0000)]
xfs: validate acl count
commit
fa8b18edd752a8b4e9d1ee2cd615b82c93cf8bba upstream.
This prevents in-memory corruption and possible panics if the on-disk
ACL is badly corrupted.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
[PG: in 2.6.34, xfs still had "linux-2.6" as a path component.]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eric Paris [Tue, 23 Nov 2010 23:18:37 +0000 (18:18 -0500)]
inotify: stop kernel memory leak on file creation failure
commit
a2ae4cc9a16e211c8a128ba10d22a85431f093ab upstream.
If inotify_init is unable to allocate a new file for the new inotify
group we leak the new group. This patch drops the reference on the
group on file allocation failure.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Thadeu Lima de Souza Cascardo [Thu, 25 Oct 2012 20:37:51 +0000 (13:37 -0700)]
genalloc: stop crashing the system when destroying a pool
commit
eedce141cd2dad8d0cefc5468ef41898949a7031 upstream.
The genalloc code uses the bitmap API from include/linux/bitmap.h and
lib/bitmap.c, which is based on long values. Both bitmap_set from
lib/bitmap.c and bitmap_set_ll, which is the lockless version from
genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
the bitmap.
That one uses (1 << bits) - 1, 0b111, if you are setting the first three
bits. This means that the API counts from the least significant bits
(LSB from now on) to the MSB. The LSB in the first long is bit 0, then.
The same works for the lookup functions.
The genalloc code uses longs for the bitmap, as it should. In
include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
bits[0] as its last member. When allocating the struct, genalloc should
reserve enough space for the bitmap. This should be a proper number of
longs that can fit the amount of bits in the bitmap.
However, genalloc allocates an integer number of bytes that fit the
amount of bits, but may not be an integer amount of longs. 9 bytes, for
example, could be allocated for 70 bits.
This is a problem in itself if the Least Significat Bit in a long is in
the byte with the largest address, which happens in Big Endian machines.
This means genalloc is not allocating the byte in which it will try to
set or check for a bit.
This may end up in memory corruption, where genalloc will try to set the
bits it has not allocated. In fact, genalloc may not set these bits
because it may find them already set, because they were not zeroed since
they were not allocated. And that's what causes a BUG when
gen_pool_destroy is called and check for any set bits.
What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab
will be cleared, not only the requested bytes. Since struct
gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
multiples of 8, we get lucky and allocate and clear the right amount of
bytes.
Hower, this is not the case with SLOB or with older code that did memset
after allocating instead of using __GFP_ZERO.
So, a simple module as this (running 3.6.0), will cause a crash when
rmmod'ed.
[root@phantom-lp2 foo]# cat foo.c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/genalloc.h>
MODULE_LICENSE("GPL");
MODULE_VERSION("0.1");
static struct gen_pool *foo_pool;
static __init int foo_init(void)
{
int ret;
foo_pool = gen_pool_create(10, -1);
if (!foo_pool)
return -ENOMEM;
ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
if (ret) {
gen_pool_destroy(foo_pool);
return ret;
}
return 0;
}
static __exit void foo_exit(void)
{
gen_pool_destroy(foo_pool);
}
module_init(foo_init);
module_exit(foo_exit);
[root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
CONFIG_SLOB=y
[root@phantom-lp2 foo]# insmod ./foo.ko
[root@phantom-lp2 foo]# rmmod foo
------------[ cut here ]------------
kernel BUG at lib/genalloc.c:243!
cpu 0x4: Vector: 700 (Program Check) at [
c0000000bb0e7960]
pc:
c0000000003cb50c: .gen_pool_destroy+0xac/0x110
lr:
c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
sp:
c0000000bb0e7be0
msr:
8000000000029032
current = 0xc0000000bb0e0000
paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01
pid = 13044, comm = rmmod
kernel BUG at lib/genalloc.c:243!
[
c0000000bb0e7ca0]
d000000004b00020 .foo_exit+0x20/0x38 [foo]
[
c0000000bb0e7d20]
c0000000000dff98 .SyS_delete_module+0x1a8/0x290
[
c0000000bb0e7e30]
c0000000000097d4 syscall_exit+0x0/0x94
--- Exception: c00 (System Call) at
000000800753d1a0
SP (
fffd0b0e640) is in userspace
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
NeilBrown [Thu, 16 Aug 2012 06:46:12 +0000 (16:46 +1000)]
md: Don't truncate size at 4TB for RAID0 and Linear
commit
667a5313ecd7308d79629c0738b0db588b0b0a4e upstream.
commit
27a7b260f71439c40546b43588448faac01adb93
md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
changed 0.90 metadata handling to truncated size to 4TB as that is
all that 0.90 can record.
However for RAID0 and Linear, 0.90 doesn't need to record the size, so
this truncation is not needed and causes working arrays to become too small.
So avoid the truncation for RAID0 and Linear
This bug was introduced in 3.1 and is suitable for any stable kernels
from then onwards.
As the offending commit was tagged for 'stable', any stable kernel
that it was applied to should also get this patch. That includes
at least 2.6.32, 2.6.33 and 3.0. (Thanks to Ben Hutchings for
providing that list).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
NeilBrown [Sat, 10 Sep 2011 07:21:28 +0000 (17:21 +1000)]
md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
commit
27a7b260f71439c40546b43588448faac01adb93 upstream.
0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.
Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.
Also the sanity check at the end of super_90_load should include level
1 as it used ->size too. (RAID0 and Linear don't use ->size at all).
Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Benjamin Poirier [Wed, 30 Nov 2011 12:47:18 +0000 (07:47 -0500)]
gro: reset vlan_tci on reuse
This one liner is part of upstream
commit
3701e51382a026cba10c60b03efabe534fba4ca4
Author: Jesse Gross <jesse@nicira.com>
vlan: Centralize handling of hardware acceleration.
The bulk of that commit is a rework of the hardware assisted vlan tagging
driver interface, and as such doesn't classify for -stable inclusion. The fix
that is needed is a part of that commit but can work independently of the
rest.
This patch can avoid panics on the 2.6.32.y -stable kernels and is in the same
spirit as mainline commits
66c46d7 gro: Reset dev pointer on reuse
6d152e2 gro: reset skb_iif on reuse
which are already in -stable.
For drivers using the vlan_gro_frags() interface, a packet with an invalid tci
leads to GRO_DROP and napi_reuse_skb(). The skb has to be sanitized before
being reused or we may send an skb with an invalid vlan_tci field up the stack
where it is not expected.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Cc: Jesse Gross <jesse@nicira.com>
Acked-by: David S. Miller <davem@davemloft.net>
[PG: taken from v2.6.32.y stable, commit
5aff28abc7e]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suresh Jayaraman [Fri, 2 Dec 2011 10:54:56 +0000 (16:24 +0530)]
cifs: fix cifs stable patch cifs-fix-oplock-break-handling-try-2.patch
commit
4708ad6374f07cdfb379c5d4100125e2cfd339d9 in v2.6.32.x stable
The stable release 2.6.32.32 added the upstream commit
12fed00de963433128b5366a21a55808fab2f756. However, one of the hunks of
the original patch seems missing from the stable backport which can be
found here:
http://permalink.gmane.org/gmane.linux.kernel.stable/5676
This hunk corresponds to the change in is_valid_oplock_break() at
fs/cifs/misc.c.
This patch backports the missing hunk and is against
linux-2.6.32.y stable kernel.
Cc: Steve French <sfrench@us.ibm.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PG: I incorrectly dropped the same hunk in
v2.6.34.9-152-g0c55f20
since the code in question was relocated/rewritten in
e66673e39a ]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Daniel T Chen [Sun, 5 Dec 2010 13:43:14 +0000 (08:43 -0500)]
ALSA: hda: Use position_fix=1 for Acer Aspire 5538 to enable capture on internal mic
commit
dd5a089edfa51a74692604b4b427953d8e16bc35 upstream.
BugLink: https://launchpad.net/bugs/685161
The reporter of the bug states that he must use position_fix=1 to enable
capture for the internal microphone, so set it for his machine's PCI
SSID. Verified using 2.6.35 and the 2010-12-04 alsa-driver build.
Reported-and-tested-by: Ralph Wabel <rwabel@gmx.net>
Signed-off-by: Daniel T Chen <crimsun@ubuntu.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Seth Heasley [Wed, 20 Apr 2011 17:59:57 +0000 (10:59 -0700)]
ALSA: hda - ALSA HD Audio patch for Intel Panther Point DeviceIDs
commit
d2edeb7c6f1dada8ca7d5c23e42d604e92ae0c76 upstream.
This patch adds the HD Audio Controller DeviceIDs for the Intel Panther Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Wed, 15 Sep 2010 08:17:26 +0000 (10:17 +0200)]
ALSA: hda - Reduce pci id list for Intel with class id
commit
b686453543fd56332e8730a2abd7bf5bca756149 upstream.
Most of Intel controllers work as generic HD-audio without quirks,
and it'll be hopefully so in future. Let's mark pci id with the
PCI_CLASS_MULTIMEDIA_HD_AUDIO for Intel so that the driver will work
with any new control chips in future.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Bankim Bhavsar [Mon, 17 Jan 2011 14:23:21 +0000 (15:23 +0100)]
ALSA: hda - Add support for VMware controller
commit
0f0714c5ed0a98fdeaa2287d3b159989bbe6d842 upstream.
Add the new PCI ID 0x15ad and device ID 0x1977 for VMware HDAudio
Controller.
[changed to use AZX_DRIVER_GENERIC by tiwai]
Signed-off-by: Bankim Bhavsar <bbhavsar@vmware.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Otavio Salvador [Mon, 27 Sep 2010 02:35:06 +0000 (23:35 -0300)]
ALSA: hda: add Vortex86MX PCI ids
commit
e35d4b119578a054515ccb4ed5dddc4e8a81ec15 upstream.
Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Seth Heasley [Fri, 10 Sep 2010 23:29:56 +0000 (16:29 -0700)]
ALSA: hda_intel: ALSA HD Audio patch for Intel Patsburg DeviceIDs
commit
cea310e8f8702226f982f09386cfd3c5793c5e2f upstream.
This patch adds the Intel Patsburg (PCH) HD Audio Controller DeviceIDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
John Stultz [Tue, 18 Sep 2012 01:38:47 +0000 (21:38 -0400)]
time: Move ktime_t overflow checking into timespec_valid_strict
commit
cee58483cf56e0ba355fdd97ff5e8925329aa936 upstream.
Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit
4e8b14526ca7 ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.
Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.
This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.
Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
John Stultz [Tue, 18 Sep 2012 01:38:46 +0000 (21:38 -0400)]
time: Avoid making adjustments if we haven't accumulated anything
commit
bf2ac312195155511a0f79325515cbb61929898a upstream.
If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
John Stultz [Tue, 18 Sep 2012 01:38:45 +0000 (21:38 -0400)]
time: Improve sanity checking of timekeeping inputs
commit
4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b upstream.
Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something larger then the
year 2262).
Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.
So this patch improves the sanity check timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.
Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jason Wang [Wed, 30 May 2012 21:18:10 +0000 (21:18 +0000)]
net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
commit
cc9b17ad29ecaa20bfe426a8d4dbfb94b13ff1cc upstream.
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Paul Gortmaker [Mon, 20 Aug 2012 18:45:22 +0000 (14:45 -0400)]
Linux 2.6.34.13
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tony Luck [Fri, 20 Jul 2012 20:15:20 +0000 (13:15 -0700)]
dmi: Feed DMI table to /dev/random driver
commit
d114a33387472555188f142ed8e98acdb8181c6d upstream.
Send the entire DMI (SMBIOS) table to the /dev/random driver to
help seed its pools.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mark Brown [Thu, 5 Jul 2012 20:23:21 +0000 (20:23 +0000)]
mfd: wm831x: Feed the device UUID into device_add_randomness()
commit
27130f0cc3ab97560384da437e4621fc4e94f21c upstream.
wm831x devices contain a unique ID value. Feed this into the newly added
device_add_randomness() to add some per device seed data to the pool.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mark Brown [Thu, 5 Jul 2012 20:19:17 +0000 (20:19 +0000)]
rtc: wm831x: Feed the write counter into device_add_randomness()
commit
9dccf55f4cb011a7552a8a2749a580662f5ed8ed upstream.
The tamper evident features of the RTC include the "write counter" which
is a pseudo-random number regenerated whenever we set the RTC. Since this
value is unpredictable it should provide some useful seeding to the random
number generator.
Only do this on boot since the goal is to seed the pool rather than add
useful entropy.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tony Luck [Mon, 23 Jul 2012 16:47:57 +0000 (09:47 -0700)]
random: Add comment to random_initialize()
commit
cbc96b7594b5691d61eba2db8b2ea723645be9ca upstream.
Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.
However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().
Add a big fat comment to rand_initialize() spelling out
this requirement.
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Sun, 15 Jul 2012 00:27:52 +0000 (20:27 -0400)]
random: remove rand_initialize_irq()
commit
c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream.
With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.
[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
[PG: in .34 the irqdesc.h content is in irq.h instead.]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Thu, 5 Jul 2012 01:23:25 +0000 (21:23 -0400)]
net: feed /dev/random with the MAC address when registering a device
commit
7bf2357524408b97fec58344caf7397f8140c3fd upstream.
Cc: David Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Wed, 4 Jul 2012 15:22:20 +0000 (11:22 -0400)]
usb: feed USB device information to the /dev/random driver
commit
b04b3156a20d395a7faa8eed98698d1e17a36000 upstream.
Send the USB device's serial, product, and manufacturer strings to the
/dev/random driver to help seed its pools.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Wed, 4 Jul 2012 15:32:48 +0000 (11:32 -0400)]
MAINTAINERS: Theodore Ts'o is taking over the random driver
commit
330e0a01d54c2b8606c56816f99af6ebc58ec92c upstream.
Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
H. Peter Anvin [Sat, 28 Jul 2012 02:26:08 +0000 (22:26 -0400)]
random: mix in architectural randomness in extract_buf()
commit
d2e7c96af1e54b507ae2a6a7dd2baf588417a7e5 upstream.
Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_buf(). This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_buf() faster, moving a
tiny bit of additional CPU overhead to process which is extracting the
randomness.
[ Commit description modified by tytso to remove an extended
advertisement for the RDRAND instruction. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: DJ Johnston <dj.johnston@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Wed, 4 Jul 2012 20:19:30 +0000 (16:19 -0400)]
random: add tracepoints for easier debugging and verification
commit
00ce1db1a634746040ace24c09a4e3a7949a3145 upstream.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Thu, 5 Jul 2012 14:35:23 +0000 (10:35 -0400)]
random: add new get_random_bytes_arch() function
commit
c2557a303ab6712bb6e09447df828c557c710ac9 upstream.
Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present. Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.
The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door. (For example, an
increasing counter encrypted by an AES key known to the NSA.)
It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
--- especially since Bull Mountain is documented to use AES as a
whitener. Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel. Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.
Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.
This way we get almost of the benefits of the HW RNG without any
potential liabilities. The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.
For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Thu, 5 Jul 2012 14:21:01 +0000 (10:21 -0400)]
random: use the arch-specific rng in xfer_secondary_pool
commit
e6d4947b12e8ad947add1032dd754803c6004824 upstream.
If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.
Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Linus Torvalds [Wed, 4 Jul 2012 15:16:01 +0000 (11:16 -0400)]
random: create add_device_randomness() interface
commit
a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream.
Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot). This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).
[ Modified by tytso to mix in a timestamp, since there may be some
variability caused by the time needed to detect/configure the hardware
in question. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Wed, 4 Jul 2012 14:38:30 +0000 (10:38 -0400)]
random: use lockless techniques in the interrupt path
commit
902c098a3663de3fa18639efbb71b6080f0bcd3c upstream.
The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Mon, 2 Jul 2012 11:52:16 +0000 (07:52 -0400)]
random: make 'add_interrupt_randomness()' do something sane
commit
775f4b297b780601e61787b766f306ed3e1d23eb upstream.
We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.
This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool. Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool. This
assures that /dev/urandom is returning unpredictable data as soon as
possible.
(Based on an original patch by Linus, but significantly modified by
tytso.)
Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[PG: minor adjustment required since .34 doesn't have
f9e4989eb8
which renames "status" to "random" in kernel/irq/handle.c ]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mathieu Desnoyers [Thu, 12 Apr 2012 19:49:12 +0000 (12:49 -0700)]
drivers/char/random.c: fix boot id uniqueness race
commit
44e4360fa3384850d65dd36fb4e6e5f2f112709b upstream.
/proc/sys/kernel/random/boot_id can be read concurrently by userspace
processes. If two (or more) user-space processes concurrently read
boot_id when sysctl_bootid is not yet assigned, a race can occur making
boot_id differ between the reads. Because the whole point of the boot id
is to be unique across a kernel execution, fix this by protecting this
operation with a spinlock.
Given that this operation is not frequently used, hitting the spinlock
on each call should not be an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
H. Peter Anvin [Mon, 16 Jan 2012 19:23:29 +0000 (11:23 -0800)]
random: Adjust the number of loops when initializing
commit
2dac8e54f988ab58525505d7ef982493374433c3 upstream.
When we are initializing using arch_get_random_long() we only need to
loop enough times to touch all the bytes in the buffer; using
poolwords for that does twice the number of operations necessary on a
64-bit machine, since in the random number generator code "word" means
32 bits.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Theodore Ts'o [Thu, 22 Dec 2011 21:28:01 +0000 (16:28 -0500)]
random: Use arch-specific RNG to initialize the entropy store
commit
3e88bdff1c65145f7ba297ccec69c774afe4c785 upstream.
If there is an architecture-specific random number generator (such as
RDRAND for Intel architectures), use it to initialize /dev/random's
entropy stores. Even in the worst case, if RDRAND is something like
AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
against any other adversaries.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Linus Torvalds [Thu, 22 Dec 2011 19:36:22 +0000 (11:36 -0800)]
random: Use arch_get_random_int instead of cycle counter if avail
commit
cf833d0b9937874b50ef2867c4e8badfd64948ce upstream.
We still don't use rdrand in /dev/random, which just seems stupid. We
accept the *cycle*counter* as a random input, but we don't accept
rdrand? That's just broken.
Sure, people can do things in user space (write to /dev/random, use
rdrand in addition to /dev/random themselves etc etc), but that
*still* seems to be a particularly stupid reason for saying "we
shouldn't bother to try to do better in /dev/random".
And even if somebody really doesn't trust rdrand as a source of random
bytes, it seems singularly stupid to trust the cycle counter *more*.
So I'd suggest the attached patch. I'm not going to even bother
arguing that we should add more bits to the entropy estimate, because
that's not the point - I don't care if /dev/random fills up slowly or
not, I think it's just stupid to not use the bits we can get from
rdrand and mix them into the strong randomness pool.
Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Luck, Tony [Wed, 16 Nov 2011 18:50:56 +0000 (10:50 -0800)]
fix typo/thinko in get_random_bytes()
commit
bd29e568a4cb6465f6e5ec7c1c1f3ae7d99cbec1 upstream.
If there is an architecture-specific random number generator we use it
to acquire randomness one "long" at a time. We should put these random
words into consecutive words in the result buffer - not just overwrite
the first word again and again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
H. Peter Anvin [Sun, 31 Jul 2011 20:59:29 +0000 (13:59 -0700)]
x86, random: Architectural inlines to get random integers with RDRAND
commit
628c6246d47b85f5357298601df2444d7f4dd3fd upstream.
Architectural inlines to get random ints and longs using the RDRAND
instruction.
Intel has introduced a new RDRAND instruction, a Digital Random Number
Generator (DRNG), which is functionally an high bandwidth entropy
source, cryptographic whitener, and integrity monitor all built into
hardware. This enables RDRAND to be used directly, bypassing the
kernel random number pool.
For technical documentation, see:
http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/
In this patch, this is *only* used for the nonblocking random number
pool. RDRAND is a nonblocking source, similar to our /dev/urandom,
and is therefore not a direct replacement for /dev/random. The
architectural hooks presented in the previous patch only feed the
kernel internal users, which only use the nonblocking pool, and so
this is not a problem.
Since this instruction is available in userspace, there is no reason
to have a /dev/hw_rng device driver for the purpose of feeding rngd.
This is especially so since RDRAND is a nonblocking source, and needs
additional whitening and reduction (see the above technical
documentation for details) in order to be of "pure entropy source"
quality.
The CONFIG_EXPERT compile-time option can be used to disable this use
of RDRAND.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Originally-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>