Greg Kroah-Hartman [Fri, 14 Dec 2007 18:34:15 +0000 (10:34 -0800)]
Linux 2.6.22.15
Pavel Emelyanov [Thu, 13 Dec 2007 04:57:24 +0000 (12:57 +0800)]
BRIDGE: Properly dereference the br_should_route_hook
[BRIDGE]: Properly dereference the br_should_route_hook
[ Upstream commit:
82de382ce8e1c7645984616728dc7aaa057821e4 ]
This hook is protected by RCU, so a simple
	if (br_should_route_hook)
		br_should_route_hook(...)
is not enough on some architectures.
Use rcu_dereference/rcu_assign_pointer in this case.
Fixed Stephen's comment concerning using the typeof().
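A user-space sketch of the pattern described above, assuming C11 atomics as a stand-in for the kernel's rcu_dereference()/rcu_assign_pointer(). The names hook_fn, should_route_hook, set_hook, call_hook and example_hook are hypothetical, and acquire/release ordering is used here as a conservative approximation, not as the kernel's exact semantics.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef int (*hook_fn)(int);

/* Hypothetical hook pointer, published by a writer and read by many readers. */
static _Atomic(hook_fn) should_route_hook;

/* Writer side: publish the pointer with release ordering
 * (stand-in for rcu_assign_pointer()). */
static void set_hook(hook_fn fn)
{
	atomic_store_explicit(&should_route_hook, fn, memory_order_release);
}

/* Reader side: load the pointer exactly once into a local
 * (stand-in for rcu_dereference()), then test and call the local copy. */
static int call_hook(int arg)
{
	hook_fn fn = atomic_load_explicit(&should_route_hook,
					  memory_order_acquire);

	return fn ? fn(arg) : -1;
}

/* Hypothetical hook used only to exercise the sketch. */
static int example_hook(int x)
{
	return 2 * x;
}
```

The point of the local copy is that the NULL test and the call see the same pointer value, even if a writer clears the hook in between.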
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Sat, 8 Dec 2007 00:25:31 +0000 (09:25 +0900)]
libata: kill spurious NCQ completion detection
patch
459ad68893a84fb0881e57919340b97edbbc3dc7 in mainline.
Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI, receiving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.
For example, if an interrupt occurs while the interrupt handler is running
and the running handler handles the event the new IRQ indicated, then after
the IRQ handler finishes it will be executed again, because the IRQ pending
bit was set by the new interrupt, but there won't be anything to process.
Please read the following message for more information.
http://article.gmane.org/gmane.linux.ide/26012
This patch...
* Removes all spurious IRQ whining from ahci. Spurious NCQ completion
detection was completely wrong. Spurious D2H Register FIS taught us
that some early drives send spurious D2H Register FIS with I bit set
while NCQ commands are in progress but none of recent drives does
that and even the ones which show such behavior can do NCQ fine.
* Kills all NCQ blacklist entries which were added because of spurious
NCQ completions. I tracked down each commit and verified all
removed ones are actually added because of spurious completions.
WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive
not only had spurious NCQ completions but also is slow on sequential
data transfers if NCQ is enabled.
Maxtor 7V300F0 was added by
0e3dbc01d53940fe10e5a5cfec15ede3e929c918
from Alan Cox. Searching the mailing list, I could only find evidence
that the drive had trouble with spurious completions.
This entry needs to be verified and removed if it doesn't have other
NCQ related problems.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Thu, 13 Dec 2007 04:42:34 +0000 (12:42 +0800)]
NETFILTER: xt_TCPMSS: remove network triggerable WARN_ON
[NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
[ Upstream commit:
9dc0564e862b1b9a4677dec2c736b12169e03e99 ]
ipv6_skip_exthdr() returns -1 for invalid packets. Don't WARN_ON
that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
XFRM: Fix leak of expired xfrm_states
[XFRM]: Fix leak of expired xfrm_states
[ Upstream commit:
5dba4797115c8fa05c1a4d12927a6ae0b33ffc41 ]
The xfrm_timer calls __xfrm_state_delete, which drops the final reference
manually without triggering destruction of the state. Change it to use
xfrm_state_put to add the state to the gc list when we're dropping the
last reference. The timer function may still continue to use the state
safely since the final destruction does a del_timer_sync().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Thu, 13 Dec 2007 05:20:32 +0000 (13:20 +0800)]
Revert "Fix SMP poweroff hangs"
This reverts the following changeset in 2.6.22.10 that caused a lot of
reported problems.
From: Mark Lord <lkml@rtr.ca>
commit
4047727e5ae33f9b8d2b7766d1994ea6e5ec2991 from upstream
We need to disable all CPUs other than the boot CPU (usually 0) before
attempting to power-off modern SMP machines. This fixes the
hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
new toybox.
Signed-off-by: Mark Lord <mlord@pobox.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There still is a remaining shutdown problem in 2.6.22 with old APM based
systems, but this fix is not the correct one
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Neil Brown [Wed, 5 Sep 2007 21:22:13 +0000 (17:22 -0400)]
knfsd: Validate filehandle type in fsid_source
patch
b8da0d1c27f144bce999c653467106f3f0d5a308 in mainline.
fsid_source decides where to get the 'fsid' number to
return for a GETATTR based on the type of filehandle.
It can be from the device, from the fsid, or from the
UUID.
It is possible for the filehandle to be inconsistent
with the export information, so make sure the export information
actually has the info implied by the value returned by
fsid_source.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oliver Pintr <oliver.pntr@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pavel Emelyanov [Tue, 11 Dec 2007 01:39:30 +0000 (09:39 +0800)]
BRIDGE: Lost call to br_fdb_fini() in br_init() error path
[BRIDGE]: Lost call to br_fdb_fini() in br_init() error path
[ Upstream commit:
17efdd45755c0eb8d1418a1368ef7c7ebbe98c6e ]
If br_netfilter_init() (or any subsequent call) fails,
br_fdb_fini() must be called to free the br_fdb_cache
kmem cache allocated in br_fdb_init().
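The unwind described above can be sketched in user space as follows. This is not the bridge code: fdb_init, fdb_fini, later_init and module_init_sketch are hypothetical stand-ins, and malloc/free replace the kmem cache.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the cache created by br_fdb_init(). */
static void *fdb_cache;

static int fdb_init(void)
{
	fdb_cache = malloc(64);
	return fdb_cache ? 0 : -1;
}

static void fdb_fini(void)
{
	free(fdb_cache);
	fdb_cache = NULL;
}

/* Hypothetical later init step; fails when asked to. */
static int later_init(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Module init: any failure after fdb_init() must unwind it. */
static int module_init_sketch(int later_fails)
{
	int err;

	err = fdb_init();
	if (err)
		return err;

	err = later_init(later_fails);
	if (err)
		goto err_fdb;

	return 0;

err_fdb:
	fdb_fini();	/* the call that was missing in the error path */
	return err;
}
```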
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pavel Emelyanov [Tue, 11 Dec 2007 01:39:32 +0000 (09:39 +0800)]
DECNET: dn_nl_deladdr() almost always returns no error
[DECNET]: dn_nl_deladdr() almost always returns no error
[ Upstream commit:
3ccd86241b277249d5ac08e91eddfade47184520 ]
As far as I see from the err variable initialization
the dn_nl_deladdr() routine was designed to report errors
like "EADDRNOTAVAIL" and probably "ENODEV".
But the code sets this err to 0 after the first nlmsg_parse
and goes on, returning this 0 in any case.
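A reduced sketch of the bug shape described above, not the actual DECnet code. parse_ok, deladdr_buggy and deladdr_fixed are hypothetical names, and the fixed variant is only one plausible way to restore the error code.

```c
#include <errno.h>

/* Hypothetical parse step; returns 0 on success, like nlmsg_parse(). */
static int parse_ok(void)
{
	return 0;
}

/* Buggy shape: err is reused for the parse result, losing the default. */
static int deladdr_buggy(int addr_found)
{
	int err = -EADDRNOTAVAIL;

	err = parse_ok();	/* overwrites the default with 0 */
	if (err < 0)
		return err;
	if (addr_found)
		return 0;
	return err;		/* still 0: "not found" is reported as success */
}

/* Fixed shape: set the error code only after parsing has succeeded. */
static int deladdr_fixed(int addr_found)
{
	int err;

	err = parse_ok();
	if (err < 0)
		return err;
	err = -EADDRNOTAVAIL;
	if (addr_found)
		err = 0;
	return err;
}
```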
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Evgeniy Polyakov [Tue, 11 Dec 2007 01:39:34 +0000 (09:39 +0800)]
IPV6: Restore IPv6 when MTU is big enough
[IPV6]: Restore IPv6 when MTU is big enough
[ Upstream commit:
d31c7b8fa303eb81311f27b80595b8d2cbeef950 ]
Avaid provided test application, so bug got fixed.
IPv6 addrconf removes the ipv6 inner device from the netdev each time the MTU
changes and the new value is less than IPV6_MIN_MTU (1280 bytes).
When the MTU is changed and the new value is greater than IPV6_MIN_MTU,
it does not add the ipv6 addresses and inner device back.
This patch fixes that.
Tested with Avaid's application, which works ok now.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Howells [Tue, 11 Dec 2007 01:39:36 +0000 (09:39 +0800)]
RXRPC: Add missing select on CRYPTO
[RXRPC]: Add missing select on CRYPTO
[ Upstream commit:
d5a784b3719ae364f49ecff12a0248f6e4252720 ]
AF_RXRPC uses the crypto services, so should depend on or select CRYPTO.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stephen Hemminger [Tue, 11 Dec 2007 01:39:37 +0000 (09:39 +0800)]
TCP: illinois: Incorrect beta usage
[TCP] illinois: Incorrect beta usage
[ Upstream commit:
a357dde9df33f28611e6a3d4f88265e39bcc8880 ]
Lachlan Andrew observed that my TCP-Illinois implementation uses the
beta value incorrectly:
The parameter beta in the paper specifies the amount to decrease
*by*: that is, on loss,
W <- W - beta*W
but tcp_illinois_ssthresh() uses beta as the amount
to decrease *to*: W <- beta*W
This bug makes the Linux TCP-Illinois less aggressive on an uncongested network,
hurting performance. Note: since the base beta value is .5, it has no
impact on a congested network.
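The difference between the two rules can be checked with a small fixed-point sketch. The scale (BETA_SHIFT of 6, so 64 represents 1.0) and the function names are assumptions made for illustration, not taken from the patch.

```c
#include <stdint.h>

#define BETA_SHIFT 6	/* assumed fixed-point scale: 64 represents 1.0 */

/* Paper's rule: decrease *by* beta, W <- W - beta*W. */
static uint32_t decrease_by(uint32_t cwnd, uint32_t beta)
{
	return cwnd - ((cwnd * beta) >> BETA_SHIFT);
}

/* Buggy rule: decrease *to* beta, W <- beta*W. */
static uint32_t decrease_to(uint32_t cwnd, uint32_t beta)
{
	return (cwnd * beta) >> BETA_SHIFT;
}
```

With beta = 32 (0.5) and a window of 64, both rules give 32, which matches the note that a congested network is unaffected. With a smaller beta such as 8 (0.125), the paper's rule leaves 56 while the buggy rule collapses the window to 8.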
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pablo Neira Ayuso [Tue, 11 Dec 2007 01:39:38 +0000 (09:39 +0800)]
TEXTSEARCH: Do not allow zero length patterns in the textsearch infrastructure
[TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
[ Upstream commit:
e03ba84adb62fbc6049325a5bc00ef6932fa5e39 ]
If a zero length pattern is passed then return EINVAL.
Avoids infinite loops (bm) or invalid memory accesses (kmp).
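A minimal sketch of the check, assuming a hypothetical prepare_pattern() helper rather than the real textsearch entry point: the empty pattern is rejected before any search algorithm gets to see it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical prepare step: reject empty patterns up front, so no
 * algorithm has to cope with a zero-length needle. */
static int prepare_pattern(const void *pattern, size_t len)
{
	if (pattern == NULL || len == 0)
		return -EINVAL;
	return 0;
}
```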
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Florian Zumbiehl [Tue, 11 Dec 2007 01:39:39 +0000 (09:39 +0800)]
UNIX: EOF on non-blocking SOCK_SEQPACKET
[UNIX]: EOF on non-blocking SOCK_SEQPACKET
[ Upstream commit:
0a11225887fe6cbccd882404dc36ddc50f47daf9 ]
I am not absolutely sure whether this actually is a bug (as in: I've got
no clue what the standards say or what other implementations do), but at
least I was pretty surprised when I noticed that a recv() on a
non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection
oriented, after all) where the remote end has closed the connection
returned -1 (EAGAIN) rather than 0 to indicate end of file.
This is a test case:
| #include <sys/types.h>
| #include <unistd.h>
| #include <sys/socket.h>
| #include <sys/un.h>
| #include <fcntl.h>
| #include <string.h>
| #include <stdlib.h>
|
| int main(){
| 	int sock;
| 	struct sockaddr_un addr;
| 	char buf[4096];
| 	int pfds[2];
|
| 	pipe(pfds);
| 	sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
| 	addr.sun_family=AF_UNIX;
| 	strcpy(addr.sun_path,"/tmp/foobar_testsock");
| 	bind(sock,(struct sockaddr *)&addr,sizeof(addr));
| 	listen(sock,1);
| 	if(fork()){
| 		close(sock);
| 		sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
| 		connect(sock,(struct sockaddr *)&addr,sizeof(addr));
| 		fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK);
| 		close(pfds[1]);
| 		read(pfds[0],buf,sizeof(buf));
| 		recv(sock,buf,sizeof(buf),0); // <-- this one
| 	}else accept(sock,NULL,NULL);
| 	exit(0);
| }
If you try it, make sure /tmp/foobar_testsock doesn't exist.
The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a
patch that fixes that.
Signed-off-by: Florian Zumbiehl <florz@florz.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
chas williams [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
ATM: [he] initialize lock and tasklet earlier
[ATM]: [he] initialize lock and tasklet earlier
[ Upstream commit:
8a8037ac9dbe4eb20ce50aa20244faf77444f4a3 ]
If you are lucky (unlucky?) enough to have shared interrupts, the
interrupt handler can be called before the tasklet and lock are ready
for use.
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
CRYPTO api: Fix potential race in crypto_remove_spawn
[CRYPTO] api: Fix potential race in crypto_remove_spawn
[ Upstream commit:
38cb2419f544ad413c7f7aa8c17fd7377610cdd8 ]
As it is, crypto_remove_spawn may try to unregister an instance which is
yet to be registered. This patch fixes this by checking whether the
instance has been registered before attempting to remove it.
It also removes a bogus cra_destroy check in crypto_register_instance as
1) it's outside the mutex;
2) we have a check in __crypto_register_alg already.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adrian Bunk [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
IPV4: Remove bogus ifdef mess in arp_process
[IPV4]: Remove bogus ifdef mess in arp_process
[ Upstream commit:
3660019e5f96fd9a8b7d4214a96523c0bf7b676d ]
The #ifdef's in arp_process() were not only a mess, they were also wrong
in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or
CONFIG_NETDEV_10000=y) cases.
Since they are not required this patch removes them.
Also removed are some #ifdef's around #include's that caused compile
errors after this change.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Dumazet [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
NET: Corrects a bug in ip_rt_acct_read()
[NET]: Corrects a bug in ip_rt_acct_read()
[ Upstream commit:
483b23ffa3a5f44767038b0a676d757e0668437e ]
It seems that the stats of CPU 0 are counted twice, since
for_each_possible_cpu() loops over all possible CPUs, including 0.
Before the percpu conversion of ip_rt_acct, we should also remove the
assumption that CPU 0 is online (or even possible).
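The double counting can be reproduced with a reduced sketch. NR_CPUS, sum_buggy and sum_fixed are hypothetical, and a plain array stands in for the per-CPU accounting data.

```c
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Buggy shape: seed with CPU 0's counter, then loop over every CPU
 * (including 0 again), so CPU 0 is counted twice. */
static unsigned long sum_buggy(const unsigned long per_cpu[NR_CPUS])
{
	unsigned long total = per_cpu[0];

	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		total += per_cpu[cpu];
	return total;
}

/* Fixed shape: start from zero and visit each CPU exactly once. */
static unsigned long sum_fixed(const unsigned long per_cpu[NR_CPUS])
{
	unsigned long total = 0;

	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		total += per_cpu[cpu];
	return total;
}
```

Starting from zero also removes any special role for CPU 0, which is the second point the message makes.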
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Charles Hardin [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
PFKEY: Sending an SADB_GET responds with an SADB_GET
[PFKEY]: Sending an SADB_GET responds with an SADB_GET
[ Upstream commit:
435000bebd94aae3a7a50078d142d11683d3b193 ]
The kernel needs to respond to an SADB_GET with the same message type to
conform to RFC 2367, Section 3.1.5.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ilpo Järvinen [Thu, 29 Nov 2007 12:07:58 +0000 (23:07 +1100)]
TCP: MTUprobe: fix potential sk_send_head corruption
[TCP] MTUprobe: fix potential sk_send_head corruption
[ Upstream commit:
6e42141009ff18297fe19d19296738b742f861db ]
When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to an skb which was included in the probe skb and then freed.
For this to trigger, however, skb_transmit must also fail to
send.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sam Jansen [Thu, 29 Nov 2007 12:07:57 +0000 (23:07 +1100)]
TCP: Problem bug with sysctl_tcp_congestion_control function
[TCP]: Problem bug with sysctl_tcp_congestion_control function
[ Upstream commit:
5487796f0c9475586277a0a7a91211ce5746fa6a ]
sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.
sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.
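The return-code mismatch described above, reduced to a sketch. copy_string_stub, applied, set_cc_buggy and set_cc_fixed are hypothetical names, and the fixed variant shows one plausible shape of the check, not necessarily the one in the patch.

```c
/* Hypothetical stand-in for sysctl_string(): returns 1 on success. */
static int copy_string_stub(void)
{
	return 1;
}

static int applied;	/* set when the new default is actually installed */

/* Buggy caller: still expects 0 on success, so the setter never runs. */
static int set_cc_buggy(void)
{
	int ret = copy_string_stub();

	if (ret == 0)
		applied = 1;
	return ret;
}

/* Fixed caller: treats the positive return value as success. */
static int set_cc_fixed(void)
{
	int ret = copy_string_stub();

	if (ret == 1)
		applied = 1;
	return ret;
}
```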
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Thu, 29 Nov 2007 00:21:35 +0000 (16:21 -0800)]
fb_ddc: fix DDC lines quirk
patch
b64d70825abbf706bbe80be1b11b09514b71f45e in mainline.
The code in fb_ddc_read() is said to be based on the implementation of the
radeon driver:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fc5891c8a3ba284f13994d7bc1f1bfa8283982de
However, comparing the old radeon driver code with the new fb_ddc code
reveals some differences. Most notably, the I2C bus lines are held at the
end of the function, while the original code was releasing them (as the
comment above correctly says.)
There are a few other differences, which appear to be responsible for read
failures on my system. While tracing low-level I2C code in i2c-algo-bit, I
noticed that the initial attempt to read the EDID always failed. It takes
one retry for the read to succeed. As we are about to remove this
automatic retry property from i2c-algo-bit, reading the EDID would really
fail.
As a summary, the I2C lines quirk which is supposedly needed to read EDID
on some older monitors is currently breaking the (first) read on all other
monitors (and might not even work with older ones - did anyone try since
October 2006?)
After applying the patch below, which makes the code in fb_ddc_read()
really similar to what the radeon driver used to have, the first EDID read
succeeds again.
On top of that, as it appears that this code has been broken for one year
now and nobody seems to have complained, I'm curious if it makes sense to
keep this quirk in place. It makes the code more complex and slower just
for the sake of monitors which I guess nobody uses anymore. Can't we just
get rid of it?
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Roger Leigh <rleigh@whinlatter.ukfsn.org>
Tested-by: Michael Buesch <mb@bu3sch.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ayaz Abdulla [Wed, 21 Nov 2007 23:02:58 +0000 (15:02 -0800)]
forcedeth boot delay fix
patch
9e555930bd873d238f5f7b9d76d3bf31e6e3ce93 in mainline.
Fix a long boot delay in the forcedeth driver. During initialization, the
timeout for the handshake between mgmt unit and driver can be very long.
The patch reduces the timeout by eliminating an extra loop around the
timeout logic.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9308
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Alex Howells <astinus@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ayaz Abdulla [Sat, 24 Nov 2007 01:54:01 +0000 (20:54 -0500)]
forcedeth: new mcp79 pci ids
patch
490dde8990c55662596a4be71b5070bd7d382d4a in mainline.
This patch adds new device ids and features for mcp79 devices into the
forcedeth driver.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Steven Rostedt [Wed, 5 Dec 2007 14:46:09 +0000 (15:46 +0100)]
futex: fix for futex_wait signal stack corruption
From Steven Rostedt <srostedt@redhat.com>
patch
ce6bd420f43b28038a2c6e8fbb86ad24014727b6 in mainline.
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. His test program showed
that if one were to do a "Ctrl-Z" on a process that was in
pthread_cond_timedwait, and then do a "bg" on that process, it would
return with "-ETIMEDOUT", but early. That is, the timer would go off
early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
			  struct timespec __user *utime, u32 __user *uaddr2,
			  u32 val3)
{
	struct timespec ts;
	ktime_t t, *tp = NULL;
	u32 val2 = 0;
	int cmd = op & FUTEX_CMD_MASK;
	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
			return -EFAULT;
		if (!timespec_valid(&ts))
			return -EINVAL;
		t = timespec_to_ktime(ts);
		if (cmd == FUTEX_WAIT)
			t = ktime_add(ktime_get(), t);
		tp = &t;
	}
	[...]
	return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
	      u32 __user *uaddr2, u32 val2, u32 val3)
{
	int ret;
	int cmd = op & FUTEX_CMD_MASK;
	struct rw_semaphore *fshared = NULL;
	if (!(op & FUTEX_PRIVATE_FLAG))
		fshared = &current->mm->mmap_sem;
	switch (cmd) {
	case FUTEX_WAIT:
		ret = futex_wait(uaddr, fshared, val, timeout);
[...]
static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
		      u32 val, ktime_t *abs_time)
{
[...]
	struct restart_block *restart;
	restart = &current_thread_info()->restart_block;
	restart->fn = futex_wait_restart;
	restart->arg0 = (unsigned long)uaddr;
	restart->arg1 = (unsigned long)val;
	restart->arg2 = (unsigned long)abs_time;
	restart->arg3 = 0;
	if (fshared)
		restart->arg3 |= ARG3_SHARED;
	return -ERESTART_RESTARTBLOCK;
[...]
static long futex_wait_restart(struct restart_block *restart)
{
	u32 __user *uaddr = (u32 __user *)restart->arg0;
	u32 val = (u32)restart->arg1;
	ktime_t *abs_time = (ktime_t *)restart->arg2;
	struct rw_semaphore *fshared = NULL;
	restart->fn = do_no_restart_syscall;
	if (restart->arg3 & ARG3_SHARED)
		fshared = &current->mm->mmap_sem;
	return (long)futex_wait(uaddr, fshared, val, abs_time);
}
So when futex_wait is interrupted by a signal, we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
My solution to this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters. This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
Note: I'm a bit nervous about adding "linux/types.h" and using u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for. If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.
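The idea of the fix, storing the timeout by value instead of keeping a pointer to a dead stack frame, can be sketched in user space. struct restart_sketch and the helper names are hypothetical and do not reproduce the real restart_block layout.

```c
#include <stdint.h>

/* Hypothetical restart record: the deadline is stored by value. */
struct restart_sketch {
	uint32_t val;
	uint64_t time;	/* absolute timeout in nanoseconds, copied in */
};

/* Save the deadline itself; no pointer into the caller's stack is kept. */
static void save_restart(struct restart_sketch *r, uint32_t val,
			 const int64_t *abs_time)
{
	r->val = val;
	r->time = (uint64_t)*abs_time;
}

/* Simulates the syscall frame: 't' dies when this function returns. */
static void syscall_frame(struct restart_sketch *r)
{
	int64_t t = 123456789;

	save_restart(r, 7, &t);
}

/* Restart path: rebuild a local deadline from the saved value. */
static int64_t restart_deadline(const struct restart_sketch *r)
{
	return (int64_t)r->time;
}
```

Because the record holds a copy, the restart path reads a valid deadline even though the frame that computed it has been unwound.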
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Fri, 7 Dec 2007 18:16:17 +0000 (19:16 +0100)]
hrtimers: avoid overflow for large relative timeouts (CVE-2007-5966)
patch
62f0f61e6673e67151a7c8c0f9a09c7ea43fe2b5 in mainline
Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().
This in turn is causing the clockevents_set_next() function to set a
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.
This bug became more apparent in 2.6.24 which activates HPET on more
hardware.
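A sketch of an overflow-safe addition for this situation, assuming both the current time and the relative timeout are non-negative 64-bit nanosecond values. add_safe is a hypothetical helper and is not claimed to be the code in the patch.

```c
#include <stdint.h>

/* Add a relative timeout to "now" without signed overflow:
 * clamp to the largest representable time instead of wrapping negative.
 * Assumes now >= 0 and rel >= 0. */
static int64_t add_safe(int64_t now, int64_t rel)
{
	if (rel > 0 && now > INT64_MAX - rel)
		return INT64_MAX;
	return now + rel;
}
```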
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Karsten Keil [Sat, 1 Dec 2007 20:16:15 +0000 (12:16 -0800)]
I4L: fix isdn_ioctl memory overrun vulnerability
patch
eafe1aa37e6ec2d56f14732b5240c4dd09f0613a in mainline.
Fix possible memory overrun issue in the isdn ioctl code. Found by ADLAB
<adlab@venustech.com.cn>
Signed-off-by: Karsten Keil <kkeil@suse.de>
Cc: ADLAB <adlab@venustech.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Karsten Keil [Thu, 22 Nov 2007 11:43:13 +0000 (12:43 +0100)]
isdn: avoid copying overly-long strings
patch
0f13864e5b24d9cbe18d125d41bfa4b726a82e40 in mainline.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9416
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Thu, 15 Nov 2007 01:07:23 +0000 (09:07 +0800)]
libcrc32c: keep intermediate crc state in cpu order
It's upstream changeset
ef19454bd437b2ba14c9cda1de85debd9f383484.
[LIB] crc32c: Keep intermediate crc state in cpu order
crypto/crc32.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.
Signed-off-by: Benny Halevy <bhalevy@fs1.bhalevy.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Li Zefan [Wed, 28 Nov 2007 08:56:27 +0000 (09:56 +0100)]
nf_nat: fix memset error
This patch fixes an incorrect memset in the NAT code, causing
misbehaviour when unloading and reloading the NAT module.
Applies to stable-2.6.22 and stable-2.6.23.
Please apply, thanks.
[NETFILTER]: nf_nat: fix memset error
Upstream commit
e0bf9cf15fc30d300b7fbd821c6bc975531fab44
The size passed to memset is the size of a pointer. Fixes
misbehaviour when unloading and reloading the NAT module.
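The class of mistake can be shown with a short sketch. struct entry, clear_buggy and clear_fixed are hypothetical and unrelated to the actual NAT data structures.

```c
#include <string.h>
#include <stddef.h>

struct entry {		/* hypothetical table entry */
	int key;
	int value;
};

/* Buggy: the length is the size of the pointer, not of the table. */
static void clear_buggy(struct entry *table, size_t n)
{
	size_t len = sizeof(table);	/* 4 or 8 bytes, regardless of n */

	(void)n;
	memset(table, 0, len);
}

/* Fixed: clear all n entries. */
static void clear_fixed(struct entry *table, size_t n)
{
	memset(table, 0, n * sizeof(*table));
}
```

With the buggy version only the first few bytes are cleared, so stale entries survive a module unload and reload cycle.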
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hugh Dickins [Wed, 28 Nov 2007 18:55:10 +0000 (18:55 +0000)]
tmpfs: restore missing clear_highpage
patch
e84e2e132c9c66d8498e7710d4ea532d1feaaac5 in mainline
tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Brownell [Wed, 28 Nov 2007 22:50:03 +0000 (14:50 -0800)]
USB: fix up EHCI startup synchronization
patch
1cb52658b4f5b10a9e91f8e1c21ca2bcc1b9a3ca in mainline.
A recent patch added software synchronization during EHCI startup,
so ports aren't switched away from the companion controllers after
resets have started. This patch adds a short delay letting hardware
finish that port switching before any new resets begin ... so both
ends of that hardware race window are closed.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Dave Miller <davem@davemloft.net>
Cc: Dely Sy <dely.l.sy@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Oliver Neukum [Wed, 28 Nov 2007 22:50:02 +0000 (14:50 -0800)]
USB: make the microtek driver and HAL cooperate
patch
5cf1973a44bd298e3cfce6f6af8faa8c9d0a6d55 in mainline
To make HAL like the microtek driver's devices, the parent must be
set correctly.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Scott James Remnant [Thu, 29 Nov 2007 00:22:07 +0000 (16:22 -0800)]
wait_task_stopped(): pass correct exit_code to wait_noreap_copyout()
patch
e6ceb32aa25fc33f21af84cc7a32fe289b3e860c in mainline.
In wait_task_stopped() exit_code already contains the right value for the
si_status member of siginfo, and this is simply set in the non WNOWAIT
case.
If you call waitid() with a stopped or traced process, you'll get the signal
in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the
same time, you'll get (signal << 8) | 0x7f instead.
Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
and add 0x7f if we were returning it in the user status field and that
isn't used for any function that permits WNOWAIT.
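The two encodings mentioned above, side by side. stopped_status and si_status_for_stop are hypothetical helper names used only to make the arithmetic concrete.

```c
/* wait()-style status word for a stopped child: signal number in
 * bits 8..15, 0x7f in the low byte. */
static int stopped_status(int sig)
{
	return (sig << 8) | 0x7f;
}

/* What waitid() callers expect in si_status: the bare signal number. */
static int si_status_for_stop(int sig)
{
	return sig;
}
```

For signal number 19, the status word is 0x137f while si_status should simply be 19.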
Signed-off-by: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christian Borntraeger [Tue, 6 Nov 2007 11:26:15 +0000 (12:26 +0100)]
Future of Linux 2.6.22.y series
commit
5d0360ee96a5ef953dbea45873c2a8c87e77d59b upstream.
We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddenly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping
them dirty all the time.
It turns out that there is a case, where the VM makes a ramdisk page
clean, without telling the ramdisk driver. On memory pressure
shrink_zone runs and it starts to run shrink_active_list. There is a
check for buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no
releasepage callback, try_to_free_buffers is called. try_to_free_buffers
now has special logic for some file systems that makes a dirty page
clean if all buffers are clean. That's what happened in our test case.
The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Luca Tettamanti [Sat, 24 Nov 2007 19:15:18 +0000 (13:15 -0600)]
atl1: disable broken 64-bit DMA
atl1: disable broken 64-bit DMA
[ Upstream commit:
5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 ]
The L1 network chip can DMA to 64-bit addresses, but multiple descriptor
rings share a single register for the high 32 bits of their address, so
only a single, aligned, 4 GB physical address range can be used at a time.
As a result, we need to confine the driver to a 32-bit DMA mask, otherwise
we see occasional data corruption errors in systems containing 4 or more
gigabytes of RAM.
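The constraint can be expressed as two small predicates. This is an illustrative sketch with hypothetical names, not driver code: two ring addresses can share one "high 32 bits" register only if their upper halves match, and a 32-bit DMA mask guarantees this by keeping every address below 4 GB.

```c
#include <stdint.h>
#include <stdbool.h>

/* True when two DMA addresses share the same upper 32 bits, i.e. both
 * fit behind one shared high-address register. */
static bool same_high_dword(uint64_t a, uint64_t b)
{
	return (a >> 32) == (b >> 32);
}

/* True when an address is reachable with a 32-bit DMA mask. */
static bool fits_32bit_mask(uint64_t addr)
{
	return addr <= UINT32_MAX;
}
```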
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Acked-by: Chris Snook <csnook@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Wed, 21 Nov 2007 17:30:59 +0000 (09:30 -0800)]
Linux 2.6.22.14
Jean Delvare [Fri, 16 Nov 2007 09:37:55 +0000 (10:37 +0100)]
i2c/eeprom: Recognize VGN as a valid Sony Vaio name prefix
patch 8b925a3dd8a4d7451092cb9aa11da727ba69e0f0 in mainline.
Recent (i.e. 2005 and later) Sony Vaio laptops have names beginning
with VGN rather than PCG. Update the eeprom driver so that it
recognizes these.
Why this matters: the eeprom driver hides private data from the
EEPROMs it recognizes as Vaio EEPROMs (passwords, serial number...) so
if the driver fails to recognize a Vaio EEPROM as such, the private
data is exposed to the world.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Fri, 16 Nov 2007 09:34:17 +0000 (10:34 +0100)]
i2c/eeprom: Hide Sony Vaio serial numbers
patch 0f2cbd38aa377e30df3b7602abed69464d1970aa in mainline.
The sysfs interface to DMI data takes care to not make the system
serial number and UUID world-readable, presumably due to privacy
concerns. For consistency, we should not let the eeprom driver
export these same strings to the world on Sony Vaio laptops.
Instead, only make them readable by root, as we already do for BIOS
passwords.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Fri, 16 Nov 2007 09:24:36 +0000 (10:24 +0100)]
i2c-pasemi: Fix NACK detection
patch be8a1f7cd4501c3b4b32543577a33aee6d2193ac in mainline.
Turns out we don't actually check the status to see if there was a
device out there to talk to, just if we had a timeout when doing so.
Add the proper check, so we don't falsely think there are devices
on the bus that are not there, etc.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mark Fasheh [Wed, 14 Nov 2007 21:33:27 +0000 (13:33 -0800)]
ocfs2: fix write() performance regression
ocfs2: fix write() performance regression
patch 4e9563fd55ff4479f2b118d0757d121dd0cfc39c in mainline.
On file systems which don't support sparse files, ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tony Battersby [Tue, 16 Oct 2007 20:29:52 +0000 (22:29 +0200)]
ide: fix serverworks.c UDMA regression
patch 0c824b51b338c808de650b440ba5f9f4a725f7fc in mainline.
The patch described by the following excerpt from ChangeLog-2.6.22 makes
it impossible to use UDMA on a Tyan S2707 motherboard (SvrWks CSB5):
commit 2d5eaa6dd744a641e75503232a01f52d0768884c
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Thu May 10 00:01:08 2007 +0200
ide: rework the code for selecting the best DMA transfer mode (v3)
...
This one-line patch against 2.6.23 fixes the problem.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Karsten Keil [Thu, 18 Oct 2007 10:04:31 +0000 (03:04 -0700)]
i4l: fix random freezes with AVM B1 drivers
patch 9713d9e650045f7f2afd81d58a068827be306993 in mainline.
This fixes, for the B1 versions, the same issue that was debugged for the
C4 controller.
The capilib_ functions modify or traverse a linked list without locking.
This patch extends the existing locking to the calls of these functions to
prevent access to a list which is in the middle of a modification.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Karsten Keil [Thu, 18 Oct 2007 10:04:32 +0000 (03:04 -0700)]
i4l: Fix random hard freeze with AVM c4 card
patch 1ccfd63367c1a6aaf8b33943f18856dde85f2f0b in mainline.
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
an inconsistent capilib_msgidqueue list, that trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Rainer Brestan <rainer.brestan@frequentis.com>
Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Fri, 12 Oct 2007 22:19:14 +0000 (15:19 -0700)]
USB: mutual exclusion for EHCI init and port resets
patch 32fe01985aa2cb2562f6fc171e526e279abe10db in mainline.
This patch (as999) fixes a problem that sometimes shows up when host
controller driver modules are loaded in the wrong order. If ehci-hcd
happens to initialize an EHCI controller while the companion OHCI or
UHCI controller is in the middle of a port reset, the reset can fail
and the companion may get very confused. The patch adds an
rw-semaphore and uses it to keep EHCI initialization and port resets
mutually exclusive.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <david-b@pacbell.net>
Cc: David Miller <davem@davemloft.net>
Cc: Dely L Sy <dely.l.sy@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jiri Kosina [Fri, 19 Oct 2007 22:05:19 +0000 (00:05 +0200)]
USB: usbserial - fix potential deadlock between write() and IRQ
patch acd2a847e7fee7df11817f67dba75a2802793e5d in mainline.
USB: usbserial - fix potential deadlock between write() and IRQ
usb_serial_generic_write() doesn't disable interrupts when taking port->lock,
and could therefore deadlock with usb_serial_generic_read_bulk_callback()
being called from interrupt, taking the same lock. Fix it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Larry Finger <larry.finger@lwfinger.net>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Frank Seidel [Fri, 9 Nov 2007 18:44:40 +0000 (19:44 +0100)]
USB: kobil_sct: trivial backport to fix libct
Backport of a patch by Alan Cox <alan@lxorguk.ukuu.org.uk> in the kernel tree
with commit 94d0f7eac77a84da2cee41b8038796891f75f09e
Original comments:
USB: kobil_sct: Rework driver
No hardware but this driver is currently totally broken so we can't make
it much worse. Remove all the broken invalid termios handling and replace
it with a proper set_termios method.
Frank's comments:
Without this patch the userspace libct (to access the cardreader)
segfaults.
Signed-off-by: Frank Seidel <fseidel@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
HighPoint Linux Team [Tue, 16 Oct 2007 21:28:24 +0000 (14:28 -0700)]
hptiop: avoid buffer overflow when returning sense data
patch 0fec02c93f60fb44ba3a24a0d3e4a52521d34d3f in mainline.
avoid buffer overflow when returning sense data.
With current adapter firmware the driver is working but future firmware
updates may return sense data larger than 96 bytes, causing overflow on
scp->sense_buffer and a kernel crash.
This fix should be backported to earlier kernels.
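The general shape of such a fix can be sketched as a length clamp before the copy. This is an illustration only, not the hptiop patch itself: copy_sense_data and SENSE_BUFFER_SIZE are hypothetical names, and the 96-byte size is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENSE_BUFFER_SIZE 96 /* size quoted in the commit message */

/*
 * Hypothetical helper: copy sense data reported by firmware into a
 * fixed-size buffer, never writing more than SENSE_BUFFER_SIZE bytes.
 * Returns the number of bytes actually copied.
 */
static size_t copy_sense_data(unsigned char *dst, const unsigned char *src,
                              size_t src_len)
{
    assert(dst != NULL && src != NULL);

    size_t n = src_len < SENSE_BUFFER_SIZE ? src_len : SENSE_BUFFER_SIZE;

    memcpy(dst, src, n);
    return n;
}
```

Truncating is safe here in the sense that the destination is never overrun; whether dropping the tail of oversized sense data is acceptable depends on the consumer.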
Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com>
Signed-off-by: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Manfred Spraul [Wed, 17 Oct 2007 19:52:33 +0000 (21:52 +0200)]
forcedeth msi bugfix
patch a7475906bc496456ded9e4b062f94067fb93057a in mainline.
pci_enable_msi() replaces the INTx irq number in pci_dev->irq with the
new MSI irq number.
The forcedeth driver did not update the copy in netdevice->irq and
parts of the driver used the stale copy.
See bugzilla.kernel.org, bug 9047.
The patch
- updates netdevice->irq
- replaces all accesses to netdevice->irq with pci_dev->irq.
The patch is against 2.6.23.1. IMHO suitable for both 2.6.23 and 2.6.24
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Takashi Iwai [Mon, 15 Oct 2007 12:37:11 +0000 (14:37 +0200)]
ALSA: hda-codec - Add array terminator for dmic in STAC codec
patch f6e9852ad05fa28301c83d4e2b082620de010358 in mainline.
[ALSA] hda-codec - Add array terminator for dmic in STAC codec
Reported by Jan-Marek Glogowski.
The dmic array is passed to snd_hda_parse_pin_def_config() and
should be zero-terminated.
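To illustrate why the terminator matters, here is a sketch of a consumer that walks a list until it meets a zero entry. It assumes the parser scans the array this way; count_pins and the sample NID values are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical consumer of a zero-terminated NID list. Without the
 * trailing 0 the loop would run past the end of the array.
 */
static size_t count_pins(const unsigned short *pins)
{
    assert(pins != NULL);

    size_t n = 0;

    while (pins[n] != 0)
        n++;
    return n;
}
```

For example, an array declared as { 0x13, 0x14, 0 } yields a count of 2, while the same array without the final 0 would be read out of bounds.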
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Takashi Iwai [Tue, 16 Oct 2007 12:26:32 +0000 (14:26 +0200)]
ALSA: hdsp - Fix zero division
patch 2a3988f6d2c5be9d02463097775d1c66a8290527 in mainline.
Fix zero-division bug in the calculation dds offset.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: Maarten Bressers <mbressers@gmail.com>
Cc: gentoo kernel <kernel@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Tue, 13 Nov 2007 10:48:28 +0000 (02:48 -0800)]
Fix crypto_alloc_comp() error checking.
[IPSEC]: Fix crypto_alloc_comp error checking
[ Upstream commit: 4999f3621f4da622e77931b3d33ada6c7083c705 ]
The function crypto_alloc_comp returns an errno instead of NULL
to indicate error. So it needs to be tested with IS_ERR.
This is based on a patch by Vicenç Beltran Querol.
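A userspace sketch of the error-pointer convention, to show why a NULL test misses the failure. The helpers err_ptr, is_err and ptr_err only imitate the kernel's ERR_PTR, IS_ERR and PTR_ERR, and alloc_transform is a hypothetical stand-in for an allocator that reports errors this way.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (imitates ERR_PTR). */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True when the pointer lies in the top MAX_ERRNO addresses (imitates IS_ERR). */
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer (imitates PTR_ERR). */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical allocator: on failure it returns an error pointer, never NULL. */
static void *alloc_transform(bool fail)
{
    static int dummy_transform;

    return fail ? err_ptr(-ENOENT) : (void *)&dummy_transform;
}
```

A caller that only checks for NULL would treat the failed allocation as valid and dereference an address near the top of the address space.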
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Radu Rendec [Tue, 13 Nov 2007 08:09:56 +0000 (00:09 -0800)]
Fix endianness bug in U32 classifier.
changeset 543821c6f5dea5221426eaf1eac98b100249c7ac in mainline.
[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.
While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).
The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).
Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.
One could say is this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.
Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
consecutive.
The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.
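The fixed computation can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the cls_u32.c code: u32_bucket, lowest_set_bit and ipv4_net are hypothetical names, and both the key and the mask are taken to be in network byte order, as they would be inside the classifier.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; the mask must be nonzero. */
static unsigned lowest_set_bit(uint32_t mask)
{
    assert(mask != 0);

    unsigned shift = 0;

    while ((mask & 1u) == 0) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/*
 * Bucket selection with the fix applied: fshift is derived from the
 * host-order mask, and the masked key is converted to host order before
 * it is shifted down and folded with the divisor mask.
 */
static unsigned u32_bucket(uint32_t key_net, uint32_t hmask_net,
                           unsigned divisor_mask)
{
    unsigned fshift = lowest_set_bit(ntohl(hmask_net));

    return (ntohl(key_net & hmask_net) >> fshift) & divisor_mask;
}

/* Build an IPv4 address in network byte order from its four octets. */
static uint32_t ipv4_net(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return htonl((uint32_t)a << 24 | (uint32_t)b << 16 |
                 (uint32_t)c << 8 | (uint32_t)d);
}
```

With the mask 0x3fc0 and divisor mask 0xff, 192.168.0.0, 192.168.0.64 and 192.168.0.128 land in buckets 0, 1 and 2 on both little and big endian hosts, which is the behaviour the message calls expected.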
With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 13 Nov 2007 08:02:56 +0000 (00:02 -0800)]
Fix error returns in sys_socketpair()
patch bf3c23d171e35e6e168074a1514b0acd59cfd81a in mainline.
[NET]: Fix error reporting in sys_socketpair().
If either of the two sock_alloc_fd() calls fail, we
forget to update 'err' and thus we'll erroneously
return zero in these cases.
Based upon a report and patch from Rich Paul, and
commentary from Chuck Ebbert.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Tue, 13 Nov 2007 11:03:00 +0000 (03:03 -0800)]
Fix netlink timeouts.
[NETLINK]: Fix unicast timeouts
[ Upstream commit: c3d8d1e30cace31fed6186a4b8c6b1401836d89c ]
Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.
ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.
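The difference between restarting and carrying the timeout forward can be modelled with a small sketch. It is a toy model, not the netlink code: wait_once imitates a wait that returns the remaining ticks (as schedule_timeout() does), and both function names are hypothetical.

```c
#include <assert.h>

/*
 * Toy model of one wait: the caller wakes after woke_after ticks and
 * gets back what is left of its timeout, or 0 when it has expired.
 */
static long wait_once(long timeo, long woke_after)
{
    assert(timeo >= 0 && woke_after >= 0);
    return woke_after >= timeo ? 0 : timeo - woke_after;
}

/*
 * Number of waits performed before the budget runs out when the
 * remainder is propagated back to the caller on every retry.
 */
static int retries_before_timeout(long timeo, long woke_after)
{
    assert(woke_after > 0);

    int retries = 0;

    while (timeo > 0) {
        timeo = wait_once(timeo, woke_after); /* carry the remainder forward */
        retries++;
    }
    return retries;
}
```

If the remainder were dropped and each retry started again from the full value, the loop above would never terminate for any wake-up shorter than the timeout, which matches the regression described.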
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Evgeniy Polyakov [Tue, 13 Nov 2007 08:07:45 +0000 (00:07 -0800)]
Fix TEQL oops.
[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline
[ Upstream commit: 4f9f8311a08c0d95c70261264a2b47f2ae99683a ]
teql_reset() is called from deactivate and qdisc is set to noop already,
but subsequent teql_xmit does not know about it and dereferences private
data as teql qdisc and thus oopses.
not catch it first :)
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jozsef Kadlecsik [Mon, 5 Nov 2007 11:37:55 +0000 (12:37 +0100)]
NETFILTER: nf_conntrack_tcp: fix connection reopening
Upstream commits: 17311393 + bc34b841 merged together. Merge done by
Patrick McHardy <kaber@trash.net>
[NETFILTER]: nf_conntrack_tcp: fix connection reopening
With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(:
When a connection is >>closed actively<<, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY >>accept<< a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:
[...]
The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection - otherwise try to figure out if we hold
a dead connection.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jan Kiszka [Thu, 15 Nov 2007 01:00:08 +0000 (17:00 -0800)]
fix param_sysfs_builtin name length check
patch 22800a2830ec07e7cc5c837999890ac47cc7f5de in mainline.
Commit faf8c714f4508207a9c81cc94dafc76ed6680b44 caused a regression:
parameter names longer than MAX_KBUILD_MODNAME will now be rejected,
although we just need to keep the module name part that short. This patch
restores the old behaviour while still avoiding that memchr is called with
its length parameter larger than the total string length.
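One way to satisfy both requirements is to bound the memchr() scan by the shorter of the string length and the module name limit, and to apply the limit only to the part before the dot. The following is a sketch of that idea, not the actual patch: module_prefix_len is a hypothetical name and MAX_MODNAME is a small stand-in for MAX_KBUILD_MODNAME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODNAME 8 /* hypothetical stand-in for MAX_KBUILD_MODNAME */

/*
 * Returns the length of the "module" part of a "module.param" name, or 0
 * when no dot is found within the first MAX_MODNAME characters. The scan
 * never looks beyond the end of the string, and the parameter part after
 * the dot may be arbitrarily long.
 */
static size_t module_prefix_len(const char *param)
{
    assert(param != NULL);

    size_t len = strlen(param);
    size_t scan = len < MAX_MODNAME ? len : MAX_MODNAME;
    const char *dot = memchr(param, '.', scan);

    return dot ? (size_t)(dot - param) : 0;
}
```

In this model "usbcore.nousb" and "usbcore.a_very_long_parameter_name" both yield a prefix length of 7, while a bare "nousb" yields 0 without reading past its terminator.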
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hugh Dickins [Mon, 29 Oct 2007 21:37:20 +0000 (14:37 -0700)]
fix tmpfs BUG and AOP_WRITEPAGE_ACTIVATE
patch 487e9bf25cbae11b131d6a14bdbb3a6a77380837 in mainline.
It's possible to provoke unionfs (not yet in mainline, though in mm and
some distros) to hit shmem_writepage's BUG_ON(page_mapped(page)). I expect
it's possible to provoke the 2.6.23 ecryptfs in the same way (but the
2.6.24 ecryptfs no longer calls lower level's ->writepage).
This came to light with the recent find that AOP_WRITEPAGE_ACTIVATE could
leak from tmpfs via write_cache_pages and unionfs to userspace. There's
already a fix (e423003028183df54f039dfda8b58c49e78c89d7 - writeback: don't
propagate AOP_WRITEPAGE_ACTIVATE) in the tree for that, and it's okay so
far as it goes; but insufficient because it doesn't address the underlying
issue, that shmem_writepage expects to be called only by vmscan (relying on
backing_dev_info capabilities to prevent the normal writeback path from
ever approaching it).
That's an increasingly fragile assumption, and ramdisk_writepage (the other
source of AOP_WRITEPAGE_ACTIVATEs) is already careful to check
wbc->for_reclaim before returning it. Make the same check in
shmem_writepage, thereby sidestepping the page_mapped BUG also.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrew Morton [Wed, 17 Oct 2007 06:18:32 +0000 (23:18 -0700)]
writeback: don't propagate AOP_WRITEPAGE_ACTIVATE
patch e423003028183df54f039dfda8b58c49e78c89d7 in mainline.
This is a writeback-internal marker but we're propagating it all the way back
to userspace!
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Johnson [Tue, 23 Oct 2007 20:37:22 +0000 (22:37 +0200)]
x86: fix TSC clock source calibration error
patch edaf420fdc122e7a42326fe39274c8b8c9b19d41 in mainline.
I ran into this problem on a system that was unable to obtain NTP sync
because the clock was running very slow (over 10000ppm slow). ntpd had
declared all of its peers 'reject' with 'peer_dist' reason.
On investigation, the tsc_khz variable was significantly incorrect
causing xtime to run slow. After a reboot tsc_khz was correct so I
did a reboot test to see how often the problem occurred:
Test was done on a 2000 MHz Xeon system. Of 689 reboots, 8 of them
had unacceptable tsc_khz values (>500ppm):
range of tsc_khz      # of boots  % of boots
-----------------     ----------  ----------
        < 1999750          0        0.000%
1999750 - 1999800         21        3.048%
1999800 - 1999850        166       24.128%
1999850 - 1999900        241       35.029%
1999900 - 1999950        211       30.669%
1999950 - 2000000         42        6.105%
2000000 - 2000000          0        0.000%
2000050 - 2000100          0        0.000%
[...]
2000100 - 2015000          1        0.145%  << BAD
2015000 - 2030000          6        0.872%  << BAD
2030000 - 2045000          1        0.145%  << BAD
2045000 <                  0        0.000%
The worst boot was 2032.577 MHz, over 1.5% off!
It appears that on rare occasions, mach_countup() is taking longer to
complete than necessary.
I suspect that this is caused by the CPU taking a periodic SMI
interrupt right at the end of the 30ms calibration loop. This would
cause the loop to delay while the SMI BIOS handler runs. The resulting
TSC value is beyond what it actually should be resulting in a higher
tsc_khz.
The below patch makes native_calculate_cpu_khz() take the best
(shortest duration, lowest khz) run of its 3 calibration loops. If a
SMI goes off causing a bad result (long duration, higher khz) it will
be discarded.
With the patch applied, 300 boots of the same system produce good
results:
range of tsc_khz      # of boots  % of boots
-----------------     ----------  ----------
        < 1999750          0        0.000%
1999750 - 1999800         30       10.000%
1999800 - 1999850        166       55.333%
1999850 - 1999900         89       29.667%
1999900 - 1999950         15        5.000%
1999950 <                  0        0.000%
Problem was found and tested against 2.6.18. Patch is against 2.6.22.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 13 Nov 2007 07:59:05 +0000 (23:59 -0800)]
Fix compat futex hangs.
[FUTEX]: Fix address computation in compat code.
[ Upstream commit: 3c5fd9c77d609b51c0bab682c9d40cbb496ec6f1 ]
compat_exit_robust_list() computes a pointer to the
futex entry in userspace as follows:
(void __user *)entry + futex_offset
'entry' is a 'struct robust_list __user *', and
'futex_offset' is a 'compat_long_t' (typically a 's32').
Things explode if the 32-bit sign bit is set in futex_offset.
Type promotion sign extends futex_offset to a 64-bit value before
adding it to 'entry'.
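The effect of the promotion can be reproduced in userspace with plain integers. This is a sketch under stated assumptions: pointers are modelled as integers, the function names are hypothetical, and the sample values are chosen only so that the result matches the faulting address quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy form: the 32-bit offset is sign extended to 64 bits before the add. */
static uint64_t futex_addr_sign_extended(uint32_t entry, int32_t futex_offset)
{
    return (uint64_t)entry + (uint64_t)(int64_t)futex_offset;
}

/* Compat form: add in 32 bits so the result wraps, then zero extend. */
static uint64_t futex_addr_compat(uint32_t entry, int32_t futex_offset)
{
    return (uint32_t)(entry + (uint32_t)futex_offset);
}

/* Self-check with illustrative values; the offset has its sign bit set. */
static void futex_addr_selfcheck(void)
{
    assert(futex_addr_sign_extended(0x10, -0x080e9438) ==
           0xfffffffff7f16bd8ULL);
    assert(futex_addr_compat(0x10, -0x080e9438) == 0xf7f16bd8ULL);
}
```

The two forms agree whenever the 32-bit sum does not wrap, which is presumably why the bug stayed hidden until an offset with the sign bit set met a small entry address.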
This triggered a problem on sparc64 running 32-bit applications which
would lock up a cpu looping forever in the fault handling for the
userspace load in handle_futex_death().
Compat userspace runs with address masking (wherein the cpu zeros out
the top 32-bits of every effective address given to a memory operation
instruction) so the sparc64 fault handler accounts for this by
zero'ing out the top 32-bits of the fault address too.
Since the kernel properly uses the compat_uptr interfaces, kernel side
accesses to compat userspace work too since they will only use
addresses with the top 32-bit clear.
Because of this compat futex layer bug we get into the following loop
when executing the get_user() load near the top of handle_futex_death():
1) load from address '0xfffffffff7f16bd8', FAULT
2) fault handler clears upper 32-bits, processes fault
for address '0xf7f16bd8' which succeeds
3) goto #1
I want to thank Bernd Zeimetz, Josip Rodin, and Fabio Massimo Di Nitto
for their tireless efforts helping me track down this bug.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christoph Lameter [Mon, 5 Nov 2007 19:23:51 +0000 (11:23 -0800)]
SLUB: Fix memory leak by not reusing cpu_slab
backport of 05aa345034de6ae9c77fb93f6a796013641d57d5 from Linus's tree.
SLUB: Fix memory leak by not reusing cpu_slab
Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer, losing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.
If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Fri, 16 Nov 2007 18:27:09 +0000 (10:27 -0800)]
Linux 2.6.22.13
Ilpo Järvinen [Wed, 14 Nov 2007 23:47:18 +0000 (15:47 -0800)]
TCP: Make sure write_queue_from does not begin with NULL ptr (CVE-2007-5501)
patch 96a2d41a3e495734b63bff4e5dd0112741b93b38 in mainline.
NULL ptr can be returned from tcp_write_queue_head to cached_skb
and then assigned to skb if packets_out was zero. Without this fix, the
system is vulnerable to carefully crafted ACKs, which are obviously
remotely triggerable.
Besides, there's very little that needs to be done in sacktag
if there weren't any packets outstanding, just skipping the rest
doesn't hurt.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Roland McGrath [Wed, 14 Nov 2007 06:11:50 +0000 (22:11 -0800)]
wait_task_stopped: Check p->exit_state instead of TASK_TRACED (CVE-2007-5500)
patch a3474224e6a01924be40a8255636ea5522c1023a in mainline
The original meaning of the old test (p->state > TASK_STOPPED) was
"not dead", since it was before TASK_TRACED existed and before the
state/exit_state split. It was a wrong correction in commit
14bf01bb0599c89fc7f426d20353b76e12555308 to make this test for
TASK_TRACED instead. It should have been changed when TASK_TRACED
was introduced and again when exit_state was introduced.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Kees Cook <kees@ubuntu.com>
Acked-by: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Mon, 5 Nov 2007 17:59:33 +0000 (09:59 -0800)]
Linux 2.6.22.12
Linus Torvalds [Mon, 29 Oct 2007 18:36:04 +0000 (11:36 -0700)]
Revert "x86_64: allocate sparsemem memmap above 4G"
patch 6a22c57b8d2a62dea7280a6b2ac807a539ef0716 in mainline.
This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6.
First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jones. So the
commit will likely be reverted in the 2.6.23 stable kernels.
Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.
backported to 2.6.22 by Chuck Ebbert
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Martin Ebourne <fedora@ebourne.me.uk>
Cc: Zou Nan hai <nanhai.zou@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Milan Broz [Thu, 12 Jul 2007 16:28:00 +0000 (17:28 +0100)]
dm snapshot: fix invalidation deadlock
patch fcac03abd325e4f7a4cc8fe05fea2793b1c8eb75 in mainline
Process persistent exception store metadata IOs in a separate thread.
A snapshot may become invalid while inside generic_make_request().
A synchronous write is then needed to update the metadata while still
inside that function. Since the introduction of
md-dm-reduce-stack-usage-with-stacked-block-devices.patch this has to
be performed by a separate thread to avoid deadlock.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ingo Molnar [Fri, 19 Oct 2007 10:19:26 +0000 (12:19 +0200)]
x86: fix global_flush_tlb() bug
patch 9a24d04a3c26c223f22493492c5c9085b8773d4a upstream
While we were reviewing pageattr_32/64.c for unification,
Thomas Gleixner noticed the following serious SMP bug in
global_flush_tlb():
down_read(&init_mm.mmap_sem);
list_replace_init(&deferred_pages, &l);
up_read(&init_mm.mmap_sem);
this is SMP-unsafe because list_replace_init() done on two CPUs in
parallel can corrupt the list.
This bug has been introduced about a year ago in the 64-bit tree:
commit ea7322decb974a4a3e804f96a0201e893ff88ce3
Author: Andi Kleen <ak@suse.de>
Date: Thu Dec 7 02:14:05 2006 +0100
[PATCH] x86-64: Speed and clean up cache flushing in change_page_attr
down_read(&init_mm.mmap_sem);
- dpage = xchg(&deferred_pages, NULL);
+ list_replace_init(&deferred_pages, &l);
up_read(&init_mm.mmap_sem);
the xchg() based version was SMP-safe, but list_replace_init() is not.
So this "cleanup" introduced a nasty bug.
why this bug never became prominent is a mystery - it can probably be
explained with the (still) relative obscurity of the x86_64 architecture.
the safe fix for now is to write-lock init_mm.mmap_sem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Young [Thu, 18 Oct 2007 10:05:07 +0000 (03:05 -0700)]
param_sysfs_builtin memchr argument fix
patch faf8c714f4508207a9c81cc94dafc76ed6680b44 in mainline.
If the memchr length argument is larger than strlen(kp->name), the result
is unpredictable.
It causes duplicate filenames in sysfs for the "nousb" parameter. The kernel
warning messages are as below:
sysfs: duplicate filename 'usbcore' can not be created
WARNING: at fs/sysfs/dir.c:416 sysfs_add_one()
 [<c01c4750>] sysfs_add_one+0xa0/0xe0
 [<c01c4ab8>] create_dir+0x48/0xb0
 [<c01c4b69>] sysfs_create_dir+0x29/0x50
 [<c024e0fb>] create_dir+0x1b/0x50
 [<c024e3b6>] kobject_add+0x46/0x150
 [<c024e2da>] kobject_init+0x3a/0x80
 [<c053b880>] kernel_param_sysfs_setup+0x50/0xb0
 [<c053b9ce>] param_sysfs_builtin+0xee/0x130
 [<c053ba33>] param_sysfs_init+0x23/0x60
 [<c024d062>] __next_cpu+0x12/0x20
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052a856>] do_initcalls+0x46/0x1e0
 [<c01bdb12>] create_proc_entry+0x52/0x90
 [<c0158d4c>] register_irq_proc+0x9c/0xc0
 [<c01bda94>] proc_mkdir_mode+0x34/0x50
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa92>] kernel_init+0x62/0xb0
 [<c0104f83>] kernel_thread_helper+0x7/0x14
=======================
kobject_add failed for usbcore with -EEXIST, don't try to register things with the same name in the same directory.
 [<c024e466>] kobject_add+0xf6/0x150
 [<c053b880>] kernel_param_sysfs_setup+0x50/0xb0
 [<c053b9ce>] param_sysfs_builtin+0xee/0x130
 [<c053ba33>] param_sysfs_init+0x23/0x60
 [<c024d062>] __next_cpu+0x12/0x20
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052a856>] do_initcalls+0x46/0x1e0
 [<c01bdb12>] create_proc_entry+0x52/0x90
 [<c0158d4c>] register_irq_proc+0x9c/0xc0
 [<c01bda94>] proc_mkdir_mode+0x34/0x50
 [<c052aa30>] kernel_init+0x0/0xb0
 [<c052aa92>] kernel_init+0x62/0xb0
 [<c0104f83>] kernel_thread_helper+0x7/0x14
=======================
Module 'usbcore' failed to be added to sysfs, error number -17
The system will be unstable now.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Sandeen [Wed, 17 Oct 2007 06:27:15 +0000 (23:27 -0700)]
minixfs: limit minixfs printks on corrupted dir i_size (CVE-2006-6058)
patch 44ec6f3f89889a469773b1fd894f8fcc07c29cf in mainline
This attempts to address CVE-2006-6058
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058
first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html
Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page. This is
under the BKL, printk-storming as well. This can lock up the machine
for a very long time. Simply ratelimiting the printks gets things back
under control. Make the message a bit more informative while we're here.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Bodo Eggert <7eggert@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Roland Dreier [Sun, 28 Oct 2007 17:14:32 +0000 (10:14 -0700)]
IB/uverbs: Fix checking of userspace object ownership
Upstream as cbfb50e6e2e9c580848c0f51d37c24cdfb1cb704
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().
Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Sun, 12 Aug 2007 15:46:36 +0000 (15:46 +0000)]
genirq: mark io_apic level interrupts to avoid resend
patch cc75b92d11384ba14f93828a2a0040344ae872e7 in mainline.
Level type interrupts do not need to be resent. It was also found that
some chipsets get confused in case of the resend.
Mark the ioapic level type interrupts as such to avoid the resend
functionality in the generic irq code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Sun, 12 Aug 2007 15:46:35 +0000 (15:46 +0000)]
genirq: suppress resend of level interrupts
patch 2464286ace55b3abddfb9cc30ab95e2dac1de9a6 in mainline.
Level type interrupts are resent by the interrupt hardware when they are
still active at irq_enable().
Suppress the resend mechanism for interrupts marked as level.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Sun, 12 Aug 2007 15:46:34 +0000 (15:46 +0000)]
genirq: cleanup mismerge artifact
patch 496634217e5671ed876a0348e9f5b7165e830b20 in mainline.
Commit 5a43a066b11ac2fe84cf67307f20b83bea390f83: "genirq: Allow fasteoi
handler to retrigger disabled interrupts" was erroneously applied to
handle_level_irq(). This added the irq retrigger / resend functionality
to the level irq handler.
Revert the offending bits.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Fri, 2 Nov 2007 15:48:19 +0000 (08:48 -0700)]
Linux 2.6.22.11
Gregory Haskins [Wed, 31 Oct 2007 15:44:05 +0000 (11:44 -0400)]
lockdep: fix mismatched lockdep_depth/curr_chain_hash
patch 3aa416b07f0adf01c090baab26fb70c35ec17623 in mainline.
It is possible for the current->curr_chain_key to become inconsistent with the
current index if the chain fails to validate. The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.
[ peterz: this might explain some of the "lockdep is so _slow_" complaints. ]
[ mingo: this does not impact the correctness of validation, but may slow
down future operations significantly, if the chain gets very long. ]
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kumar Gala [Thu, 11 Oct 2007 22:07:34 +0000 (17:07 -0500)]
POWERPC: Fix handling of stfiwx math emulation
patch ba02946a903015840ef672ccc9dc8620a7e83de6 in mainline
It's legal for the stfiwx instruction to have RA = 0 as part of its
effective address calculation. This is illegal for all other XE
form instructions.
Add code to compute the proper effective address for stfiwx if
RA = 0 rather than treating it as illegal.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Airlie [Tue, 16 Oct 2007 00:05:49 +0000 (01:05 +0100)]
i915: fix vbl swap allocation size.
This is upstream as 54583bf4efda79388fc13163e35c016c8bc5de81
Oops...
Signed-off-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Mon, 15 Oct 2007 13:02:42 +0000 (15:02 +0200)]
hwmon/w83627hf: Don't assume bank 0
Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d58df9cd788e6fb4962e1c8d5ba7b8b95d639a44
The bank switching code assumes that the bank selector is set to 0
when the driver is loaded. This might not be the case. This is exactly
the same bug as was fixed in the w83627ehf driver two months ago:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0956895aa6f8dc6a33210967252fd7787652537d
In practice, this bug was causing the sensor thermal types to be
improperly reported for my W83627THF the first time I was loading the
w83627hf driver. From the driver history, I'd say that it has been
broken since September 2005 (when we stopped resetting the chip by
default at driver load.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Mon, 15 Oct 2007 12:32:27 +0000 (14:32 +0200)]
hwmon/w83627hf: Fix setting fan min right after driver load
Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c09c5184a26158da32801e89d5849d774605f0dd
We need to read the fan clock dividers at initialization time,
otherwise the code in store_fan_min() may use uninitialized values.
That's pretty much the same bug and same fix as for the w83627ehf
driver last month.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Mon, 15 Oct 2007 12:02:36 +0000 (14:02 +0200)]
hwmon/lm87: Disable VID when it should be
Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=889af3d5d9586db795a06c619e416b4baee11da8
A stupid bit shifting bug caused the VID value to be always exported
even when the hardware is configured for something different.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Mon, 15 Oct 2007 11:49:50 +0000 (13:49 +0200)]
hwmon/lm87: Fix a division by zero
Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b965d4b7f614522170af6a7e450be0333792ccd2
Missing parentheses in the definition of FAN_FROM_REG cause a
division by zero for a specific register value.
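As a rough illustration of how missing parentheses in a macro can defeat a zero guard, consider the sketch below. The macro names, the constant and checked_div() are hypothetical and are not the lm87 driver's actual FAN_FROM_REG definition; checked_div() records an attempted division by zero instead of trapping, so the effect can be observed safely.

```c
/* Set when checked_div() is asked to divide by zero (stand-in for a trap). */
static int zero_divisor_seen;

static long checked_div(long num, long den)
{
    if (den == 0) {
        zero_divisor_seen = 1;
        return 0;
    }
    return num / den;
}

/* Hypothetical macro without outer parentheses: the guard can bind wrongly. */
#define DEMO_RPM_UNSAFE(reg, div) \
    (reg) == 0 ? 0 : checked_div(1350000, (long)(reg) * (div))

/* Same macro, fully parenthesized. */
#define DEMO_RPM_SAFE(reg, div) \
    ((reg) == 0 ? 0 : checked_div(1350000, (long)(reg) * (div)))

/* Returns 1 if the unsafe macro, used inside a larger expression, divides by zero. */
static int demo_unsafe_hits_zero(int reg, int div)
{
    long rpm;

    zero_divisor_seen = 0;
    /* Parses as (1 + (reg)) == 0 ? 0 : checked_div(...): the guard tests 1 + reg. */
    rpm = 1 + DEMO_RPM_UNSAFE(reg, div);
    (void)rpm;
    return zero_divisor_seen;
}

/* Same expression with the parenthesized macro; the guard applies to reg alone. */
static int demo_safe_hits_zero(int reg, int div)
{
    long rpm;

    zero_divisor_seen = 0;
    rpm = 1 + DEMO_RPM_SAFE(reg, div);
    (void)rpm;
    return zero_divisor_seen;
}
```

For reg = 0 the unsafe variant reaches the division with a zero divisor, while the parenthesized variant never does. Wrapping the whole macro body and every argument in parentheses is the usual remedy.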
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ian Armstrong [Sun, 14 Oct 2007 15:53:46 +0000 (11:53 -0400)]
V4L: ivtv: fix udma yuv bug
Based on cb50f548c0ee9b2aac39743fc4021a7188825a98 in mainline
[PATCH] V4L: ivtv: fix udma yuv bug
Using udma yuv causes the driver to become locked into that mode. This
prevents use of the mpeg decoder & non-udma yuv output.
This patch clears the operating mode when the device is closed.
Signed-off-by: Ian Armstrong <ian@iarmst.demon.co.uk>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Peter Korsgaard [Fri, 12 Oct 2007 12:14:02 +0000 (14:14 +0200)]
dm9601: Fix receive MTU
patch f662fe5a0b144efadbfc00e8040e603ec318746e in mainline.
dm9601 didn't take the ethernet header into account when calculating
RX MTU, causing packets bigger than 1486 to fail.
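The 1486 figure follows from simple arithmetic if one assumes the standard 1500-byte MTU and a 14-byte Ethernet header. The sketch below only spells that out; the names are hypothetical and this is not the driver's code.

```c
#define DEMO_ETH_HLEN 14 /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Bytes needed to receive a full frame at the given MTU (no VLAN tag, no FCS). */
static unsigned int demo_rx_frame_len(unsigned int mtu)
{
    return mtu + DEMO_ETH_HLEN;
}

/* Largest payload that still fits when the receive limit is the bare MTU. */
static unsigned int demo_max_payload_without_header(unsigned int mtu)
{
    return mtu - DEMO_ETH_HLEN;
}
```

With mtu = 1500 the receive side needs room for 1514 bytes; limiting it to 1500 leaves space for only 1486 bytes of payload, which matches the failure described above.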
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Garzik [Tue, 17 Jul 2007 04:01:09 +0000 (00:01 -0400)]
netdrvr: natsemi: Fix device removal bug
This episode illustrates how an overused warning can train people to
ignore that warning, which winds up hiding bugs.
The warning
drivers/net/natsemi.c: In function ‘natsemi_remove1’:
drivers/net/natsemi.c:3222: warning: ignoring return value of
‘device_create_file’, declared with attribute warn_unused_result
is oft-ignored, even though on close inspection one notices this occurs
in the /remove/ function, not normally where creation occurs. A quick
s/create/remove/ and we are fixed, with the warning gone.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stefan Richter [Wed, 10 Oct 2007 20:37:25 +0000 (22:37 +0200)]
firewire: fix unloading of fw-ohci while devices are attached
Fix panic in run_timer_softirq right after "modprobe -r firewire-ohci"
if a FireWire disk was attached and firewire-sbp2 loaded.
Same as commit 8a2d9ed3210464d22fccb9834970629c1c36fa36.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andy Green [Wed, 10 Oct 2007 02:46:33 +0000 (22:46 -0400)]
Add get_unaligned to ieee80211_get_radiotap_len
patch dfe6e81deaa79c85086c0cc8d85b229e444ab97f in mainline.
ieee80211_get_radiotap_len() dereferences the radiotap length field
directly, but the header may be completely unaligned, so get_unaligned()
is required.
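A userspace sketch of the same idea, for illustration only: read the 16-bit little-endian length byte by byte so the access is safe at any address. The function names are hypothetical; the kernel uses its own get_unaligned() helper rather than this code. The offset of 2 assumes the radiotap layout of a one-byte version, a one-byte pad, then the le16 length.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from any address, one byte at a time. */
static uint16_t demo_get_unaligned_le16(const void *p)
{
    const unsigned char *b = p;

    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Radiotap layout: it_version (1 byte), it_pad (1 byte), it_len (le16) at offset 2. */
static uint16_t demo_radiotap_len(const unsigned char *hdr)
{
    return demo_get_unaligned_le16(hdr + 2);
}
```

Byte-wise access also makes the result independent of host endianness, which a plain pointer cast would not be.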
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Al Viro [Wed, 10 Oct 2007 02:46:37 +0000 (22:46 -0400)]
libertas: more endianness breakage
based on patch 8362cd413e8116306fafbaf414f0419db0595142 in mainline.
domain->header.len is le16 and has just been assigned
cpu_to_le16(arithmetical expression). And all fields of adapter->logmsg
are __le32; not a single 16-bit among them...
That's incremental to the previous one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Al Viro [Wed, 10 Oct 2007 02:46:36 +0000 (22:46 -0400)]
libertas: fix endianness breakage
patch 5707708111ca6c4e9a1160acffdc98a98d95e462 in mainline.
wep->keytype[] is u8
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
John W. Linville [Wed, 10 Oct 2007 02:46:35 +0000 (22:46 -0400)]
mac80211: filter locally-originated multicast frames
patch b331615722779b078822988843ddffd4eaec9f83 in mainline.
In STA mode, the AP will echo our traffic. This includes multicast
traffic.
Receiving these frames confuses some protocols and applications,
notably IPv6 Duplicate Address Detection.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Dumazet [Wed, 10 Oct 2007 10:28:33 +0000 (03:28 -0700)]
Fix TCP initial sequence number selection.
changeset 162f6690a65075b49f242d3c8cdb5caaa959a060 in mainline.
TCP V4 sequence numbers are 32 bits, and RFC 793 assumed a 250 kHz clock.
To keep up with increasing network speeds, we can use a faster clock, but
we should limit this clock so that the delay between two rollovers is
greater than the MSL (TCP Maximum Segment Lifetime: 2 minutes).
Choosing a 64 nsec clock should be OK, since the rollovers occur every
274 seconds.
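The 274-second figure can be checked with a little arithmetic: a 32-bit counter wraps after 2^32 ticks, and 2^32 * 64 ns is about 274.9 s. The helper names below are made up for this sketch and are not kernel functions.

```c
#include <stdint.h>

/* Whole seconds until a 32-bit counter ticking every tick_ns nanoseconds wraps. */
static uint64_t demo_rollover_seconds(uint64_t tick_ns)
{
    const uint64_t ticks = (uint64_t)1 << 32;

    return ticks * tick_ns / 1000000000ULL;
}

/* Nonzero if the rollover period is longer than the given MSL in seconds. */
static int demo_rollover_exceeds_msl(uint64_t tick_ns, uint64_t msl_seconds)
{
    return demo_rollover_seconds(tick_ns) > msl_seconds;
}
```

For comparison, the 250 kHz clock of RFC 793 (one tick every 4000 ns) wraps only after roughly 17179 seconds, while a 1 ns clock would wrap in about 4 seconds, well inside the 2 minute MSL.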
Problem spotted by Denys Fedoryshchenko
[ This bug was introduced by f85958151900f9d30fa5ff941b0ce71eaa45a7de ]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Wed, 10 Oct 2007 10:27:19 +0000 (03:27 -0700)]
Fix TCP MD5 on big-endian.
changeset f8ab18d2d987a59ccbf0495032b2aef05b730037 in mainline.
Based upon a report and initial patch by Peter Lieven.
tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key, because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().
Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses. This just so happens to work by accident on
little-endian, but on big-endian it doesn't.
Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.
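A simplified sketch of the "common struct as first member" pattern, plus the reason the old cast misbehaved. All struct and function names here are hypothetical stand-ins, not the kernel's definitions, and the field set is reduced to the minimum needed to show the point.

```c
#include <stdint.h>

/* Hypothetical common part shared by every address family. */
struct demo_md5sig_key {
    const uint8_t *key;
    uint8_t keylen;
};

/* Hypothetical IPv4 variant: the common part is embedded as the first member. */
struct demo_tcp4_md5sig_key {
    struct demo_md5sig_key base;
    uint32_t addr;
};

/* Generic code takes the embedded member, so no layout-dependent cast is needed. */
static uint8_t demo_keylen(const struct demo_md5sig_key *k)
{
    return k->keylen;
}

static uint8_t demo_tcp4_keylen(const struct demo_tcp4_md5sig_key *k4)
{
    return demo_keylen(&k4->base);
}

/*
 * What a cast from a u16 field to a u8 field effectively does: read the first
 * byte of the 16-bit value. That byte holds the low bits on little-endian
 * machines and the high bits on big-endian machines.
 */
static uint8_t demo_first_byte_of_u16(uint16_t v)
{
    return *(const uint8_t *)&v;
}
```

For a key length of 3, demo_first_byte_of_u16() yields 3 on little-endian and 0 on big-endian, which is why the mismatch went unnoticed on little-endian hardware. Embedding the common struct removes the dependence on field widths matching by hand.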
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ilpo Järvinen [Wed, 10 Oct 2007 10:25:53 +0000 (03:25 -0700)]
Fix TCP's ->fastpath_cnt_hint handling.
changeset 48611c47d09023d9356e78550d1cadb8d61da9c8 in mainline.
When only a GSO skb was partially ACKed, no hints are reset,
so fastpath_cnt_hint must be adjusted too, or else it can
corrupt fackets_out. For the corruption to occur, a non-trivial
ACK/SACK sequence is needed, so this bug is not harmful very
often. There's a fackets_out state reset in TCP because
fackets_out is known to be inaccurate, and that eventually fixes
the issue anyway.
In case there was also at least one skb that got fully ACKed,
the fastpath_skb_hint is set to NULL which causes a recount for
fastpath_cnt_hint (the old value won't be accessed anymore),
thus it can safely be decremented without additional checking.
Reported by Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Wed, 10 Oct 2007 10:22:30 +0000 (03:22 -0700)]
Fix sys_ipc() SEMCTL on sparc64.
changeset 6536a6b331d3225921c398eb7c6e4ecedb9b05e0 from mainline
Thanks to Tom Callaway for the excellent bug report and
test case.
sys_ipc() has several problems, most to do with semaphore
call handling:
1) 'err' return should be a 'long'
2) "union semun" is passed in a register on 64-bit compared
to 32-bit which provides it on the stack and therefore
by reference
3) Second and third arguments to SEMCTL are swapped compared
to 32-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Wed, 10 Oct 2007 10:21:37 +0000 (03:21 -0700)]
Fix zero length socket write() semantics.
changeset e79ad711a0108475c1b3a03815527e7237020b08 from mainline.
This fixes kernel bugzilla #5731
It should generate an empty packet for datagram protocols when the
socket is connected, for one.
The check is doubly-wrong because all that a write() can be is a
sendmsg() call with a NULL msg_control and a single entry iovec. No
special semantics should be assigned to it, therefore the zero length
check should be removed entirely.
This matches the behavior of BSD and several other systems.
Alan Cox notes that SuSv3 says the behavior of a zero length write on
non-files is "unspecified", but that's kind of useless since BSD has
defined this behavior for a quarter century and BSD is essentially
what application folks code to.
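The semantics described above can be observed from userspace with a small test program. This is only a sketch: it uses an AF_UNIX datagram socket pair instead of UDP so that no network setup is needed, the function name is made up, and the expectation that the empty datagram is delivered is an assumption about a kernel that has this fix.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write zero bytes to one end of a connected datagram socket pair and try to
 * receive on the other end without blocking. Returns the recv() result:
 * 0 means an empty datagram arrived, -1 means nothing was queued or an
 * error occurred.
 */
static int demo_zero_length_write(void)
{
    int sv[2];
    char buf[8];
    ssize_t sent, got;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    sent = write(sv[0], "", 0); /* zero-length write() */
    got = (sent == 0) ? recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT) : -1;

    close(sv[0]);
    close(sv[1]);
    return (int)got;
}
```

On a datagram socket a recv() result of 0 means "a datagram with no payload", not end of stream, so a return value of 0 here indicates that the zero-length write produced an empty packet.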
Based upon a patch from Stephen Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alexey Dobriyan [Wed, 10 Oct 2007 10:20:01 +0000 (03:20 -0700)]
Fix ROSE module unload oops.
changeset 891e6a931255238dddd08a7b306871240961a27f from mainline.
Commit a3d384029aa304f8f3f5355d35f0ae274454f7cd aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
turned the rose_loopback_neigh variable into a statically allocated one.
However, on unload it is still kfree'd, which can't work.
Steps to reproduce:
modprobe rose
rmmod rose
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
c014c664
*pde = 00000000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
CPU: 0
EIP: 0060:[<c014c664>] Not tainted VLI
EFLAGS: 00210086 (2.6.23-rc9 #3)
EIP is at kfree+0x48/0xa1
eax: 00000556 ebx: c1734aa0 ecx: f6a5e000 edx: f7082000
esi: 00000000 edi: f9a55d20 ebp: 00200287 esp: f6a5ef28
ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00
00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000
f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000
Call Trace:
[<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
[<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
[<f9a51f3f>] rose_exit+0x4c/0xd5 [rose]
[<c0132c60>] sys_delete_module+0x15e/0x186
[<c014244a>] remove_vma+0x40/0x45
[<c01025e6>] sysenter_past_esp+0x8f/0x99
[<c012bacf>] trace_hardirqs_on+0x118/0x13b
[<c01025b6>] sysenter_past_esp+0x5f/0x99
=======================
Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f
EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Brian Haley [Wed, 10 Oct 2007 10:19:06 +0000 (03:19 -0700)]
Fix ipv6 redirect processing, leads to TAHI failures.
changeset bf0b48dfc368c07c42b5a3a5658c8ee81b4283ac from mainline.
When the ICMPv6 Target address is multicast, Linux processes the
redirect instead of dropping it. The problem is in this code in
ndisc_redirect_rcv():
    if (ipv6_addr_equal(dest, target)) {
        on_link = 1;
    } else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
        ND_PRINTK2(KERN_WARNING
                   "ICMPv6 Redirect: target address is not link-local.\n");
        return;
    }
This second check will succeed if the Target address is, for example,
FF02::1 because it has link-local scope. Instead, it should be checking
if it's a unicast link-local address, as stated in RFC 2461/4861 Section
8.1:
- The ICMP Target Address is either a link-local address (when
redirected to a router) or the same as the ICMP Destination
Address (when redirected to the on-link destination).
I know this doesn't explicitly say unicast link-local address, but it's
implied.
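The distinction between "has link-local scope" and "is a unicast link-local address" can be sketched in userspace with the standard IN6_IS_ADDR_* macros. This is an analogue for illustration, not the kernel change itself; the function name is hypothetical, and the explicit multicast test is redundant with IN6_IS_ADDR_LINKLOCAL (which only matches fe80::/10) but is kept to mirror the reasoning above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Returns 1 if the textual address is a unicast link-local address (fe80::/10). */
static int demo_is_unicast_linklocal(const char *text)
{
    struct in6_addr a;

    if (inet_pton(AF_INET6, text, &a) != 1)
        return 0;
    /* ff02::1 and friends have link-local scope but are multicast: reject. */
    if (IN6_IS_ADDR_MULTICAST(&a))
        return 0;
    return IN6_IS_ADDR_LINKLOCAL(&a) ? 1 : 0;
}
```

With this check, fe80::1 is accepted as a redirect target, while ff02::1 (all-nodes multicast) and a global address such as 2001:db8::1 are not.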
This bug is preventing Linux kernels from achieving IPv6 Logo Phase II
certification because of a recent error that was found in the TAHI test
suite - Neighbor Discovery suite test 206 (v6LC.2.3.6_G) had the
multicast address in the Destination field instead of Target field, so
we were passing the test. This won't be the case anymore.
The patch below fixes this problem, and also fixes ndisc_send_redirect()
to not send an invalid redirect with a multicast address in the Target
field. I re-ran the TAHI Neighbor Discovery section to make sure Linux
passes all 245 tests now.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mitsuru Chinen [Wed, 10 Oct 2007 10:16:26 +0000 (03:16 -0700)]
Fix some cases of missed IPV6 DAD
changeset 0fcace22d38ce9216f5ba52f929a99d284aa7e49 from mainline
To judge the timing for DAD, netif_carrier_ok() is used. However,
there is a possibility that dev->qdisc stays noop_qdisc even if
netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
We need to defer the IPv6 device initialization until a valid qdisc
is specified.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>