git.fsl.cs.sunysb.edu Git - unionfs-3.17.y.git/log

tg3: Allow for recieve of full-size 8021AD frames

[ Upstream commit 7d3083ee36b51e425b6abd76778a2046906b0fd3 ]

When receiving a vlan-tagged frame that still contains
a vlan header, the length of the packet will be greater
then MTU+ETH_HLEN since it will account of the extra
vlan header. TG3 checks this for the case for 802.1Q,
but not for 802.1ad. As a result, full sized 802.1ad
frames get dropped by the card.

Add a check for 802.1ad protocol when receving full
sized frames.

Suggested-by: Prashant Sreedharan <prashant@broadcom.com>
CC: Prashant Sreedharan <prashant@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tg3: Work around HW/FW limitations with vlan encapsulated frames

[ Upstream commit 476c18850c6cbaa3f2bb661ae9710645081563b9 ]

TG3 appears to have an issue performing TSO and checksum offloading
correclty when the frame has been vlan encapsulated (non-accelrated).
In these cases, tcp checksum is not correctly updated.

This patch attempts to work around this issue. After the patch,
802.1ad vlans start working correctly over tg3 devices.

CC: Prashant Sreedharan <prashant@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

macvlan: allow to enqueue broadcast pkt on virtual device

[ Upstream commit 07d92d5cc977a7fe1e683e1d4a6f723f7f2778cb ]

Since commit 412ca1550cbe ("macvlan: Move broadcasts into a work queue"), the
driver uses tx_queue_len of the master device as the limit of packets enqueuing.
Problem is that virtual drivers have this value set to 0, thus all broadcast
packets were rejected.
Because tx_queue_len was arbitrarily chosen, I replace it with a static limit
of 1000 (also arbitrarily chosen).

CC: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Thibaut Collet <thibaut.collet@6wind.com>
Suggested-by: Thibaut Collet <thibaut.collet@6wind.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: allow macvlans to move to net namespace

[ Upstream commit 0d0162e7a33d3710b9604e7c68c0f31f5c457428 ]

I cannot move a macvlan interface created on top of a bonding interface
to a different namespace:

% ip netns add dummy0
% ip link add link bond0 mac0 type macvlan
% ip link set mac0 netns dummy0
RTNETLINK answers: Invalid argument
%

The problem seems to be that commit f9399814927a ("bonding: Don't allow
bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
on bonding interfaces, and commit 797f87f83b60 ("macvlan: fix netdev
feature propagation from lower device") causes macvlan interfaces
to inherit its features from the lower device.

NETIF_F_NETNS_LOCAL should not be inherited from the lower device
by a macvlan.
Patch tested on 3.16.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bridge: Fix br_should_learn to check vlan_enabled

[ Upstream commit c095f248e63ada504dd90c90baae673ae10ee3fe ]

As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will
not be initialized in br_should_learn() as that function
is called only from br_handle_local_finish(). That is
an input handler for link-local ethernet traffic so it perfectly
correct to check br->vlan_enabled here.

Reported-by: Toshiaki Makita<toshiaki.makita1@gmail.com>
Fixes: 20adfa1 bridge: Check if vlan filtering is enabled only once.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bridge: Check if vlan filtering is enabled only once.

[ Upstream commit 20adfa1a81af00bf2027644507ad4fa9cd2849cf ]

The bridge code checks if vlan filtering is enabled on both
ingress and egress.   When the state flip happens, it
is possible for the bridge to currently be forwarding packets
and forwarding behavior becomes non-deterministic.  Bridge
may drop packets on some interfaces, but not others.

This patch solves this by caching the filtered state of the
packet into skb_cb on ingress.  The skb_cb is guaranteed to
not be over-written between the time packet entres bridge
forwarding path and the time it leaves it.  On egress, we
can then check the cached state to see if we need to
apply filtering information.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: filter: fix possible use after free

[ No appicable upstream commit, this bug has been subsequently been
  fixed as a side effect of other changes. ]

If kmemdup() fails, we free fp->orig_prog and return -ENOMEM

sk_attach_filter()
-> sk_filter_uncharge(sk, fp)
  -> sk_filter_release(fp)
   -> call_rcu(&fp->rcu, sk_filter_release_rcu)
    -> sk_filter_release_rcu()
     -> sk_release_orig_filter()
        fprog = fp->orig_prog; // not NULL, but points to freed memory
  kfree(fprog->filter); // use after free, potential corruption
          kfree(fprog); // double free or corruption

Note: This was fixed in 3.17+ with commit 278571baca2a
("net: filter: simplify socket charging")

Found by AddressSanitizer

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: a3ea269b8bcdb ("net: filter: keep original BPF program around")
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bonding: fix div by zero while enslaving and transmitting

[ Upstream commit 9a72c2da690d78e93cff24b9f616412508678dd5 ]

The problem is that the slave is first linked and slave_cnt is
incremented afterwards leading to a div by zero in the modes that use it
as a modulus. What happens is that in bond_start_xmit()
bond_has_slaves() is used to evaluate further transmission and it becomes
true after the slave is linked in, but when slave_cnt is used in the xmit
path it is still 0, so fetch it once and transmit based on that. Since
it is used only in round-robin and XOR modes, the fix is only for them.
Thanks to Eric Dumazet for pointing out the fault in my first try to fix
this.

Call trace (took it out of net-next kernel, but it's the same with net):
[46934.330038] divide error: 0000 [#1] SMP
[46934.330041] Modules linked in: bonding(O) 9p fscache
snd_hda_codec_generic crct10dif_pclmul
[46934.330041] bond0: Enslaving eth1 as an active interface with an up
link
[46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
pata_acpi floppy [last unloaded: bonding]
[46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
3.17.0-rc4+ #27
[46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
ffff88005b728000
[46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
[46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
000000000000002a
[46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
ffff88004b077940
[46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
ffff88004a83e000
[46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
ffff88004b3f0500
[46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
ffff88004b077940
[46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
knlGS:0000000000000000
[46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
00000000000406e0
[46934.330069] Stack:
[46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
ffff88004b3f0500
[46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
ffff88005b72b8a0
[46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
ffff88005b302000
[46934.330073] Call Trace:
[46934.330077]  [<ffffffff811e6169>] ?
__kmalloc_node_track_caller+0x119/0x300
[46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
[46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
[46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
[46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
[46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
[46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
[46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
[46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
[46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
[46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
[46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
[46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
[46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
[46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
[46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
[46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
[46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
[46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
[46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
[46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
[46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
[46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
[46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
[46934.330115] bond0: Adding slave eth2
[46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
[46934.330118]  [<ffffffff81603248>] ?
move_addr_to_kernel.part.20+0x28/0x80
[46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
[46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
[46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
[46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
[46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
[46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
[46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
[46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
[46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
[46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
[bonding]
[46934.330146]  RSP <ffff88005b72b7f8>

CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
Fixes: 278b208375 ("bonding: initial RCU conversion")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv6: restore the behavior of ipv6_sock_ac_drop()

[ Upstream commit de185ab46cb02df9738b0d898b0c3a89181c5526 ]

It is possible that the interface is already gone after joining
the list of anycast on this interface as we don't hold a refcount
for the device, in this case we are safe to ignore the error.

What's more important, for API compatibility we should not
change this behavior for applications even if it were correct.

Fixes: commit a9ed4a2986e13011 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

l2tp: fix race while getting PMTU on PPP pseudo-wire

[ Upstream commit eed4d839b0cdf9d84b0a9bc63de90fd5e1e886fb ]

Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.

The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
could return NULL if tunnel->sock->sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:

[ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
[ 1937.664005] Oops: 0000 [#1] SMP
[ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
[ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
[ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
[ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
[ 1937.664005] RIP: 0010:[<ffffffffa049db88>]  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
[ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
[ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
[ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
[ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
[ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
[ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
[ 1937.664005] Stack:
[ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
[ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
[ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
[ 1937.664005] Call Trace:
[ 1937.664005]  [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
[ 1937.664005]  [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
[ 1937.664005]  [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
[ 1937.664005]  [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
[ 1937.664005]  [<ffffffff81309196>] SYSC_connect+0x87/0xb1
[ 1937.664005]  [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
[ 1937.664005]  [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
[ 1937.664005]  [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1937.664005]  [<ffffffff8114c262>] ? spin_lock+0x9/0xb
[ 1937.664005]  [<ffffffff813092b4>] SyS_connect+0x9/0xb
[ 1937.664005]  [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
[ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
[ 1937.664005] RIP  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005]  RSP <ffff8800c43c7de8>
[ 1937.664005] CR2: 0000000000000020
[ 1939.559375] ---[ end trace 82d44500f28f8708 ]---

Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv6: fix rtnl locking in setsockopt for anycast and multicast

[ Upstream commit a9ed4a2986e13011fcf4ed2d1a1647c53112f55b ]

Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
triggers the assertion in addrconf_join_solict()/addrconf_leave_solict()

ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
calling ipv6_dev_mc_inc/dec.

This patch moves ASSERT_RTNL() up a level in the call stack.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: fix checksum features handling in netif_skb_features()

[ Upstream commit db115037bb57cdfe97078b13da762213f7980e81 ]

This is follow-up to

da08143b8520 ("vlan: more careful checksum features handling")

which introduced more careful feature intersection in vlan code,
taking into account that HW_CSUM should be considered superset
of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
in order to avoid offloading mismatch warning when vlan is
created on top of a bond consisting of slaves supporting IP/IPv6
checksumming but not vlan Tx offloading.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vxlan: fix incorrect initializer in union vxlan_addr

[ Upstream commit a45e92a599e77ee6a850eabdd0141633fde03915 ]

The first initializer in the following

        union vxlan_addr ipa = {
            .sin.sin_addr.s_addr = tip,
            .sa.sa_family = AF_INET,
        };

is optimised away by the compiler, due to the second initializer,
therefore initialising .sin.sin_addr.s_addr always to 0.
This results in netlink messages indicating a L3 miss never contain the
missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
The problem affects user space programs relying on an IP address being
sent as part of a netlink message indicating a L3 miss.

Changing
            .sa.sa_family = AF_INET,
to
            .sin.sin_family = AF_INET,
fixes the problem.

Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

openvswitch: fix panic with multiple vlan headers

[ Upstream commit 2ba5af42a7b59ef01f9081234d8855140738defd ]

When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may be still non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.

This leads to two things:

1. Part of the resulting ethernet header is in the non-linear part of the
   skb. When eth_type_trans is called later as the result of
   OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
   is in fact accessing random data when it reads past the TPID.

2. network_header points into the ethernet header instead of behind it.
   mac_len is set to a wrong value (10), too.

Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ipv6: fib: don't sleep inside atomic lock

[ Upstream commit 793c3b4000a1ef611ae7e5c89bd2a9c6b776cb5e ]

The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

> BUG: sleeping function called from invalid context at mm/slub.c:1250
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> 2 locks held by dhcpcd/2909:
>  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20
>  #1:  (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800
> CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
>  0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
>  ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
>  0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
> Call Trace:
>  [<ffffffff81af135a>] dump_stack+0x4e/0x68
>  [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120
>  [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190
>  [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110
>  [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110
>  [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80
>  [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800
>  [<ffffffff81a69535>] ip6_route_add+0x675/0x800
>  [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800
>  [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80
>  [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260
>  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
>  [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180
>  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
>  [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20
>  [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0
>  [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40
>  [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180
>  [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770
>  [<ffffffff81103735>] ? local_clock+0x25/0x30
>  [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90
>  [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0
>  [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0
>  [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0
>  [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220
>  [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0
>  [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150
>  [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150
>  [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230
>  [<ffffffff811f989a>] ? might_fault+0x5a/0xb0
>  [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0
>  [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80
>  [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20
>  [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b

Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC.

Signed-off-by: Benjamin Block <bebl@mageta.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bnx2x: Revert UNDI flushing mechanism

[ Upstream commit 7c3afd85dc1610bb2fc049644cd1b52c7af96f98 ]

Commit 91ebb929b6f8 ("bnx2x: Add support for Multi-Function UNDI") [which was
later supposedly fixed by de682941eef3 ("bnx2x: Fix UNDI driver unload")]
introduced a bug in which in some [yet-to-be-determined] scenarios the
alternative flushing mechanism which was to guarantee the Rx buffers are
empty before resetting them during device probe will fail.
If this happens, when device will be loaded once more a fatal attention will
occur; Since this most likely happens in boot from SAN scenarios, the machine
will fail to load.

Notice this may occur not only in the 'Multi-Function' scenario but in the
regular scenario as well, i.e., this introduced a regression in the driver's
ability to perform boot from SAN.

The patch reverts the mechanism and applies the old scheme to multi-function
devices as well as to single-function devices.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

packet: handle too big packets for PACKET_V3

[ Upstream commit dc808110bb62b64a448696ecac3938902c92e1ab ]

af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.

This is not generally the case, even if most existing tools do it right.

This patch clamps too long frames as API permits, and issue a one time
error on syslog.

[ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82

In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: fix message importance range check

[ Upstream commit ac32c7f705692b92fe12dcbe88fe87136fdfff6f ]

Commit 3b4f302d8578 ("tipc: eliminate
redundant locking") introduced a bug by removing the sanity check
for message importance, allowing programs to assign any value to
the msg_user field. This will mess up the packet reception logic
and may cause random link resets.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: phy: smsc: move smsc_phy_config_init reset part in a soft_reset function

[ Upstream commit 21009686662fd21412ca35def7cb3cc8346e1c3d ]

On the one hand, phy_device.c provides a generic reset function if the phy
driver does not provide a soft_reset pointer. This generic reset does not take
into account the state of the phy, with a potential failure if the phy is in
powerdown mode. On the other hand, smsc driver provides a function with both
correct reset behaviour and configuration.

This patch moves the reset part into a new smsc_phy_reset function and provides
the soft_reset pointer to have a correct reset behaviour by default.

Signed-off-by: Gwenhael Goavec-Merou <gwenhael.goavec-merou@armadeus.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tcp: fix ssthresh and undo for consecutive short FRTO episodes

[ Upstream commit 0c9ab09223fe9922baeb22546c9a90d774a4bde6 ]

Fix TCP FRTO logic so that it always notices when snd_una advances,
indicating that any RTO after that point will be a new and distinct
loss episode.

Previously there was a very specific sequence that could cause FRTO to
fail to notice a new loss episode had started:

(1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
(2) receiver ACKs packet 1
(3) FRTO sends 2 more packets
(4) RTO timer fires again (should start a new loss episode)

The problem was in step (3) above, where tcp_process_loss() returned
early (in the spot marked "Step 2.b"), so that it never got to the
logic to clear icsk_retransmits. Thus icsk_retransmits stayed
non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
icsk_retransmits, decide that this RTO is not a new episode, and
decide not to cut ssthresh and remember the current cwnd and ssthresh
for undo.

There were two main consequences to the bug that we have
observed. First, ssthresh was not decreased in step (4). Second, when
there was a series of such FRTO (1-4) sequences that happened to be
followed by an FRTO undo, we would restore the cwnd and ssthresh from
before the entire series started (instead of the cwnd and ssthresh
from before the most recent RTO). This could result in cwnd and
ssthresh being restored to values much bigger than the proper values.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()

[ Upstream commit 4fab9071950c2021d846e18351e0f46a1cffd67b ]

Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().

Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sit: Fix ipip6_tunnel_lookup device matching criteria

[ Upstream commit bc8fc7b8f825ef17a0fb9e68c18ce94fa66ab337 ]

As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, tunnel's underlying interface (t->parms.link)
is verified to match incoming traffic's ingress device.

However the comparison was incorrectly based on skb->dev->iflink.

Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.

This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)

[ Upstream commit 9d186cac7ffb1831e9f34cb4a3a8b22abb9dd9d4 ]

We don't know right timestamp for repaired skb-s. Wrong RTT estimations
isn't good, because some congestion modules heavily depends on it.

This patch adds the TCPCB_REPAIRED flag, which is included in
TCPCB_RETRANS.

Thanks to Eric for the advice how to fix this issue.

This patch fixes the warning:
[  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
[  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
[  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
[  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
[  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
[  879.569982] Call Trace:
[  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
[  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
[  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
[  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
[  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
[  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
[  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
[  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
[  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
[  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
[  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
[  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
[  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
[  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---

v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
    where it's set for other skb-s.

Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages")
Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "macvlan: simplify the structure port"

[ Upstream commit 5e3c516b512c0f8f18359413b04918f6347f67e7 ]

This reverts commit a188a54d11629bef2169052297e61f3767ca8ce5.

It causes crashes

====================
[   80.643286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000878
[   80.670103] IP: [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
[   80.691289] PGD 22c102067 PUD 235bf0067 PMD 0
[   80.706611] Oops: 0002 [#1] SMP
[   80.717836] Modules linked in: macvlan nfsd lockd nfs_acl exportfs auth_rpcgss sunrpc oid_registry ioatdma ixgbe(-) mdio igb dca
[   80.757935] CPU: 37 PID: 6724 Comm: rmmod Not tainted 3.16.0-net-next-08-12-2014-FCoE+ #1
[   80.785688] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
[   80.820310] task: ffff880235a9eae0 ti: ffff88022e844000 task.ti: ffff88022e844000
[   80.845770] RIP: 0010:[<ffffffff810832e4>]  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
[   80.875326] RSP: 0018:ffff88022e847b28  EFLAGS: 00010046
[   80.893251] RAX: 0000000000037a6a RBX: 0000000000000878 RCX: 0000000000000000
[   80.917187] RDX: ffff880235a9eae0 RSI: 0000000000000001 RDI: ffffffff810832db
[   80.941125] RBP: ffff88022e847b58 R08: 0000000000000000 R09: 0000000000000000
[   80.965056] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022e847b70
[   80.988994] R13: 0000000000000000 R14: ffff88022e847be8 R15: ffffffff81ebe440
[   81.012929] FS:  00007fab90b07700(0000) GS:ffff88043f7a0000(0000) knlGS:0000000000000000
[   81.040400] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.059757] CR2: 0000000000000878 CR3: 0000000235a42000 CR4: 00000000001407e0
[   81.083689] Stack:
[   81.090739]  ffff880235a9eae0 0000000000000878 ffff88022e847b70 0000000000000000
[   81.116253]  ffff88022e847be8 ffffffff81ebe440 ffff88022e847b98 ffffffff810847f1
[   81.141766]  ffff88022e847b78 0000000000000286 ffff880234200000 0000000000000000
[   81.167282] Call Trace:
[   81.175768]  [<ffffffff810847f1>] __cancel_work_timer+0x31/0x170
[   81.195985]  [<ffffffff8108494b>] cancel_work_sync+0xb/0x10
[   81.214769]  [<ffffffffa015ae68>] macvlan_port_destroy+0x28/0x60 [macvlan]
[   81.237844]  [<ffffffffa015b930>] macvlan_uninit+0x40/0x50 [macvlan]
[   81.259209]  [<ffffffff816bf6e2>] rollback_registered_many+0x1a2/0x2c0
[   81.281140]  [<ffffffff816bf81a>] unregister_netdevice_many+0x1a/0xb0
[   81.302786]  [<ffffffffa015a4ff>] macvlan_device_event+0x1ef/0x240 [macvlan]
[   81.326439]  [<ffffffff8108a13d>] notifier_call_chain+0x4d/0x70
[   81.346366]  [<ffffffff8108a201>] raw_notifier_call_chain+0x11/0x20
[   81.367439]  [<ffffffff816bf25b>] call_netdevice_notifiers_info+0x3b/0x70
[   81.390228]  [<ffffffff816bf2a1>] call_netdevice_notifiers+0x11/0x20
[   81.411587]  [<ffffffff816bf6bd>] rollback_registered_many+0x17d/0x2c0
[   81.433518]  [<ffffffff816bf925>] unregister_netdevice_queue+0x75/0x110
[   81.455735]  [<ffffffff816bfb2b>] unregister_netdev+0x1b/0x30
[   81.475094]  [<ffffffffa0039b50>] ixgbe_remove+0x170/0x1d0 [ixgbe]
[   81.495886]  [<ffffffff813512a2>] pci_device_remove+0x32/0x60
[   81.515246]  [<ffffffff814c75c4>] __device_release_driver+0x64/0xd0
[   81.536321]  [<ffffffff814c76f8>] driver_detach+0xc8/0xd0
[   81.554530]  [<ffffffff814c656e>] bus_remove_driver+0x4e/0xa0
[   81.573888]  [<ffffffff814c828b>] driver_unregister+0x2b/0x60
[   81.593246]  [<ffffffff8135143e>] pci_unregister_driver+0x1e/0xa0
[   81.613749]  [<ffffffffa005db18>] ixgbe_exit_module+0x1c/0x2e [ixgbe]
[   81.635401]  [<ffffffff810e738b>] SyS_delete_module+0x15b/0x1e0
[   81.655334]  [<ffffffff8187a395>] ? sysret_check+0x22/0x5d
[   81.673833]  [<ffffffff810abd2d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
[   81.696339]  [<ffffffff8132bfde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   81.717985]  [<ffffffff8187a369>] system_call_fastpath+0x16/0x1b
[   81.738199] Code: 00 48 83 3d 6e bb da 00 00 48 89 c2 0f 84 67 01 00 00 fa 66 0f 1f 44 00 00 49 89 14 24 e8 b5 4b 02 00 45 84 ed 0f 85 ac 00 00 00 <f0> 0f ba 2b 00 72 1d 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8
[   81.807026] RIP  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
[   81.828468]  RSP <ffff88022e847b28>
[   81.840384] CR2: 0000000000000878
[   81.851731] ---[ end trace 9f6c7232e3464e11 ]---
====================

This bug could be triggered by these steps:

modprobe ixgbe ; modprobe macvlan
ip link add link p96p1 address 00:1B:21:6E:06:00 macvlan0 type macvlan
ip link add link p96p1 address 00:1B:21:6E:06:01 macvlan1 type macvlan
ip link add link p96p1 address 00:1B:21:6E:06:02 macvlan2 type macvlan
ip link add link p96p1 address 00:1B:21:6E:06:03 macvlan3 type macvlan
rmmod ixgbe

Reported-by: "Keller, Jacob E" <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

myri10ge: check for DMA mapping errors

[ Upstream commit 10545937e866ccdbb7ab583031dbdcc6b14e4eb4 ]

On IOMMU systems DMA mapping can fail, we need to check for
that possibility.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: Always untag vlan-tagged traffic on input.

[ Upstream commit 0d5501c1c828fb97d02af50aa9d2b1a5498b94e4 ]

Currently the functionality to untag traffic on input resides
as part of the vlan module and is build only when VLAN support
is enabled in the kernel.  When VLAN is disabled, the function
vlan_untag() turns into a stub and doesn't really untag the
packets.  This seems to create an interesting interaction
between VMs supporting checksum offloading and some network drivers.

There are some drivers that do not allow the user to change
tx-vlan-offload feature of the driver.  These drivers also seem
to assume that any VLAN-tagged traffic they transmit will
have the vlan information in the vlan_tci and not in the vlan
header already in the skb.  When transmitting skbs that already
have tagged data with partial checksum set, the checksum doesn't
appear to be updated correctly by the card thus resulting in a
failure to establish TCP connections.

The following is a packet trace taken on the receiver where a
sender is a VM with a VLAN configued.  The host VM is running on
doest not have VLAN support and the outging interface on the
host is tg3:
10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294837885 ecr 0,nop,wscale 7], length 0
10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
(0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
offset 0, flags [DF], proto TCP (6), length 60)
    10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
-> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
4294838888 ecr 0,nop,wscale 7], length 0

This connection finally times out.

I've only access to the TG3 hardware in this configuration thus have
only tested this with TG3 driver.  There are a lot of other drivers
that do not permit user changes to vlan acceleration features, and
I don't know if they all suffere from a similar issue.

The patch attempt to fix this another way.  It moves the vlan header
stipping code out of the vlan module and always builds it into the
kernel network core.  This way, even if vlan is not supported on
a virtualizatoin host, the virtual machines running on top of such
host will still work with VLANs enabled.

CC: Patrick McHardy <kaber@trash.net>
CC: Nithin Nayak Sujir <nsujir@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtnetlink: fix VF info size

[ Upstream commit 945a36761fd7877660f630bbdeb4ff9ff80d1935 ]

Commit 1d8faf48c74b8 ("net/core: Add VF link state control") added new
attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size
of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we
may trigger warnings in rtnl_getlink and similar functions when many VF
links are enabled, as the information does not fit into the allocated skb.

Fixes: 1d8faf48c74b8 ("net/core: Add VF link state control")
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netlink: reset network header before passing to taps

[ Upstream commit 4e48ed883c72e78c5a910f8831ffe90c9b18f0ec ]

netlink doesn't set any network header offset thus when the skb is
being passed to tap devices via dev_queue_xmit_nit(), it emits klog
false positives due to it being unset like:

  ...
  [  124.990397] protocol 0000 is buggy, dev nlmon0
  [  124.990411] protocol 0000 is buggy, dev nlmon0
  ...

So just reset the network header before passing to the device; for
packet sockets that just means nothing will change - mac and net
offset hold the same value just as before.

Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.16.5

jiffies: Fix timeval conversion to jiffies

commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream.

timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:

setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);

would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.)  Doing
this repeatedly would cause unbounded growth in val.  So fix the math.

Here's what was wrong with the conversion: we essentially computed
(eliding seconds)

jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)

by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:

jiffies = (usec * x) >> USEC_JIFFIE_SC

and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)

In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.

We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies.  This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.

Tested: the following program:

int main() {
  struct itimerval zero = {{0, 0}, {0, 0}};
  /* Initially set to 10 ms. */
  struct itimerval initial = zero;
  initial.it_interval.tv_usec = 10000;
  setitimer(ITIMER_PROF, &initial, NULL);
  /* Save and restore several times. */
  for (size_t i = 0; i < 10; ++i) {
    struct itimerval prev;
    setitimer(ITIMER_PROF, &zero, &prev);
    /* on old kernels, this goes up by TICK_USEC every iteration */
    printf("previous value: %ld %ld %ld %ld\n",
           prev.it_interval.tv_sec, prev.it_interval.tv_usec,
           prev.it_value.tv_sec, prev.it_value.tv_usec);
    setitimer(ITIMER_PROF, &prev, NULL);
  }
    return 0;
}

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Reported-by: Aaron Jacobs <jacobsa@google.com>
Signed-off-by: Andrew Hunter <ahh@google.com>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz <john.stultz@linaro.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: vb2: fix VBI/poll regression

commit 58d75f4b1ce26324b4d809b18f94819843a98731 upstream.

The recent conversion of saa7134 to vb2 unconvered a poll() bug that
broke the teletext applications alevt and mtt. These applications
expect that calling poll() without having called VIDIOC_STREAMON will
cause poll() to return POLLERR. That did not happen in vb2.

This patch fixes that behavior. It also fixes what should happen when
poll() is called when STREAMON is called but no buffers have been
queued. In that case poll() will also return POLLERR, but only for
capture queues since output queues will always return POLLOUT
anyway in that situation.

This brings the vb2 behavior in line with the old videobuf behavior.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: numa: Do not mark PTEs pte_numa when splitting huge pages

commit abc40bd2eeb77eb7c2effcaf63154aad929a1d5f upstream.

This patch reverts 1ba6e0b50b ("mm: numa: split_huge_page: transfer the
NUMA type from the pmd to the pte"). If a huge page is being split due
a protection change and the tail will be in a PROT_NONE vma then NUMA
hinting PTEs are temporarily created in the protected VMA.

VM_RW|VM_PROTNONE
|-----------------|
^
split here

In the specific case above, it should get fixed up by change_pte_range()
but there is a window of opportunity for weirdness to happen. Similarly,
if a huge page is shrunk and split during a protection update but before
pmd_numa is cleared then a pte_numa can be left behind.

Instead of adding complexity trying to deal with the case, this patch
will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
will not be triggered which is marginal in comparison to the complexity
in dealing with the corner cases during THP split.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm, thp: move invariant bug check out of loop in __split_huge_page_map

commit f8303c2582b889351e261ff18c4d8eb197a77db2 upstream.

In __split_huge_page_map(), the check for page_mapcount(page) is
invariant within the for loop.  Because of the fact that the macro is
implemented using atomic_read(), the redundant check cannot be optimized
away by the compiler leading to unnecessary read to the page structure.

This patch moves the invariant bug check out of the loop so that it will
be done only once.  On a 3.16-rc1 based kernel, the execution time of a
microbenchmark that broke up 1000 transparent huge pages using munmap()
had an execution time of 38,245us and 38,548us with and without the
patch respectively.  The performance gain is about 1%.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vgaarb: Don't default exclusively to first video device with mem+io

commit 86fd887b7fe350819dae5b55e7fef05b511c8656 upstream.

Commit 20cde694027e ("x86, ia64: Move EFI_FB vga_default_device()
initialization to pci_vga_fixup()") moved boot video device detection from
efifb to x86 and ia64 pci/fixup.c.

For dual-GPU Apple computers above change represents a regression as code
in efifb did forcefully override vga_default_device while the merge did not
(vgaarb happens prior to PCI fixup).

To improve on initial device selection by vgaarb (it cannot know if PCI
device not behind bridges see/decode legacy VGA I/O or not), move the
screen_info based check from pci_video_fixup() to vgaarb's init function and
use it to refine/override decision taken while adding the individual PCI
VGA devices. This way PCI fixup has no reason to adjust vga_default_device
anymore but can depend on its value for flagging shadowed VBIOS.

This has the nice benefit of removing duplicated code but does introduce a
#if defined() block in vgaarb. Not all architectures have screen_info and
would cause compile to fail without it.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84461
Reported-and-Tested-By: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup()

commit 20cde694027e7477cc532833e38ab9fcaa83fb64 upstream.

Commit b4aa0163056b ("efifb: Implement vga_default_device() (v2)") added
efifb vga_default_device() so EFI systems that do not load shadow VBIOS or
setup VGA get proper value for boot_vga PCI sysfs attribute on the
corresponding PCI device.

Xorg doesn't detect devices when boot_vga=0, e.g., on some EFI systems such
as MacBookAir2,1. Xorg detects the GPU and finds the DRI device but then
bails out with "no devices detected".

Note: When vga_default_device() is set boot_vga PCI sysfs attribute
reflects its state. When unset this attribute is 1 whenever
IORESOURCE_ROM_SHADOW flag is set.

With introduction of sysfb/simplefb/simpledrm efifb is getting obsolete
while having native drivers for the GPU also makes selecting sysfb/efifb
optional.

Remove the efifb implementation of vga_default_device() and initialize
vgaarb's vga_default_device() with the PCI GPU that matches boot
screen_info in pci_fixup_video().

[bhelgaas: remove unused "dev" in efifb_setup()]
Fixes: b4aa0163056b ("efifb: Implement vga_default_device() (v2)")
Tested-by: Anibal Francisco Martinez Cortina <linuxkid.zeuz@gmail.com>
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

uas: Add missing le16_to_cpu calls to asm1051 / asm1053 usb-id check

commit a79e5bc53a9519202dfad7d916761601fcbf8db1 upstream.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

uas: Disable uas on ASM1051 devices

commit a9c54caa456dccba938005f6479892b589975e6a upstream.

There are a large numbers of issues with ASM1051 devices in uas mode:

1) They do not support REPORT SUPPORTED OPERATION CODES

2) They use out of spec 8 byte status iu-s when they have no sense data,
switching to normal 16 byte status iu-s when they do have sense data.

3) They hang / crash when combined with some disks, e.g. a Crucial M500 ssd.

4) They hang / crash when stressed (through e.g. sg_reset --bus) with disks
with which then normally do work (once 1 & 2 are worked around).

Where as in BOT mode they appear to work fine, so the best way forward with
these devices is to just blacklist them for uas usage.

Unfortunately this is easier said then done. as older versions of the ASM1053
(which works fine) use the same usb-id as the ASM1051.

When connected over USB-3 the 2 can be told apart by the number of streams
they support. So this patch adds some less then pretty code to disable uas for
the ASM1051. When connected over USB-2, simply disable uas alltogether for
devices with the shared usb-id.

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

uas: Log a warning when we cannot use uas because the hcd lacks streams

commit 43508be512661c905d0320ee73e0b65ef36d2459 upstream.

So that an user who wants to use uas can see why he is not getting uas.

Also move the check down so that we don't warn if there are other reasons
why uas cannot work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

uas: Only complain about missing sg if all other checks succeed

commit cc4deafc86f75f4e716b37fb4ea3572eb1e49e50 upstream.

Don't complain about controllers without sg support if there are other
reasons why uas cannot be used anyways.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ring-buffer: Fix infinite spin in reading buffer

commit 24607f114fd14f2f37e3e0cb3d47bce96e81e848 upstream.

Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.

A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
its not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.

The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.

Commit 651e22f2701b changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is suppose to set the cache to the
current value and there's locks that keep a consuming reader from
having access to the data.

Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu

commit 62b4d2041117f35ab2409c9f5c4b8d3dc8e59d0f upstream.

commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
HAVE_FUTEX_CMPXCHG symbol right below FUTEX.  This placed it right in
the middle of the options for the EXPERT menu.  However,
HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
placing items in the EXPERT menu, and displays the remaining several
EXPERT items (starting with EPOLL) directly in the General Setup menu.

Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
HAVE_FUTEX_CMPXCHG itself depend on FUTEX.  With this change, the
subsequent items display as part of the EXPERT menu again; the EMBEDDED
menu now appears as the next top-level item in the General Setup menu,
which makes General Setup much shorter and more usable.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Fix problem recognizing symlinks

commit 19e81573fca7b87ced7701e01ba164b968d929bd upstream.

Changeset eb85d94bd introduced a problem where if a cifs open
fails during query info of a file we
will still try to close the file (happens with certain types
of reparse points) even though the file handle is not valid.

In addition for SMB2/SMB3 we were not mapping the return code returned
by Windows when trying to open a file (like a Windows NFS symlink)
which is a reparse point.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Flush the PTEs after updating them before suspend

commit 91e56499304f3d612053a9cf17f350868182c7d8 upstream.

As we use WC updates of the PTE, we are responsible for notifying the
hardware when to flush its TLBs. Do so after we zap all the PTEs before
suspend (and the BIOS tries to read our GTT).

Fixes a regression from

commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Wed Oct 16 09:21:30 2013 -0700

    drm/i915: Disable GGTT PTEs on GEN6+ suspend

that survived and continue to cause harm even after

commit e568af1c626031925465a5caaab7cca1303d55c7
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Mar 26 20:08:20 2014 +0100

    drm/i915: Undo gtt scratch pte unmapping again

v2: Trivial rebase.
v3: Fixes requires pointer dances.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82340
Tested-by: ming.yao@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Todd Previte <tprevite@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid5: disable 'DISCARD' by default due to safety concerns.

commit 8e0e99ba64c7ba46133a7c8a3e3f7de01f23bd93 upstream.

It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint.  Some devices in some cases don't do what it
says on the label.

The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero.  If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks.  If all were zero, this would
be the case.  If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.

As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.

As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to over-ride this caution and cause
DISCARD to work if discard_zeroes_data is set.

If a site want to enable DISCARD on some arrays but not on others they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter.
    raid456.devices_handle_discard_safely=Y

As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7

Cc: Shaohua Li <shli@kernel.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Heinz Mauelshagen <heinzm@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: pcc-cpufreq: Fix wait_event() under spinlock

commit e65b5ddba84634f31d42dfd86013f4c6be5e9e32 upstream.

Fix the following bug introduced by commit 8fec051eea73 (cpufreq:
Convert existing drivers to use cpufreq_freq_transition_{begin|end})
that forgot to move the spin_lock() in pcc_cpufreq_target() past
cpufreq_freq_transition_begin() which calls wait_event():

BUG: sleeping function called from invalid context at drivers/cpufreq/cpufreq.c:370
in_atomic(): 1, irqs_disabled(): 0, pid: 2636, name: modprobe
Preemption disabled at:[<ffffffffa04d74d7>] pcc_cpufreq_target+0x27/0x200 [pcc_cpufreq]
[ 51.025044]
CPU: 57 PID: 2636 Comm: modprobe Tainted: G E 3.17.0-default #7
Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
00000000ffffffff ffff88026c46b828 ffffffff81589dbd 0000000000000000
ffff880037978090 ffff88026c46b848 ffffffff8108e1df ffff880037978090
0000000000000000 ffff88026c46b878 ffffffff8108e298 ffff88026d73ec00
Call Trace:
[<ffffffff81589dbd>] dump_stack+0x4d/0x90
[<ffffffff8108e1df>] ___might_sleep+0x10f/0x180
[<ffffffff8108e298>] __might_sleep+0x48/0xd0
[<ffffffff8145b905>] cpufreq_freq_transition_begin+0x75/0x140 drivers/cpufreq/cpufreq.c:370 wait_event(policy->transition_wait, !policy->transition_ongoing);
[<ffffffff8108fc99>] ? preempt_count_add+0xb9/0xc0
[<ffffffffa04d7513>] pcc_cpufreq_target+0x63/0x200 [pcc_cpufreq] drivers/cpufreq/pcc-cpufreq.c:207 spin_lock(&pcc_lock);
[<ffffffff810e0d0f>] ? update_ts_time_stats+0x7f/0xb0
[<ffffffff8145be55>] __cpufreq_driver_target+0x85/0x170
[<ffffffff8145e4c8>] od_check_cpu+0xa8/0xb0
[<ffffffff8145ef10>] dbs_check_cpu+0x180/0x1d0
[<ffffffff8145f310>] cpufreq_governor_dbs+0x3b0/0x720
[<ffffffff8145ebe3>] od_cpufreq_governor_dbs+0x33/0xe0
[<ffffffff814593d9>] __cpufreq_governor+0xa9/0x210
[<ffffffff81459fb2>] cpufreq_set_policy+0x1e2/0x2e0
[<ffffffff8145a6cc>] cpufreq_init_policy+0x8c/0x110
[<ffffffff8145c9a0>] ? cpufreq_update_policy+0x1b0/0x1b0
[<ffffffff8108fb99>] ? preempt_count_sub+0xb9/0x100
[<ffffffff8145c6c6>] __cpufreq_add_dev+0x596/0x6b0
[<ffffffffa016c608>] ? pcc_cpufreq_probe+0x4b4/0x4b4 [pcc_cpufreq]
[<ffffffff8145c7ee>] cpufreq_add_dev+0xe/0x10
[<ffffffff81408e81>] subsys_interface_register+0xc1/0xf0
[<ffffffff8108fb99>] ? preempt_count_sub+0xb9/0x100
[<ffffffff8145b3d7>] cpufreq_register_driver+0x117/0x2a0
[<ffffffffa016c65d>] pcc_cpufreq_init+0x55/0x9f8 [pcc_cpufreq]
[<ffffffffa016c608>] ? pcc_cpufreq_probe+0x4b4/0x4b4 [pcc_cpufreq]
[<ffffffff81000298>] do_one_initcall+0xc8/0x1f0
[<ffffffff811a731d>] ? __vunmap+0x9d/0x100
[<ffffffff810eb9a0>] do_init_module+0x30/0x1b0
[<ffffffff810edfa6>] load_module+0x686/0x710
[<ffffffff810ebb20>] ? do_init_module+0x1b0/0x1b0
[<ffffffff810ee1db>] SyS_init_module+0x9b/0xc0
[<ffffffff8158f7a9>] system_call_fastpath+0x16/0x1b

Fixes: 8fec051eea73 (cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end})
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: integrator: fix integrator_cpufreq_remove return type

commit d62dbf77f7dfaa6fb455b4b9828069a11965929c upstream.

When building this driver as a module, we get a helpful warning
about the return type:

drivers/cpufreq/integrator-cpufreq.c:232:2: warning: initialization from incompatible pointer type
.remove = __exit_p(integrator_cpufreq_remove),

If the remove callback returns void, the caller gets an undefined
value as it expects an integer to be returned. This fixes the
problem by passing down the value from cpufreq_unregister_driver.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI / i915: Update the condition to ignore firmware backlight change request

commit 77076c7aac0184cae2d8a358cf6e6ed1f195fe3f upstream.

Some of the Thinkpads' firmware will issue a backlight change request
through i915 operation region unconditionally on AC plug/unplug, the
backlight level used is arbitrary and thus should be ignored. This is
handled by commit 0b9f7d93ca61 (ACPI / i915: ignore firmware requests
for backlight change). Then there is a Dell laptop whose vendor backlight
interface also makes use of operation region to change backlight level
and with the above commit, that interface no long works. The condition
used to ignore the backlight change request from firmware is thus
changed to: if the vendor backlight interface is not in use and the ACPI
backlight interface is broken, we ignore the requests; oterwise, we keep
processing them.

Fixes: 0b9f7d93ca61 (ACPI / i915: ignore firmware requests for backlight change)
Link: https://lkml.org/lkml/2014/9/23/854
Reported-and-tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: rk3x: fix 0 length write transfers

commit cf27020d2f253bac6457d6833b97141030f0122a upstream.

i2cdetect -q was broken (everything was a false positive, and no transfers were
actually being sent over i2c). The way it works is by sending a 0 length write
request and checking for NACK. This patch fixes the 0 length writes and actually
sends them.

Reported-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Alexandru M Stan <amstan@chromium.org>
Tested-by: Doug Anderson <dianders@chromium.org>
Tested-by: Max Schwarz <max.schwarz@online.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: qup: Fix order of runtime pm initialization

commit 86b59bbfae2a895aa26b3d15f31b1a705dbfede1 upstream.

The runtime pm calls need to be done before populating the children via the
i2c_add_adapter call. If this is not done, a child can run into issues trying
to do i2c read/writes due to the pm_runtime_sync failing.

Signed-off-by: Andy Gross <agross@codeaurora.org>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: migrate: Close race between migration completion and mprotect

commit d3cb8bf6081b8b7a2dabb1264fe968fd870fa595 upstream.

A migration entry is marked as write if pte_write was true at the time the
entry was created. The VMA protections are not double checked when migration
entries are being removed as mprotect marks write-migration-entries as
read. It means that potentially we take a spurious fault to mark PTEs write
again but it's straight-forward. However, there is a race between write
migrations being marked read and migrations finishing. This potentially
allows a PTE to be write that should have been read. Close this race by
double checking the VMA permissions using maybe_mkwrite when migration
completes.

[torvalds@linux-foundation.org: use maybe_mkwrite]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: memcontrol: do not iterate uninitialized memcgs

commit 2f7dd7a4100ad4affcb141605bef178ab98ccb18 upstream.

The cgroup iterators yield css objects that have not yet gone through
css_online(), but they are not complete memcgs at this point and so the
memcg iterators should not return them. Commit d8ad30559715 ("mm/memcg:
iteration skip memcgs not yet fully initialized") set out to implement
exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does
not meet the ordering requirements for memcg, and so the iterator may
skip over initialized groups, or return partially initialized memcgs.

The cgroup core can not reasonably provide a clear answer on whether the
object around the css has been fully initialized, as that depends on
controller-specific locking and lifetime rules. Thus, introduce a
memcg-specific flag that is set after the memcg has been initialized in
css_online(), and read before mem_cgroup_iter() callers access the memcg
members.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf: fix perf bug in fork()

commit 6c72e3501d0d62fc064d3680e5234f3463ec5a86 upstream.

Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process. This is bad..

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: core: fix possible ZERO_SIZE_PTR pointer dereferencing error.

commit 6596aa047b624aeec2ea321962cfdecf9953a383 upstream.

Since we cannot make sure the 'params->num_regs' will always be none
zero here, and then if it equals to zero, the kmemdup() will return
ZERO_SIZE_PTR, which equals to ((void *)16).

So this patch fix this with just doing the zero check before calling
kmemdup().

Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: ssm2602: do not hardcode type to SSM2602

commit fe2a08b3bf1a6e35c00e18843bc19aa1778432c3 upstream.

The correct type (SSM2602/SSM2603/SSM2604) is passed down
from the ssm2602_spi_probe()/ssm2602_spi_probe() functions,
so use that instead of hardcoding it to SSM2602 in
ssm2602_probe().

Fixes: c924dc68f737 ("ASoC: ssm2602: Split SPI and I2C code into different modules")
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

udf: Avoid infinite loop when processing indirect ICBs

commit c03aa9f6e1f938618e6db2e23afef0574efeeb65 upstream.

We did not implement any bound on number of indirect ICBs we follow when
loading inode. Thus corrupted medium could cause kernel to go into an
infinite loop, possibly causing a stack overflow.

Fix the possible stack overflow by removing recursion from
__udf_read_inode() and limit number of indirect ICBs we follow to avoid
infinite loops.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.16.4

ARM: DRA7: Add support for soc_is_dra74x() and soc_is_dra72x() variants

commit af438fec6cb99fc2e2faf8b16b865af26ce722e6 upstream.

Use the corresponding compatibles to identify the devices.

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Acked-by: Nishanth Menon <nm@ti.com>
Tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: qcom: Fix PLL rate configurations

commit 5b6b7490af110c2b0df807eddd00ae6290bcf50a upstream.

Sometimes we need to program PLLs with a fixed rate
configuration during driver probe. Doing this after we register
the PLLs with the clock framework causes the common clock
framework to assume the rate of the PLLs are 0. This causes all
sorts of problems for rate recalculations because the common
clock framework caches the rate once at registration time unless
a flag is set to always recalculate the rates.

Split the qcom_cc_probe() function into two pieces, map and
everything else, so that drivers which need to configure some
PLL rates or otherwise twiddle bits in the clock controller can
do so before registering clocks. This allows us to properly
detect the rates of PLLs that are programmed at boot.

Fixes: 49fc825f0cc2 "clk: qcom: Consolidate common probe code"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: qcom: mdp_lut_clk is a child of mdp_src

commit f87dfcabc6f173cc811d185d33327f50a8c88399 upstream.

The mdp_lut_clk isn't a child of the mdp_clk. Instead it's the
child of the mdp_src clock. Fix it.

Fixes: 6d00b56fe "clk: qcom: Add support for MSM8960's multimedia clock controller (MMCC)"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: qcom: Fix MN frequency tables, parent map, and jpegd

commit ff20783f7b9f35b29e768d8ecc7076c1ca1a60ca upstream.

Clocks that don't have a pre-divider don't list any pre-divider
in their frequency tables, but their tables are initialized using
aggregate initializers. Use tagged initializers so we properly
assign the m and n values for each frequency. Furthermore, the
mmcc_pxo_pll8_pll2_pll3 array improperly mapped the second
element to pll2 instead of pll8, causing the clock driver to
recalculate the wrong rate for any clocks using this array along
with a rate that uses pll2. Plus the .num_parents field is 3
instead of 4 so you can't even switch the parent to pll3. Finally
I noticed that the jpegd clock improperly indicates that the
pre-divider width is only 2, when it's actually 4 bits wide.

Fixes: 6d00b56fe "clk: qcom: Add support for MSM8960's multimedia clock controller (MMCC)"
Tested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: disable virtual block device for 64K pages

commit 0bf22be0da8ea74bc7ccc5b07d7855830be16eca upstream.

The lustre virtual block device cannot handle 64K pages and fails at compile
time. To avoid running into this error, let's disable the Kconfig option
for this driver in cases it doesn't support.

Reported-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: avoid trying to kfree an ERR_PTR pointer

commit a9cfcd63e8d206ce4235c355d857c4fbdf0f4587 upstream.

Thanks to Dan Carpenter for extending smatch to find bugs like this.
(This was found using a development version of smatch.)

Fixes: 36de928641ee48b2078d3fe9514242aaa2f92013
Reported-by: Dan Carpenter <dan.carpenter@oracle.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: propagate errors up to ext4_find_entry()'s callers

commit 36de928641ee48b2078d3fe9514242aaa2f92013 upstream.

If we run into some kind of error, such as ENOMEM, while calling
ext4_getblk() or ext4_dx_find_entry(), we need to make sure this error
gets propagated up to ext4_find_entry() and then to its callers. This
way, transient errors such as ENOMEM can get propagated to the VFS.
This is important so that the system calls return the appropriate
error, and also so that in the case of ext4_lookup(), we return an
error instead of a NULL inode, since that will result in a negative
dentry cache entry that will stick around long past the OOM condition
which caused a transient ENOMEM error.

Google-Bug-Id: #17142205

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

aio: block exit_aio() until all context requests are completed

commit 6098b45b32e6baeacc04790773ced9340601d511 upstream.

It seems that exit_aio() also needs to wait for all iocbs to complete (like
io_destroy), but we missed the wait step in current implemention, so fix
it in the same way as we did in io_destroy.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host Controller driver.

commit 72f79f9e35bd3f78ee8853f2fcacaa197d23ebac upstream.

This patch removes the NCQ support from the APM X-Gene SoC AHCI
Host Controller driver as it doesn't support it.

Signed-off-by: Loc Ho <lho@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.16: host flags are passed to ahci_platform_init_host()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: ti: divider: Provide error check for incoming parameters in set_rate

commit 2f1032517623b70920d99529e5c87c8c680ab8bf upstream.

Check for valid parameters in check rate. Else, we end up getting errors
like:
[    0.000000] Division by zero in kernel.
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc1 #1
[    0.000000] [<c0015160>] (unwind_backtrace) from [<c0011978>] (show_stack+0x10/0x14)
[    0.000000] [<c0011978>] (show_stack) from [<c055f5f4>] (dump_stack+0x78/0x94)
[    0.000000] [<c055f5f4>] (dump_stack) from [<c02e17cc>] (Ldiv0+0x8/0x10)
[    0.000000] [<c02e17cc>] (Ldiv0) from [<c047d228>] (ti_clk_divider_set_rate+0x14/0x14c)
[    0.000000] [<c047d228>] (ti_clk_divider_set_rate) from [<c047a938>] (clk_change_rate+0x138/0x180)
[    0.000000] [<c047a938>] (clk_change_rate) from [<c047a908>] (clk_change_rate+0x108/0x180)

This occurs as part of the inital clock tree update of child clock nodes
where new_rate could be 0 for non functional clocks.

Fixes: b4761198bfaf296 ("CLK: ti: add support for ti divider-clock")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: prevent erronous parsing of children during rate change

commit 067bb1741c27c8d3b74ac98c0b8fc12b31e67005 upstream.

In some cases, clocks can switch their parent with clk_set_rate, for
example clk_mux can do this in some cases. Current implementation of
clk_change_rate uses un-safe list iteration on the clock children, which
will cause wrong clocks to be parsed in case any of the clock children
change their parents during the change rate operation. Fixed by using
the safe list iterator instead.

The problem was detected due to some divide by zero errors generated
by clock init on dra7-evm board, see discussion under
http://article.gmane.org/gmane.linux.ports.arm.kernel/349180 for details.

Fixes: 71472c0c06cf ("clk: add support for clock reparent on set_rate")
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: ti: dra7-atl: Provide error check for incoming parameters in set_rate

commit 20411dad75ece9a613af715df4489e60990c4017 upstream.

Check for valid parameters in check rate. Else, we end up getting
errors.

This occurs as part of the inital clock tree update of child clock
nodes where new_rate could be 0 for non functional clocks.

Fixes: 9ac33b0ce81fa48 (" CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: update 'cpufreq_suspended' after stopping governors

commit b1b12babe3b72cfb08b875245e5a5d7c2747c772 upstream.

Commit 8e30444e1530 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
introduced a bug where the governors wouldn't be stopped anymore for
->target{_index}() drivers during suspend. This happens because
'cpufreq_suspended' is updated before stopping the governors during suspend
and due to this __cpufreq_governor() would return early due to this check:

/* Don't start any governor operations if we are entering suspend */
if (cpufreq_suspended)
return 0;

Fixes: 8e30444e1530 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

partitions: aix.c: off by one bug

commit d97a86c170b4e432f76db072a827fe30b4d6f659 upstream.

The lvip[] array has "state->limit" elements so the condition here
should be >= instead of >.

Fixes: 6ceea22bbbc8 ('partitions: add aix lvm partition support files')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dmaengine: dw: don't perform DMA when dmaengine_submit is called

commit dd8ecfcac66b4485416b2d1df0ec4798b198d7d6 upstream.

Accordingly to discussion [1] and followed up documentation the DMA controller
driver shouldn't start any DMA operations when dmaengine_submit() is called.

This patch fixes the workflow in dw_dmac driver to follow the documentation.

[1] http://www.spinics.net/lists/arm-kernel/msg125987.html

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Cc: "Petallo, MauriceX R" <mauricex.r.petallo@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dmaengine: dw: introduce dwc_dostart_first_queued() helper

commit e7637c6c0382485f4d2e20715d058dae6f2b6a7c upstream.

We have a duplicate code which starts first descriptor in the queue. Let's make
this as a separate helper that can be used in future as well.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Cc: "Petallo, MauriceX R" <mauricex.r.petallo@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: mmci: Reverse IRQ handling for the arm_variant

commit 7878289b269d41c8e611aa6d4519feae706e49f3 upstream.

Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
when using the ARM model of the PL181 and running QEMU.

The bug was reported for the following QEMU version:
$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
(c) 2003-2008 Fabrice Bellard

To resolve the problem, let's restore the old behavior were the DATA
irq is handled prior the CMD irq, but only for the arm_variant, which
the problem was reported for.

Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[kees: backported to 3.16]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nf_tables: don't update chain with unset counters

commit b88825de8545ad252c31543fef13cadf4de7a2bc upstream.

Fix possible replacement of the per-cpu chain counters by null
pointer when updating an existing chain in the commit path.

Reported-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipvs: fix ipv6 hook registration for local replies

commit eb90b0c734ad793d5f5bf230a9e9a4dcc48df8aa upstream.

commit fc604767613b6d2036cdc35b660bc39451040a47
("ipvs: changes for local real server") from 2.6.37
introduced DNAT support to local real server but the
IPv6 LOCAL_OUT handler ip_vs_local_reply6() is
registered incorrectly as IPv4 hook causing any outgoing
IPv4 traffic to be dropped depending on the IP header values.

Chris tracked down the problem to CONFIG_IP_VS_IPV6=y
Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768

Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Tested-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: x_tables: allow to use default cgroup match

commit caa8ad94edf686d02b555c65a6162c0d1b434958 upstream.

There's actually no good reason why we cannot use cgroup id 0,
so lets just remove this artificial barrier.

Reported-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding

commit 76f084bc10004b3050b2cff9cfac29148f1f6088 upstream.

Previously, only the four high bits of the tclass were maintained in the
ipv6 case. This matches the behavior of ipv4, though whether or not we
should reflect ECN bits may be up for debate.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: xt_hashlimit: perform garbage collection from process context

commit 7bd8490eef9776ced7632345df5133384b6be0fe upstream.

xt_hashlimit cannot be used with large hash tables, because garbage
collector is run from a timer. If table is really big, its possible
to hold cpu for more than 500 msec, which is unacceptable.

Switch to a work queue, and use proper scheduling points to remove
latencies spikes.

Later, we also could switch to a smoother garbage collection done
at lookup time, one bucket at a time...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: intialise start_next_window for READ case to avoid hang

commit f0cc9a057151892b885be21a1d19b0185568281d upstream.

r1_bio->start_next_window is not initialised in the READ
case, so allow_barrier may incorrectly decrement
conf->current_window_requests
which can cause raise_barrier() to block forever.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Reported-by: Brassow Jonathan <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: fix_read_error should act on all non-faulty devices.

commit b8cb6b4c121e1bf1963c16ed69e7adcb1bc301cd upstream.

If a devices is being recovered it is not InSync and is not Faulty.

If a read error is experienced on that device, fix_read_error()
will be called, but it ignores non-InSync devices.  So it will
neither fix the error nor fail the device.

It is incorrect that fix_read_error() ignores non-InSync devices.
It should only ignore Faulty devices.  So fix it.

This became a bug when we allowed reading from a device that was being
recovered.  It is suitable for any subsequent -stable kernel.

Fixes: da8840a747c0dbf49506ec906757a6b87b9741e9
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: count resync requests in nr_pending.

commit 34e97f170149bfa14979581c4c748bc9b4b79d5b upstream.

Both normal IO and resync IO can be retried with reschedule_retry()
and so be counted into ->nr_queued, but only normal IO gets counted in
->nr_pending.

Before the recent improvement to RAID1 resync there could only
possibly have been one or the other on the queue. When handling a
read failure it could only be normal IO. So when handle_read_error()
called freeze_array() the fact that freeze_array only compares
->nr_queued against ->nr_pending was safe.

But now that these two types can interleave, we can have both normal
and resync IO requests queued, so we need to count them both in
nr_pending.

This error can lead to freeze_array() hanging if there is a read
error, so it is suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Reported-by: Brassow Jonathan <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: update next_resync under resync_lock.

commit c2fd4c94deedb89ac1746c4a53219be499372c06 upstream.

raise_barrier() uses next_resync as part of its calculations, so it
really should be updated first, instead of afterwards.

next_resync is always used under resync_lock so update it under
resync lock to, just before it is used. That is safest.

This could cause normal IO and resync IO to interact badly so
it suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: Don't use next_resync to determine how far resync has progressed

commit 235549605eb7f1c5a37cef8b09d12e6d412c5cd6 upstream.

next_resync is (approximately) the location for the next resync request.
However it does *not* reliably determine the earliest location
at which resync might be happening.
This is because resync requests can complete out of order, and
we only limit the number of current requests, not the distance
from the earliest pending request to the latest.

mddev->curr_resync_completed is a reliable indicator of the earliest
position at which resync could be happening. It is updated less
frequently, but is actually reliable which is more important.

So use it to determine if a write request is before the region
being resynced and so safe from conflict.

This error can allow resync IO to interfere with normal IO which
could lead to data corruption. Hence: stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: make sure resync waits for conflicting writes to complete.

commit 2f73d3c55d09ce60647b96ad2a9b539c95a530ee upstream.

The resync/recovery process for raid1 was recently changed
so that writes could happen in parallel with resync providing
they were in different regions of the device.

There is a problem though:  While a write request will always
wait for conflicting resync to complete, a resync request
will *not* always wait for conflicting writes to complete.

Two changes are needed to fix this:

1/ raise_barrier (which waits until it is safe to do resync)
   must wait until current_window_requests is zero
2/ wait_battier (which waits at the start of a new write request)
   must update current_window_requests if the request could
   possible conflict with a concurrent resync.

As concurrent writes and resync can lead to data loss,
this patch is suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: be more cautious where we read-balance during resync.

commit c6d119cf1b5a778e9ed60a006e2a434fcc4471a2 upstream.

commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761 made
it possible for reads to happen concurrently with resync.
This means that we need to be more careful where read_balancing
is allowed during resync - we can no longer be sure that any
resync that has already started will definitely finish.

So keep read_balancing to before recovery_cp, which is conservative
but safe.

This bug makes it possible to read from a device that doesn't
have up-to-date data, so it can cause data corruption.
So it is suitable for any kernel since 3.11.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/raid1: clean up request counts properly in close_sync()

commit 669cc7ba77864e7b1ac39c9f2b2afb8730f341f4 upstream.

If there are outstanding writes when close_sync is called,
the change to ->start_next_window might cause them to
decrement the wrong counter when they complete. Fix this
by merging the two counters into the one that will be decremented.

Having an incorrect value in a counter can cause raise_barrier()
to hangs, so this is suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: vb2: fix plane index sanity check in vb2_plane_cookie()

commit a9ae4692eda4b99f85757b15d60971ff78a0a0e2 upstream.

It's also invalid when plane_no is equal to vb->num_planes

Signed-off-by: Zhaowei Yuan <zhaowei.yuan@samsung.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: vb2: fix vb2 state check when start_streaming fails

commit bf3593d939520559774cbfee03ba5f314d909620 upstream.

Commit bd994ddb2a12a3ff48cd549ec82cdceaea9614df (vb2: Fix stream start and
buffer completion race) broke the buffer state check in vb2_buffer_done.

So accept all three possible states there since I can no longer tell the
difference between vb2_buffer_done called from start_streaming or from
elsewhere.

Instead add a WARN_ON at the end of start_streaming that will check whether
any buffers were added to the done list, since that implies that the wrong
state was used as well.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: videobuf2-dma-sg: fix for wrong GFP mask to sg_alloc_table_from_pages

commit 47bc59c52b005f546343c373370a7eec6a2b0f84 upstream.

sg_alloc_table_from_pages() only allocates a sg_table, so it should just use
GFP_KERNEL, not gfp_flags. If gfp_flags contains __GFP_DMA32 then mm/sl[au]b.c
will call BUG_ON:

[  358.027515] ------------[ cut here ]------------
[  358.027546] kernel BUG at mm/slub.c:1416!
[  358.027558] invalid opcode: 0000 [#1] PREEMPT SMP
[  358.027576] Modules linked in: mt2131 s5h1409 tda8290 tuner cx25840 cx23885 btcx_risc altera_ci tda18271 altera_stapl videobuf2_dvb tveeprom cx2341x videobuf2_dma_sg dvb_core rc_core videobuf2_memops videobuf2_core nouveau zr36067 videocodec v4l2_common videodev media x86_pkg_temp_thermal cfbfillrect cfbimgblt cfbcopyarea ttm drm_kms_helper processor button isci
[  358.027712] CPU: 19 PID: 3654 Comm: cat Not tainted 3.16.0-rc6-telek #167
[  358.027723] Hardware name: ASUSTeK COMPUTER INC. Z9PE-D8 WS/Z9PE-D8 WS, BIOS 5404 02/10/2014
[  358.027741] task: ffff880897c7d960 ti: ffff88089b4d4000 task.ti: ffff88089b4d4000
[  358.027753] RIP: 0010:[<ffffffff81196040>]  [<ffffffff81196040>] new_slab+0x280/0x320
[  358.027776] RSP: 0018:ffff88089b4d7ae8  EFLAGS: 00010002
[  358.027787] RAX: ffff880897c7d960 RBX: 0000000000000000 RCX: ffff88089b4d7b50
[  358.027798] RDX: 00000000ffffffff RSI: 0000000000000004 RDI: ffff88089f803b00
[  358.027809] RBP: ffff88089b4d7bb8 R08: 0000000000000000 R09: 0000000100400040
[  358.027821] R10: 0000160000000000 R11: ffff88109bc02c40 R12: 0000000000000001
[  358.027832] R13: ffff88089f8000c0 R14: ffff88089f803b00 R15: ffff8810bfcf4be0
[  358.027845] FS:  00007f83fe5c0700(0000) GS:ffff8810bfce0000(0000) knlGS:0000000000000000
[  358.027858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  358.027868] CR2: 0000000001dfd568 CR3: 0000001097d5a000 CR4: 00000000000407e0
[  358.027878] Stack:
[  358.027885]  ffffffff81198860 ffff8810bfcf4be0 ffff880897c7d960 0000000000001b00
[  358.027905]  ffff880897c7d960 0000000000000000 ffff8810bfcf4bf0 0000000000000000
[  358.027924]  0000000000000000 0000000100000100 ffffffff813ef84a 00000004ffffffff
[  358.027944] Call Trace:
[  358.027956]  [<ffffffff81198860>] ? __slab_alloc+0x400/0x4e0
[  358.027973]  [<ffffffff813ef84a>] ? sg_kmalloc+0x1a/0x30
[  358.027985]  [<ffffffff81198f17>] __kmalloc+0x127/0x150
[  358.027997]  [<ffffffff813ef84a>] ? sg_kmalloc+0x1a/0x30
[  358.028009]  [<ffffffff813ef84a>] sg_kmalloc+0x1a/0x30
[  358.028023]  [<ffffffff813eff84>] __sg_alloc_table+0x74/0x180
[  358.028035]  [<ffffffff813ef830>] ? sg_kfree+0x20/0x20
[  358.028048]  [<ffffffff813f00af>] sg_alloc_table+0x1f/0x60
[  358.028061]  [<ffffffff813f0174>] sg_alloc_table_from_pages+0x84/0x1f0
[  358.028077]  [<ffffffffa007c3f9>] vb2_dma_sg_alloc+0x159/0x230 [videobuf2_dma_sg]
[  358.028095]  [<ffffffffa003d55a>] __vb2_queue_alloc+0x10a/0x680 [videobuf2_core]
[  358.028113]  [<ffffffffa003e110>] __reqbufs.isra.14+0x220/0x3e0 [videobuf2_core]
[  358.028130]  [<ffffffffa003e79d>] __vb2_init_fileio+0xbd/0x380 [videobuf2_core]
[  358.028147]  [<ffffffffa003f563>] __vb2_perform_fileio+0x5b3/0x6e0 [videobuf2_core]
[  358.028164]  [<ffffffffa003f871>] vb2_fop_read+0xb1/0x100 [videobuf2_core]
[  358.028184]  [<ffffffffa06dd2e5>] v4l2_read+0x65/0xb0 [videodev]
[  358.028198]  [<ffffffff811a243f>] vfs_read+0x8f/0x170
[  358.028210]  [<ffffffff811a30a1>] SyS_read+0x41/0xb0
[  358.028224]  [<ffffffff818f02e9>] system_call_fastpath+0x16/0x1b
[  358.028234] Code: 66 90 e9 dc fd ff ff 0f 1f 40 00 41 8b 4d 68 e9 d5 fe ff ff 0f 1f 80 00 00 00 00 f0 41 80 4d 00 40 e9 03 ff ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 44 89 c6 4c 89 45 d0 e8 0c 82 ff ff 48
[  358.028415] RIP  [<ffffffff81196040>] new_slab+0x280/0x320
[  358.028432]  RSP <ffff88089b4d7ae8>
[  358.032208] ---[ end trace 6443240199c706e4 ]---

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: em28xx: fix VBI handling logic

commit c7854c2c5d692a329b4d9a9a73bcf36ae137ee7c upstream.

When both VBI and video are streaming, and video stream is stopped,
a subsequent trial to restart it will fail, because S_FMT will
return -EBUSY.

That prevents applications like zvbi to work properly.

Please notice that, while this fix it fully for zvbi, the
best is to get rid of streaming_users and res_get logic as a hole.

However, this single-line patch is better to be merged at -stable.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: adv7604: fix inverted condition

commit 77639ff2b3404a913b8037d230a384798b854bae upstream.

The log_status function should show HDMI information, but the test checking for
an HDMI input was inverted. Fix this.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: af9033: update IT9135 tuner inittabs

commit 0df6580c5fc115034de29aa52af5cf6bd83d37d8 upstream.

Update IT9135 BX tuner config 60 and 61 inittabs.

[crope@iki.fi: removed two reg writes from driver init itself]
Signed-off-by: Bimow Chen <Bimow.Chen@ite.com.tw>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>

media: cx18: fix kernel oops with tda8290 tuner

commit 6a03dc92cc2edfa2257502557b9f714893987383 upstream.

This was caused by an uninitialized setup.config field.

Based on a suggestion from Devin Heitmueller.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Thanks-to: Devin Heitmueller <dheitmueller@kernellabs.com>
Reported-by: Scott Robinson <scott.robinson55@gmail.com>
Tested-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: af9033: feed clock to RF tuner

commit 9dc0f3fe3fe6b83b44e5920a0b143b4f96755b59 upstream.

IT9135 RF tuner clock is coming from demodulator. We need enable it
early in demod init, before any tuner I/O. Currently it is enabled
by tuner driver itself, but it is too late and performance will be
reduced as some registers are not updated correctly. Clock is
disabled automatically when demod is put onto sleep.

Cc: Bimow Chen <Bimow.Chen@ite.com.tw>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: af9035: new IDs: add support for PCTV 78e and PCTV 79e

commit a04646c045cab08a9e62b9be8f01ecbb0632d24e upstream.

add the following IDs
USB_PID_PCTV_78E (0x025a) for PCTV 78e
USB_PID_PCTV_79E (0x0262) for PCTV 79e

For these it9135 devices.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Cc: Antti Palosaari <crope@iki.fi>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: it913x: init tuner on attach

commit 01b461bbe74b89da8941f4c95711777d87b9172e upstream.

That register is needed to program very first in order to operate
correctly.

[crope@iki.fi: returned sequence back, removed sleep, moved reg
write earlier to prevent populating tuner ops in case of failure]

Signed-off-by: Bimow Chen <Bimow.Chen@ite.com.tw>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: fix cpufreq suspend/resume for intel_pstate

commit 8e30444e153008e8eb3e74cbcb7a865bfcfb04a0 upstream.

Cpufreq core introduces cpufreq_suspended flag to let cpufreq sysfs nodes
across S2RAM/S2DISK. But the flag is only set in the cpufreq_suspend()
for cpufreq drivers which have target or target_index callback. This
skips intel_pstate driver. This patch is to set the flag before checking
target or target_index callback.

Fixes: 2f0aea936360 (cpufreq: suspend governors on system suspend/hibernate)
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: release policy->rwsem on error

commit 7106e02baed4a72fb23de56b02ad4d31daa74d95 upstream.

While debugging a cpufreq-related hardware failure on a system I saw the
following lockdep warning:

=========================
[ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G            E
-------------------------
insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there!
  (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
3 locks held by insmod/2247:
  #0:  (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120
  #1:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80
  #2:  (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80

stack backtrace:
CPU: 0 PID: 2247 Comm: insmod Tainted: G            E  3.17.0-rc4+ #1
Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 08/24/2013
  0000000000000000 000000008f3063c4 ffff88006f87bb30 ffffffff8171b358
  ffff88006bcf3750 ffff88006f87bb68 ffffffff810e09e1 ffff88006e1b1400
  ffffea0001b86c00 ffffffff8156d327 ffff880073003500 0000000000000246
Call Trace:
  [<ffffffff8171b358>] dump_stack+0x4d/0x66
  [<ffffffff810e09e1>] debug_check_no_locks_freed+0x171/0x180
  [<ffffffff8156d327>] ? __cpufreq_add_dev.isra.21+0x427/0xb80
  [<ffffffff8121412b>] kfree+0xab/0x2b0
  [<ffffffff8156d327>] __cpufreq_add_dev.isra.21+0x427/0xb80
  [<ffffffff81724cf7>] ? _raw_spin_unlock+0x27/0x40
  [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
  [<ffffffff8156da8e>] cpufreq_add_dev+0xe/0x10
  [<ffffffff814855d1>] subsys_interface_register+0xc1/0x120
  [<ffffffff8156bcf2>] cpufreq_register_driver+0x112/0x340
  [<ffffffff8121415a>] ? kfree+0xda/0x2b0
  [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
  [<ffffffffa003562e>] pcc_cpufreq_init+0x4af/0xe81 [pcc_cpufreq]
  [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
  [<ffffffff811f7472>] ? __vunmap+0xd2/0x120
  [<ffffffff81127155>] load_module+0x1315/0x1b70
  [<ffffffff811222a0>] ? store_uevent+0x70/0x70
  [<ffffffff811229d9>] ? copy_module_from_fd.isra.44+0x129/0x180
  [<ffffffff81127b86>] SyS_finit_module+0xa6/0xd0
  [<ffffffff81725b69>] system_call_fastpath+0x16/0x1b
cpufreq: __cpufreq_add_dev: ->get() failed
insmod: ERROR: could not insert module pcc-cpufreq.ko: No such device

The warning occurs in the __cpufreq_add_dev() code which does

        down_write(&policy->rwsem);
...
        if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
                policy->cur = cpufreq_driver->get(policy->cpu);
                if (!policy->cur) {
                        pr_err("%s: ->get() failed\n", __func__);
                        goto err_get_freq;
                }

If cpufreq_driver->get(policy->cpu) returns an error we execute the
code at err_get_freq, which does not up the policy->rwsem.  This causes
the lockdep warning.

Trivial patch to up the policy->rwsem in the error path.

After the patch has been applied, and an error occurs in the
cpufreq_driver->get(policy->cpu) call we will now see

cpufreq: __cpufreq_add_dev: ->get() failed
cpufreq: __cpufreq_add_dev: ->get() failed
modprobe: ERROR: could not insert 'pcc_cpufreq': No such device

Fixes: 4e97b631f24c (cpufreq: Initialize governor for a new policy under policy->rwsem)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "PCI: Make sure bus number resources stay within their parents bounds"

commit 12d8706963f073fffad16c7c24160ef20d9aeaff upstream.

This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay
within their parents bounds") because it breaks some systems with LSI Logic
FC949ES Fibre Channel Adapters, apparently by exposing a defect in those
adapters.

Dirk tested a Tyan VX50 (B4985) with this device that worked like this
prior to 1820ffdccb9b:

    bus: [bus 00-7f] on node 0 link 1
    ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07])
    pci 0000:00:0e.0: PCI bridge to [bus 0a]
    pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07])
    pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter)

Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS
defect in the PCI0 _CRS method.  But prior to 1820ffdccb9b, we didn't
enforce that aperture, and the FC adapter worked fine at 0a:00.0.

After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in
the root bridge's aperture, so we reconfigure it so it *is* contained:

    pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring
    pci 0000:00:0e.0: PCI bridge to [bus 06-07]

This effectively moves the FC device from 0a:00.0 to 07:00.0, which should
be legal.  But when we enumerate bus 06, the FC device doesn't respond, so
we don't find anything.  This is probably a defect in the FC device.

Possible fixes (due to Yinghai):

    1) Add a quirk to fix the _CRS information based on what amd_bus.c read
       from the hardware

    2) Reset the FC device after we change its bus number

    3) Revert 1820ffdccb9b

Fix 1 would be relatively easy, but it does sweep the LSI FC issue under
the rug.  We might want to reconfigure bus numbers in the future for some
other reason, e.g., hotplug, and then we could trip over this again.

For that reason, I like fix 2, but we don't know whether it actually works,
and we don't have a patch for it yet.

This revert is fix 3, which also sweeps the LSI FC issue under the rug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281
Reported-by: Dirk Gouders <dirk@gouders.net>
Tested-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nl80211: clear skb cb before passing to netlink

commit bd8c78e78d5011d8111bc2533ee73b13a3bd6c42 upstream.

In testmode and vendor command reply/event SKBs we use the
skb cb data to store nl80211 parameters between allocation
and sending. This causes the code for CONFIG_NETLINK_MMAP
to get confused, because it takes ownership of the skb cb
data when the SKB is handed off to netlink, and it doesn't
explicitly clear it.

Clear the skb cb explicitly when we're done and before it
gets passed to netlink to avoid this issue.

Reported-by: Assaf Azulay <assaf.azulay@intel.com>
Reported-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>