1. 01 Oct, 2015 40 commits
    • huaibin Wang's avatar
      ip6_gre: release cached dst on tunnel removal · 6d8c1905
      huaibin Wang authored
      [ Upstream commit d4257295 ]
      
      When a tunnel is deleted, the cached dst entry should be released.
      
      This problem may prevent the removal of a netns (seen with a x-netns IPv6
      gre tunnel):
        unregister_netdevice: waiting for lo to become free. Usage count = 3
      
      CC: Dmitry Kozlov <xeb@mail.ru>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: default avatarhuaibin Wang <huaibin.wang@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d8c1905
    • Dan Carpenter's avatar
      rds: fix an integer overflow test in rds_info_getsockopt() · 7cd10331
      Dan Carpenter authored
      [ Upstream commit 468b732b ]
      
      "len" is a signed integer.  We check that len is not negative, so it
      goes from zero to INT_MAX.  PAGE_SIZE is unsigned long so the comparison
      is type promoted to unsigned long.  ULONG_MAX - 4095 is a higher than
      INT_MAX so the condition can never be true.
      
      I don't know if this is harmful but it seems safe to limit "len" to
      INT_MAX - 4095.
      
      Fixes: a8c879a7 ('RDS: Info and stats')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7cd10331
    • Florian Westphal's avatar
      netlink: don't hold mutex in rcu callback when releasing mmapd ring · 3ebe377b
      Florian Westphal authored
      [ Upstream commit 0470eb99 ]
      
      Kirill A. Shutemov says:
      
      This simple test-case trigers few locking asserts in kernel:
      
      int main(int argc, char **argv)
      {
              unsigned int block_size = 16 * 4096;
              struct nl_mmap_req req = {
                      .nm_block_size          = block_size,
                      .nm_block_nr            = 64,
                      .nm_frame_size          = 16384,
                      .nm_frame_nr            = 64 * block_size / 16384,
              };
              unsigned int ring_size;
      	int fd;
      
      	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                      exit(1);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                      exit(1);
      
      	ring_size = req.nm_block_nr * req.nm_block_size;
      	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	return 0;
      }
      
      +++ exited with 0 +++
      BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
      in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
      3 locks held by init/1:
       #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
       #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
       #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
      Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
      
      CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
       ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
       0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
       ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
      Call Trace:
       <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
       [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
       [<ffffffff81085bed>] __might_sleep+0x4d/0x90
       [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
       [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
       [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
       [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
       [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
       [<ffffffff817e484d>] __sk_free+0x1d/0x160
       [<ffffffff817e49a9>] sk_free+0x19/0x20
      [..]
      
      Cong Wang says:
      
      We can't hold mutex lock in a rcu callback, [..]
      
      Thomas Graf says:
      
      The socket should be dead at this point. It might be simpler to
      add a netlink_release_ring() function which doesn't require
      locking at all.
      Reported-by: default avatar"Kirill A. Shutemov" <kirill@shutemov.name>
      Diagnosed-by: default avatarCong Wang <cwang@twopensource.com>
      Suggested-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ebe377b
    • Edward Hyunkoo Jee's avatar
      inet: frags: fix defragmented packet's IP header for af_packet · cecc5622
      Edward Hyunkoo Jee authored
      [ Upstream commit 0848f642 ]
      
      When ip_frag_queue() computes positions, it assumes that the passed
      sk_buff does not contain L2 headers.
      
      However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
      functions can be called on outgoing packets that contain L2 headers.
      
      Also, IPv4 checksum is not corrected after reassembly.
      
      Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.")
      Signed-off-by: default avatarEdward Hyunkoo Jee <edjee@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cecc5622
    • Nikolay Aleksandrov's avatar
      bonding: fix destruction of bond with devices different from arphrd_ether · e3e3caac
      Nikolay Aleksandrov authored
      [ Upstream commit 06f6d109 ]
      
      When the bonding is being unloaded and the netdevice notifier is
      unregistered it executes NETDEV_UNREGISTER for each device which should
      remove the bond's proc entry but if the device enslaved is not of
      ARPHRD_ETHER type and is in front of the bonding, it may execute
      bond_release_and_destroy() first which would release the last slave and
      destroy the bond device leaving the proc entry and thus we will get the
      following error (with dynamic debug on for bond_netdev_event to see the
      events order):
      [  908.963051] eql: event: 9
      [  908.963052] eql: IFF_SLAVE
      [  908.963054] eql: event: 2
      [  908.963056] eql: IFF_SLAVE
      [  908.963058] eql: event: 6
      [  908.963059] eql: IFF_SLAVE
      [  908.963110] bond0: Releasing active interface eql
      [  908.976168] bond0: Destroying bond bond0
      [  908.976266] bond0 (unregistering): Released all slaves
      [  908.984097] ------------[ cut here ]------------
      [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
      remove_proc_entry+0x112/0x160()
      [  908.984110] remove_proc_entry: removing non-empty directory
      'net/bonding', leaking at least 'bond0'
      [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
      snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
      gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
      psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
      drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
      pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
      autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
      ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
      virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
      usb_common [last unloaded: bonding]
      
      [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
      4.2.0-rc2+ #8
      [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
      ffff8800358dfda8
      [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
      ffff88003e3a4280
      [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
      ffffffff8172ebd0
      [  908.984181] Call Trace:
      [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
      [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
      [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
      [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
      [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
      [bonding]
      [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
      [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
      [  908.984225]  [<ffffffff8142f52d>] ?
      unregister_pernet_operations+0x8d/0xd0
      [  908.984228]  [<ffffffff8142f58d>] ?
      unregister_pernet_subsys+0x1d/0x30
      [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
      [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
      [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
      [  908.984244]  [<ffffffff8152b732>] ?
      entry_SYSCALL_64_fastpath+0x16/0x75
      [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
      
      Thus remove the proc entry manually if bond_release_and_destroy() is
      used. Because of the checks in bond_remove_proc_entry() it's not a
      problem for a bond device to change namespaces (the bug fixed by the
      Fixes commit) but since commit
      f9399814 ("bonding: Don't allow bond devices to change network
      namespaces.") that can't happen anyway.
      Reported-by: default avatarCarol Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: a64d49c3 ("bonding: Manage /proc/net/bonding/ entries from
                            the netdev events")
      Tested-by: default avatarCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3e3caac
    • Eric Dumazet's avatar
      ipv6: lock socket in ip6_datagram_connect() · 4b633bbe
      Eric Dumazet authored
      [ Upstream commit 03645a11 ]
      
      ip6_datagram_connect() is doing a lot of socket changes without
      socket being locked.
      
      This looks wrong, at least for udp_lib_rehash() which could corrupt
      lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b633bbe
    • Tilman Schmidt's avatar
      isdn/gigaset: reset tty->receive_room when attaching ser_gigaset · c6419a86
      Tilman Schmidt authored
      [ Upstream commit fd98e941 ]
      
      Commit 79901317 ("n_tty: Don't flush buffer when closing ldisc"),
      first merged in kernel release 3.10, caused the following regression
      in the Gigaset M101 driver:
      
      Before that commit, when closing the N_TTY line discipline in
      preparation to switching to N_GIGASET_M101, receive_room would be
      reset to a non-zero value by the call to n_tty_flush_buffer() in
      n_tty's close method. With the removal of that call, receive_room
      might be left at zero, blocking data reception on the serial line.
      
      The present patch fixes that regression by setting receive_room
      to an appropriate value in the ldisc open method.
      
      Fixes: 79901317 ("n_tty: Don't flush buffer when closing ldisc")
      Signed-off-by: default avatarTilman Schmidt <tilman@imap.cc>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6419a86
    • Nikolay Aleksandrov's avatar
      bridge: mdb: fix double add notification · 8d228c93
      Nikolay Aleksandrov authored
      [ Upstream commit 5ebc7846 ]
      
      Since the mdb add/del code was introduced there have been 2 br_mdb_notify
      calls when doing br_mdb_add() resulting in 2 notifications on each add.
      
      Example:
       Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
       Before patch:
       root@debian:~# bridge monitor all
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
      
       After patch:
       root@debian:~# bridge monitor all
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: cfd56754 ("bridge: add support of adding and deleting mdb entries")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d228c93
    • Herbert Xu's avatar
      net: Fix skb_set_peeked use-after-free bug · 5fa39f16
      Herbert Xu authored
      [ Upstream commit a0a2a660 ]
      
      The commit 738ac1eb ("net: Clone
      skb before setting peeked flag") introduced a use-after-free bug
      in skb_recv_datagram.  This is because skb_set_peeked may create
      a new skb and free the existing one.  As it stands the caller will
      continue to use the old freed skb.
      
      This patch fixes it by making skb_set_peeked return the new skb
      (or the old one if unchanged).
      
      Fixes: 738ac1eb ("net: Clone skb before setting peeked flag")
      Reported-by: default avatarBrenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarBrenden Blanco <bblanco@plumgrid.com>
      Reviewed-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fa39f16
    • Herbert Xu's avatar
      net: Fix skb csum races when peeking · 4164cda8
      Herbert Xu authored
      [ Upstream commit 89c22d8c ]
      
      When we calculate the checksum on the recv path, we store the
      result in the skb as an optimisation in case we need the checksum
      again down the line.
      
      This is in fact bogus for the MSG_PEEK case as this is done without
      any locking.  So multiple threads can peek and then store the result
      to the same skb, potentially resulting in bogus skb states.
      
      This patch fixes this by only storing the result if the skb is not
      shared.  This preserves the optimisations for the few cases where
      it can be done safely due to locking or other reasons, e.g., SIOCINQ.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4164cda8
    • Herbert Xu's avatar
      net: Clone skb before setting peeked flag · 0ba48ae9
      Herbert Xu authored
      [ Upstream commit 738ac1eb ]
      
      Shared skbs must not be modified and this is crucial for broadcast
      and/or multicast paths where we use it as an optimisation to avoid
      unnecessary cloning.
      
      The function skb_recv_datagram breaks this rule by setting peeked
      without cloning the skb first.  This causes funky races which leads
      to double-free.
      
      This patch fixes this by cloning the skb and replacing the skb
      in the list when setting skb->peeked.
      
      Fixes: a59322be ("[UDP]: Only increment counter on first peek/recv")
      Reported-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ba48ae9
    • Julian Anastasov's avatar
      net: call rcu_read_lock early in process_backlog · c987fa71
      Julian Anastasov authored
      [ Upstream commit 2c17d27c ]
      
      Incoming packet should be either in backlog queue or
      in RCU read-side section. Otherwise, the final sequence of
      flush_backlog() and synchronize_net() may miss packets
      that can run without device reference:
      
      CPU 1                  CPU 2
                             skb->dev: no reference
                             process_backlog:__skb_dequeue
                             process_backlog:local_irq_enable
      
      on_each_cpu for
      flush_backlog =>       IPI(hardirq): flush_backlog
                             - packet not found in backlog
      
                             CPU delayed ...
      synchronize_net
      - no ongoing RCU
      read-side sections
      
      netdev_run_todo,
      rcu_barrier: no
      ongoing callbacks
                             __netif_receive_skb_core:rcu_read_lock
                             - too late
      free dev
                             process packet for freed dev
      
      Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue")
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c987fa71
    • Oleg Nesterov's avatar
      net: pktgen: fix race between pktgen_thread_worker() and kthread_stop() · f85eee64
      Oleg Nesterov authored
      [ Upstream commit fecdf8be ]
      
      pktgen_thread_worker() is obviously racy, kthread_stop() can come
      between the kthread_should_stop() check and set_current_state().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Reported-by: default avatarMarcelo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f85eee64
    • Nikolay Aleksandrov's avatar
      bridge: mdb: zero out the local br_ip variable before use · 7865ece3
      Nikolay Aleksandrov authored
      [ Upstream commit f1158b74 ]
      
      Since commit b0e9a30d ("bridge: Add vlan id to multicast groups")
      there's a check in br_ip_equal() for a matching vlan id, but the mdb
      functions were not modified to use (or at least zero it) so when an
      entry was added it would have a garbage vlan id (from the local br_ip
      variable in __br_mdb_add/del) and this would prevent it from being
      matched and also deleted. So zero out the whole local ip var to protect
      ourselves from future changes and also to fix the current bug, since
      there's no vlan id support in the mdb uapi - use always vlan id 0.
      Example before patch:
      root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb
      dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
      RTNETLINK answers: Invalid argument
      
      After patch:
      root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb
      dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: b0e9a30d ("bridge: Add vlan id to multicast groups")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7865ece3
    • Stephen Smalley's avatar
      net/tipc: initialize security state for new connection socket · afabf2a8
      Stephen Smalley authored
      [ Upstream commit fdd75ea8 ]
      
      Calling connect() with an AF_TIPC socket would trigger a series
      of error messages from SELinux along the lines of:
      SELinux: Invalid class 0
      type=AVC msg=audit(1434126658.487:34500): avc:  denied  { <unprintable> }
        for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
        tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable>
        permissive=0
      
      This was due to a failure to initialize the security state of the new
      connection sock by the tipc code, leaving it with junk in the security
      class field and an unlabeled secid.  Add a call to security_sk_clone()
      to inherit the security state from the parent socket.
      Reported-by: default avatarTim Shearer <tim.shearer@overturenetworks.com>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afabf2a8
    • Angga's avatar
      ipv6: Make MLD packets to only be processed locally · 3b9393dc
      Angga authored
      [ Upstream commit 4c938d22 ]
      
      Before commit daad1512 ("ipv6: Make ipv6_is_mld() inline and use it
      from ip6_mc_input().") MLD packets were only processed locally. After the
      change, a copy of MLD packet goes through ip6_mr_input, causing
      MRT6MSG_NOCACHE message to be generated to user space.
      
      Make MLD packet only processed locally.
      
      Fixes: daad1512 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
      Signed-off-by: default avatarHermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b9393dc
    • Alexei Starovoitov's avatar
      x86: bpf_jit: fix compilation of large bpf programs · 9f6191da
      Alexei Starovoitov authored
      commit 3f7352bf upstream.
      
      x86 has variable length encoding. x86 JIT compiler is trying
      to pick the shortest encoding for given bpf instruction.
      While doing so the jump targets are changing, so JIT is doing
      multiple passes over the program. Typical program needs 3 passes.
      Some very short programs converge with 2 passes. Large programs
      may need 4 or 5. But specially crafted bpf programs may hit the
      pass limit and if the program converges on the last iteration
      the JIT compiler will be producing an image full of 'int 3' insns.
      Fix this corner case by doing final iteration over bpf program.
      
      Fixes: 0a14842f ("net: filter: Just In Time compiler for x86-64")
      Reported-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Tested-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f6191da
    • Dan Carpenter's avatar
      vhost/scsi: potential memory corruption · fa83234f
      Dan Carpenter authored
      commit 59c816c1 upstream.
      
      This code in vhost_scsi_make_tpg() is confusing because we limit "tpgt"
      to UINT_MAX but the data type of "tpg->tport_tpgt" and that is a u16.
      
      I looked at the context and it turns out that in
      vhost_scsi_set_endpoint(), "tpg->tport_tpgt" is used as an offset into
      the vs_tpg[] array which has VHOST_SCSI_MAX_TARGET (256) elements so
      anything higher than 255 then it is invalid.  I have made that the limit
      now.
      
      In vhost_scsi_send_evt() we mask away values higher than 255, but now
      that the limit has changed, we don't need the mask.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [ The affected function was renamed to vhost_scsi_make_tpg before
        the vulnerability was announced, I ported it to 3.10 stable and
        changed the code in function tcm_vhost_make_tpg]
      Signed-off-by: default avatarWang Long <long.wanglong@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa83234f
    • Marcelo Ricardo Leitner's avatar
      sctp: fix ASCONF list handling · 7bf24986
      Marcelo Ricardo Leitner authored
      commit 2d45a02d upstream.
      
      ->auto_asconf_splist is per namespace and mangled by functions like
      sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
      
      Also, the call to inet_sk_copy_descendant() was backuping
      ->auto_asconf_list through the copy but was not honoring
      ->do_auto_asconf, which could lead to list corruption if it was
      different between both sockets.
      
      This commit thus fixes the list handling by using ->addr_wq_lock
      spinlock to protect the list. A special handling is done upon socket
      creation and destruction for that. Error handlig on sctp_init_sock()
      will never return an error after having initialized asconf, so
      sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
      will be take on sctp_close_sock(), before locking the socket, so we
      don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
      
      Instead of taking the lock on sctp_sock_migrate() for copying and
      restoring the list values, it's preferred to avoid rewritting it by
      implementing sctp_copy_descendant().
      
      Issue was found with a test application that kept flipping sysctl
      default_auto_asconf on and off, but one could trigger it by issuing
      simultaneous setsockopt() calls on multiple sockets or by
      creating/destroying sockets fast enough. This is only triggerable
      locally.
      
      Fixes: 9f7d653b ("sctp: Add Auto-ASCONF support (core).")
      Reported-by: default avatarJi Jianwen <jiji@redhat.com>
      Suggested-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Suggested-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [wangkai: backport to 3.10: adjust context]
      Signed-off-by: default avatarWang Kai <morgan.wang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bf24986
    • Hin-Tak Leung's avatar
      hfs,hfsplus: cache pages correctly between bnode_create and bnode_free · 61cabc7d
      Hin-Tak Leung authored
      commit 7cb74be6 upstream.
      
      Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
      hfs_bnode_find() for finding or creating pages corresponding to an inode)
      are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
      and should not be page_cache_release()'ed until hfs_bnode_free().
      
      This patch fixes a problem I first saw in July 2012: merely running "du"
      on a large hfsplus-mounted directory a few times on a reasonably loaded
      system would get the hfsplus driver all confused and complaining about
      B-tree inconsistencies, and generates a "BUG: Bad page state".  Most
      recently, I can generate this problem on up-to-date Fedora 22 with shipped
      kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
      mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
      lightly-used QEMU VM image of the full Mac OS X 10.9:
      
      $ df -i / /home /mnt
      Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
      /dev/mapper/fedora-root    3276800  551665    2725135   17% /
      /dev/mapper/fedora-home   52879360  716221   52163139    2% /home
      /dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt
      
      After applying the patch, I was able to run "du /" (60+ times) and "du
      /mnt" (150+ times) continuously and simultaneously for 6+ hours.
      
      There are many reports of the hfsplus driver getting confused under load
      and generating "BUG: Bad page state" or other similar issues over the
      years.  [1]
      
      The unpatched code [2] has always been wrong since it entered the kernel
      tree.  The only reason why it gets away with it is that the
      kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
      the kernel has not had a chance to reuse the memory for something else,
      most of the time.
      
      The current RW driver appears to have followed the design and development
      of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
      2001) had a B-tree node-centric approach to
      read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
      migrating towards version 0.2 (June 2002) of caching and releasing pages
      per inode extents.  When the current RW code first entered the kernel [2]
      in 2005, there was an REF_PAGES conditional (and "//" commented out code)
      to switch between B-node centric paging to inode-centric paging.  There
      was a mistake with the direction of one of the REF_PAGES conditionals in
      __hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
      read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
      removed, but a page_cache_release() was mistakenly left in (propagating
      the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
      page_cache_release() in bnode_release() (which should be spanned by
      !REF_PAGES) was never enabled.
      
      References:
      [1]:
      Michael Fox, Apr 2013
      http://www.spinics.net/lists/linux-fsdevel/msg63807.html
      ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
      
      Sasha Levin, Feb 2015
      http://lkml.org/lkml/2015/2/20/85 ("use after free")
      
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
      https://bugzilla.kernel.org/show_bug.cgi?id=42342
      https://bugzilla.kernel.org/show_bug.cgi?id=63841
      https://bugzilla.kernel.org/show_bug.cgi?id=78761
      
      [2]:
      http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
      fs/hfs/bnode.c?id=d1081202
      commit d1081202
      Author: Andrew Morton <akpm@osdl.org>
      Date:   Wed Feb 25 16:17:36 2004 -0800
      
          [PATCH] HFS rewrite
      
      http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
      fs/hfsplus/bnode.c?id=91556682
      
      commit 91556682
      Author: Andrew Morton <akpm@osdl.org>
      Date:   Wed Feb 25 16:17:48 2004 -0800
      
          [PATCH] HFS+ support
      
      [3]:
      http://sourceforge.net/projects/linux-hfsplus/
      
      http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
      http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
      
      http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
      fs/hfsplus/bnode.c?r1=1.4&r2=1.5
      
      Date:   Thu Jun 6 09:45:14 2002 +0000
      Use buffer cache instead of page cache in bnode.c. Cache inode extents.
      
      [4]:
      http://git.kernel.org/cgit/linux/kernel/git/\
      stable/linux-stable.git/commit/?id=a5e3985f
      
      commit a5e3985f
      Author: Roman Zippel <zippel@linux-m68k.org>
      Date:   Tue Sep 6 15:18:47 2005 -0700
      
      [PATCH] hfs: remove debug code
      Signed-off-by: default avatarHin-Tak Leung <htl10@users.sourceforge.net>
      Signed-off-by: default avatarSergei Antonov <saproj@gmail.com>
      Reviewed-by: default avatarAnton Altaparmakov <anton@tuxera.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Sougata Santra <sougata@tuxera.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61cabc7d
    • Noa Osherovich's avatar
      IB/mlx4: Use correct SL on AH query under RoCE · 2698f574
      Noa Osherovich authored
      commit 5e99b139 upstream.
      
      The mlx4 IB driver implementation for ib_query_ah used a wrong offset
      (28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
      
      Fixes: fa417f7b ('IB/mlx4: Add support for IBoE')
      Signed-off-by: default avatarShani Michaeli <shanim@mellanox.com>
      Signed-off-by: default avatarNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2698f574
    • Jack Morgenstein's avatar
      IB/mlx4: Forbid using sysfs to change RoCE pkeys · a6d452e0
      Jack Morgenstein authored
      commit 2b135db3 upstream.
      
      The pkey mapping for RoCE must remain the default mapping:
      VFs:
        virtual index 0 = mapped to real index 0 (0xFFFF)
        All others indices: mapped to a real pkey index containing an
                            invalid pkey.
      PF:
        virtual index i = real index i.
      
      Don't allow users to change these mappings using files found in
      sysfs.
      
      Fixes: c1e7e466 ('IB/mlx4: Add iov directory in sysfs under the ib device')
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6d452e0
    • Yishai Hadas's avatar
      IB/uverbs: Fix race between ib_uverbs_open and remove_one · caf23350
      Yishai Hadas authored
      commit 35d4a0b6 upstream.
      
      Fixes: 2a72f212 ("IB/uverbs: Remove dev_table")
      
      Before this commit there was a device look-up table that was protected
      by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
      it was dropped and container_of was used instead, it enabled the race
      with remove_one as dev might be freed just after:
      dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
      before the kref_get.
      
      In addition, this buggy patch added some dead code as
      container_of(x,y,z) can never be NULL and so dev can never be NULL.
      As a result the comment above ib_uverbs_open saying "the open method
      will either immediately run -ENXIO" is wrong as it can never happen.
      
      The solution follows Jason Gunthorpe suggestion from below URL:
      https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html
      
      cdev will hold a kref on the parent (the containing structure,
      ib_uverbs_device) and only when that kref is released it is
      guaranteed that open will never be called again.
      
      In addition, fixes the active count scheme to use an atomic
      not a kref to prevent WARN_ON as pointed by above comment
      from Jason.
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: default avatarShachar Raindel <raindel@mellanox.com>
      Reviewed-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      caf23350
    • Christoph Hellwig's avatar
      IB/uverbs: reject invalid or unknown opcodes · 939f8043
      Christoph Hellwig authored
      commit b632ffa7 upstream.
      
      We have many WR opcodes that are only supported in kernel space
      and/or require optional information to be copied into the WR
      structure.  Reject all those not explicitly handled so that we
      can't pass invalid information to drivers.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Reviewed-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      939f8043
    • Hin-Tak Leung's avatar
      hfs: fix B-tree corruption after insertion at position 0 · 431152b6
      Hin-Tak Leung authored
      commit b4cc0efe upstream.
      
      Fix B-tree corruption when a new record is inserted at position 0 in the
      node in hfs_brec_insert().
      
      This is an identical change to the corresponding hfs b-tree code to Sergei
      Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
      to keep similar code paths in the hfs and hfsplus drivers in sync, where
      appropriate.
      Signed-off-by: default avatarHin-Tak Leung <htl10@users.sourceforge.net>
      Cc: Sergei Antonov <saproj@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Reviewed-by: default avatarVyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      431152b6
    • David Vrabel's avatar
      xen/gntdev: convert priv->lock to a mutex · f8cb6399
      David Vrabel authored
      commit 1401c00e upstream.
      
      Unmapping may require sleeping and we unmap while holding priv->lock, so
      convert it to a mutex.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8cb6399
    • NeilBrown's avatar
      md/raid10: always set reshape_safe when initializing reshape_position. · d3e972d5
      NeilBrown authored
      commit 299b0685 upstream.
      
      'reshape_position' tracks where in the reshape we have reached.
      'reshape_safe' tracks where in the reshape we have safely recorded
      in the metadata.
      
      These are compared to determine when to update the metadata.
      So it is important that reshape_safe is initialised properly.
      Currently it isn't.  When starting a reshape from the beginning
      it usually has the correct value by luck.  But when reducing the
      number of devices in a RAID10, it has the wrong value and this leads
      to the metadata not being updated correctly.
      This can lead to corruption if the reshape is not allowed to complete.
      
      This patch is suitable for any -stable kernel which supports RAID10
      reshape, which is 3.5 and later.
      
      Fixes: 3ea7daa5 ("md/raid10: add reshape support")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3e972d5
    • Jialing Fu's avatar
      mmc: core: fix race condition in mmc_wait_data_done · ab7a4b4b
      Jialing Fu authored
      commit 71f8a4b8 upstream.
      
      The following panic is captured in ker3.14, but the issue still exists
      in latest kernel.
      ---------------------------------------------------------------------
      [   20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
      at virtual address 00000578
      ......
      [   20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
      [   20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
      [   20.740134] c0 3136 (Compiler) Call trace:
      [   20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
      [   20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
      [   20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
      [   20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
      [   20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
      [   20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
      [   20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
      [   20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
      [   20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
      [   20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
      ----------------------------------------------------------------------
      Because in SMP, "mrq" has race condition between below two paths:
      path1: CPU0: <tasklet context>
        static void mmc_wait_data_done(struct mmc_request *mrq)
        {
           mrq->host->context_info.is_done_rcv = true;
           //
           // If CPU0 has just finished "is_done_rcv = true" in path1, and at
           // this moment, IRQ or ICache line missing happens in CPU0.
           // What happens in CPU1 (path2)?
           //
           // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
           // path2 would have chance to break from wait_event_interruptible
           // in mmc_wait_for_data_req_done and continue to run for next
           // mmc_request (mmc_blk_rw_rq_prep).
           //
           // Within mmc_blk_rq_prep, mrq is cleared to 0.
           // If below line still gets host from "mrq" as the result of
           // compiler, the panic happens as we traced.
           wake_up_interruptible(&mrq->host->context_info.wait);
        }
      
      path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
        static int mmc_wait_for_data_req_done(...
        {
           ...
           while (1) {
                 wait_event_interruptible(context_info->wait,
                         (context_info->is_done_rcv ||
                          context_info->is_new_req));
           	   static void mmc_blk_rw_rq_prep(...
                 {
                 ...
                 memset(brq, 0, sizeof(struct mmc_blk_request));
      
      This issue happens very coincidentally; however adding mdelay(1) in
      mmc_wait_data_done as below could duplicate it easily.
      
         static void mmc_wait_data_done(struct mmc_request *mrq)
         {
           mrq->host->context_info.is_done_rcv = true;
      +    mdelay(1);
           wake_up_interruptible(&mrq->host->context_info.wait);
          }
      
      At runtime, IRQ or ICache line missing may just happen at the same place
      of the mdelay(1).
      
      This patch gets the mmc_context_info at the beginning of function, it can
      avoid this race condition.
      Signed-off-by: default avatarJialing Fu <jlfu@marvell.com>
      Tested-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Fixes: 2220eedf ("mmc: fix async request mechanism ....")
      Signed-off-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab7a4b4b
    • Jann Horn's avatar
      fs: if a coredump already exists, unlink and recreate with O_EXCL · 9bdee2f9
      Jann Horn authored
      commit fbb18169 upstream.
      
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bdee2f9
    • Jaewon Kim's avatar
      vmscan: fix increasing nr_isolated incurred by putback unevictable pages · de047ce4
      Jaewon Kim authored
      commit c54839a7 upstream.
      
      reclaim_clean_pages_from_list() assumes that shrink_page_list() returns
      number of pages removed from the candidate list.  But shrink_page_list()
      puts back mlocked pages without passing it to caller and without
      counting as nr_reclaimed.  This increases nr_isolated.
      
      To fix this, this patch changes shrink_page_list() to pass unevictable
      pages back to caller.  Caller will take care those pages.
      
      Minchan said:
      
      It fixes two issues.
      
      1. With unevictable page, cma_alloc will be successful.
      
      Exactly speaking, cma_alloc of current kernel will fail due to
      unevictable pages.
      
      2. fix leaking of NR_ISOLATED counter of vmstat
      
      With it, too_many_isolated works.  Otherwise, it could make hang until
      the process get SIGKILL.
      Signed-off-by: default avatarJaewon Kim <jaewon31.kim@samsung.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de047ce4
    • Helge Deller's avatar
      parisc: Filter out spurious interrupts in PA-RISC irq handler · 706ad8dc
      Helge Deller authored
      commit b1b4e435 upstream.
      
      When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
      long way to go to find the right IRQ line, registering it, then registering the
      serial port and the irq handler for the serial port. During this phase spurious
      interrupts for the serial port may happen which then crashes the kernel because
      the action handler might not have been set up yet.
      
      So, basically it's a race condition between the serial port hardware and the
      CPU which sets up the necessary fields in the irq sructs. The main reason for
      this race is, that we unmask the serial port irqs too early without having set
      up everything properly before (which isn't easily possible because we need the
      IRQ number to register the serial ports).
      
      This patch is a work-around for this problem. It adds checks to the CPU irq
      handler to verify if the IRQ action field has been initialized already. If not,
      we just skip this interrupt (which isn't critical for a serial port at bootup).
      The real fix would probably involve rewriting all PA-RISC specific IRQ code
      (for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
      the irq chips and proper irq enabling along this line.
      
      This bug has been in the PA-RISC port since the beginning, but the crashes
      happened very rarely with currently used hardware.  But on the latest machine
      which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
      1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
      crashed at every boot because of this race. So, without this patch the machine
      would currently be unuseable.
      
      For the record, here is the flow logic:
      1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
      2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
      3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
      4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
      5. serial_init_chip() then registers the 8250 port.
      Problems:
      - In step 4 the CPU irq shouldn't have been registered yet, but after step 5
      - If serial irq happens between 4 and 5 have finished, the kernel will crash
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      706ad8dc
    • Trond Myklebust's avatar
      NFS: nfs_set_pgio_error sometimes misses errors · 690eb5ee
      Trond Myklebust authored
      commit e9ae58ae upstream.
      
      We should ensure that we always set the pgio_header's error field
      if a READ or WRITE RPC call returns an error. The current code depends
      on 'hdr->good_bytes' always being initialised to a large value, which
      is not always done correctly by callers.
      When this happens, applications may end up missing important errors.
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      690eb5ee
    • NeilBrown's avatar
      NFSv4: don't set SETATTR for O_RDONLY|O_EXCL · 9520ac79
      NeilBrown authored
      commit efcbc04e upstream.
      
      It is unusual to combine the open flags O_RDONLY and O_EXCL, but
      it appears that libre-office does just that.
      
      [pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
      [pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>
      
      NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
      probably to reset the timestamps.
      
      When it was an O_RDONLY open, the SETATTR command does not
      identify any actual attributes to change.
      If no delegation was provided to the open, the SETATTR uses the
      all-zeros stateid and the request is accepted (at least by the
      Linux NFS server - no harm, no foul).
      
      If a read-delegation was provided, this is used in the SETATTR
      request, and a Netapp filer will justifiably claim
      NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
      to retry - indefinitely.
      
      So only treat O_EXCL specially if O_CREAT was also given.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9520ac79
    • David Härdeman's avatar
      rc-core: fix remove uevent generation · 92a6eef0
      David Härdeman authored
      commit a66b0c41 upstream.
      
      The input_dev is already gone when the rc device is being unregistered
      so checking for its presence only means that no remove uevent will be
      generated.
      Signed-off-by: default avatarDavid Härdeman <david@hardeman.nu>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92a6eef0
    • Minfei Huang's avatar
      x86/mm: Initialize pmd_idx in page_table_range_init_count() · 55b9029e
      Minfei Huang authored
      commit 9962eea9 upstream.
      
      The variable pmd_idx is not initialized for the first iteration of the
      for loop.
      
      Assign the proper value which indexes the start address.
      
      Fixes: 719272c4 'x86, mm: only call early_ioremap_page_table_range_init() once'
      Signed-off-by: default avatarMinfei Huang <mnfhuang@gmail.com>
      Cc: tony.luck@intel.com
      Cc: wangnan0@huawei.com
      Cc: david.vrabel@citrix.com
      Reviewed-by: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55b9029e
    • Jeffery Miller's avatar
      Add radeon suspend/resume quirk for HP Compaq dc5750. · 1d6c4573
      Jeffery Miller authored
      commit 09bfda10 upstream.
      
      With the radeon driver loaded the HP Compaq dc5750
      Small Form Factor machine fails to resume from suspend.
      Adding a quirk similar to other devices avoids
      the problem and the system resumes properly.
      Signed-off-by: default avatarJeffery Miller <jmiller@neverware.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d6c4573
    • Thomas Huth's avatar
      powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · 2ba90c0e
      Thomas Huth authored
      commit 1c2cb594 upstream.
      
      The EPOW interrupt handler uses rtas_get_sensor(), which in turn
      uses rtas_busy_delay() to wait for RTAS becoming ready in case it
      is necessary. But rtas_busy_delay() is annotated with might_sleep()
      and thus may not be used by interrupts handlers like the EPOW handler!
      This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
      enabled:
      
       BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
       Call Trace:
       [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
       [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
       [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
       [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
       [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
       [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
       [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
       [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
       [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
       [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
       [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
       [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
       [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
      
      Fix this issue by introducing a new rtas_get_sensor_fast() function
      that does not use rtas_busy_delay() - and thus can only be used for
      sensors that do not cause a BUSY condition - known as "fast" sensors.
      
      The EPOW sensor is defined to be "fast" in sPAPR - mpe.
      
      Fixes: 587f83e8 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ba90c0e
    • Michael Ellerman's avatar
      powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 36e5789b
      Michael Ellerman authored
      commit 74b5037b upstream.
      
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      PAGE_SIZE.
      
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
      
      Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36e5789b
    • Takashi Iwai's avatar
      ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 · 4c9510d5
      Takashi Iwai authored
      commit a161574e upstream.
      
      It turned out that the machine has a bass speaker, so take a correct
      fixup entry.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c9510d5
    • Takashi Iwai's avatar
      ALSA: hda - Enable headphone jack detect on old Fujitsu laptops · e8c2bbe9
      Takashi Iwai authored
      commit bb148bde upstream.
      
      According to the bug report, FSC Amilo laptops with ALC880 can detect
      the headphone jack but currently the driver disables it.  It's partly
      intentionally, as non-working jack detect was reported in the past.
      Let's enable now.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8c2bbe9