1. 24 May, 2015 30 commits
    • Michal Kubeček's avatar
      udp: only allow UFO for packets from SOCK_DGRAM sockets · 65f26669
      Michal Kubeček authored
      [ Upstream commit acf8dd0a ]
      
      If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
      UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
      CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
      checksum is to be computed on segmentation. However, in this case,
      skb->csum_start and skb->csum_offset are never set as raw socket
      transmit path bypasses udp_send_skb() where they are usually set. As a
      result, driver may access invalid memory when trying to calculate the
      checksum and store the result (as observed in virtio_net driver).
      
      Moreover, the very idea of modifying the userspace provided UDP header
      is IMHO against raw socket semantics (I wasn't able to find a document
      clearly stating this or the opposite, though). And while allowing
      CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
      too intrusive change just to handle a corner case like this. Therefore
      disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 332640b2)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      65f26669
    • Steffen Klassert's avatar
      ipv4: Don't use ufo handling on later transformed packets · 417b2efd
      Steffen Klassert authored
      We might call ip_ufo_append_data() for packets that will be IPsec
      transformed later. This function should be used just for real
      udp packets. So we check for rt->dst.header_len which is only
      nonzero on IPsec handling and call ip_ufo_append_data() just
      if rt->dst.header_len is zero.
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit c146066a)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      417b2efd
    • Matthew Thode's avatar
      net: reject creation of netdev names with colons · a7357ca5
      Matthew Thode authored
      [ Upstream commit a4176a93 ]
      
      colons are used as a separator in netdev device lookup in dev_ioctl.c
      
      Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME
      Signed-off-by: default avatarMatthew Thode <mthode@mthode.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit d501ebeb)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a7357ca5
    • Ignacy Gawędzki's avatar
      ematch: Fix auto-loading of ematch modules. · 20914ec4
      Ignacy Gawędzki authored
      [ Upstream commit 34eea79e ]
      
      In tcf_em_validate(), after calling request_module() to load the
      kind-specific module, set em->ops to NULL before returning -EAGAIN, so
      that module_put() is not called again by tcf_em_tree_destroy().
      Signed-off-by: default avatarIgnacy Gawędzki <ignacy.gawedzki@green-communications.fr>
      Acked-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 9405be73)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      20914ec4
    • Florian Westphal's avatar
      ppp: deflate: never return len larger than output buffer · 221956a2
      Florian Westphal authored
      [ Upstream commit e2a4800e ]
      
      When we've run out of space in the output buffer to store more data, we
      will call zlib_deflate with a NULL output buffer until we've consumed
      remaining input.
      
      When this happens, olen contains the size the output buffer would have
      consumed iff we'd have had enough room.
      
      This can later cause skb_over_panic when ppp_generic skb_put()s
      the returned length.
      Reported-by: default avatarIain Douglas <centos@1n6.org.uk>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 8bcd6442)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      221956a2
    • Ani Sinha's avatar
      net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland. · 964a5909
      Ani Sinha authored
      commit 6a2a2b3a upstream.
      
      Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
      msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage
      value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will
      break old binaries and any code for which there is no access to source code.
      To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.
      Signed-off-by: default avatarAni Sinha <ani@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit d29f1f53)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      964a5909
    • Jann Horn's avatar
      fs: take i_mutex during prepare_binprm for set[ug]id executables · 0c5d4221
      Jann Horn authored
      commit 8b01fc86 upstream.
      
      This prevents a race between chown() and execve(), where chowning a
      setuid-user binary to root would momentarily make the binary setuid
      root.
      
      This patch was mostly written by Linus Torvalds.
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Drop the task_no_new_privs() and user namespace checks
       - Open-code file_inode()
       - s/READ_ONCE/ACCESS_ONCE/
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 470e517b)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0c5d4221
    • D.S. Ljungmark's avatar
      ipv6: Don't reduce hop limit for an interface · daacd26b
      D.S. Ljungmark authored
      commit 6fd99094 upstream.
      
      A local route may have a lower hop_limit set than global routes do.
      
      RFC 3756, Section 4.2.7, "Parameter Spoofing"
      
      >   1.  The attacker includes a Current Hop Limit of one or another small
      >       number which the attacker knows will cause legitimate packets to
      >       be dropped before they reach their destination.
      
      >   As an example, one possible approach to mitigate this threat is to
      >   ignore very small hop limits.  The nodes could implement a
      >   configurable minimum hop limit, and ignore attempts to set it below
      >   said limit.
      Signed-off-by: default avatarD.S. Ljungmark <ljungmark@modio.se>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust ND_PRINTK() usage]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit f10f7d2a)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      daacd26b
    • Sasha Levin's avatar
      net: rds: use correct size for max unacked packets and bytes · 6246ff96
      Sasha Levin authored
      commit db27ebb1 upstream.
      
      Max unacked packets/bytes is an int while sizeof(long) was used in the
      sysctl table.
      
      This means that when they were getting read we'd also leak kernel memory
      to userspace along with the timeout values.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3760b67b)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6246ff96
    • Sasha Levin's avatar
      net: llc: use correct size for sysctl timeout entries · 716fff2a
      Sasha Levin authored
      commit 6b8d9117 upstream.
      
      The timeout entries are sizeof(int) rather than sizeof(long), which
      means that when they were getting read we'd also leak kernel memory
      to userspace along with the timeout values.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 88fe14be)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      716fff2a
    • Shachar Raindel's avatar
      IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic · e402af7b
      Shachar Raindel authored
      commit 8494057a upstream.
      
      Properly verify that the resulting page aligned end address is larger
      than both the start address and the length of the memory area requested.
      
      Both the start and length arguments for ib_umem_get are controlled by
      the user. A misbehaving user can provide values which will cause an
      integer overflow when calculating the page aligned end address.
      
      This overflow can cause also miscalculation of the number of pages
      mapped, and additional logic issues.
      
      Addresses: CVE-2014-8159
      Signed-off-by: default avatarShachar Raindel <raindel@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 485f16b7)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e402af7b
    • Daniel Borkmann's avatar
      net: sctp: fix slab corruption from use after free on INIT collisions · 1b9143bd
      Daniel Borkmann authored
      commit 600ddd68 upstream
      
      When hitting an INIT collision case during the 4WHS with AUTH enabled, as
      already described in detail in commit 1be9a950 ("net: sctp: inherit
      auth_capable on INIT collisions"), it can happen that we occasionally
      still remotely trigger the following panic on server side which seems to
      have been uncovered after the fix from commit 1be9a950 ...
      
      [  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
      [  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
      [  533.940559] PGD 5030f2067 PUD 0
      [  533.957104] Oops: 0000 [#1] SMP
      [  533.974283] Modules linked in: sctp mlx4_en [...]
      [  534.939704] Call Trace:
      [  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
      [  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
      [  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
      [  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
      [  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
      [  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
      [  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
      [  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      ... or depending on the the application, for example this one:
      
      [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
      [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
      [ 1370.054568] PGD 633c94067 PUD 0
      [ 1370.070446] Oops: 0000 [#1] SMP
      [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
      [ 1370.963431] Call Trace:
      [ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
      [ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
      [ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
      [ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
      [ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      With slab debugging enabled, we can see that the poison has been overwritten:
      
      [  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
      [  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
      [  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
      [  669.826424]  __slab_alloc+0x4bf/0x566
      [  669.826433]  __kmalloc+0x280/0x310
      [  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
      [  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
      [  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
      [  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
      [  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
      [  669.826635]  __slab_free+0x39/0x2a8
      [  669.826643]  kfree+0x1d6/0x230
      [  669.826650]  kzfree+0x31/0x40
      [  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
      [  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
      [  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]
      
      Since this only triggers in some collision-cases with AUTH, the problem at
      heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
      when having refcnt 1, once directly in sctp_assoc_update() and yet again
      from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
      the already kzfree'd memory, which is also consistent with the observation
      of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
      at a later point in time when poison is checked on new allocation).
      
      Reference counting of auth keys revisited:
      
      Shared keys for AUTH chunks are being stored in endpoints and associations
      in endpoint_shared_keys list. On endpoint creation, a null key is being
      added; on association creation, all endpoint shared keys are being cached
      and thus cloned over to the association. struct sctp_shared_key only holds
      a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
      keeps track of users internally through refcounting. Naturally, on assoc
      or enpoint destruction, sctp_shared_key are being destroyed directly and
      the reference on sctp_auth_bytes dropped.
      
      User space can add keys to either list via setsockopt(2) through struct
      sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
      adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
      with refcount 1 and in case of replacement drops the reference on the old
      sctp_auth_bytes. A key can be set active from user space through setsockopt()
      on the id via sctp_auth_set_active_key(), which iterates through either
      endpoint_shared_keys and in case of an assoc, invokes (one of various places)
      sctp_auth_asoc_init_active_key().
      
      sctp_auth_asoc_init_active_key() computes the actual secret from local's
      and peer's random, hmac and shared key parameters and returns a new key
      directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
      the reference if there was a previous one. The secret, which where we
      eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
      intitial refcount of 1, which also stays unchanged eventually in
      sctp_assoc_update(). This key is later being used for crypto layer to
      set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
      
      To close the loop: asoc->asoc_shared_key is freshly allocated secret
      material and independant of the sctp_shared_key management keeping track
      of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7
      ("net: sctp: fix memory leak in auth key management") is independant of
      this bug here since it concerns a different layer (though same structures
      being used eventually). asoc->asoc_shared_key is reference dropped correctly
      on assoc destruction in sctp_association_free() and when active keys are
      being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
      of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
      to remove that sctp_auth_key_put() from there which fixes these panics.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1b9143bd
    • Daniel Borkmann's avatar
      net: sctp: fix memory leak in auth key management · f014e54c
      Daniel Borkmann authored
      commit 4184b2a7 upstream.
      
      A very minimal and simple user space application allocating an SCTP
      socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
      the socket again will leak the memory containing the authentication
      key from user space:
      
      unreferenced object 0xffff8800837047c0 (size 16):
        comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
        hex dump (first 16 bytes):
          01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
          [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
          [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
          [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
          [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
          [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
          [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This is bad because of two things, we can bring down a machine from
      user space when auth_enable=1, but also we would leave security sensitive
      keying material in memory without clearing it after use. The issue is
      that sctp_auth_create_key() already sets the refcount to 1, but after
      allocation sctp_auth_set_key() does an additional refcount on it, and
      thus leaving it around when we free the socket.
      
      Fixes: 65b07e5d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3af10169)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f014e54c
    • Jan Kara's avatar
      isofs: Fix unchecked printing of ER records · 09b5d759
      Jan Kara authored
      commit 4e202462 upstream
      
      We didn't check length of rock ridge ER records before printing them.
      Thus corrupted isofs image can cause us to access and print some memory
      behind the buffer with obvious consequences.
      Reported-and-tested-by: default avatarCarl Henrik Lunde <chlunde@ping.uio.no>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      09b5d759
    • Jan Kara's avatar
      isofs: Fix infinite looping over CE entries · 08313e26
      Jan Kara authored
      commit f54e18f1 upstream
      
      Rock Ridge extensions define so called Continuation Entries (CE) which
      define where is further space with Rock Ridge data. Corrupted isofs
      image can contain arbitrarily long chain of these, including a one
      containing loop and thus causing kernel to end in an infinite loop when
      traversing these entries.
      
      Limit the traversal to 32 entries which should be more than enough space
      to store all the Rock Ridge data.
      Reported-by: default avatarP J P <ppandit@redhat.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      08313e26
    • Florian Westphal's avatar
      netfilter: conntrack: disable generic tracking for known protocols · 8e421332
      Florian Westphal authored
      commit db29a950 upstream
      
      Given following iptables ruleset:
      
      -P FORWARD DROP
      -A FORWARD -m sctp --dport 9 -j ACCEPT
      -A FORWARD -p tcp --dport 80 -j ACCEPT
      -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
      
      One would assume that this allows SCTP on port 9 and TCP on port 80.
      Unfortunately, if the SCTP conntrack module is not loaded, this allows
      *all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
      which we think is a security issue.
      
      This is because on the first SCTP packet on port 9, we create a dummy
      "generic l4" conntrack entry without any port information (since
      conntrack doesn't know how to extract this information).
      
      All subsequent packets that are unknown will then be in established
      state since they will fallback to proto_generic and will match the
      'generic' entry.
      
      Our originally proposed version [1] completely disabled generic protocol
      tracking, but Jozsef suggests to not track protocols for which a more
      suitable helper is available, hence we now mitigate the issue for in
      tree known ct protocol helpers only, so that at least NAT and direction
      information will still be preserved for others.
      
       [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
      
      Joint work with Daniel Borkmann.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      [bwh: Backported to 2.6.32: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8e421332
    • Ben Hutchings's avatar
      splice: Apply generic position and size checks to each write · 7e6536a2
      Ben Hutchings authored
      We need to check the position and size of file writes against various
      limits, using generic_write_check().  This was not being done for
      the splice write path.  It was fixed upstream by commit 8d020765
      ("->splice_write() via ->write_iter()") but we can't apply that.
      
      CVE-2014-7822
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7e6536a2
    • Robert Baldyga's avatar
      serial: samsung: wait for transfer completion before clock disable · 17f35338
      Robert Baldyga authored
      This patch adds waiting until transmit buffer and shifter will be empty
      before clock disabling.
      
      Without this fix it's possible to have clock disabled while data was
      not transmited yet, which causes unproper state of TX line and problems
      in following data transfers.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRobert Baldyga <r.baldyga@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      (cherry picked from commit 1ff383a4)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      17f35338
    • Shai Fultheim's avatar
      x86: Conditionally update time when ack-ing pending irqs · d7522130
      Shai Fultheim authored
      commit 42fa4250 upstream.
      
      On virtual environments, apic_read could take a long time. As a
      result, under certain conditions the ack pending loop may exit
      without any queued irqs left, but after more than one second. A
      warning will be printed needlessly in this case.
      
      If the loop is about to exit regardless of max_loops, don't
      update it.
      Signed-off-by: default avatarShai Fultheim <shai@scalemp.com>
      [ rebased and reworded the commit message]
      Signed-off-by: default avatarIdo Yariv <ido@wizery.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit c9f1417b)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d7522130
    • Andy Lutomirski's avatar
      x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization · c069a4f2
      Andy Lutomirski authored
      commit 956421fb upstream.
      
      'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
      the related state make sense for 'ret_from_sys_call'.  This is
      entirely the wrong check.  TS_COMPAT would make a little more
      sense, but there's really no point in keeping this optimization
      at all.
      
      This fixes a return to the wrong user CS if we came from int
      0x80 in a 64-bit task.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
      [ Backported from tip:x86/asm. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 159891c0)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c069a4f2
    • Borislav Petkov's avatar
      x86, cpu, amd: Add workaround for family 16h, erratum 793 · ebcbe139
      Borislav Petkov authored
      commit 3b564968 upstream
      
      This adds the workaround for erratum 793 as a precaution in case not
      every BIOS implements it.  This addresses CVE-2013-6885.
      
      Erratum text:
      
      [Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
      document 51810 Rev. 3.04 November 2013]
      
      793 Specific Combination of Writes to Write Combined Memory Types and
      Locked Instructions May Cause Core Hang
      
      Description
      
      Under a highly specific and detailed set of internal timing
      conditions, a locked instruction may trigger a timing sequence whereby
      the write to a write combined memory type is not flushed, causing the
      locked instruction to stall indefinitely.
      
      Potential Effect on System
      
      Processor core hang.
      
      Suggested Workaround
      
      BIOS should set MSR
      C001_1020[15] = 1b.
      
      Fix Planned
      
      No fix planned
      
      [ hpa: updated description, fixed typo in MSR name ]
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnicTested-by: default avatarAravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      [bwh: Backported to 3.2:
       - Adjust filename
       - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to
         avoid crashing on KVM.  This was fixed upstream by commit 8f86a737
         ("x86, AMD: Convert to the new bit access MSR accessors") but that's too
         much trouble to backport.  Here we must use {rd,wr}msrl_amd_safe().]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ebcbe139
    • Hector Marco-Gisbert's avatar
      ASLR: fix stack randomization on 64-bit systems · ba455d81
      Hector Marco-Gisbert authored
      commit 4e7c22d4 upstream
      
      The issue is that the stack for processes is not properly randomized on 64 bit
      architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file "fs/binfmt_elf.c":
      
      static unsigned long randomize_stack_top(unsigned long stack_top)
      {
               unsigned int random_variable = 0;
      
               if ((current->flags & PF_RANDOMIZE) &&
                       !(current->personality & ADDR_NO_RANDOMIZE)) {
                       random_variable = get_random_int() & STACK_RND_MASK;
                       random_variable <<= PAGE_SHIFT;
               }
               return PAGE_ALIGN(stack_top) + random_variable;
               return PAGE_ALIGN(stack_top) - random_variable;
      }
      
      Note that, it declares the "random_variable" variable as "unsigned int". Since
      the result of the shifting operation between STACK_RND_MASK (which is
      0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):
      
      random_variable <<= PAGE_SHIFT;
      
      then the two leftmost bits are dropped when storing the result in the
      "random_variable". This variable shall be at least 34 bits long to hold the
      (22+12) result.
      
      These two dropped bits have an impact on the entropy of process stack.
      Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One
      fourth of expected entropy).
      
      This patch restores back the entropy by correcting the types involved in the
      operations in the functions randomize_stack_top() and stack_maxrandom_size().
      
      The successful fix can be tested with:
      $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
      7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
      7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
      7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
      7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
      ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff, rather
      than always being 7fff.
      
      CVE-2015-1593
      Signed-off-by: default avatarHector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: default avatarIsmael Ripoll <iripoll@upv.es>
      [kees: rebase, fix 80 char, clean up commit message, add test example, cve]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ba455d81
    • Andy Lutomirski's avatar
      x86_64, vdso: Fix the vdso address randomization algorithm · 7f36f1ac
      Andy Lutomirski authored
      commit 394f56fe upstream
      
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the mean time, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
      
      As an easy test, doing this:
      
      for i in `seq 10000`
        do grep -P vdso /proc/self/maps |cut -d- -f1
      done |sort |uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      [bwh: Backported to 2.6.32:
       - The whole file is only built for x86_64; adjust context and comment for this
       - We don't have align_vdso_addr()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7f36f1ac
    • Andy Lutomirski's avatar
      x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit · b23c0b06
      Andy Lutomirski authored
      commit 29fa6825 upstream
      
      paravirt_enabled has the following effects:
      
       - Disables the F00F bug workaround warning.  There is no F00F bug
         workaround any more because Linux's standard IDT handling already
         works around the F00F bug, but the warning still exists.  This
         is only cosmetic, and, in any event, there is no such thing as
         KVM on a CPU with the F00F bug.
      
       - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
         there should be no APM BIOS anyway.
      
       - Disables tboot.  I think that the tboot code should check the
         CPUID hypervisor bit directly if it matters.
      
       - paravirt_enabled disables espfix32.  espfix32 should *not* be
         disabled under KVM paravirt.
      
      The last point is the purpose of this patch.  It fixes a leak of the
      high 16 bits of the kernel stack address on 32-bit KVM paravirt
      guests.  Fixes CVE-2014-8134.
      Suggested-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      [bwh: Backported to 2.6.32: adjust indentation, context]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b23c0b06
    • Andy Lutomirski's avatar
      x86/tls: Don't validate lm in set_thread_area() after all · c52fdba6
      Andy Lutomirski authored
      commit 3fb2f423 upstream.
      
      It turns out that there's a lurking ABI issue.  GCC, when
      compiling this in a 32-bit program:
      
      struct user_desc desc = {
      	.entry_number    = idx,
      	.base_addr       = base,
      	.limit           = 0xfffff,
      	.seg_32bit       = 1,
      	.contents        = 0, /* Data, grow-up */
      	.read_exec_only  = 0,
      	.limit_in_pages  = 1,
      	.seg_not_present = 0,
      	.useable         = 0,
      };
      
      will leave .lm uninitialized.  This means that anything in the
      kernel that reads user_desc.lm for 32-bit tasks is unreliable.
      
      Revert the .lm check in set_thread_area().  The value never did
      anything in the first place.
      
      Fixes: 0e58af4e ("x86/tls: Disallow unusual TLS segments")
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit c759a579)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c52fdba6
    • Andy Lutomirski's avatar
      x86/tls: Disallow unusual TLS segments · 18cb16aa
      Andy Lutomirski authored
      commit 0e58af4e upstream.
      
      Users have no business installing custom code segments into the
      GDT, and segments that are not present but are otherwise valid
      are a historical source of interesting attacks.
      
      For completeness, block attempts to set the L bit.  (Prior to
      this patch, the L bit would have been silently dropped.)
      
      This is an ABI break.  I've checked glibc, musl, and Wine, and
      none of them look like they'll have any trouble.
      
      Note to stable maintainers: this is a hardening patch that fixes
      no known bugs.  Given the possibility of ABI issues, this
      probably shouldn't be backported quickly.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit fbc3c534)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      18cb16aa
    • Andy Lutomirski's avatar
      x86, tls: Interpret an all-zero struct user_desc as "no segment" · 1f50d3c7
      Andy Lutomirski authored
      commit 3669ef9f upstream.
      
      The Witcher 2 did something like this to allocate a TLS segment index:
      
              struct user_desc u_info;
              bzero(&u_info, sizeof(u_info));
              u_info.entry_number = (uint32_t)-1;
      
              syscall(SYS_set_thread_area, &u_info);
      
      Strictly speaking, this code was never correct.  It should have set
      read_exec_only and seg_not_present to 1 to indicate that it wanted
      to find a free slot without putting anything there, or it should
      have put something sensible in the TLS slot if it wanted to allocate
      a TLS entry for real.  The actual effect of this code was to
      allocate a bogus segment that could be used to exploit espfix.
      
      The set_thread_area hardening patches changed the behavior, causing
      set_thread_area to return -EINVAL and crashing the game.
      
      This changes set_thread_area to interpret this as a request to find
      a free slot and to leave it empty, which isn't *quite* what the game
      expects but should be close enough to keep it working.  In
      particular, using the code above to allocate two segments will
      allocate the same segment both times.
      
      According to FrostbittenKing on Github, this fixes The Witcher 2.
      
      If this somehow still causes problems, we could instead allocate
      a limit==0 32-bit data segment, but that seems rather ugly to me.
      
      Fixes: 41bdc785 x86/tls: Validate TLS entries to protect espfix
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3175b4cb)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1f50d3c7
    • Andy Lutomirski's avatar
      x86, tls, ldt: Stop checking lm in LDT_empty · 598b6280
      Andy Lutomirski authored
      commit e30ab185 upstream.
      
      32-bit programs don't have an lm bit in their ABI, so they can't
      reliably cause LDT_empty to return true without resorting to memset.
      They shouldn't need to do this.
      
      This should fix a longstanding, if minor, issue in all 64-bit kernels
      as well as a potential regression in the TLS hardening code.
      
      Fixes: 41bdc785 x86/tls: Validate TLS entries to protect espfix
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit f62570cb)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      598b6280
    • Andy Lutomirski's avatar
      x86/tls: Validate TLS entries to protect espfix · 85e01300
      Andy Lutomirski authored
      commit 41bdc785 upstream
      
      Installing a 16-bit RW data segment into the GDT defeats espfix.
      AFAICT this will not affect glibc, Wine, or dosemu at all.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      85e01300
    • Andy Lutomirski's avatar
      x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs · f0d8cc6f
      Andy Lutomirski authored
      commit 7ddc6a21 upstream.
      
      These functions can be executed on the int3 stack, so kprobes
      are dangerous. Tracing is probably a bad idea, too.
      
      Fixes: b645af2d ("x86_64, traps: Rework bad_iret")
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2:
       - Use __kprobes instead of NOKPROBE_SYMBOL()
       - Don't use __visible]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 8ea4c465)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f0d8cc6f
  2. 13 Dec, 2014 10 commits