1. 23 Nov, 2014 22 commits
    • Andrey Ryabinin's avatar
      net: sendmsg: fix NULL pointer dereference · cf903573
      Andrey Ryabinin authored
      [ Upstream commit 40eea803 ]
      
      Sasha's report:
      	> While fuzzing with trinity inside a KVM tools guest running the latest -next
      	> kernel with the KASAN patchset, I've stumbled on the following spew:
      	>
      	> [ 4448.949424] ==================================================================
      	> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
      	> [ 4448.952988] Read of size 2 by thread T19638:
      	> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
      	> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
      	> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
      	> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
      	> [ 4448.961266] Call Trace:
      	> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
      	> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
      	> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
      	> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
      	> [ 4448.970103] sock_sendmsg (net/socket.c:654)
      	> [ 4448.971584] ? might_fault (mm/memory.c:3741)
      	> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
      	> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
      	> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
      	> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
      	> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
      	> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
      	> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
      	> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
      	> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
      	> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
      	> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
      	> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
      	> [ 4448.988929] ==================================================================
      
      This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
      
      After this report there was no usual "Unable to handle kernel NULL pointer dereference"
      and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.
      
      This bug was introduced in f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic).
      Commit message states that:
      	"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      	 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      	 affect sendto as it would bail out earlier while trying to copy-in the
      	 address."
      But in fact this affects sendto when address 0 is mapped and contains
      socket address structure in it. In such case copy-in address will succeed,
      verify_iovec() function will successfully exit with msg->msg_namelen > 0
      and msg->msg_name == NULL.
      
      This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cf903573
    • Daniel Borkmann's avatar
      net: sctp: inherit auth_capable on INIT collisions · ebec0c65
      Daniel Borkmann authored
      [ Upstream commit 1be9a950 ]
      
      Jason reported an oops caused by SCTP on his ARM machine with
      SCTP authentication enabled:
      
      Internal error: Oops: 17 [#1] ARM
      CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
      task: c6eefa40 ti: c6f52000 task.ti: c6f52000
      PC is at sctp_auth_calculate_hmac+0xc4/0x10c
      LR is at sg_init_table+0x20/0x38
      pc : [<c024bb80>]    lr : [<c00f32dc>]    psr: 40000013
      sp : c6f538e8  ip : 00000000  fp : c6f53924
      r10: c6f50d80  r9 : 00000000  r8 : 00010000
      r7 : 00000000  r6 : c7be4000  r5 : 00000000  r4 : c6f56254
      r3 : c00c8170  r2 : 00000001  r1 : 00000008  r0 : c6f1e660
      Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0005397f  Table: 06f28000  DAC: 00000015
      Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
      Stack: (0xc6f538e8 to 0xc6f54000)
      [...]
      Backtrace:
      [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
      [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
      [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
      [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
      [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
      [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
      [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
      [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)
      
      While we already had various kind of bugs in that area
      ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
      we/peer is AUTH capable") and b14878cc ("net: sctp: cache
      auth_enable per endpoint"), this one is a bit of a different
      kind.
      
      Giving a bit more background on why SCTP authentication is
      needed can be found in RFC4895:
      
        SCTP uses 32-bit verification tags to protect itself against
        blind attackers. These values are not changed during the
        lifetime of an SCTP association.
      
        Looking at new SCTP extensions, there is the need to have a
        method of proving that an SCTP chunk(s) was really sent by
        the original peer that started the association and not by a
        malicious attacker.
      
      To cause this bug, we're triggering an INIT collision between
      peers; normal SCTP handshake where both sides intent to
      authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
      parameters that are being negotiated among peers:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
      
      RFC4895 says that each endpoint therefore knows its own random
      number and the peer's random number *after* the association
      has been established. The local and peer's random number along
      with the shared key are then part of the secret used for
      calculating the HMAC in the AUTH chunk.
      
      Now, in our scenario, we have 2 threads with 1 non-blocking
      SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
      and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
      sctp_bindx(3), listen(2) and connect(2) against each other,
      thus the handshake looks similar to this, e.g.:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
        -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->
        ...
      
      Since such collisions can also happen with verification tags,
      the RFC4895 for AUTH rather vaguely says under section 6.1:
      
        In case of INIT collision, the rules governing the handling
        of this Random Number follow the same pattern as those for
        the Verification Tag, as explained in Section 5.2.4 of
        RFC 2960 [5]. Therefore, each endpoint knows its own Random
        Number and the peer's Random Number after the association
        has been established.
      
      In RFC2960, section 5.2.4, we're eventually hitting Action B:
      
        B) In this case, both sides may be attempting to start an
           association at about the same time but the peer endpoint
           started its INIT after responding to the local endpoint's
           INIT. Thus it may have picked a new Verification Tag not
           being aware of the previous Tag it had sent this endpoint.
           The endpoint should stay in or enter the ESTABLISHED
           state but it MUST update its peer's Verification Tag from
           the State Cookie, stop any init or cookie timers that may
           running and send a COOKIE ACK.
      
      In other words, the handling of the Random parameter is the
      same as behavior for the Verification Tag as described in
      Action B of section 5.2.4.
      
      Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
      case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
      side effect interpreter, and in fact it properly copies over
      peer_{random, hmacs, chunks} parameters from the newly created
      association to update the existing one.
      
      Also, the old asoc_shared_key is being released and based on
      the new params, sctp_auth_asoc_init_active_key() updated.
      However, the issue observed in this case is that the previous
      asoc->peer.auth_capable was 0, and has *not* been updated, so
      that instead of creating a new secret, we're doing an early
      return from the function sctp_auth_asoc_init_active_key()
      leaving asoc->asoc_shared_key as NULL. However, we now have to
      authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).
      
      That in fact causes the server side when responding with ...
      
        <------------------ AUTH; COOKIE-ACK -----------------
      
      ... to trigger a NULL pointer dereference, since in
      sctp_packet_transmit(), it discovers that an AUTH chunk is
      being queued for xmit, and thus it calls sctp_auth_calculate_hmac().
      
      Since the asoc->active_key_id is still inherited from the
      endpoint, and the same as encoded into the chunk, it uses
      asoc->asoc_shared_key, which is still NULL, as an asoc_key
      and dereferences it in ...
      
        crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)
      
      ... causing an oops. All this happens because sctp_make_cookie_ack()
      called with the *new* association has the peer.auth_capable=1
      and therefore marks the chunk with auth=1 after checking
      sctp_auth_send_cid(), but it is *actually* sent later on over
      the then *updated* association's transport that didn't initialize
      its shared key due to peer.auth_capable=0. Since control chunks
      in that case are not sent by the temporary association which
      are scheduled for deletion, they are issued for xmit via
      SCTP_CMD_REPLY in the interpreter with the context of the
      *updated* association. peer.auth_capable was 0 in the updated
      association (which went from COOKIE_WAIT into ESTABLISHED state),
      since all previous processing that performed sctp_process_init()
      was being done on temporary associations, that we eventually
      throw away each time.
      
      The correct fix is to update to the new peer.auth_capable
      value as well in the collision case via sctp_assoc_update(),
      so that in case the collision migrated from 0 -> 1,
      sctp_auth_asoc_init_active_key() can properly recalculate
      the secret. This therefore fixes the observed server panic.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Reported-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Tested-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ebec0c65
    • Eric Dumazet's avatar
      ipv4: fix buffer overflow in ip_options_compile() · 696a6366
      Eric Dumazet authored
      [ Upstream commit 10ec9472 ]
      
      There is a benign buffer overflow in ip_options_compile spotted by
      AddressSanitizer[1] :
      
      Its benign because we always can access one extra byte in skb->head
      (because header is followed by struct skb_shared_info), and in this case
      this byte is not even used.
      
      [28504.910798] ==================================================================
      [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
      [28504.913170] Read of size 1 by thread T15843:
      [28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
      [28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
      [28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.924722]
      [28504.925106] Allocated by thread T15843:
      [28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
      [28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.934018]
      [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
      [28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
      [28504.937144]
      [28504.937474] Memory state around the buggy address:
      [28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.945573]                         ^
      [28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.099804] Legend:
      [28505.100269]  f - 8 freed bytes
      [28505.100884]  r - 8 redzone bytes
      [28505.101649]  . - 8 allocated bytes
      [28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
      [28505.103637] ==================================================================
      
      [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernelSigned-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      696a6366
    • Sowmini Varadhan's avatar
      sunvnet: clean up objects created in vnet_new() on vnet_exit() · 19a0d61e
      Sowmini Varadhan authored
      [ Upstream commit a4b70a07 ]
      
      Nothing cleans up the objects created by
      vnet_new(), they are completely leaked.
      
      vnet_exit(), after doing the vio_unregister_driver() to clean
      up ports, should call a helper function that iterates over vnet_list
      and cleans up those objects. This includes unregister_netdevice()
      as well as free_netdev().
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Reviewed-by: default avatarKarl Volz <karl.volz@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      19a0d61e
    • Daniel Borkmann's avatar
      net: sctp: fix information leaks in ulpevent layer · ef94866e
      Daniel Borkmann authored
      [ Upstream commit 8f2e5ae4 ]
      
      While working on some other SCTP code, I noticed that some
      structures shared with user space are leaking uninitialized
      stack or heap buffer. In particular, struct sctp_sndrcvinfo
      has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
      remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
      putting this into cmsg. But also struct sctp_remote_error
      contains a 2 bytes hole that we don't fill but place into a skb
      through skb_copy_expand() via sctp_ulpevent_make_remote_error().
      
      Both structures are defined by the IETF in RFC6458:
      
      * Section 5.3.2. SCTP Header Information Structure:
      
        The sctp_sndrcvinfo structure is defined below:
      
        struct sctp_sndrcvinfo {
          uint16_t sinfo_stream;
          uint16_t sinfo_ssn;
          uint16_t sinfo_flags;
          <-- 2 bytes hole  -->
          uint32_t sinfo_ppid;
          uint32_t sinfo_context;
          uint32_t sinfo_timetolive;
          uint32_t sinfo_tsn;
          uint32_t sinfo_cumtsn;
          sctp_assoc_t sinfo_assoc_id;
        };
      
      * 6.1.3. SCTP_REMOTE_ERROR:
      
        A remote peer may send an Operation Error message to its peer.
        This message indicates a variety of error conditions on an
        association. The entire ERROR chunk as it appears on the wire
        is included in an SCTP_REMOTE_ERROR event. Please refer to the
        SCTP specification [RFC4960] and any extensions for a list of
        possible error formats. An SCTP error notification has the
        following format:
      
        struct sctp_remote_error {
          uint16_t sre_type;
          uint16_t sre_flags;
          uint32_t sre_length;
          uint16_t sre_error;
          <-- 2 bytes hole  -->
          sctp_assoc_t sre_assoc_id;
          uint8_t  sre_data[];
        };
      
      Fix this by setting both to 0 before filling them out. We also
      have other structures shared between user and kernel space in
      SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
      copy that buffer over from user space first and thus don't need
      to care about it in that cases.
      
      While at it, we can also remove lengthy comments copied from
      the draft, instead, we update the comment with the correct RFC
      number where one can look it up.
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ef94866e
    • Andrey Utkin's avatar
      appletalk: Fix socket referencing in skb · a8a0b4eb
      Andrey Utkin authored
      [ Upstream commit 36beddc2 ]
      
      Setting just skb->sk without taking its reference and setting a
      destructor is invalid. However, in the places where this was done, skb
      is used in a way not requiring skb->sk setting. So dropping the setting
      of skb->sk.
      Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441Reported-by: default avatarEd Martin <edman007@edman007.com>
      Signed-off-by: default avatarAndrey Utkin <andrey.krieger.utkin@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a8a0b4eb
    • dingtianhong's avatar
      igmp: fix the problem when mc leave group · 540e9ab0
      dingtianhong authored
      [ Upstream commit 52ad353a ]
      
      The problem was triggered by these steps:
      
      1) create socket, bind and then setsockopt for add mc group.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
         setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
      
      2) drop the mc group for this socket.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
         setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
      
      3) and then drop the socket, I found the mc group was still used by the dev:
      
         netstat -g
      
         Interface       RefCnt Group
         --------------- ------ ---------------------
         eth2		   1	  255.0.0.37
      
      Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
      to be released for the netdev when drop the socket, but this process was broken when
      route default is NULL, the reason is that:
      
      The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
      is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
      then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
      the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
      released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
      when dropping the socket, the mc_list is NULL and the dev still keep this group.
      
      v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
      	so I add the checking for the in_dev before polling the mc_list, make sure when
      	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
      	The problem would never happened again.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      540e9ab0
    • Neal Cardwell's avatar
      tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb · 87e0f9d6
      Neal Cardwell authored
      [ Upstream commit 2cd0d743 ]
      
      If there is an MSS change (or misbehaving receiver) that causes a SACK
      to arrive that covers the end of an skb but is less than one MSS, then
      tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
      the skb ("Round if necessary..."), then chopping all bytes off the skb
      and creating a zero-byte skb in the write queue.
      
      This was visible now because the recently simplified TLP logic in
      bef1909e ("tcp: fixing TLP's FIN recovery") could find that 0-byte
      skb at the end of the write queue, and now that we do not check that
      skb's length we could send it as a TLP probe.
      
      Consider the following example scenario:
      
       mss: 1000
       skb: seq: 0 end_seq: 4000  len: 4000
       SACK: start_seq: 3999 end_seq: 4000
      
      The tcp_match_skb_to_sack() code will compute:
      
       in_sack = false
       pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
       new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
       new_len += mss = 4000
      
      Previously we would find the new_len > skb->len check failing, so we
      would fall through and set pkt_len = new_len = 4000 and chop off
      pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
      afterward in the write queue.
      
      With this new commit, we notice that the new new_len >= skb->len check
      succeeds, so that we return without trying to fragment.
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      87e0f9d6
    • Mikulas Patocka's avatar
      sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue · c7008cf7
      Mikulas Patocka authored
      Hi
      
      This is backport of commit fd1232b2. It is
      suitable for all stable branches up to and including 3.14.*
      
      Mikulas
      
      commit fd1232b2
      Author: Mikulas Patocka <mpatocka@redhat.com>
      Date:   Tue Apr 8 21:52:05 2014 -0400
      
          sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue
      
          This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
          returns QUEUE FULL status.
      
          When the controller encounters an error (including QUEUE FULL or BUSY
          status), it aborts all not yet submitted requests in the function
          sym_dequeue_from_squeue.
      
          This function aborts them with DID_SOFT_ERROR.
      
          If the disk has full tag queue, the request that caused the overflow is
          aborted with QUEUE FULL status (and the scsi midlayer properly retries
          it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
          the following requests with DID_SOFT_ERROR --- for them, the midlayer
          does just a few retries and then signals the error up to sd.
      
          The result is that disk returning QUEUE FULL causes request failures.
      
          The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
          (rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
          64 tags, but under some access patterns it return QUEUE FULL when there
          are less than 64 pending tags.  The SCSI specification allows returning
          QUEUE FULL anytime and it is up to the host to retry.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
          Cc: Matthew Wilcox <matthew@wil.cx>
          Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c7008cf7
    • Tejun Heo's avatar
      ptrace,x86: force IRET path after a ptrace_stop() · a3b45048
      Tejun Heo authored
      [ Upstream commit b9cd18de ]
      
      The 'sysret' fastpath does not correctly restore even all regular
      registers, much less any segment registers or reflags values.  That is
      very much part of why it's faster than 'iret'.
      
      Normally that isn't a problem, because the normal ptrace() interface
      catches the process using the signal handler infrastructure, which
      always returns with an iret.
      
      However, some paths can get caught using ptrace_event() instead of the
      signal path, and for those we need to make sure that we aren't going to
      return to user space using 'sysret'.  Otherwise the modifications that
      may have been done to the register set by the tracer wouldn't
      necessarily take effect.
      
      Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
      arch_ptrace_stop_needed() which is invoked from ptrace_stop().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [wt: fixes CVE-2014-4699]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a3b45048
    • Lars-Peter Clausen's avatar
      ALSA: control: Protect user controls against concurrent access · 883f30e7
      Lars-Peter Clausen authored
      [ Upstream commit 07f4d9d7 ]
      
      The user-control put and get handlers as well as the tlv do not protect against
      concurrent access from multiple threads. Since the state of the control is not
      updated atomically it is possible that either two write operations or a write
      and a read operation race against each other. Both can lead to arbitrary memory
      disclosure. This patch introduces a new lock that protects user-controls from
      concurrent access. Since applications typically access controls sequentially
      than in parallel a single lock per card should be fine.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [wt: fixes CVE-2014-4652]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      883f30e7
    • Mathias Krause's avatar
      filter: prevent nla extensions to peek beyond the end of the message · cd684d88
      Mathias Krause authored
      [ Upstream commit 05ab8f26 ]
      
      The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
      for a minimal message length before testing the supplied offset to be
      within the bounds of the message. This allows the subtraction of the nla
      header to underflow and therefore -- as the data type is unsigned --
      allowing far to big offset and length values for the search of the
      netlink attribute.
      
      The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
      also wrong. It has the minuend and subtrahend mixed up, therefore
      calculates a huge length value, allowing to overrun the end of the
      message while looking for the netlink attribute.
      
      The following three BPF snippets will trigger the bugs when attached to
      a UNIX datagram socket and parsing a message with length 1, 2 or 3.
      
       ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
       | ld	#0x87654321
       | ldx	#42
       | ld	#nla
       | ret	a
       `---
      
       ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
       | ld	#0x87654321
       | ldx	#42
       | ld	#nlan
       | ret	a
       `---
      
       ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
       | ; (needs a fake netlink header at offset 0)
       | ld	#0
       | ldx	#42
       | ld	#nlan
       | ret	a
       `---
      
      Fix the first issue by ensuring the message length fulfills the minimal
      size constrains of a nla header. Fix the second bug by getting the math
      for the remainder calculation right.
      
      Fixes: 4738c1db ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
      Fixes: d214c753 ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [geissert: backport to 2.6.32]
      [wt: fixes CVE-2014-3144]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      cd684d88
    • Vlastimil Babka's avatar
      mm: try_to_unmap_cluster() should lock_page() before mlocking · 767270fc
      Vlastimil Babka authored
      commit 68f662b89de39aae5f43ecd3fbc6ea120d0a47cf upstream
      
      A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
      fuzzing with trinity.  The call site try_to_unmap_cluster() does not lock
      the pages other than its check_page parameter (which is already locked).
      
      The BUG_ON in mlock_vma_page() is not documented and its purpose is
      somewhat unclear, but apparently it serializes against page migration,
      which could otherwise fail to transfer the PG_mlocked flag.  This would
      not be fatal, as the page would be eventually encountered again, but
      NR_MLOCK accounting would become distorted nevertheless.  This patch adds
      a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
      effect.
      
      The call site try_to_unmap_cluster() is fixed so that for page !=
      check_page, trylock_page() is attempted (to avoid possible deadlocks as we
      already have check_page locked) and mlock_vma_page() is performed only
      upon success.  If the page lock cannot be obtained, the page is left
      without PG_mlocked, which is again not a problem in the whole unevictable
      memory design.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarBob Liu <bob.liu@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (back ported from commit 57e68e9c)
      CVE-2014-3122
      BugLink: http://bugs.launchpad.net/bugs/1316268Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Acked-by: default avatarAndy Whitcroft <andy.whitcroft@canonical.com>
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      767270fc
    • Xufeng Zhang's avatar
      sctp: Fix sk_ack_backlog wrap-around problem · 24866e75
      Xufeng Zhang authored
      Consider the scenario:
      For a TCP-style socket, while processing the COOKIE_ECHO chunk in
      sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check,
      a new association would be created in sctp_unpack_cookie(), but afterwards,
      some processing maybe failed, and sctp_association_free() will be called to
      free the previously allocated association, in sctp_association_free(),
      sk_ack_backlog value is decremented for this socket, since the initial
      value for sk_ack_backlog is 0, after the decrement, it will be 65535,
      a wrap-around problem happens, and if we want to establish new associations
      afterward in the same socket, ABORT would be triggered since sctp deem the
      accept queue as full.
      Fix this issue by only decrementing sk_ack_backlog for associations in
      the endpoint's list.
      Fix-suggested-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarXufeng Zhang <xufeng.zhang@windriver.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit d3217b15)
      [wt: fixes CVE-2014-4667]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      24866e75
    • Lars-Peter Clausen's avatar
      ALSA: control: Handle numid overflow · 19961572
      Lars-Peter Clausen authored
      Each control gets automatically assigned its numids when the control is created.
      The allocation is done by incrementing the numid by the amount of allocated
      numids per allocation. This means that excessive creation and destruction of
      controls (e.g. via SNDRV_CTL_IOCTL_ELEM_ADD/REMOVE) can cause the id to
      eventually overflow. Currently when this happens for the control that caused the
      overflow kctl->id.numid + kctl->count will also over flow causing it to be
      smaller than kctl->id.numid. Most of the code assumes that this is something
      that can not happen, so we need to make sure that it won't happen
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      (cherry picked from commit ac902c11)
      [wt: part 2 of CVE-2014-4656]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      19961572
    • Lars-Peter Clausen's avatar
      ALSA: control: Make sure that id->index does not overflow · 365c843b
      Lars-Peter Clausen authored
      The ALSA control code expects that the range of assigned indices to a control is
      continuous and does not overflow. Currently there are no checks to enforce this.
      If a control with a overflowing index range is created that control becomes
      effectively inaccessible and unremovable since snd_ctl_find_id() will not be
      able to find it. This patch adds a check that makes sure that controls with a
      overflowing index range can not be created.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      (cherry picked from commit 883a1d49)
      [wt: part 1 of CVE-2014-4656 fix]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      365c843b
    • Al Viro's avatar
      fix autofs/afs/etc. magic mountpoint breakage · 9ff558c6
      Al Viro authored
      We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT)
      if /mnt/tmp is an autofs direct mount.  The reason is that nd.last_type
      is bogus here; we want LAST_BIND for everything of that kind and we
      get LAST_NORM left over from finding parent directory.
      
      So make sure that it *is* set properly; set to LAST_BIND before
      doing ->follow_link() - for normal symlinks it will be changed
      by __vfs_follow_link() and everything else needs it set that way.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      (cherry picked from commit 86acdca1)
      [wt: fixes CVE-2014-0203]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9ff558c6
    • Markos Chandras's avatar
      MIPS: asm: thread_info: Add _TIF_SECCOMP flag · e24be1e3
      Markos Chandras authored
      Add _TIF_SECCOMP flag to _TIF_WORK_SYSCALL_ENTRY to indicate
      that the system call needs to be checked against a seccomp filter.
      Signed-off-by: default avatarMarkos Chandras <markos.chandras@imgtec.com>
      Reviewed-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Reviewed-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/6405/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      (cherry picked from commit 137f7df8)
      [wt: fixes CVE-2014-4157 - no _TIF_NOHZ nor _TIF_SYSCALL_TRACEPOINT in 2.6.32]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e24be1e3
    • Ralf Baechle's avatar
      MIPS: Cleanup flags in syscall flags handlers. · 10283345
      Ralf Baechle authored
      This will simplify further modifications.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      (cherry picked from commit e7f3b48a)
      [wt: this patch is only there to ease integration of next one]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      10283345
    • Stefan Bader's avatar
      x86_32, entry: Clean up sysenter_badsys declaration · 74f03fdd
      Stefan Bader authored
      commit 554086d8 "x86_32, entry: Do syscall exit work on badsys
      (CVE-2014-4508)" introduced a new jump label (sysenter_badsys) but
      somehow the END statements seem to have gone wrong (at least it
      feels that way to me).
      This does not seem to be a fatal problem, but just for the sake
      of symmetry, change the second syscall_badsys to sysenter_badsys.
      Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
      Link: http://lkml.kernel.org/r/1408093066-31021-1-git-send-email-stefan.bader@canonical.comAcked-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      (cherry picked from commit fb21b84e)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      74f03fdd
    • Sven Wegener's avatar
      x86_32, entry: Store badsys error code in %eax · d0790236
      Sven Wegener authored
      Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys
      (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
      code, resulting in syscall() not returning proper errors for undefined
      syscalls on CPUs supporting the sysenter feature.
      
      The following code:
      
      > int result = syscall(666);
      > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
      
      results in:
      
      > result=666 errno=0 error=Success
      
      Obviously, the syscall return value is the called syscall number, but it
      should have been an ENOSYS error. When run under ptrace it behaves
      correctly, which makes it hard to debug in the wild:
      
      > result=-1 errno=38 error=Function not implemented
      
      The %eax register is the return value register. For debugging via ptrace
      the syscall entry code stores the complete register context on the
      stack. The badsys handlers only store the ENOSYS error code in the
      ptrace register set and do not set %eax like a regular syscall handler
      would. The old resume_userspace call chain contains code that clobbers
      %eax and it restores %eax from the ptrace registers afterwards. The same
      goes for the ptrace-enabled call chain. When ptrace is not used, the
      syscall return value is the passed-in syscall number from the untouched
      %eax register.
      
      Use %eax as the return value register in syscall_badsys and
      sysenter_badsys, like a real syscall handler does, and have the caller
      push the value onto the stack for ptrace access.
      Signed-off-by: default avatarSven Wegener <sven.wegener@stealer.net>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.netReviewed-and-tested-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      (cherry picked from commit 8142b215)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d0790236
    • Andy Lutomirski's avatar
      x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508) · 561ca83b
      Andy Lutomirski authored
      The bad syscall nr paths are their own incomprehensible route
      through the entry control flow.  Rearrange them to work just like
      syscalls that return -ENOSYS.
      
      This fixes an OOPS in the audit code when fast-path auditing is
      enabled and sysenter gets a bad syscall nr (CVE-2014-4508).
      
      This has probably been broken since Linux 2.6.27:
      af0575bb i386 syscall audit fast-path
      
      Cc: stable@vger.kernel.org
      Cc: Roland McGrath <roland@redhat.com>
      Reported-by: default avatarToralf Frster <toralf.foerster@gmx.de>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.netSigned-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      (cherry picked from commit 554086d8)
      [WT: this fix is incorrect and requires the two following patches]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      561ca83b
  2. 18 Jun, 2014 10 commits
    • Willy Tarreau's avatar
      Linux 2.6.32.63 · 02560982
      Willy Tarreau authored
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      02560982
    • Willy Tarreau's avatar
      net: fix regression introduced in 2.6.32.62 by sysctl fixes · b90422de
      Willy Tarreau authored
      Commits b7c9e4ee ("sysctl net: Keep tcp_syn_retries inside the boundary")
      and eedcafdc ("net: check net.core.somaxconn sysctl values") were missing
      a .strategy entry which is still required in 2.6.32. Because of this, the
      Ubuntu kernel team has faced kernel dumps during their testing.
      
      Tyler Hicks and Luis Henriques proposed this patch to fix the issue,
      which properly sets .strategy as needed in 2.6.32.
      Reported-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Cc: tyler.hicks@canonical.com
      Cc: Michal Tesar <mtesar@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b90422de
    • Andy Lutomirski's avatar
      auditsc: audit_krule mask accesses need bounds checking · a986d9c3
      Andy Lutomirski authored
      Fixes an easy DoS and possible information disclosure.
      
      This does nothing about the broken state of x32 auditing.
      
      eparis: If the admin has enabled auditd and has specifically loaded
      audit rules.  This bug has been around since before git.  Wow...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit a3c54931)
      [wt: no audit_filter_inode_name(), applied to audit_filter_inodes() instead]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a986d9c3
    • Thomas Gleixner's avatar
      futex: Make lookup_pi_state more robust · 07bc7fb4
      Thomas Gleixner authored
      The current implementation of lookup_pi_state has ambigous handling of
      the TID value 0 in the user space futex.  We can get into the kernel
      even if the TID value is 0, because either there is a stale waiters bit
      or the owner died bit is set or we are called from the requeue_pi path
      or from user space just for fun.
      
      The current code avoids an explicit sanity check for pid = 0 in case
      that kernel internal state (waiters) are found for the user space
      address.  This can lead to state leakage and worse under some
      circumstances.
      
      Handle the cases explicit:
      
             Waiter | pi_state | pi->owner | uTID      | uODIED | ?
      
        [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
        [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid
      
        [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid
      
        [4]  Found  | Found    | NULL      | 0         | 1      | Valid
        [5]  Found  | Found    | NULL      | >0        | 1      | Invalid
      
        [6]  Found  | Found    | task      | 0         | 1      | Valid
      
        [7]  Found  | Found    | NULL      | Any       | 0      | Invalid
      
        [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
        [9]  Found  | Found    | task      | 0         | 0      | Invalid
        [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid
      
       [1] Indicates that the kernel can acquire the futex atomically. We
           came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
      
       [2] Valid, if TID does not belong to a kernel thread. If no matching
           thread is found then it indicates that the owner TID has died.
      
       [3] Invalid. The waiter is queued on a non PI futex
      
       [4] Valid state after exit_robust_list(), which sets the user space
           value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
      
       [5] The user space value got manipulated between exit_robust_list()
           and exit_pi_state_list()
      
       [6] Valid state after exit_pi_state_list() which sets the new owner in
           the pi_state but cannot access the user space value.
      
       [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
      
       [8] Owner and user space value match
      
       [9] There is no transient state which sets the user space TID to 0
           except exit_robust_list(), but this is indicated by the
           FUTEX_OWNER_DIED bit. See [4]
      
      [10] There is no transient state which leaves owner and user space
           TID out of sync.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 54a21788)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      07bc7fb4
    • Thomas Gleixner's avatar
      futex: Always cleanup owner tid in unlock_pi · 5622af0b
      Thomas Gleixner authored
      If the owner died bit is set at futex_unlock_pi, we currently do not
      cleanup the user space futex.  So the owner TID of the current owner
      (the unlocker) persists.  That's observable inconsistant state,
      especially when the ownership of the pi state got transferred.
      
      Clean it up unconditionally.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 13fbca4c)
      [wt: adjusted context - cmpxchg_futex_value_locked() takes 3 args]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5622af0b
    • Thomas Gleixner's avatar
      futex: Validate atomic acquisition in futex_lock_pi_atomic() · 523b5a51
      Thomas Gleixner authored
      We need to protect the atomic acquisition in the kernel against rogue
      user space which sets the user space futex to 0, so the kernel side
      acquisition succeeds while there is existing state in the kernel
      associated to the real owner.
      
      Verify whether the futex has waiters associated with kernel state.  If
      it has, return -EINVAL.  The state is corrupted already, so no point in
      cleaning it up.  Subsequent calls will fail as well.  Not our problem.
      
      [ tglx: Use futex_top_waiter() and explain why we do not need to try
        	restoring the already corrupted user space state. ]
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit b3eaa9fc)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      523b5a51
    • Thomas Gleixner's avatar
      futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in... · 61b24122
      Thomas Gleixner authored
      futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
      
      If uaddr == uaddr2, then we have broken the rule of only requeueing from
      a non-pi futex to a pi futex with this call.  If we attempt this, then
      dangling pointers may be left for rt_waiter resulting in an exploitable
      condition.
      
      This change brings futex_requeue() in line with futex_wait_requeue_pi()
      which performs the same check as per commit 6f7b0a2a ("futex: Forbid
      uaddr == uaddr2 in futex_wait_requeue_pi()")
      
      [ tglx: Compare the resulting keys as well, as uaddrs might be
        	different depending on the mapping ]
      
      Fixes CVE-2014-3153.
      
      Reported-by: Pinkie Pie
      Signed-off-by: default avatarWill Drewry <wad@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit e9c243a5)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      61b24122
    • Thomas Gleixner's avatar
      futex: Prevent attaching to kernel threads · 129af2db
      Thomas Gleixner authored
      We happily allow userspace to declare a random kernel thread to be the
      owner of a user space PI futex.
      
      Found while analysing the fallout of Dave Jones syscall fuzzer.
      
      We also should validate the thread group for private futexes and find
      some fast way to validate whether the "alleged" owner has RW access on
      the file which backs the SHM, but that's a separate issue.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Carlos ODonell <carlos@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      (cherry picked from commit f0d71b3d)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      129af2db
    • Thomas Gleixner's avatar
      futex: Add another early deadlock detection check · 81ce5f0c
      Thomas Gleixner authored
      Dave Jones trinity syscall fuzzer exposed an issue in the deadlock
      detection code of rtmutex:
        http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com
      
      That underlying issue has been fixed with a patch to the rtmutex code,
      but the futex code must not call into rtmutex in that case because
          - it can detect that issue early
          - it avoids a different and more complex fixup for backing out
      
      If the user space variable got manipulated to 0x80000000 which means
      no lock holder, but the waiters bit set and an active pi_state in the
      kernel is found we can figure out the recursive locking issue by
      looking at the pi_state owner. If that is the current task, then we
      can safely return -EDEADLK.
      
      The check should have been added in commit 59fa6245 (futex: Handle
      futex_pi OWNER_DIED take over correctly) already, but I did not see
      the above issue caused by user space manipulation back then.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Carlos ODonell <carlos@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      (cherry picked from commit 866293ee)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      81ce5f0c
    • Ben Hutchings's avatar
      ethtool: Report link-down while interface is down · d100b274
      Ben Hutchings authored
      While an interface is down, many implementations of
      ethtool_ops::get_link, including the default, ethtool_op_get_link(),
      will report the last link state seen while the interface was up.  In
      general the current physical link state is not available if the
      interface is down.
      
      Define ETHTOOL_GLINK to reflect whether the interface *and* any
      physical port have a working link, and consistently return 0 when the
      interface is down.
      Signed-off-by: default avatarBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit e596e6e4)
      Cc: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d100b274
  3. 19 May, 2014 8 commits
    • Willy Tarreau's avatar
      Linux 2.6.32.62 · 7f62be2f
      Willy Tarreau authored
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7f62be2f
    • Matthew Daley's avatar
      floppy: don't write kernel-only members to FDRAWCMD ioctl output · f0fa6c03
      Matthew Daley authored
      Do not leak kernel-only floppy_raw_cmd structure members to userspace.
      This includes the linked-list pointer and the pointer to the allocated
      DMA space.
      Signed-off-by: default avatarMatthew Daley <mattd@bugfuzz.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 2145e15e)
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f0fa6c03
    • Matthew Daley's avatar
      floppy: ignore kernel-only members in FDRAWCMD ioctl input · f76b441b
      Matthew Daley authored
      Always clear out these floppy_raw_cmd struct members after copying the
      entire structure from userspace so that the in-kernel version is always
      valid and never left in an interdeterminate state.
      Signed-off-by: default avatarMatthew Daley <mattd@bugfuzz.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit ef87dbe7)
      [wt: be careful in 2.6.32 we still have the ugly macros everywhere]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f76b441b
    • Daniel Borkmann's avatar
      netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages · d751df64
      Daniel Borkmann authored
      commit b22f5126 upstream
      
      Some occurences in the netfilter tree use skb_header_pointer() in
      the following way ...
      
        struct dccp_hdr _dh, *dh;
        ...
        skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
      
      ... where dh itself is a pointer that is being passed as the copy
      buffer. Instead, we need to use &_dh as the forth argument so that
      we're copying the data into an actual buffer that sits on the stack.
      
      Currently, we probably could overwrite memory on the stack (e.g.
      with a possibly mal-formed DCCP packet), but unintentionally, as
      we only want the buffer to be placed into _dh variable.
      
      Fixes: 2bc78049 ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d751df64
    • Martin Schwidefsky's avatar
      s390: fix kernel crash due to linkage stack instructions · 0e37e30a
      Martin Schwidefsky authored
      commit 8d7f6690 upstream
      
      The kernel currently crashes with a low-address-protection exception
      if a user space process executes an instruction that tries to use the
      linkage stack. Set the base-ASTE origin and the subspace-ASTE origin
      of the dispatchable-unit-control-table to point to a dummy ASTE.
      Set up control register 15 to point to an empty linkage stack with no
      room left.
      
      A user space process with a linkage stack instruction will still crash
      but with a different exception which is correctly translated to a
      segmentation fault instead of a kernel oops.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0e37e30a
    • Stephen Smalley's avatar
      SELinux: Fix kernel BUG on empty security contexts. · efe55430
      Stephen Smalley authored
      commit 2172fa70 upstream
      
      Setting an empty security context (length=0) on a file will
      lead to incorrectly dereferencing the type and other fields
      of the security context structure, yielding a kernel BUG.
      As a zero-length security context is never valid, just reject
      all such security contexts whether coming from userspace
      via setxattr or coming from the filesystem upon a getxattr
      request by SELinux.
      
      Setting a security context value (empty or otherwise) unknown to
      SELinux in the first place is only possible for a root process
      (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
      if the corresponding SELinux mac_admin permission is also granted
      to the domain by policy.  In Fedora policies, this is only allowed for
      specific domains such as livecd for setting down security contexts
      that are not defined in the build host policy.
      
      Reproducer:
      su
      setenforce 0
      touch foo
      setfattr -n security.selinux foo
      
      Caveat:
      Relabeling or removing foo after doing the above may not be possible
      without booting with SELinux disabled.  Any subsequent access to foo
      after doing the above will also trigger the BUG.
      
      BUG output from Matthew Thode:
      [  473.893141] ------------[ cut here ]------------
      [  473.962110] kernel BUG at security/selinux/ss/services.c:654!
      [  473.995314] invalid opcode: 0000 [#6] SMP
      [  474.027196] Modules linked in:
      [  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
      3.13.0-grsec #1
      [  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
      07/29/10
      [  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
      ffff8805f50cd488
      [  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
      [  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
      0000000000000100
      [  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
      ffff8805e8aaa000
      [  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
      0000000000000006
      [  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
      0000000000000006
      [  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
      0000000000000000
      [  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
      knlGS:0000000000000000
      [  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
      00000000000207f0
      [  474.556058] Stack:
      [  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
      ffff8805f1190a40
      [  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
      ffff8805e8aac860
      [  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
      ffff8805c0ac3d94
      [  474.690461] Call Trace:
      [  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
      [  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
      [  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
      [  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
      [  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
      [  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
      [  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
      [  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
      [  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
      [  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
      [  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
      [  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
      [  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
      [  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
      8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
      75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
      [  475.255884] RIP  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  475.296120]  RSP <ffff8805c0ac3c38>
      [  475.328734] ---[ end trace f076482e9d754adc ]---
      Reported-by: default avatarMatthew Thode <mthode@mthode.org>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      efe55430
    • Dan Carpenter's avatar
      aacraid: missing capable() check in compat ioctl · a7060b9d
      Dan Carpenter authored
      commit f856567b upstream
      
      In commit d496f94d ('[SCSI] aacraid: fix security weakness') we
      added a check on CAP_SYS_RAWIO to the ioctl.  The compat ioctls need the
      check as well.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a7060b9d
    • Dan Carpenter's avatar
      xfs: underflow bug in xfs_attrlist_by_handle() · e017ce71
      Dan Carpenter authored
      If we allocate less than sizeof(struct attrlist) then we end up
      corrupting memory or doing a ZERO_PTR_SIZE dereference.
      
      This can only be triggered with CAP_SYS_ADMIN.
      Reported-by: default avatarNico Golde <nico@ngolde.de>
      Reported-by: default avatarFabian Yamaguchi <fabs@goesec.de>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 071c529e)
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e017ce71