1. 22 Mar, 2014 21 commits
  2. 21 Mar, 2014 2 commits
    • Rob Clark's avatar
      drm/ttm: don't oops if no invalidate_caches() · e00feda0
      Rob Clark authored
      commit 9ef7506f upstream.
      
      A few of the simpler TTM drivers (cirrus, ast, mgag200) do not implement
      this function.  Yet can end up somehow with an evicted bo:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<          (null)>]           (null)
        PGD 16e761067 PUD 16e6cf067 PMD 0
        Oops: 0010 [#1] SMP
        Modules linked in: bnep bluetooth rfkill fuse ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg btrfs zlib_deflate raid6_pq xor dm_queue_length iTCO_wdt iTCO_vendor_support coretemp kvm dcdbas dm_service_time microcode serio_raw pcspkr lpc_ich mfd_core i7core_edac edac_core ses enclosure ipmi_si ipmi_msghandler shpchp acpi_power_meter mperf nfsd auth_rpcgss nfs_acl lockd uinput sunrpc dm_multipath xfs libcrc32c ata_generic pata_acpi sr_mod cdrom
         sd_mod usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit lpfc drm_kms_helper ttm crc32c_intel ata_piix bfa drm ixgbe libata i2c_core mdio crc_t10dif ptp crct10dif_common pps_core scsi_transport_fc dca scsi_tgt megaraid_sas bnx2 dm_mirror dm_region_hash dm_log dm_mod
        CPU: 16 PID: 2572 Comm: X Not tainted 3.10.0-86.el7.x86_64 #1
        Hardware name: Dell Inc. PowerEdge R810/0H235N, BIOS 0.3.0 11/14/2009
        task: ffff8801799dabc0 ti: ffff88016c884000 task.ti: ffff88016c884000
        RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
        RSP: 0018:ffff88016c885ad8  EFLAGS: 00010202
        RAX: ffffffffa04e94c0 RBX: ffff880178937a20 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 0000000000240004 RDI: ffff880178937a00
        RBP: ffff88016c885b60 R08: 00000000000171a0 R09: ffff88007cf171a0
        R10: ffffea0005842540 R11: ffffffff810487b9 R12: ffff880178937b30
        R13: ffff880178937a00 R14: ffff88016c885b78 R15: ffff880179929400
        FS:  00007f81ba2ef980(0000) GS:ffff88007cf00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000016e763000 CR4: 00000000000007e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Stack:
         ffffffffa0306fae ffff8801799295c0 0000000000260004 0000000000000001
         ffff88016c885b60 ffffffffa0307669 00ff88007cf17738 ffff88017cf17700
         ffff880178937a00 ffff880100000000 ffff880100000000 0000000079929400
        Call Trace:
         [<ffffffffa0306fae>] ? ttm_bo_handle_move_mem+0x54e/0x5b0 [ttm]
         [<ffffffffa0307669>] ? ttm_bo_mem_space+0x169/0x340 [ttm]
         [<ffffffffa0307bd7>] ttm_bo_move_buffer+0x117/0x130 [ttm]
         [<ffffffff81130001>] ? perf_event_init_context+0x141/0x220
         [<ffffffffa0307cb1>] ttm_bo_validate+0xc1/0x130 [ttm]
         [<ffffffffa04e7377>] mgag200_bo_pin+0x87/0xc0 [mgag200]
         [<ffffffffa04e56c4>] mga_crtc_cursor_set+0x474/0xbb0 [mgag200]
         [<ffffffff811971d2>] ? __mem_cgroup_commit_charge+0x152/0x3b0
         [<ffffffff815c4182>] ? mutex_lock+0x12/0x2f
         [<ffffffffa0201433>] drm_mode_cursor_common+0x123/0x170 [drm]
         [<ffffffffa0205231>] drm_mode_cursor_ioctl+0x41/0x50 [drm]
         [<ffffffffa01f5ca2>] drm_ioctl+0x502/0x630 [drm]
         [<ffffffff815cbab4>] ? __do_page_fault+0x1f4/0x510
         [<ffffffff8101cb68>] ? __restore_xstate_sig+0x218/0x4f0
         [<ffffffff811b4445>] do_vfs_ioctl+0x2e5/0x4d0
         [<ffffffff8124488e>] ? file_has_perm+0x8e/0xa0
         [<ffffffff811b46b1>] SyS_ioctl+0x81/0xa0
         [<ffffffff815d05d9>] system_call_fastpath+0x16/0x1b
        Code:  Bad RIP value.
        RIP  [<          (null)>]           (null)
         RSP <ffff88016c885ad8>
        CR2: 0000000000000000
      Signed-off-by: default avatarRob Clark <rclark@redhat.com>
      Reviewed-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e00feda0
    • Filipe Brandenburger's avatar
      memcg: reparent charges of children before processing parent · fda958d5
      Filipe Brandenburger authored
      commit 4fb1a86f upstream.
      
      Sometimes the cleanup after memcg hierarchy testing gets stuck in
      mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
      
      There may turn out to be several causes, but a major cause is this: the
      workitem to offline parent can get run before workitem to offline child;
      parent's mem_cgroup_reparent_charges() circles around waiting for the
      child's pages to be reparented to its lrus, but it's holding
      cgroup_mutex which prevents the child from reaching its
      mem_cgroup_reparent_charges().
      
      Further testing showed that an ordered workqueue for cgroup_destroy_wq
      is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
      stage on the way can mess up the order before reaching the workqueue.
      
      Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
      all its children (and grandchildren, in the correct order) to have their
      charges reparented first.
      
      Fixes: e5fca243 ("cgroup: use a dedicated workqueue for cgroup destruction")
      Signed-off-by: default avatarFilipe Brandenburger <filbranden@google.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[v3.10+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fda958d5
  3. 14 Mar, 2014 4 commits
  4. 13 Mar, 2014 11 commits
    • Daniel Borkmann's avatar
      net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable · 00c53b02
      Daniel Borkmann authored
      [ Upstream commit ec0223ec ]
      
      RFC4895 introduced AUTH chunks for SCTP; during the SCTP
      handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS
      being optional though):
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
      
      A special case is when an endpoint requires COOKIE-ECHO
      chunks to be authenticated:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        ------------------ AUTH; COOKIE-ECHO ---------------->
        <-------------------- COOKIE-ACK ---------------------
      
      RFC4895, section 6.3. Receiving Authenticated Chunks says:
      
        The receiver MUST use the HMAC algorithm indicated in
        the HMAC Identifier field. If this algorithm was not
        specified by the receiver in the HMAC-ALGO parameter in
        the INIT or INIT-ACK chunk during association setup, the
        AUTH chunk and all the chunks after it MUST be discarded
        and an ERROR chunk SHOULD be sent with the error cause
        defined in Section 4.1. [...] If no endpoint pair shared
        key has been configured for that Shared Key Identifier,
        all authenticated chunks MUST be silently discarded. [...]
      
        When an endpoint requires COOKIE-ECHO chunks to be
        authenticated, some special procedures have to be followed
        because the reception of a COOKIE-ECHO chunk might result
        in the creation of an SCTP association. If a packet arrives
        containing an AUTH chunk as a first chunk, a COOKIE-ECHO
        chunk as the second chunk, and possibly more chunks after
        them, and the receiver does not have an STCB for that
        packet, then authentication is based on the contents of
        the COOKIE-ECHO chunk. In this situation, the receiver MUST
        authenticate the chunks in the packet by using the RANDOM
        parameters, CHUNKS parameters and HMAC_ALGO parameters
        obtained from the COOKIE-ECHO chunk, and possibly a local
        shared secret as inputs to the authentication procedure
        specified in Section 6.3. If authentication fails, then
        the packet is discarded. If the authentication is successful,
        the COOKIE-ECHO and all the chunks after the COOKIE-ECHO
        MUST be processed. If the receiver has an STCB, it MUST
        process the AUTH chunk as described above using the STCB
        from the existing association to authenticate the
        COOKIE-ECHO chunk and all the chunks after it. [...]
      
      Commit bbd0d598 introduced the possibility to receive
      and verification of AUTH chunk, including the edge case for
      authenticated COOKIE-ECHO. On reception of COOKIE-ECHO,
      the function sctp_sf_do_5_1D_ce() handles processing,
      unpacks and creates a new association if it passed sanity
      checks and also tests for authentication chunks being
      present. After a new association has been processed, it
      invokes sctp_process_init() on the new association and
      walks through the parameter list it received from the INIT
      chunk. It checks SCTP_PARAM_RANDOM, SCTP_PARAM_HMAC_ALGO
      and SCTP_PARAM_CHUNKS, and copies them into asoc->peer
      meta data (peer_random, peer_hmacs, peer_chunks) in case
      sysctl -w net.sctp.auth_enable=1 is set. If in INIT's
      SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set,
      peer_random != NULL and peer_hmacs != NULL the peer is to be
      assumed asoc->peer.auth_capable=1, in any other case
      asoc->peer.auth_capable=0.
      
      Now, if in sctp_sf_do_5_1D_ce() chunk->auth_chunk is
      available, we set up a fake auth chunk and pass that on to
      sctp_sf_authenticate(), which at latest in
      sctp_auth_calculate_hmac() reliably dereferences a NULL pointer
      at position 0..0008 when setting up the crypto key in
      crypto_hash_setkey() by using asoc->asoc_shared_key that is
      NULL as condition key_id == asoc->active_key_id is true if
      the AUTH chunk was injected correctly from remote. This
      happens no matter what net.sctp.auth_enable sysctl says.
      
      The fix is to check for net->sctp.auth_enable and for
      asoc->peer.auth_capable before doing any operations like
      sctp_sf_authenticate() as no key is activated in
      sctp_auth_asoc_init_active_key() for each case.
      
      Now as RFC4895 section 6.3 states that if the used HMAC-ALGO
      passed from the INIT chunk was not used in the AUTH chunk, we
      SHOULD send an error; however in this case it would be better
      to just silently discard such a maliciously prepared handshake
      as we didn't even receive a parameter at all. Also, as our
      endpoint has no shared key configured, section 6.3 says that
      MUST silently discard, which we are doing from now onwards.
      
      Before calling sctp_sf_pdiscard(), we need not only to free
      the association, but also the chunk->auth_chunk skb, as
      commit bbd0d598 created a skb clone in that case.
      
      I have tested this locally by using netfilter's nfqueue and
      re-injecting packets into the local stack after maliciously
      modifying the INIT chunk (removing RANDOM; HMAC-ALGO param)
      and the SCTP packet containing the COOKIE_ECHO (injecting
      AUTH chunk before COOKIE_ECHO). Fixed with this patch applied.
      
      Fixes: bbd0d598 ("[SCTP]: Implement the receive and verification of AUTH chunk")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <yasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      00c53b02
    • Xin Long's avatar
      ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer · 2a01e9e4
      Xin Long authored
      [ Upstream commit 10ddceb2 ]
      
      when ip_tunnel process multicast packets, it may check if the packet is looped
      back packet though 'rt_is_output_route(skb_rtable(skb))' in ip_tunnel_rcv(),
      but before that , skb->_skb_refdst has been dropped in iptunnel_pull_header(),
      so which leads to a panic.
      
      fix the bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2a01e9e4
    • Michael Chan's avatar
      tg3: Don't check undefined error bits in RXBD · baef79a5
      Michael Chan authored
      [ Upstream commit d7b95315 ]
      
      Redefine the RXD_ERR_MASK to include only relevant error bits. This fixes
      a customer reported issue of randomly dropping packets on the 5719.
      Signed-off-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      baef79a5
    • Hans Schillstrom's avatar
      ipv6: ipv6_find_hdr restore prev functionality · 537fc749
      Hans Schillstrom authored
      [ Upstream commit accfe0e3 ]
      
      The commit 9195bb8e ("ipv6: improve
      ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr().
      
      When a target is specified like IPPROTO_ICMPV6 ipv6_find_hdr()
      returns -ENOENT when it's found, not the header as expected.
      
      A part of IPVS is broken and possible also nft_exthdr_eval().
      When target is -1 which it is most cases, it works.
      
      This patch exits the do while loop if the specific header is found
      so the nexthdr could be returned as expected.
      Reported-by: default avatarArt -kwaak- van Breemen <ard@telegraafnet.nl>
      Signed-off-by: default avatarHans Schillstrom <hans@schillstrom.com>
      CC:Ansis Atteka <aatteka@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      537fc749
    • Edward Cree's avatar
      sfc: check for NULL efx->ptp_data in efx_ptp_event · 1c026764
      Edward Cree authored
      [ Upstream commit 8f355e5c ]
      
      If we receive a PTP event from the NIC when we haven't set up PTP state
      in the driver, we attempt to read through a NULL pointer efx->ptp_data,
      triggering a panic.
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarShradha Shah <sshah@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1c026764
    • Hannes Frederic Sowa's avatar
      ipv6: reuse ip6_frag_id from ip6_ufo_append_data · 3bbb02a1
      Hannes Frederic Sowa authored
      [ Upstream commit 916e4cf4 ]
      
      Currently we generate a new fragmentation id on UFO segmentation. It
      is pretty hairy to identify the correct net namespace and dst there.
      Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
      available at all.
      
      This causes unreliable or very predictable ipv6 fragmentation id
      generation while segmentation.
      
      Luckily we already have pregenerated the ip6_frag_id in
      ip6_ufo_append_data and can use it here.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3bbb02a1
    • Jason Wang's avatar
      virtio-net: alloc big buffers also when guest can receive UFO · 251ed2ca
      Jason Wang authored
      [ Upstream commit 0e7ede80 ]
      
      We should alloc big buffers also when guest can receive UFO
      packets to let the big packets fit into guest rx buffer.
      
      Fixes 5c516751
      (virtio-net: Allow UFO feature to be set and advertised.)
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      251ed2ca
    • Duan Jiong's avatar
      neigh: recompute reachabletime before returning from neigh_periodic_work() · df3839f9
      Duan Jiong authored
      [ Upstream commit feff9ab2 ]
      
      If the neigh table's entries is less than gc_thresh1, the function
      will return directly, and the reachabletime will not be recompute,
      so the reachabletime can be guessed.
      Signed-off-by: default avatarDuan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      df3839f9
    • Eric Dumazet's avatar
      net-tcp: fastopen: fix high order allocations · 10bbbd58
      Eric Dumazet authored
      [ Upstream commit f5ddcbbb ]
      
      This patch fixes two bugs in fastopen :
      
      1) The tcp_sendmsg(...,  @size) argument was ignored.
      
         Code was relying on user not fooling the kernel with iovec mismatches
      
      2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
      allocations, which are likely to fail when memory gets fragmented.
      
      Fixes: 783237e8 ("net-tcp: Fast Open client - sending SYN-data")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Tested-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      10bbbd58
    • Fernando Luis Vazquez Cao's avatar
      tun: remove bogus hardware vlan acceleration flags from vlan_features · cbf5a3f6
      Fernando Luis Vazquez Cao authored
      [ Upstream commit 6671b224 ]
      
      Even though only the outer vlan tag can be HW accelerated in the transmission
      path, in the TUN/TAP driver vlan_features mirrors hw_features, which happens
      to have the NETIF_F_HW_VLAN_?TAG_TX flags set. Because of this, during packet
      tranmisssion through a stacked vlan device dev_hard_start_xmit, (incorrectly)
      assuming that the vlan device supports hardware vlan acceleration, does not
      add the vlan header to the skb payload and the inner vlan tags are lost
      (vlan_tci contains the outer vlan tag when userspace reads the packet from
      the tap device).
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cbf5a3f6
    • Toshiaki Makita's avatar
      veth: Fix vlan_features so as to be able to use stacked vlan interfaces · cc9f71b2
      Toshiaki Makita authored
      [ Upstream commit 8d0d21f4 ]
      
      Even if we create a stacked vlan interface such as veth0.10.20, it sends
      single tagged frames (tagged with only vid 10).
      Because vlan_features of a veth interface has the
      NETIF_F_HW_VLAN_[CTAG/STAG]_TX bits, veth0.10 also has that feature, so
      dev_hard_start_xmit(veth0.10) doesn't call __vlan_put_tag() and
      vlan_dev_hard_start_xmit(veth0.10) overwrites vlan_tci.
      This prevents us from using a combination of 802.1ad and 802.1Q
      in containers, etc.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarFlavio Leitner <fbl@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cc9f71b2
  5. 12 Mar, 2014 2 commits
    • Waiman Long's avatar
      SELinux: Increase ebitmap_node size for 64-bit configuration · 17d633ca
      Waiman Long authored
      commit a767f680 upstream.
      
      Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
      a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
      used as bitmaps. The overhead ratio is 1/4.
      
      On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
      are left for bitmap purpose and the overhead ratio is 1/2. With a
      3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
      function to be called about 9 million times. The average number of
      ebitmap_node traversal is about 3.7.
      
      This patch increases the size of the ebitmap_node structure to 64
      bytes for 64-bit system to keep the overhead ratio at 1/4. This may
      also improve performance a little bit by making node to node traversal
      less frequent (< 2) as more bits are available in each node.
      Signed-off-by: default avatarWaiman Long <Waiman.Long@hp.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      17d633ca
    • Waiman Long's avatar
      SELinux: Reduce overhead of mls_level_isvalid() function call · 3b32517b
      Waiman Long authored
      commit fee71142 upstream.
      
      While running the high_systime workload of the AIM7 benchmark on
      a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
      (with HT on), it was found that a pretty sizable amount of time was
      spent in the SELinux code. Below was the perf trace of the "perf
      record -a -s" of a test run at 1500 users:
      
        5.04%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
        1.96%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
        1.95%            ls  [kernel.kallsyms]     [k] find_next_bit
      
      The ebitmap_get_bit() was the hottest function in the perf-report
      output.  Both the ebitmap_get_bit() and find_next_bit() functions
      were, in fact, called by mls_level_isvalid(). As a result, the
      mls_level_isvalid() call consumed 8.95% of the total CPU time of
      all the 24 virtual CPUs which is quite a lot. The majority of the
      mls_level_isvalid() function invocations come from the socket creation
      system call.
      
      Looking at the mls_level_isvalid() function, it is checking to see
      if all the bits set in one of the ebitmap structure are also set in
      another one as well as the highest set bit is no bigger than the one
      specified by the given policydb data structure. It is doing it in
      a bit-by-bit manner. So if the ebitmap structure has many bits set,
      the iteration loop will be done many times.
      
      The current code can be rewritten to use a similar algorithm as the
      ebitmap_contains() function with an additional check for the
      highest set bit. The ebitmap_contains() function was extended to
      cover an optional additional check for the highest set bit, and the
      mls_level_isvalid() function was modified to call ebitmap_contains().
      
      With that change, the perf trace showed that the used CPU time drop
      down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
      total which is about 100X less than before.
      
        0.07%            ls  [kernel.kallsyms]     [k] ebitmap_contains
        0.05%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
        0.01%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
        0.01%            ls  [kernel.kallsyms]     [k] find_next_bit
      
      The remaining ebitmap_get_bit() and find_next_bit() functions calls
      are made by other kernel routines as the new mls_level_isvalid()
      function will not call them anymore.
      
      This patch also improves the high_systime AIM7 benchmark result,
      though the improvement is not as impressive as is suggested by the
      reduction in CPU time spent in the ebitmap functions. The table below
      shows the performance change on the 2-socket x86-64 system (with HT
      on) mentioned above.
      
      +--------------+---------------+----------------+-----------------+
      |   Workload   | mean % change | mean % change  | mean % change   |
      |              | 10-100 users  | 200-1000 users | 1100-2000 users |
      +--------------+---------------+----------------+-----------------+
      | high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
      +--------------+---------------+----------------+-----------------+
      Signed-off-by: default avatarWaiman Long <Waiman.Long@hp.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3b32517b