1. 23 Jun, 2017 1 commit
  2. 22 Jun, 2017 2 commits
    • Björn Töpel's avatar
      perf probe: Fix probe definition for inlined functions · 7598f8bc
      Björn Töpel authored
      In commit 613f050d ("perf probe: Fix to probe on gcc generated
      functions in modules"), the offset from symbol is, incorrectly, added
      to the trace point address. This leads to incorrect probe trace points
      for inlined functions and when using relative line number on symbols.
      
      Prior this patch:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
      
      After:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
      
      Committer testing:
      
      Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
      the functions that while not being explicitely marked as inline, were inlined
      by the compiler:
      
        # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
        __ew32
        e1000_regdump
        e1000e_dump_ps_pages
        e1000_desc_unused
        e1000e_systim_to_hwtstamp
        e1000e_rx_hwtstamp
        e1000e_update_rdt_wa
        e1000e_update_tdt_wa
        e1000_put_txbuf
        e1000_consume_page
      
      Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
      them:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
      e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
      that function:
      
        $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
        <SNIP>
        <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
          <13e27c>   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
          <13e287>   DW_AT_low_pc      : 0x17a30
        <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e6f0>   DW_AT_abstract_origin: <0x13ed2c>
          <13e6f4>   DW_AT_low_pc      : 0x17be6
        <SNIP>
        <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
           <13ed2e>   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page
      
      So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
      inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
      address, gives us the offset we should use in the probe definition:
      
        0x17be6 - 0x17a30 = 438
      
      but above we have 876, which is twice as much.
      
      Lets see the second inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
        <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e86f>   DW_AT_abstract_origin: <0x13ed2c>
          <13e873>   DW_AT_low_pc      : 0x17d21
      
        0x17d21 - 0x17a30 = 753
      
      So we where adding it at twice the offset from the containing function as we
      should.
      
      And then after this patch:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
        #
      
      Which matches the two first expansions and shows that because we were
      doubling the offset it would spill over the next function:
      
        readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
         673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
         674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps
      
      This is the 3rd inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
         <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13ec78>   DW_AT_abstract_origin: <0x13ed2c>
          <13ec7c>   DW_AT_low_pc      : 0x17f79
      
        0x17f79 - 0x17a30 = 1353
      
       So:
      
         0x17a30 + 2 * 1353 = 0x184c2
      
        And:
      
         0x184c2 - 0x18090 = 1074
      
      Which explains the bogus third expansion for e1000_consume_page() to end up at:
      
         p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      All fixed now :-)
      
      [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 613f050d ("perf probe: Fix to probe on gcc generated functions in modules")
      Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7598f8bc
    • Kan Liang's avatar
      perf/x86/intel: Add 1G DTLB load/store miss support for SKL · fb3a5055
      Kan Liang authored
      Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and
      4M page size.
      Need to extend the events to support any page size (4K/2M/4M/1G).
      
      The complete DTLB load/store miss events are:
      
        DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
        DTLB_STORE_MISSES.WALK_COMPLETED		0xe49
      Signed-off-by: default avatarKan Liang <Kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fb3a5055
  3. 21 Jun, 2017 1 commit
  4. 20 Jun, 2017 2 commits
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 9705596d
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One build fix for an Amlogic clk driver and a handful of Allwinner clk
        driver fixes for some DT bindings and a randconfig build error that
        all came in this merge window"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: sunxi-ng: a64: Export PLL_PERIPH0 clock for the PRCM
        clk: sunxi-ng: h3: Export PLL_PERIPH0 clock for the PRCM
        dt-bindings: clock: sunxi-ccu: Add pll-periph to PRCM's needed clocks
        clk: sunxi-ng: sun5i: Fix ahb_bist_clk definition
        clk: sunxi-ng: enable SUNXI_CCU_MP for PRCM
        clk: meson: gxbb: fix build error without RESET_CONTROLLER
        clk: sunxi-ng: v3s: Fix usb otg device reset bit
        clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset
      9705596d
    • Linus Torvalds's avatar
      Merge tag 'ntb-4.12-bugfixes' of git://github.com/jonmason/ntb · 865be780
      Linus Torvalds authored
      Pull NTB fixes from Jon Mason:
       "NTB bug fixes to address the modinfo in ntb_perf, a couple of bugs in
        the NTB transport QP calculations, skx doorbells, and sleeping in
        ntb_async_tx_submit"
      
      * tag 'ntb-4.12-bugfixes' of git://github.com/jonmason/ntb:
        ntb: no sleep in ntb_async_tx_submit
        ntb: ntb_hw_intel: Skylake doorbells should be 32bits, not 64bits
        ntb_transport: fix bug calculating num_qps_mw
        ntb_transport: fix qp count bug
        NTB: ntb_test: fix bug printing ntb_perf results
        ntb: Correct modinfo usage statement for ntb_perf
      865be780
  5. 19 Jun, 2017 14 commits
  6. 18 Jun, 2017 7 commits
  7. 17 Jun, 2017 7 commits
  8. 16 Jun, 2017 6 commits