1. 07 May, 2013 22 commits
    • ARC: [mm] Lazy D-cache flush (non aliasing VIPT) · eacd0e95
      Vineet Gupta authored
      flush_dcache_page() is the MM hook to ensure that a page has consistent
      views between kernel and userspace. Thus it is called when
      
      * kernel writes to a page which at some later point could get mapped to
        userspace (so kernel mapping needs to be flushed-n-inv)
      * kernel is about to read from a page with possible userspace mappings
        (so userspace mappings need to be made coherent with kernel ones)
      
      However, for a non-aliasing VIPT dcache, any userspace mapping will always
      be congruent to the kernel mapping. Thus the d-cache need not be flushed at
      all (or the flush can be delayed indefinitely).
      
      The only time it does need to be flushed is when mapping code pages.
      Since icache doesn't snoop dcache, those dirty dcache lines need to be
      written back to memory and the icache lines invalidated so that icache
      line fetches get the right data.
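
      The lazy scheme described above can be sketched as a small userspace
      model. This is only an illustration: every name below is hypothetical
      and not the kernel's, and it assumes that both the writeback and the
      icache invalidate are skipped when the page was never marked dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page; not the kernel's struct page. */
struct page_model {
    bool dcache_dirty;  /* kernel wrote via its mapping, flush deferred */
};

static int dcache_writebacks;   /* simulated D-cache writebacks */
static int icache_invalidates;  /* simulated I-cache invalidates */

/* Lazy flush hook: only record that the page is dirty, do no cache work. */
static void model_flush_dcache_page(struct page_model *pg)
{
    assert(pg != NULL);
    pg->dcache_dirty = true;
}

/* Called when the page gets mapped into userspace. */
static void model_map_page(struct page_model *pg, bool executable)
{
    assert(pg != NULL);

    if (!executable)
        return;  /* data page: congruent mappings, keep deferring */

    if (pg->dcache_dirty) {
        dcache_writebacks++;   /* push dirty lines out to memory */
        icache_invalidates++;  /* drop stale instruction lines */
        pg->dcache_dirty = false;
    }
}
```

      Mapping a data page costs no cache maintenance in this model; the
      deferred work is paid once, and only when a dirty page is mapped as
      code.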
      
      Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks.
      
      (1) FPGA @ 80 MHz
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc6-a Linux 3.9.0-r   80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
      3.9-rc6-b Linux 3.9.0-r   80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
      3.9-rc7-c Linux 3.9.0-r   80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
                                                                      ^^^^ ^^^^ ^^^
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc6-a Linux 3.9.0-r  317.8  204.2 1122.3  375.1 3522.0 4.288     20.7 126.8
      3.9-rc6-b Linux 3.9.0-r  298.7  223.0 1141.6  367.8 3531.0 4.866     20.9 126.4
      3.9-rc7-c Linux 3.9.0-r  278.4  179.2  862.1  339.3 3705.0 3.223     20.3 126.6
                               ^^^^^  ^^^^^  ^^^^^  ^^^^
      
      (2) Customer Silicon @ 500 MHz (166 MHz mem)
      
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      abilis-ba Linux 3.9.0-r  497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
      abilis-ca Linux 3.9.0-r  497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
                                                                      ^^^^ ^^^^ ^^^
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      eacd0e95
    • ARC: [mm] micro-optimize page size icache invalidate · 764531cc
      Vineet Gupta authored
      start address is already page aligned and size is const PAGE_SIZE,
      thus fixups for alignment not needed in generated code.
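
      A minimal sketch of the idea, assuming a 64-byte line and an 8K page
      (both values and all function names are illustrative, not taken from
      the patch): the generic range helper has to round its arguments to
      line boundaries, while the page helper can rely on alignment and a
      constant trip count.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SZ 64u    /* assumed cache line size */
#define PAGE_SZ 8192u  /* assumed page size */

static unsigned int lines_invalidated;

/* Stand-in for a single line invalidate operation. */
static void inv_line(uintptr_t addr)
{
    (void)addr;
    lines_invalidated++;
}

/* Generic version: start may be unaligned, so it is rounded down first. */
static void inv_range(uintptr_t start, uintptr_t size)
{
    uintptr_t end = start + size;

    start &= ~(uintptr_t)(LINE_SZ - 1);
    for (; start < end; start += LINE_SZ)
        inv_line(start);
}

/* Page version: aligned start and constant size, so no fixups are needed. */
static void inv_page(uintptr_t page_start)
{
    assert((page_start & (PAGE_SZ - 1)) == 0);

    for (unsigned int i = 0; i < PAGE_SZ / LINE_SZ; i++)
        inv_line(page_start + (uintptr_t)i * LINE_SZ);
}
```

      Both helpers touch the same lines for a whole page; the page variant
      simply lets the compiler drop the rounding code.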
      
      bloat-o-meter vmlinux-mm5 vmlinux
      add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32)
      function                                     old     new   delta
      __inv_icache_page                             82      50     -32
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      764531cc
    • ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers · 7f250a0f
      Vineet Gupta authored
      No users of this code anymore - so RIP !
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      7f250a0f
    • ARC: [mm] consolidate icache/dcache sync code · 94bad1af
      Vineet Gupta authored
      Now that we have same helper used for all icache invalidates (i.e.
      vaddr+paddr based exact line invalidate), consolidate the open coded
      calls into one place.
      
      Also rename flush_icache_range_vaddr => __sync_icache_dcache
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      94bad1af
    • ARC: [mm] optimise icache flush for kernel mappings · 7586bf72
      Vineet Gupta authored
      This change continues the theme from prev commit - this time icache
      handling for kernel's own code modification (vmalloc: loadable modules,
      breakpoints for kprobes/kgdb...)
      
      flush_icache_range() calls the CDU icache helper with vaddr to enable
      exact line invalidate.
      
      For a true kernel-virtual mapping, the vaddr is actually virtual hence
      valid as index into cache. For kprobes breakpoint however, the vaddr arg
      is actually paddr - since that's how normal kernel is mapped in ARC
      memory map.  This implies that CDU will use the same addr for
      indexing as for tag match - which is fine since kernel code would only
      have that "implicit" mapping and none other.
      
      This should speed up module loading significantly - especially on default
      ARC700 icache configurations (32k) which alias.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      7586bf72
    • ARC: [mm] optimise icache flush for user mappings · 24603fdd
      Vineet Gupta authored
      ARC icache doesn't snoop dcache thus executable pages need to be made
      coherent before mapping into userspace in flush_icache_page().
      
      However ARC700 CDU (hardware cache flush module) requires both vaddr
      (index in cache) as well as paddr (tag match) to correctly identify a
      line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus
      the paddr only based flush_icache_page() API couldn't be implemented
      efficiently. It had to loop through all possible alias indexes and perform
      the invalidate operation (of course the cache op would only succeed at
      the index(es) where tag matches - typically only 1, but the cost of
      visiting all the cache-bins needs to be paid nevertheless).
      
      Turns out however that the vaddr (along with paddr) is available in
      update_mmu_cache() hence better suits ARC icache flush semantics.
      With both vaddr+paddr, exactly one flush operation per line is done.
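
      The cost difference can be illustrated with a small model. The cache
      geometry below (32k, 2-way, 64-byte lines, 8K pages) is an assumption
      chosen for the example, and the helper names are hypothetical; the
      model only counts cache operations and does not talk to hardware.

```c
#include <assert.h>
#include <stdint.h>

#define ICACHE_SZ   (32u * 1024)  /* assumed icache size */
#define ICACHE_WAYS 2u            /* assumed associativity */
#define LINE_SZ     64u           /* assumed line size */
#define PAGE_SZ     8192u         /* assumed page size */
#define WAY_SZ      (ICACHE_SZ / ICACHE_WAYS)
#define NUM_COLOURS (WAY_SZ / PAGE_SZ)  /* possible alias positions per page */

static unsigned int cache_ops;

/* One invalidate: idx_addr selects the set, paddr is used for tag match. */
static void inv_op(uintptr_t idx_addr, uintptr_t paddr)
{
    (void)idx_addr;
    (void)paddr;
    cache_ops++;
}

/* paddr only: every colour has to be visited for every line. */
static void inv_page_all_aliases(uintptr_t paddr)
{
    assert((paddr & (PAGE_SZ - 1)) == 0);

    for (unsigned int c = 0; c < NUM_COLOURS; c++)
        for (uintptr_t off = 0; off < PAGE_SZ; off += LINE_SZ)
            inv_op((uintptr_t)c * PAGE_SZ + off, paddr + off);
}

/* vaddr + paddr: the index is known, so one operation per line suffices. */
static void inv_page_exact(uintptr_t vaddr, uintptr_t paddr)
{
    assert((paddr & (PAGE_SZ - 1)) == 0);

    for (uintptr_t off = 0; off < PAGE_SZ; off += LINE_SZ)
        inv_op(vaddr + off, paddr + off);
}
```

      With this geometry the paddr-only variant issues NUM_COLOURS times as
      many operations per page as the exact variant.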
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      24603fdd
    • ARC: [mm] optimize needless full mm TLB flush on munmap · 8d56bec2
      Vineet Gupta authored
      munmap ends up calling tlb_flush() which for ARC was flushing the entire
      TLB unconditionally (by moving the MMU to a new ASID)
      
      do_munmap
        unmap_region
          unmap_vmas
            unmap_single_vma
               unmap_page_range
                  tlb_start_vma
                  zap_pud_range
                  tlb_end_vma()
        tlb_finish_mmu
          tlb_flush()  ---> unconditional flush_tlb_mm()
      
      So even a single page munmap, a frequent operation when the uClibc dynamic
      linker (ldso) is loading the dependent shared libraries, would move the
      ASID multiple times - needlessly invalidating the pre-faulted TLB
      entries (and increasing the rate of ASID wraparound + full TLB flush).
      
      This is now optimised to be called only if tlb->full_mm is set (i.e. for
      the exit/execve cases). And for those cases, flush_tlb_mm() is
      already optimised to be a no-op for mm->mm_users == 0.
      
      So essentially there are no more full mm flushes - except for fork which
      anyhow needs it for properly COW'ing parent address space.
      
      munmap now needs to do TLB range flush, which is implemented with
      tlb_end_vma()
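
      The difference between the two flush styles can be sketched with a toy
      TLB model. Everything here is hypothetical (entry count, page shift and
      function names are arbitrary choices for the example); it only shows
      why moving the ASID discards unrelated entries while a range flush
      does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8   /* arbitrary size for the model */
#define PAGE_SHIFT  13  /* assumed 8K pages */

struct tlb_entry {
    bool valid;
    unsigned int asid;
    uintptr_t vpn;  /* virtual page number */
};

struct mmu_model {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned int cur_asid;
};

/* Old behaviour: a new ASID makes all entries of this mm unreachable. */
static void model_flush_tlb_mm(struct mmu_model *m)
{
    assert(m != NULL);
    m->cur_asid++;
}

/* New behaviour: drop only current-ASID entries inside [start, end). */
static void model_flush_tlb_range(struct mmu_model *m,
                                  uintptr_t start, uintptr_t end)
{
    assert(m != NULL);

    for (int i = 0; i < TLB_ENTRIES; i++) {
        struct tlb_entry *t = &m->e[i];
        uintptr_t va = t->vpn << PAGE_SHIFT;

        if (t->valid && t->asid == m->cur_asid && va >= start && va < end)
            t->valid = false;
    }
}

/* Number of entries the current address space can still hit. */
static int model_live_entries(const struct mmu_model *m)
{
    int live = 0;

    for (int i = 0; i < TLB_ENTRIES; i++)
        if (m->e[i].valid && m->e[i].asid == m->cur_asid)
            live++;
    return live;
}
```

      In the model, unmapping one page with the range flush leaves the other
      pre-faulted entries usable, whereas the ASID move loses all of them.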
      
      Results
      -------
      1. ASID now consistently moves by 4 during a simple ls (as opposed to 5 or
         7 before).
      
      2. LMBench microbenchmark also shows improvements
      
      Basic system parameters
      ------------------------------------------------------------------------------
      Host                 OS Description              Mhz  tlb  cache  mem scal
                                                           pages line   par load
                                                                 bytes
      --------- ------------- ----------------------- ---- ----- ----- ------ ----
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80     8    64 1.1000 1
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80     8    64 1.1200 1
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
      3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590    20.1 126.6
      3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910    18.8 110.4
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      8d56bec2
    • ARC: Add support for nSIM OSCI System C model · a92a5d0d
      Mischa Jonker authored
      This adds support for an ARC Virtual Platform. This platform is based on the
      System C standard promoted by the OSCI (Open System C Initiative) and uses
      nSIM to simulate the ARC CPU core itself.
      
      Users can build a virtual SoC by combining System C models of peripherals
      and CPU cores.
      Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      a92a5d0d
    • ARC: [TB10x] Adapt device tree to new compatible string · 0dfad77d
      Christian Ruppert authored
      The original device tree was written using a slightly different
      implementation of the fixed-factor-clock device tree binding. The
      compatible string must be modified in order to be compatible with the
      new implementation.
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      0dfad77d
    • ARC: [TB10x] Add support for TB10x platform · 072eb693
      Christian Ruppert authored
      Infrastructure required to make the Linux kernel compile and boot on the
      Abilis Systems TB10x series of SOCs based on ARC700 CPUs:
        - Kmake related files (Kconfig, Makefile, tb10x_defconfig)
        - TB10x platform initialisation
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      072eb693
    • ARC: [TB10x] Device tree of TB100 and TB101 Development Kits · 2eb9504b
      Christian Ruppert authored
      These are the device tree files for the Abilis Systems TB100 and TB101 ICs and
      their respective development kit PCBs. These files are committed in preparation
      of the following patch set which adds support for these chips to the ARC
      platform.
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      2eb9504b
    • ARC: Prepare interrupt code for external controllers · a37cdacc
      Christian Ruppert authored
      This patch adds some room for CPU-external interrupt controllers in the
      Linux interrupt space. Until now, only the 32 CPU internal interrupt lines
      were supported which does not allow for external interrupt controllers such
      as GPIO modules etc.
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      a37cdacc
    • ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchy · c93d8b8c
      Vineet Gupta authored
      arc-intc is initialized in arc common code as it is applicable to all
      platforms. However platforms with their own external intc still need to
      refer to it for correct DT interrupt tree hierarchy setup,
      
      e.g.
      static struct of_device_id __initdata tb10x_irq_ids[] = {
      	{ .compatible = "snps,arc700-intc", .data = dummy_init_irq },
      	{ .compatible = "abilis,tb10x_ictl", .data = tb10x_init_irq },
      	{},
      };
      
      The fix is to use the generic irqchip framework to tie all irqchips in
      a special linker section and then call irqchip_init() which calls the
      DT of_irq_init() for all the intc in one go.
      
      That way the platform code need not be aware of arc-intc at all.
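
      The registration idea can be modelled in plain C. This is a userspace
      sketch under stated assumptions: a static array stands in for the
      linker section, model_irqchip_init() stands in for irqchip_init(), and
      the init functions are hypothetical placeholders. Only the two
      compatible strings come from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*irq_init_fn)(const char *node);

struct irqchip_entry {
    const char *compatible;
    irq_init_fn init;
};

static int core_intc_inits;  /* how often the core intc was set up */
static int ext_intc_inits;   /* how often the external intc was set up */

static int core_intc_init(const char *node)
{
    (void)node;
    core_intc_inits++;
    return 0;
}

static int ext_intc_init(const char *node)
{
    (void)node;
    ext_intc_inits++;
    return 0;
}

/* Stand-in for the linker section where each driver places its own entry. */
static const struct irqchip_entry irqchip_table[] = {
    { "snps,arc700-intc",  core_intc_init },
    { "abilis,tb10x_ictl", ext_intc_init },
    { NULL, NULL },
};

/* Initialise every node with a matching entry; returns how many matched. */
static int model_irqchip_init(const char *const *nodes, size_t n)
{
    int initialised = 0;

    assert(nodes != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct irqchip_entry *e;

        for (e = irqchip_table; e->compatible != NULL; e++) {
            if (strcmp(nodes[i], e->compatible) == 0) {
                if (e->init(nodes[i]) == 0)
                    initialised++;
                break;
            }
        }
    }
    return initialised;
}
```

      The caller only passes the list of nodes; it never names a specific
      controller, which is the property the commit is after.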
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      c93d8b8c
    • ARC: [cmdline] Don't overwrite u-boot provided bootargs · 9593a933
      Vineet Gupta authored
      The existing code was wrong on several counts:
      
      * uboot provided bootargs were copied into @boot_command_line, only to
        be over-written by setup_machine_fdt(), effectively lost
      
      * @cmdline_p returned by setup_arch() to start_kernel() didn't include
        the DT /bootargs
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      9593a933
    • ARC: [cmdline] Remove CONFIG_CMDLINE · 6971881f
      Vineet Gupta authored
      Given that DeviceTree /bootargs can provide similar functionality,
      no point in providing duplicate infrastructure.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      6971881f
    • ARC: [plat-arcfpga] defconfig update · 330db333
      Vineet Gupta authored
      * Allow initramfs path to be symlink
      * CONFIG_PREEMPT be default
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      330db333
    • ARC: unaligned access emulation broken if callee-reg dest of LD/ST · ce147c74
      Vineet Gupta authored
      The fixup code correctly updates the callee-regs on stack, but
      fails to unwind it into actual register file. Thus userspace won't see
      the update.
      Reported-by: Noam Camus <noamc@ezchip.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      ce147c74
    • ARC: unaligned access emulation error handling consolidation · c723ea46
      Vineet Gupta authored
      If CONFIG_ARC_MISALIGN_ACCESS is not enabled, or if the fixup fails,
      call the same error handler: same signal/si_code to user (SIGBUS)
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      c723ea46
    • ARC: Debug/crash-printing Improvements · bd3c8b11
      Vineet Gupta authored
      * Remove the line-break between scratch/callee-regs (sneaked in when we
        converted from printk to pr_*)
      
      * Use %pS to print the symbol names of faulting PC (ret pseudo register)
        and BLINK (call return register)
      
      * Don't print user-vma for a kernel crash (only do it for
        print-fatal-signals based regfile dump)
      
      * Verbose print the Interrupt/Exception Enable/Active state
      
      * For the main executable, the link address is 0x10000 based (vs. 0), thus
        the offset of the faulting PC needs to be adjusted
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      bd3c8b11
    • ARC: fix typo with clock speed · 68e4790e
      Noam Camus authored
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      68e4790e
    • ARC: Remove non existent refs to GENERIC_KERNEL_EXECVE & GENERIC_KERNEL_THREAD · 0e822845
      Alexander Shiyan authored
      This tracks mainline commit ae903caa "Bury the conditionals from
      kernel_thread/kernel_execve series" which we missed out as ARC port was
      not yet mainline.
      
      [vgupta: commit log modified]
      Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      0e822845
  2. 17 Apr, 2013 1 commit
    • ARC: [kbuild] Avoid DTB rebuilds if DTS are untouched · a89516b3
      Vineet Gupta authored
      Currently, for every ARC kernel build I see the following:
      
      --------------->8-----------------
        DTB    arch/arc/boot/dts/angel4.dtb.S
        AS      arch/arc/boot/dts/angel4.dtb.o
        LD      arch/arc/boot/dts/built-in.o
      rm arch/arc/boot/dts/angel4.dtb.S        <-- forces rebuild next iter
        CHK     kernel/config_data.h
      --------------->8-----------------
      
      This is because *.dtb.S is an intermediate file in dtb generation and is by
      default deleted by make, which needs a ".SECONDARY" hint to NOT do so.
      
      This could have ideally been done in scripts/Makefile.lib - for benefit
      of all, however .SECONDARY doesn't seem to work with wildcards.
      
      Thanks to Stephen for suggesting .SECONDARY (vs .PRECIOUS) and making
      that work using a non wildcard version in arch makefile.
      
      Thanks to James Hogan for pointing out that *.dtb.S now needs to be
      added to clean-files
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      a89516b3
  3. 09 Apr, 2013 12 commits
  4. 08 Apr, 2013 5 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes · f011a08c
      Linus Torvalds authored
      Pull powerpc bugfix from Stephen Rothwell:
       "A single BUG_ON fix for a condition that could happen for machines
        with certain hardware installed."
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes:
        powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before the ANDCOND test
      f011a08c
    • ARC: Add implicit compiler barrier to raw_local_irq* functions · 79e5f05e
      Christian Ruppert authored
      ARC irqsave/restore macros were missing the compiler barrier, causing a
      stale load from the irq-enabled region to be used in the irq-safe region,
      despite the value having changed, because the register holding it was still live.
      
      The problem manifested as random crashes in timer code when stress
      testing ARCLinux (3.9-rc3) on a !SMP && !PREEMPT_COUNT configuration.
      
      Here's the exact sequence which caused this:
       (0). tv1[x] <----> t1 <---> t2
       (1). mod_timer(t1) interrupted after it calls timer_pending()
       (2). mod_timer(t2) completes
       (3). mod_timer(t1) resumes but messes up the list
       (4). __run_timers() uses bogus timer_list entry / crashes in
            timer->function
      
      Essentially mod_timer() was racing against itself and while the spinlock
      serialized the tv1[] timer linked list, timer_pending(), called outside the
      spinlock, cached a timer linked list element in a register.
      With low register pressure (and a deep register file), and no barrier
      in raw_local_irqsave() or preempt_disable() (!PREEMPT_COUNT
      version), there was nothing to force gcc to reload across the spinlock,
      causing a stale value in the register to be used for linked list
      manipulation and resulting in corruption.
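
      The role of the barrier can be sketched in portable GCC/Clang C. This
      is a single-threaded model, not the ARC implementation: the flags
      variable stands in for STATUS32, all model_* names are hypothetical,
      and the empty asm with a "memory" clobber is the usual compiler
      barrier idiom that forces values cached in registers to be reloaded
      from memory.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler barrier: no instructions emitted, but memory is "clobbered". */
#define barrier() __asm__ __volatile__("" : : : "memory")

struct node {
    struct node *next, *prev;
};

static unsigned long model_irq_flags;  /* stands in for STATUS32 */

/* Save and clear the flags; the barrier keeps loads from moving above it. */
static inline unsigned long model_local_irq_save(void)
{
    unsigned long flags = model_irq_flags;

    model_irq_flags = 0;
    barrier();
    return flags;
}

/* The barrier keeps protected accesses from sinking below the restore. */
static inline void model_local_irq_restore(unsigned long flags)
{
    barrier();
    model_irq_flags = flags;
}

/* Same shape as the failing path: unlocked peek, then re-check and unlink. */
static int model_detach_if_pending(struct node *n)
{
    unsigned long flags;
    int detached = 0;

    assert(n != NULL);
    if (n->next == NULL)  /* unlocked check, like timer_pending() */
        return 0;

    flags = model_local_irq_save();
    if (n->next != NULL) {  /* must be a fresh load after the barrier */
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = NULL;
        detached = 1;
    }
    model_local_irq_restore(flags);
    return detached;
}
```

      Without the clobber, the compiler is free to reuse the value of
      n->next loaded for the unlocked check, which is the pattern visible
      in the generated code.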
      
      ARCompact disassembly which shows the culprit generated code:
      
      mod_timer:
          push_s blink
          mov_s r13,r0	# timer, timer
      ..
          ###### timer_pending( )
          ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
          brne r3, 0, @.L163
      
      .L163:
      ..
          ###### spin_lock_irq( )
          lr  r5, [status32]  # flags
          bic r4, r5, 6       # temp, flags,
          and.f 0, r5, 6      # flags,
          flag.nz r4
      
          ###### detach_if_pending( ) begins
      
          tst_s r3,r3  <--------------
      			# timer_pending( ) checks timer->entry.next
                              # r3 is NOT reloaded by gcc, using stale value
          beq.d @.L169
          mov.eq r0,0
      
          #####  detach_timer( ): __list_del( )
      
          ld r4,[r13,4]    	# <variable>.entry.prev, D.31439
          st r4,[r3,4]     	# <variable>.prev, D.31439
          st r3,[r4]       	# <variable>.next, D.30246
      
      We initially tried to fix this by adding barrier() to preempt_* macros
      for !PREEMPT_COUNT but Linus clarified that it was anything but wrong.
      http://www.spinics.net/lists/kernel/msg1512709.html
      
      [vgupta: updated commitlog]
      
      Reported-by/Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Cc: Christian Ruppert <christian.ruppert@abilis.com>
      Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Debugged-by/Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79e5f05e
    • Merge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · f465d40d
      Linus Torvalds authored
      Pull libata fixes from Jeff Garzik:
       "The HDIO_DRIVE_* fix is really the biggie.
      
        1) Fix ATAPI regression, noticed mainly on tape drives, due to a
           commit which mistakenly changed an 'int' return type to a 'bool'.
           Broken by commit 4dce8ba9 ("libata: Use 'bool' return value for
           ata_id_XXX")
      
        2) Add Slimtype DVD A DS8A8SH ATAPI quirk
      
        3) ata_piix: Intel Haswell platform quirk
      
        4) Avoid DMA'ing to stack buffer, when obtaining DEVSLP timings.  IMO
           a mild regression, given that libata previously did not DMA to a
           stack buffer.  Broken by commit 803739d2 ("[libata]
           replace sata_settings with devslp_timing")
      
        5) Fix regression impacting SMART and smartd, broken by commit
           84a9a8cd ("[libata] Set proper SK when CK_COND is set")"
      
      * tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
        [libata] Fix HDIO_DRIVE_* ioctl() Linux 3.9 regression
        libata: fix DMA to stack in reading devslp_timing parameters
        ata_piix: Fix DVD not dectected at some Haswell platforms
        libata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive
        libata: Use integer return value for atapi_command_packet_set
      f465d40d
    • Merge tag 'trace-fixes-3.9-rc6' of... · 5f2f280f
      Linus Torvalds authored
      Merge tag 'trace-fixes-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
       "This includes three fixes.  Two fix features added in 3.9 and one
        fixes a long time minor bug.
      
        The first patch fixes a race that can happen if the user switches from
        the irqsoff tracer to another tracer.  If an irqs-off latency is
        detected, it will try to use the snapshot buffer, but the new tracer
        won't have it allocated.  There's a nasty warning that gets printed and
        the trace is ignored.  Nothing crashes, just a nasty WARN_ON is shown.
      
        The second patch fixes an issue where if the sysctl is used to disable
        and enable function tracing, it can put the function tracing into an
        unstable state.
      
        The third patch fixes an issue with perf using the function tracer.
        An update was done, where the stub function could be called during the
        perf function tracing, and that stub function won't have the "control"
        flag set and cause a nasty warning when running perf."
      
      * tag 'trace-fixes-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Do not call stub functions in control loop
        ftrace: Consistently restore trace function on sysctl enabling
        tracing: Fix race with update_max_tr_single and changing tracers
      5f2f280f
    • ftrace: Do not call stub functions in control loop · 395b97a3
      Steven Rostedt (Red Hat) authored
      The function tracing control loop used by perf spits out a warning
      if the called function is not a control function. This is because
      the control function references a per cpu allocated data structure
      on struct ftrace_ops that is not allocated for other types of
      functions.
      
      Commit 0a016409 ("ftrace: Optimize the function tracer list loop")
      applied an optimization to all function tracing loops to optimize
      for a single registered ops. Unfortunately, this allows for a slight
      race when tracing starts or ends, where the stub function might be
      called after the current registered ops is removed. In this case we
      get the following dump:
      
      root# perf stat -e ftrace:function sleep 1
      [   74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
      [   74.349522] Hardware name: PRIMERGY RX200 S6
      [   74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspkr microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm libahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
      [   74.446233] Pid: 1377, comm: perf Tainted: G        W    3.9.0-rc1 #1
      [   74.453458] Call Trace:
      [   74.456233]  [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
      [   74.462997]  [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
      [   74.470272]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.478117]  [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
      [   74.484681]  [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
      [   74.491760]  [<ffffffff8162f400>] ftrace_call+0x5/0x2f
      [   74.497511]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.503486]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.509500]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.516088]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.522268]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.528837]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.536696]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.542878]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.548869]  [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
      [   74.556243]  [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
      [   74.563709]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.569887]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.575898]  [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
      [   74.582505]  [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
      [   74.589298]  [<ffffffff811295d0>] free_event+0x70/0x1a0
      [   74.595208]  [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
      [   74.602460]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.608667]  [<ffffffff8112a640>] put_event+0x90/0xc0
      [   74.614373]  [<ffffffff8112a740>] perf_release+0x10/0x20
      [   74.620367]  [<ffffffff811a3044>] __fput+0xf4/0x280
      [   74.625894]  [<ffffffff811a31de>] ____fput+0xe/0x10
      [   74.631387]  [<ffffffff81083697>] task_work_run+0xa7/0xe0
      [   74.637452]  [<ffffffff81014981>] do_notify_resume+0x71/0xb0
      [   74.643843]  [<ffffffff8162fa92>] int_signal+0x12/0x17
      
      To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
      ftrace_ops stub as just that, a stub. This flag is now checked in the
      control loop and the function is not called if the flag is set.
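
      The check can be sketched with a simplified ops list. The structure,
      flag values and function names below are hypothetical stand-ins for
      the example, not the ftrace definitions; the model only shows the
      loop skipping an entry that is marked as a stub instead of calling it
      or warning about it.

```c
#include <assert.h>
#include <stddef.h>

#define OPS_FL_CONTROL 0x1u  /* hypothetical: entry is a control function */
#define OPS_FL_STUB    0x2u  /* hypothetical: entry is only a list stub */

struct ops_model {
    void (*func)(struct ops_model *self);
    unsigned int flags;
    struct ops_model *next;
    int calls;  /* how often func was invoked */
};

static void control_fn(struct ops_model *op) { op->calls++; }
static void stub_fn(struct ops_model *op)    { op->calls++; }

/* List terminator, marked as a stub so the loop can recognise it. */
static struct ops_model list_end = { stub_fn, OPS_FL_STUB, NULL, 0 };

/* Walk the list; returns how many warnings would have been raised. */
static int model_control_loop(struct ops_model *head)
{
    int warnings = 0;

    assert(head != NULL);
    for (struct ops_model *op = head; op != NULL; op = op->next) {
        if (op->flags & OPS_FL_STUB)
            continue;  /* stub: neither called nor warned about */
        if (!(op->flags & OPS_FL_CONTROL)) {
            warnings++;  /* models the warning for non-control ops */
            continue;
        }
        op->func(op);
    }
    return warnings;
}
```

      Even when the loop starts at the stub itself, as in the race described
      above, the model neither calls it nor reports a warning.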
      
      Thanks to Jovi for not just reporting the bug, but also pointing out
      where the bug was in the code.
      
      Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
      Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com
      Tested-by: WANG Chao <chaowang@redhat.com>
      Reported-by: WANG Chao <chaowang@redhat.com>
      Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      395b97a3