1. 31 Jul, 2018 2 commits
    • Zubin Mithra's avatar
      tracefs: Annotate tracefs_ops with __ro_after_init · 5248ee85
      Zubin Mithra authored
      tracefs_ops is initialized inside tracefs_create_instance_dir and not
      modified after. tracefs_create_instance_dir allows for initialization
      only once, and is called from create_trace_instances(marked __init),
      which is called from tracer_init_tracefs(marked __init). Also, mark
      tracefs_create_instance_dir as __init.
      
      Link: http://lkml.kernel.org/r/20180725171901.4468-1-zsm@chromium.orgReviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarZubin Mithra <zsm@chromium.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      5248ee85
    • Joel Fernandes (Google)'s avatar
      tracing: Centralize preemptirq tracepoints and unify their usage · c3bc8fd6
      Joel Fernandes (Google) authored
      This patch detaches the preemptirq tracepoints from the tracers and
      keeps it separate.
      
      Advantages:
      * Lockdep and irqsoff event can now run in parallel since they no longer
      have their own calls.
      
      * This unifies the usecase of adding hooks to an irqsoff and irqson
      event, and a preemptoff and preempton event.
        3 users of the events exist:
        - Lockdep
        - irqsoff and preemptoff tracers
        - irqs and preempt trace events
      
      The unification cleans up several ifdefs and makes the code in preempt
      tracer and irqsoff tracers simpler. It gets rid of all the horrific
      ifdeferry around PROVE_LOCKING and makes configuration of the different
      users of the tracepoints more easy and understandable. It also gets rid
      of the time_* function calls from the lockdep hooks used to call into
      the preemptirq tracer which is not needed anymore. The negative delta in
      lines of code in this patch is quite large too.
      
      In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
      as a single point for registering probes onto the tracepoints. With
      this,
      the web of config options for preempt/irq toggle tracepoints and its
      users becomes:
      
       PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
             |                 |     \         |           |
             \    (selects)    /      \        \ (selects) /
            TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                            \                  /
                             \ (depends on)   /
                           PREEMPTIRQ_TRACEPOINTS
      
      Other than the performance tests mentioned in the previous patch, I also
      ran the locking API test suite. I verified that all tests cases are
      passing.
      
      I also injected issues by not registering lockdep probes onto the
      tracepoints and I see failures to confirm that the probes are indeed
      working.
      
      This series + lockdep probes not registered (just to inject errors):
      [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
      [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
      [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
      [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
      [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
      [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      
      With this series + lockdep probes registered, all locking tests pass:
      
      [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
      [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
      [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
      [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
      [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
      [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      
      Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.orgAcked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      c3bc8fd6
  2. 30 Jul, 2018 5 commits
  3. 27 Jul, 2018 2 commits
  4. 26 Jul, 2018 11 commits
  5. 25 Jul, 2018 4 commits
    • Artem Savkov's avatar
      tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure · 57ea2a34
      Artem Savkov authored
      If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
      it returns an error, but does not unset the tp flags it set previously.
      This results in a probe being considered enabled and failures like being
      unable to remove the probe through kprobe_events file since probes_open()
      expects every probe to be disabled.
      
      Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
      Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 41a7dd42 ("tracing/kprobes: Support ftrace_event_file base multibuffer")
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarArtem Savkov <asavkov@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      57ea2a34
    • Masami Hiramatsu's avatar
      selftests/ftrace: Add snapshot and tracing_on test case · 82f4f3e6
      Masami Hiramatsu authored
      Add a testcase for checking snapshot and tracing_on
      relationship. This ensures that the snapshotting doesn't
      affect current tracing on/off settings.
      
      Link: http://lkml.kernel.org/r/153149932412.11274.15289227592627901488.stgit@devbox
      
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: linux-kselftest@vger.kernel.org
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      82f4f3e6
    • Masami Hiramatsu's avatar
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 73c8d894
      Masami Hiramatsu authored
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but snapshot changes tracing_on
      setting each time. This is an anomaly, because user doesn't know
      that each "ring_buffer" stores its own tracing-enable state and
      the snapshot is done by swapping ring buffers.
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      73c8d894
    • Steven Rostedt (VMware)'s avatar
      tracing: Fix double free of event_trigger_data · 1863c387
      Steven Rostedt (VMware) authored
      Running the following:
      
       # cd /sys/kernel/debug/tracing
       # echo 500000 > buffer_size_kb
      [ Or some other number that takes up most of memory ]
       # echo snapshot > events/sched/sched_switch/trigger
      
      Triggers the following bug:
      
       ------------[ cut here ]------------
       kernel BUG at mm/slub.c:296!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:kfree+0x16c/0x180
       Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
       RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
       RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
       RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
       RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
       R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
       R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
       FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
       Call Trace:
        event_trigger_callback+0xee/0x1d0
        event_trigger_write+0xfc/0x1a0
        __vfs_write+0x33/0x190
        ? handle_mm_fault+0x115/0x230
        ? _cond_resched+0x16/0x40
        vfs_write+0xb0/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x5a/0x160
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f363e16ab50
       Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
       RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
       RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
       RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
       R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
       Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
      86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
       ---[ end trace d301afa879ddfa25 ]---
      
      The cause is because the register_snapshot_trigger() call failed to
      allocate the snapshot buffer, and then called unregister_trigger()
      which freed the data that was passed to it. Then on return to the
      function that called register_snapshot_trigger(), as it sees it
      failed to register, it frees the trigger_data again and causes
      a double free.
      
      By calling event_trigger_init() on the trigger_data (which only ups
      the reference counter for it), and then event_trigger_free() afterward,
      the trigger_data would not get freed by the registering trigger function
      as it would only up and lower the ref count for it. If the register
      trigger function fails, then the event_trigger_free() called after it
      will free the trigger data normally.
      
      Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
      
      Cc: stable@vger.kerne.org
      Fixes: 93e31ffb ("tracing: Add 'snapshot' event trigger command")
      Reported-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      1863c387
  6. 22 Jul, 2018 8 commits
  7. 21 Jul, 2018 8 commits
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea75a2c7
      Linus Torvalds authored
      Pull core kernel fixes from Ingo Molnar:
       "This is mostly the copy_to_user_mcsafe() related fixes from Dan
        Williams, and an ORC fix for Clang"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
        lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
        lib/iov_iter: Document _copy_to_iter_flushcache()
        lib/iov_iter: Document _copy_to_iter_mcsafe()
        objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
      ea75a2c7
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ffb48e79
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Two regression fixes, one for xmon disassembly formatting and the
        other to fix the E500 build.
      
        Two commits to fix a potential security issue in the VFIO code under
        obscure circumstances.
      
        And finally a fix to the Power9 idle code to restore SPRG3, which is
        user visible and used for sched_getcpu().
      
        Thanks to: Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy,
        James Clarke"
      
      * tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
        powerpc/Makefile: Assemble with -me500 when building for E500
        KVM: PPC: Check if IOMMU page is contained in the pinned physical page
        vfio/spapr: Use IOMMU pageshift rather than pagesize
        powerpc/xmon: Fix disassembly since printf changes
      ffb48e79
    • Linus Torvalds's avatar
      Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 55b636b4
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "A fix of a corruption regarding fsync and clone, under some very
        specific conditions explained in the patch.
      
        The fix is marked for stable 3.16+ so I'd like to get it merged now
        given the impact"
      
      * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix file data corruption after cloning a range and fsync
      55b636b4
    • Linus Torvalds's avatar
      mm: make vm_area_alloc() initialize core fields · 490fc053
      Linus Torvalds authored
      Like vm_area_dup(), it initializes the anon_vma_chain head, and the
      basic mm pointer.
      
      The rest of the fields end up being different for different users,
      although the plan is to also initialize the 'vm_ops' field to a dummy
      entry.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      490fc053
    • Linus Torvalds's avatar
      mm: make vm_area_dup() actually copy the old vma data · 95faf699
      Linus Torvalds authored
      .. and re-initialize th eanon_vma_chain head.
      
      This removes some boiler-plate from the users, and also makes it clear
      why it didn't need use the 'zalloc()' version.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      95faf699
    • Linus Torvalds's avatar
      mm: use helper functions for allocating and freeing vm_area structs · 3928d4f5
      Linus Torvalds authored
      The vm_area_struct is one of the most fundamental memory management
      objects, but the management of it is entirely open-coded evertwhere,
      ranging from allocation and freeing (using kmem_cache_[z]alloc and
      kmem_cache_free) to initializing all the fields.
      
      We want to unify this in order to end up having some unified
      initialization of the vmas, and the first step to this is to at least
      have basic allocation functions.
      
      Right now those functions are literally just wrappers around the
      kmem_cache_*() calls.  This is a purely mechanical conversion:
      
          # new vma:
          kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
      
          # copy old vma
          kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
      
          # free vma
          kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
      
      to the point where the old vma passed in to the vm_area_dup() function
      isn't even used yet (because I've left all the old manual initialization
      alone).
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3928d4f5
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 191a3afa
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "5 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: memcg: fix use after free in mem_cgroup_iter()
        mm/huge_memory.c: fix data loss when splitting a file pmd
        fat: fix memory allocation failure handling of match_strdup()
        MAINTAINERS: Peter has moved
        mm/memblock: add missing include <linux/bootmem.h>
      191a3afa
    • Jing Xia's avatar
      mm: memcg: fix use after free in mem_cgroup_iter() · 9f15bde6
      Jing Xia authored
      It was reported that a kernel crash happened in mem_cgroup_iter(), which
      can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.
      
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
      ......
      Call trace:
        mem_cgroup_iter+0x2e0/0x6d4
        shrink_zone+0x8c/0x324
        balance_pgdat+0x450/0x640
        kswapd+0x130/0x4b8
        kthread+0xe8/0xfc
        ret_from_fork+0x10/0x20
      
        mem_cgroup_iter():
            ......
            if (css_tryget(css))    <-- crash here
      	    break;
            ......
      
      The crashing reason is that mem_cgroup_iter() uses the memcg object whose
      pointer is stored in iter->position, which has been freed before and
      filled with POISON_FREE(0x6b).
      
      And the root cause of the use-after-free issue is that
      invalidate_reclaim_iterators() fails to reset the value of iter->position
      to NULL when the css of the memcg is released in non- hierarchical mode.
      
      Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com
      Fixes: 6df38689 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim")
      Signed-off-by: default avatarJing Xia <jing.xia.mail@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <chunyan.zhang@unisoc.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f15bde6