1. 04 May, 2017 14 commits
    • Omar Sandoval's avatar
      blk-mq-debugfs: allow schedulers to register debugfs attributes · d332ce09
      Omar Sandoval authored
      This provides the infrastructure for schedulers to expose their internal
      state through debugfs. We add a list of queue attributes and a list of
      hctx attributes to struct elevator_type and wire them up when switching
      schedulers.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      
      Add missing seq_file.h header in blk-mq-debugfs.h
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      d332ce09
    • Omar Sandoval's avatar
      blk-mq: untangle debugfs and sysfs · 9c1051aa
      Omar Sandoval authored
      Originally, I tied debugfs registration/unregistration together with
      sysfs. There's no reason to do this, and it's getting in the way of
      letting schedulers define their own debugfs attributes. Instead, tie the
      debugfs registration to the lifetime of the structures themselves.
      
      The saner lifetimes mean we can also get rid of the extra mq directory
      and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
      just nvme0n1/hctx0/tags.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      9c1051aa
    • Omar Sandoval's avatar
      blk-mq: move debugfs declarations to a separate header file · d173a251
      Omar Sandoval authored
      Preparation for adding more declarations.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      d173a251
    • Bart Van Assche's avatar
      blk-mq: Do not invoke queue operations on a dead queue · 18d4d7d0
      Bart Van Assche authored
      In commit e869b546 ("blk-mq: Unregister debugfs attributes
      earlier"), we shuffled the debugfs cleanup around so that the "state"
      attribute was removed before we freed the blk-mq data structures.
      However, later changes are going to undo that, so we need to explicitly
      disallow running a dead queue.
      
      [Omar: rebased and updated commit message]
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      18d4d7d0
    • Omar Sandoval's avatar
      blk-mq-debugfs: get rid of a bunch of boilerplate · f57de23a
      Omar Sandoval authored
      A large part of blk-mq-debugfs.c is file_operations and seq_file
      boilerplate. This sucks as is but will suck even more when schedulers
      can define their own debugfs entries. Factor it all out into a single
      blk_mq_debugfs_fops which multiplexes as needed. We store the
      request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory
      dentry, which is kind of hacky, but it works.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      f57de23a
    • Omar Sandoval's avatar
      blk-mq-debugfs: rename hw queue directories from <n> to hctx<n> · 88aabbd7
      Omar Sandoval authored
      It's not clear what these numbered directories represent unless you
      consult the code. We're about to get rid of the intermediate "mq"
      directory, so these would be even more confusing without that context.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      88aabbd7
    • Omar Sandoval's avatar
      blk-mq-debugfs: don't open code strstrip() · 71b90511
      Omar Sandoval authored
      Slightly more readable, plus we also strip leading spaces.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      71b90511
    • Omar Sandoval's avatar
      blk-mq-debugfs: error on long write to queue "state" file · c7e4145a
      Omar Sandoval authored
      blk_queue_flags_store() currently truncates and returns a short write if
      the operation being written is too long. This can give us weird results,
      like here:
      
      $ echo "run            bar"
      echo: write error: invalid argument
      $ dmesg
      [ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start'
      
      Instead, return an error if the user does this. While we're here, make
      the argument names consistent with everywhere else in this file.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      c7e4145a
    • Omar Sandoval's avatar
      blk-mq-debugfs: clean up flag definitions · 1a435111
      Omar Sandoval authored
      Make sure the spelled out flag names match the definition. This also
      adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing
      cmd_flag, __REQ_NOUNMAP.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      1a435111
    • Omar Sandoval's avatar
      blk-mq-debugfs: separate flags with | · bec03d6b
      Omar Sandoval authored
      This reads more naturally than spaces.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      bec03d6b
    • Jan Kara's avatar
      nfs: Fix bdi handling for cloned superblocks · 9052c7cf
      Jan Kara authored
      In commit 0d3b12584972 "nfs: Convert to separately allocated bdi" I have
      wrongly cloned bdi reference in nfs_clone_super(). Further inspection
      has shown that originally the code was actually allocating a new bdi (in
      ->clone_server callback) which was later registered in
      nfs_fs_mount_common() and used for sb->s_bdi in nfs_initialise_sb().
      This could later result in bdi for the original superblock not getting
      unregistered when that superblock got shutdown (as the cloned sb still
      held bdi reference) and later when a new superblock was created under
      the same anonymous device number, a clash in sysfs has happened on bdi
      registration:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x74
      sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:32'
      Modules linked in: axp20x_usb_power gpio_axp209 nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma
      CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14
      Hardware name: Allwinner sun7i (A20) Family
      [<c010f19c>] (unwind_backtrace) from [<c010bc74>] (show_stack+0x10/0x14)
      [<c010bc74>] (show_stack) from [<c03c6e24>] (dump_stack+0x78/0x8c)
      [<c03c6e24>] (dump_stack) from [<c0122200>] (__warn+0xe8/0x100)
      [<c0122200>] (__warn) from [<c0122250>] (warn_slowpath_fmt+0x38/0x48)
      [<c0122250>] (warn_slowpath_fmt) from [<c02ac178>] (sysfs_warn_dup+0x64/0x74)
      [<c02ac178>] (sysfs_warn_dup) from [<c02ac254>] (sysfs_create_dir_ns+0x84/0x94)
      [<c02ac254>] (sysfs_create_dir_ns) from [<c03c8b8c>] (kobject_add_internal+0x9c/0x2ec)
      [<c03c8b8c>] (kobject_add_internal) from [<c03c8e24>] (kobject_add+0x48/0x98)
      [<c03c8e24>] (kobject_add) from [<c048d75c>] (device_add+0xe4/0x5a0)
      [<c048d75c>] (device_add) from [<c048ddb4>] (device_create_groups_vargs+0xac/0xbc)
      [<c048ddb4>] (device_create_groups_vargs) from [<c048dde4>] (device_create_vargs+0x20/0x28)
      [<c048dde4>] (device_create_vargs) from [<c02075c8>] (bdi_register_va+0x44/0xfc)
      [<c02075c8>] (bdi_register_va) from [<c023d378>] (super_setup_bdi_name+0x48/0xa4)
      [<c023d378>] (super_setup_bdi_name) from [<c0312ef4>] (nfs_fill_super+0x1a4/0x204)
      [<c0312ef4>] (nfs_fill_super) from [<c03133f0>] (nfs_fs_mount_common+0x140/0x1e8)
      [<c03133f0>] (nfs_fs_mount_common) from [<c03335cc>] (nfs4_remote_mount+0x50/0x58)
      [<c03335cc>] (nfs4_remote_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c033352c>] (nfs_do_root_mount+0x80/0xa0)
      [<c033352c>] (nfs_do_root_mount) from [<c0333818>] (nfs4_try_mount+0x28/0x3c)
      [<c0333818>] (nfs4_try_mount) from [<c0313874>] (nfs_fs_mount+0x2cc/0x8c4)
      [<c0313874>] (nfs_fs_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c02600f0>] (do_mount+0x158/0xc7c)
      [<c02600f0>] (do_mount) from [<c0260f98>] (SyS_mount+0x8c/0xb4)
      [<c0260f98>] (SyS_mount) from [<c0107840>] (ret_fast_syscall+0x0/0x3c)
      
      Fix the problem by always creating new bdi for a superblock as we used
      to do.
      Reported-and-tested-by: default avatarCorentin Labbe <clabbe.montjoie@gmail.com>
      Fixes: 0d3b12584972ce5781179ad3f15cca3cdb5cae05
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      9052c7cf
    • Peter Zijlstra's avatar
      block/mq: Cure cpu hotplug lock inversion · eabe0659
      Peter Zijlstra authored
      By poking at /debug/sched_features I triggered the following splat:
      
       [] ======================================================
       [] WARNING: possible circular locking dependency detected
       [] 4.11.0-00873-g964c8b7-dirty #694 Not tainted
       [] ------------------------------------------------------
       [] bash/2109 is trying to acquire lock:
       []  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff8120cb8b>] static_key_slow_dec+0x1b/0x50
       []
       [] but task is already holding lock:
       []  (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
       []
       [] which lock already depends on the new lock.
       []
       []
       [] the existing dependency chain (in reverse order) is:
       []
       [] -> #2 (&sb->s_type->i_mutex_key#4){+++++.}:
       []        lock_acquire+0x100/0x210
       []        down_write+0x28/0x60
       []        start_creating+0x5e/0xf0
       []        debugfs_create_dir+0x13/0x110
       []        blk_mq_debugfs_register+0x21/0x70
       []        blk_mq_register_dev+0x64/0xd0
       []        blk_register_queue+0x6a/0x170
       []        device_add_disk+0x22d/0x440
       []        loop_add+0x1f3/0x280
       []        loop_init+0x104/0x142
       []        do_one_initcall+0x43/0x180
       []        kernel_init_freeable+0x1de/0x266
       []        kernel_init+0xe/0x100
       []        ret_from_fork+0x31/0x40
       []
       [] -> #1 (all_q_mutex){+.+.+.}:
       []        lock_acquire+0x100/0x210
       []        __mutex_lock+0x6c/0x960
       []        mutex_lock_nested+0x1b/0x20
       []        blk_mq_init_allocated_queue+0x37c/0x4e0
       []        blk_mq_init_queue+0x3a/0x60
       []        loop_add+0xe5/0x280
       []        loop_init+0x104/0x142
       []        do_one_initcall+0x43/0x180
       []        kernel_init_freeable+0x1de/0x266
       []        kernel_init+0xe/0x100
       []        ret_from_fork+0x31/0x40
      
       []  *** DEADLOCK ***
       []
       [] 3 locks held by bash/2109:
       []  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81292bcd>] vfs_write+0x17d/0x1a0
       []  #1:  (debugfs_srcu){......}, at: [<ffffffff8155a90d>] full_proxy_write+0x5d/0xd0
       []  #2:  (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
       []
       [] stack backtrace:
       [] CPU: 9 PID: 2109 Comm: bash Not tainted 4.11.0-00873-g964c8b7-dirty #694
       [] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
       [] Call Trace:
      
       []  lock_acquire+0x100/0x210
       []  get_online_cpus+0x2a/0x90
       []  static_key_slow_dec+0x1b/0x50
       []  static_key_disable+0x20/0x30
       []  sched_feat_write+0x131/0x170
       []  full_proxy_write+0x97/0xd0
       []  __vfs_write+0x28/0x120
       []  vfs_write+0xb5/0x1a0
       []  SyS_write+0x49/0xa0
       []  entry_SYSCALL_64_fastpath+0x23/0xc2
      
      This is because of the cpu hotplug lock rework. Break the chain at #1
      by reversing the lock acquisition order. This way i_mutex_key#4 no
      longer depends on cpu_hotplug_lock and things are good.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      eabe0659
    • Javier González's avatar
      lightnvm: fix bad back free on error path · 507f7d68
      Javier González authored
      Free memory correctly when an allocation fails on a loop and we free
      backwards previously successful allocations.
      Signed-off-by: default avatarJavier González <javier@cnexlabs.com>
      Reviewed-by: default avatarMatias Bjørling <matias@cnexlabs.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      507f7d68
    • Javier González's avatar
      lightnvm: create cmd before allocating request · 2e13f33a
      Javier González authored
      Create nvme command before allocating a request using
      nvme_alloc_request, which uses the command direction. Up until now, the
      command has been zeroized, so all commands have been allocated as a
      read operation.
      Signed-off-by: default avatarJavier González <javier@cnexlabs.com>
      Reviewed-by: default avatarMatias Bjørling <matias@cnexlabs.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      2e13f33a
  2. 03 May, 2017 3 commits
  3. 02 May, 2017 9 commits
  4. 01 May, 2017 3 commits
    • Linus Torvalds's avatar
      Merge branch 'for-4.12/post-merge' of git://git.kernel.dk/linux-block · 08c521a2
      Linus Torvalds authored
      Pull second round of block layer updates from Jens Axboe:
      
       - Further fixups to the NVMe APST code, from Andy.
      
       - Various fixes for (mostly) nvme-fc, from Christoph and James.
      
       - NVMe scsi fixes from Jon and Christoph.
      
      * 'for-4.12/post-merge' of git://git.kernel.dk/linux-block: (39 commits)
        nvme-scsi: remove nvme_trans_security_protocol
        nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io
        nvme-scsi: Consider LBA format in IO splitting calculation
        nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice
        lpfc: Fix memory corruption of the lpfc_ncmd->list pointers
        nvme: Add nvme_core.force_apst to ignore the NO_APST quirk
        nvme: Display raw APST configuration via DYNAMIC_DEBUG
        nvme: Fix APST comment
        lpfc revison 11.2.0.12
        Fix Express lane queue creation.
        Update ABORT processing for NVMET.
        Fix implicit logo and RSCN handling for NVMET
        Add Fabric assigned WWN support.
        Fix max_sgl_segments settings for NVME / NVMET
        Fix crash after issuing lip reset
        Fix driver load issues when MRQ=8
        Remove hba lock from NVMET issue WQE.
        Fix nvme initiator handling when not enabled.
        Fix driver usage of 128B WQEs when WQ_CREATE is V1.
        Fix driver unload/reload operation.
        ...
      08c521a2
    • Linus Torvalds's avatar
      Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block · 69475292
      Linus Torvalds authored
      Pull block layer updates from Jens Axboe:
      
       - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
         was initially a fork of CFQ, but subsequently changed to implement
         fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
         to be used on desktop type single drives, providing good fairness.
         From Paolo.
      
       - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
         using a scalable token based algorithm that throttles IO based on
         live completion IO stats, similary to blk-wbt. From Omar.
      
       - A series from Jan, moving users to separately allocated backing
         devices. This continues the work of separating backing device life
         times, solving various problems with hot removal.
      
       - A series of updates for lightnvm, mostly from Javier. Includes a
         'pblk' target that exposes an open channel SSD as a physical block
         device.
      
       - A series of fixes and improvements for nbd from Josef.
      
       - A series from Omar, removing queue sharing between devices on mostly
         legacy drivers. This helps us clean up other bits, if we know that a
         queue only has a single device backing. This has been overdue for
         more than a decade.
      
       - Fixes for the blk-stats, and improvements to unify the stats and user
         windows. This both improves blk-wbt, and enables other users to
         register a need to receive IO stats for a device. From Omar.
      
       - blk-throttle improvements from Shaohua. This provides a scalable
         framework for implementing scalable priotization - particularly for
         blk-mq, but applicable to any type of block device. The interface is
         marked experimental for now.
      
       - Bucketized IO stats for IO polling from Stephen Bates. This improves
         efficiency of polled workloads in the presence of mixed block size
         IO.
      
       - A few fixes for opal, from Scott.
      
       - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
         From a variety of folks, mostly Sagi and James Smart.
      
       - A series from Bart, improving our exposed info and capabilities from
         the blk-mq debugfs support.
      
       - A series from Christoph, cleaning up how handle WRITE_ZEROES.
      
       - A series from Christoph, cleaning up the block layer handling of how
         we track errors in a request. On top of being a nice cleanup, it also
         shrinks the size of struct request a bit.
      
       - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
         never used by platforms, and the latter has outlived it's usefulness.
      
       - Various little bug fixes and cleanups from a wide variety of folks.
      
      * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
        block: hide badblocks attribute by default
        blk-mq: unify hctx delay_work and run_work
        block: add kblock_mod_delayed_work_on()
        blk-mq: unify hctx delayed_run_work and run_work
        nbd: fix use after free on module unload
        MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
        blk-mq-sched: alloate reserved tags out of normal pool
        mtip32xx: use runtime tag to initialize command header
        scsi: Implement blk_mq_ops.show_rq()
        blk-mq: Add blk_mq_ops.show_rq()
        blk-mq: Show operation, cmd_flags and rq_flags names
        blk-mq: Make blk_flags_show() callers append a newline character
        blk-mq: Move the "state" debugfs attribute one level down
        blk-mq: Unregister debugfs attributes earlier
        blk-mq: Only unregister hctxs for which registration succeeded
        blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
        blk-mq: Let blk_mq_debugfs_register() look up the queue name
        blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
        ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
        ide-pm: always pass 0 error to __blk_end_request_all
        ..
      69475292
    • Linus Torvalds's avatar
      Linux 4.11 · a351e9b9
      Linus Torvalds authored
      a351e9b9
  5. 30 Apr, 2017 3 commits
  6. 29 Apr, 2017 3 commits
  7. 28 Apr, 2017 5 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0e911788
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Just a couple more stragglers, I really hope this is it.
      
        1) Don't let frags slip down into the GRO segmentation handlers, from
           Steffen Klassert.
      
        2) Truesize under-estimation triggers warnings in TCP over loopback
           with socket filters, 2 part fix from Eric Dumazet.
      
        3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
           from Paolo Abeni.
      
        4) If we flush the XFRM policy after garbage collection, it doesn't
           work because stray entries can be created afterwards. Fix from Xin
           Long.
      
        5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.
      
        6) Fix GRO regression with IPSEC when netfilter is disabled, from
           Sabrina Dubroca.
      
        7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: hso: register netdev later to avoid a race condition
        net: adjust skb->truesize in ___pskb_trim()
        tcp: do not underestimate skb->truesize in tcp_trim_head()
        bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
        ipv4: Don't pass IP fragments to upper layer GRO handlers.
        cpsw/netcp: refine cpts dependency
        tipc: close the connection if protocol messages contain errors
        tipc: improve error validations for sockets in CONNECTING state
        tipc: Fix missing connection request handling
        xfrm: fix GRO for !CONFIG_NETFILTER
        xfrm: do the garbage collection after flushing policy
      0e911788
    • Andreas Kemnade's avatar
      net: hso: register netdev later to avoid a race condition · 4c761daf
      Andreas Kemnade authored
      If the netdev is accessed before the urbs are initialized,
      there will be NULL pointer dereferences. That is avoided by
      registering it when it is fully initialized.
      
      This case occurs e.g. if dhcpcd is running in the background
      and the device is probed, either after insmod hso or
      when the device appears on the usb bus.
      
      A backtrace is the following:
      
      [ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
      [ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
      [ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
      [ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
      [ 1357.574096] usb 1-2: Manufacturer: Option N.V.
      [ 1357.685882] hso 1-2:1.5: Not our interface
      [ 1460.886352] hso: unloaded
      [ 1460.889984] usbcore: deregistering interface driver hso
      [ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
      [ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
      [ 1513.887664] hso 1-2:1.5: Not our interface
      [ 1513.906890] usbcore: registered new interface driver hso
      [ 1513.937988] pgd = ecdec000
      [ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
      [ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
      [ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
      [ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
      [ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
      [ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
      [ 1514.077758] task: ee748240 task.stack: ecdd6000
      [ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
      [ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
      [ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
      sp : ecdd7e20  ip : 00000000  fp : ffffffff
      [ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
      [ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
      [ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
      [ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
      [ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
      [ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
      [ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
      [ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
      [ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
      [ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
      [ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
      [ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
      [ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
      [ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
      [ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
      [ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
      [ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
      [ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
      [ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
      [ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
      [ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
      [ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
      [ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
      [ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
      [ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
      [ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
      [ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
      [ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
      [ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
      [ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
      [ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
      [ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
      [ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---
      Reported-by: default avatarH. Nikolaus Schaller <hns@goldelico.com>
      Signed-off-by: default avatarAndreas Kemnade <andreas@kemnade.info>
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4c761daf
    • Eric Dumazet's avatar
      net: adjust skb->truesize in ___pskb_trim() · c21b48cc
      Eric Dumazet authored
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket.
      
      As we did recently in commit 158f323b ("net: adjust skb->truesize in
      pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
      via a call to skb_condense().
      
      If all frags were freed, then skb->truesize can be recomputed.
      
      This call can be done if skb is not yet owned, or destructor is
      sock_edemux().
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c21b48cc
    • Eric Dumazet's avatar
      tcp: do not underestimate skb->truesize in tcp_trim_head() · 7162fb24
      Eric Dumazet authored
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket over loopback interface.
      
      I believe one issue with looped skbs is that tcp_trim_head() can end up
      producing skb with under estimated truesize.
      
      It hardly matters for normal conditions, since packets sent over
      loopback are never truncated.
      
      Bytes trimmed from skb->head should not change skb truesize, since
      skb->head is not reallocated.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7162fb24
    • Paolo Abeni's avatar
      bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3
      Paolo Abeni authored
      On slave list updates, the bonding driver computes its hard_header_len
      as the maximum of all enslaved devices's hard_header_len.
      If the slave list is empty, e.g. on last enslaved device removal,
      ETH_HLEN is used.
      
      Since the bonding header_ops are set only when the first enslaved
      device is attached, the above can lead to header_ops->create()
      being called with the wrong skb headroom in place.
      
      If bond0 is configured on top of ipoib devices, with the
      following commands:
      
      ifup bond0
      for slave in $BOND_SLAVES_LIST; do
      	ip link set dev $slave nomaster
      done
      ping -c 1 <ip on bond0 subnet>
      
      we will obtain a skb_under_panic() with a similar call trace:
      	skb_push+0x3d/0x40
      	push_pseudo_header+0x17/0x30 [ib_ipoib]
      	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
      	arp_create+0x12f/0x220
      	arp_send_dst.part.19+0x28/0x50
      	arp_solicit+0x115/0x290
      	neigh_probe+0x4d/0x70
      	__neigh_event_send+0xa7/0x230
      	neigh_resolve_output+0x12e/0x1c0
      	ip_finish_output2+0x14b/0x390
      	ip_finish_output+0x136/0x1e0
      	ip_output+0x76/0xe0
      	ip_local_out+0x35/0x40
      	ip_send_skb+0x19/0x40
      	ip_push_pending_frames+0x33/0x40
      	raw_sendmsg+0x7d3/0xb50
      	inet_sendmsg+0x31/0xb0
      	sock_sendmsg+0x38/0x50
      	SYSC_sendto+0x102/0x190
      	SyS_sendto+0xe/0x10
      	do_syscall_64+0x67/0x180
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      This change addresses the issue avoiding updating the bonding device
      hard_header_len when the slaves list become empty, forbidding to
      shrink it below the value used by header_ops->create().
      
      The bug is there since commit 54ef3137 ("[PATCH] bonding: Handle large
      hard_header_len") but the panic can be triggered only since
      commit fc791b63 ("IB/ipoib: move back IB LL address into the hard
      header").
      Reported-by: default avatarNorbert P <noe@physik.uzh.ch>
      Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
      Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header")
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19cdead3