Commits · 2e13f33a2464fc3aa7783022a90309cabeca8935 · Kirill Smelkov / linux

04 May, 2017 1 commit

lightnvm: create cmd before allocating request · 2e13f33a

Javier González authored May 03, 2017

Create nvme command before allocating a request using
nvme_alloc_request, which uses the command direction. Up until now, the
command has been zeroized, so all commands have been allocated as a
read operation.
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

2e13f33a

03 May, 2017 3 commits

blk-mq: don't use sync workqueue flushing from drivers · 2719aa21

Jens Axboe authored May 03, 2017

A previous commit introduced the sync flush, which we need from
internal callers like blk_mq_quiesce_queue(). However, we also
call the stop helpers from drivers, particularly from ->queue_rq()
when we have to stop processing for a bit. We can't block from
those locations, and we don't have to guarantee that we're
fully flushed.

Fixes: 9f993737 ("blk-mq: unify hctx delayed_run_work and run_work")
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

2719aa21

mtip32xx: convert internal commands to regular block infrastructure · 6f63503c

Jens Axboe authored May 02, 2017

Get rid of the private end_io handlers and data, and just use the
regular block IO path for these requests. This removes a lot of
redundant code.
Signed-off-by: Jens Axboe <axboe@fb.com>

6f63503c

mtip32xx: cleanup internal tag assumptions · 994ff079

Jens Axboe authored May 02, 2017

We don't decode the internal tag to the proper group or tag
indx. This works fine because we have hard wired it as 0 for now,
but could break if we get rid of that.
Signed-off-by: Jens Axboe <axboe@fb.com>

994ff079

02 May, 2017 9 commits

block: don't call blk_mq_quiesce_queue() after queue is frozen · 7a148c2f

Ming Lei authored May 02, 2017

After queue is frozen, no request in this queue can be in use at all, so
there can't be any .queue_rq() running on this queue.  It isn't
necessary to call blk_mq_quiesce_queue() any more, so remove it in both
elevator_switch_mq() and blk_mq_update_nr_requests().

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Fixed up the description a bit.
Signed-off-by: Jens Axboe <axboe@fb.com>

7a148c2f

blk-mq: update ->init_request and ->exit_request prototypes · d6296d39

Christoph Hellwig authored May 01, 2017

Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq.  Except for a superflous check in
mtip32xx it was unused anyway.

Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

d6296d39

Revert "mtip32xx: pass BLK_MQ_F_NO_SCHED" · a800ce8b

Jens Axboe authored Apr 27, 2017

This reverts commit 4981d04d.

The driver has been converted to using the proper infrastructure
for issuing internal commands. This means it's now safe to use with
the scheduling infrastruture, so we can now revert the change
that turned off scheduling for mtip32xx.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

a800ce8b

blk-mq-sched: remove hack that bypasses scheduler for reserved requests · 9f2779bf

Jens Axboe authored Apr 28, 2017

We have update the troublesome driver (mtip32xx) to deal with this
appropriately. So kill the hack that bypassed scheduler allocation
and insertion for reserved requests.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9f2779bf

mtip32xx: convert internal command issue to block IO path · 3f5e6a35

Jens Axboe authored Apr 28, 2017

The driver special cases certain things for command issue, depending
on whether it's an internal command or not. Make the internal commands
use the regular infrastructure for issuing IO.

Since this is an 8-group souped up AHCI variant, we have to deal
with NCQ vs non-queueable commands. Do this from the queue_rq
handler, by backing off unless the drive is idle.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

3f5e6a35

mtip32xx: abstract out "are any commands active" helper · baed548a

Jens Axboe authored Apr 28, 2017

This is a prep patch for backoff in ->queue_rq() for non-ncq commands.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

baed548a

mtip32xx: kill atomic argument to mtip_quiesce_io() · 8afdd94c

Jens Axboe authored Apr 27, 2017

All callers now pass in GFP_KERNEL, get rid of the argument.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8afdd94c

mtip32xx: get rid of 'atomic' argument to mtip_exec_internal_command() · 0f6422a2

Jens Axboe authored Apr 27, 2017

All callers can safely block. Kill the atomic/block argument, and
remove the argument from all callers.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

0f6422a2

block: Remove elevator_change() · c0332694

Bart Van Assche authored May 01, 2017

Since commit 84253394 ("remove the mg_disk driver") removed the
only caller of elevator_change(), also remove the elevator_change()
function itself.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

c0332694

01 May, 2017 3 commits

Merge branch 'for-4.12/post-merge' of git://git.kernel.dk/linux-block · 08c521a2

Linus Torvalds authored May 01, 2017

Pull second round of block layer updates from Jens Axboe:

 - Further fixups to the NVMe APST code, from Andy.

 - Various fixes for (mostly) nvme-fc, from Christoph and James.

 - NVMe scsi fixes from Jon and Christoph.

* 'for-4.12/post-merge' of git://git.kernel.dk/linux-block: (39 commits)
  nvme-scsi: remove nvme_trans_security_protocol
  nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io
  nvme-scsi: Consider LBA format in IO splitting calculation
  nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice
  lpfc: Fix memory corruption of the lpfc_ncmd->list pointers
  nvme: Add nvme_core.force_apst to ignore the NO_APST quirk
  nvme: Display raw APST configuration via DYNAMIC_DEBUG
  nvme: Fix APST comment
  lpfc revison 11.2.0.12
  Fix Express lane queue creation.
  Update ABORT processing for NVMET.
  Fix implicit logo and RSCN handling for NVMET
  Add Fabric assigned WWN support.
  Fix max_sgl_segments settings for NVME / NVMET
  Fix crash after issuing lip reset
  Fix driver load issues when MRQ=8
  Remove hba lock from NVMET issue WQE.
  Fix nvme initiator handling when not enabled.
  Fix driver usage of 128B WQEs when WQ_CREATE is V1.
  Fix driver unload/reload operation.
  ...

08c521a2

Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block · 69475292

Linus Torvalds authored May 01, 2017

Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similary to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable priotization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived it's usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..

69475292

Linux 4.11 · a351e9b9
Linus Torvalds authored Apr 30, 2017

a351e9b9

30 Apr, 2017 3 commits

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 97ce89f8

Linus Torvalds authored Apr 30, 2017

Pull x86 fixes from Thomas Gleixner:
 "The final fixes for 4.11:

   - prevent a triple fault with function graph tracing triggered via
     suspend to ram

   - prevent optimizing for size when function graph tracing is enabled
     and the compiler does not support -mfentry

   - prevent mwaitx() being called with a zero timeout as mwaitx() might
     never return. Observed on the new Ryzen CPUs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Prevent timer value 0 for MWAITX
  x86/build: convert function graph '-Os' error to warning
  ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram

97ce89f8

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 14e07f02

Linus Torvalds authored Apr 30, 2017

Pull scheduler fix from Thomas Gleixner:
 "A single fix for a cputime accounting regression which got introduced
  in the 4.11 cycle"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cputime: Fix ksoftirqd cputime accounting regression

14e07f02

Prevent timer value 0 for MWAITX · 88d879d2

Janakarajan Natarajan authored Apr 25, 2017

Newer hardware has uncovered a bug in the software implementation of
using MWAITX for the delay function. A value of 0 for the timer is meant
to indicate that a timeout will not be used to exit MWAITX. On newer
hardware this can result in MWAITX never returning, resulting in NMI
soft lockup messages being printed. On older hardware, some of the other
conditions under which MWAITX can exit masked this issue. The AMD APM
does not currently document this and will be updated.

Please refer to http://marc.info/?l=kvm&m=148950623231140 for
information regarding NMI soft lockup messages on an AMD Ryzen 1800X.
This has been root-caused as a 0 passed to MWAITX causing it to wait
indefinitely.

This change has the added benefit of avoiding the unnecessary setup of
MONITORX/MWAITX when the delay value is zero.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Link: http://lkml.kernel.org/r/1493156643-29366-1-git-send-email-Janakarajan.Natarajan@amd.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

88d879d2

29 Apr, 2017 3 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 8c9a694d

Linus Torvalds authored Apr 29, 2017

Pull iov iter fix from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a braino in ITER_PIPE iov_iter_revert()

8c9a694d

fix a braino in ITER_PIPE iov_iter_revert() · 4fa55cef

Al Viro authored Apr 29, 2017

Fixes: 27c0e374Tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4fa55cef

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 0060e79a

Linus Torvalds authored Apr 28, 2017

Pull clk fix from Stephen Boyd:
 "One odd config build fix for a recent Allwinner clock driver change
  that got merged. The common code called code in another file that
  wasn't always built. This just forces it on so people don't run into
  this bad configuration"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: always select CCU_GATE

0060e79a

28 Apr, 2017 18 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0e911788

Linus Torvalds authored Apr 28, 2017

Pull networking fixes from David Miller:
 "Just a couple more stragglers, I really hope this is it.

  1) Don't let frags slip down into the GRO segmentation handlers, from
     Steffen Klassert.

  2) Truesize under-estimation triggers warnings in TCP over loopback
     with socket filters, 2 part fix from Eric Dumazet.

  3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
     from Paolo Abeni.

  4) If we flush the XFRM policy after garbage collection, it doesn't
     work because stray entries can be created afterwards. Fix from Xin
     Long.

  5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.

  6) Fix GRO regression with IPSEC when netfilter is disabled, from
     Sabrina Dubroca.

  7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: hso: register netdev later to avoid a race condition
  net: adjust skb->truesize in ___pskb_trim()
  tcp: do not underestimate skb->truesize in tcp_trim_head()
  bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
  ipv4: Don't pass IP fragments to upper layer GRO handlers.
  cpsw/netcp: refine cpts dependency
  tipc: close the connection if protocol messages contain errors
  tipc: improve error validations for sockets in CONNECTING state
  tipc: Fix missing connection request handling
  xfrm: fix GRO for !CONFIG_NETFILTER
  xfrm: do the garbage collection after flushing policy

0e911788

net: hso: register netdev later to avoid a race condition · 4c761daf

Andreas Kemnade authored Apr 26, 2017

If the netdev is accessed before the urbs are initialized,
there will be NULL pointer dereferences. That is avoided by
registering it when it is fully initialized.

This case occurs e.g. if dhcpcd is running in the background
and the device is probed, either after insmod hso or
when the device appears on the usb bus.

A backtrace is the following:

[ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
[ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
[ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
[ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
[ 1357.574096] usb 1-2: Manufacturer: Option N.V.
[ 1357.685882] hso 1-2:1.5: Not our interface
[ 1460.886352] hso: unloaded
[ 1460.889984] usbcore: deregistering interface driver hso
[ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
[ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[ 1513.887664] hso 1-2:1.5: Not our interface
[ 1513.906890] usbcore: registered new interface driver hso
[ 1513.937988] pgd = ecdec000
[ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
[ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
[ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
[ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
[ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
[ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 1514.077758] task: ee748240 task.stack: ecdd6000
[ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
[ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
[ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
sp : ecdd7e20  ip : 00000000  fp : ffffffff
[ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
[ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
[ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
[ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
[ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
[ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
[ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
[ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
[ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
[ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
[ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
[ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
[ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
[ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
[ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
[ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
[ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
[ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
[ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
[ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
[ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
[ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
[ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
[ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
[ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
[ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
[ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
[ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
[ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
[ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
[ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
[ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
[ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c761daf

net: adjust skb->truesize in ___pskb_trim() · c21b48cc

Eric Dumazet authored Apr 26, 2017

Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket.

As we did recently in commit 158f323b ("net: adjust skb->truesize in
pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
via a call to skb_condense().

If all frags were freed, then skb->truesize can be recomputed.

This call can be done if skb is not yet owned, or destructor is
sock_edemux().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c21b48cc

tcp: do not underestimate skb->truesize in tcp_trim_head() · 7162fb24

Eric Dumazet authored Apr 26, 2017

Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket over loopback interface.

I believe one issue with looped skbs is that tcp_trim_head() can end up
producing skb with under estimated truesize.

It hardly matters for normal conditions, since packets sent over
loopback are never truncated.

Bytes trimmed from skb->head should not change skb truesize, since
skb->head is not reallocated.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7162fb24

bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3

Paolo Abeni authored Apr 27, 2017

On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.

Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.

If bond0 is configured on top of ipoib devices, with the
following commands:

ifup bond0
for slave in $BOND_SLAVES_LIST; do
	ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>

we will obtain a skb_under_panic() with a similar call trace:
	skb_push+0x3d/0x40
	push_pseudo_header+0x17/0x30 [ib_ipoib]
	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
	arp_create+0x12f/0x220
	arp_send_dst.part.19+0x28/0x50
	arp_solicit+0x115/0x290
	neigh_probe+0x4d/0x70
	__neigh_event_send+0xa7/0x230
	neigh_resolve_output+0x12e/0x1c0
	ip_finish_output2+0x14b/0x390
	ip_finish_output+0x136/0x1e0
	ip_output+0x76/0xe0
	ip_local_out+0x35/0x40
	ip_send_skb+0x19/0x40
	ip_push_pending_frames+0x33/0x40
	raw_sendmsg+0x7d3/0xb50
	inet_sendmsg+0x31/0xb0
	sock_sendmsg+0x38/0x50
	SYSC_sendto+0x102/0x190
	SyS_sendto+0xe/0x10
	do_syscall_64+0x67/0x180
	entry_SYSCALL64_slow_path+0x25/0x25

This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().

The bug is there since commit 54ef3137 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b63 ("IB/ipoib: move back IB LL address into the hard
header").
Reported-by: Norbert P <noe@physik.uzh.ch>
Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19cdead3

ipv4: Don't pass IP fragments to upper layer GRO handlers. · 9b83e031

Steffen Klassert authored Apr 28, 2017

Upper layer GRO handlers can not handle IP fragments, so
exit GRO processing in this case.

This fixes ESP GRO because the packet must be reassembled
before we can decapsulate, otherwise we get authentication
failures.

It also aligns IPv4 to IPv6 where packets with fragmentation
headers are not passed to upper layer GRO handlers.

Fixes: 7785bba2 ("esp: Add a software GRO codepath")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b83e031

cpsw/netcp: refine cpts dependency · 504926df

Arnd Bergmann authored Apr 28, 2017

Tony Lindgren reports a kernel oops that resulted from my compile-time
fix on the default config. This shows two problems:

a) configurations that did not already enable PTP_1588_CLOCK will
   now miss the cpts driver

b) when cpts support is disabled, the driver crashes. This is a
   preexisting problem that we did not notice before my patch.

While the second problem is still being investigated, this modifies
the dependencies again, getting us back to the original state, with
another 'select NET_PTP_CLASSIFY' added in to avoid the original
link error we got, and the 'depends on POSIX_TIMERS' to hide
the CPTS support when turning it on would be useless.

Cc: stable@vger.kernel.org # 4.11 needs this
Fixes: 07fef362 ("cpsw/netcp: cpts depends on posix_timers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

504926df

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 5577e679

David S. Miller authored Apr 28, 2017

Steffen Klassert says:

====================
pull request (net): ipsec 2017-04-28

1) Do garbage collecting after a policy flush to remove old
   bundles immediately. From Xin Long.

2) Fix GRO if netfilter is not defined.
   From Sabrina Dubroca.

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5577e679

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · affb852d

Linus Torvalds authored Apr 28, 2017

Pull input fix from Dmitry Torokhov:
 "Yet another quirk to i8042 to get touchpad recognized on some laptops"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add Clevo P650RS to the i8042 reset list

affb852d

clk: sunxi-ng: always select CCU_GATE · 36c02d0b

Arnd Bergmann authored Apr 27, 2017

When the base driver is enabled but all SoC specific drivers are turned
off, we now get a build error after code was added to always refer to the
clk gates:

drivers/clk/built-in.o: In function `ccu_pll_notifier_cb':
:(.text+0x154f8): undefined reference to `ccu_gate_helper_disable'
:(.text+0x15504): undefined reference to `ccu_gate_helper_enable'

This changes the Kconfig to always require the gate code to be built-in
when CONFIG_SUNXI_CCU is set.

Fixes: 02ae2bc6 ("clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

36c02d0b

Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 28b20135

Linus Torvalds authored Apr 28, 2017

Pull btrfs fix from Chris Mason:
 "We have one more fix for btrfs.

  This gets rid of a new WARN_ON from rc1 that ended up making more
  noise than we really want. The larger fix for the underflow got
  delayed a bit and it's better for now to put it under
  CONFIG_BTRFS_DEBUG"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: qgroup: move noisy underflow warning to debugging build

28b20135

Merge branch 'tipc-socket-connection-hangs' · c5184717

David S. Miller authored Apr 28, 2017

Parthasarathy Bhuvaragan says:

====================
tipc: fix hanging socket connections

This patch series contains fixes for the socket layer to
prevent hanging / stale connections.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c5184717

tipc: close the connection if protocol messages contain errors · c1be7756

Parthasarathy Bhuvaragan authored Apr 26, 2017

When a socket is shutting down, we notify the peer node about the
connection termination by reusing an incoming message if possible.
If the last received message was a connection acknowledgment
message, we reverse this message and set the error code to
TIPC_ERR_NO_PORT and send it to peer.

In tipc_sk_proto_rcv(), we never check for message errors while
processing the connection acknowledgment or probe messages. Thus
this message performs the usual flow control accounting and leaves
the session hanging.

In this commit, we terminate the connection when we receive such
error messages.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1be7756

tipc: improve error validations for sockets in CONNECTING state · 4e0df495

Parthasarathy Bhuvaragan authored Apr 26, 2017

Until now, the checks for sockets in CONNECTING state was based on
the assumption that the incoming message was always from the
peer's accepted data socket.

However an application using a non-blocking socket sends an implicit
connect, this socket which is in CONNECTING state can receive error
messages from the peer's listening socket. As we discard these
messages, the application socket hangs as there due to inactivity.
In addition to this, there are other places where we process errors
but do not notify the user.

In this commit, we process such incoming error messages and notify
our users about them using sk_state_change().
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e0df495

tipc: Fix missing connection request handling · 42b531de

Parthasarathy Bhuvaragan authored Apr 26, 2017

In filter_connect, we use waitqueue_active() to check for any
connections to wakeup. But waitqueue_active() is missing memory
barriers while accessing the critical sections, leading to
inconsistent results.

In this commit, we replace this with an SMP safe wq_has_sleeper()
using the generic socket callback sk_data_ready().
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42b531de

block: hide badblocks attribute by default · 9438b3e0

Dan Williams authored Apr 27, 2017

Commit 99e6608c "block: Add badblock management for gendisks"
allowed for drivers like pmem and software-raid to advertise a list of
bad media areas. However, it inadvertently added a 'badblocks' to all
block devices. Lets clean this up by having the 'badblocks' attribute
not be visible when the driver has not populated a 'struct badblocks'
instance in the gendisk.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9438b3e0

blk-mq: unify hctx delay_work and run_work · 21c6e939

Jens Axboe authored Apr 10, 2017

The only difference between ->run_work and ->delay_work, is that
the latter is used to defer running a queue. This is done by
marking the queue stopped, and scheduling ->delay_work to run
sometime in the future. While the queue is stopped, direct runs
or runs through ->run_work will not run the queue.

If we combine the handlers, then we need to handle two things:

1) If a delayed/stopped run is scheduled, then we should not run
   the queue before that has been completed.
2) If a queue is delayed/stopped, the handler needs to restart
   the queue. Normally a run of a queue with the stopped bit set
   would be a no-op.

Case 1 is handled by modifying a currently pending queue run
to the deadline set by the caller of blk_mq_delay_queue().
Subsequent attempts to queue a queue run will find the work
item already pending, and direct runs will see a stopped queue
as before.

Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN,
that tells the work handler that it should clear a stopped
queue and run the handler.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

21c6e939

block: add kblock_mod_delayed_work_on() · 818cd1cb

Jens Axboe authored Apr 10, 2017

This modifies (or adds, if not currently pending) an existing
delayed work item.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

818cd1cb