1. 06 May, 2020 5 commits
    • usb: chipidea: udc: add software sg list support · e48aa1eb
      Peter Chen authored
      The chipidea controller doesn't support short transfers for an sg list,
      so we still set IOC on every TD; otherwise, there would be no interrupt
      for a short transfer. Each TD has five entries for data buffers, and each
      data buffer can be a non-contiguous 4KB buffer, so one TD can handle
      up to 5 sg buffers at a time. The benefit of this patch is avoiding
      OOM on low-memory systems (e.g., 256MB) during large USB transfers; see
      below for details. The non-sg handling has not changed.
      
      ufb: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
      nodemask=(null),cpuset=/,mems_allowed=0
      CPU: 2 PID: 370 Comm: ufb Not tainted 5.4.3-1.1.0+g54b3750d61fd #1
      Hardware name: NXP i.MX8MNano DDR4 EVK board (DT)
      Call trace:
       dump_backtrace+0x0/0x140
       show_stack+0x14/0x20
       dump_stack+0xb4/0xf8
       warn_alloc+0xec/0x158
       __alloc_pages_slowpath+0x9cc/0x9f8
       __alloc_pages_nodemask+0x21c/0x280
       alloc_pages_current+0x7c/0xe8
       kmalloc_order+0x1c/0x88
       __kmalloc+0x25c/0x298
       ffs_epfile_io.isra.0+0x20c/0x7d0
       ffs_epfile_read_iter+0xa8/0x188
       new_sync_read+0xe4/0x170
       __vfs_read+0x2c/0x40
       vfs_read+0xc8/0x1a0
       ksys_read+0x68/0xf0
       __arm64_sys_read+0x18/0x20
       el0_svc_common.constprop.0+0x68/0x160
       el0_svc_handler+0x20/0x80
       el0_svc+0x8/0xc
      Mem-Info:
      active_anon:2856 inactive_anon:5269 isolated_anon:12
       active_file:5238 inactive_file:18803 isolated_file:0
       unevictable:0 dirty:22 writeback:416 unstable:0
       slab_reclaimable:4073 slab_unreclaimable:3408
       mapped:727 shmem:7393 pagetables:37 bounce:0
       free:4104 free_pcp:118 free_cma:0
      Node 0 active_anon:11436kB inactive_anon:21076kB active_file:20988kB inactive_file:75216kB unevictable:0kB isolated(ano
      Node 0 DMA32 free:16820kB min:1808kB low:2260kB high:2712kB active_anon:11436kB inactive_anon:21076kB active_file:2098B
      lowmem_reserve[]: 0 0 0
      Node 0 DMA32: 508*4kB (UME) 242*8kB (UME) 730*16kB (UM) 21*32kB (UME) 5*64kB (UME) 2*128kB (M) 0*256kB 0*512kB 0*1024kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
      31455 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 0, delete 0, find 0/0
      Free swap  = 0kB
      Total swap = 0kB
      65536 pages RAM
      0 pages HighMem/MovableOnly
      10766 pages reserved
      0 pages cma reserved
      0 pages hwpoisoned
      Reviewed-by: Jun Li <jun.li@nxp.com>
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
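As a rough illustration of the TD arithmetic in the sg list commit above, the sketch below counts how many TDs an sg list would need if one TD holds five data-buffer entries of up to 4KB each, as the commit message states. The function and macro names are hypothetical, not the driver's, and the sketch assumes every sg entry fits in a single 4KB data buffer.

```c
#include <assert.h>
#include <stddef.h>

/* From the commit message: one TD carries five data-buffer entries. */
#define TD_BUF_ENTRIES 5u
/* From the commit message: each entry can address one 4KB buffer. */
#define TD_BUF_SIZE 4096u

/*
 * Hypothetical helper, not the driver's function: number of TDs needed
 * for an sg list of `nents` entries whose lengths are in `sg_len`.
 * Assumption of this sketch: every entry fits in one 4KB data buffer.
 */
size_t tds_for_sg_list(const size_t *sg_len, size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		assert(sg_len[i] <= TD_BUF_SIZE);

	/* Round up: a partially filled TD still costs one TD. */
	return (nents + TD_BUF_ENTRIES - 1) / TD_BUF_ENTRIES;
}
```

Under these assumptions, a 64KB transfer that previously needed one order:4 allocation (16 contiguous 4KB pages, as in the log above) could be described by 16 single-page sg entries spread over four TDs.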
    • usb: chipidea: usbmisc_imx: using different ops for imx7d and imx7ulp · 380a7843
      Peter Chen authored
      imx7ulp uses a different USB PHY from imx7d (MXS PHY vs PICO PHY), so the
      features supported by the non-core registers are a little different.
      For example, the autoresume feature is supported by all controllers on
      imx7ulp, but on imx7d it is only supported by the non-HSIC controller.

      Besides, these two platforms use different HSIC controllers: imx7ulp
      needs software operation, but imx7d doesn't.
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
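The per-SoC split described in the usbmisc_imx commit above can be pictured as two operation tables. This is only a sketch: the struct, field, and function names are made up for illustration and are not the driver's actual identifiers; only the two behavioral differences are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-SoC operations table (illustrative names only). */
struct misc_ops {
	const char *name;
	/* Reports whether autoresume is available for a controller. */
	bool (*autoresume_supported)(bool is_hsic);
	/* True if the HSIC controller needs software operation. */
	bool hsic_needs_software_op;
};

static bool imx7d_autoresume_supported(bool is_hsic)
{
	return !is_hsic;	/* imx7d: non-HSIC controller only */
}

static bool imx7ulp_autoresume_supported(bool is_hsic)
{
	(void)is_hsic;
	return true;		/* imx7ulp: all controllers */
}

const struct misc_ops imx7d_ops = {
	.name = "imx7d",
	.autoresume_supported = imx7d_autoresume_supported,
	.hsic_needs_software_op = false,
};

const struct misc_ops imx7ulp_ops = {
	.name = "imx7ulp",
	.autoresume_supported = imx7ulp_autoresume_supported,
	.hsic_needs_software_op = true,
};

/* Dispatch through whichever table matches the platform. */
bool can_autoresume(const struct misc_ops *ops, bool is_hsic)
{
	assert(ops != NULL && ops->autoresume_supported != NULL);
	return ops->autoresume_supported(is_hsic);
}
```

Keeping one table per platform lets shared code call `can_autoresume()` without platform checks scattered through it.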
    • usb: chipidea: pull down dp for possible charger detection operation · 5523f06a
      Peter Chen authored
      The bootloader may use device mode and keep dp pulled up. We need dp
      to be pulled down before a possible charger detection operation.
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
    • usb: chipidea: introduce imx7d USB charger detection · 746f316b
      Jun Li authored
      imx7d (and imx8mm, imx8mn) uses the Samsung PHY and the USB generic PHY
      driver. The USB generic PHY driver cannot provide charger detection
      for every user, so we implement the USB charger detection routine in the
      glue layer. After detection has finished, it notifies the USB PHY
      charger framework, and the uevents are triggered.
      Signed-off-by: Jun Li <jun.li@nxp.com>
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
    • usb: chipidea: introduce CI_HDRC_CONTROLLER_VBUS_EVENT glue layer use · d755cdb1
      Peter Chen authored
      Some vendors' glue layers need to handle vbus events; e.g.,
      some i.MX platforms (imx7d, imx8mm, imx8mn, etc.) need the vbus event
      to handle charger detection, because their charger detection is done in
      glue layer code rather than in the USB PHY driver.
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
  2. 29 Apr, 2020 1 commit
  3. 08 Apr, 2020 5 commits
  4. 30 Mar, 2020 3 commits
  5. 27 Mar, 2020 1 commit
  6. 26 Mar, 2020 6 commits
  7. 25 Mar, 2020 3 commits
    • USB: serial: option: add Wistron Neweb D19Q1 · dfee7e2f
      Pawel Dembicki authored
      This modem is embedded in the D-Link DWR-960 router.
      The OEM configuration states:
      
      T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
      D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
      P: Vendor=1435 ProdID=d191 Rev=ff.ff
      S: Manufacturer=Android
      S: Product=Android
      S: SerialNumber=0123456789ABCDEF
      C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
      E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
      E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
      E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
      E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
      
      Tested on the OpenWrt distribution.
      Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <johan@kernel.org>
    • USB: serial: option: add BroadMobi BM806U · 6cb2669c
      Pawel Dembicki authored
      BroadMobi BM806U is a Qualcomm MDM9225-based 3G/4G modem.
      The tested BM806U hardware is mounted on a D-Link DWR-921-C3 router.
      
      T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=480  MxCh= 0
      D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=2020 ProdID=2033 Rev= 2.28
      S:  Manufacturer=Mobile Connect
      S:  Product=Mobile Connect
      S:  SerialNumber=f842866cfd5a
      C:* #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Co-developed-by: Cezary Jackiewicz <cezary@eko.one.pl>
      Signed-off-by: Cezary Jackiewicz <cezary@eko.one.pl>
      Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <johan@kernel.org>
    • USB: serial: option: add support for ASKEY WWHC050 · 007d20dc
      Pawel Dembicki authored
      ASKEY WWHC050 is a mcie LTE modem.
      The OEM configuration states:
      
      T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480  MxCh= 0
      D:  Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1690 ProdID=7588 Rev=ff.ff
      S:  Manufacturer=Android
      S:  Product=Android
      S:  SerialNumber=813f0eef6e6e
      C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=88(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
      E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
      
      Tested on the OpenWrt distribution.
      Co-developed-by: Cezary Jackiewicz <cezary@eko.one.pl>
      Signed-off-by: Cezary Jackiewicz <cezary@eko.one.pl>
      Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <johan@kernel.org>
  8. 24 Mar, 2020 6 commits
  9. 23 Mar, 2020 2 commits
  10. 22 Mar, 2020 8 commits
    • Merge tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 67d584e3
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Two fixes.
      
        The first is a regression: when dropping some incompat bits the
        conditions were reversed. The other is a fix for rename whiteout
        potentially leaving stack memory linked to a list"
      
      * tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix removal of raid[56|1c34} incompat flags after removing block group
        btrfs: fix log context list corruption after rename whiteout error
    • Merge branch 'akpm' (patches from Andrew) · b3c03db6
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        x86/mm: split vmalloc_sync_all()
        mm, slub: prevent kmalloc_node crashes and memory leaks
        mm/mmu_notifier: silence PROVE_RCU_LIST warnings
        epoll: fix possible lost wakeup on epoll_ctl() path
        mm: do not allow MADV_PAGEOUT for CoW pages
        mm, memcg: throttle allocators based on ancestral memory.high
        mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
        page-flags: fix a crash at SetPageError(THP_SWAP)
        mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
        memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
    • x86/mm: split vmalloc_sync_all() · 763802b5
      Joerg Roedel authored
      Commit 3f8fd02b ("mm/vmalloc: Sync unmappings in
      __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
      the vunmap() code-path.  While this change was necessary to maintain
      correctness on x86-32-pae kernels, it also adds additional cycles for
      architectures that don't need it.
      
      Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
      severe performance regressions in micro-benchmarks because it now also
      calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
      the vmalloc_sync_all() implementation on x86-64 is only needed for newly
      created mappings.
      
      To avoid the unnecessary work on x86-64 and to gain the performance
      back, split up vmalloc_sync_all() into two functions:
      
      	* vmalloc_sync_mappings(), and
      	* vmalloc_sync_unmappings()
      
      Most call-sites to vmalloc_sync_all() only care about new mappings being
      synchronized.  The only exception is the new call-site added in the
      above mentioned commit.
      
      Shile Zhang directed us to a report of an 80% regression in reaim
      throughput.
      
      Fixes: 3f8fd02b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Borislav Petkov <bp@suse.de>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
      Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
      Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slub: prevent kmalloc_node crashes and memory leaks · 0715e6c5
      Vlastimil Babka authored
      Sachin reports [1] a crash in SLUB __slab_alloc():
      
        BUG: Kernel NULL pointer dereference on read at 0x000073b0
        Faulting instruction address: 0xc0000000003d55f4
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
        NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
        REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
        CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
        GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
        GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
        GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
        GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
        GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
        GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
        GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
        GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
        NIP ___slab_alloc+0x1f4/0x760
        LR __slab_alloc+0x34/0x60
        Call Trace:
          ___slab_alloc+0x334/0x760 (unreliable)
          __slab_alloc+0x34/0x60
          __kmalloc_node+0x110/0x490
          kvmalloc_node+0x58/0x110
          mem_cgroup_css_online+0x108/0x270
          online_css+0x48/0xd0
          cgroup_apply_control_enable+0x2ec/0x4d0
          cgroup_mkdir+0x228/0x5f0
          kernfs_iop_mkdir+0x90/0xf0
          vfs_mkdir+0x110/0x230
          do_mkdirat+0xb0/0x1a0
          system_call+0x5c/0x68
      
      This is a PowerPC platform with the following NUMA topology:
      
        available: 2 nodes (0-1)
        node 0 cpus:
        node 0 size: 0 MB
        node 0 free: 0 MB
        node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
        node 1 size: 35247 MB
        node 1 free: 30907 MB
        node distances:
        node   0   1
          0:  10  40
          1:  40  10
      
        possible numa nodes: 0-31
      
      This only happens with a mmotm patch "mm/memcontrol.c: allocate
      shrinker_map on appropriate NUMA node" [2] which effectively calls
      kmalloc_node for each possible node.  SLUB however only allocates
      kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
      node_to_mem_node to return such valid node for other nodes since commit
      a561ce00 ("slub: fall back to node_to_mem_node() node if allocating
      on memoryless node").  This is however not true in this configuration
      where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
      thus it contains zeroes and get_partial() ends up accessing
      non-allocated kmem_cache_node.
      
      A related issue was reported by Bharata (originally by Ramachandran) [3]
      where a similar PowerPC configuration, but with a mainline kernel without
      patch [2], ends up allocating large amounts of pages by kmalloc-1k and
      kmalloc-512.  This seems to have the same underlying issue with
      node_to_mem_node() not behaving as expected, and might also
      lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].
      
      This patch should fix both issues by not relying on node_to_mem_node()
      anymore and instead simply falling back to NUMA_NO_NODE, when
      kmalloc_node(node) is attempted for a node that's not online, or has no
      usable memory.  The "usable memory" condition is also changed from
      node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
      the condition that SLUB uses to allocate kmem_cache_node structures.
      The check in get_partial() is removed completely, as the checks in
      ___slab_alloc() are now sufficient to prevent get_partial() being
      reached with an invalid node.
      
      [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
      [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
      [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
      [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/
      
      Fixes: a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Tested-by: Bharata B Rao <bharata@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
      Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
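The fallback described in the slub commit above (a kmalloc_node() request for a node that is offline or has no usable memory falls back to NUMA_NO_NODE) can be modelled in userspace as follows. This is a toy model, not kernel code: the boolean array stands in for the kernel's N_NORMAL_MEMORY node state, and `pick_alloc_node()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NODES	32	/* "possible numa nodes: 0-31" in the report */

/* Toy stand-in for the N_NORMAL_MEMORY node state (assumption). */
static bool node_has_normal_memory[MAX_NODES];

/*
 * Hypothetical helper: return the node to allocate from. A request for
 * a node without usable memory is turned into NUMA_NO_NODE ("any node")
 * instead of being redirected through a per-node lookup table.
 */
int pick_alloc_node(int node)
{
	assert(node == NUMA_NO_NODE || (node >= 0 && node < MAX_NODES));

	if (node != NUMA_NO_NODE && !node_has_normal_memory[node])
		return NUMA_NO_NODE;
	return node;
}
```

With only node 1 marked as having memory, as in the reported topology, requests for node 0 or nodes 2-31 all resolve to NUMA_NO_NODE, so no per-node structure is looked up for a memoryless node.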
    • mm/mmu_notifier: silence PROVE_RCU_LIST warnings · 63886bad
      Qian Cai authored
      It is safe to traverse mm->notifier_subscriptions->list either under
      SRCU read lock or mm->notifier_subscriptions->lock using
      hlist_for_each_entry_rcu().  Silence the PROVE_RCU_LIST false positives,
      for example,
      
        WARNING: suspicious RCU usage
        -----------------------------
        mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!
      
        other info that might help us debug this:
      
        rcu_scheduler_active = 2, debug_locks = 1
        3 locks held by libvirtd/802:
         #0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
         #1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
         #2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460
      
        stack backtrace:
        CPU: 7 PID: 802 Comm: libvirtd Tainted: G          I       5.6.0-rc6-next-20200317+ #2
        Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
        Call Trace:
          dump_stack+0xa4/0xfe
          lockdep_rcu_suspicious+0xeb/0xf5
          __mmu_notifier_invalidate_range_start+0x3ff/0x460
          change_p4d_range+0x746/0x800
          change_protection+0x1df/0x300
          mprotect_fixup+0x245/0x3e0
          do_mprotect_pkey+0x23b/0x3e0
          __x64_sys_mprotect+0x51/0x70
          do_syscall_64+0x91/0xae8
          entry_SYSCALL_64_after_hwframe+0x49/0xb3
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: fix possible lost wakeup on epoll_ctl() path · 1b53734b
      Roman Penyaev authored
      This fixes a possible lost wakeup introduced by commit a218cc49.
      Originally, modifications to ep->wq were serialized by ep->wq.lock, but
      in commit a218cc49 ("epoll: use rwlock in order to reduce
      ep_poll_callback() contention") a new rwlock was introduced in order to
      relax the fd event path, i.e. callers of the ep_poll_callback() function.

      After the change, ep_modify and ep_insert (both called on the epoll_ctl()
      path) were switched to ep->lock, but ep_poll (epoll_wait) was still using
      ep->wq.lock for wqueue list modification.

      The bug doesn't lead to any wqueue list corruption, because the wake up
      path and list modifications were serialized by ep->wq.lock internally,
      but the actual waitqueue_active() check prior to the wake_up() call can be
      reordered with modifications of the ep ready list, so a wake up can be lost.

      And yes, this can be healed by an explicit smp_mb():
      
        list_add_tail(&epi->rdlink, &ep->rdllist);
        smp_mb();
        if (waitqueue_active(&ep->wq))
      	wake_up(&ep->wq);
      
      But let's keep it simple: the current patch replaces ep->wq.lock with
      ep->lock for wqueue modifications, so the wake up path always observes
      the activeness of the wqueue correctly.
      
      Fixes: a218cc49 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
      Reported-by: Max Neunhoeffer <max@arangodb.com>
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Max Neunhoeffer <max@arangodb.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Jes Sorensen <jes.sorensen@gmail.com>
      Cc: <stable@vger.kernel.org>	[5.1+]
      Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
      References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
      Bisected-by: Max Neunhoeffer <max@arangodb.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do not allow MADV_PAGEOUT for CoW pages · 12e967fd
      Michal Hocko authored
      Jann has brought up a very interesting point [1].  While shared pages
      are normally excluded from MADV_PAGEOUT, CoW pages can be easily
      reclaimed that way.  This can lead to all sorts of hard-to-debug
      problems, e.g. the performance problems outlined by Daniel [2].

      There are runtime environments where a substantial amount of memory is
      shared among security domains via CoW memory, and an easy way to reclaim
      that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either
      performance degradation for the parent process, which might be more
      privileged, or even open side channel attacks.

      The feasibility of the latter is not really clear to me TBH but there is
      no real reason for exposure at this stage.  It seems there is no real
      use case that depends on reclaiming CoW memory via madvise at this stage,
      so it is much easier to simply disallow it, and this is what this patch
      does.  Put simply, MADV_{PAGEOUT,COLD} can operate only on
      exclusively owned memory, which is a straightforward semantic.
      
      [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
      [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com
      
      Fixes: 9c276cc6 ("mm: introduce MADV_COLD")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memcg: throttle allocators based on ancestral memory.high · e26733e0
      Chris Down authored
      Prior to this commit, we only directly check the affected cgroup's
      memory.high against its usage.  However, it's possible that we are being
      reclaimed as a result of hitting an ancestor memory.high and should be
      penalised based on that, instead.
      
      This patch changes memory.high overage throttling to use the largest
      overage in its ancestors when considering how many penalty jiffies to
      charge.  This makes sure that we penalise poorly behaving cgroups in the
      same way regardless of at what level of the hierarchy memory.high was
      breached.
      
      Fixes: 0e4b01df ("mm, memcg: throttle allocators when failing reclaim over memory.high")
      Reported-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>	[5.4.x+]
      Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
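The ancestral memory.high commit above describes taking the largest overage among a cgroup and its ancestors. A minimal userspace sketch of that walk is shown below. The struct and function names are hypothetical, and measuring overage as an absolute page count is an assumption of this sketch; the kernel may scale the value differently before converting it to penalty jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cgroup node; field names are illustrative, not the kernel's. */
struct cg {
	unsigned long usage;		/* pages currently charged */
	unsigned long high;		/* memory.high limit, in pages */
	const struct cg *parent;	/* NULL at the root */
};

/* How far one cgroup's usage exceeds its memory.high (0 if under). */
static unsigned long overage(const struct cg *c)
{
	assert(c != NULL);
	return c->usage > c->high ? c->usage - c->high : 0;
}

/* Largest overage found while walking from `c` up to the root. */
unsigned long max_ancestral_overage(const struct cg *c)
{
	unsigned long max = 0;

	for (; c != NULL; c = c->parent) {
		unsigned long o = overage(c);

		if (o > max)
			max = o;
	}
	return max;
}
```

In this model, a leaf that is under its own limit is still penalised when an ancestor is over its limit, which is the behavior the commit message describes.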