1. 04 Jun, 2020 27 commits
    • Mike Rapoport's avatar
      mm: free_area_init: allow defining max_zone_pfn in descending order · 51930df5
      Mike Rapoport authored
      Some architectures (e.g.  ARC) have the ZONE_HIGHMEM zone below the
      ZONE_NORMAL.  Allowing free_area_init() parse max_zone_pfn array even it
      is sorted in descending order allows using free_area_init() on such
      architectures.
      
      Add top -> down traversal of max_zone_pfn array in free_area_init() and
      use the latter in ARC node/zone initialization.
      
      [rppt@kernel.org: ARC fix]
        Link: http://lkml.kernel.org/r/20200504153901.GM14260@kernel.org
      [rppt@linux.ibm.com: arc: free_area_init(): take into account PAE40 mode]
        Link: http://lkml.kernel.org/r/20200507205900.GH683243@linux.ibm.com
      [akpm@linux-foundation.org: declare arch_has_descending_max_zone_pfns()]
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Link: http://lkml.kernel.org/r/20200412194859.12663-18-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      51930df5
    • Mike Rapoport's avatar
      mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES · acd3f5c4
      Mike Rapoport authored
      The memmap_init() function was made to iterate over memblock regions and
      as the result the early_pfn_in_nid() function became obsolete.  Since
      CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
      implementation of early_pfn_in_nid(), it is also not needed anymore.
      
      Remove both early_pfn_in_nid() and the CONFIG_NODES_SPAN_OTHER_NODES.
      Co-developed-by: default avatarHoan Tran <Hoan@os.amperecomputing.com>
      Signed-off-by: default avatarHoan Tran <Hoan@os.amperecomputing.com>
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      acd3f5c4
    • Baoquan He's avatar
      mm: memmap_init: iterate over memblock regions rather that check each PFN · 73a6e474
      Baoquan He authored
      When called during boot the memmap_init_zone() function checks if each PFN
      is valid and actually belongs to the node being initialized using
      early_pfn_valid() and early_pfn_in_nid().
      
      Each such check may cost up to O(log(n)) where n is the number of memory
      banks, so for large amount of memory overall time spent in early_pfn*()
      becomes substantial.
      
      Since the information is anyway present in memblock, we can iterate over
      memblock memory regions in memmap_init() and only call memmap_init_zone()
      for PFN ranges that are know to be valid and in the appropriate node.
      
      [cai@lca.pw: fix a compilation warning from Clang]
        Link: http://lkml.kernel.org/r/CF6E407F-17DC-427C-8203-21979FB882EF@lca.pw
      [bhe@redhat.com: fix the incorrect hole in fast_isolate_freepages()]
        Link: http://lkml.kernel.org/r/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
        Link: http://lkml.kernel.org/r/20200521014407.29690-1-bhe@redhat.comSigned-off-by: default avatarBaoquan He <bhe@redhat.com>
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Qian Cai <cai@lca.pw>
      Link: http://lkml.kernel.org/r/20200412194859.12663-16-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      73a6e474
    • Mike Rapoport's avatar
      xtensa: simplify detection of memory zone boundaries · da50c57b
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-15-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da50c57b
    • Mike Rapoport's avatar
      unicore32: simplify detection of memory zone boundaries · 1b02ec01
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-14-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1b02ec01
    • Mike Rapoport's avatar
      sparc32: simplify detection of memory zone boundaries · bee3b3cc
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-13-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bee3b3cc
    • Mike Rapoport's avatar
      parisc: simplify detection of memory zone boundaries · 625bf73e
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-12-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      625bf73e
    • Mike Rapoport's avatar
      m68k: mm: simplify detection of memory zone boundaries · 5d2ee1a1
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-11-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5d2ee1a1
    • Mike Rapoport's avatar
      csky: simplify detection of memory zone boundaries · 8f4693f0
      Mike Rapoport authored
      The free_area_init() function only requires the definition of maximal PFN
      for each of the supported zone rater than calculation of actual zone sizes
      and the sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-10-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f4693f0
    • Mike Rapoport's avatar
      arm64: simplify detection of memory zone boundaries for UMA configs · 584cb13d
      Mike Rapoport authored
      The free_area_init() function only requires the definition of maximal PFN
      for each of the supported zone rater than calculation of actual zone sizes
      and the sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-9-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      584cb13d
    • Mike Rapoport's avatar
      arm: simplify detection of memory zone boundaries · a32c1c61
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-8-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a32c1c61
    • Mike Rapoport's avatar
      alpha: simplify detection of memory zone boundaries · 30760203
      Mike Rapoport authored
      free_area_init() only requires the definition of maximal PFN for each of
      the supported zone rater than calculation of actual zone sizes and the
      sizes of the holes between the zones.
      
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-7-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30760203
    • Mike Rapoport's avatar
      mm: use free_area_init() instead of free_area_init_nodes() · 9691a071
      Mike Rapoport authored
      free_area_init() has effectively became a wrapper for
      free_area_init_nodes() and there is no point of keeping it.  Still
      free_area_init() name is shorter and more general as it does not imply
      necessity to initialize multiple nodes.
      
      Rename free_area_init_nodes() to free_area_init(), update the callers and
      drop old version of free_area_init().
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9691a071
    • Mike Rapoport's avatar
      mm: free_area_init: use maximal zone PFNs rather than zone sizes · fa3354e4
      Mike Rapoport authored
      Currently, architectures that use free_area_init() to initialize memory
      map and node and zone structures need to calculate zone and hole sizes.
      We can use free_area_init_nodes() instead and let it detect the zone
      boundaries while the architectures will only have to supply the possible
      limits for the zones.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-5-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fa3354e4
    • Mike Rapoport's avatar
      mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option · 3f08a302
      Mike Rapoport authored
      CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
      nodes and zones structures between the systems that have region to node
      mapping in memblock and those that don't.
      
      Currently all the NUMA architectures enable this option and for the
      non-NUMA systems we can presume that all the memory belongs to node 0 and
      therefore the compile time configuration option is not required.
      
      The remaining few architectures that use DISCONTIGMEM without NUMA are
      easily updated to use memblock_add_node() instead of memblock_add() and
      thus have proper correspondence of memblock regions to NUMA nodes.
      
      Still, free_area_init_node() must have a backward compatible version
      because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
      different.  Once all the architectures will use the new semantics, the
      entire compatibility layer can be dropped.
      
      To avoid addition of extra run time memory to store node id for
      architectures that keep memblock but have only a single node, the node id
      field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
      the corresponding accessors presume that in those cases it is always 0.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3f08a302
    • Mike Rapoport's avatar
      mm: make early_pfn_to_nid() and related defintions close to each other · 6f24fbd3
      Mike Rapoport authored
      early_pfn_to_nid() and its helper __early_pfn_to_nid() are spread around
      include/linux/mm.h, include/linux/mmzone.h and mm/page_alloc.c.
      
      Drop unused stub for __early_pfn_to_nid() and move its actual generic
      implementation close to its users.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-3-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f24fbd3
    • Mike Rapoport's avatar
      mm: memblock: replace dereferences of memblock_region.nid with API calls · d622abf7
      Mike Rapoport authored
      Patch series "mm: rework free_area_init*() funcitons".
      
      After the discussion [1] about removal of CONFIG_NODES_SPAN_OTHER_NODES
      and CONFIG_HAVE_MEMBLOCK_NODE_MAP options, I took it a bit further and
      updated the node/zone initialization.
      
      Since all architectures have memblock, it is possible to use only the
      newer version of free_area_init_node() that calculates the zone and node
      boundaries based on memblock node mapping and architectural limits on
      possible zone PFNs.
      
      The architectures that still determined zone and hole sizes can be
      switched to the generic code and the old code that took those zone and
      hole sizes can be simply removed.
      
      And, since it all started from the removal of
      CONFIG_NODES_SPAN_OTHER_NODES, the memmap_init() is now updated to iterate
      over memblocks and so it does not need to perform early_pfn_to_nid() query
      for every PFN.
      
      [1] https://lore.kernel.org/lkml/1585420282-25630-1-git-send-email-Hoan@os.amperecomputing.com
      
      This patch (of 21):
      
      There are several places in the code that directly dereference
      memblock_region.nid despite this field being defined only when
      CONFIG_HAVE_MEMBLOCK_NODE_MAP=y.
      
      Replace these with calls to memblock_get_region_nid() to improve code
      robustness and to avoid possible breakage when
      CONFIG_HAVE_MEMBLOCK_NODE_MAP will be removed.
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200412194859.12663-2-rppt@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d622abf7
    • Michal Hocko's avatar
      mm: clarify __GFP_MEMALLOC usage · 574c1ae6
      Michal Hocko authored
      It seems that the existing documentation is not explicit about the
      expected usage and potential risks enough.  While it is calls out that
      users have to free memory when using this flag it is not really apparent
      that users have to careful to not deplete memory reserves and that they
      should implement some sort of throttling wrt.  freeing process.
      
      This is partly based on Neil's explanation [1].
      
      Let's also call out that a pre allocated pool allocator should be
      considered.
      
      [1] http://lkml.kernel.org/r/877dz0yxoa.fsf@notabene.neil.brown.name
      
      [akpm@linux-foundation.org: coding style fixes]
      [mhocko@kernel.org: update]
        Link: http://lkml.kernel.org/r/20200406070137.GC19426@dhcp22.suse.czSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Link: http://lkml.kernel.org/r/20200403083543.11552-2-mhocko@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      574c1ae6
    • Daniel Axtens's avatar
      string.h: fix incompatibility between FORTIFY_SOURCE and KASAN · 47227d27
      Daniel Axtens authored
      The memcmp KASAN self-test fails on a kernel with both KASAN and
      FORTIFY_SOURCE.
      
      When FORTIFY_SOURCE is on, a number of functions are replaced with
      fortified versions, which attempt to check the sizes of the operands.
      However, these functions often directly invoke __builtin_foo() once they
      have performed the fortify check.  Using __builtins may bypass KASAN
      checks if the compiler decides to inline it's own implementation as
      sequence of instructions, rather than emit a function call that goes out
      to a KASAN-instrumented implementation.
      
      Why is only memcmp affected?
      ============================
      
      Of the string and string-like functions that kasan_test tests, only memcmp
      is replaced by an inline sequence of instructions in my testing on x86
      with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2).
      
      I believe this is due to compiler heuristics.  For example, if I annotate
      kmalloc calls with the alloc_size annotation (and disable some fortify
      compile-time checking!), the compiler will replace every memset except the
      one in kmalloc_uaf_memset with inline instructions.  (I have some WIP
      patches to add this annotation.)
      
      Does this affect other functions in string.h?
      =============================================
      
      Yes. Anything that uses __builtin_* rather than __real_* could be
      affected. This looks like:
      
       - strncpy
       - strcat
       - strlen
       - strlcpy maybe, under some circumstances?
       - strncat under some circumstances
       - memset
       - memcpy
       - memmove
       - memcmp (as noted)
       - memchr
       - strcpy
      
      Whether a function call is emitted always depends on the compiler.  Most
      bugs should get caught by FORTIFY_SOURCE, but the missed memcmp test shows
      that this is not always the case.
      
      Isn't FORTIFY_SOURCE disabled with KASAN?
      ========================================-
      
      The string headers on all arches supporting KASAN disable fortify with
      kasan, but only when address sanitisation is _also_ disabled.  For example
      from x86:
      
       #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
       /*
        * For files that are not instrumented (e.g. mm/slub.c) we
        * should use not instrumented version of mem* functions.
        */
       #define memcpy(dst, src, len) __memcpy(dst, src, len)
       #define memmove(dst, src, len) __memmove(dst, src, len)
       #define memset(s, c, n) __memset(s, c, n)
      
       #ifndef __NO_FORTIFY
       #define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
       #endif
      
       #endif
      
      This comes from commit 6974f0c4 ("include/linux/string.h: add the
      option of fortified string.h functions"), and doesn't work when KASAN is
      enabled and the file is supposed to be sanitised - as with test_kasan.c
      
      I'm pretty sure this is not wrong, but not as expansive it should be:
      
       * we shouldn't use __builtin_memcpy etc in files where we don't have
         instrumentation - it could devolve into a function call to memcpy,
         which will be instrumented. Rather, we should use __memcpy which
         by convention is not instrumented.
      
       * we also shouldn't be using __builtin_memcpy when we have a KASAN
         instrumented file, because it could be replaced with inline asm
         that will not be instrumented.
      
      What is correct behaviour?
      ==========================
      
      Firstly, there is some overlap between fortification and KASAN: both
      provide some level of _runtime_ checking. Only fortify provides
      compile-time checking.
      
      KASAN and fortify can pick up different things at runtime:
      
       - Some fortify functions, notably the string functions, could easily be
         modified to consider sub-object sizes (e.g. members within a struct),
         and I have some WIP patches to do this. KASAN cannot detect these
         because it cannot insert poision between members of a struct.
      
       - KASAN can detect many over-reads/over-writes when the sizes of both
         operands are unknown, which fortify cannot.
      
      So there are a couple of options:
      
       1) Flip the test: disable fortify in santised files and enable it in
          unsanitised files. This at least stops us missing KASAN checking, but
          we lose the fortify checking.
      
       2) Make the fortify code always call out to real versions. Do this only
          for KASAN, for fear of losing the inlining opportunities we get from
          __builtin_*.
      
      (We can't use kasan_check_{read,write}: because the fortify functions are
      _extern inline_, you can't include _static_ inline functions without a
      compiler warning. kasan_check_{read,write} are static inline so we can't
      use them even when they would otherwise be suitable.)
      
      Take approach 2 and call out to real versions when KASAN is enabled.
      
      Use __underlying_foo to distinguish from __real_foo: __real_foo always
      refers to the kernel's implementation of foo, __underlying_foo could be
      either the kernel implementation or the __builtin_foo implementation.
      
      This is sometimes enough to make the memcmp test succeed with
      FORTIFY_SOURCE enabled. It is at least enough to get the function call
      into the module. One more fix is needed to make it reliable: see the next
      patch.
      
      Fixes: 6974f0c4 ("include/linux/string.h: add the option of fortified string.h functions")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarDavid Gow <davidgow@google.com>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Link: http://lkml.kernel.org/r/20200423154503.5103-3-dja@axtens.netSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      47227d27
    • Daniel Axtens's avatar
      kasan: stop tests being eliminated as dead code with FORTIFY_SOURCE · adb72ae1
      Daniel Axtens authored
      Patch series "Fix some incompatibilites between KASAN and FORTIFY_SOURCE", v4.
      
      3 KASAN self-tests fail on a kernel with both KASAN and FORTIFY_SOURCE:
      memchr, memcmp and strlen.
      
      When FORTIFY_SOURCE is on, a number of functions are replaced with
      fortified versions, which attempt to check the sizes of the operands.
      However, these functions often directly invoke __builtin_foo() once they
      have performed the fortify check.  The compiler can detect that the
      results of these functions are not used, and knows that they have no other
      side effects, and so can eliminate them as dead code.
      
      Why are only memchr, memcmp and strlen affected?
      ================================================
      
      Of string and string-like functions, kasan_test tests:
      
       * strchr  ->  not affected, no fortified version
       * strrchr ->  likewise
       * strcmp  ->  likewise
       * strncmp ->  likewise
      
       * strnlen ->  not affected, the fortify source implementation calls the
                     underlying strnlen implementation which is instrumented, not
                     a builtin
      
       * strlen  ->  affected, the fortify souce implementation calls a __builtin
                     version which the compiler can determine is dead.
      
       * memchr  ->  likewise
       * memcmp  ->  likewise
      
       * memset ->   not affected, the compiler knows that memset writes to its
      	       first argument and therefore is not dead.
      
      Why does this not affect the functions normally?
      ================================================
      
      In string.h, these functions are not marked as __pure, so the compiler
      cannot know that they do not have side effects.  If relevant functions are
      marked as __pure in string.h, we see the following warnings and the
      functions are elided:
      
      lib/test_kasan.c: In function `kasan_memchr':
      lib/test_kasan.c:606:2: warning: statement with no effect [-Wunused-value]
        memchr(ptr, '1', size + 1);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~
      lib/test_kasan.c: In function `kasan_memcmp':
      lib/test_kasan.c:622:2: warning: statement with no effect [-Wunused-value]
        memcmp(ptr, arr, size+1);
        ^~~~~~~~~~~~~~~~~~~~~~~~
      lib/test_kasan.c: In function `kasan_strings':
      lib/test_kasan.c:645:2: warning: statement with no effect [-Wunused-value]
        strchr(ptr, '1');
        ^~~~~~~~~~~~~~~~
      ...
      
      This annotation would make sense to add and could be added at any point,
      so the behaviour of test_kasan.c should change.
      
      The fix
      =======
      
      Make all the functions that are pure write their results to a global,
      which makes them live.  The strlen and memchr tests now pass.
      
      The memcmp test still fails to trigger, which is addressed in the next
      patch.
      
      [dja@axtens.net: drop patch 3]
        Link: http://lkml.kernel.org/r/20200424145521.8203-2-dja@axtens.net
      Fixes: 0c96350a ("lib/test_kasan.c: add tests for several string/memory API functions")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarDavid Gow <davidgow@google.com>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Link: http://lkml.kernel.org/r/20200423154503.5103-1-dja@axtens.net
      Link: http://lkml.kernel.org/r/20200423154503.5103-2-dja@axtens.netSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      adb72ae1
    • John Hubbard's avatar
      mm/gup: might_lock_read(mmap_sem) in get_user_pages_fast() · f81cd178
      John Hubbard authored
      Instead of scattering these assertions across the drivers, do this
      assertion inside the core of get_user_pages_fast*() functions.  That also
      includes pin_user_pages_fast*() routines.
      
      Add a might_lock_read(mmap_sem) call to internal_get_user_pages_fast().
      Suggested-by: default avatarMatthew Wilcox <willy@infradead.org>
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Link: http://lkml.kernel.org/r/20200522010443.1290485-1-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f81cd178
    • John Hubbard's avatar
      drm/i915: convert get_user_pages() --> pin_user_pages() · 2170ecfa
      John Hubbard authored
      This code was using get_user_pages*(), in a "Case 2" scenario (DMA/RDMA),
      using the categorization from [1].  That means that it's time to convert
      the get_user_pages*() + put_page() calls to pin_user_pages*() +
      unpin_user_pages() calls.
      
      There is some helpful background in [2]: basically, this is a small part
      of fixing a long-standing disconnect between pinning pages, and file
      systems' use of those pages.
      
      [1] Documentation/core-api/pin_user_pages.rst
      
      [2] "Explicit pinning of user-space pages":
          https://lwn.net/Articles/807108/Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-5-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2170ecfa
    • John Hubbard's avatar
      mm/gup: introduce pin_user_pages_fast_only() · 104acc32
      John Hubbard authored
      This is the FOLL_PIN equivalent of __get_user_pages_fast(), except with a
      more descriptive name, and gup_flags instead of a boolean "write" in the
      argument list.
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-4-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      104acc32
    • John Hubbard's avatar
      mm/gup: refactor and de-duplicate gup_fast() code · 376a34ef
      John Hubbard authored
      There were two nearly identical sets of code for gup_fast() style of
      walking the page tables with interrupts disabled.  This has lead to the
      usual maintenance problems that arise from having duplicated code.
      
      There is already a core internal routine in gup.c for gup_fast(), so just
      enhance it very slightly: allow skipping the fall-back to "slow" (regular)
      get_user_pages(), via the new FOLL_FAST_ONLY flag.  Then, just call
      internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust
      the API to match pre-existing API behavior.
      
      There is a change in behavior from this refactoring: the nested form of
      interrupt disabling is used in all gup_fast() variants now.  That's
      because there is only one place that interrupt disabling for page walking
      is done, and so the safer form is required.  This should, if anything,
      eliminate possible (rare) bugs, because the non-nested form of enabling
      interrupts was fragile at best.
      
      [jhubbard@nvidia.com: fixup]
        Link: http://lkml.kernel.org/r/20200521233841.1279742-1-jhubbard@nvidia.comSigned-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200519002124.2025955-3-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      376a34ef
    • John Hubbard's avatar
      mm/gup: move __get_user_pages_fast() down a few lines in gup.c · 9e1f0580
      John Hubbard authored
      Patch series "mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()", v2.
      
      In order to convert the drm/i915 driver from get_user_pages() to
      pin_user_pages(), a FOLL_PIN equivalent of __get_user_pages_fast() was
      required.  That led to refactoring __get_user_pages_fast(), with the
      following goals:
      
      1) As above: provide a pin_user_pages*() routine for drm/i915 to call,
         in place of __get_user_pages_fast(),
      
      2) Get rid of the gup.c duplicate code for walking page tables with
         interrupts disabled. This duplicate code is a minor maintenance
         problem anyway.
      
      3) Make it easy for an upcoming patch from Souptick, which aims to
         convert __get_user_pages_fast() to use a gup_flags argument, instead
         of a bool writeable arg.  Also, if this series looks good, we can
         ask Souptick to change the name as well, to whatever the consensus
         is. My initial recommendation is: get_user_pages_fast_only(), to
         match the new pin_user_pages_only().
      
      This patch (of 4):
      
      This is in order to avoid a forward declaration of
      internal_get_user_pages_fast(), in the next patch.
      
      This is code movement only--all generated code should be identical.
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://lkml.kernel.org/r/20200522051931.54191-1-jhubbard@nvidia.com
      Link: http://lkml.kernel.org/r/20200519002124.2025955-1-jhubbard@nvidia.com
      Link: http://lkml.kernel.org/r/20200519002124.2025955-2-jhubbard@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e1f0580
    • Shakeel Butt's avatar
      mm/memcg: optimize memory.numa_stat like memory.stat · dd8657b6
      Shakeel Butt authored
      Currently reading memory.numa_stat traverses the underlying memcg tree
      multiple times to accumulate the stats to present the hierarchical view of
      the memcg tree.  However the kernel already maintains the hierarchical
      view of the stats and use it in memory.stat.  Just use the same mechanism
      in memory.numa_stat as well.
      
      I ran a simple benchmark which reads root_mem_cgroup's memory.numa_stat
      file in the presense of 10000 memcgs.  The results are:
      
      Without the patch:
      $ time cat /dev/cgroup/memory/memory.numa_stat > /dev/null
      
      real    0m0.700s
      user    0m0.001s
      sys     0m0.697s
      
      With the patch:
      $ time cat /dev/cgroup/memory/memory.numa_stat > /dev/null
      
      real    0m0.001s
      user    0m0.001s
      sys     0m0.000s
      
      [akpm@linux-foundation.org: avoid forcing out-of-line code generation]
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Link: http://lkml.kernel.org/r/20200304022058.248270-1-shakeelb@google.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd8657b6
    • Wang Hai's avatar
      mm/slub: fix a memory leak in sysfs_slab_add() · dde3c6b7
      Wang Hai authored
      syzkaller reports for memory leak when kobject_init_and_add() returns an
      error in the function sysfs_slab_add() [1]
      
      When this happened, the function kobject_put() is not called for the
      corresponding kobject, which potentially leads to memory leak.
      
      This patch fixes the issue by calling kobject_put() even if
      kobject_init_and_add() fails.
      
      [1]
        BUG: memory leak
        unreferenced object 0xffff8880a6d4be88 (size 8):
        comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
        hex dump (first 8 bytes):
          70 69 64 5f 33 00 ff ff                          pid_3...
        backtrace:
           kstrdup+0x35/0x70 mm/util.c:60
           kstrdup_const+0x3d/0x50 mm/util.c:82
           kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
           kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
           kobject_add_varg lib/kobject.c:384 [inline]
           kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
           sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
           __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
           create_cache+0x113/0x1e0 mm/slab_common.c:407
           kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
           kmem_cache_create+0xd/0x10 mm/slab_common.c:564
           create_pid_cachep kernel/pid_namespace.c:54 [inline]
           create_pid_namespace kernel/pid_namespace.c:96 [inline]
           copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
           create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
           unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
           ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
           __do_sys_unshare kernel/fork.c:3037 [inline]
           __se_sys_unshare kernel/fork.c:3035 [inline]
           __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
           do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
      
      Fixes: 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWang Hai <wanghai38@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dde3c6b7
  2. 03 Jun, 2020 10 commits
    • Linus Torvalds's avatar
      Merge tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · d6f9469a
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "The most interesting part is the new mount api conversion, which is
        actually a old patch already pending for several cycles. And the
        others are recent trivial cleanups here.
      
        Summary:
      
         - Convert to use the new mount apis
      
         - Some random cleanup patches"
      
      * tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: suppress false positive last_block warning
        erofs: convert to use the new mount fs_context api
        erofs: code cleanup by removing ifdef macro surrounding
      d6f9469a
    • Linus Torvalds's avatar
      Merge tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy · cadf3223
      Linus Torvalds authored
      Pull JFS update from David Kleikamp:
       "Replace zero-length array in JFS"
      
      * tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy:
        jfs: Replace zero-length array with flexible-array member
      cadf3223
    • Linus Torvalds's avatar
      Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f3cdc8ae
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "Highlights:
      
         - speedup dead root detection during orphan cleanup, eg. when there
           are many deleted subvolumes waiting to be cleaned, the trees are
           now looked up in radix tree instead of a O(N^2) search
      
         - snapshot creation with inherited qgroup will mark the qgroup
           inconsistent, requires a rescan
      
         - send will emit file capabilities after chown, this produces a
           stream that does not need postprocessing to set the capabilities
           again
      
         - direct io ported to iomap infrastructure, cleaned up and simplified
           code, notably removing last use of struct buffer_head in btrfs code
      
        Core changes:
      
         - factor out backreference iteration, to be used by ordinary
           backreferences and relocation code
      
         - improved global block reserve utilization
            * better logic to serialize requests
            * increased maximum available for unlink
            * improved handling on large pages (64K)
      
         - direct io cleanups and fixes
            * simplify layering, where cloned bios were unnecessarily created
              for some cases
            * error handling fixes (submit, endio)
            * remove repair worker thread, used to avoid deadlocks during
              repair
      
         - refactored block group reading code, preparatory work for new type
           of block group storage that should improve mount time on large
           filesystems
      
        Cleanups:
      
         - cleaned up (and slightly sped up) set/get helpers for metadata data
           structure members
      
         - root bit REF_COWS got renamed to SHAREABLE to reflect the that the
           blocks of the tree get shared either among subvolumes or with the
           relocation trees
      
        Fixes:
      
         - when subvolume deletion fails due to ENOSPC, the filesystem is not
           turned read-only
      
         - device scan deals with devices from other filesystems that changed
           ownership due to overwrite (mkfs)
      
         - fix a race between scrub and block group removal/allocation
      
         - fix long standing bug of a runaway balance operation, printing the
           same line to the syslog, caused by a stale status bit on a reloc
           tree that prevented progress
      
         - fix corrupt log due to concurrent fsync of inodes with shared
           extents
      
         - fix space underflow for NODATACOW and buffered writes when it for
           some reason needs to fallback to COW mode"
      
      * tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
        btrfs: fix space_info bytes_may_use underflow during space cache writeout
        btrfs: fix space_info bytes_may_use underflow after nocow buffered write
        btrfs: fix wrong file range cleanup after an error filling dealloc range
        btrfs: remove redundant local variable in read_block_for_search
        btrfs: open code key_search
        btrfs: split btrfs_direct_IO to read and write part
        btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
        fs: remove dio_end_io()
        btrfs: switch to iomap_dio_rw() for dio
        iomap: remove lockdep_assert_held()
        iomap: add a filesystem hook for direct I/O bio submission
        fs: export generic_file_buffered_read()
        btrfs: turn space cache writeout failure messages into debug messages
        btrfs: include error on messages about failure to write space/inode caches
        btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
        btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
        btrfs: make checksum item extension more efficient
        btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
        btrfs: unexport btrfs_compress_set_level()
        btrfs: simplify iget helpers
        ...
      f3cdc8ae
    • Linus Torvalds's avatar
      Merge tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 8eeae5ba
      Linus Torvalds authored
      Pull DAX updates part two from Darrick Wong:
       "This time around, we're hoisting the DONTCACHE flag from XFS into the
        VFS so that we can make the incore DAX mode changes become effective
        sooner.
      
        We can't change the file data access mode on a live inode because we
        don't have a safe way to change the file ops pointers. The incore
        state change becomes effective at inode loading time, which can happen
        if the inode is evicted. Therefore, we're making it so that
        filesystems can ask the VFS to evict the inode as soon as the last
        holder drops.
      
        The per-fs changes to make this call this will be in subsequent pull
        requests from Ted and myself.
      
        Summary:
      
         - Introduce DONTCACHE flags for dentries and inodes. This hint will
           cause the VFS to drop the associated objects immediately after the
           last put, so that we can change the file access mode (DAX or page
           cache) on the fly"
      
      * tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fs: Introduce DCACHE_DONTCACHE
        fs: Lift XFS_IDONTCACHE to the VFS layer
      8eeae5ba
    • Linus Torvalds's avatar
      Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 96ed320d
      Linus Torvalds authored
      Pull DAX updates part one from Darrick Wong:
       "After many years of LKML-wrangling about how to enable programs to
        query and influence the file data access mode (DAX) when a filesystem
        resides on storage devices such as persistent memory, Ira Weiny has
        emerged with a proposed set of standard behaviors that has not been
        shot down by anyone! We're more or less standardizing on the current
        XFS behavior and adapting ext4 to do the same.
      
        This is the first of a handful pull requests that will make ext4 and
        XFS present a consistent interface for user programs that care about
        DAX. We add a statx attribute that programs can check to see if DAX is
        enabled on a particular file. Then, we update the DAX documentation to
        spell out the user-visible behaviors that filesystems will guarantee
        (until the next storage industry shakeup). The on-disk inode flag has
        been in XFS for a few years now.
      
        Summary:
      
         - Clean up io_is_direct.
      
         - Add a new statx flag to indicate when file data access is being
           done via DAX (as opposed to the page cache).
      
         - Update the documentation for how system administrators and
           application programmers can take advantage of the (still
           experimental DAX) feature"
      
      Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/
      
      * tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        Documentation/dax: Update Usage section
        fs/stat: Define DAX statx attribute
        fs: Remove unneeded IS_DAX() check in io_is_direct()
      96ed320d
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 16d91548
      Linus Torvalds authored
      Pull xfs updates from Darrick Wong:
       "Most of the changes this cycle are refactoring of existing code in
        preparation for things landing in the future.
      
        We also fixed various problems and deficiencies in the quota
        implementation, and (I hope) the last of the stale read vectors by
        forcing write allocations to go through the unwritten state until the
        write completes.
      
        Summary:
      
         - Various cleanups to remove dead code, unnecessary conditionals,
           asserts, etc.
      
         - Fix a linker warning caused by xfs stuffing '-g' into CFLAGS
           redundantly.
      
         - Tighten up our dmesg logging to ensure that everything is prefixed
           with 'XFS' for easier grepping.
      
         - Kill a bunch of typedefs.
      
         - Refactor the deferred ops code to reduce indirect function calls.
      
         - Increase type-safety with the deferred ops code.
      
         - Make the DAX mount options a tri-state.
      
         - Fix some error handling problems in the inode flush code and clean
           up other inode flush warts.
      
         - Refactor log recovery so that each log item recovery functions now
           live with the other log item processing code.
      
         - Fix some SPDX forms.
      
         - Fix quota counter corruption if the fs crashes after running
           quotacheck but before any dquots get logged.
      
         - Don't fail metadata verification on zero-entry attr leaf blocks,
           since they're just part of the disk format now due to a historic
           lack of log atomicity.
      
         - Don't allow SWAPEXT between files with different [ugp]id when
           quotas are enabled.
      
         - Refactor inode fork reading and verification to run directly from
           the inode-from-disk function. This means that we now actually
           guarantee that _iget'ted inodes are totally verified and ready to
           go.
      
         - Move the incore inode fork format and extent counts to the ifork
           structure.
      
         - Scalability improvements by reducing cacheline pingponging in
           struct xfs_mount.
      
         - More scalability improvements by removing m_active_trans from the
           hot path.
      
         - Fix inode counter update sanity checking to run /only/ on debug
           kernels.
      
         - Fix longstanding inconsistency in what error code we return when a
           program hits project quota limits (ENOSPC).
      
         - Fix group quota returning the wrong error code when a program hits
           group quota limits.
      
         - Fix per-type quota limits and grace periods for group and project
           quotas so that they actually work.
      
         - Allow extension of individual grace periods.
      
         - Refactor the non-reclaim inode radix tree walking code to remove a
           bunch of stupid little functions and straighten out the
           inconsistent naming schemes.
      
         - Fix a bug in speculative preallocation where we measured a new
           allocation based on the last extent mapping in the file instead of
           looking farther for the last contiguous space allocation.
      
         - Force delalloc writes to unwritten extents. This closes a stale
           disk contents exposure vector if the system goes down before the
           write completes.
      
         - More lockdep whackamole"
      
      * tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (129 commits)
        xfs: more lockdep whackamole with kmem_alloc*
        xfs: force writes to delalloc regions to unwritten
        xfs: refactor xfs_iomap_prealloc_size
        xfs: measure all contiguous previous extents for prealloc size
        xfs: don't fail unwritten extent conversion on writeback due to edquot
        xfs: rearrange xfs_inode_walk_ag parameters
        xfs: straighten out all the naming around incore inode tree walks
        xfs: move xfs_inode_ag_iterator to be closer to the perag walking code
        xfs: use bool for done in xfs_inode_ag_walk
        xfs: fix inode ag walk predicate function return values
        xfs: refactor eofb matching into a single helper
        xfs: remove __xfs_icache_free_eofblocks
        xfs: remove flags argument from xfs_inode_ag_walk
        xfs: remove xfs_inode_ag_iterator_flags
        xfs: remove unused xfs_inode_ag_iterator function
        xfs: replace open-coded XFS_ICI_NO_TAG
        xfs: move eofblocks conversion function to xfs_ioctl.c
        xfs: allow individual quota grace period extension
        xfs: per-type quota timers and warn limits
        xfs: switch xfs_get_defquota to take explicit type
        ...
      16d91548
    • Linus Torvalds's avatar
      Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · d9afbb35
      Linus Torvalds authored
      Pull lockdown update from James Morris:
       "An update for the security subsystem to allow unprivileged users
        to see the status of the lockdown feature. From Jeremy Cline"
      
      Also an added comment to describe CAP_SETFCAP.
      
      * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        capabilities: add description for CAP_SETFCAP
        lockdown: Allow unprivileged users to see lockdown status
      d9afbb35
    • Linus Torvalds's avatar
      Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · f41030a2
      Linus Torvalds authored
      Pull SELinux updates from Paul Moore:
       "The highlights:
      
         - A number of improvements to various SELinux internal data
           structures to help improve performance. We move the role
           transitions into a hash table. In the content structure we shift
           from hashing the content string (aka SELinux label) to the
           structure itself, when it is valid. This last change not only
           offers a speedup, but it helps us simplify the code some as well.
      
         - Add a new SELinux policy version which allows for a more space
           efficient way of storing the filename transitions in the binary
           policy. Given the default Fedora SELinux policy with the unconfined
           module enabled, this change drops the policy size from ~7.6MB to
           ~3.3MB. The kernel policy load time dropped as well.
      
         - Some fixes to the error handling code in the policy parser to
           properly return error codes when things go wrong"
      
      * tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: netlabel: Remove unused inline function
        selinux: do not allocate hashtabs dynamically
        selinux: fix return value on error in policydb_read()
        selinux: simplify range_write()
        selinux: fix error return code in policydb_read()
        selinux: don't produce incorrect filename_trans_count
        selinux: implement new format of filename transitions
        selinux: move context hashing under sidtab
        selinux: hash context structure directly
        selinux: store role transitions in a hash table
        selinux: drop unnecessary smp_load_acquire() call
        selinux: fix warning Comparison to bool
      f41030a2
    • Linus Torvalds's avatar
      Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 9d99b164
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Summary of the significant patches:
      
         - Record information about binds/unbinds to the audit multicast
           socket. This helps identify which processes have/had access to the
           information in the audit stream.
      
         - Cleanup and add some additional information to the netfilter
           configuration events collected by audit.
      
         - Fix some of the audit error handling code so we don't leak network
           namespace references"
      
      * tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: add subj creds to NETFILTER_CFG record to
        audit: Replace zero-length array with flexible-array
        audit: make symbol 'audit_nfcfgs' static
        netfilter: add audit table unregister actions
        audit: tidy and extend netfilter_cfg x_tables
        audit: log audit netlink multicast bind and unbind
        audit: fix a net reference leak in audit_list_rules_send()
        audit: fix a net reference leak in audit_send_reply()
      9d99b164
    • Linus Torvalds's avatar
      Merge tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · 91681e84
      Linus Torvalds authored
      Pull tomoyo update from Tetsuo Handa:
       "One patch for suppressing coccicheck's warning"
      
      * tag 'tomoyo-pr-20200601' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        tomoyo: use true for bool variable
      91681e84
  3. 02 Jun, 2020 3 commits
    • Stefan Hajnoczi's avatar
      capabilities: add description for CAP_SETFCAP · 56f2e3b7
      Stefan Hajnoczi authored
      Document the purpose of CAP_SETFCAP.  For some reason this capability
      had no description while the others did.
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      56f2e3b7
    • Linus Torvalds's avatar
      Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block · 1ee08de1
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
       "A relatively quiet round, mostly just fixes and code improvements. In
      particular:
      
         - Make statx just use the generic statx handler, instead of open
           coding it. We don't need that anymore, as we always call it async
           safe (Bijan)
      
         - Enable closing of the ring itself. Also fixes O_PATH closure (me)
      
         - Properly name completion members (me)
      
         - Batch reap of dead file registrations (me)
      
         - Allow IORING_OP_POLL with double waitqueues (me)
      
         - Add tee(2) support (Pavel)
      
         - Remove double off read (Pavel)
      
         - Fix overflow cancellations (Pavel)
      
         - Improve CQ timeouts (Pavel)
      
         - Async defer drain fixes (Pavel)
      
         - Add support for enabling/disabling notifications on a registered
           eventfd (Stefano)
      
         - Remove dead state parameter (Xiaoguang)
      
         - Disable SQPOLL submit on dying ctx (Xiaoguang)
      
         - Various code cleanups"
      
      * tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
        io_uring: fix overflowed reqs cancellation
        io_uring: off timeouts based only on completions
        io_uring: move timeouts flushing to a helper
        statx: hide interfaces no longer used by io_uring
        io_uring: call statx directly
        statx: allow system call to be invoked from io_uring
        io_uring: add io_statx structure
        io_uring: get rid of manual punting in io_close
        io_uring: separate DRAIN flushing into a cold path
        io_uring: don't re-read sqe->off in timeout_prep()
        io_uring: simplify io_timeout locking
        io_uring: fix flush req->refs underflow
        io_uring: don't submit sqes when ctx->refs is dying
        io_uring: async task poll trigger cleanup
        io_uring: add tee(2) support
        splice: export do_tee()
        io_uring: don't repeat valid flag list
        io_uring: rename io_file_put()
        io_uring: remove req->needs_fixed_files
        io_uring: cleanup io_poll_remove_one() logic
        ...
      1ee08de1
    • Linus Torvalds's avatar
      Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block · bce159d7
      Linus Torvalds authored
      Pull block driver updates from Jens Axboe:
       "On top of the core changes, here are the block driver changes for this
        merge window:
      
         - NVMe changes:
              - NVMe over Fibre Channel protocol updates, which also reach
                over to drivers/scsi/lpfc (James Smart)
              - namespace revalidation support on the target (Anthony
                Iliopoulos)
              - gcc zero length array fix (Arnd Bergmann)
              - nvmet cleanups (Chaitanya Kulkarni)
              - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
              - use a SRQ per completion vector (Max Gurtovoy)
              - fix handling of runtime changes to the queue count (Weiping
                Zhang)
              - t10 protection information support for nvme-rdma and
                nvmet-rdma (Israel Rukshin and Max Gurtovoy)
              - target side AEN improvements (Chaitanya Kulkarni)
              - various fixes and minor improvements all over, icluding the
                nvme part of the lpfc driver"
      
         - Floppy code cleanup series (Willy, Denis)
      
         - Floppy contention fix (Jiri)
      
         - Loop CONFIGURE support (Martijn)
      
         - bcache fixes/improvements (Coly, Joe, Colin)
      
         - q->queuedata cleanups (Christoph)
      
         - Get rid of ioctl_by_bdev (Christoph, Stefan)
      
         - md/raid5 allocation fixes (Coly)
      
         - zero length array fixes (Gustavo)
      
         - swim3 task state fix (Xu)"
      
      * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
        bcache: configure the asynchronous registertion to be experimental
        bcache: asynchronous devices registration
        bcache: fix refcount underflow in bcache_device_free()
        bcache: Convert pr_<level> uses to a more typical style
        bcache: remove redundant variables i and n
        lpfc: Fix return value in __lpfc_nvme_ls_abort
        lpfc: fix axchg pointer reference after free and double frees
        lpfc: Fix pointer checks and comments in LS receive refactoring
        nvme: set dma alignment to qword
        nvmet: cleanups the loop in nvmet_async_events_process
        nvmet: fix memory leak when removing namespaces and controllers concurrently
        nvmet-rdma: add metadata/T10-PI support
        nvmet: add metadata support for block devices
        nvmet: add metadata/T10-PI support
        nvme: add Metadata Capabilities enumerations
        nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
        nvmet: rename nvmet_rw_len to nvmet_rw_data_len
        nvmet: add metadata characteristics for a namespace
        nvme-rdma: add metadata/T10-PI support
        nvme-rdma: introduce nvme_rdma_sgl structure
        ...
      bce159d7