Commits · 0544b324e62c177c3a9e9c3bdce22e6db9f34588 · Kirill Smelkov / linux

28 Dec, 2018 6 commits

Joe Perches authored Dec 20, 2018

kzalloc can return NULL so an additional check is needed. While there
is a check for ret_buf there is no check for the allocation of
ret_buf->crfid.fid - this check is thus added. Both call-sites
of tconInfoAlloc() check for NULL return of tconInfoAlloc()
so returning NULL on failure of kzalloc() here seems appropriate.
As the kzalloc() is the only thing here that can fail it is
moved to the beginning so as not to initialize other resources
on failure of kzalloc.

Fixes: 3d4ef9a1 ("smb3: fix redundant opens on root")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

0544b324

cifs: remove set but not used variable 'server' · 29cbfa1b

YueHaibing authored Dec 18, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/smb2pdu.c: In function 'smb311_posix_mkdir':
fs/cifs/smb2pdu.c:2040:26: warning:
 variable 'server' set but not used [-Wunused-but-set-variable]

fs/cifs/smb2pdu.c: In function 'build_qfs_info_req':
fs/cifs/smb2pdu.c:4067:26: warning:
 variable 'server' set but not used [-Wunused-but-set-variable]

The first 'server' never used since commit bea851b8 ("smb3: Fix mode on
mkdir on smb311 mounts")
And the second not used since commit 1fc6ad2f ("cifs: remove
header_preamble_size where it is always 0")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

29cbfa1b

cifs: Use kzfree() to free password · 34bca9bb

Dan Carpenter authored Dec 20, 2018

We should zero out the password before we free it.

Fixes: 3d6cacbb5310 ("cifs: Add DFS cache routines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>

34bca9bb

cifs: Fix to use kmem_cache_free() instead of kfree() · 3e80be01

Wei Yongjun authored Dec 18, 2018

memory allocated by kmem_cache_alloc() in alloc_cache_entry()
should be freed using kmem_cache_free(), not kfree().

Fixes: 34a44fb160f9 ("cifs: Add DFS cache routines")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

3e80be01

cifs: update for current_kernel_time64() removal · 54e4f73c

Stephen Rothwell authored Dec 17, 2018

Fixes cifs build failure after merge of the y2038 tree

After merging the y2038 tree, today's linux-next build (x86_64
allmodconfig) failed like this:

fs/cifs/dfs_cache.c: In function 'cache_entry_expired':
fs/cifs/dfs_cache.c:106:7: error: implicit declaration of function 'current_kernel_time64'; did you mean 'core_kernel_text'? [-Werror=implicit-function-declaration]
  ts = current_kernel_time64();
       ^~~~~~~~~~~~~~~~~~~~~
       core_kernel_text
fs/cifs/dfs_cache.c:106:5: error: incompatible types when assigning to type 'struct timespec64' from type 'int'
  ts = current_kernel_time64();
     ^
fs/cifs/dfs_cache.c: In function 'get_expire_time':
fs/cifs/dfs_cache.c:342:24: error: incompatible type for argument 1 of 'timespec64_add'
  return timespec64_add(current_kernel_time64(), ts);
                        ^~~~~~~~~~~~~~~~~~~~~~~
In file included from include/linux/restart_block.h:10,
                 from include/linux/thread_info.h:13,
                 from arch/x86/include/asm/preempt.h:7,
                 from include/linux/preempt.h:78,
                 from include/linux/rcupdate.h:40,
                 from fs/cifs/dfs_cache.c:8:
include/linux/time64.h:66:66: note: expected 'struct timespec64' but argument is of type 'int'
 static inline struct timespec64 timespec64_add(struct timespec64 lhs,
                                                ~~~~~~~~~~~~~~~~~~^~~
fs/cifs/dfs_cache.c:343:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

Caused by:

  commit ccea641b6742 ("timekeeping: remove obsolete time accessors")

interacting with:
  commit 34a44fb160f9 ("cifs: Add DFS cache routines")

from the cifs tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <stfrench@microsoft.com>

54e4f73c

cifs: Add DFS cache routines · 54be1f6c

Paulo Alcantara authored Nov 14, 2018

* Add new dfs_cache.[ch] files

* Add new /proc/fs/cifs/dfscache file
  - dump current cache when read
  - clear current cache when writing "0" to it

* Add delayed_work to periodically refresh cache entries

The new interface will be used for caching DFS referrals, as well as
supporting client target failover.

The DFS cache is a hashtable that maps UNC paths to cache entries.

A cache entry contains:
- the UNC path it is mapped on
- how much the the UNC path the entry consumes
- flags
- a Time-To-Live after which the entry expires
- a list of possible targets (linked lists of UNC paths)
- a "hint target" pointing the last known working target or the first
  target if none were tried. This hint lets cifs.ko remember and try
  working targets first.

* Looking for an entry in the cache is done with dfs_cache_find()
  - if no valid entries are found, a DFS query is made, stored in the
    cache and returned
  - the full target list can be copied and returned to avoid race
    conditions and looped on with the help with the
    dfs_cache_tgt_iterator

* Updating the target hint to the next target is done with
  dfs_cache_update_tgthint()

These functions have a dfs_cache_noreq_XXX() version that doesn't
fetches referrals if no entries are found. These versions don't
require the tcp/ses/tcon/cifs_sb parameters as a result.

Expired entries cannot be used and since they have a pretty short TTL
[1] in order for them to be useful for failover the DFS cache adds a
delayed work called periodically to keep them fresh.

Since we might not have available connections to issue the referral
request when refreshing we need to store volume_info structs with
credentials and other needed info to be able to connect to the right
server.

1: Windows defaults: 5mn for domain-based referrals, 30mn for regular
links
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

54be1f6c

24 Dec, 2018 19 commits

cifs: Save TTL value when parsing DFS referrals · e7b602f4

Paulo Alcantara authored Nov 14, 2018

This will be needed by DFS cache.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

e7b602f4

cifs: auto disable 'serverino' in dfs mounts · 5fc7fcd0

Aurelien Aptel authored Nov 16, 2018

Different servers have different set of file ids.

After failover, unique IDs will be different so we can't validate
them.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>

5fc7fcd0

cifs: Make devname param optional in cifs_compose_mount_options() · d9345e0a

Paulo Alcantara authored Nov 14, 2018

If we only want to get the mount options strings, do not return the
devname.

For DFS failover, we'll be passing the DFS full path down to
cifs_mount() rather than the devname.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

d9345e0a

cifs: Skip any trailing backslashes from UNC · c34fea5a

Paulo Alcantara authored Nov 14, 2018

When extracting hostname from UNC, check for leading backslashes
before trying to remove them.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

c34fea5a

cifs: Refactor out cifs_mount() · 56c762eb

Paulo Alcantara authored Nov 14, 2018

* Split and refactor the very large function cifs_mount() in multiple
  functions:

- tcp, ses and tcon setup to mount_get_conns()
- tcp, ses and tcon cleanup in mount_put_conns()
- tcon tlink setup to mount_setup_tlink()
- remote path checking to is_path_remote()

* Implement 2 version of cifs_mount() for DFS-enabled builds and
  non-DFS-enabled builds (CONFIG_CIFS_DFS_UPCALL).

In preparation for DFS failover support.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

56c762eb

CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem · 9a596f5b

Georgy A Bystrenin authored Dec 21, 2018

While resolving a bug with locks on samba shares found a strange behavior.
When a file locked by one node and we trying to lock it from another node
it fail with errno 5 (EIO) but in that case errno must be set to
(EACCES | EAGAIN).
This isn't happening when we try to lock file second time on same node.
In this case it returns EACCES as expected.
Also this issue not reproduces when we use SMB1 protocol (vers=1.0 in
mount options).

Further investigation showed that the mapping from status_to_posix_error
is different for SMB1 and SMB2+ implementations.
For SMB1 mapping is [NT_STATUS_LOCK_NOT_GRANTED to ERRlock]
(See fs/cifs/netmisc.c line 66)
but for SMB2+ mapping is [STATUS_LOCK_NOT_GRANTED to -EIO]
(see fs/cifs/smb2maperror.c line 383)

Quick changes in SMB2+ mapping from EIO to EACCES has fixed issue.

BUG: https://bugzilla.kernel.org/show_bug.cgi?id=201971Signed-off-by: Georgy A Bystrenin <gkot@altlinux.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

9a596f5b

CIFS: return correct errors when pinning memory failed for direct I/O · 54e94ff9

Long Li authored Dec 16, 2018

When pinning memory failed, we should return the correct error code and
rewind the SMB credits.
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

54e94ff9

CIFS: use the correct length when pinning memory for direct I/O for write · b6bc8a7b

Long Li authored Dec 16, 2018

The current code attempts to pin memory using the largest possible wsize
based on the currect SMB credits. This doesn't cause kernel oops but this
is not optimal as we may pin more pages then actually needed.

Fix this by only pinning what are needed for doing this write I/O.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>

b6bc8a7b

cifs: check ntwrk_buf_start for NULL before dereferencing it · 59a63e47

Ronnie Sahlberg authored Dec 13, 2018

RHBZ: 1021460

There is an issue where when multiple threads open/close the same directory
ntwrk_buf_start might end up being NULL, causing the call to smbCalcSize
later to oops with a NULL deref.

The real bug is why this happens and why this can become NULL for an
open cfile, which should not be allowed.
This patch tries to avoid a oops until the time when we fix the underlying
issue.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

59a63e47

cifs: remove coverity warning in calc_lanman_hash · 52baa51d

Ronnie Sahlberg authored Dec 12, 2018

password_with_pad is a fixed size buffer of 16 bytes, it contains a
password string, to be padded with \0 if shorter than 16 bytes
but is just truncated if longer.
It is not, and we do not depend on it to be, nul terminated.

As such, do not use strncpy() to populate this buffer since
the str* prefix suggests that this is a string, which it is not,
and it also confuses coverity causing a false warning.

Detected by CoverityScan CID#113743 ("Buffer not null terminated")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

52baa51d

cifs: remove set but not used variable 'smb_buf' · 0f57451e

YueHaibing authored Dec 07, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/sess.c: In function '_sess_auth_rawntlmssp_assemble_req':
fs/cifs/sess.c:1157:18: warning:
 variable 'smb_buf' set but not used [-Wunused-but-set-variable]

It never used since commit cc87c47d ("cifs: Separate rawntlmssp auth
from CIFS_SessSetup()")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

0f57451e

cifs: suppress some implicit-fallthrough warnings · 07fa6010

Gustavo A. R. Silva authored Nov 27, 2018

To avoid the warning:

     warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steve French <stfrench@microsoft.com>

07fa6010

cifs: change smb2_query_eas to use the compound query-info helper · f9793b6f

Ronnie Sahlberg authored Nov 27, 2018

Reducing the number of network roundtrips improves the performance
of query xattrs
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

f9793b6f

Add vers=3.0.2 as a valid option for SMBv3.0.2 · 4a3b38ae

Kenneth D'souza authored Nov 17, 2018

Technically 3.02 is not the dialect name although that is more familiar to
many, so we should also accept the official dialect name (3.0.2 vs. 3.02)
in vers=
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

4a3b38ae

cifs: create a helper function for compound query_info · 07d3b2e4

Ronnie Sahlberg authored Dec 20, 2018

and convert statfs to use it.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

07d3b2e4

cifs: address trivial coverity warning · 97aa495a

Steve French authored Nov 15, 2018

This is not actually a bug but as Coverity points out we shouldn't
be doing an "|=" on a value which hasn't been set (although technically
it was memset to zero so isn't a bug) and so might as well change
"|=" to "=" in this line

Detected by CoverityScan, CID#728535 ("Unitialized scalar variable")
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

97aa495a

cifs: smb2 commands can not be negative, remove confusing check · f5942db5

Steve French authored Nov 14, 2018

As Coverity points out le16_to_cpu(midEntry->Command) can not be
less than zero.

Detected by CoverityScan, CID#1438650 ("Macro compares unsigned to 0")
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

f5942db5

cifs: use a compound for setting an xattr · 0967e545

Ronnie Sahlberg authored Nov 06, 2018

Improve performance by reducing number of network round trips
for set xattr.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

0967e545

cifs: clean up indentation, replace spaces with tab · 5890255b

Colin Ian King authored Nov 03, 2018

Trivial fix to clean up indentation, replace spaces with tab
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

5890255b

23 Dec, 2018 2 commits

Linux 4.20 · 8fe28cb5
Linus Torvalds authored Dec 23, 2018

8fe28cb5

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 3c730b10

Linus Torvalds authored Dec 23, 2018

Pull vfs fixes from Al Viro:
 "A couple of fixes - no common topic ;-)"

[ The aio spectre patch also came in from Jens, so now we have that
  doubly fixed .. ]

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
  aio: fix spectre gadget in lookup_ioctx

3c730b10

22 Dec, 2018 5 commits

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 9105b8aa

Linus Torvalds authored Dec 22, 2018

Pull SCSI fixes from James Bottomley:
 "This is two simple target fixes and one discard related I/O starvation
  problem in sd.

  The discard problem occurs because the discard page doesn't have a
  mempool backing so if the allocation fails due to memory pressure, we
  then lose the forward progress we require if the writeout is on the
  same device. The fix is to back it with a mempool"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: use mempool for discard special page
  scsi: target: iscsi: cxgbit: add missing spin_lock_init()
  scsi: target: iscsi: cxgbit: fix csk leak

9105b8aa

Merge tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux · 1104bd96

Linus Torvalds authored Dec 22, 2018

Pull compiler_types.h fix from Miguel Ojeda:
 "A cleanup for userspace in compiler_types.h: don't pollute userspace
  with macro definitions (Xiaozhou Liu)

  This is harmless for the kernel, but v4.19 was released with a few
  macros exposed to userspace as the patch explains; which this removes,
  so it *could* happen that we break something for someone (although
  leaving inline redefined is probably worse)"

* tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux:
  include/linux/compiler_types.h: don't pollute userspace with macro definitions

1104bd96

Merge tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux · 38c0ecf6

Linus Torvalds authored Dec 22, 2018

Pull auxdisplay fix from Miguel Ojeda:
 "charlcd: fix x/y command parsing (Mans Rullgard)"

* tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux:
  auxdisplay: charlcd: fix x/y command parsing

38c0ecf6

Revert "vfs: Allow userns root to call mknod on owned filesystems." · 94f82008

Christian Brauner authored Jul 05, 2018

This reverts commit 55956b59.

commit 55956b59 ("vfs: Allow userns root to call mknod on owned filesystems.")
enabled mknod() in user namespaces for userns root if CAP_MKNOD is
available. However, these device nodes are useless since any filesystem
mounted from a non-initial user namespace will set the SB_I_NODEV flag on
the filesystem. Now, when a device node s created in a non-initial user
namespace a call to open() on said device node will fail due to:

bool may_open_dev(const struct path *path)
{
        return !(path->mnt->mnt_flags & MNT_NODEV) &&
                !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}

The problem with this is that as of the aforementioned commit mknod()
creates partially functional device nodes in non-initial user namespaces.
In particular, it has the consequence that as of the aforementioned commit
open() will be more privileged with respect to device nodes than mknod().
Before it was the other way around. Specifically, if mknod() succeeded
then it was transparent for any userspace application that a fatal error
must have occured when open() failed.

All of this breaks multiple userspace workloads and a widespread assumption
about how to handle mknod(). Basically, all container runtimes and systemd
live by the slogan "ask for forgiveness not permission" when running user
namespace workloads. For mknod() the assumption is that if the syscall
succeeds the device nodes are useable irrespective of whether it succeeds
in a non-initial user namespace or not. This logic was chosen explicitly
to allow for the glorious day when mknod() will actually be able to create
fully functional device nodes in user namespaces.
A specific problem people are already running into when running 4.18 rc
kernels are failing systemd services. For any distro that is run in a
container systemd services started with the PrivateDevices= property set
will fail to start since the device nodes in question cannot be
opened (cf. the arguments in [1]).

Full disclosure, Seth made the very sound argument that it is already
possible to end up with partially functional device nodes. Any filesystem
mounted with MS_NODEV set will allow mknod() to succeed but will not allow
open() to succeed. The difference to the case here is that the MS_NODEV
case is transparent to userspace since it is an explicitly set mount option
while the SB_I_NODEV case is an implicit property enforced by the kernel
and hence opaque to userspace.

[1]: https://github.com/systemd/systemd/pull/9483Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

94f82008

dma-mapping: fix flags in dma_alloc_wc · 0cd60eb1

Christoph Hellwig authored Dec 22, 2018

We really need the writecombine flag in dma_alloc_wc, fix a stupid
oversight.

Fixes: 7ed1d91a ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0cd60eb1

21 Dec, 2018 8 commits

Merge branch 'akpm' (patches from Andrew) · 23203e3f

Linus Torvalds authored Dec 21, 2018

Merge misc fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: fix has_unmovable_pages for HugePages
  fork,memcg: fix crash in free_thread_stack on memcg charge fail
  mm: thp: fix flags for pmd migration when split
  mm, memory_hotplug: initialize struct pages for the full memory section

23203e3f

mm, page_alloc: fix has_unmovable_pages for HugePages · 17e2e7d7

Oscar Salvador authored Dec 21, 2018

While playing with gigantic hugepages and memory_hotplug, I triggered
the following #PF when "cat memoryX/removable":

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1481 Comm: cat Tainted: G            E     4.20.0-rc6-mm1-1-default+ #18
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:has_unmovable_pages+0x154/0x210
  Call Trace:
   is_mem_section_removable+0x7d/0x100
   removable_show+0x90/0xb0
   dev_attr_show+0x1c/0x50
   sysfs_kf_seq_show+0xca/0x1b0
   seq_read+0x133/0x380
   __vfs_read+0x26/0x180
   vfs_read+0x89/0x140
   ksys_read+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason is we do not pass the Head to page_hstate(), and so, the call
to compound_order() in page_hstate() returns 0, so we end up checking
all hstates's size to match PAGE_SIZE.

Obviously, we do not find any hstate matching that size, and we return
NULL.  Then, we dereference that NULL pointer in
hugepage_migration_supported() and we got the #PF from above.

Fix that by getting the head page before calling page_hstate().

Also, since gigantic pages span several pageblocks, re-adjust the logic
for skipping pages.  While are it, we can also get rid of the
round_up().

[osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
  Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.deSigned-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17e2e7d7

fork,memcg: fix crash in free_thread_stack on memcg charge fail · 5eed6f1d

Rik van Riel authored Dec 21, 2018

Commit 9b6f7e16 ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.

Unfortunately, it also results in a crash.

This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.

This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:

#5 [ffffc900244efc88] die at ffffffff8101f0ab
 #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
 #7 [ffffc900244efce0] general_protection at ffffffff818ff082
    [exception RIP: llist_add_batch+7]
    RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
    RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
    RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
    R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
    RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
    RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
    RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
    R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

The simplest fix is to assign tsk->stack right where it is allocated.

Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5eed6f1d

mm: thp: fix flags for pmd migration when split · 2e83ee1d

Peter Xu authored Dec 21, 2018

When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs.  However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or
pmd_yound() while actually they don't make sense at all when it's a
migration entry.  Fix them up.  Since at it, drop the ifdef together as
not needed.

Note that if my understanding is correct about the problem then if
without the patch there is chance to lose some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11 which is part of
swap offset instead of bit 2) and it could potentially corrupt the
memory of an userspace program which depends on the dirty bit.

Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.comSigned-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2e83ee1d

mm, memory_hotplug: initialize struct pages for the full memory section · 2830bf6f

Mikhail Zaslonko authored Dec 21, 2018

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered
by memory_hotplug sysfs handlers:

Here are the the panic examples:
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_PGFLAGS=y

 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( test_pages_in_a_zone+0xde/0x160)
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( is_mem_section_removable+0xb4/0x190)
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.

Michal said:

: This has alwways been problem AFAIU.  It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided a
: garbage.
:
: So I guess we do care for post f7f99100 kernels mostly and
: therefore Fixes: f7f99100 ("mm: stop zeroing memory during
: allocation in vmemmap")

Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2830bf6f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 6cafab50

Linus Torvalds authored Dec 21, 2018

Pull sparc fixes from David Miller:
 "Just some small fixes here and there, and a refcount leak in a serial
  driver, nothing serious"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  serial/sunsu: fix refcount leak
  sparc: Set "ARCH: sunxx" information on the same line
  sparc: vdso: Drop implicit common-page-size linker flag

6cafab50

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 87935eee

Linus Torvalds authored Dec 21, 2018

Pull more networking fixes from David Miller:
 "Some more bug fixes have trickled in, we have:

  1) Local MAC entries properly in mscc driver, from Allan W. Nielsen.

  2) Eric Dumazet found some more of the typical "pskb_may_pull() -->
     oops forgot to reload the header pointer" bugs in ipv6 tunnel
     handling.

  3) Bad SKB socket pointer in ipv6 fragmentation handling, from Herbert
     Xu.

  4) Overflow fix in sk_msg_clone(), from Vakul Garg.

  5) Validate address lengths in AF_PACKET, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
  qmi_wwan: Add support for Fibocom NL678 series
  tls: Do not call sk_memcopy_from_iter with zero length
  ipv6: tunnels: fix two use-after-free
  Prevent overflow of sk_msg in sk_msg_clone()
  packet: validate address length
  net: netxen: fix a missing check and an uninitialized use
  tcp: fix a race in inet_diag_dump_icsk()
  MAINTAINERS: update cxgb4 and cxgb3 maintainer
  ipv6: frags: Fix bogus skb->sk in reassembled packets
  mscc: Configured MAC entries should be locked.

87935eee

auxdisplay: charlcd: fix x/y command parsing · 9bc30ab8

Mans Rullgard authored Dec 05, 2018

The x/y command parsing has been broken since commit 12995706
("staging: panel: Fixed checkpatch warning about simple_strtoul()").

Commit b34050fa ("auxdisplay: charlcd: Fix and clean up handling of
x/y commands") fixed some problems by rewriting the parsing code,
but also broke things further by removing the check for a complete
command before attempting to parse it.  As a result, parsing is
terminated at the first x or y character.

This reinstates the check for a final semicolon.  Whereas the original
code use strchr(), this is wasteful seeing as the semicolon is always
at the end of the buffer.  Thus check this character directly instead.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

9bc30ab8