- 03 Nov, 2018 21 commits
-
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: "Avoid compile warnings on non-default arm64 configs" * tag 'dma-mapping-4.20-2' of git://git.infradead.org/users/hch/dma-mapping: arm64: fix warnings without CONFIG_IOMMU_DMA
-
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds authored
Pull Kbuild updates from Masahiro Yamada: - clean-up leftovers in Kconfig files - remove stale oldnoconfig and silentoldconfig targets - remove unneeded cc-fullversion and cc-name variables - improve merge_config script to allow overriding option prefix * tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: remove cc-name variable kbuild: replace cc-name test with CONFIG_CC_IS_CLANG merge_config.sh: Allow to define config prefix kbuild: remove unused cc-fullversion variable kconfig: remove silentoldconfig target kconfig: remove oldnoconfig target powerpc: PCI_MSI needs PCI powerpc: remove CONFIG_MCA leftovers powerpc: remove CONFIG_PCI_QSPAN scsi: aha152x: rename the PCMCIA define
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes and updates from Steve French: "Three small fixes (one Kerberos related, one for stable, and another fixes an oops in xfstest 377), two helpful debugging improvements, three patches for cifs directio and some minor cleanup" * tag '4.20-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix signed/unsigned mismatch on aio_read patch cifs: don't dereference smb_file_target before null check CIFS: Add direct I/O functions to file_operations CIFS: Add support for direct I/O write CIFS: Add support for direct I/O read smb3: missing defines and structs for reparse point handling smb3: allow more detailed protocol info on open files for debugging smb3: on kerberos mount if server doesn't specify auth type use krb5 smb3: add trace point for tree connection cifs: fix spelling mistake, EACCESS -> EACCES cifs: fix return value for cifs_listxattr
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull 9p fix from Al Viro: "Regression fix for net/9p handling of iov_iter; broken by braino when switching to iov_iter_is_kvec() et.al., spotted and fixed by Marc" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: iov_iter: Fix 9p virtio breakage
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull more SCSI updates from James Bottomley: "This is a set of minor small (and safe changes) that didn't make the initial pull request plus some bug fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mvsas: Remove set but not used variable 'id' scsi: qla2xxx: Remove two arguments from qlafx00_error_entry() scsi: qla2xxx: Make sure that qlafx00_ioctl_iosb_entry() initializes 'res' scsi: qla2xxx: Remove a set-but-not-used variable scsi: qla2xxx: Make qla2x00_sysfs_write_nvram() easier to analyze scsi: qla2xxx: Declare local functions 'static' scsi: qla2xxx: Improve several kernel-doc headers scsi: qla2xxx: Modify fall-through annotations scsi: 3w-sas: 3w-9xxx: Use unsigned char for cdb scsi: mvsas: Use dma_pool_zalloc scsi: target: Don't request modules that aren't even built scsi: target: Set response length for REPORT TARGET PORT GROUPS
-
Linus Torvalds authored
Merge more updates from Andrew Morton: - more ocfs2 work - various leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: memory_hotplug: cond_resched in __remove_pages bfs: add sanity check at bfs_fill_super() kernel/sysctl.c: remove duplicated include kernel/kexec_file.c: remove some duplicated includes mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask ocfs2: fix clusters leak in ocfs2_defrag_extent() ocfs2: dlmglue: clean up timestamp handling ocfs2: don't put and assigning null to bh allocated outside ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entry ocfs2: don't use iocb when EIOCBQUEUED returns ocfs2: without quota support, avoid calling quota recovery ocfs2: remove ocfs2_is_o2cb_active() mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings include/linux/notifier.h: SRCU: fix ctags mm: handle no memcg case in memcg_kmem_charge() properly
-
Michal Hocko authored
We have received a bug report that unbinding a large pmem (>1TB) can result in a soft lockup: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365] [...] Supported: Yes CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4 Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018 task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000 RIP: 0010:__put_page+0x62/0x80 Call Trace: devm_memremap_pages_release+0x152/0x260 release_nodes+0x18d/0x1d0 device_release_driver_internal+0x160/0x210 unbind_store+0xb3/0xe0 kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x150 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x74/0x150 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fd13166b3d0 It has been reported on an older (4.12) kernel but the current upstream code doesn't cond_resched in the hot remove code at all and the given range to remove might be really large. Fix the issue by calling cond_resched once per memory section. Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tetsuo Handa authored
syzbot is reporting too large memory allocation at bfs_fill_super() [1]. Since file system image is corrupted such that bfs_sb->s_start == 0, bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 Link: http://lkml.kernel.org/r/1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jpSigned-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michael Schupikov authored
Remove one include of <linux/pipe_fs_i.h>. No functional changes. Link: http://lkml.kernel.org/r/20181004134223.17735-1-michael@schupikov.deSigned-off-by: Michael Schupikov <michael@schupikov.de> Reviewed-by: Richard Weinberger <richard@nod.at> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
zhong jiang authored
We include kexec.h and slab.h twice in kexec_file.c. It's unnecessary. hence just remove them. Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.comSigned-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
THP allocation mode is quite complex and it depends on the defrag mode. This complexity is hidden in alloc_hugepage_direct_gfpmask from a large part currently. The NUMA special casing (namely __GFP_THISNODE) is however independent and placed in alloc_pages_vma currently. This both adds an unnecessary branch to all vma based page allocation requests and it makes the code more complex unnecessarily as well. Not to mention that e.g. shmem THP used to do the node reclaiming unconditionally regardless of the defrag mode until recently. This was not only unexpected behavior but it was also hardly a good default behavior and I strongly suspect it was just a side effect of the code sharing more than a deliberate decision which suggests that such a layering is wrong. Get rid of the thp special casing from alloc_pages_vma and move the logic to alloc_hugepage_direct_gfpmask. __GFP_THISNODE is applied to the resulting gfp mask only when the direct reclaim is not requested and when there is no explicit numa binding to preserve the current logic. Please note that there's also a slight difference wrt MPOL_BIND now. The previous code would avoid using __GFP_THISNODE if the local node was outside of policy_nodemask(). After this patch __GFP_THISNODE is avoided for all MPOL_BIND policies. So there's a difference that if local node is actually allowed by the bind policy's nodemask, previously __GFP_THISNODE would be added, but now it won't be. From the behavior POV this is still correct because the policy nodemask is used. Link: http://lkml.kernel.org/r/20180925120326.24392-3-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Larry Chen authored
ocfs2_defrag_extent() might leak allocated clusters. When the file system has insufficient space, the number of claimed clusters might be less than the caller wants. If that happens, the original code might directly commit the transaction without returning clusters. This patch is based on code in ocfs2_add_clusters_in_btree(). [akpm@linux-foundation.org: include localalloc.h, reduce scope of data_ac] Link: http://lkml.kernel.org/r/20180904041621.16874-3-lchen@suse.comSigned-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
The handling of timestamps outside of the 1970..2038 range in the dlm glue is rather inconsistent: on 32-bit architectures, this has always wrapped around to negative timestamps in the 1902..1969 range, while on 64-bit kernels all timestamps are interpreted as positive 34 bit numbers in the 1970..2514 year range. Now that the VFS code handles 64-bit timestamps on all architectures, we can make the behavior more consistent here, and return the same result that we had on 64-bit already, making the file system y2038 safe in the process. Outside of dlmglue, it already uses 64-bit on-disk timestamps anway, so that part is fine. For consistency, I'm changing ocfs2_pack_timespec() to clamp anything outside of the supported range to the minimum and maximum values. This avoids a possible ambiguity of values before 1970 in particular, which used to be interpreted as times at the end of the 2514 range previously. Link: http://lkml.kernel.org/r/20180619155826.4106487-1-arnd@arndb.deSigned-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Changwei Ge authored
ocfs2_read_blocks() and ocfs2_read_blocks_sync() are both used to read several blocks from disk. Currently, the input argument *bhs* can be NULL or NOT. It depends on the caller's behavior. If the function fails in reading blocks from disk, the corresponding bh will be assigned to NULL and put. Obviously, above process for non-NULL input bh is not appropriate. Because the caller doesn't even know its bhs are put and re-assigned. If buffer head is managed by caller, ocfs2_read_blocks and ocfs2_read_blocks_sync() should not evaluate it to NULL. It will cause caller accessing illegal memory, thus crash. Link: http://lkml.kernel.org/r/HK2PR06MB045285E0F4FBB561F9F2F9B3D5680@HK2PR06MB0452.apcprd06.prod.outlook.comSigned-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Guozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Changwei Ge authored
Somehow, file system metadata was corrupted, which causes ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el(). According to the original design intention, if above happens we should skip the problematic block and continue to retrieve dir entry. But there is obviouse misuse of brelse around related code. After failure of ocfs2_check_dir_entry(), current code just moves to next position and uses the problematic buffer head again and again during which the problematic buffer head is released for multiple times. I suppose, this a serious issue which is long-lived in ocfs2. This may cause other file systems which is also used in a the same host insane. So we should also consider about bakcporting this patch into linux -stable. Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.comSigned-off-by: Changwei Ge <ge.changwei@h3c.com> Suggested-by: Changkuo Shi <shi.changkuo@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Changwei Ge authored
When -EIOCBQUEUED returns, it means that aio_complete() will be called from dio_complete(), which is an asynchronous progress against write_iter. Generally, IO is a very slow progress than executing instruction, but we still can't take the risk to access a freed iocb. And we do face a BUG crash issue. Using the crash tool, iocb is obviously freed already. crash> struct -x kiocb ffff881a350f5900 struct kiocb { ki_filp = 0xffff881a350f5a80, ki_pos = 0x0, ki_complete = 0x0, private = 0x0, ki_flags = 0x0 } And the backtrace shows: ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] aio_run_iocb+0x229/0x2f0 do_io_submit+0x291/0x540 SyS_io_submit+0x10/0x20 system_call_fastpath+0x16/0x75 Link: http://lkml.kernel.org/r/1523361653-14439-1-git-send-email-ge.changwei@h3c.comSigned-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Guozhonghua authored
During one dead node's recovery by other node, quota recovery work will be queued. We should avoid calling quota when it is not supported, so check the quota flags. Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.comSigned-off-by: guozhonghua <guozhonghua@h3c.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Gang He authored
Remove ocfs2_is_o2cb_active(). We have similar functions to identify which cluster stack is being used via osb->osb_cluster_stack. Secondly, the current implementation of ocfs2_is_o2cb_active() is not totally safe. Based on the design of stackglue, we need to get ocfs2_stack_lock before using ocfs2_stack related data structures, and that active_stack pointer can be NULL in the case of mount failure. Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.comSigned-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Eric Ren <zren@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrea Arcangeli authored
THP allocation might be really disruptive when allocated on NUMA system with the local node full or hard to reclaim. Stefan has posted an allocation stall report on 4.12 based SLES kernel which suggests the same issue: kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null) kvm cpuset=/ mems_allowed=0-1 CPU: 10 PID: 84752 Comm: kvm Tainted: G W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased) Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017 Call Trace: dump_stack+0x5c/0x84 warn_alloc+0xe0/0x180 __alloc_pages_slowpath+0x820/0xc90 __alloc_pages_nodemask+0x1cc/0x210 alloc_pages_vma+0x1e5/0x280 do_huge_pmd_wp_page+0x83f/0xf00 __handle_mm_fault+0x93d/0x1060 handle_mm_fault+0xc6/0x1b0 __do_page_fault+0x230/0x430 do_page_fault+0x2a/0x70 page_fault+0x7b/0x80 [...] Mem-Info: active_anon:126315487 inactive_anon:1612476 isolated_anon:5 active_file:60183 inactive_file:245285 isolated_file:0 unevictable:15657 dirty:286 writeback:1 unstable:0 slab_reclaimable:75543 slab_unreclaimable:2509111 mapped:81814 shmem:31764 pagetables:370616 bounce:0 free:32294031 free_pcp:6233 free_cma:0 Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no The defrag mode is "madvise" and from the above report it is clear that the THP has been allocated for MADV_HUGEPAGA vma. Andrea has identified that the main source of the problem is __GFP_THISNODE usage: : The problem is that direct compaction combined with the NUMA : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very : hard the local node, instead of failing the allocation if there's no : THP available in the local node. : : Such logic was ok until __GFP_THISNODE was added to the THP allocation : path even with MPOL_DEFAULT. : : The idea behind the __GFP_THISNODE addition, is that it is better to : provide local memory in PAGE_SIZE units than to use remote NUMA THP : backed memory. That largely depends on the remote latency though, on : threadrippers for example the overhead is relatively low in my : experience. : : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in : extremely slow qemu startup with vfio, if the VM is larger than the : size of one host NUMA node. This is because it will try very hard to : unsuccessfully swapout get_user_pages pinned pages as result of the : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE : allocations and instead of trying to allocate THP on other nodes (it : would be even worse without vfio type1 GUP pins of course, except it'd : be swapping heavily instead). Fix this by removing __GFP_THISNODE for THP requests which are requesting the direct reclaim. This effectivelly reverts 5265047a on the grounds that the zone/node reclaim was known to be disruptive due to premature reclaim when there was memory free. While it made sense at the time for HPC workloads without NUMA awareness on rare machines, it was ultimately harmful in the majority of cases. The existing behaviour is similar, if not as widespare as it applies to a corner case but crucially, it cannot be tuned around like zone_reclaim_mode can. The default behaviour should always be to cause the least harm for the common case. If there are specialised use cases out there that want zone_reclaim_mode in specific cases, then it can be built on top. Longterm we should consider a memory policy which allows for the node reclaim like behavior for the specific memory ranges which would allow a [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com Mel said: : Both patches look correct to me but I'm responding to this one because : it's the fix. The change makes sense and moves further away from the : severe stalling behaviour we used to see with both THP and zone reclaim : mode. : : I put together a basic experiment with usemem configured to reference a : buffer multiple times that is 80% the size of main memory on a 2-socket : box with symmetric node sizes and defrag set to "always". The defrag : setting is not the default but it would be functionally similar to : accessing a buffer with madvise(MADV_HUGEPAGE). Usemem is configured to : reference the buffer multiple times and while it's not an interesting : workload, it would be expected to complete reasonably quickly as it fits : within memory. The results were; : : usemem : vanilla noreclaim-v1 : Amean Elapsd-1 42.78 ( 0.00%) 26.87 ( 37.18%) : Amean Elapsd-3 27.55 ( 0.00%) 7.44 ( 73.00%) : Amean Elapsd-4 5.72 ( 0.00%) 5.69 ( 0.45%) : : This shows the elapsed time in seconds for 1 thread, 3 threads and 4 : threads referencing buffers 80% the size of memory. With the patches : applied, it's 37.18% faster for the single thread and 73% faster with two : threads. Note that 4 threads showing little difference does not indicate : the problem is related to thread counts. It's simply the case that 4 : threads gets spread so their workload mostly fits in one node. : : The overall view from /proc/vmstats is more startling : : 4.19.0-rc1 4.19.0-rc1 : vanillanoreclaim-v1r1 : Minor Faults 35593425 708164 : Major Faults 484088 36 : Swap Ins 3772837 0 : Swap Outs 3932295 0 : : Massive amounts of swap in/out without the patch : : Direct pages scanned 6013214 0 : Kswapd pages scanned 0 0 : Kswapd pages reclaimed 0 0 : Direct pages reclaimed 4033009 0 : : Lots of reclaim activity without the patch : : Kswapd efficiency 100% 100% : Kswapd velocity 0.000 0.000 : Direct efficiency 67% 100% : Direct velocity 11191.956 0.000 : : Mostly from direct reclaim context as you'd expect without the patch. : : Page writes by reclaim 3932314.000 0.000 : Page writes file 19 0 : Page writes anon 3932295 0 : Page reclaim immediate 42336 0 : : Writes from reclaim context is never good but the patch eliminates it. : : We should never have default behaviour to thrash the system for such a : basic workload. If zone reclaim mode behaviour is ever desired but on a : single task instead of a global basis then the sensible option is to build : a mempolicy that enforces that behaviour. This was a severe regression compared to previous kernels that made important workloads unusable and it starts when __GFP_THISNODE was added to THP allocations under MADV_HUGEPAGE. It is not a significant risk to go to the previous behavior before __GFP_THISNODE was added, it worked like that for years. This was simply an optimization to some lucky workloads that can fit in a single node, but it ended up breaking the VM for others that can't possibly fit in a single node, so going back is safe. [mhocko@suse.com: rewrote the changelog based on the one from Andrea] Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org Fixes: 5265047a ("mm, thp: really limit transparent hugepage allocation to local node") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Debugged-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sam Protsenko authored
ctags indexing ("make tags" command) throws this warning: ctags: Warning: include/linux/notifier.h:125: null expansion of name pattern "\1" This is the result of DEFINE_PER_CPU() macro expansion. Fix that by getting rid of line break. Similar fix was already done in commit 25528213 ("tags: Fix DEFINE_PER_CPU expansions"), but this one probably wasn't noticed. Link: http://lkml.kernel.org/r/20181030202808.28027-1-semen.protsenko@linaro.org Fixes: 9c80172b ("kernel/SRCU: provide a static initializer") Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
Mike Galbraith reported a regression caused by the commit 9b6f7e16 ("mm: rework memcg kernel stack accounting") on a system with "cgroup_disable=memory" boot option: the system panics with the following stack trace: BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fed4 RIP: 0010:page_counter_try_charge+0x22/0xc0 Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49 Call Trace: try_charge+0xcb/0x780 memcg_kmem_charge_memcg+0x28/0x80 memcg_kmem_charge+0x8b/0x1d0 copy_process.part.41+0x1ca/0x2070 _do_fork+0xd7/0x3d0 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe The problem occurs because get_mem_cgroup_from_current() returns the NULL pointer if memory controller is disabled. Let's check if this is a case at the beginning of memcg_kmem_charge() and just return 0 if mem_cgroup_disabled() returns true. This is how we handle this case in many other places in the memory controller code. Link: http://lkml.kernel.org/r/20181029215123.17830-1-guro@fb.com Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Mike Galbraith <efault@gmx.de> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 02 Nov, 2018 19 commits
-
-
Marc Zyngier authored
When switching to the new iovec accessors, a negation got subtly dropped, leading to 9p being remarkably broken (here with kvmtool): [ 7.430941] VFS: Mounted root (9p filesystem) on device 0:15. [ 7.432080] devtmpfs: mounted [ 7.432717] Freeing unused kernel memory: 1344K [ 7.433658] Run /virt/init as init process Warning: unable to translate guest address 0x7e00902ff000 to host Warning: unable to translate guest address 0x7e00902fefc0 to host Warning: unable to translate guest address 0x7e00902ff000 to host Warning: unable to translate guest address 0x7e008febef80 to host Warning: unable to translate guest address 0x7e008febf000 to host Warning: unable to translate guest address 0x7e008febef00 to host Warning: unable to translate guest address 0x7e008febf000 to host [ 7.436376] Kernel panic - not syncing: Requested init /virt/init failed (error -8). [ 7.437554] CPU: 29 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc8-02267-g00e23707 #291 [ 7.439006] Hardware name: linux,dummy-virt (DT) [ 7.439902] Call trace: [ 7.440387] dump_backtrace+0x0/0x148 [ 7.441104] show_stack+0x14/0x20 [ 7.441768] dump_stack+0x90/0xb4 [ 7.442425] panic+0x120/0x27c [ 7.443036] kernel_init+0xa4/0x100 [ 7.443725] ret_from_fork+0x10/0x18 [ 7.444444] SMP: stopping secondary CPUs [ 7.445391] Kernel Offset: disabled [ 7.446169] CPU features: 0x0,23000438 [ 7.446974] Memory Limit: none [ 7.447645] ---[ end Kernel panic - not syncing: Requested init /virt/init failed (error -8). ]--- Restoring the missing "!" brings the guest back to life. Fixes: 00e23707 ("iov_iter: Use accessor function") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Steve French authored
The patch "CIFS: Add support for direct I/O read" had a signed/unsigned mismatch (ssize_t vs. size_t) in the return from one function. Similar trivial change in aio_write Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr>
-
Colin Ian King authored
There is a null check on dst_file->private data which suggests it can be potentially null. However, before this check, pointer smb_file_target is derived from dst_file->private and dereferenced in the call to tlink_tcon, hence there is a potential null pointer deference. Fix this by assigning smb_file_target and target_tcon after the null pointer sanity checks. Detected by CoverityScan, CID#1475302 ("Dereference before null check") Fixes: 04b38d60 ("vfs: pull btrfs clone API to vfs layer") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Long Li authored
With direct read/write functions implemented, add them to file_operations. Dircet I/O is used under two conditions: 1. When mounting with "cache=none", CIFS uses direct I/O for all user file data transfer. 2. When opening a file with O_DIRECT, CIFS uses direct I/O for all data transfer on this file. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
-
Long Li authored
With direct I/O write, user supplied buffers are pinned to the memory and data are transferred directly from user buffers to the transport layer. Change in v3: add support for kernel AIO Change in v4: Refactor common write code to __cifs_writev for direct and non-direct I/O. Retry on direct I/O failure. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Long Li authored
With direct I/O read, we transfer the data directly from transport layer to the user data buffer. Change in v3: add support for kernel AIO Change in v4: Refactor common read code to __cifs_readv for direct and non-direct I/O. Retry on direct I/O failure. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Steve French authored
We were missing some structs from MS-FSCC relating to reparse point handling. Add them to protocol defines in smb2pdu.h Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
-
Steve French authored
In order to debug complex problems it is often helpful to have detailed information on the client and server view of the open file information. Add the ability for root to view the list of smb3 open files and dump the persistent handle and other info so that it can be more easily correlated with server logs. Sample output from "cat /proc/fs/cifs/open_files" # Version:1 # Format: # <tree id> <persistent fid> <flags> <count> <pid> <uid> <filename> <mid> 0x5 0x800000378 0x8000 1 7704 0 some-file 0x14 0xcb903c0c 0x84412e67 0x8000 1 7754 1001 rofile 0x1a6d 0xcb903c0c 0x9526b767 0x8000 1 7720 1000 file 0x1a5b 0xcb903c0c 0x9ce41a21 0x8000 1 7715 0 smallfile 0xd67 Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
-
Steve French authored
Some servers (e.g. Azure) do not include a spnego blob in the SMB3 negotiate protocol response, so on kerberos mounts ("sec=krb5") we can fail, as we expected the server to list its supported auth types (OIDs in the spnego blob in the negprot response). Change this so that on krb5 mounts we default to trying krb5 if the server doesn't list its supported protocol mechanisms. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
-
Steve French authored
In debugging certain scenarios, especially reconnect cases, it can be helpful to have a dynamic trace point for the result of tree connect. See sample output below from a reconnect event. The new event is 'smb3_tcon' TASK-PID CPU# |||| TIMESTAMP FUNCTION | | | |||| | | cifsd-6071 [001] .... 2659.897923: smb3_reconnect: server=localhost current_mid=0xa kworker/1:1-71 [001] .... 2666.026342: smb3_cmd_done: sid=0x0 tid=0x0 cmd=0 mid=0 kworker/1:1-71 [001] .... 2666.026576: smb3_cmd_err: sid=0xc49e1787 tid=0x0 cmd=1 mid=1 status=0xc0000016 rc=-5 kworker/1:1-71 [001] .... 2666.031677: smb3_cmd_done: sid=0xc49e1787 tid=0x0 cmd=1 mid=2 kworker/1:1-71 [001] .... 2666.031921: smb3_cmd_done: sid=0xc49e1787 tid=0x6e78f05f cmd=3 mid=3 kworker/1:1-71 [001] .... 2666.031923: smb3_tcon: xid=0 sid=0xc49e1787 tid=0x0 unc_name=\\localhost\test rc=0 kworker/1:1-71 [001] .... 2666.032097: smb3_cmd_done: sid=0xc49e1787 tid=0x6e78f05f cmd=11 mid=4 kworker/1:1-71 [001] .... 2666.032265: smb3_cmd_done: sid=0xc49e1787 tid=0x7912332f cmd=3 mid=5 kworker/1:1-71 [001] .... 2666.032266: smb3_tcon: xid=0 sid=0xc49e1787 tid=0x0 unc_name=\\localhost\IPC$ rc=0 kworker/1:1-71 [001] .... 2666.032386: smb3_cmd_done: sid=0xc49e1787 tid=0x7912332f cmd=11 mid=6 Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
-
Colin Ian King authored
Trivial fix to a spelling mistake of the error access name EACCESS, rename to EACCES Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Ronnie Sahlberg authored
If the application buffer was too small to fit all the names we would still count the number of bytes and return this for listxattr. This would then trigger a BUG in usercopy.c Fix the computation of the size so that we return -ERANGE correctly when the buffer is too small. This fixes the kernel BUG for xfstest generic/377 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
-
Christoph Hellwig authored
__swiotlb_get_sgtable_page and __swiotlb_mmap_pfn are not only misnamed but also only used if CONFIG_IOMMU_DMA is set. Just add a simple ifdef for now, given that we plan to remove them entirely for the next merge window. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block layer fixes from Jens Axboe: "The biggest part of this pull request is the revert of the blkcg cleanup series. It had one fix earlier for a stacked device issue, but another one was reported. Rather than play whack-a-mole with this, revert the entire series and try again for the next kernel release. Apart from that, only small fixes/changes. Summary: - Indentation fixup for mtip32xx (Colin Ian King) - The blkcg cleanup series revert (Dennis Zhou) - Two NVMe fixes. One fixing a regression in the nvme request initialization in this merge window, causing nvme-fc to not work. The other is a suspend/resume p2p resource issue (James, Keith) - Fix sg discard merge, allowing us to merge in cases where we didn't before (Jianchao Wang) - Call rq_qos_exit() after the queue is frozen, preventing a hang (Ming) - Fix brd queue setup, fixing an oops if we fail setting up all devices (Ming)" * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block: nvme-pci: fix conflicting p2p resource adds nvme-fc: fix request private initialization blkcg: revert blkcg cleanups series block: brd: associate with queue until adding disk block: call rq_qos_exit() after queue is frozen mtip32xx: clean an indentation issue, remove extraneous tabs block: fix the DISCARD request merge
-
Linus Torvalds authored
Merge tag 'pwm/for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This series contains a number of improvements to existing drivers, such as LPSS. Some drivers, such as renesas-tpu and rcar get support for more SoC generations. To round things off this fixes an issue with the sysfs interface" * tag 'pwm/for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: lpss: Only set update bit if we are actually changing the settings pwm: lpss: Force runtime-resume on suspend on Cherry Trail pwm: Enable TI ECAP driver for ARCH_K3 dt-bindings: pwm: tiecap: Add TI AM654 SoC specific compatible dt-bindings: pwm: rcar: Add r8a774a1 support pwm: Send a uevent on the pwmchip device upon channel sysfs (un)export Revert "pwm: Set class for exported channels in sysfs" dt-bindings: pwm: renesas-tpu: Document r8a7744 support dt-bindings: pwm: rcar: Add r8a7744 support dt-bindings: pwm: renesas: tpu: Document R8A779{7|8}0 bindings dt-bindings: pwm: renesas: pwm-rcar: Document R8A779{7|8}0 bindings dt-bindings: pwm: renesas: tpu: Fix "compatible" prop description pwm: Use SPDX identifier for Renesas drivers pwm: lpss: Add get_state callback pwm: lpss: Release runtime-pm reference from the driver's remove callback pwm: lpss: Check PWM powerstate after resume on Cherry Trail devices pwm: lpss: Move struct pwm_lpss_chip definition to the header file pwm: lpss: Add ACPI HID for second PWM controller on Cherry Trail devices ACPI / PM: Export acpi_device_get_power() for use by modular build drivers pwm: tegra: Remove gratuituous blank line
-
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds authored
Pull more EDAC updates from Borislav Petkov: "The second part of the EDAC pile which contains the ADXL user and a build fix which addresses a not-so-sensical .config but fixes randconfig builds people do: - skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo) - ACPI_ADXL build fix" [ I don't think "sensical" is a word, particularly when used in the context of actually meaning "nonsensical", but I like it - Linus ] * tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, skx: Fix randconfig builds EDAC, skx_edac: Add address translation for non-volatile DIMMs
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "A few device-specific fixes: a fix for SPDIF on old Creative PCI board, and two additional fixes for the recent changes in FireWire audio stack" * tag 'sound-fix-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: firewire-lib: fix insufficient PCM rule for period/buffer size ALSA: ca0106: Disable IZD on SB0570 DAC to fix audio pops ALSA: dice: fix to wait for releases of all ALSA character devices
-
git://anongit.freedesktop.org/drm/drmLinus Torvalds authored
Pull drm fixes from Dave Airlie: "Pretty much a normal fixes pull pre-rc1, mostly amdgpu fixes, one i915 link training regression fix, and a couple of minor panel/bridge fixes and a panel quirk" * tag 'drm-next-2018-11-02' of git://anongit.freedesktop.org/drm/drm: (37 commits) drm/amdgpu: revert "enable gfxoff in non-sriov and stutter mode by default" drm/amd/pp: Print warning if od_sclk/mclk out of range drm/amd/pp: Fix pp_sclk/mclk_od not work on Vega10 drm/amd/pp: Fix pp_sclk/mclk_od not work on smu7 drm/amd/powerplay: no MGPU fan boost enablement on DPM disabled drm/amdgpu: Fix skipping hangged job reset during gpu recover. drm/amd/powerplay: revise Vega20 pptable version check drm/amd/display: set backlight level limit to 1 drm/panel: simple: Innolux TV123WAM is actually P120ZDG-BF1 dt-bindings: drm/panel: simple: Innolux TV123WAM is actually P120ZDG-BF1 drm/bridge: ti-sn65dsi86: Remove the mystery delay drm/panel: simple: Add "no-hpd" delay for Innolux TV123WAM drm/panel: simple: Support panels with HPD where HPD isn't connected dt-bindings: drm/panel: simple: Add no-hpd property drm/edid: Add 6 bpc quirk for BOE panel. drm/amdgpu: fix reporting of failed msg sent to SMU (v2) drm/amdgpu: Fix compute ring 1.0.0 failure after reset drm/amdgpu: fix VM leaf walking drm/amdgpu: fix amdgpu_vm_fini drm/amd/powerplay: commonize the API for retrieving current clocks ...
-
Linus Torvalds authored
Merge tag 'apparmor-pr-2018-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor updates from John Johansen: "Features/Improvements: - replace spin_is_locked() with lockdep - add base support for secmark labeling and matching Cleanups: - clean an indentation issue, remove extraneous space - remove no-op permission check in policy_unpack - fix checkpatch missing spaces error in Parse secmark policy - fix network performance issue in aa_label_sk_perm Bug fixes: - add #ifdef checks for secmark filtering - fix an error code in __aa_create_ns() - don't try to replace stale label in ptrace checks - fix failure to audit context info in build_change_hat - check buffer bounds when mapping permissions mask - fully initialize aa_perms struct when answering userspace query - fix uninitialized value in aa_split_fqname" * tag 'apparmor-pr-2018-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: clean an indentation issue, remove extraneous space apparmor: fix checkpatch error in Parse secmark policy apparmor: add #ifdef checks for secmark filtering apparmor: Fix uninitialized value in aa_split_fqname apparmor: don't try to replace stale label in ptraceme check apparmor: Replace spin_is_locked() with lockdep apparmor: Allow filtering based on secmark policy apparmor: Parse secmark policy apparmor: Add a wildcard secid apparmor: don't try to replace stale label in ptrace access check apparmor: Fix network performance issue in aa_label_sk_perm
-