1. 23 Feb, 2016 3 commits
  2. 22 Feb, 2016 6 commits
    • Jan Kara's avatar
      mbcache2: Use referenced bit instead of LRU · f0c8b462
      Jan Kara authored
      Currently we maintain perfect LRU list by moving entry to the tail of
      the list when it gets used. However these operations on cache-global
      list are relatively expensive.
      
      In this patch we switch to lazy updates of LRU list. Whenever entry gets
      used, we set a referenced bit in it. When reclaiming entries, we give
      referenced entries another round in the LRU. Since the list is not a
      real LRU anymore, rename it to just 'list'.
      
      In my testing this logic gives about 30% boost to workloads with mostly
      unique xattr blocks (e.g. xattr-bench with 10 files and 10000 unique
      xattr values).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      f0c8b462
    • Jan Kara's avatar
      mbcache2: limit cache size · c2f3140f
      Jan Kara authored
      So far number of entries in mbcache is limited only by the pressure from
      the shrinker. Since too many entries degrade the hash table and
      generally we expect that caching more entries has diminishing returns,
      limit number of entries the same way as in the old mbcache to 16 * hash
      table size.
      
      Once we exceed the desired maximum number of entries, we schedule a
      backround work to reclaim entries. If the background work cannot keep up
      and the number of entries exceeds two times the desired maximum, we
      reclaim some entries directly when allocating a new entry.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      c2f3140f
    • Jan Kara's avatar
      mbcache: remove mbcache · ecd1e644
      Jan Kara authored
      Both ext2 and ext4 are now converted to mbcache2. Remove the old mbcache
      code.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ecd1e644
    • Jan Kara's avatar
      ext2: convert to mbcache2 · be0726d3
      Jan Kara authored
      The conversion is generally straightforward. We convert filesystem from
      a global cache to per-fs one. Similarly to ext4 the tricky part is that
      xattr block corresponding to found mbcache entry can get freed before we
      get buffer lock for that block. So we have to check whether the entry is
      still valid after getting the buffer lock.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      be0726d3
    • Jan Kara's avatar
      ext4: convert to mbcache2 · 82939d79
      Jan Kara authored
      The conversion is generally straightforward. The only tricky part is
      that xattr block corresponding to found mbcache entry can get freed
      before we get buffer lock for that block. So we have to check whether
      the entry is still valid after getting buffer lock.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      82939d79
    • Jan Kara's avatar
      mbcache2: reimplement mbcache · f9a61eb4
      Jan Kara authored
      Original mbcache was designed to have more features than what ext?
      filesystems ended up using. It supported entry being in more hashes, it
      had a home-grown rwlocking of each entry, and one cache could cache
      entries from multiple filesystems. This genericity also resulted in more
      complex locking, larger cache entries, and generally more code
      complexity.
      
      This is reimplementation of the mbcache functionality to exactly fit the
      purpose ext? filesystems use it for. Cache entries are now considerably
      smaller (7 instead of 13 longs), the code is considerably smaller as
      well (414 vs 913 lines of code), and IMO also simpler. The new code is
      also much more lightweight.
      
      I have measured the speed using artificial xattr-bench benchmark, which
      spawns P processes, each process sets xattr for F different files, and
      the value of xattr is randomly chosen from a pool of V values. Averages
      of runtimes for 5 runs for various combinations of parameters are below.
      The first value in each cell is old mbache, the second value is the new
      mbcache.
      
      V=10
      F\P	1		2		4		8		16		32		64
      10	0.158,0.157	0.208,0.196	0.500,0.277	0.798,0.400	3.258,0.584	13.807,1.047	61.339,2.803
      100	0.172,0.167	0.279,0.222	0.520,0.275	0.825,0.341	2.981,0.505	12.022,1.202	44.641,2.943
      1000	0.185,0.174	0.297,0.239	0.445,0.283	0.767,0.340	2.329,0.480	6.342,1.198	16.440,3.888
      
      V=100
      F\P	1		2		4		8		16		32		64
      10	0.162,0.153	0.200,0.186	0.362,0.257	0.671,0.496	1.433,0.943	3.801,1.345	7.938,2.501
      100	0.153,0.160	0.221,0.199	0.404,0.264	0.945,0.379	1.556,0.485	3.761,1.156	7.901,2.484
      1000	0.215,0.191	0.303,0.246	0.471,0.288	0.960,0.347	1.647,0.479	3.916,1.176	8.058,3.160
      
      V=1000
      F\P	1		2		4		8		16		32		64
      10	0.151,0.129	0.210,0.163	0.326,0.245	0.685,0.521	1.284,0.859	3.087,2.251	6.451,4.801
      100	0.154,0.153	0.211,0.191	0.276,0.282	0.687,0.506	1.202,0.877	3.259,1.954	8.738,2.887
      1000	0.145,0.179	0.202,0.222	0.449,0.319	0.899,0.333	1.577,0.524	4.221,1.240	9.782,3.579
      
      V=10000
      F\P	1		2		4		8		16		32		64
      10	0.161,0.154	0.198,0.190	0.296,0.256	0.662,0.480	1.192,0.818	2.989,2.200	6.362,4.746
      100	0.176,0.174	0.236,0.203	0.326,0.255	0.696,0.511	1.183,0.855	4.205,3.444	19.510,17.760
      1000	0.199,0.183	0.240,0.227	1.159,1.014	2.286,2.154	6.023,6.039	---,10.933	---,36.620
      
      V=100000
      F\P	1		2		4		8		16		32		64
      10	0.171,0.162	0.204,0.198	0.285,0.230	0.692,0.500	1.225,0.881	2.990,2.243	6.379,4.771
      100	0.151,0.171	0.220,0.210	0.295,0.255	0.720,0.518	1.226,0.844	3.423,2.831	19.234,17.544
      1000	0.192,0.189	0.249,0.225	1.162,1.043	2.257,2.093	5.853,4.997	---,10.399	---,32.198
      
      We see that the new code is faster in pretty much all the cases and
      starting from 4 processes there are significant gains with the new code
      resulting in upto 20-times shorter runtimes. Also for large numbers of
      cached entries all values for the old code could not be measured as the
      kernel started hitting softlockups and died before the test completed.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      f9a61eb4
  3. 21 Feb, 2016 2 commits
  4. 19 Feb, 2016 2 commits
    • Jan Kara's avatar
      ext4: fix crashes in dioread_nolock mode · 74dae427
      Jan Kara authored
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      74dae427
    • Jan Kara's avatar
      ext4: fix bh->b_state corruption · ed8ad838
      Jan Kara authored
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine since bh is just a
      temporary storage for mapping information on stack but in some cases it
      can be fully living bh attached to a page. In such case non-atomic
      update of bh->b_state can race with an atomic update which then gets
      lost. Usually when we are mapping bh and thus updating bh->b_state
      non-atomically, nobody else touches the bh and so things work out fine
      but there is one case to especially worry about: ext4_finish_bio() uses
      BH_Uptodate_Lock on the first bh in the page to synchronize handling of
      PageWriteback state. So when blocksize < pagesize, we can be atomically
      modifying bh->b_state of a buffer that actually isn't under IO and thus
      can race e.g. with delalloc trying to map that buffer. The result is
      that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
      corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
      
      CC: stable@vger.kernel.org
      Reported-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ed8ad838
  5. 16 Feb, 2016 1 commit
  6. 12 Feb, 2016 6 commits
    • Eryu Guan's avatar
      ext4: remove unused parameter "newblock" in convert_initialized_extent() · 56263b4c
      Eryu Guan authored
      The "newblock" parameter is not used in convert_initialized_extent(),
      remove it.
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      56263b4c
    • Eryu Guan's avatar
      ext4: don't read blocks from disk after extents being swapped · bcff2488
      Eryu Guan authored
      I notice ext4/307 fails occasionally on ppc64 host, reporting md5
      checksum mismatch after moving data from original file to donor file.
      
      The reason is that move_extent_per_page() calls __block_write_begin()
      and block_commit_write() to write saved data from original inode blocks
      to donor inode blocks, but __block_write_begin() not only maps buffer
      heads but also reads block content from disk if the size is not block
      size aligned.  At this time the physical block number in mapped buffer
      head is pointing to the donor file not the original file, and that
      results in reading wrong data to page, which get written to disk in
      following block_commit_write call.
      
      This also can be reproduced by the following script on 1k block size ext4
      on x86_64 host:
      
          mnt=/mnt/ext4
          donorfile=$mnt/donor
          testfile=$mnt/testfile
          e4compact=~/xfstests/src/e4compact
      
          rm -f $donorfile $testfile
      
          # reserve space for donor file, written by 0xaa and sync to disk to
          # avoid EBUSY on EXT4_IOC_MOVE_EXT
          xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile
      
          # create test file written by 0xbb
          xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile
      
          # compute initial md5sum
          md5sum $testfile | tee md5sum.txt
          # drop cache, force e4compact to read data from disk
          echo 3 > /proc/sys/vm/drop_caches
      
          # test defrag
          echo "$testfile" | $e4compact -i -v -f $donorfile
          # check md5sum
          md5sum -c md5sum.txt
      
      Fix it by creating & mapping buffer heads only but not reading blocks
      from disk, because all the data in page is guaranteed to be up-to-date
      in mext_page_mkuptodate().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      bcff2488
    • Insu Yun's avatar
      ext4: fix potential integer overflow · 46901760
      Insu Yun authored
      Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
      integer overflow could be happened.
      Therefore, need to fix integer overflow sanitization.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarInsu Yun <wuninsu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      46901760
    • Huaitong Han's avatar
      ext4: add a line break for proc mb_groups display · 802cf1f9
      Huaitong Han authored
      This patch adds a line break for proc mb_groups display.
      Signed-off-by: default avatarHuaitong Han <huaitong.han@intel.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      802cf1f9
    • Anton Protopopov's avatar
      ext4: ioctl: fix erroneous return value · fdde368e
      Anton Protopopov authored
      The ext4_ioctl_setflags() function which is used in the ioctls
      EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value
      EPERM instead of -EPERM in case of error. This bug was introduced by a
      recent commit 9b7365fc.
      
      The following program can be used to illustrate the wrong behavior:
      
          #include <sys/types.h>
          #include <sys/ioctl.h>
          #include <sys/stat.h>
          #include <fcntl.h>
          #include <err.h>
      
          #define FS_IOC_GETFLAGS _IOR('f', 1, long)
          #define FS_IOC_SETFLAGS _IOW('f', 2, long)
          #define FS_IMMUTABLE_FL 0x00000010
      
          int main(void)
          {
              int fd;
              long flags;
      
              fd = open("file", O_RDWR|O_CREAT, 0600);
              if (fd < 0)
                  err(1, "open");
      
              if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
                  err(1, "ioctl: FS_IOC_GETFLAGS");
      
              flags |= FS_IMMUTABLE_FL;
      
              if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
                  err(1, "ioctl: FS_IOC_SETFLAGS");
      
              warnx("ioctl returned no error");
      
              return 0;
          }
      
      Running it gives the following result:
      
          $ strace -e ioctl ./test
          ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0
          ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1
          test: ioctl returned no error
          +++ exited with 0 +++
      
      Running the program on a kernel with the bug fixed gives the proper result:
      
          $ strace -e ioctl ./test
          ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0
          ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted)
          test: ioctl: FS_IOC_SETFLAGS: Operation not permitted
          +++ exited with 1 +++
      Signed-off-by: default avatarAnton Protopopov <a.s.protopopov@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      fdde368e
    • Jan Kara's avatar
      ext4: fix scheduling in atomic on group checksum failure · 05145bd7
      Jan Kara authored
      When block group checksum is wrong, we call ext4_error() while holding
      group spinlock from ext4_init_block_bitmap() or
      ext4_init_inode_bitmap() which results in scheduling while in atomic.
      Fix the issue by calling ext4_error() later after dropping the spinlock.
      
      CC: stable@vger.kernel.org
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      05145bd7
  7. 08 Feb, 2016 2 commits
  8. 01 Feb, 2016 7 commits
    • Linus Torvalds's avatar
      Linux 4.5-rc2 · 36f90b0a
      Linus Torvalds authored
      36f90b0a
    • Linus Torvalds's avatar
      Merge tag 'usb-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · d784ef58
      Linus Torvalds authored
      Pull USB driver fixes from Greg KH:
       "Here are some small USB fixes and new device ids for 4.5-rc2.  Nothing
        major here, full details are in the shortlog, and all of these have
        been in linux-next successfully"
      
      * tag 'usb-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: option: fix Cinterion AHxx enumeration
        USB: mxu11x0: fix memory leak on usb_serial private data
        USB: serial: ftdi_sio: add support for Yaesu SCU-18 cable
        USB: serial: option: Adding support for Telit LE922
        USB: serial: visor: fix crash on detecting device without write_urbs
        USB: visor: fix null-deref at probe
        USB: cp210x: add ID for IAI USB to RS485 adaptor
        usb: hub: do not clear BOS field during reset device
        cdc-acm:exclude Samsung phone 04e8:685d
        usb: cdc-acm: send zero packet for intel 7260 modem
        usb: cdc-acm: handle unlinked urb in acm read callback
      d784ef58
    • Linus Torvalds's avatar
      Merge tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 54e3f3e3
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small tty/serial driver fixes for 4.5-rc2.
      
        They resolve a number of reported problems (the ioctl one specifically
        has been pointed out by numerous people) and one patch adds some new
        device ids for the 8250_pci driver.  All have been in linux-next
        successfully"
      
      * tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: 8250_pci: Add Intel Broadwell ports
        staging/speakup: Use tty_ldisc_ref() for paste kworker
        n_tty: Fix unsafe reference to "other" ldisc
        tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
        tty: Retry failed reopen if tty teardown in-progress
        tty: Wait interruptibly for tty lock on reopen
      54e3f3e3
    • Linus Torvalds's avatar
      Merge tag 'staging-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 8c4e378e
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are some small staging driver fixes for 4.5-rc2.
      
        One of them predated 4.4-final, but I missed that merge window due to
        the holliday.  The others fix reported issues that have come up
        recently.  The tty change is needed for the speakup driver fix and has
        the ack of the tty driver maintainer as well, i.e.  myself :)
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        Staging: speakup: fix read scrolled-back VT
        Staging: speakup: Fix getting port information
        Revert "Staging: panel: usleep_range is preferred over udelay"
        iio: adis_buffer: Fix out-of-bounds memory access
      8c4e378e
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · f3ca903f
      Linus Torvalds authored
      Pull driver core fix from Greg KH:
       "Here's a single driver core fix that resolves an issue a lot of users
        have been hitting for a while now.  It's been tested a lot and has
        been in linux-next successfully for a while"
      
      * tag 'driver-core-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        base/platform: Fix platform drivers with no probe callback
      f3ca903f
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 510ae0c9
      Linus Torvalds authored
      Pull MIPS fix from Ralf Baechle:
       "Just a single revert for a patch which I had upstreamed out of
        sequence"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        Revert "MIPS: bcm63xx: nvram: Remove unused bcm63xx_nvram_get_psi_size() function"
      510ae0c9
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d517be5f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A bit on the largish side due to a series of fixes for a regression in
        the x86 vector management which was introduced in 4.3.  This work was
        started in December already, but it took some time to fix all corner
        cases and a couple of older bugs in that area which were detected
        while at it
      
        Aside of that a few platform updates for intel-mid, quark and UV and
        two fixes for in the mm code:
         - Use proper types for pgprot values to avoid truncation
         - Prevent a size truncation in the pageattr code when setting page
           attributes for large mappings"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
        x86/mm/pat: Avoid truncation when converting cpa->numpages to address
        x86/mm: Fix types used in pgprot cacheability flags translations
        x86/platform/quark: Print boundaries correctly
        x86/platform/UV: Remove EFI memmap quirk for UV2+
        x86/platform/intel-mid: Join string and fix SoC name
        x86/platform/intel-mid: Enable 64-bit build
        x86/irq: Plug vector cleanup race
        x86/irq: Call irq_force_move_complete with irq descriptor
        x86/irq: Remove outgoing CPU from vector cleanup mask
        x86/irq: Remove the cpumask allocation from send_cleanup_vector()
        x86/irq: Clear move_in_progress before sending cleanup IPI
        x86/irq: Remove offline cpus from vector cleanup
        x86/irq: Get rid of code duplication
        x86/irq: Copy vectormask instead of an AND operation
        x86/irq: Check vector allocation early
        x86/irq: Reorganize the search in assign_irq_vector
        x86/irq: Reorganize the return path in assign_irq_vector
        x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
        x86/irq: Validate that irq descriptor is still active
        x86/irq: Fix a race in x86_vector_free_irqs()
        ...
      d517be5f
  9. 31 Jan, 2016 8 commits
  10. 30 Jan, 2016 3 commits