1. 09 Mar, 2012 2 commits
    • Benjamin Marzinski's avatar
      GFS2: call gfs2_write_alloc_required for each chunk · 58a7d5fb
      Benjamin Marzinski authored
      gfs2_fallocate was calling gfs2_write_alloc_required() once at the start of
      the function. This caused problems since gfs2_write_alloc_required used a
      long unsigned int for the len, but gfs2_fallocate could allocate a much
      larger amount.  This patch will move the call into the loop where the
      chunks are actually allocated and zeroed out. This will keep the allocation
      size under the limit, and also allow gfs2_fallocate to quickly skip over
      sections of the file that are already completely allocated.
      
      fallcate_chunk was also not correctly setting the file size.  It was using the
      len veriable to find the last block written to, but by the time it was setting
      the size, the len variable had already been decremented to 0.
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      58a7d5fb
    • Steven Whitehouse's avatar
      GFS2: Clean up log flush header writing · 34cc1781
      Steven Whitehouse authored
      We already send both a pre and post flush to the block device
      when writing a journal header. There is no need to wait for
      the previous I/O specifically when we do this, unless we've
      turned "barriers" off.
      
      As a side effect, this also cleans up the code path for flushing
      the journal and makes it more readable.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      34cc1781
  2. 08 Mar, 2012 1 commit
    • Steven Whitehouse's avatar
      GFS2: Remove a __GFP_NOFAIL allocation · 75ca61c1
      Steven Whitehouse authored
      In order to ensure that we've got enough buffer heads for flushing
      the journal, the orignal code used __GFP_NOFAIL when performing
      this allocation. Here we dispense with that in favour of using a
      mempool. This should improve efficiency in low memory conditions
      since flushing the journal is a good way to get memory back, we
      don't want to be spinning, waiting on memory allocations. The
      buffers which are allocated via this mempool are fairly short lived,
      so that we'll recycle them pretty quickly.
      
      Although there are other memory allocations which occur during the
      journal flush process, this is the one which can potentially require
      the most memory, so the most important one to fix.
      
      The amount of memory reserved is a fixed amount, and we should not need
      to scale it when there are a greater number of filesystems in use.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      75ca61c1
  3. 07 Mar, 2012 1 commit
  4. 05 Mar, 2012 2 commits
    • Bob Peterson's avatar
      GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd · 58884c4d
      Bob Peterson authored
      This patch adds a call to gfs2_rindex_update from function gfs2_blk2rgrpd
      and removes calls to it that are made redundant by it. The problem is
      that a gfs2_grow can add rgrps to the rindex, then put those rgrps into
      use, thus rendering the rindex we read in at mount time incomplete.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      58884c4d
    • Bob Peterson's avatar
      GFS2: Eliminate sd_rindex_mutex · 6aad1c3d
      Bob Peterson authored
      Over time, we've slowly eliminated the use of sd_rindex_mutex.
      Up to this point, it was only used in two places: function
      gfs2_ri_total (which totals the file system size by reading
      and parsing the rindex file) and function gfs2_rindex_update
      which updates the rgrps in memory. Both of these functions have
      the rindex glock to protect them, so the rindex is unnecessary.
      Since gfs2_grow writes to the rindex via the meta_fs, the mutex
      is in the wrong order according to the normal rules. This patch
      eliminates the mutex entirely to avoid the problem.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      6aad1c3d
  5. 01 Mar, 2012 1 commit
  6. 28 Feb, 2012 11 commits
    • Steven Whitehouse's avatar
      GFS2: Make bd_cmp() static · 08728f2d
      Steven Whitehouse authored
      Add missing static to bd_cmp()
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      08728f2d
    • Bob Peterson's avatar
      GFS2: Sort the ordered write list · 4a36d08d
      Bob Peterson authored
      This patch sorts the ordered write list for GFS2 writes.
      This increases the throughput for simultaneous writes.
      For example, if you have ten processes, all doing:
      dd if=/dev/zero of=/mnt/gfs2/fileX
      on different files, the throughput will be much better.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      4a36d08d
    • Steven Whitehouse's avatar
      GFS2: FITRIM ioctl support · 66fc061b
      Steven Whitehouse authored
      The FITRIM ioctl provides an alternative way to send discard requests to
      the underlying device. Using the discard mount option results in every
      freed block generating a discard request to the block device. This can
      be slow, since many block devices can only process discard requests of
      larger sizes, and also such operations can be time consuming.
      
      Rather than using the discard mount option, FITRIM allows a sweep of the
      filesystem on an occasional basis, and also to optionally avoid sending
      down discard requests for smaller regions.
      
      In GFS2 FITRIM will work at resource group granularity. There is a flag
      for each resource group which keeps track of which resource groups have
      been trimmed. This flag is reset whenever a deallocation occurs in the
      resource group, and set whenever a successful FITRIM of that resource
      group has taken place. This helps to reduce repeated discard requests
      for the same block ranges, again improving performance.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      66fc061b
    • Steven Whitehouse's avatar
      GFS2: Move two functions from log.c to lops.c · 47ac5537
      Steven Whitehouse authored
      gfs2_log_get_buf() and gfs2_log_fake_buf() are both used
      only in lops.c, so move them next to their callers and they
      can then become static.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      47ac5537
    • Steven Whitehouse's avatar
      GFS2: glock statistics gathering · a245769f
      Steven Whitehouse authored
      The stats are divided into two sets: those relating to the
      super block and those relating to an individual glock. The
      super block stats are done on a per cpu basis in order to
      try and reduce the overhead of gathering them. They are also
      further divided by glock type.
      
      In the case of both the super block and glock statistics,
      the same information is gathered in each case. The super
      block statistics are used to provide default values for
      most of the glock statistics, so that newly created glocks
      should have, as far as possible, a sensible starting point.
      
      The statistics are divided into three pairs of mean and
      variance, plus two counters. The mean/variance pairs are
      smoothed exponential estimates and the algorithm used is
      one which will be very familiar to those used to calculation
      of round trip times in network code.
      
      The three pairs of mean/variance measure the following
      things:
      
       1. DLM lock time (non-blocking requests)
       2. DLM lock time (blocking requests)
       3. Inter-request time (again to the DLM)
      
      A non-blocking request is one which will complete right
      away, whatever the state of the DLM lock in question. That
      currently means any requests when (a) the current state of
      the lock is exclusive (b) the requested state is either null
      or unlocked or (c) the "try lock" flag is set. A blocking
      request covers all the other lock requests.
      
      There are two counters. The first is there primarily to show
      how many lock requests have been made, and thus how much data
      has gone into the mean/variance calculations. The other counter
      is counting queueing of holders at the top layer of the glock
      code. Hopefully that number will be a lot larger than the number
      of dlm lock requests issued.
      
      So why gather these statistics? There are several reasons
      we'd like to get a better idea of these timings:
      
      1. To be able to better set the glock "min hold time"
      2. To spot performance issues more easily
      3. To improve the algorithm for selecting resource groups for
      allocation (to base it on lock wait time, rather than blindly
      using a "try lock")
      Due to the smoothing action of the updates, a step change in
      some input quantity being sampled will only fully be taken
      into account after 8 samples (or 4 for the variance) and this
      needs to be carefully considered when interpreting the
      results.
      
      Knowing both the time it takes a lock request to complete and
      the average time between lock requests for a glock means we
      can compute the total percentage of the time for which the
      node is able to use a glock vs. time that the rest of the
      cluster has its share. That will be very useful when setting
      the lock min hold time.
      
      The other point to remember is that all times are in
      nanoseconds. Great care has been taken to ensure that we
      measure exactly the quantities that we want, as accurately
      as possible. There are always inaccuracies in any
      measuring system, but I hope this is as accurate as we
      can reasonably make it.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      a245769f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes · 891003ab
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
        GFS2: Read resource groups on mount
        GFS2: Ensure rindex is uptodate for fallocate
        GFS2: Read in rindex if necessary during unlink
        GFS2: Fix race between lru_list and glock ref count
      891003ab
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · d5a74afd
      Linus Torvalds authored
      IOMMU fixes for Linux 3.3-rc5
      
      All the fixes are for the OMAP IOMMU driver. The first patch is the
      biggest one. It fixes the calls of the function omap_find_iovm_area() in
      the omap-iommu-debug module which expects a 'struct device' parameter
      since commit fabdbca8 instead of an omap_iommu handle. The
      omap-iommu-debug code still passed the handle to the function which
      caused a crash.
      
      The second patch fixes a NULL pointer dereference in the OMAP code and
      the third patch makes sure that the omap-iommu is initialized before the
      omap-isp driver, which relies on the iommu. The last patch is only a
      workaround until defered probing is implemented.
      
      * tag 'iommu-fixes-v3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        ARM: OMAP: make iommu subsys_initcall to fix builtin omap3isp
        iommu/omap: fix NULL pointer dereference
        iommu/omap: fix erroneous omap-iommu-debug API calls
      d5a74afd
    • Steven Whitehouse's avatar
      GFS2: Read resource groups on mount · a365fbf3
      Steven Whitehouse authored
      This makes mount take slightly longer, but at the same time, the first
      write to the filesystem will be faster too. It also means that if there
      is a problem in the resource index, then we can refuse to mount rather
      than having to try and report that when the first write occurs.
      
      In addition, to avoid recursive locking, we hvae to take account of
      instances when the rindex glock may already be held when we are
      trying to update the rbtree of resource groups.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      a365fbf3
    • Bob Peterson's avatar
      GFS2: Ensure rindex is uptodate for fallocate · 9e73f571
      Bob Peterson authored
      This patch fixes a problem whereby gfs2_grow was failing and causing GFS2
      to assert. The problem was that when GFS2's fallocate operation tried to
      acquire an "allocation" it made sure the rindex was up to date, and if not,
      it called gfs2_rindex_update. However, if the file being fallocated was
      the rindex itself, it was already locked at that point. By calling
      gfs2_rindex_update at an earlier point in time, we bring rindex up to date
      and thereby avoid trying to lock it when the "allocation" is acquired.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      9e73f571
    • Bob Peterson's avatar
      GFS2: Read in rindex if necessary during unlink · 718b97bd
      Bob Peterson authored
      This patch fixes a problem whereby you were unable to delete
      files until other file system operations were done (such as
      statfs, touch, writes, etc.) that caused the rindex to be
      read in.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      718b97bd
    • Steven Whitehouse's avatar
      GFS2: Fix race between lru_list and glock ref count · 4043b886
      Steven Whitehouse authored
      This patch fixes a narrow race window between the glock ref count
      hitting zero and glocks being removed from the lru_list.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      4043b886
  7. 27 Feb, 2012 13 commits
  8. 26 Feb, 2012 3 commits
    • Andreas Bießmann's avatar
      mod/file2alias: make modpost compile on darwin again · dd2a3aca
      Andreas Bießmann authored
      commit e49ce141 breaks cross compiling
      the linux kernel on darwin hosts.
      This fix introduce some minimal glue to adopt linker section handling
      for darwin hosts.
      Signed-off-by: default avatarAndreas Bießmann <andreas@biessmann.de>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      CC: Jochen Friedrich <jochen@scram.de>
      CC: Samuel Ortiz <sameo@linux.intel.com>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Tested-by: default avatarBernhard Walle <bernhard@bwalle.de>
      dd2a3aca
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 203738e5
      Linus Torvalds authored
      1) ICMP sockets leave err uninitialized but we try to return it for the
         unsupported MSG_OOB case, reported by Dave Jones.
      
      2) Add new Zaurus device ID entries, from Dave Jones.
      
      3) Pointer calculation in hso driver memset is wrong, from Dan
         Carpenter.
      
      4) ks8851_probe() checks unsigned value as negative, fix also from Dan
         Carpenter.
      
      5) Fix crashes in atl1c driver due to TX queue handling, from Eric
         Dumazet.  I anticipate some TX side locking fixes coming in the near
         future for this driver as well.
      
      6) The inline directive fix in Bluetooth which was breaking the build
         only with very new versions of GCC, from Johan Hedberg.
      
      7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
         window, reported by Meelis Roos and fixed by Eric Dumazet.
      
      8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.
      
      9) Some ip6_route_output() callers test the return value for NULL, but
         this never happens as the convention is to return a dst entry with
         dst->error set.  Fixes from RonQing Li.
      
      10) Logitech Harmony 900 should be handled by zaurus driver not
         cdc_ether, update white lists and black lists accordingly.  From
         Scott Talbert.
      
      11) Receiving from certain kinds of devices there won't be a MAC header,
         so there is no MAC header to fixup in the IPSEC code, and if we try
         to do it we'll crash.  Fix from Eric Dumazet.
      
      12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
         Petrilin.
      
      13) Fix regression in link-down handling in davinci_emac which causes
         all RX descriptors to be freed up and therefore RX to wedge
         completely, from Christian Riesch.
      
      14) It took two attempts, but ctnetlink soft lockups seem to be
         cured now, from Pablo Neira Ayuso.
      
      15) Endianness bug fix in ENIC driver, from Santosh Nayak.
      
      16) The long ago conversion of the PPP fragmentation code over to
         abstracted SKB list handling wasn't perfect, once we get an
         out of sequence SKB we don't flush the rest of them like we
         should.  From Ben McKeegan.
      
      17) Fix regression of ->ip_summed initialization in sfc driver.
         From Ben Hutchings.
      
      18) Bluetooth timeout mistakenly using msecs instead of jiffies,
         from Andrzej Kaczmarek.
      
      19) Using _sync variant of work cancellation results in deadlocks,
         use the non _sync variants instead.  From Andre Guedes.
      
      20) Bluetooth rfcomm code had reference counting problems leading
         to crashes, fix from Octavian Purdila.
      
      21) The conversion of netem over to classful qdisc handling added
         two bugs to netem_dequeue(), fixes from Eric Dumazet.
      
      22) Missing pci_iounmap() in ATM Solos driver.  Fix from Julia Lawall.
      
      23) b44_pci_exit() should not have __exit tag since it's invoked from
         non-__exit code.  From Nikola Pajkovsky.
      
      24) The conversion of the neighbour hash tables over to RCU added a
         race, fixed here by adding the necessary reread of tbl->nht, fix
         from Michel Machado.
      
      25) When we added VF (virtual function) attributes for network device
         dumps, this potentially bloats up the size of the dump of one
         network device such that the dump size is too large for the buffer
         allocated by properly written netlink applications.
      
         In particular, if you add 255 VFs to a network device, parts of
         GLIBC stop working.
      
         To fix this, we add an attribute that is used to turn on these
         extended portions of the network device dump.  Sophisticaed
         applications like 'ip' that want to see this stuff  will be changed
         to set the attribute, whereas things like GLIBC that don't care
         about VFs simply will not, and therefore won't be busted by the
         mere presence of VFs on a network device.
      
         Thanks to the tireless work of Greg Rose on this fix.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
        sfc: Fix assignment of ip_summed for pre-allocated skbs
        ppp: fix 'ppp_mp_reconstruct bad seq' errors
        enic: Fix endianness bug.
        gre: fix spelling in comments
        netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
        Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
        davinci_emac: Do not free all rx dma descriptors during init
        mlx4_core: Fixing array indexes when setting port types
        phy: IC+101G and PHY_HAS_INTERRUPT flag
        netdev/phy/icplus: Correct broken phy_init code
        ipsec: be careful of non existing mac headers
        Move Logitech Harmony 900 from cdc_ether to zaurus
        hso: memsetting wrong data in hso_get_count()
        netfilter: ip6_route_output() never returns NULL.
        ethernet/broadcom: ip6_route_output() never returns NULL.
        ipv6: ip6_route_output() never returns NULL.
        jme: Fix FIFO flush issue
        atm: clip: remove clip_tbl
        ipv4: ping: Fix recvmsg MSG_OOB error handling.
        rtnetlink: Fix problem with buffer allocation
        ...
      203738e5
    • Linus Torvalds's avatar
      Fix autofs compile without CONFIG_COMPAT · 3c761ea0
      Linus Torvalds authored
      The autofs compat handling fix caused a compile failure when
      CONFIG_COMPAT isn't defined.
      
      Instead of adding random #ifdef'fery in autofs, let's just make the
      compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task()
      just hardcodes to zero.
      
      We could probably do something similar for a number of other cases where
      we have #ifdef's in code, but this is the low-hanging fruit.
      Reported-and-tested-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3c761ea0
  9. 25 Feb, 2012 6 commits