1. 20 Jul, 2012 8 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd · ce9f8d6b
      Linus Torvalds authored
      Pull pnfs/ore fixes from Boaz Harrosh:
       "These are catastrophic fixes to the pnfs objects-layout that were just
        discovered.  They are also destined for @stable.
      
        I have found these and worked on them at around RC1 time but
        unfortunately went to the hospital for kidney stones and had a very
        slow recovery.  I refrained from sending them as is, before proper
        testing, and surly I have found a bug just yesterday.
      
        So now they are all well tested, and have my sign-off.  Other then
        fixing the problem at hand, and assuming there are no bugs at the new
        code, there is low risk to any surrounding code.  And in anyway they
        affect only these paths that are now broken.  That is RAID5 in pnfs
        objects-layout code.  It does also affect exofs (which was not broken)
        but I have tested exofs and it is lower priority then objects-layout
        because no one is using exofs, but objects-layout has lots of users."
      
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
        pnfs-obj: don't leak objio_state if ore_write/read fails
        ore: Unlock r4w pages in exact reverse order of locking
        ore: Remove support of partial IO request (NFS crash)
        ore: Fix NFS crash by supporting any unaligned RAID IO
      ce9f8d6b
    • Linus Torvalds's avatar
      Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs · 17934162
      Linus Torvalds authored
      Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
       "It's been reported already twice recently:
      
          http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
          http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html
      
        and we finally have the fix.  I am quite confident the fix is correct
        because I could reproduce the problem with nandsim and verify the fix.
        It was also verified by Iwo (the reporter).
      
        I am also confident that this is OK to merge the fix so late because
        this patch affects only the fixup functionality, which is not used by
        most users."
      
      * tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
        UBIFS: fix a bug in empty space fix-up
      17934162
    • Boaz Harrosh's avatar
      pnfs-obj: Fix __r4w_get_page when offset is beyond i_size · c999ff68
      Boaz Harrosh authored
      It is very common for the end of the file to be unaligned on
      stripe size. But since we know it's beyond file's end then
      the XOR should be preformed with all zeros.
      
      Old code used to just read zeros out of the OSD devices, which is a great
      waist. But what scares me more about this situation is that, we now have
      pages attached to the file's mapping that are beyond i_size. I don't
      like the kind of bugs this calls for.
      
      Fix both birds, by returning a global zero_page, if offset is beyond
      i_size.
      
      TODO:
      	Change the API to ->__r4w_get_page() so a NULL can be
      	returned without being considered as error, since XOR API
      	treats NULL entries as zero_pages.
      
      [Bug since 3.2. Should apply the same way to all Kernels since]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      c999ff68
    • Boaz Harrosh's avatar
      pnfs-obj: don't leak objio_state if ore_write/read fails · 9909d45a
      Boaz Harrosh authored
      [Bug since 3.2 Kernel]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      9909d45a
    • Boaz Harrosh's avatar
      ore: Unlock r4w pages in exact reverse order of locking · 537632e0
      Boaz Harrosh authored
      The read-4-write pages are locked in address ascending order.
      But where unlocked in a way easiest for coding. Fix that,
      locks should be released in opposite order of locking, .i.e
      descending address order.
      
      I have not hit this dead-lock. It was found by inspecting the
      dbug print-outs. I suspect there is an higher lock at caller that
      protects us, but fix it regardless.
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      537632e0
    • Boaz Harrosh's avatar
      ore: Remove support of partial IO request (NFS crash) · 62b62ad8
      Boaz Harrosh authored
      Do to OOM situations the ore might fail to allocate all resources
      needed for IO of the full request. If some progress was possible
      it would proceed with a partial/short request, for the sake of
      forward progress.
      
      Since this crashes NFS-core and exofs is just fine without it just
      remove this contraption, and fail.
      
      TODO:
      	Support real forward progress with some reserved allocations
      	of resources, such as mem pools and/or bio_sets
      
      [Bug since 3.2 Kernel]
      CC: Stable Tree <stable@kernel.org>
      CC: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      62b62ad8
    • Boaz Harrosh's avatar
      ore: Fix NFS crash by supporting any unaligned RAID IO · 9ff19309
      Boaz Harrosh authored
      In RAID_5/6 We used to not permit an IO that it's end
      byte is not stripe_size aligned and spans more than one stripe.
      .i.e the caller must check if after submission the actual
      transferred bytes is shorter, and would need to resubmit
      a new IO with the remainder.
      
      Exofs supports this, and NFS was supposed to support this
      as well with it's short write mechanism. But late testing has
      exposed a CRASH when this is used with none-RPC layout-drivers.
      
      The change at NFS is deep and risky, in it's place the fix
      at ORE to lift the limitation is actually clean and simple.
      So here it is below.
      
      The principal here is that in the case of unaligned IO on
      both ends, beginning and end, we will send two read requests
      one like old code, before the calculation of the first stripe,
      and also a new site, before the calculation of the last stripe.
      If any "boundary" is aligned or the complete IO is within a single
      stripe. we do a single read like before.
      
      The code is clean and simple by splitting the old _read_4_write
      into 3 even parts:
      1._read_4_write_first_stripe
      2. _read_4_write_last_stripe
      3. _read_4_write_execute
      
      And calling 1+3 at the same place as before. 2+3 before last
      stripe, and in the case of all in a single stripe then 1+2+3
      is preformed additively.
      
      Why did I not think of it before. Well I had a strike of
      genius because I have stared at this code for 2 years, and did
      not find this simple solution, til today. Not that I did not try.
      
      This solution is much better for NFS than the previous supposedly
      solution because the short write was dealt  with out-of-band after
      IO_done, which would cause for a seeky IO pattern where as in here
      we execute in order. At both solutions we do 2 separate reads, only
      here we do it within a single IO request. (And actually combine two
      writes into a single submission)
      
      NFS/exofs code need not change since the ORE API communicates the new
      shorter length on return, what will happen is that this case would not
      occur anymore.
      
      hurray!!
      
      [Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      9ff19309
    • Artem Bityutskiy's avatar
      UBIFS: fix a bug in empty space fix-up · c6727932
      Artem Bityutskiy authored
      UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
      limitations of dumb flasher programs. Namely, of those flashers that are unable
      to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
      the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
      relatively new (introduced in v3.0).
      
      The fix-up routine (fixup_free_space()) is executed only once at the very first
      mount if the superblock has the 'space_fixup' flag set (can be done with -F
      option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
      writes it back to the same LEB. The routine assumes the image is pristine and
      does not have anything in the journal.
      
      There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
      All but one LEB of the log of a pristine file-system are empty. And one
      contains just a commit start node. And 'fixup_free_space()' just unmapped this
      LEB, which resulted in wiping the commit start node. As a result, some users
      were unable to mount the file-system next time with the following symptom:
      
      UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
      UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
      
      The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
      that the beginning of empty space in the log head (c->lhead_offs) was known
      on mount. However, it is not the case - it was always 0. UBIFS does not store
      in it the master node and finds out by scanning the log on every mount.
      
      The fix is simple - just pass commit start node size instead of 0 to
      'fixup_leb()'.
      Signed-off-by: default avatarArtem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
      Cc: stable@vger.kernel.org [v3.0+]
      Reported-by: default avatarIwo Mergler <Iwo.Mergler@netcommwireless.com>
      Tested-by: default avatarIwo Mergler <Iwo.Mergler@netcommwireless.com>
      Reported-by: default avatarJames Nute <newten82@gmail.com>
      c6727932
  2. 19 Jul, 2012 13 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 85efc72a
      Linus Torvalds authored
      Pull last minute Ceph fixes from Sage Weil:
       "The important one fixes a bug in the socket failure handling behavior
        that was turned up in some recent failure injection testing.  The
        other two are minor bug fixes."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        rbd: endian bug in rbd_req_cb()
        rbd: Fix ceph_snap_context size calculation
        libceph: fix messenger retry
      85efc72a
    • Linus Torvalds's avatar
      Merge tag 'md-3.5-fixes' of git://neil.brown.name/md · 3e4b9459
      Linus Torvalds authored
      Pull three md bugfixes from NeilBrown:
       "One of the bugs was introduced in 3.5-rc1.  Others have been there for
        longer."
      
      * tag 'md-3.5-fixes' of git://neil.brown.name/md:
        md/raid1: close some possible races on write errors during resync
        md: avoid crash when stopping md array races with closing other open fds.
        md: fix bug in handling of new_data_offset
      3e4b9459
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 309d4b00
      Linus Torvalds authored
      Pull networking changes from David Miller:
       "Ok, we should be good to go now"
      
      1) We have to statically initialize the init_net device list head rather
         than do so in an initcall, otherwise netprio_cgroup crashes if it's
         built statically rather than modular (Mark D.  Rustad)
      
      2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)
      
      3) Qlogic maintainers update (Anirban Chakraborty)
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: Statically initialize init_net.dev_base_head
        MAINTAINERS: Changes in qlcnic and qlge maintainers list
        cipso: don't follow a NULL pointer when setsockopt() is called
      309d4b00
    • Linus Torvalds's avatar
      Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 61c901c5
      Linus Torvalds authored
      Pull HID update from Jiri Kosina:
       "A final round of changes for HID for 3.5: just device ID additions."
      
      * 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: hid-multitouch: add support for Zytronic panels
        HID: add Sennheiser BTD500USB device support
        HID: add battery quirk for Apple Wireless ANSI
      61c901c5
    • Ezequiel Garcia's avatar
      cx25821: Remove bad strcpy to read-only char* · 380e99fc
      Ezequiel Garcia authored
      The strcpy was being used to set the name of the board.  Since the
      destination char* was read-only and the name is set statically at
      compile time; this was both wrong and redundant.
      
      The type of char* is changed to const char* to prevent future errors.
      Reported-by: default avatarRadek Masin <radek@masin.eu>
      Signed-off-by: default avatarEzequiel Garcia <elezegarcia@gmail.com>
      [ Taking directly due to vacations   - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      380e99fc
    • Benjamin Tissoires's avatar
    • NeilBrown's avatar
      md/raid1: close some possible races on write errors during resync · 58e94ae1
      NeilBrown authored
      commit 4367af55
         md/raid1: clear bad-block record when write succeeds.
      
      Added a 'reschedule_retry' call possibility at the end of
      end_sync_write, but didn't add matching code at the end of
      sync_request_write.  So if the writes complete very quickly, or
      scheduling makes it seem that way, then we can miss rescheduling
      the request and the resync could hang.
      
      Also commit 73d5c38a
          md: avoid races when stopping resync.
      
      Fix a race condition in this same code in end_sync_write but didn't
      make the change in sync_request_write.
      
      This patch updates sync_request_write to fix both of those.
      Patch is suitable for 3.1 and later kernels.
      Reported-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Original-version-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      58e94ae1
    • NeilBrown's avatar
      md: avoid crash when stopping md array races with closing other open fds. · a05b7ea0
      NeilBrown authored
      md will refuse to stop an array if any other fd (or mounted fs) is
      using it.
      When any fs is unmounted of when the last open fd is closed all
      pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
      so there will be no pending IO to worry about when the array is
      stopped.
      
      However in order to send the STOP_ARRAY ioctl to stop the array one
      must first get and open fd on the block device.
      If some fd is being used to write to the block device and it is closed
      after mdadm open the block device, but before mdadm issues the
      STOP_ARRAY ioctl, then there will be no last-close on the md device so
      __blkdev_put will not call sync_blockdev.
      
      If this happens, then IO can still be in-flight while md tears down
      the array and bad things can happen (use-after-free and subsequent
      havoc).
      
      So in the case where do_md_stop is being called from an open file
      descriptor, call sync_block after taking the mutex to ensure there
      will be no new openers.
      
      This is needed when setting a read-write device to read-only too.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarmajianpeng <majianpeng@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a05b7ea0
    • NeilBrown's avatar
      md: fix bug in handling of new_data_offset · 25f7fd47
      NeilBrown authored
      commit c6563a8c
          md: add possibility to change data-offset for devices.
      
      introduced a 'new_data_offset' attribute which should normally
      be the same as 'data_offset', but can be explicitly set to a different
      value to allow a reshape operation to move the data.
      
      Unfortunately when the 'data_offset' is explicitly set through
      sysfs, the new_data_offset is not also set, so the two would become
      out-of-sync incorrectly.
      
      One result of this is that trying to set the 'size' after the
      'data_offset' would fail because it is not permitted to set the size
      when the 'data_offset' and 'new_data_offset' are different - as that
      can be confusing.
      Consequently when mdadm tried to do this while assembling an IMSM
      array it would fail.
      
      This bug was introduced in 3.5-rc1.
      Reported-by: default avatarBrian Downing <bdowning@lavos.net>
      Bisected-by: default avatarBrian Downing <bdowning@lavos.net>
      Tested-by: default avatarBrian Downing <bdowning@lavos.net>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      25f7fd47
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 8a7298b7
      Linus Torvalds authored
      Pull target fixes from Nicholas Bellinger:
       "This includes a bugfix from MDR to address a NULL pointer OOPs with
        FCoE aborts, along with a WRITE_SAME emulation bugfix for NOLB=0
        cases, and persistent reservation return cleanups from Roland.
      
        All three patches are CC'ed to stable."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Fix range calculation in WRITE SAME emulation when num blocks == 0
        target: Clean up returning errors in PR handling code
        tcm_fc: Fix crash seen with aborts and large reads
      8a7298b7
    • Olaf Hering's avatar
      kexec: update URL of kexec homepage · b1bdd2eb
      Olaf Hering authored
      The referenced html file does not exist anymore. Replace the URL with
      the current project homepage.
      Signed-off-by: default avatarOlaf Hering <olaf@aepfle.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b1bdd2eb
    • Yoichi Yuasa's avatar
      mips: fix bug.h build regression · 893a0574
      Yoichi Yuasa authored
      Commit 37778088 ("bug.h: need linux/kernel.h for TAINT_WARN.") broke
      all MIPS builds:
      
          CC      arch/mips/kernel/machine_kexec.o
        include/linux/log2.h: In function '__ilog2_u32':
        include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
        include/linux/log2.h: In function '__ilog2_u64':
        include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
        ...
      Signed-off-by: default avatarYoichi Yuasa <yuasa@linux-mips.org>
      Tested-by: default avatarJohn Crispin <blogic@openwrt.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      893a0574
    • Linus Torvalds's avatar
      Make wait_for_device_probe() also do scsi_complete_async_scans() · eea03c20
      Linus Torvalds authored
      Commit a7a20d10 ("sd: limit the scope of the async probe domain")
      make the SCSI device probing run device discovery in it's own async
      domain.
      
      However, as a result, the partition detection was no longer synchronized
      by async_synchronize_full() (which, despite the name, only synchronizes
      the global async space, not all of them).  Which in turn meant that
      "wait_for_device_probe()" would not wait for the SCSI partitions to be
      parsed.
      
      And "wait_for_device_probe()" was what the boot time init code relied on
      for mounting the root filesystem.
      
      Now, most people never noticed this, because not only is it
      timing-dependent, but modern distributions all use initrd.  So the root
      filesystem isn't actually on a disk at all.  And then before they
      actually mount the final disk filesystem, they will have loaded the
      scsi-wait-scan module, which not only does the expected
      wait_for_device_probe(), but also does scsi_complete_async_scans().
      
      [ Side note: scsi_complete_async_scans() had also been partially broken,
        but that was fixed in commit 43a8d39d ("fix async probe
        regression"), so that same commit a7a20d10 had actually broken
        setups even if you used scsi-wait-scan explicitly ]
      
      Solve this problem by just moving the scsi_complete_async_scans() call
      into wait_for_device_probe().  Everybody who wants to wait for device
      probing to finish really wants the SCSI probing to complete, so there's
      no reason not to do this.
      
      So now "wait_for_device_probe()" really does what the name implies, and
      properly waits for device probing to finish.  This also removes the now
      unnecessary extra calls to scsi_complete_async_scans().
      Reported-and-tested-by: default avatarArtem S. Tashkinov <t.artem@mailcity.com>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: linux-scsi <linux-scsi@vger.kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eea03c20
  3. 18 Jul, 2012 19 commits