1. 25 Jul, 2011 1 commit
  2. 22 Jul, 2011 2 commits
    • ext3: Fix data corruption in inodes with journalled data · b22570d9
      Jan Kara authored
      When journalling data for an inode (either because it is a symlink or
      because the filesystem is mounted in data=journal mode), ext3_evict_inode()
      can discard unwritten data by calling truncate_inode_pages(). This is
      because we don't mark the buffer / page dirty when journalling data but only
      add the buffer to the running transaction, and thus mm does not know there
      is still unwritten data.
      
      Fix the problem by carefully tracking the transaction containing the inode's
      data, committing this transaction, and writing uncheckpointed buffers when
      the inode should be reaped.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get · 03b5bb34
      Wang Sheng-Hui authored
      ext2_xattr_get() acquires xattr_sem first and only later checks whether
      the xattr name_len exceeds 255. This is unnecessarily time-consuming, and
      ext2_xattr_set() checks the length before its other checks. So move the
      check before acquiring xattr_sem to make these two functions consistent.
      Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
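
      The reordering described in this commit can be illustrated with a small
      userspace sketch. This is not the ext2 code: a pthread mutex stands in
      for the kernel's xattr_sem, and the function name, the error codes, and
      the lock_acquisitions counter are assumptions made for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define XATTR_NAME_MAX_SKETCH 255

/* Stand-in for xattr_sem (assumption: a plain mutex instead of an rwsem). */
static pthread_mutex_t xattr_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Illustration-only counter so callers can observe whether the lock was taken. */
static atomic_int lock_acquisitions;

int xattr_get_sketch(const char *name)
{
    size_t name_len;

    if (name == NULL)
        return -EINVAL;

    /* Cheap argument validation happens before any locking. */
    name_len = strlen(name);
    if (name_len > XATTR_NAME_MAX_SKETCH)
        return -ERANGE;

    pthread_mutex_lock(&xattr_lock_sketch);
    atomic_fetch_add(&lock_acquisitions, 1);
    /* The actual attribute lookup would happen here, under the lock. */
    pthread_mutex_unlock(&xattr_lock_sketch);

    return 0;
}
```

      With this ordering an over-long name is rejected without ever touching
      the lock, so invalid requests do not contend with valid ones.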
  3. 20 Jul, 2011 2 commits
  4. 27 Jun, 2011 2 commits
    • jbd: Use WRITE_SYNC in journal checkpoint. · a212d1a7
      Tao Ma authored
      In journal checkpoint, we write the buffer and wait for the write to finish.
      But in cfq, the async queue has a very low priority, and in our test,
      if there are too many sync queues and every queue is filled up with
      requests, the process will hang waiting for log space.
      
      So this patch uses WRITE_SYNC in __flush_batch so that the request is
      moved into the sync queue and handled by cfq in a timely manner. We also use
      the new plug, so that all the WRITE_SYNC requests can be submitted as a whole
      when we unplug it.
      Reported-by: Robin Dong <sanbai@taobao.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: Fix oops in journal_remove_journal_head() · bb189247
      Jan Kara authored
      journal_remove_journal_head() can oops when trying to access journal_head
      returned by bh2jh(). This is caused for example by the following race:
      
      	TASK1					TASK2
        journal_commit_transaction()
          ...
          processing t_forget list
            __journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					journal_try_to_free_buffers()
      					  journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  journal_put_journal_head(jh)
              journal_remove_journal_head(bh);
      
      journal_put_journal_head() in TASK2 sees that b_jcount == 0 and that the buffer
      is not part of any transaction, and thus frees the journal_head before TASK1
      gets to doing so. Note that even the buffer_head can be released by
      try_to_free_buffers() after journal_put_journal_head(), which adds an even
      larger opportunity for an oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head reference
      (in b_jcount). That way we don't have to remove the journal_head explicitly via
      journal_remove_journal_head() and instead just remove the journal_head when
      b_jcount drops to zero. The result of this is that [__]journal_refile_buffer(),
      [__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free the
      journal_head, which needs modification of a few callers. Also we have to be
      careful because once the journal_head is removed, the buffer_head might be
      freed as well. So we have to get our own buffer_head reference where it matters.
      Signed-off-by: Jan Kara <jack@suse.cz>
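
      The reference-counting scheme this commit describes can be sketched in
      plain C. This is a single-threaded illustration only, not the jbd code:
      the struct and function names are hypothetical, and the locking that the
      kernel uses around b_jcount is deliberately left out.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for journal_head and buffer_head. */
struct jh_sketch {
    int b_jcount;         /* number of references held */
    bool in_transaction;  /* true while a transaction owns the buffer */
};

struct bh_sketch {
    struct jh_sketch *jh;
};

/* Take a reference, allocating the journal head on first use. */
struct jh_sketch *grab_jh(struct bh_sketch *bh)
{
    if (bh->jh == NULL) {
        bh->jh = calloc(1, sizeof(*bh->jh));
        if (bh->jh == NULL)
            return NULL;
    }
    bh->jh->b_jcount++;
    return bh->jh;
}

/* Drop a reference; the journal head is freed only when the count hits zero. */
void put_jh(struct bh_sketch *bh)
{
    if (--bh->jh->b_jcount == 0) {
        free(bh->jh);
        bh->jh = NULL;
    }
}

/* Filing a buffer on a transaction takes the transaction's own reference. */
bool file_buffer(struct bh_sketch *bh)
{
    struct jh_sketch *jh = grab_jh(bh);

    if (jh == NULL)
        return false;
    jh->in_transaction = true;
    return true;
}

/* Unfiling drops the transaction's reference and may free the journal head. */
void unfile_buffer(struct bh_sketch *bh)
{
    bh->jh->in_transaction = false;
    put_jh(bh);
}
```

      Because the transaction holds a reference of its own, a second task that
      grabs and puts the journal head cannot free it while the buffer is still
      filed. No separate "remove" step is needed afterwards.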
  5. 25 Jun, 2011 12 commits
    • ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs() · 2c2ea945
      Lukas Czerner authored
      We should return -EINVAL when the FITRIM parameters are not sane, but
      currently we are exiting silently if start is beyond the end of the
      file system. This commit fixes this so we return -EINVAL as other file
      systems do.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      CC: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
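
      A minimal sketch of the range check this commit describes, written as
      userspace C. The struct, the function name, and the clamping of the
      length are assumptions for illustration; the only behavior taken from
      the commit message is that a start beyond the end of the file system
      yields -EINVAL instead of a silent exit.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical trim request: byte offsets, as a FITRIM caller would pass. */
struct trim_range_sketch {
    uint64_t start; /* first byte to trim */
    uint64_t len;   /* number of bytes to trim */
};

/*
 * Validate a trim request against a file system of fs_blocks blocks,
 * each (1 << block_bits) bytes in size.
 */
int check_trim_range_sketch(struct trim_range_sketch *range,
                            uint64_t fs_blocks, unsigned block_bits)
{
    uint64_t start = range->start >> block_bits;
    uint64_t len = range->len >> block_bits;

    /* Reject a start beyond the end of the file system. */
    if (start >= fs_blocks)
        return -EINVAL;

    /* Assumption: clamp an over-long request to the end of the fs. */
    if (len > fs_blocks - start)
        len = fs_blocks - start;

    range->len = len << block_bits;
    return 0;
}
```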
    • ext3/ioctl.c: silence sparse warnings about different address spaces · 81fe8c62
      H Hartley Sweeten authored
      The 'from' argument for copy_from_user and the 'to' argument for
      copy_to_user should both be tagged as __user address space.
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3/ext4 Documentation: remove bh/nobh since it has been deprecated · ad434017
      Lukas Czerner authored
      The bh and nobh mount options have been deprecated in ext4
      (206f7ab4) and in ext3 (4c4d3901),
      so remove those options from the documentation.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Improve truncate error handling · ee3e77f1
      Jan Kara authored
      New truncate calling convention allows us to handle errors from
      ext3_block_truncate_page(). So reorganize the code so that
      ext3_block_truncate_page() is called before we change inode size.
      
      This also removes unnecessary block zeroing from error recovery after failed
      buffered writes (zeroing isn't needed because we could have never written
      non-zero data to disk). We have to be careful and keep zeroing in direct IO
      write error recovery because there we might have already overwritten end of the
      last file block.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: use proper little-endian bitops · 90085930
      Akinobu Mita authored
      ext3_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le()
      for ext3.  But all ext3_{set,clear}_bit() calls ignore return values.
      So these can be replaced with __{set,clear}_bit_le().
      
      This changes ext3_{set,clear}_bit safely: if someone uses these macros
      without noticing the change, the new ext3_{set,clear}_bit have no return
      value, so any use of the return value causes a compiler error.
      
      This also removes unused ext3_find_first_zero_bit().
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: linux-ext4@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
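
      The difference between the two families of bit operations can be shown
      with a byte-array sketch. These helpers are hypothetical userspace
      stand-ins, not the kernel's __set_bit_le() or __test_and_set_bit_le();
      they only illustrate that the plain variants return nothing, so code
      relying on a return value stops compiling.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Little-endian bit numbering: bit nr lives in byte nr / 8,
 * at position nr % 8 within that byte.
 */

/* Sets the bit and reports whether it was already set. */
static bool test_and_set_bit_le_sketch(unsigned nr, uint8_t *map)
{
    uint8_t mask = (uint8_t)(1u << (nr & 7));
    bool was_set = (map[nr >> 3] & mask) != 0;

    map[nr >> 3] |= mask;
    return was_set;
}

/* Sets the bit; returns nothing, so the old value cannot be misused. */
static void set_bit_le_sketch(unsigned nr, uint8_t *map)
{
    map[nr >> 3] |= (uint8_t)(1u << (nr & 7));
}

/* Clears the bit; returns nothing. */
static void clear_bit_le_sketch(unsigned nr, uint8_t *map)
{
    map[nr >> 3] &= (uint8_t)~(1u << (nr & 7));
}
```

      Writing `if (set_bit_le_sketch(3, map))` is rejected by the compiler
      because the function returns void, which is the safety property the
      commit message relies on.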
    • ext2: include fs.h into ext2_fs.h · fbcc9e62
      Petr Uzel authored
      AC_CHECK_HEADERS([linux/ext2_fs.h])
      fails with
      
      configure:34666: checking linux/ext2_fs.h usability
      configure:34666: gcc -std=gnu99 -c -ggdb3 -O0 -Wunreachable-code  conftest.c >&5
      In file included from conftest.c:406:0:
      /usr/include/linux/ext2_fs.h: In function 'ext2_mask_flags':
      /usr/include/linux/ext2_fs.h:182:21: error: 'FS_DIRSYNC_FL' undeclared (first use in this function)
      /usr/include/linux/ext2_fs.h:182:21: note: each undeclared identifier is reported only once for each function it appears in
      /usr/include/linux/ext2_fs.h:182:37: error: 'FS_TOPDIR_FL' undeclared (first use in this function)
      /usr/include/linux/ext2_fs.h:184:19: error: 'FS_NODUMP_FL' undeclared (first use in this function)
      /usr/include/linux/ext2_fs.h:184:34: error: 'FS_NOATIME_FL' undeclared (first use in this function)
      
      It's reasonable to have headers that include all necessary definitions. So fix
      this by including fs.h into ext2_fs.h.
      Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix oops in ext3_try_to_allocate_with_rsv() · ad95c5e9
      Jan Kara authored
      Block allocation is called from two places: ext3_get_blocks_handle() and
      ext3_xattr_block_set(). These two callers are not necessarily synchronized
      because xattr code holds only xattr_sem and i_mutex, and
      ext3_get_blocks_handle() may hold only truncate_mutex when called from
      writepage() path. Block reservation code does not expect two concurrent
      allocations to happen to the same inode and thus assertions can be triggered
      or reservation structure corruption can occur.
      
      Fix the problem by taking truncate_mutex in xattr code to serialize
      allocations.
      
      CC: Sage Weil <sage@newdream.net>
      CC: stable@kernel.org
      Reported-by: Fyodor Ustinov <ufm@ufm.su>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: fix a bug of leaking jh->b_jcount · bd5c9e18
      Ding Dinghua authored
      journal_get_create_access() should drop jh->b_jcount in the error handling path.
      Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: remove dependency on __GFP_NOFAIL · 05713082
      Jan Kara authored
      The callers of start_this_handle() (or better ext3_journal_start()) are not
      really prepared to handle allocation failures. Such failures can for example
      result in silent data loss when it happens in ext3_..._writepage().  OTOH
      __GFP_NOFAIL is going away so we just retry allocation in start_this_handle().
      
      This loop is potentially dangerous because the oom killer cannot be invoked
      for GFP_NOFS allocation, so there is a potential for infinitely looping.
      But still this is better than silent data loss.
      Signed-off-by: Jan Kara <jack@suse.cz>
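
      The retry-instead-of-fail idea can be sketched in userspace C. This is
      not start_this_handle(): the function names are hypothetical, the flaky
      allocator exists only to exercise the loop, and the place where real
      code would wait before retrying is marked as an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn_sketch)(size_t size);

/* Illustration-only allocator that fails a fixed number of times first. */
static int failures_left_sketch = 2;

void *flaky_alloc_sketch(size_t size)
{
    if (failures_left_sketch > 0) {
        failures_left_sketch--;
        return NULL;
    }
    return malloc(size);
}

/*
 * Keep retrying until the allocation succeeds. As the commit message notes,
 * such a loop can spin forever if memory never becomes available.
 */
void *alloc_retry_sketch(alloc_fn_sketch try_alloc, size_t size,
                         unsigned *attempts)
{
    unsigned n = 0;
    void *p;

    for (;;) {
        n++;
        p = try_alloc(size);
        if (p != NULL)
            break;
        /* Assumption: real code would wait briefly here before retrying. */
    }

    if (attempts != NULL)
        *attempts = n;
    return p;
}
```

      The caller never sees a NULL result, which is the trade-off the commit
      accepts: a possible endless loop in exchange for no silent data loss.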
    • ext3: Convert ext3 to new truncate calling convention · 40680f2f
      Jan Kara authored
      Mostly trivial conversion. As we change the code, we also fix a bug where
      IS_IMMUTABLE and IS_APPEND files could not be truncated during failed writes.
      In fact the test is not needed at all because both IS_IMMUTABLE and IS_APPEND
      are tested in upper layers in do_sys_[f]truncate(), may_write(), etc.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: Add fixed tracepoints · 99cb1a31
      Lukas Czerner authored
      This commit adds fixed tracepoints for jbd. They are based on the fixed
      tracepoints for jbd2; however, the ones for collecting statistics are
      missing, since I think they would require a more intrusive patch that
      should have its own commit, if someone decides it is needed. There are
      also new tracepoints in __journal_drop_transaction() and
      journal_update_superblock().
      
      The list of jbd tracepoints:
      
      jbd_checkpoint
      jbd_start_commit
      jbd_commit_locking
      jbd_commit_flushing
      jbd_commit_logging
      jbd_drop_transaction
      jbd_end_commit
      jbd_do_submit_data
      jbd_cleanup_journal_tail
      jbd_update_superblock_end
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Add fixed tracepoints · 785c4bcc
      Lukas Czerner authored
      This commit adds fixed tracepoints to the ext3 code. It is based on the ext4
      tracepoints; however, due to the differences between the two file systems,
      some tracepoints are missing (those for delalloc and for the multi-block
      allocator) and some are ext3 specific (for reservation windows).
      
      Here is a list:
      
      ext3_free_inode
      ext3_request_inode
      ext3_allocate_inode
      ext3_evict_inode
      ext3_drop_inode
      ext3_mark_inode_dirty
      ext3_write_begin
      ext3_ordered_write_end
      ext3_writeback_write_end
      ext3_journalled_write_end
      ext3_ordered_writepage
      ext3_writeback_writepage
      ext3_journalled_writepage
      ext3_readpage
      ext3_releasepage
      ext3_invalidatepage
      ext3_discard_blocks
      ext3_request_blocks
      ext3_allocate_blocks
      ext3_free_blocks
      ext3_sync_file_enter
      ext3_sync_file_exit
      ext3_sync_fs
      ext3_rsv_window_add
      ext3_discard_reservation
      ext3_alloc_new_reservation
      ext3_reserved
      ext3_forget
      ext3_read_block_bitmap
      ext3_direct_IO_enter
      ext3_direct_IO_exit
      ext3_unlink_enter
      ext3_unlink_exit
      ext3_truncate_enter
      ext3_truncate_exit
      ext3_get_blocks_enter
      ext3_get_blocks_exit
      ext3_load_inode
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
  6. 24 Jun, 2011 16 commits
  7. 23 Jun, 2011 5 commits
    • target: Convert transport_deregister_session_configfs nacl_sess_lock to save irq state · 23388864
      Roland Dreier authored
      This patch converts transport_deregister_session_configfs() to save/restore
      spinlock IRQ state for struct se_node_acl->nacl_sess_lock access as tcm_qla2xxx
      logic expects to call transport_deregister_session_configfs() code with
      irq save already held for struct qla_hw_data.
      Reported-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: Fix transport_get_lun_for_tmr failure cases · 7fd29aa9
      Nicholas Bellinger authored
      This patch fixes two possible NULL pointer dereferences in target v4.0
      code where se_tmr release path in core_tmr_release_req() can OOPs upon
      transport_get_lun_for_tmr() failure by attempting to access se_device or
      se_tmr->tmr_list without a valid member of se_device->tmr_list during
      transport_free_se_cmd() release.  This patch moves the se_tmr->tmr_dev
      pointer assignment in transport_get_lun_for_tmr() until after possible
      -ENODEV failures during unpacked_lun lookup.
      
      This addresses an OOPs originally reported with LIO v4.1 upstream on
      .39 code here:
      
          TARGET_CORE[qla2xxx]: Detected NON_EXISTENT_LUN Access for 0x00000000
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000550
          IP: [<ffffffff81035ec4>] __ticket_spin_trylock+0x4/0x20
          PGD 0
          Oops: 0000 [#1] SMP
          last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
          CPU 1
          Modules linked in: netconsole target_core_pscsi target_core_file
      tcm_qla2xxx target_core_iblock tcm_loop target_core_mod configfs
      ipmi_devintf ipmi_si ipmi_msghandler serio_raw i7core_edac ioatdma dca
      edac_core ps_bdrv ses enclosure usbhid usb_storage ahci qla2xxx hid
      uas e1000e mpt2sas libahci mlx4_core scsi_transport_fc
      scsi_transport_sas raid_class scsi_tgt [last unloaded: netconsole]
      
          Pid: 0, comm: kworker/0:0 Tainted: G        W   2.6.39+ #1 Xyratex Storage Server
          RIP: 0010:[<ffffffff81035ec4>] [<ffffffff81035ec4>]__ticket_spin_trylock+0x4/0x20
          RSP: 0018:ffff88063e803c08  EFLAGS: 00010286
          RAX: ffff880619ab45e0 RBX: 0000000000000550 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000550
          RBP: ffff88063e803c08 R08: 0000000000000002 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000568
          R13: 0000000000000001 R14: 0000000000000000 R15: ffff88060cd96a20
          FS:  0000000000000000(0000) GS:ffff88063e800000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 0000000000000550 CR3: 0000000001a03000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process kworker/0:0 (pid: 0, threadinfo ffff880619ab8000, task ffff880619ab45e0)
          Stack:
           ffff88063e803c28 ffffffff812cf039 0000000000000550 0000000000000568
           ffff88063e803c58 ffffffff8157071e ffffffffa028a1dc ffff88060f7e4600
           0000000000000550 ffff880616961480 ffff88063e803c78 ffffffffa028a1dc
          Call Trace:
      <IRQ>
           [<ffffffff812cf039>] do_raw_spin_trylock+0x19/0x50
           [<ffffffff8157071e>] _raw_spin_lock+0x3e/0x70
           [<ffffffffa028a1dc>] ? core_tmr_release_req+0x2c/0x60 [target_core_mod]
           [<ffffffffa028a1dc>] core_tmr_release_req+0x2c/0x60 [target_core_mod]
           [<ffffffffa028d0d2>] transport_free_se_cmd+0x22/0x50 [target_core_mod]
           [<ffffffffa028d120>] transport_release_cmd_to_pool+0x20/0x40 [target_core_mod]
           [<ffffffffa028e525>] transport_generic_free_cmd+0xa5/0xb0 [target_core_mod]
           [<ffffffffa0147cc4>] tcm_qla2xxx_handle_tmr+0xc4/0xd0 [tcm_qla2xxx]
           [<ffffffffa0191ba3>] __qla24xx_handle_abts+0xd3/0x150 [qla2xxx]
           [<ffffffffa0197651>] qla_tgt_response_pkt+0x171/0x520 [qla2xxx]
           [<ffffffffa0197a2d>] qla_tgt_response_pkt_all_vps+0x2d/0x220 [qla2xxx]
           [<ffffffffa0171dd3>] qla24xx_process_response_queue+0x1a3/0x670 [qla2xxx]
           [<ffffffffa0196281>] ? qla24xx_atio_pkt+0x81/0x120 [qla2xxx]
           [<ffffffffa0174025>] ? qla24xx_msix_default+0x45/0x2a0 [qla2xxx]
           [<ffffffffa0174198>] qla24xx_msix_default+0x1b8/0x2a0 [qla2xxx]
           [<ffffffff810dadb4>] handle_irq_event_percpu+0x54/0x210
           [<ffffffff810dafb8>] handle_irq_event+0x48/0x70
           [<ffffffff810dd5ee>] ? handle_edge_irq+0x1e/0x110
           [<ffffffff810dd647>] handle_edge_irq+0x77/0x110
           [<ffffffff8100d362>] handle_irq+0x22/0x40
           [<ffffffff8157b28d>] do_IRQ+0x5d/0xe0
           [<ffffffff81571413>] common_interrupt+0x13/0x13
      <EOI>
           [<ffffffff813003f7>] ? intel_idle+0xd7/0x130
           [<ffffffff813003f0>] ? intel_idle+0xd0/0x130
           [<ffffffff8144832b>] cpuidle_idle_call+0xab/0x1c0
           [<ffffffff8100a26b>] cpu_idle+0xab/0xf0
           [<ffffffff81566c59>] start_secondary+0x1cb/0x1d2
      Reported-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • libata/sas: only set FROZEN flag if new EH is supported · 3f1e046a
      Nishanth Aravamudan authored
      On 16.06.2011 [08:28:39 -0500], Brian King wrote:
      > On 06/16/2011 02:51 AM, Tejun Heo wrote:
      > > On Wed, Jun 15, 2011 at 04:34:17PM -0700, Nishanth Aravamudan wrote:
      > >>> That looks like the right thing to do. For ipr's usage of
      > >>> libata, we don't have the concept of a port frozen state, so this flag
      > >>> should really never get set. The alternate way to fix this would be to
      > >>> only set ATA_PFLAG_FROZEN in ata_port_alloc if ap->ops->error_handler
      > >>> is not NULL.
      > >>
      > >> It seemed like ipr is as you say, but I wasn't sure if it was
      > >> appropriate to make the change above in the common libata-scis code or
      > >> not. I don't want to break some other device on accident.
      > >>
      > >> Also, I tried your suggestion, but I don't think that can happen in
      > >> ata_port_alloc? ata_port_alloc is allocated ap itself, and it seems like
      > >> ap->ops typically gets set only after ata_port_alloc returns?
      > >
      > > Maybe we can test error_handler in ata_sas_port_start()?
      >
      > Good point. Since libsas is converted to the new eh now, we would need to have
      > this test.
      
      Commit 7b3a24c5 ("ahci: don't enable
      port irq before handler is registered") caused a regression for CD-ROMs
      attached to the IPR SATA bus on Power machines:
      
        ata_port_alloc: ENTER
        ata_port_probe: ata1: bus probe begin
        ata1.00: ata_dev_read_id: ENTER
        ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
        ata1.00: ata_dev_read_id: ENTER
        ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
        ata1.00: limiting speed to UDMA7:PIO5
        ata1.00: ata_dev_read_id: ENTER
        ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
        ata1.00: disabled
        ata_port_probe: ata1: bus probe end
        scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
      
      The FROZEN flag added in that commit is only cleared by the new EH code,
      which is not used by ipr. Clear this flag in the SAS code if we don't
      support new EH.
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
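
      The fix discussed in this thread, clearing the frozen flag at port start
      when no error handler is registered, can be sketched as follows. The
      types, the flag value, and the function names are hypothetical stand-ins
      for illustration and do not reproduce the libata definitions.

```c
#include <stddef.h>

/* Hypothetical flag value; the real ATA_PFLAG_FROZEN is defined by libata. */
#define PFLAG_FROZEN_SKETCH (1u << 2)

struct port_sketch;

struct port_ops_sketch {
    /* Non-NULL only for drivers that use the new error handling. */
    void (*error_handler)(struct port_sketch *ap);
};

struct port_sketch {
    unsigned pflags;
    const struct port_ops_sketch *ops;
};

/* Example handler for a driver that does support the new EH. */
void noop_error_handler_sketch(struct port_sketch *ap)
{
    (void)ap;
}

/* Allocation marks every port frozen; ops are attached later by the driver. */
void port_alloc_sketch(struct port_sketch *ap)
{
    ap->pflags = PFLAG_FROZEN_SKETCH;
    ap->ops = NULL;
}

/* At port start, unfreeze ports whose driver has no new-EH code to thaw them. */
int sas_port_start_sketch(struct port_sketch *ap)
{
    if (ap->ops->error_handler == NULL)
        ap->pflags &= ~PFLAG_FROZEN_SKETCH;
    return 0;
}
```

      The check sits in the start path rather than in allocation because, as
      the thread points out, ops are typically set only after allocation
      returns.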
    • libata: apply NOSETXFER horkage to the affected Pioneer drives · cd691876
      Tejun Heo authored
      regardless of firmware revision
      
      It's unlikely that NOSETXFER works for one revision of a drive but not for
      another, and Pioneer doesn't seem to be fixing the firmware of the
      affected drives.  Apply NOSETXFER to the affected Pioneer drives
      regardless of firmware revision.
      
        http://article.gmane.org/gmane.linux.ide/49734
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: fl-00@gmx.de
      Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
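
      Matching a quirk "regardless of firmware revision" can be sketched with
      a small lookup table in which a NULL revision acts as a wildcard. The
      table layout, the flag value, and the model strings below are made up
      for illustration; they are not entries from the libata blacklist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value standing in for the NOSETXFER horkage. */
#define HORKAGE_NOSETXFER_SKETCH (1u << 0)

struct quirk_entry_sketch {
    const char *model;  /* exact model string */
    const char *rev;    /* firmware revision, or NULL to match any */
    unsigned horkage;
};

/* Made-up entries: the first applies to every revision, the second to one. */
static const struct quirk_entry_sketch quirks_sketch[] = {
    { "EXAMPLE DVD-RW A", NULL,   HORKAGE_NOSETXFER_SKETCH },
    { "EXAMPLE DVD-RW B", "1.00", HORKAGE_NOSETXFER_SKETCH },
    { NULL, NULL, 0 } /* terminator */
};

/* Return the horkage flags for a device, or 0 if no entry matches. */
unsigned lookup_horkage_sketch(const char *model, const char *rev)
{
    const struct quirk_entry_sketch *e;

    for (e = quirks_sketch; e->model != NULL; e++) {
        if (strcmp(e->model, model) != 0)
            continue;
        if (e->rev == NULL || strcmp(e->rev, rev) == 0)
            return e->horkage;
    }
    return 0;
}
```

      Dropping the revision from an entry widens it to all firmware versions
      of that model, which is the kind of change this commit makes for the
      affected drives.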
    • drivers/ata/sata_dwc_460ex: Fix typo 'corrresponding' · 8618ccd3
      Justin P. Mattock authored
      The patch below fixes a typo.
      Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
      Signed-off-by: Jeff Garzik <jgarzik@pobox.com>