1. 29 Jul, 2013 5 commits
    • virtio: console: return -ENODEV on all read operations after unplug · 96f97a83
      Amit Shah authored
      If a port gets unplugged while a user is blocked on read(), -ENODEV is
      returned.  However, subsequent read()s returned 0, indicating there's no
      host-side connection (but not indicating the device went away).
      
      This also happened when a port was unplugged and the user didn't have
      any blocking operation pending.  If the user didn't monitor the SIGIO
      signal, they had no way to find out that the port went away.
      
      Fix by returning -ENODEV on all read()s after the port gets unplugged.
      write() already behaves this way.
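      A minimal user-space sketch of the intended read() semantics, assuming a
      simplified port model: `struct port_sketch`, its `unplugged` and
      `host_connected` fields, and `port_read_sketch()` are hypothetical and
      are not the driver's code. The unplug check comes first, so every read
      after unplug reports -ENODEV rather than 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a virtio-serial port. */
struct port_sketch {
	bool unplugged;       /* set once by the unplug path, never cleared */
	bool host_connected;  /* host side currently has the port open */
	const char *data;     /* bytes received from the host, not yet read */
	size_t len;
};

/* Returns bytes copied, 0 when the host side is not connected, or -ENODEV
 * once the port has been unplugged. */
long port_read_sketch(struct port_sketch *p, char *buf, size_t count)
{
	assert(p != NULL && buf != NULL);

	/* Checked first, so every read after unplug fails the same way,
	 * whether or not a reader was blocked when the unplug happened. */
	if (p->unplugged)
		return -ENODEV;
	if (!p->host_connected)
		return 0;   /* host closed its end, but the device still exists */
	if (count > p->len)
		count = p->len;
	memcpy(buf, p->data, count);
	p->data += count;
	p->len -= count;
	return (long)count;
}
```

      With this ordering, a caller can tell "host went away" (0) apart from
      "device went away" (-ENODEV) without having to watch for SIGIO.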
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: console: fix raising SIGIO after port unplug · 92d34538
      Amit Shah authored
      SIGIO should be sent when a port gets unplugged.  It should only be sent
      to processes that have the port opened, and have asked for SIGIO to be
      delivered.  We were clearing out guest_connected before calling
      send_sigio_to_port(), so SIGIO was never sent to those
      processes.
      
      Fix by setting guest_connected to false after invoking the sigio
      function.
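      A small sketch of why the ordering matters, assuming a simplified model:
      `send_sigio_sketch()` stands in for send_sigio_to_port() and only
      "delivers" (counts) a SIGIO when the port is still marked
      guest_connected and the user asked for async notification. The struct
      and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a port plus a counter standing in for delivered SIGIOs. */
struct port_sketch {
	bool guest_connected;   /* a guest process has the port open */
	bool async_requested;   /* that process asked for SIGIO */
	int sigio_sent;
};

/* Stand-in for send_sigio_to_port(): only signals an opened port whose
 * user asked for asynchronous notification. */
static void send_sigio_sketch(struct port_sketch *p)
{
	if (p->guest_connected && p->async_requested)
		p->sigio_sent++;
}

/* Old order: the flag is cleared first, so the check above never passes. */
void unplug_old_order(struct port_sketch *p)
{
	p->guest_connected = false;
	send_sigio_sketch(p);
}

/* Fixed order: notify first, then mark the guest side disconnected. */
void unplug_fixed_order(struct port_sketch *p)
{
	send_sigio_sketch(p);
	p->guest_connected = false;
}
```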
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: console: clean up port data immediately at time of unplug · ea3768b4
      Amit Shah authored
      We used to keep the port's char device structs and the /sys entries
      around till the last reference to the port was dropped.  This is
      actually unnecessary, and resulted in buggy behaviour:
      
      1. Open port in guest
      2. Hot-unplug port
      3. Hot-plug a port with the same 'name' property as the unplugged one
      
      This resulted in hot-plug being unsuccessful, as a port with the same
      name already exists (even though it was unplugged).
      
      This behaviour resulted in a warning message like this one:
      
      -------------------8<---------------------------------------
      WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
      Hardware name: KVM
      sysfs: cannot create duplicate filename
      '/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'
      
      Call Trace:
       [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
       [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
       [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
       [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
       [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
       [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
       [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
       [<ffffffff812734b4>] ? kobject_add+0x44/0x70
       [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
       [<ffffffff8134b389>] ? device_add+0xc9/0x650
      
      -------------------8<---------------------------------------
      
      Instead of relying on guest applications to release all references to
      the ports, we should go ahead and unregister the port from all the core
      layers.  Any open/read calls on the port will then just return errors,
      and an unplug/plug operation on the host will succeed as expected.
      
      This also caused buggy behaviour in the case of device removal (not just
      a port): when the device was removed (which means all ports on that
      device are removed automatically as well), the ports with active
      users would clean up only when the last references were dropped -- and
      it would be too late then to be referencing char device pointers,
      resulting in oopses:
      
      -------------------8<---------------------------------------
      PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
       #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
       #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
       #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
       #3 [ffff88011b9d5bf0] die at ffffffff8100f26b
       #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
       #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
          [exception RIP: strlen+2]
          RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
          RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
          RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
          R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
          R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
       #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
       #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
       #9 [ffff88011b9d5de0] device_del at ffffffff813440c7
      
      -------------------8<---------------------------------------
      
      So clean up when we have all the context, and all that's left to do when
      the references to the port have dropped is to free up the port struct
      itself.
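      A user-space sketch of the split described above, under stated
      assumptions: a small name registry stands in for the sysfs and char
      device namespace, and `plug()`, `unplug()`, `port_get()` and
      `port_put()` are hypothetical helpers, not the driver's functions.
      Unplug removes the port's name at once, while the struct itself lives
      until the last reference is dropped. Because of this, a port with the
      same name can be plugged again while an old user still holds the
      unplugged one open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 8

/* Hypothetical stand-in for the sysfs/char-device namespace. */
static const char *registry[MAX_NAMES];

static bool name_register(const char *name)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_NAMES; i++) {
		if (registry[i] && strcmp(registry[i], name) == 0)
			return false;            /* duplicate name: plug fails */
		if (!registry[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return false;
	registry[free_slot] = name;
	return true;
}

static void name_unregister(const char *name)
{
	for (int i = 0; i < MAX_NAMES; i++)
		if (registry[i] && strcmp(registry[i], name) == 0)
			registry[i] = NULL;
}

struct port_sketch {
	const char *name;
	int refs;   /* one for "plugged", one per open file */
};

struct port_sketch *plug(const char *name)
{
	if (!name_register(name))
		return NULL;
	struct port_sketch *p = malloc(sizeof(*p));
	if (!p) {
		name_unregister(name);
		return NULL;
	}
	p->name = name;
	p->refs = 1;
	return p;
}

void port_get(struct port_sketch *p)
{
	p->refs++;
}

void port_put(struct port_sketch *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		free(p);   /* only the struct itself is left to release */
}

/* Unplug: tear down everything visible right away, then drop the plug ref. */
void unplug(struct port_sketch *p)
{
	name_unregister(p->name);
	port_put(p);
}
```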
      
      CC: <stable@vger.kernel.org>
      Reported-by: chayang <chayang@redhat.com>
      Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com>
      Reported-by: FuXiangChun <xfu@redhat.com>
      Reported-by: Qunfang Zhang <qzhang@redhat.com>
      Reported-by: Sibiao Luo <sluo@redhat.com>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: console: fix race in port_fops_open() and port unplug · 671bdea2
      Amit Shah authored
      Between open() being called and processed, the port can be unplugged.
      Check if this happened, and bail out.
      
      A simple test script to reproduce this is:
      
      while true; do for i in $(seq 1 100); do echo $i > /dev/vport0p3; done; done;
      
      This opens and closes the port a lot of times; unplugging the port while
      this is happening triggers the bug.
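      A minimal sketch of the "check and bail out" pattern, assuming a
      hypothetical port table indexed by minor number that the unplug path
      clears. `port_table` and `port_open_sketch()` are illustrative names.
      Returning -ENXIO is also an assumption for this sketch. The point is
      that the lookup happens when open is actually processed, and a missing
      port ends the open instead of being dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-port state; only what the open path needs. */
struct port_sketch {
	int open_count;
};

/* Hypothetical table indexed by minor number. The unplug path sets the
 * slot to NULL, possibly after open() was issued but before it runs. */
#define NR_PORTS 4
static struct port_sketch *port_table[NR_PORTS];

int port_open_sketch(unsigned int minor)
{
	struct port_sketch *p = minor < NR_PORTS ? port_table[minor] : NULL;

	if (p == NULL)
		return -ENXIO;   /* port was unplugged before we could proceed */
	p->open_count++;
	return 0;
}
```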
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: console: fix race with port unplug and open/close · 057b82be
      Amit Shah authored
      There's a window between find_port_by_devt() returning a port and us
      taking a kref on the port, where the port could get unplugged.  Fix it
      by taking the reference in find_port_by_devt() itself.
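      A sketch of "take the reference inside the lookup", under stated
      assumptions. The port list is guarded by a simple C11 spinlock. The
      hypothetical `find_port_by_devt_get()` increments the refcount before
      it drops the lock, so the caller gets either NULL or a port it already
      holds a reference on. The sketch also assumes the unplug path unlinks
      the port and drops its reference under the same lock, which the code
      below does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct port_sketch {
	unsigned int devt;          /* device number used for lookup */
	int refs;                   /* only changed with ports_lock held */
	struct port_sketch *next;
};

/* Hypothetical global port list guarded by a simple spinlock. */
static atomic_flag ports_lock = ATOMIC_FLAG_INIT;
static struct port_sketch *port_list;

static void ports_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&ports_lock))
		;   /* spin until the holder releases the lock */
}

static void ports_lock_release(void)
{
	atomic_flag_clear(&ports_lock);
}

/* The increment happens before the lock is dropped, closing the window in
 * which an unplug could free the port between lookup and kref. */
struct port_sketch *find_port_by_devt_get(unsigned int devt)
{
	struct port_sketch *found = NULL;

	ports_lock_acquire();
	for (struct port_sketch *p = port_list; p != NULL; p = p->next) {
		if (p->devt == devt) {
			p->refs++;
			found = p;
			break;
		}
	}
	ports_lock_release();
	return found;
}
```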
      
      Problem reported and analyzed by Mateusz Guzik.
      
      CC: <stable@vger.kernel.org>
      Reported-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  2. 23 Jul, 2013 2 commits
    • virtio/console: Add pipe_lock/unlock for splice_write · 2b4fbf02
      Yoshihiro YUNOMAE authored
      Add pipe_lock/unlock to splice_write to avoid an oops caused by the
      following race ("competition" in the diagram below):
      
      (1) An application gets fds for a trace buffer, a virtio-serial port and a pipe.
      (2) The application calls fork().
      (3) The processes execute splice_read(trace buffer) and
          splice_write(virtio-serial) via the same pipe.
      
              <parent>                   <child>
        get fds of a trace buffer,
               virtio-serial, pipe
                |
              fork()----------create--------+
                |                           |
            splice(read)                    |           ---+
            splice(write)                   |              +-- no competition
                |                       splice(read)       |
                |                       splice(write)   ---+
                |                           |
            splice(read)                    |
            splice(write)               splice(read)    ------ competition
                |                       splice(write)
      
      The two processes share a pipe_inode_info structure. If the child
      executes splice(read) while the parent is executing splice(write), the
      structure can be corrupted. The existing virtio-serial driver does not
      take the pipe lock in splice_write, so this race can cause an
      oops.
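      A user-space sketch of the locking discipline the fix adds, under stated
      assumptions. `struct pipe_sketch` is a hypothetical stand-in for
      pipe_inode_info, a C11 spinlock models the pipe mutex, and the
      `alloc_fails` flag models an arbitrary error inside splice_write. Shared
      pipe state is read and consumed only with the lock held, and every exit
      path, including the error path, releases the lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pipe_inode_info shared by parent and child. */
struct pipe_sketch {
	atomic_flag mutex;     /* models the lock taken by pipe_lock() */
	unsigned int nrbufs;   /* buffers currently queued in the pipe */
};

void pipe_init_sketch(struct pipe_sketch *pipe, unsigned int nrbufs)
{
	atomic_flag_clear(&pipe->mutex);
	pipe->nrbufs = nrbufs;
}

static void pipe_lock_sketch(struct pipe_sketch *pipe)
{
	while (atomic_flag_test_and_set(&pipe->mutex))
		;   /* spin; the kernel's pipe_lock() sleeps on a mutex instead */
}

static void pipe_unlock_sketch(struct pipe_sketch *pipe)
{
	atomic_flag_clear(&pipe->mutex);
}

long splice_write_sketch(struct pipe_sketch *pipe, bool alloc_fails)
{
	long ret;

	pipe_lock_sketch(pipe);
	if (alloc_fails) {
		ret = -ENOMEM;
		goto out;          /* the error path must unlock as well */
	}
	ret = (long)pipe->nrbufs;   /* pretend each queued buffer is sent */
	pipe->nrbufs = 0;
out:
	pipe_unlock_sketch(pipe);
	return ret;
}
```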
      
      <oops messages>
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130
       PGD 7223e067 PUD 72391067 PMD 0
       Oops: 0000 [#1] SMP
       Modules linked in: lockd bnep bluetooth rfkill sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore pcspkr virtio_net virtio_balloon i2c_piix4 i2c_core microcode uinput floppy
       CPU: 0 PID: 1072 Comm: compete-test Not tainted 3.10.0ws+ #55
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
       task: ffff880071b98000 ti: ffff88007b55e000 task.ti: ffff88007b55e000
       RIP: 0010:[<ffffffff811a6b5f>]  [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130
       RSP: 0018:ffff88007b55fd78  EFLAGS: 00010287
       RAX: 0000000000000000 RBX: ffff88007b55fe20 RCX: 0000000000000000
       RDX: 0000000000001000 RSI: ffff88007a95ba30 RDI: ffff880036f9e6c0
       RBP: ffff88007b55fda8 R08: 00000000000006ec R09: ffff880077626708
       R10: 0000000000000003 R11: ffffffff8139ca59 R12: ffff88007a95ba30
       R13: 0000000000000000 R14: ffffffff8139dd00 R15: ffff880036f9e6c0
       FS:  00007f2e2e3a0740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000018 CR3: 0000000071bd1000 CR4: 00000000000006f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffffffff8139ca59 ffff88007b55fe20 ffff880036f9e6c0 ffffffff8139dd00
        ffff8800776266c0 ffff880077626708 ffff88007b55fde8 ffffffff811a6e8e
        ffff88007b55fde8 ffffffff8139ca59 ffff880036f9e6c0 ffff88007b55fe20
       Call Trace:
        [<ffffffff8139ca59>] ? alloc_buf.isra.13+0x39/0xb0
        [<ffffffff8139dd00>] ? virtcons_restore+0x100/0x100
        [<ffffffff811a6e8e>] __splice_from_pipe+0x7e/0x90
        [<ffffffff8139ca59>] ? alloc_buf.isra.13+0x39/0xb0
        [<ffffffff8139d739>] port_fops_splice_write+0xe9/0x140
        [<ffffffff8127a3f4>] ? selinux_file_permission+0xc4/0x120
        [<ffffffff8139d650>] ? wait_port_writable+0x1b0/0x1b0
        [<ffffffff811a6fe0>] do_splice_from+0xa0/0x110
        [<ffffffff811a951f>] SyS_splice+0x5ff/0x6b0
        [<ffffffff8161facf>] tracesys+0xdd/0xe2
       Code: 49 8b 87 80 00 00 00 4c 8d 24 d0 8b 53 04 41 8b 44 24 0c 4d 8b 6c 24 10 39 d0 89 03 76 02 89 13 49 8b 44 24 10 4c 89 e6 4c 89 ff <ff> 50 18 85 c0 0f 85 aa 00 00 00 48 89 da 4c 89 e6 4c 89 ff 41
       RIP  [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130
        RSP <ffff88007b55fd78>
       CR2: 0000000000000018
       ---[ end trace 24572beb7764de59 ]---
      
      V2: Fix a locking problem on the error path
      V3: Add Reviewed-by lines and stable@ line in sign-off area
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Reviewed-by: Amit Shah <amit.shah@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio/console: Quit from splice_write if pipe->nrbufs is 0 · 68c034fe
      Yoshihiro YUNOMAE authored
      Quit from splice_write if pipe->nrbufs is 0 to avoid an oops in virtio-serial.
      
      While an application was splicing from a kernel buffer to virtio-serial on
      a guest, it received SIGINT. This is an ordinary situation, but the kernel
      oopsed as follows:
      
       BUG: unable to handle kernel paging request at ffff882071c8ef28
       IP: [<ffffffff812de48f>] sg_init_table+0x2f/0x50
       PGD 1fac067 PUD 0
       Oops: 0000 [#1] SMP
       Modules linked in: lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd microcode virtio_balloon virtio_net pcspkr soundcore i2c_piix4 i2c_core uinput floppy
       CPU: 1 PID: 908 Comm: trace-cmd Not tainted 3.10.0+ #49
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
       task: ffff880071c64650 ti: ffff88007bf24000 task.ti: ffff88007bf24000
       RIP: 0010:[<ffffffff812de48f>]  [<ffffffff812de48f>] sg_init_table+0x2f/0x50
       RSP: 0018:ffff88007bf25dd8  EFLAGS: 00010286
       RAX: 0000001fffffffe0 RBX: ffff882071c8ef28 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880071c8ef48
       RBP: ffff88007bf25de8 R08: ffff88007fd15d40 R09: ffff880071c8ef48
       R10: ffffea0001c71040 R11: ffffffff8139c555 R12: 0000000000000000
       R13: ffff88007506a3c0 R14: ffff88007c862500 R15: ffff880071c8ef00
       FS:  00007f0a3646c740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffff882071c8ef28 CR3: 000000007acbb000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffff880071c8ef48 ffff88007bf25e20 ffff88007bf25e88 ffffffff8139d6fa
        ffff88007bf25e28 ffffffff8127a3f4 0000000000000000 0000000000000000
        ffff880071c8ef48 0000100000000000 0000000000000003 ffff88007bf25e08
       Call Trace:
        [<ffffffff8139d6fa>] port_fops_splice_write+0xaa/0x130
        [<ffffffff8127a3f4>] ? selinux_file_permission+0xc4/0x120
        [<ffffffff8139d650>] ? wait_port_writable+0x1b0/0x1b0
        [<ffffffff811a6fe0>] do_splice_from+0xa0/0x110
        [<ffffffff811a951f>] SyS_splice+0x5ff/0x6b0
        [<ffffffff8161f8c2>] system_call_fastpath+0x16/0x1b
       Code: c1 e2 05 48 89 e5 48 83 ec 10 4c 89 65 f8 41 89 f4 31 f6 48 89 5d f0 48 89 fb e8 8d ce ff ff 41 8d 44 24 ff 48 c1 e0 05 48 01 c3 <48> 8b 03 48 83 e0 fe 48 83 c8 02 48 89 03 48 8b 5d f0 4c 8b 65
       RIP  [<ffffffff812de48f>] sg_init_table+0x2f/0x50
        RSP <ffff88007bf25dd8>
       CR2: ffff882071c8ef28
       ---[ end trace 86323505eb42ea8f ]---
      
      The page fault seems to occur in sg_init_table() when pipe->nrbufs is
      zero. This can happen in the following situation:
      
      (1) The application normally does splice(read) from a kernel buffer, then
          splice(write) to virtio-serial.
      (2) The application receives SIGINT while doing splice(read), so splice(read)
          fails with EINTR. However, the application does not abort the sequence.
      (3) The application then does splice(write) while pipe->nrbufs is 0.
      (4) The virtio-console driver touches the scatterlist sgl in
          sg_init_table(), but the access is out of bounds.
      
      To avoid this, the virtio-console driver should check whether
      pipe->nrbufs is zero when splice_write is executed.
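      A worked check of step (4), with a sketch of the guard. It assumes that
      sg_init_table(sgl, nents) marks &sgl[nents - 1] as the end entry with an
      unsigned nents, and that struct scatterlist was 32 bytes on this x86-64
      build (config dependent). Under those assumptions, nents == 0 wraps the
      index to 0xffffffff and yields an offset of 0x1fffffffe0, which matches
      RAX in the oops. Adding it to ffff880071c8ef48 (RDI, which appears to
      hold the scatterlist base) gives ffff882071c8ef28, the faulting address
      in RBX and CR2. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of struct scatterlist on the oopsing x86-64 kernel. */
#define SG_ENTRY_SIZE 32u

/* Byte offset of &sgl[nents - 1]; for nents == 0 the unsigned index wraps
 * to 0xffffffff, putting the "last entry" just under 128 GiB past sgl. */
uint64_t last_entry_offset(unsigned int nents)
{
	unsigned int idx = nents - 1;
	return (uint64_t)idx * SG_ENTRY_SIZE;
}

/* Sketch of the guard: with nothing queued there is nothing to write, so
 * return before sizing or initialising a scatterlist from nrbufs. */
long splice_write_guard_sketch(unsigned int nrbufs)
{
	if (nrbufs == 0)
		return 0;
	/* ... allocate nrbufs entries, sg_init_table(sgl, nrbufs), send ... */
	return (long)nrbufs;
}
```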
      
      V3: Add Reviewed-by lines and stable@ line in sign-off area.
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Reviewed-by: Amit Shah <amit.shah@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  3. 21 Jul, 2013 5 commits
    • Linux 3.11-rc2 · 3b2f64d0
      Linus Torvalds authored
    • Merge tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ea45ea70
      Linus Torvalds authored
      Pull ACPI video support fixes from Rafael Wysocki:
       "I'm sending a separate pull request for this as it may be somewhat
        controversial.  The breakage addressed here is not really new and the
        fixes may not satisfy all users of the affected systems, but we've had
        so much back and forth dance in this area over the last several weeks
        that I think it's time to actually make some progress.
      
        The source of the problem is that about a year ago we started to tell
        BIOSes that we're compatible with Windows 8, which we really need to
        do, because some systems shipping with Windows 8 are tested with it
        and nothing else, so if we tell their BIOSes that we aren't compatible
        with Windows 8, we expose our users to untested BIOS/AML code paths.
      
        However, as it turns out, some Windows 8-specific AML code paths are
        not tested either, because Windows 8 actually doesn't use the ACPI
        methods containing them, so if we declare Windows 8 compatibility and
        attempt to use those ACPI methods, things break.  That occurs mostly
        in the backlight support area where in particular the _BCM and _BQC
        methods are plain unusable on some systems if the OS declares Windows
        8 compatibility.
      
        [ The additional twist is that they actually become usable if the OS
          says it is not compatible with Windows 8, but that may cause
          problems to show up elsewhere ]
      
        Investigation carried out by Matthew Garrett indicates that what
        Windows 8 does about backlight is to leave backlight control up to
        individual graphics drivers.  At least there's evidence that it does
        that if the Intel graphics driver is used, so we've decided to follow
        Windows 8 in that respect and allow i915 to control backlight (Daniel
        likes that part).
      
        The first commit from Aaron Lu makes ACPICA export the variable from
        which we can infer whether or not the BIOS believes that we are
        compatible with Windows 8.
      
        The second commit from Matthew Garrett prepares the ACPI video driver
        by making it initialize the ACPI backlight even if it is not going to
        be used afterward (that is needed for backlight control to work on
        Thinkpads).
      
        The third commit implements the actual workaround making i915 take
        over backlight control if the firmware thinks it's dealing with
        Windows 8 and is based on the work of multiple developers, including
        Matthew Garrett, Chun-Yi Lee, Seth Forshee, and Aaron Lu.
      
        The final commit from Aaron Lu makes us follow Windows 8 by informing
        the firmware through the _DOS method that it should not carry out
        automatic brightness changes, so that brightness can be controlled by
        GUI.
      
        Hopefully, this approach will allow us to avoid using blacklists of
        systems that should not declare Windows 8 compatibility just to avoid
        backlight control problems in the future.
      
         - Change from Aaron Lu makes ACPICA export a variable which can be
           used by driver code to determine whether or not the BIOS believes
           that we are compatible with Windows 8.
      
         - Change from Matthew Garrett makes the ACPI video driver initialize
           the ACPI backlight even if it is not going to be used afterward
           (that is needed for backlight control to work on Thinkpads).
      
         - Fix from Rafael J Wysocki implements Windows 8 backlight support
           workaround making i915 take over backlight control if the firmware
           thinks it's dealing with Windows 8.  Based on the work of multiple
           developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
           and Aaron Lu.
      
         - Fix from Aaron Lu makes the kernel follow Windows 8 by informing
           the firmware through the _DOS method that it should not carry out
           automatic brightness changes, so that brightness can be controlled
           by GUI"
      
      * tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / video: no automatic brightness changes by win8-compatible firmware
        ACPI / video / i915: No ACPI backlight if firmware expects Windows 8
        ACPI / video: Always call acpi_video_init_brightness() on init
        ACPICA: expose OSI version
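      A hedged sketch of the backlight policy this pull describes, reduced to a
      pure decision function. The enum values, `choose_backlight()` and the
      `intel_gpu_present` parameter are hypothetical. The real series works
      with the OSI version exported by ACPICA and with the ACPI video and i915
      drivers' own interfaces. The rule shown: if the firmware believes it is
      talking to Windows 8, leave backlight control to the graphics driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the newest Windows version the firmware saw
 * in _OSI queries. */
enum osi_sketch { OSI_WIN7 = 1, OSI_WIN8 = 2 };

enum backlight_owner { BACKLIGHT_ACPI, BACKLIGHT_GPU_DRIVER };

/* The ACPI backlight is still initialised in either case (needed on
 * Thinkpads); this only decides who ends up controlling it. */
enum backlight_owner choose_backlight(enum osi_sketch fw_osi,
				      bool intel_gpu_present)
{
	if (fw_osi >= OSI_WIN8 && intel_gpu_present)
		return BACKLIGHT_GPU_DRIVER;
	return BACKLIGHT_ACPI;
}
```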
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 90db76e8
      Linus Torvalds authored
      Pull ext[34] tmpfile bugfix from Ted Ts'o:
       "Fix regression caused by commit af51a2ac which added ->tmpfile()
        support (along with a similar fix for ext3)"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext3: fix a BUG when opening a file with O_TMPFILE flag
        ext4: fix a BUG when opening a file with O_TMPFILE flag
    • ext3: fix a BUG when opening a file with O_TMPFILE flag · dda5690d
      Zheng Liu authored
      When we try to open a file with O_TMPFILE flag, we will trigger a bug.
      The root cause is that in ext3_orphan_add() we check ->i_nlink == 0 and
      this check always fails because we set ->i_nlink = 1 in
      inode_init_always().  We can use the following program to trigger it:
      
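      #define _GNU_SOURCE             /* O_TMPFILE needs _GNU_SOURCE in glibc */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      
      int main(int argc, char *argv[])
      {
      	int fd;
      
      	if (argc < 2) {
      		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
      		return 1;
      	}
      	/* open(2): O_TMPFILE must be combined with O_RDWR or O_WRONLY */
      	fd = open(argv[1], O_TMPFILE | O_RDWR, 0666);
      	if (fd < 0) {
      		perror("open");
      		return 1;
      	}
      	close(fd);
      	return 0;
      }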
      
      The oops message looks like this:
      
      kernel: kernel BUG at fs/ext3/namei.c:1992!
      kernel: invalid opcode: 0000 [#1] SMP
      kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
      kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
      kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
      kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000
      kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
      kernel: RSP: 0018:ffff8801124d5cc8  EFLAGS: 00010202
      kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0
      kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8
      kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f
      kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000
      kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8
      kernel: FS:  00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0
      kernel: Stack:
      kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
      kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
      kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
      kernel: Call Trace:
      kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
      kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
      kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7
      kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30
      kernel: [<ffffffff82065fa2>] ?  __dequeue_entity+0x33/0x38
      kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d
      kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102
      kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd
      kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20
      kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b
      kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
      kernel: RIP  [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
      kernel: RSP <ffff8801124d5cc8>
      
      We can't simply call clear_nlink() here, because d_tmpfile() calls
      inode_dec_link_count() to decrement ->i_nlink, which would then drop a
      link count that had already been cleared.  So this commit calls
      d_tmpfile() before ext3_orphan_add() to fix the problem.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
    • ext4: fix a BUG when opening a file with O_TMPFILE flag · e94bd349
      Zheng Liu authored
      When we try to open a file with O_TMPFILE flag, we will trigger a bug.
      The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
      this check always fails because we set ->i_nlink = 1 in
      inode_init_always().  We can use the following program to trigger it:
      
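      #define _GNU_SOURCE             /* O_TMPFILE needs _GNU_SOURCE in glibc */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      
      int main(int argc, char *argv[])
      {
      	int fd;
      
      	if (argc < 2) {
      		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
      		return 1;
      	}
      	/* open(2): O_TMPFILE must be combined with O_RDWR or O_WRONLY */
      	fd = open(argv[1], O_TMPFILE | O_RDWR, 0666);
      	if (fd < 0) {
      		perror("open");
      		return 1;
      	}
      	close(fd);
      	return 0;
      }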
      
      The oops message looks like this:
      
      kernel BUG at fs/ext4/namei.c:2572!
      invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetlink can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc irda pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcspkr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
      CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12
      Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
      task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000
      RIP: 0010:[<ffffffff8125ce69>]  [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0
      RSP: 0018:ffff88010f7b7cf8  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001
      RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668
      R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0
      FS:  00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0
      DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Stack:
       0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00
       ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c
       ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004
      Call Trace:
       [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180
       [<ffffffff811cba78>] path_openat+0x238/0x700
       [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80
       [<ffffffff811cc647>] do_filp_open+0x47/0xa0
       [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200
       [<ffffffff811ba2e4>] do_sys_open+0x124/0x210
       [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290
       [<ffffffff811ba3ee>] SyS_open+0x1e/0x20
       [<ffffffff816ca8d4>] tracesys+0xdd/0xe2
       [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0
      Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00
      
      We can't simply call clear_nlink() here, because d_tmpfile() calls
      inode_dec_link_count() to decrement ->i_nlink, which would then drop a
      link count that had already been cleared.  So this commit calls
      d_tmpfile() before ext4_orphan_add() to fix the problem.
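      A sketch of why the call order matters, assuming a simplified inode
      model. The helpers are hypothetical stand-ins. `init_inode()` mirrors
      inode_init_always() starting at one link, and `tmpfile_dec_link()`
      mirrors the decrement done in d_tmpfile(). `orphan_add()` reduces the
      orphan-list assertion to its link-count condition, and it returns false
      where the kernel would BUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode model; only the link count matters here. */
struct inode_sketch {
	unsigned int i_nlink;
	bool on_orphan_list;
};

static void init_inode(struct inode_sketch *inode)
{
	inode->i_nlink = 1;              /* as inode_init_always() does */
	inode->on_orphan_list = false;
}

static void tmpfile_dec_link(struct inode_sketch *inode)
{
	inode->i_nlink--;                /* as d_tmpfile() does via inode_dec_link_count() */
}

static bool orphan_add(struct inode_sketch *inode)
{
	if (inode->i_nlink != 0)
		return false;            /* the real code BUGs here */
	inode->on_orphan_list = true;
	return true;
}

bool tmpfile_old_order(struct inode_sketch *inode)
{
	init_inode(inode);
	bool ok = orphan_add(inode);     /* i_nlink is still 1: check fails */
	tmpfile_dec_link(inode);
	return ok;
}

bool tmpfile_new_order(struct inode_sketch *inode)
{
	init_inode(inode);
	tmpfile_dec_link(inode);         /* i_nlink drops to 0 first */
	return orphan_add(inode);
}
```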
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
      Tested-by: Dave Jones <davej@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 20 Jul, 2013 7 commits
    • Merge tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · f6a0d9d5
      Linus Torvalds authored
      Pull staging tree fixes from Greg KH:
       "Here are a few iio driver fixes for 3.11-rc2.  They are still spread
        across drivers/iio and drivers/staging/iio so they are coming in
        through this tree.
      
        I've also removed the drivers/staging/csr/ driver as the developers
        who originally sent it to me have moved on to other companies, and CSR
        still will not send us the specs for the device, making the driver
        pretty much obsolete and impossible to fix up.  Deleting it now
        prevents people from sending in lots of tiny coding style fixes that
        will never go anywhere.
      
        It also helps to offset the large lustre filesystem merge that
        happened in 3.11-rc1 in the overall 3.11.0 diffstat.  :)"
      
      * tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: csr: remove driver
        iio: lps331ap: Fix wrong in_pressure_scale output value
        iio staging: fix lis3l02dq, read error handling
        staging:iio:ad7291: add missing .driver_module to struct iio_info
        iio: ti_am335x_adc: add missing .driver_module to struct iio_info
        iio: mxs-lradc: Remove useless check in read_raw
        iio: mxs-lradc: Fix misuse of iio->trig
        iio: inkern: fix iio_convert_raw_to_processed_unlocked
        iio: Fix iio_channel_has_info
        iio:trigger: device_unregister->device_del to avoid double free
        iio: dac: ad7303: fix error return code in ad7303_probe()
      f6a0d9d5
    • Linus Torvalds
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 36231d25
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "The sget() one is a long-standing bug and will need to go into -stable
        (in fact, it had been originally caught in RHEL6), the other two are
        3.11-only"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        vfs: constify dentry parameter in d_count()
        livelock avoidance in sget()
        allow O_TMPFILE to work with O_WRONLY
      36231d25
    • Linus Torvalds
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 19bf1c2c
      Linus Torvalds authored
      Pull ext4 bugfixes from Ted Ts'o:
       "Fixes for 3.11-rc2, sent at 5pm, in the professional style.  :-)"
      
      I'm not sure I like this new level of "professionalism".
      9-5, people, 9-5.
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: call ext4_es_lru_add() after handling cache miss
        ext4: yield during large unlinks
        ext4: make the extent_status code more robust against ENOMEM failures
        ext4: simplify calculation of blocks to free on error
        ext4: fix error handling in ext4_ext_truncate()
      19bf1c2c
    • Linus Torvalds
      Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 3be542d4
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       - Fix a regression against NFSv4 FreeBSD servers when creating a new
         file
       - Fix another regression in rpc_client_register()
      
      * tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv4: Fix a regression against the FreeBSD server
        SUNRPC: Fix another issue with rpc_client_register()
      3be542d4
    • Linus Torvalds
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next · 90290c4e
      Linus Torvalds authored
      Pull btrfs fixes from Josef Bacik:
       "I'm playing the role of Chris Mason this week while he's on vacation.
        There are a few critical fixes for btrfs here; all are regressions, and
        all have been tested well"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
        Btrfs: fix wrong write offset when replacing a device
        Btrfs: re-add root to dead root list if we stop dropping it
        Btrfs: fix lock leak when resuming snapshot deletion
        Btrfs: update drop progress before stopping snapshot dropping
      90290c4e
    • Peng Tao
      vfs: constify dentry parameter in d_count() · 24924a20
      Peng Tao authored
      so that it can be used in places like d_compare/d_hash
      without causing a compiler warning.
      Signed-off-by: Peng Tao <tao.peng@emc.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      24924a20
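To illustrate the compiler warning this change avoids, here is a minimal userspace sketch. It assumes, as the commit implies, that d_compare/d_hash-style hooks are handed const dentry pointers. struct dentry_model and parent_in_use() are hypothetical stand-ins, not kernel code, and the reference count is reduced to a plain field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dentry: only the reference count. */
struct dentry_model {
    unsigned int count;
};

/* After the change: the accessor only reads, so it takes a const pointer. */
static inline unsigned int d_count(const struct dentry_model *dentry)
{
    return dentry->count;
}

/* Hypothetical d_compare-style helper that receives a const dentry pointer.
 * If d_count() took a non-const pointer, this call would discard the const
 * qualifier and the compiler would warn. */
static bool parent_in_use(const struct dentry_model *parent)
{
    return d_count(parent) > 1;
}
```

With `const` on the parameter, callers holding either const or non-const pointers can use d_count() without a cast.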
    • Al Viro
      livelock avoidance in sget() · acfec9a5
      Al Viro authored
      Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
      to fail.  The superblock is on ->fs_supers, ->s_umount is held exclusive,
      ->s_active is 1.  Along come two more processes, trying to mount the same
      thing; sget() in each is picking that superblock, bumping ->s_count and
      trying to grab ->s_umount.  ->s_active is 3 now.  Original mount(2)
      finally gets to deactivate_locked_super() on failure; ->s_active is 2,
      superblock is still on ->fs_supers because shutdown will *not* happen until
      ->s_active hits 0.  ->s_umount is dropped and now we have two processes
      chasing each other:
      s_active = 2, A acquired ->s_umount, B blocked
      A sees that the damn thing is stillborn, does deactivate_locked_super()
      s_active = 1, A drops ->s_umount, B gets it
      A restarts the search and finds the same superblock, and bumps its ->s_active.
      s_active = 2, B holds ->s_umount, A blocked on trying to get it
      ... and we are in the earlier situation with A and B switched places.
      
      The root cause, of course, is that ->s_active should not grow until we'd
      got MS_BORN.  Then failing ->mount() will have deactivate_locked_super()
      shut the damn thing down.  Fortunately, it's easy to do - the key point
      is that grab_super() is called only for superblocks currently on ->fs_supers,
      so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
      bump ->s_active; we must never increment ->s_count for superblocks past
      ->kill_sb(), but grab_super() is never called for those.
      
      The bug is pretty old; we would've caught it by now, if not for accidental
      exclusion between sget() calls for block filesystems; things like cgroup or
      mtd-based filesystems don't have anything of that sort, so they get
      bitten.  The right way to deal with that is obviously to fix sget()...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      acfec9a5
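The livelock trace in the sget() commit message can be replayed with a toy, single-threaded model. This sketch does not reproduce fs/super.c. struct sb_model, deactivate(), lookup_grabs_active() and stillborn_sb_dies() are hypothetical names. Locking is implied by the order of the steps rather than modeled, and ->s_count is not tracked because it only pins the structure. The "old" ordering bumps ->s_active as soon as a racing mount finds the superblock. The "fixed" ordering bumps it only once MS_BORN is set, as the commit describes for grab_super().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the superblock state named in the commit message. */
struct sb_model {
    int s_active;       /* active references */
    bool born;          /* stands in for MS_BORN */
    bool on_fs_supers;  /* still findable by sget() */
};

/* Drop an active reference; the last one shuts the superblock down. */
static void deactivate(struct sb_model *s)
{
    assert(s->s_active > 0);
    if (--s->s_active == 0)
        s->on_fs_supers = false;
}

/* Does a racing sget() lookup gain an active reference? */
static bool lookup_grabs_active(const struct sb_model *s, bool fixed)
{
    if (!s->on_fs_supers)
        return false;
    /* fixed: only after seeing MS_BORN; old: unconditionally */
    return fixed ? (s->born && s->s_active > 0) : true;
}

/* Replays a failing mount(2) racing with processes A and B.
 * Returns true if the stillborn superblock gets shut down. */
static bool stillborn_sb_dies(bool fixed, int max_rounds)
{
    struct sb_model s = { .s_active = 1, .born = false, .on_fs_supers = true };
    bool has_active[2] = { false, false };

    /* A and B find the superblock while the failing mount holds ->s_umount. */
    for (int p = 0; p < 2; p++) {
        if (lookup_grabs_active(&s, fixed)) {
            s.s_active++;
            has_active[p] = true;
        }
    }

    deactivate(&s);  /* the original mount(2) fails */

    /* A and B take turns holding ->s_umount: the holder sees a stillborn
     * superblock, drops any active reference, then restarts the search. */
    for (int round = 0; round < max_rounds && s.on_fs_supers; round++) {
        int p = round % 2;
        if (has_active[p]) {
            has_active[p] = false;
            deactivate(&s);
        }
        if (lookup_grabs_active(&s, fixed)) {
            s.s_active++;
            has_active[p] = true;
        }
    }
    return !s.on_fs_supers;
}
```

With the old ordering, stillborn_sb_dies(false, 1000) returns false because A and B keep passing ->s_active between 1 and 2. With the fixed ordering, stillborn_sb_dies(true, 1000) returns true because the failing mount(2) drops the only active reference.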
  5. 19 Jul, 2013 21 commits