1. 14 Sep, 2012 6 commits
    • cgroup: Do not depend on a given order when populating the subsys array · 80f4c877
      Daniel Wagner authored
      The *_subsys_id will be used as index to access the subsys. Therefore
      we need to take care to populate each subsystem at the correct position
      by using designated initialization.
      
      With this change we are able to interleave builtin and module subsystems
      in the subsys array.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      80f4c877
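The designated-initialization idea from the commit above can be sketched in plain C. This is only an illustration: `struct subsys`, the enum names and the three example subsystems are simplified, hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct cgroup_subsys. */
struct subsys {
    const char *name;
};

/* Ids are fixed at compile time; builtin and module ids may interleave. */
enum subsys_id {
    CPU_SUBSYS_ID,
    NET_CLS_SUBSYS_ID,   /* imagine this one is built as a module */
    MEMORY_SUBSYS_ID,
    SUBSYS_COUNT
};

static struct subsys cpu_subsys    = { .name = "cpu" };
static struct subsys memory_subsys = { .name = "memory" };

/*
 * Designated initializers pin each entry to its id, so the textual order
 * below does not matter. Slots without an initializer (NET_CLS_SUBSYS_ID)
 * are NULL, as they would be until a module registers itself.
 */
static struct subsys *subsys[SUBSYS_COUNT] = {
    [MEMORY_SUBSYS_ID] = &memory_subsys,
    [CPU_SUBSYS_ID]    = &cpu_subsys,
};

/* Look up a subsystem name by id; returns NULL for an empty slot. */
static const char *subsys_name(enum subsys_id id)
{
    assert(id < SUBSYS_COUNT);
    return subsys[id] ? subsys[id]->name : NULL;
}
```

Because every entry names its own index, reordering the initializer list or leaving gaps for modules cannot shift a subsystem into the wrong slot.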
    • cgroup: Wrap subsystem selection macro · 5fc0b025
      Daniel Wagner authored
      Before we are able to define all subsystem ids at compile time we need
      more fine-grained control over what gets defined when we include
      cgroup_subsys.h. For example, we define the enums for the subsystems,
      or declare the struct cgroup_subsys instances (builtin subsystems), by
      including cgroup_subsys.h and defining SUBSYS accordingly.
      
      Currently, the decision whether a subsys is used is made inside the
      header by testing for CONFIG_*=y. By moving this test outside of
      cgroup_subsys.h we are able to control it at the include level.
      
      This is done by introducing IS_SUBSYS_ENABLED, which is then defined
      according to the task, e.g. to test for CONFIG_*=y or CONFIG_*=m.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      5fc0b025
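A rough single-file sketch of the idea behind IS_SUBSYS_ENABLED follows. The kernel re-includes cgroup_subsys.h with different SUBSYS and IS_SUBSYS_ENABLED definitions; here a list macro (`SUBSYS_LIST`) stands in for the header so the example compiles on its own. The `*_BUILTIN`/`*_MODULE` flags, the `SUBSYS_IF` helpers and the three subsystems are assumptions made for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config: 1/0 flags standing in for CONFIG_*=y and CONFIG_*=m. */
#define CPU_BUILTIN     1
#define CPU_MODULE      0
#define NET_CLS_BUILTIN 0
#define NET_CLS_MODULE  1
#define MEMORY_BUILTIN  0
#define MEMORY_MODULE   0   /* not selected at all */

/* Emit SUBSYS(name) only when the condition expands to the token 1. */
#define SUBSYS_IF_1(name) SUBSYS(name)
#define SUBSYS_IF_0(name)
#define SUBSYS_IF_(cond, name) SUBSYS_IF_##cond(name)
#define SUBSYS_IF(cond, name)  SUBSYS_IF_(cond, name)

/* Stand-in for the body of cgroup_subsys.h: one line per subsystem. */
#define SUBSYS_LIST \
    SUBSYS_IF(IS_SUBSYS_ENABLED(CPU), cpu) \
    SUBSYS_IF(IS_SUBSYS_ENABLED(NET_CLS), net_cls) \
    SUBSYS_IF(IS_SUBSYS_ENABLED(MEMORY), memory)

/* First "include": enum ids for the builtin subsystems only. */
#define IS_SUBSYS_ENABLED(opt) opt##_BUILTIN
#define SUBSYS(name) name##_subsys_id,
enum builtin_id { SUBSYS_LIST BUILTIN_COUNT };
#undef SUBSYS
#undef IS_SUBSYS_ENABLED

/* Second "include": names of the subsystems built as modules. */
#define IS_SUBSYS_ENABLED(opt) opt##_MODULE
#define SUBSYS(name) #name,
static const char *const module_names[] = { SUBSYS_LIST NULL };
#undef SUBSYS
#undef IS_SUBSYS_ENABLED

static_assert(BUILTIN_COUNT == 1, "only cpu is builtin in this sketch");
```

The list itself never changes; what it produces depends entirely on how the including site defines the two macros, which is the control the commit message asks for.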
    • cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT · be45c900
      Daniel Wagner authored
      CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when
      looping over the subsys array looking either at the builtin or the
      module subsystems. Since all the builtin subsystems have an id which
      is lower than CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will
      have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids
      are sorted.
      
      We are about to change id assignment to happen only at compile time
      later in this series. That means we can't rely on the above trick
      since all ids will always be defined at compile time. Furthermore,
      ordering the builtin subsystems and the module subsystems is not
      really necessary.
      
      So we need a different way to know which subsystem is a builtin or a
      module one. We can use the subsys[]->module pointer for this. Any
      place where we need to know if a subsys is a module we just check for
      the pointer. If it is NULL then the subsystem is a builtin one.
      
      With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT
      enum. Though we need to introduce a temporary placeholder so that we
      don't get a compilation error when only CONFIG_CGROUP is selected and
      no single controller. An empty enum definition is not valid. Later in
      this series we are able to remove the placeholder again.
      
      And with this change we get a fix for this:
      
      kernel/cgroup.c: In function ‘cgroup_load_subsys’:
      kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds]
      
      when CONFIG_CGROUP=y and no built in controller was enabled.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      be45c900
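The module-pointer check described in the commit above might look roughly like this. The structs, ids and the `count_module_subsys()` helper are hypothetical simplifications; only the rule "a NULL module pointer means builtin" comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct module and struct cgroup_subsys. */
struct module { const char *name; };
struct subsys {
    const char *name;
    struct module *module;   /* NULL for builtin subsystems */
};

static struct module net_cls_module = { .name = "cls_cgroup" };
static struct subsys cpu_subsys     = { .name = "cpu",     .module = NULL };
static struct subsys net_cls_subsys = { .name = "net_cls", .module = &net_cls_module };

/* Ids no longer need to be sorted into "builtin first, modules last". */
enum { NET_CLS_ID, CPU_ID, UNLOADED_ID, SUBSYS_COUNT };

static struct subsys *subsys[SUBSYS_COUNT] = {
    [NET_CLS_ID] = &net_cls_subsys,
    [CPU_ID]     = &cpu_subsys,
    /* UNLOADED_ID stays NULL: a module that has not been loaded */
};

/* A subsystem is modular exactly when its module pointer is set. */
static bool is_module_subsys(const struct subsys *ss)
{
    assert(ss != NULL);
    return ss->module != NULL;
}

/* Walk the whole array instead of starting at a builtin/module boundary. */
static int count_module_subsys(void)
{
    int n = 0;
    for (int i = 0; i < SUBSYS_COUNT; i++) {
        if (subsys[i] && is_module_subsys(subsys[i]))
            n++;
    }
    return n;
}
```

Every loop now covers the full array and filters per entry, so no index constant has to separate the two groups.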
    • cgroup: net_prio: Do not define task_netpioidx() when not selected · 51e4e7fa
      Daniel Wagner authored
      task_netprioidx() should not be defined in case the configuration is
      CONFIG_NETPRIO_CGROUP=n. The reason is that in a following patch the
      net_prio_subsys_id will only be defined if CONFIG_NETPRIO_CGROUP!=n.
      When net_prio is not built at all any caller should only get an empty
      task_netprioidx() without any references to net_prio_subsys_id.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      51e4e7fa
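The stub pattern this commit (and the matching net_cls one below) relies on can be sketched as follows. `NETPRIO_CGROUP_ENABLED`, `struct task` and its `subsys_state` field are hypothetical stand-ins for CONFIG_NETPRIO_CGROUP and the kernel's task data; returning 0 from the stub is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_NETPRIO_CGROUP (0 = not built). */
#define NETPRIO_CGROUP_ENABLED 0

/* Simplified task: one state slot per enabled subsystem. */
struct task {
    unsigned int subsys_state[1];
};

#if NETPRIO_CGROUP_ENABLED
/* The id only exists when the controller is built. */
enum { net_prio_subsys_id = 0 };

static inline unsigned int task_netprioidx(const struct task *p)
{
    assert(p != NULL);
    return p->subsys_state[net_prio_subsys_id];
}
#else
/* Empty stub: no reference to net_prio_subsys_id, which is undefined here. */
static inline unsigned int task_netprioidx(const struct task *p)
{
    (void)p;
    return 0;
}
#endif
```

Callers compile unchanged in both configurations, and the disabled build never mentions the missing id.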
    • cgroup: net_cls: Do not define task_cls_classid() when not selected · 8fb974c9
      Daniel Wagner authored
      task_cls_classid() should not be defined in case the configuration is
      CONFIG_NET_CLS_CGROUP=n. The reason is that in a following patch the
      net_cls_subsys_id will only be defined if CONFIG_NET_CLS_CGROUP!=n.
      When net_cls is not built at all a caller should only get an empty
      task_cls_classid() without any references to net_cls_subsys_id.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      8fb974c9
    • cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h · f3419807
      Daniel Wagner authored
      The only user of sock_update_classid() is net/socket.c which happens
      to include cls_cgroup.h directly.
      
      tj: Fix build breakage due to missing cls_cgroup.h inclusion in
          drivers/net/tun.c reported in linux-next by Stephen.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      f3419807
  2. 13 Sep, 2012 3 commits
  3. 12 Sep, 2012 1 commit
  4. 24 Aug, 2012 4 commits
    • cgroup: rename subsys_bits to subsys_mask · a1a71b45
      Aristeu Rozanski authored
      In a previous discussion, Tejun Heo suggested to rename references to
      subsys_bits (added_bits, removed_bits, etc) by something more meaningful.
      
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      a1a71b45
    • cgroup: add xattr support · 03b1cde6
      Aristeu Rozanski authored
      This is one of the items in the plumber's wish list.
      
      For use cases:
      
      >> What would the use case be for this?
      >
      > Attaching meta information to services, in an easily discoverable
      > way. For example, in systemd we create one cgroup for each service, and
      > could then store data like the main pid of the specific service as an
      > xattr on the cgroup itself. That way we'd have almost all service state
      > in the cgroupfs, which would make it possible to terminate systemd and
      > later restart it without losing any state information. But there's more:
      > for example, some very peculiar services cannot be terminated on
      > shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
      > services in question could just mark that on their cgroup, by setting an
      > xattr. On the more desktopy side of things there are other
      > possibilities: for example there are plans defining what an application
      > is along the lines of a cgroup (i.e. an app being a collection of
      > processes). With xattrs one could then attach an icon or human readable
      > program name on the cgroup.
      >
      > The key idea is that this would allow attaching runtime meta information
      > to cgroups and everything they model (services, apps, vms), that doesn't
      > need any complex userspace infrastructure, has good access control
      > (i.e. because the file system enforces that anyway, and there's the
      > "trusted." xattr namespace), notifications (inotify), and can easily be
      > shared among applications.
      >
      > Lennart
      
      v7:
      - no changes
      v6:
      - remove user xattr namespace, only allow trusted and security
      v5:
      - check for capabilities before setting/removing xattrs
      v4:
      - no changes
      v3:
      - instead of config option, use mount option to enable xattr support
      Original-patch-by: Li Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      03b1cde6
    • cgroup: revise how we re-populate root directory · 13af07df
      Aristeu Rozanski authored
      When remounting cgroupfs with some subsystems added to it and some
      removed, cgroup will remove all the files in root directory and then
      re-populate it.
      
      What I'm doing here is, only remove files which belong to subsystems that
      are to be unbound, and only create files for newly-added subsystems.
      The purpose is to have all other files untouched.
      
      This is a preparation for cgroup xattr support.
      
      v7:
      - checkpatch warnings fixed
      v6:
      - no changes
      v5:
      - no changes
      v4:
      - refactored cgroup_clear_directory() to not use cgroup_rm_file()
      - instead of going thru the list of files, get the file list using the
        subsystems
      - use 'subsys_mask' instead of {added,removed}_bits and made
        cgroup_populate_dir() to match the parameters with cgroup_clear_directory()
      v3:
      - refresh patches after recent refactoring
      Original-patch-by: Li Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      13af07df
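The remount behaviour described in the commit above comes down to two mask computations. A minimal sketch, assuming one bit per subsystem; the `SS_*` bit positions, `struct remount_delta` and `compute_delta()` are hypothetical names, not the kernel's.

```c
#include <assert.h>

/* Hypothetical one-bit-per-subsystem masks; bit positions are illustrative. */
#define SS_CPU     (1UL << 0)
#define SS_MEMORY  (1UL << 1)
#define SS_NET_CLS (1UL << 2)

struct remount_delta {
    unsigned long added_mask;    /* subsystems whose files must be created */
    unsigned long removed_mask;  /* subsystems whose files must be removed */
};

/* Derive what changes between the old and the new set of bound subsystems. */
static struct remount_delta compute_delta(unsigned long old_mask,
                                          unsigned long new_mask)
{
    struct remount_delta d = {
        .added_mask   = new_mask & ~old_mask,
        .removed_mask = old_mask & ~new_mask,
    };
    /* A subsystem can never be both added and removed. */
    assert((d.added_mask & d.removed_mask) == 0);
    return d;
}
```

Subsystems present in both masks appear in neither result, so their files are left untouched, which is what the later xattr support needs.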
    • xattr: extract simple_xattr code from tmpfs · 38f38657
      Aristeu Rozanski authored
      Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
      
      $ size vmlinux.o
         text    data     bss     dec     hex filename
      4658782  880729 5195032 10734543         a3cbcf vmlinux.o
      $ size vmlinux.o
         text    data     bss     dec     hex filename
      4658957  880729 5195032 10734718         a3cc7e vmlinux.o
      
      v7:
      - checkpatch warnings fixed
      - Implement the changes requested by Hugh Dickins:
      	- make simple_xattrs_init and simple_xattrs_free inline
      	- get rid of locking and list reinitialization in simple_xattrs_free,
      	  they're not needed
      v6:
      - no changes
      v5:
      - no changes
      v4:
      - move simple_xattrs_free() to fs/xattr.c
      v3:
      - in kmem_xattrs_free(), reinitialize the list
      - use simple_xattr_* prefix
      - introduce simple_xattr_add() to prevent direct list usage
      Original-patch-by: Li Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      38f38657
  5. 22 Aug, 2012 18 commits
    • Linux 3.6-rc3 · fea7a08a
      Linus Torvalds authored
      fea7a08a
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 4ff63e47
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Intel: edid fixes, power consumption fix, s/r fix, haswell fix
      
        Radeon: BIOS loading fixes for UEFI and Thunderbolt machines, better
        MSAA validation, lockup timeout fixes, modesetting fixes
      
        One udl dpms fix, one vmwgfx fix, a couple of trivial core changes.
      
        There is an export added to ACPI as part of the radeon bios fixes.
      
        I've also included the fbcon flashing cursor vs deinit race fix, that
        seems the simplest place to start"
      
      Trivial conflict in drivers/video/console/fbcon.c due to me having
      already applied the fbcon flashing cursor vs deinit race fix, and Dave
      had added a comment in there too.
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
        fbcon: fix race condition between console lock and cursor timer (v1.1)
        drm: Add missing static storage class specifiers in drm_proc.c file
        drm/udl: dpms off the crtc when disabled.
        drm: Remove two unused fields from struct drm_display_mode
        drm: stop vmgfx driver explosion
        drm/radeon/ss: use num_crtc rather than hardcoded 6
        Revert "drm/radeon: fix bo creation retry path"
        drm/i915: use hsw rps tuning values everywhere on gen6+
        drm/radeon: split ATRM support out from the ATPX handler (v3)
        drm/radeon: convert radeon vfct code to use acpi_get_table_with_size
        ACPI: export symbol acpi_get_table_with_size
        drm/radeon: implement ACPI VFCT vbios fetch (v3)
        drm/radeon/kms: extend the Fujitsu D3003-S2 board connector quirk to cover later silicon stepping
        drm/radeon: fix checking of MSAA renderbuffers on r600-r700
        drm/radeon: allow CMASK and FMASK in the CS checker on r600-r700
        drm/radeon: init lockup timeout on ring init
        drm/radeon: avoid turning off spread spectrum for used pll
        drm/i915: fall back to bit-banging if GMBUS fails in CRT EDID reads
        drm/i915: extract connector update from intel_ddc_get_modes() for reuse
        drm/i915: fix hsw uncached pte
        ...
      4ff63e47
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 09236994
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "The executive summary includes:
      
         - Post-merge review comments for tcm_vhost (MST + nab)
         - Avoid debugging overhead when not debugging for tcm-fc(FCoE) (MDR)
         - Fix NULL pointer dereference bug on alloc_page failure (Yi Zou)
         - Fix REPORT_LUNs regression bug with pSCSI export (AlexE + nab)
         - Fix regression bug with handling of zero-length data CDBs (nab)
         - Fix vhost_scsi_target structure alignment (MST)
      
        Thanks again to everyone who contributed a bugfix patch, gave review
        feedback on tcm_vhost code, and/or reported a bug during their own
        testing over the last weeks.
      
        There is one other outstanding bug reported by Roland recently related
        to SCSI transfer length overflow handling, for which the current
        proposed bugfix has been left in queue pending further testing with
        other non iscsi-target based fabric drivers.
      
        As the patch is verified with loopback (local SGL memory from SCSI
        LLD) + tcm_qla2xxx (TCM allocated SGL memory mapped to PCI HW) fabric
        ports, it will be included into the next 3.6-rc-fixes PULL request."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Remove unused se_cmd.cmd_spdtl
        tcm_fc: rcu_deref outside rcu lock/unlock section
        tcm_vhost: Fix vhost_scsi_target structure alignment
        target: Fix regression bug with handling of zero-length data CDBs
        target/pscsi: Fix bug with REPORT_LUNs handling for SCSI passthrough
        tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char *
        target: fix NULL pointer dereference bug alloc_page() fails to get memory
        tcm_fc: Avoid debug overhead when not debugging
        tcm_vhost: Post-merge review changes requested by MST
        tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl
      09236994
    • Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux · 2e2d8c93
      Linus Torvalds authored
      Pull i2c-embedded fixes from Wolfram Sang:
       "Some bugfixes for the "embedded" part of the I2C subsystem.  The fixes
        affect mostly drivers which have been largely reworked lately and
        where regressions appeared."
      
      * 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
        i2c: tegra: protect suspend/resume callbacks with CONFIG_PM_SLEEP
        i2c: diolan-u2c: Fix master_xfer return code
        I2C: OMAP: xfer: fix runtime PM get/put balance on error
        i2c: nomadik: Add default configuration into the Nomadik I2C driver
      2e2d8c93
    • Merge tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm · fec3c03f
      Linus Torvalds authored
      Pull pwm fixes from Thierry Reding:
       "These patches fix the Samsung PWM driver and perform some minor
        cleanups like fixing checkpatch and sparse warnings.
      
        Two redundant error messages are removed and the Kconfig help text for
        the PWM subsystem is made more descriptive."
      
      * tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm:
        pwm: Improve Kconfig help text
        pwm: core: Fix coding style issues
        pwm: vt8500: Fix coding style issue
        pwm: Remove a redundant error message when devm_request_and_ioremap fails
        pwm: samsung: add missing device pointer to struct pwm_chip
        pwm: Add missing static storage class specifiers in core.c file
      fec3c03f
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · f753c4ec
      Linus Torvalds authored
      Pull ceph fixes from Sage Weil:
       "Jim's fix closes a narrow race introduced with the msgr changes.  One
        fix resolves problems with debugfs initialization that Yan found when
        multiple client instances are created (e.g., two clusters mounted, or
        rbd + cephfs), another one fixes problems with mounting a nonexistent
        server subdirectory, and the last one fixes a divide by zero error
        from unsanitized ioctl input that Dan Carpenter found."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        ceph: avoid divide by zero in __validate_layout()
        libceph: avoid truncation due to racing banners
        ceph: tolerate (and warn on) extraneous dentry from mds
        libceph: delay debugfs initialization until we learn global_id
      f753c4ec
    • Merge tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · ad746be9
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       - NFSv3 mounts need to fail if the FSINFO rpc call fails
       - Ensure that the NFS commit cache gets torn down when we unload the
         NFS module.
       - Fix memory scribble issues when interrupting a LAYOUTGET rpc call
       - Fix NFSv4 legacy idmapper regressions
       - Fix issues with the NFSv4 getacl command
       - Fix a regression when using the legacy "mount -t nfs4"
      
      * tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv3: Ensure that do_proc_get_root() reports errors correctly
        NFSv4: Ensure that nfs4_alloc_client cleans up on error.
        NFS: return -ENOKEY when the upcall fails to map the name
        NFS: Clear key construction data if the idmap upcall fails
        NFSv4: Don't use private xdr_stream fields in decode_getacl
        NFSv4: Fix the acl cache size calculation
        NFSv4: Fix pointer arithmetic in decode_getacl
        NFS: Alias the nfs module to nfs4
        NFS: Fix a regression when loading the NFS v4 module
        NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
        pnfs-obj: Better IO pattern in case of unaligned offset
        NFS41: add pg_layout_private to nfs_pageio_descriptor
        pnfs: nfs4_proc_layoutget returns void
        pnfs: defer release of pages in layoutget
        nfs: tear down caches in nfs_init_writepagecache when allocation fails
      ad746be9
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 467e9e51
      Linus Torvalds authored
      Pull assorted fixes - mostly vfs - from Al Viro:
       "Assorted fixes, with an unexpected detour into vfio refcounting logics
        (fell out when digging in an analog of eventpoll race in there)."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        task_work: add a scheduling point in task_work_run()
        fs: fix fs/namei.c kernel-doc warnings
        eventpoll: use-after-possible-free in epoll_create1()
        vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
        vfio: get rid of vfio_device_put()/vfio_group_get_device* races
        vfio: get rid of open-coding kref_put_mutex
        introduce kref_put_mutex()
        vfio: don't dereference after kfree...
        mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
      467e9e51
    • task_work: add a scheduling point in task_work_run() · 88ec2789
      Eric Dumazet authored
      It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced
      the problem addressed in commit 944be0b2 (close_files(): add scheduling
      point)
      
      If a server process with a lot of files (say 2 million tcp sockets)
      is killed, we can spend a lot of time in task_work_run() and trigger
      a soft lockup.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      88ec2789
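A userspace analogue of adding a scheduling point to a long work loop is sketched below. `struct work`, `run_all()` and `close_one()` are hypothetical, and `sched_yield()` merely stands in for whatever rescheduling call the kernel fix uses, which the commit message does not name.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical queued work item, loosely modelled on a callback list. */
struct work {
    struct work *next;
    void (*func)(struct work *);
};

/* Run every queued item, yielding the CPU after each; returns items run. */
static int run_all(struct work *head)
{
    int n = 0;
    while (head) {
        struct work *next = head->next;  /* func may free or reuse head */
        assert(head->func != NULL);
        head->func(head);
        sched_yield();                   /* the scheduling point */
        head = next;
        n++;
    }
    return n;
}

static int closed;
static void close_one(struct work *w) { (void)w; closed++; }
```

With millions of queued items, yielding between them lets other runnable tasks make progress instead of the loop monopolizing the CPU until it finishes.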
    • fs: fix fs/namei.c kernel-doc warnings · 55852635
      Randy Dunlap authored
      Fix kernel-doc warnings in fs/namei.c:
      
      Warning(fs/namei.c:360): No description found for parameter 'inode'
      Warning(fs/namei.c:672): No description found for parameter 'nd'
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      55852635
    • eventpoll: use-after-possible-free in epoll_create1() · 98022748
      Al Viro authored
      As soon as we'd installed the file into descriptor table, it can
      get closed by another thread.  Freeing ep in process...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      98022748
    • vfio: grab vfio_device reference *before* exposing the sucker via fd_install() · 31605deb
      Al Viro authored
      It's not critical (anymore) since another thread closing the file will block
      on ->device_lock before it gets to dropping the final reference, but it's
      definitely cleaner that way...
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      31605deb
    • vfio: get rid of vfio_device_put()/vfio_group_get_device* races · 90b1253e
      Al Viro authored
      we really need to make sure that dropping the last reference happens
      under the group->device_lock; otherwise a loop (under device_lock)
      might find vfio_device instance that is being freed right now, has
      already dropped the last reference and waits on device_lock to exclude
      the sucker from the list.
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      90b1253e
    • Al Viro's avatar
    • introduce kref_put_mutex() · 8ad5db8a
      Al Viro authored
      equivalent of
      	mutex_lock(mutex);
      	if (!kref_put(kref, release))
      		mutex_unlock(mutex);
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8ad5db8a
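The three-line equivalent given in the commit message can be expanded into a small single-threaded model. Everything prefixed `toy_` is a hypothetical stand-in for the kernel types, and the return value (1 when the object was released) is an assumption of this sketch; a real implementation may also avoid taking the mutex when the reference is not the last one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-ins for struct mutex and struct kref. */
struct toy_mutex { bool locked; };
struct toy_kref  { int refcount; };

static void toy_mutex_lock(struct toy_mutex *m)
{
    assert(!m->locked);
    m->locked = true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
    assert(m->locked);
    m->locked = false;
}

/* Drop one reference; call release() and return 1 if it was the last one. */
static int toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
    assert(k->refcount > 0);
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

/*
 * The pattern from the commit message: take the mutex, drop a reference,
 * and unlock only if the object survived. On the final put, release() runs
 * with the mutex held and the mutex is still locked on return, so release()
 * or the caller is responsible for unlocking it.
 */
static int toy_kref_put_mutex(struct toy_kref *k,
                              void (*release)(struct toy_kref *),
                              struct toy_mutex *m)
{
    toy_mutex_lock(m);
    if (!toy_kref_put(k, release)) {
        toy_mutex_unlock(m);
        return 0;
    }
    return 1;
}

static int released_count;
static void count_release(struct toy_kref *k) { (void)k; released_count++; }
```

Dropping the last reference under the lock is what closes the window in which another thread, walking a list under the same lock, could still find an object that is already on its way to being freed.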
    • Al Viro's avatar
      934ad4c2
    • fbcon: fix race condition between console lock and cursor timer (v1.1) · d8636a27
      Dave Airlie authored
      So we've had a fair few reports of fbcon handover breakage between
      efi/vesafb and i915 surface recently, so I dedicated a couple of
      days to finding the problem.
      
      Essentially the last thing we saw was the conflicting framebuffer
      message and that was all.
      
      So after much tracing with direct netconsole writes (printks
      under console_lock not so useful), I think I found the race.
      
      Thread A (driver load)    Thread B (timer thread)
        unbind_con_driver ->              |
        bind_con_driver ->                |
        vc->vc_sw->con_deinit ->          |
        fbcon_deinit ->                   |
        console_lock()                    |
            |                             |
            |                       fbcon_flashcursor timer fires
            |                       console_lock() <- blocked for A
            |
            |
      fbcon_del_cursor_timer ->
        del_timer_sync
        (BOOM)
      
      Of course because all of this is under the console lock,
      we never see anything, also since we also just unbound the active
      console guess what we never see anything.
      
      Hopefully this fixes the problem for anyone seeing vesafb->kms
      driver handoff.
      
      v1.1: add comment suggestion from Alan.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      d8636a27
    • Merge branch 'akpm' (Andrew's patch-bomb) · 23dcfa61
      Linus Torvalds authored
      Merge fixes from Andrew Morton.
      
      Random drivers and some VM fixes.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
        mm: compaction: Abort async compaction if locks are contended or taking too long
        mm: have order > 0 compaction start near a pageblock with free pages
        rapidio/tsi721: fix unused variable compiler warning
        rapidio/tsi721: fix inbound doorbell interrupt handling
        drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
        mm: correct page->pfmemalloc to fix deactivate_slab regression
        drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
        mm/compaction.c: fix deferring compaction mistake
        drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
        string: do not export memweight() to userspace
        hugetlb: update hugetlbpage.txt
        checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
        mm: hugetlbfs: correctly populate shared pmd
        cciss: fix incorrect scsi status reporting
        Documentation: update mount option in filesystem/vfat.txt
        mm: change nr_ptes BUG_ON to WARN_ON
        cs5535-clockevt: typo, it's MFGPT, not MFPGT
      23dcfa61
  6. 21 Aug, 2012 8 commits
    • Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · a484147a
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "For bug fixes, at soc_camera, si470x, uvcvideo, iguanaworks IR driver,
        radio_shark Kbuild fixes, and at the V4L2 core (radio fixes)."
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] media: soc_camera: don't clear pix->sizeimage in JPEG mode
        [media] media: mx2_camera: Fix clock handling for i.MX27
        [media] video: mx2_camera: Use clk_prepare_enable/clk_disable_unprepare
        [media] video: mx1_camera: Use clk_prepare_enable/clk_disable_unprepare
        [media] media: mx3_camera: buf_init() add buffer state check
        [media] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set
        [media] radio-shark: Only compile led support when CONFIG_LED_CLASS is set
        [media] radio-shark*: Call cancel_work_sync from disconnect rather then release
        [media] radio-shark*: Remove work-around for dangling pointer in usb intfdata
        [media] Add USB dependency for IguanaWorks USB IR Transceiver
        [media] Add missing logging for rangelow/high of hwseek
        [media] VIDIOC_ENUM_FREQ_BANDS fix
        [media] mem2mem_testdev: fix querycap regression
        [media] si470x: v4l2-compliance fixes
        [media] DocBook: Remove a spurious character
        [media] uvcvideo: Reset the bytesused field when recycling an erroneous buffer
      a484147a
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f8ba75e
      Linus Torvalds authored
      Pull networking update from David Miller:
       "A couple weeks of bug fixing in there.  The largest chunk is all the
        broken crap Amerigo Wang found in the netpoll layer."
      
       1) netpoll and its users have several serious bugs:
          a) uses GFP_KERNEL with locks held
          b) interfaces requiring interrupts disabled are called with them
             enabled
          c) and vice versa
          d) VLAN tag demuxing, as per all other RX packet input paths, is not
             applied
      
          All from Amerigo Wang.
      
       2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
          good, from Neal Cardwell.
      
       3) Unlike AF_UNIX, AF_PACKET sockets don't set default credentials
          when the user doesn't specify them explicitly during sendmsg().
          Instead we attach an empty (zero) SCM credential block which is
          definitely not what we want.  Fix from Eric Dumazet.
      
       4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
          from Ben Hutchings.
      
       5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
          from Christoph Paasch.
      
       6) When AF_PACKET is used for transmit, packet loopback doesn't behave
          properly when a socket fanout is enabled, from Eric Leblond.
      
       7) On bluetooth l2cap channel create failure, we leak the socket, from
          Jaganath Kanakkassery.
      
       8) Fix all the netprio file handling bugs found by Al Viro, from John
          Fastabend.
      
       9) Several error return and NULL deref bug fixes in networking drivers
          from Julia Lawall.
      
      10) A large smattering of struct padding et al. kernel memory leaks to
          userspace found by Mathias Krause.
      
      11) Conntrack expectations in netfilter can access an uninitialized timer,
          fix from Pablo Neira Ayuso.
      
      12) Several netfilter SIP tracker bug fixes from Patrick McHardy.
      
      13) IPSEC ipv6 routes are not initialized correctly all the time,
          resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.
      
      14) Bridging does rcu_dereference() outside of RCU protected area, from
          Stephen Hemminger.
      
      15) Fix routing cache removal performance regression when looking up
          output routes that have a local destination.  From Zheng Yan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
        af_netlink: force credentials passing [CVE-2012-3520]
        ipv4: fix ip header ident selection in __ip_make_skb()
        ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
        tcp: fix possible socket refcount problem
        net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
        net/core/dev.c: fix kernel-doc warning
        netconsole: remove a redundant netconsole_target_put()
        net: ipv6: fix oops in inet_putpeer()
        net/stmmac: fix issue of clk_get for Loongson1B.
        caif: Do not dereference NULL in chnl_recv_cb()
        af_packet: don't emit packet on orig fanout group
        drivers/net/irda: fix error return code
        drivers/net/wan/dscc4.c: fix error return code
        drivers/net/wimax/i2400m/fw.c: fix error return code
        smsc75xx: add missing entry to MAINTAINERS
        net: qmi_wwan: new devices: UML290 and K5006-Z
        net: sh_eth: Add eth support for R8A7779 device
        netdev/phy: skip disabled mdio-mux nodes
        dt: introduce for_each_available_child_of_node, of_get_next_available_child
        net: netprio: fix cgrp create and write priomap race
        ...
      8f8ba75e
    • Mel Gorman's avatar
      mm: compaction: Abort async compaction if locks are contended or taking too long · c67fe375
      Mel Gorman authored
      Jim Schutt reported a problem that pointed at compaction contending
      heavily on locks.  The workload is straightforward and, in his own words:
      
      	The systems in question have 24 SAS drives spread across 3 HBAs,
      	running 24 Ceph OSD instances, one per drive.  FWIW these servers
      	are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
      	Ceph Linux clients doing dd simultaneously to a Ceph file system
      	backed by 12 of these servers.
      
      Early in the test everything looks fine
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
        27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
        28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
         6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
        22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
      
      and then it goes to pot
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
        207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
        123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
        123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
        622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
        223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0
      
      Note that system CPU usage is very high and the number of blocks being
      written out has dropped by 42%. He analysed this with perf and found
      
        perf record -g -a sleep 10
        perf report --sort symbol --call-graph fractal,5
          34.63%  [k] _raw_spin_lock_irqsave
                  |
                  |--97.30%-- isolate_freepages
                  |          compaction_alloc
                  |          unmap_and_move
                  |          migrate_pages
                  |          compact_zone
                  |          compact_zone_order
                  |          try_to_compact_pages
                  |          __alloc_pages_direct_compact
                  |          __alloc_pages_slowpath
                  |          __alloc_pages_nodemask
                  |          alloc_pages_vma
                  |          do_huge_pmd_anonymous_page
                  |          handle_mm_fault
                  |          do_page_fault
                  |          page_fault
                  |          |
                  |          |--87.39%-- skb_copy_datagram_iovec
                  |          |          tcp_recvmsg
                  |          |          inet_recvmsg
                  |          |          sock_recvmsg
                  |          |          sys_recvfrom
                  |          |          system_call
                  |          |          __recv
                  |          |          |
                  |          |           --100.00%-- (nil)
                  |          |
                  |           --12.61%-- memcpy
                   --2.70%-- [...]
      
      There was other data but primarily it is all showing that compaction is
      contended heavily on the zone->lock and zone->lru_lock.
      
      commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
      while isolating pages for migration] noted that it was possible for
      migration to hold the lru_lock for an excessive amount of time. Very
      broadly speaking this patch expands the concept.
      
      This patch introduces compact_checklock_irqsave() to check if a lock
      is contended or the process needs to be scheduled. If either condition
      is true then async compaction is aborted and the caller is informed.
      The page allocator will fail a THP allocation if compaction failed due
      to contention. This patch also introduces compact_trylock_irqsave()
      which will acquire the lock only if it is not contended and the process
      does not need to schedule.
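
      The back-off behaviour described above can be illustrated with a small
      userspace sketch. It is not the kernel implementation: the struct, the
      resched_requested flag (a stand-in for need_resched()) and the function
      name lock_or_abort() are all hypothetical, and a C11 atomic_flag stands
      in for the zone spinlocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-run compaction state. */
struct compact_ctl {
    bool sync;       /* synchronous runs are allowed to wait for the lock */
    bool contended;  /* set when an async run gave up */
};

/* Hypothetical stand-in for need_resched(). */
static bool resched_requested;

/*
 * Returns true with the lock held, or false if the caller should abort.
 * Async callers never wait: they abort when a reschedule is pending or
 * when the lock is already taken, and record that in cc->contended.
 */
static bool lock_or_abort(atomic_flag *lock, struct compact_ctl *cc)
{
    assert(lock != NULL && cc != NULL);

    if (!cc->sync && resched_requested) {
        cc->contended = true;
        return false;
    }

    /* test-and-set returns the old value; false means we now own the lock. */
    while (atomic_flag_test_and_set(lock)) {
        if (!cc->sync) {
            cc->contended = true;  /* busy lock: abort instead of waiting */
            return false;
        }
        /* Sync callers keep spinning until the holder clears the flag. */
    }
    return true;
}
```

      A caller that gets false back would stop compacting and report the
      contention upwards, which is the point at which the page allocator can
      choose to fail the THP allocation.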
      Reported-by: Jim Schutt <jaschut@sandia.gov>
      Tested-by: Jim Schutt <jaschut@sandia.gov>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c67fe375
    • Mel Gorman's avatar
      mm: have order > 0 compaction start near a pageblock with free pages · de74f1cc
      Mel Gorman authored
      Commit 7db8889a ("mm: have order > 0 compaction start off where it
      left") introduced a caching mechanism to reduce the amount of work the
      free page scanner does in compaction.  However, it has a problem.
      Consider two processes simultaneously scanning free pages
      
      					    			C
      	Process A		M     S     			F
      			|---------------------------------------|
      	Process B		M 	FS
      
      	C is zone->compact_cached_free_pfn
      	S is cc->start_free_pfn
      	M is cc->migrate_pfn
      	F is cc->free_pfn
      
      In this diagram, Process A has just reached its migrate scanner, wrapped
      around and updated compact_cached_free_pfn accordingly.
      
      Simultaneously, Process B finishes isolating in a block and updates
      compact_cached_free_pfn again to the location of its free scanner.
      
      Process A moves to "end_of_zone - one_pageblock" and runs this check
      
                      if (cc->order > 0 && (!cc->wrapped ||
                                            zone->compact_cached_free_pfn >
                                            cc->start_free_pfn))
                              pfn = min(pfn, zone->compact_cached_free_pfn);
      
      compact_cached_free_pfn is above where it started so the free scanner
      skips almost the entire space it should have scanned.  When there are
      multiple processes compacting it can end in a situation where the entire
      zone is not being scanned at all.  Further, it is possible for two
      processes to ping-pong updates to compact_cached_free_pfn, which is just
      random.
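
      To make the skip concrete, the quoted check can be replayed in userspace
      with made-up numbers. The struct, the helper names and every pfn value
      below are hypothetical; only the condition itself is taken from the
      snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the compaction control fields used by the check. */
struct scan_state {
    int order;
    bool wrapped;
    unsigned long start_free_pfn;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/*
 * Mirrors the quoted condition: returns the pfn at which the free scanner
 * resumes, given the shared cached value.
 */
static unsigned long free_scan_start(const struct scan_state *cc,
                                     unsigned long pfn,
                                     unsigned long cached_free_pfn)
{
    assert(cc != NULL);
    if (cc->order > 0 &&
        (!cc->wrapped || cached_free_pfn > cc->start_free_pfn))
        pfn = min_ul(pfn, cached_free_pfn);
    return pfn;
}
```

      With Process A wrapped and started at pfn 3000, a cached value of 4000
      written by Process B pulls A's scanner from 9488 (an assumed "end of zone
      minus one pageblock") straight down to 4000, so nothing between 4000 and
      9488 is scanned.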
      
      Overall, the end result wrecks allocation success rates.
      
      There is not an obvious way around this problem without introducing new
      locking and state so this patch takes a different approach.
      
      First, it gets rid of the skip logic because it's not clear that it
      matters if two free scanners happen to be in the same block, but with
      racing updates it's too easy for it to skip over blocks it should not.
      
      Second, it updates compact_cached_free_pfn in a more limited set of
      circumstances.
      
      If a scanner has wrapped, it updates compact_cached_free_pfn to the end
      	of the zone. When a wrapped scanner isolates a page, it updates
      	compact_cached_free_pfn to point to the highest pageblock it
      	can isolate pages from.
      
      If a scanner has not wrapped when it has finished isolating pages, it
      	checks if compact_cached_free_pfn is pointing to the end of the
      	zone. If so, the value is updated to point to the highest
      	pageblock that pages were isolated from. This value will not
      	be updated again until a free page scanner wraps and resets
      	compact_cached_free_pfn.
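
      The two rules can be written down as a small state machine. This is a
      sketch under assumptions rather than the patched kernel code: the types
      and function names are hypothetical, end_pfn stands in for "the end of
      the zone", and the order > 0 condition is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct zone and struct compact_control. */
struct zone_sketch {
    unsigned long end_pfn;                 /* "end of the zone" reset value */
    unsigned long compact_cached_free_pfn;
};

struct scan_sketch {
    bool wrapped;
};

/* A scanner that wraps resets the cached value to the end of the zone. */
static void on_wrap(struct zone_sketch *z, struct scan_sketch *cc)
{
    assert(z != NULL && cc != NULL);
    cc->wrapped = true;
    z->compact_cached_free_pfn = z->end_pfn;
}

/* A wrapped scanner that isolates pages moves the cache to high_pfn. */
static void on_isolate(struct zone_sketch *z, const struct scan_sketch *cc,
                       unsigned long high_pfn)
{
    if (cc->wrapped)
        z->compact_cached_free_pfn = high_pfn;
}

/*
 * A scanner that has not wrapped only fills in the cache when it is still
 * at the reset value; later non-wrapped scanners leave it alone.
 */
static void on_finish(struct zone_sketch *z, const struct scan_sketch *cc,
                      unsigned long high_pfn)
{
    if (!cc->wrapped && z->compact_cached_free_pfn == z->end_pfn)
        z->compact_cached_free_pfn = high_pfn;
}
```

      The effect is that racing non-wrapped scanners can no longer drag the
      cached value around: the first one to finish sets it, and only a wrap
      reopens it for updates.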
      
      This is not optimal and it can still race but the compact_cached_free_pfn
      will be pointing to or very near a pageblock with free pages.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de74f1cc
    • Alexandre Bounine's avatar
      rapidio/tsi721: fix unused variable compiler warning · 9a9a9a7a
      Alexandre Bounine authored
      Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
      option off.
      
      This patch is applicable to kernel versions starting from v3.2
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a9a9a7a
    • Alexandre Bounine's avatar
      rapidio/tsi721: fix inbound doorbell interrupt handling · 3670e7e1
      Alexandre Bounine authored
      Make sure that there are no doorbell messages left behind due to disabled
      interrupts during inbound doorbell processing.
      
      The most common case for this bug is loss of rionet JOIN messages in
      systems with three or more rionet participants and MSI or MSI-X enabled.
      As a result, requests for packet transfers may finish with a "destination
      unreachable" error message.
      
      This patch is applicable to kernel versions starting from v3.2.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3670e7e1
    • Atsushi Nemoto's avatar
      drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode · 7dbfb315
      Atsushi Nemoto authored
      Correct the offset by subtracting 20 from tm_hour before taking the
      modulo 12.
      
      [ "Why 20?" I hear you ask. Or at least I did.
      
        Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
        included in the RS5C348_HOURS_MASK define.  So it's really subtracting
        out that bit to get "hour+12".  But then because it does things modulo
        12, it needs to add the 12 in again afterwards anyway.
      
        This code is confused.  It would be much clearer if RS5C348_HOURS_MASK
        just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
        need to do the silly subtract either.
      
        Whatever. It's all just math, the end result is the same.   - Linus ]
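
      The arithmetic in the note above can be checked with a standalone
      sketch. It assumes the hours register is BCD with the PM flag in bit 5
      and that RS5C348_HOURS_MASK is 0x3f; only the value of RS5C348_BIT_PM
      comes from the message itself. The helper names are hypothetical and
      this is not the driver's code.

```c
#include <assert.h>

#define RS5C348_BIT_PM     32    /* value stated in the commit message */
#define RS5C348_HOURS_MASK 0x3f  /* assumed; covers the PM bit as described */

/* Convert a two-digit BCD value to binary. */
static int bcd2bin_sketch(unsigned int v)
{
    return (int)((v & 0x0f) + (v >> 4) * 10);
}

/* Decode a 12-hour-mode hours register into a 0..23 hour. */
static int decode_hour_12h(unsigned char reg)
{
    int hour = bcd2bin_sketch(reg & RS5C348_HOURS_MASK);

    if (reg & RS5C348_BIT_PM) {
        /*
         * The PM bit sits in the BCD tens digit, so the conversion
         * above has counted it as 20. Remove it, fold 12 to 0, then
         * add the 12 hours for the afternoon.
         */
        hour -= 20;
        hour %= 12;
        hour += 12;
    } else {
        hour %= 12;  /* 12 AM becomes hour 0 */
    }

    assert(hour >= 0 && hour < 24);
    return hour;
}
```

      For example, a register value of 0x21 (1 PM) converts to 21, drops to 1
      after the subtraction and ends up as 13, while 0x32 (12 PM) converts to
      32, folds to 0 and ends up as 12.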
      Reported-by: James Nute <newten82@gmail.com>
      Tested-by: James Nute <newten82@gmail.com>
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7dbfb315
    • Alex Shi's avatar
      mm: correct page->pfmemalloc to fix deactivate_slab regression · b121186a
      Alex Shi authored
      Commit cfd19c5a ("mm: only set page->pfmemalloc when
      ALLOC_NO_WATERMARKS was used") tried to narrow down when
      page->pfmemalloc is set, but it missed some places where pfmemalloc
      should be set.

      So, in __slab_alloc, the mismatch between pfmemalloc and
      ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our
      core2 server:
      
          64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
                           |
                           --- _raw_spin_lock
                              |
                              |---0.34%-- deactivate_slab
                              |          __slab_alloc
                              |          kmem_cache_alloc
                              |          |
      
      That causes a 40% regression in our fio sync write performance.
      
      Move the check into get_page_from_freelist(), which resolves this issue.
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b121186a