1. 15 Sep, 2015 1 commit
    • cxl: Fix unbalanced pci_dev_get in cxl_probe · 2925c2fd
      Daniel Axtens authored
      Currently the first thing we do in cxl_probe is to grab a reference
      on the pci device. Later on, we call device_register on our adapter.
      In our remove path, we call device_unregister, but we never call
      pci_dev_put. We therefore leak the device every time we do a
      reflash.
      
      device_register/unregister is sufficient to hold the reference.
      Therefore, drop the call to pci_dev_get.
      
      Here's why this is safe.
      The proposed cxl_probe(pdev) calls cxl_adapter_init:
          a) init calls cxl_adapter_alloc, which creates a struct cxl,
             conventionally called adapter. This struct contains a
             device entry, adapter->dev.
      
          b) init calls cxl_configure_adapter, where we set
             adapter->dev.parent = &dev->dev (here dev is the pci dev)
      
      So at this point, the cxl adapter's device's parent is the PCI
      device that I want to be refcounted properly.
      
          c) init calls cxl_register_adapter
             *) cxl_register_adapter calls device_register(&adapter->dev)
      
      So now we're in device_register, where dev is the adapter device, and
      we want to know if the PCI device is safe after we return.
      
      device_register(&adapter->dev) calls device_initialize() and then
      device_add().
      
      device_add() does a get_device(). device_add() also explicitly grabs
      the device's parent, and calls get_device() on it:
      
               parent = get_device(dev->parent);
      
      Therefore, device_register() takes a reference on the parent PCI dev,
      which is what pci_dev_get() was guarding. pci_dev_get() can therefore
      be safely removed.
      
      Fixes: f204e0b8 ("cxl: Driver code for powernv PCIe based cards for userspace access")
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
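The reference-counting argument in this commit message can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: `toy_device`, `toy_get`, `toy_put`, `toy_register`, `toy_unregister` and `toy_probe_remove_leak` are hypothetical names, not kernel APIs, and the real `device_register()`/`device_unregister()` do considerably more than adjust two counters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct device: only a refcount and a parent pointer. */
struct toy_device {
    int refcount;
    struct toy_device *parent;
};

/* Take a reference (models get_device()); returns the device for chaining. */
struct toy_device *toy_get(struct toy_device *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

/* Drop a reference (models put_device()). */
void toy_put(struct toy_device *dev)
{
    if (dev) {
        assert(dev->refcount > 0);
        dev->refcount--;
    }
}

/* Models the part of device_register() the commit relies on:
 * a reference on the device itself and one on its parent. */
void toy_register(struct toy_device *dev)
{
    toy_get(dev);
    toy_get(dev->parent);
}

/* Models device_unregister(): drops the references taken above. */
void toy_unregister(struct toy_device *dev)
{
    toy_put(dev->parent);
    toy_put(dev);
}

/* Runs one probe/remove cycle and returns how many references on the
 * PCI device are left over afterwards (0 means balanced). */
int toy_probe_remove_leak(int extra_get_in_probe)
{
    struct toy_device pci = { .refcount = 1, .parent = NULL };
    struct toy_device adapter = { .refcount = 0, .parent = &pci };

    if (extra_get_in_probe)
        toy_get(&pci);          /* the pci_dev_get() that the commit drops */

    toy_register(&adapter);     /* probe path */
    toy_unregister(&adapter);   /* remove path, no matching pci_dev_put() */

    return pci.refcount - 1;
}
```

In this model, calling `toy_probe_remove_leak(1)` leaves one stray reference per cycle, while `toy_probe_remove_leak(0)` is balanced, which mirrors the leak on every reflash described above.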
  2. 10 Sep, 2015 1 commit
    • powerpc/MSI: Fix race condition in tearing down MSI interrupts · e297c939
      Paul Mackerras authored
      This fixes a race which can result in the same virtual IRQ number
      being assigned to two different MSI interrupts.  The most visible
      consequence of that is usually a warning and stack trace from the
      sysfs code about an attempt to create a duplicate entry in sysfs.
      
      The race happens when one CPU (say CPU 0) is disposing of an MSI
      while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
      (for example) pnv_teardown_msi_irqs(), which calls
      msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
      hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
      to calling irq_dispose_mapping() to free up the virtual IRQ number,
      CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
      MSI, and gets the same hardware IRQ number that CPU 0 just freed.
      CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
      which sees that there is currently a mapping for that hardware IRQ
      number and returns the corresponding virtual IRQ number (which is
      the same virtual IRQ number that CPU 0 was using).  CPU 0 then
      calls irq_dispose_mapping() and frees that virtual IRQ number.
      Now, if another CPU comes along and calls irq_create_mapping(), it
      is likely to get the virtual IRQ number that was just freed,
      resulting in the same virtual IRQ number apparently being used for
      two different hardware interrupts.
      
      To fix this race, we just move the call to msi_bitmap_free_hwirqs()
      to after the call to irq_dispose_mapping().  Since virq_to_hw()
      doesn't work for the virtual IRQ number after irq_dispose_mapping()
      has been called, we need to call it before irq_dispose_mapping() and
      remember the result for the msi_bitmap_free_hwirqs() call.
      
      The pattern of calling msi_bitmap_free_hwirqs() before
      irq_dispose_mapping() appears in 5 places under arch/powerpc, and
      appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC
      U3/U4 MSI backend") from 2007.
      
      Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend")
      Cc: stable@vger.kernel.org # v2.6.22+
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
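The interleaving described in this commit message can be replayed deterministically in a single-threaded toy model. Everything below is a hypothetical sketch: the `toy_*` functions only imitate the roles of `msi_bitmap_alloc_hwirqs()`, `msi_bitmap_free_hwirqs()`, `irq_create_mapping()`, `virq_to_hw()` and `irq_dispose_mapping()`, and share none of their real signatures or internals.

```c
#include <assert.h>
#include <string.h>

#define NR_IRQS 4
#define NO_IRQ  (-1)

static int hw_in_use[NR_IRQS];   /* toy MSI bitmap: 1 = hwirq allocated */
static int hw_to_virq[NR_IRQS];  /* toy mapping: hwirq -> virq or NO_IRQ */
static int virq_used[NR_IRQS];   /* toy virq allocator state */

void toy_reset(void)
{
    memset(hw_in_use, 0, sizeof(hw_in_use));
    memset(virq_used, 0, sizeof(virq_used));
    for (int i = 0; i < NR_IRQS; i++)
        hw_to_virq[i] = NO_IRQ;
}

/* Allocate the lowest free hardware IRQ number. */
int toy_alloc_hwirq(void)
{
    for (int i = 0; i < NR_IRQS; i++)
        if (!hw_in_use[i]) { hw_in_use[i] = 1; return i; }
    return NO_IRQ;
}

void toy_free_hwirq(int hw) { hw_in_use[hw] = 0; }

/* Return the existing virq for hw if one is mapped, else map a new one. */
int toy_create_mapping(int hw)
{
    if (hw_to_virq[hw] != NO_IRQ)
        return hw_to_virq[hw];
    for (int v = 0; v < NR_IRQS; v++)
        if (!virq_used[v]) { virq_used[v] = 1; hw_to_virq[hw] = v; return v; }
    return NO_IRQ;
}

int toy_virq_to_hw(int virq)
{
    for (int hw = 0; hw < NR_IRQS; hw++)
        if (hw_to_virq[hw] == virq)
            return hw;
    return NO_IRQ;
}

void toy_dispose_mapping(int virq)
{
    int hw = toy_virq_to_hw(virq);

    if (hw != NO_IRQ)
        hw_to_virq[hw] = NO_IRQ;
    virq_used[virq] = 0;
}

/* Replays "CPU 0 tears down while CPU 1 sets up". Returns 1 if CPU 1
 * ends up holding a virq whose mapping has been disposed under it. */
int toy_interleave(int fixed_order)
{
    toy_reset();
    int hw0 = toy_alloc_hwirq();
    int virq0 = toy_create_mapping(hw0);

    /* CPU 0 starts teardown; the hwirq is looked up before disposal. */
    int saved_hw = toy_virq_to_hw(virq0);
    if (!fixed_order)
        toy_free_hwirq(saved_hw);       /* old order: bitmap freed first */

    /* CPU 1 sets up a new MSI in the window. */
    int hw1 = toy_alloc_hwirq();
    int virq1 = toy_create_mapping(hw1);

    /* CPU 0 finishes teardown. */
    toy_dispose_mapping(virq0);
    if (fixed_order)
        toy_free_hwirq(saved_hw);       /* new order: bitmap freed last */

    return toy_virq_to_hw(virq1) == NO_IRQ;
}
```

With the old ordering, `toy_interleave(0)` reports that CPU 1's virtual IRQ was disposed from under it. With the ordering from the fix, `toy_interleave(1)` does not, because the hardware IRQ number stays reserved until the mapping is gone.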
  3. 09 Sep, 2015 1 commit
  4. 08 Sep, 2015 1 commit
  5. 07 Sep, 2015 3 commits
    • cxl: abort cxl_pci_enable_device_hook() if PCI channel is offline · 7d1647dc
      Andrew Donnellan authored
      cxl_pci_enable_device_hook() is called when attempting to enable an AFU
      sitting on a vPHB. At present, the state of the underlying CXL card's PCI
      channel is only checked when it calls cxl_afu_check_and_enable() at the
      very end, after it has already set DMA options and initialised a default
      context.
      
      Check the CXL card's link status before setting DMA options or initialising
      a default context. If the link is down, print a warning and return
      immediately.
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
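The reordering this commit describes amounts to checking a precondition before doing any setup. A minimal sketch, assuming a hypothetical `toy_afu` structure and `toy_enable_device_hook()` function that stand in for the driver's real state and hook:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the state the hook touches. */
struct toy_afu {
    bool link_ok;         /* is the card's PCI channel usable? */
    bool dma_configured;  /* set once DMA options have been applied */
    bool default_ctx;     /* set once a default context exists */
};

/* Check the link first and bail out before any side effects. */
bool toy_enable_device_hook(struct toy_afu *afu)
{
    if (!afu->link_ok) {
        fprintf(stderr, "toy: link down, not enabling device\n");
        return false;
    }

    afu->dma_configured = true;  /* only reached when the link is up */
    afu->default_ctx = true;
    return true;
}
```

The point of the early return is that a failed call leaves `dma_configured` and `default_ctx` untouched, so there is nothing to undo.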
    • powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel= · fa144869
      Nishanth Aravamudan authored
      The 32-bit TCE table initialization relies on the DMA window having a
      size equal to a power of 2 (and checks for it explicitly). But
      crashkernel= has no constraint that requires a power-of-2 be specified.
      This causes the kdump kernel to fail to boot as none of the PCI devices
      (including the disk controller) are successfully initialized.
      
      After this change, the PCI devices successfully set up the 32-bit TCE
      table and kdump succeeds.
      
      Fixes: aca6913f ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.2
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
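One way to satisfy a power-of-2 requirement when the available memory is an arbitrary size is to round down to the nearest power of two. The helper below is an illustrative userspace sketch with hypothetical names (`toy_is_power_of_2`, `toy_rounddown_pow_of_two`); the commit message does not spell out the exact change, so this should not be read as the code that was merged.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n has exactly one bit set. */
bool toy_is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Largest power of two that is <= n (0 for n == 0). */
uint64_t toy_rounddown_pow_of_two(uint64_t n)
{
    uint64_t p = 1;

    if (n == 0)
        return 0;
    while (p <= n / 2)   /* comparing against n / 2 avoids overflow */
        p *= 2;
    return p;
}
```

For example, a 1536 MiB reservation rounds down to 1024 MiB, which then passes a power-of-2 check.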
    • powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel · bb005455
      Nishanth Aravamudan authored
      When attempting to kdump with the 4.2 kernel, we see for each PCI
      device:
      
       pci 0003:01     : [PE# 000] Assign DMA32 space
       pci 0003:01     : [PE# 000] Setting up 32-bit TCE table at 0..80000000
       pci 0003:01     : [PE# 000] Failed to create 32-bit TCE table, err -22
       PCI: Domain 0004 has 8 available 32-bit DMA segments
       PCI: 4 PE# for a total weight of 70
       pci 0004:01     : [PE# 002] Assign DMA32 space
       pci 0004:01     : [PE# 002] Setting up 32-bit TCE table at 0..80000000
       pci 0004:01     : [PE# 002] Failed to create 32-bit TCE table, err -22
       pci 0004:0d     : [PE# 005] Assign DMA32 space
       pci 0004:0d     : [PE# 005] Setting up 32-bit TCE table at 0..80000000
       pci 0004:0d     : [PE# 005] Failed to create 32-bit TCE table, err -22
       pci 0004:0e     : [PE# 006] Assign DMA32 space
       pci 0004:0e     : [PE# 006] Setting up 32-bit TCE table at 0..80000000
       pci 0004:0e     : [PE# 006] Failed to create 32-bit TCE table, err -22
       pci 0004:10     : [PE# 008] Assign DMA32 space
       pci 0004:10     : [PE# 008] Setting up 32-bit TCE table at 0..80000000
       pci 0004:10     : [PE# 008] Failed to create 32-bit TCE table, err -22
      
      and eventually the kdump kernel fails to boot as none of the PCI devices
      (including the disk controller) are successfully initialized.
      
      The EINVAL response is because the DMA window (the 2GB base window) is
      larger than the kdump kernel's reserved memory (crashkernel=, in this
      case specified to be 1024M). The check in question,
      
       if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
      
      is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust
      the caller to pass in a smaller window size if our maximum memory value
      is smaller than the DMA window.
      
      After this change, the PCI devices successfully set up the 32-bit TCE
      table and kdump succeeds.
      
      The problem was seen on a Firestone machine originally.
      
      Fixes: aca6913f ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
      Cc: stable@vger.kernel.org # 4.2
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [mpe: Coding style pedantry, use u64, change the indentation]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
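The failure and the fix described here can be modelled in a few lines. This is a sketch under assumptions: `toy_table_alloc()` only reproduces the sanity check quoted in the commit message, `toy_pick_window()` is a hypothetical name for "pass in a smaller window size if our maximum memory value is smaller than the DMA window", and `max_mem` stands in for the value `memory_hotplug_max()` would return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static bool toy_is_power_of_2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Models only the sanity check quoted above; returns 0 or -EINVAL. */
int toy_table_alloc(uint64_t window_size, uint64_t max_mem)
{
    if (window_size > max_mem || !toy_is_power_of_2(window_size))
        return -EINVAL;
    return 0;
}

/* Caller-side adjustment: never ask for a window larger than memory. */
uint64_t toy_pick_window(uint64_t base_window, uint64_t max_mem)
{
    return base_window < max_mem ? base_window : max_mem;
}
```

With a 2 GB base window and 1024 MB of memory, `toy_table_alloc()` rejects the unclamped size with `-EINVAL` (the `err -22` in the log above) and accepts the size returned by `toy_pick_window()`.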
  6. 06 Sep, 2015 6 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 7d9071a0
      Linus Torvalds authored
      Pull vfs updates from Al Viro:
       "In this one:
      
         - d_move fixes (Eric Biederman)
      
         - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
           handling ought to be fixed)
      
         - switch of sb_writers to percpu rwsem (Oleg Nesterov)
      
         - superblock scalability (Josef Bacik and Dave Chinner)
      
         - swapon(2) race fix (Hugh Dickins)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
        vfs: Test for and handle paths that are unreachable from their mnt_root
        dcache: Reduce the scope of i_lock in d_splice_alias
        dcache: Handle escaped paths in prepend_path
        mm: fix potential data race in SyS_swapon
        inode: don't softlockup when evicting inodes
        inode: rename i_wb_list to i_io_list
        sync: serialise per-superblock sync operations
        inode: convert inode_sb_list_lock to per-sb
        inode: add hlist_fake to avoid the inode hash lock in evict
        writeback: plug writeback at a high level
        change sb_writers to use percpu_rw_semaphore
        shift percpu_counter_destroy() into destroy_super_work()
        percpu-rwsem: kill CONFIG_PERCPU_RWSEM
        percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
        percpu-rwsem: introduce percpu_down_read_trylock()
        document rwsem_release() in sb_wait_write()
        fix the broken lockdep logic in __sb_start_write()
        introduce __sb_writers_{acquired,release}() helpers
        ufs_inode_get{frag,block}(): get rid of 'phys' argument
        ufs_getfrag_block(): tidy up a bit
        ...
    • Merge tag 'for-linus-4.3-merge-window-part-1' of... · bd779669
      Linus Torvalds authored
      Merge tag 'for-linus-4.3-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
      
      Pull 9p updates from Eric Van Hensbergen:
       "Just a few cleanups for 4.3 merge window for the 9p file system.  I've
        gotten several more over the past week, but this group has been in
        for-next for at least a couple of weeks so I figured I'd push them
        first while I test the rest.
      
        Most of the ones not in this set are bug-fixes anyways so I could hold
        them for rc1"
      
      * tag 'for-linus-4.3-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
        9p: fix return code of read() when count is 0
        9p: remove unused option Opt_trans
    • Merge tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 9cfcc658
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
       - new DVB frontend drivers: ascot2e, cxd2841er, horus3a, lnbh25
       - new HDMI capture driver: tc358743
       - new driver for NetUP DVB new boards (netup_unidvb)
       - IR support for DVBSky cards (smipcie-ir)
       - Coda driver has gained macroblock tiling support
       - Renesas R-Car gains JPEG codec driver
       - new DVB platform driver for STi boards: c8sectpfe
       - added documentation for the media core kABI to device-drivers DocBook
       - lots of driver fixups, cleanups and improvements
      
      * tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (297 commits)
        [media] c8sectpfe: Remove select on undefined LIBELF_32
        [media] i2c: fix platform_no_drv_owner.cocci warnings
        [media] cx231xx: Use wake_up_interruptible() instead of wake_up_interruptible_nr()
        [media] tc358743: only queue subdev notifications if devnode is set
        [media] tc358743: add missing Kconfig dependency/select
        [media] c8sectpfe: Use %pad to print 'dma_addr_t'
        [media] DocBook media: Fix typo "the the" in xml files
        [media] tc358743: make reset gpio optional
        [media] tc358743: set direction of reset gpio using devm_gpiod_get
        [media] dvbdev: document most of the functions/data structs
        [media] dvb_frontend.h: document the struct dvb_frontend
        [media] dvb-frontend.h: document struct dtv_frontend_properties
        [media] dvb-frontend.h: document struct dvb_frontend_ops
        [media] dvb: Use DVBFE_ALGO_HW where applicable
        [media] dvb_frontend.h: document struct analog_demod_ops
        [media] dvb_frontend.h: Document struct dvb_tuner_ops
        [media] Docbook: Document struct analog_parameters
        [media] dvb_frontend.h: get rid of dvbfe_modcod
        [media] add documentation for struct dvb_tuner_info
        [media] dvb_frontend: document dvb_frontend_tune_settings
        ...
    • Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration · e3a98ac4
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
       "Mainly we move from jiffy based timer to HRTIMER for finer control
        over polling.  Then a controller reduces its polling period from 10 to
        1ms"
      
      * 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: arm_mhu: reduce txpoll_period from 10ms to 1 ms
        mailbox: switch to hrtimer for tx_complete polling
        mailbox: Drop owner assignment from platform_driver
    • Merge tag 'md/4.3' of git://neil.brown.name/md · 2a013e37
      Linus Torvalds authored
      Pull md updates from Neil Brown:
      
       - an assortment of little fixes, several for minor races only likely to
         be hit during testing
      
       - further cluster-md-raid1 development, not ready for real use yet.
      
       - new RAID6 syndrome code for ARM NEON
      
       - fix a race where a write can return before failure of one device is
         properly recorded in metadata, so an immediate crash might result in
         that write being lost.
      
      * tag 'md/4.3' of git://neil.brown.name/md: (33 commits)
        md/raid5: ensure device failure recorded before write request returns.
        md/raid5: use bio_list for the list of bios to return.
        md/raid10: ensure device failure recorded before write request returns.
        md/raid1: ensure device failure recorded before write request returns.
        md-cluster: remove inappropriate try_module_get from join()
        md: extend spinlock protection in register_md_cluster_operations
        md-cluster: Read the disk bitmap sb and check if it needs recovery
        md-cluster: only call complete(&cinfo->completion) when node join cluster
        md-cluster: add missed lockres_free
        md-cluster: remove the unused sb_lock
        md-cluster: init suspend_list and suspend_lock early in join
        md-cluster: add the error check if failed to get dlm lock
        md-cluster: init completion within lockres_init
        md-cluster: fix deadlock issue on message lock
        md-cluster: transfer the resync ownership to another node
        md-cluster: split recover_slot for future code reuse
        md-cluster: use %pU to print UUIDs
        md: setup safemode_timer before it's being used
        md/raid5: handle possible race as reshape completes.
        md: sync sync_completed has correct value as recovery finishes.
        ...
    • Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux · 17447717
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "Nothing major, but:
      
         - Add Jeff Layton as an nfsd co-maintainer: no change to existing
           practice, just an acknowledgement of the status quo.
      
         - Two patches ("nfsd: ensure that...") for a race overlooked by the
           state locking rewrite, causing a crash noticed by multiple users.
      
         - Lots of smaller bugfixes all over from Kinglong Mee.
      
         - From Jeff, some cleanup of server rpc code in preparation for
           possible shift of nfsd threads to workqueues"
      
      * tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits)
        nfsd: deal with DELEGRETURN racing with CB_RECALL
        nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
        nfsd: ensure that delegation stateid hash references are only put once
        nfsd: ensure that the ol stateid hash reference is only put once
        net: sunrpc: fix tracepoint Warning: unknown op '->'
        nfsd: allow more than one laundry job to run at a time
        nfsd: don't WARN/backtrace for invalid container deployment.
        fs: fix fs/locks.c kernel-doc warning
        nfsd: Add Jeff Layton as co-maintainer
        NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
        NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
        nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
        nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
        NFSD: Store parent's stat in a separate value
        nfsd: Fix two typos in comments
        lockd: NLM grace period shouldn't block NFSv4 opens
        nfsd: include linux/nfs4.h in export.h
        sunrpc: Switch to using hash list instead single list
        sunrpc/nfsd: Remove redundant code by exports seq_operations functions
        sunrpc: Store cache_detail in seq_file's private directly
        ...
  7. 05 Sep, 2015 4 commits
    • Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 22365979
      Linus Torvalds authored
      Pull btrfs updates from Chris Mason:
       "This has Jeff Mahoney's long standing trim patch that fixes corners
        where trims were missing.  Omar has some raid5/6 fixes, especially for
        using scrub and device replace when devices are missing.
      
        Zhao Lie continues cleaning and fixing things, this series fixes some
        really hard to hit corners in xfstests.  I had to pull it last merge
        window due to some deadlocks, but those are now resolved.
      
        I added support for Tejun's new blkio controllers.  It seems to work
        well for single devices, we'll expand to multi-device as well"
      
      * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (47 commits)
        btrfs: fix compile when block cgroups are not enabled
        Btrfs: fix file read corruption after extent cloning and fsync
        Btrfs: check if previous transaction aborted to avoid fs corruption
        btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
        btrfs: Prevent from early transaction abort
        btrfs: Remove unused arguments in tree-log.c
        btrfs: Remove useless condition in start_log_trans()
        Btrfs: add support for blkio controllers
        Btrfs: remove unused mutex from struct 'btrfs_fs_info'
        Btrfs: fix parity scrub of RAID 5/6 with missing device
        Btrfs: fix device replace of a missing RAID 5/6 device
        Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
        Btrfs: count devices correctly in readahead during RAID 5/6 replace
        Btrfs: remove misleading handling of missing device scrub
        btrfs: fix clone / extent-same deadlocks
        Btrfs: fix defrag to merge tail file extent
        Btrfs: fix warning in backref walking
        btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()
        btrfs: Remove root argument in extent_data_ref_count()
        btrfs: Fix wrong comment of btrfs_alloc_tree_block()
        ...
    • Merge branch 'akpm' (patches from Andrew) · 6c0f568e
      Linus Torvalds authored
      Merge patch-bomb from Andrew Morton:
      
       - a few misc things
      
       - Andy's "ambient capabilities"
      
       - fs/notify updates
      
       - the ocfs2 queue
      
       - kernel/watchdog.c updates and feature work.
      
       - some of MM.  Includes Andrea's userfaultfd feature.
      
      [ Hadn't noticed that userfaultfd was 'default y' when applying the
        patches, so that got fixed in this merge instead.  We do _not_ mark
        new features that nobody uses yet 'default y'   - Linus ]
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
        mm/hugetlb.c: make vma_has_reserves() return bool
        mm/madvise.c: make madvise_behaviour_valid() return bool
        mm/memory.c: make tlb_next_batch() return bool
        mm/dmapool.c: change is_page_busy() return from int to bool
        mm: remove struct node_active_region
        mremap: simplify the "overlap" check in mremap_to()
        mremap: don't do uneccesary checks if new_len == old_len
        mremap: don't do mm_populate(new_addr) on failure
        mm: move ->mremap() from file_operations to vm_operations_struct
        mremap: don't leak new_vma if f_op->mremap() fails
        mm/hugetlb.c: make vma_shareable() return bool
        mm: make GUP handle pfn mapping unless FOLL_GET is requested
        mm: fix status code which move_pages() returns for zero page
        mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
        genalloc: add support of multiple gen_pools per device
        genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
        mm/memblock: WARN_ON when nid differs from overlap region
        Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
        mm: defer flush of writable TLB entries
        mm: send one IPI per CPU to TLB flush all entries after unmapping pages
        ...
    • task_work: remove fifo ordering guarantee · c8219906
      Eric Dumazet authored
      In commit f341861f ("task_work: add a scheduling point in
      task_work_run()") I fixed a latency problem adding a cond_resched()
      call.
      
      Later, commit ac3d0da8 added yet another loop to reverse a list,
      bringing back the latency spike :
      
      I've seen in some cases this loop taking 275 ms, if for example a
      process with 2,000,000 files is killed.
      
      We could add yet another cond_resched() in the reverse loop, or we
      can simply remove the reversal, as I do not think anything
      would depend on order of task_work_add() submitted works.
      
      Fixes: ac3d0da8 ("task_work: Make task_work_add() lockless")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
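The trade-off in this commit can be sketched with a lockless push list in C11 atomics. Pushing at the head is cheap, and running the detached list as-is visits the most recently added work first. Restoring FIFO order would need an extra pass over the whole list, which is the reversal loop the commit removes. The names below (`toy_work`, `toy_work_add`, `toy_work_run`, `toy_demo_first_run_tag`) are hypothetical and this is not the kernel's `task_work` implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_work {
    struct toy_work *next;
    void (*func)(struct toy_work *);
};

static _Atomic(struct toy_work *) toy_head;

/* Lockless push at the head of the list. */
void toy_work_add(struct toy_work *w)
{
    struct toy_work *old = atomic_load(&toy_head);

    do {
        w->next = old;
    } while (!atomic_compare_exchange_weak(&toy_head, &old, w));
}

/* Detach the whole list and run it in list order (newest first),
 * with no reversal pass. Returns the number of works run. */
int toy_work_run(void)
{
    struct toy_work *w = atomic_exchange(&toy_head, (struct toy_work *)NULL);
    int n = 0;

    while (w) {
        struct toy_work *next = w->next;  /* read before func may reuse w */

        w->func(w);
        w = next;
        n++;
    }
    return n;
}

/* Demo scaffolding: record the order in which tagged works run. */
struct toy_tagged {
    struct toy_work work;   /* first member, so the cast below is valid */
    int tag;
};

static int toy_log[8];
static int toy_log_len;

static void toy_record(struct toy_work *w)
{
    toy_log[toy_log_len++] = ((struct toy_tagged *)w)->tag;
}

/* Adds works tagged 1, 2, 3 and returns the tag that ran first. */
int toy_demo_first_run_tag(void)
{
    struct toy_tagged t[3];

    toy_log_len = 0;
    for (int i = 0; i < 3; i++) {
        t[i].tag = i + 1;
        t[i].work.func = toy_record;
        toy_work_add(&t[i].work);
    }
    toy_work_run();
    return toy_log[0];
}
```

In this model the work added last runs first, so callers must not depend on submission order. That is the assumption the commit message states about `task_work_add()` users.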
    • Merge linux-block/for-4.3/core into md/for-linux · e89c6fdf
      NeilBrown authored
      There were a few conflicts that are fairly easy to resolve.
      Signed-off-by: NeilBrown <neilb@suse.com>
  8. 04 Sep, 2015 23 commits