1. 24 Apr, 2015 34 commits
    • Larry Finger's avatar
      rtlwifi: Fix IOMMU mapping leak in AP mode · 19d240f7
      Larry Finger authored
      [ Upstream commit be0b5e63 ]
      
      Transmission of an AP beacon does not call the TX interrupt service routine,
      which usually does the cleanup. Instead, cleanup is handled in a tasklet
      completion routine. Unfortunately, this routine has a serious bug in that it does
      not release the DMA mapping before it frees the skb, thus one IOMMU mapping is
      leaked for each beacon. The test system failed with no free IOMMU mapping slots
      approximately one hour after hostapd was used to start an AP.
      
      This issue was reported and tested at https://github.com/lwfinger/rtlwifi_new/issues/30.
      Reported-and-tested-by: default avatarKevin Mullican <kevin@mullican.com>
      Cc: Kevin Mullican <kevin@mullican.com>
      Signed-off-by: default avatarShao Fu <shaofu@realtek.com>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Stable <stable@vger.kernel.org>  [3.18+]
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      19d240f7
    • Alex Williamson's avatar
      iommu/vt-d: Detach domain *only* from attached iommus · d2942566
      Alex Williamson authored
      [ Upstream commit 71684406 ]
      
      Device domains never span IOMMU hardware units, which allows the
      domain ID space for each IOMMU to be an independent address space.
      Therefore we can have multiple, independent domains, each with the
      same domain->id, but attached to different hardware units.  This is
      also why we need to do a heavy-weight search for VM domains since
      they can span multiple IOMMUs hardware units and we don't require a
      single global ID to use for all hardware units.
      
      Therefore, if we call iommu_detach_domain() across all active IOMMU
      hardware units for a non-VM domain, the result is that we clear domain
      IDs that are not associated with our domain, allowing them to be
      re-allocated and causing apparent coherency issues when the device
      cannot access IOVAs for the intended domain.
      
      This bug was introduced in commit fb170fb4 ("iommu/vt-d: Introduce
      helper functions to make code symmetric for readability"), but is
      significantly exacerbated by the more recent commit 62c22167
      ("iommu/vt-d: Fix dmar_domain leak in iommu_attach_device") which calls
      domain_exit() more frequently to resolve a domain leak.
      
      Fixes: fb170fb4 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability")
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: stable@vger.kernel.org # v3.17+
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d2942566
    • David Disseldorp's avatar
      cifs: fix use-after-free bug in find_writable_file · e9c75e69
      David Disseldorp authored
      [ Upstream commit e1e9bda2 ]
      
      Under intermittent network outages, find_writable_file() is susceptible
      to the following race condition, which results in a user-after-free in
      the cifs_writepages code-path:
      
      Thread 1                                        Thread 2
      ========                                        ========
      
      inv_file = NULL
      refind = 0
      spin_lock(&cifs_file_list_lock)
      
      // invalidHandle found on openFileList
      
      inv_file = open_file
      // inv_file->count currently 1
      
      cifsFileInfo_get(inv_file)
      // inv_file->count = 2
      
      spin_unlock(&cifs_file_list_lock);
      
      cifs_reopen_file()                            cifs_close()
      // fails (rc != 0)                            ->cifsFileInfo_put()
                                             spin_lock(&cifs_file_list_lock)
                                             // inv_file->count = 1
                                             spin_unlock(&cifs_file_list_lock)
      
      spin_lock(&cifs_file_list_lock);
      list_move_tail(&inv_file->flist,
            &cifs_inode->openFileList);
      spin_unlock(&cifs_file_list_lock);
      
      cifsFileInfo_put(inv_file);
      ->spin_lock(&cifs_file_list_lock)
      
        // inv_file->count = 0
        list_del(&cifs_file->flist);
        // cleanup!!
        kfree(cifs_file);
      
        spin_unlock(&cifs_file_list_lock);
      
      spin_lock(&cifs_file_list_lock);
      ++refind;
      // refind = 1
      goto refind_writable;
      
      At this point we loop back through with an invalid inv_file pointer
      and a refind value of 1. On second pass, inv_file is not overwritten on
      openFileList traversal, and is subsequently dereferenced.
      Signed-off-by: default avatarDavid Disseldorp <ddiss@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@samba.org>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e9c75e69
    • Sachin Prabhu's avatar
      cifs: smb2_clone_range() - exit on unhandled error · 6bb88677
      Sachin Prabhu authored
      [ Upstream commit 2477bc58 ]
      
      While attempting to clone a file on a samba server, we receive a
      STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
      isn't handled in smb2_clone_range(). We end up looping in the while loop
      making same call to the samba server over and over again.
      
      The proposed fix is to exit and return the error value when encountered
      with an unhandled error.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6bb88677
    • Stefan Agner's avatar
      tty: serial: fsl_lpuart: clear receive flag on FIFO flush · 465dd8c0
      Stefan Agner authored
      [ Upstream commit 8e4934c6 ]
      
      When the receiver was enabled during startup, a character could
      have been in the FIFO when the UART get initially used. The
      driver configures the (receive) watermark level, and flushes the
      FIFO. However, the receive flag (RDRF) could still be set at that
      stage (as mentioned in the register description of UARTx_RWFIFO).
      This leads to an interrupt which won't be handled properly in
      interrupt mode: The receive interrupt function lpuart_rxint checks
      the FIFO count, which is 0 at that point (due to the flush
      during initialization). The problem does not manifest when using
      DMA to receive characters.
      
      Fix this situation by explicitly read the status register, which
      leads to clearing of the RDRF flag. Due to the flush just after
      the status flag read, a explicit data read is not to required.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      465dd8c0
    • Stefan Agner's avatar
      tty: serial: fsl_lpuart: specify transmit FIFO size · eed8dd7b
      Stefan Agner authored
      [ Upstream commit 4e8f2459 ]
      
      Specify transmit FIFO size which might be different depending on
      LPUART instance. This makes sure uart_wait_until_sent in serial
      core getting called, which in turn waits and checks if the FIFO
      is really empty on shutdown by using the tx_empty callback.
      Without the call of this callback, the last several characters
      might not yet be transmitted when closing the serial port. This
      can be reproduced by simply using echo and redirect the output to
      a ttyLP device.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eed8dd7b
    • Lu Baolu's avatar
      usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers · 3372538a
      Lu Baolu authored
      [ Upstream commit 227a4fd8 ]
      
      When a device with an isochronous endpoint is plugged into the Intel
      xHCI host controller, and the driver submits multiple frames per URB,
      the xHCI driver will set the Block Event Interrupt (BEI) flag on all
      but the last TD for the URB. This causes the host controller to place
      an event on the event ring, but not send an interrupt. When the last
      TD for the URB completes, BEI is cleared, and we get an interrupt for
      the whole URB.
      
      However, under Intel xHCI host controllers, if the event ring is full
      of events from transfers with BEI set,  an "Event Ring is Full" event
      will be posted to the last entry of the event ring,  but no interrupt
      is generated. Host will cease all transfer and command executions and
      wait until software completes handling the pending events in the event
      ring.  That means xHC stops, but event of "event ring is full" is not
      notified. As the result, the xHC looks like dead to user.
      
      This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And
      it should be backported to kernels as old as 3.0, that contains the
      commit 69e848c2 ("Intel xhci: Support EHCI/xHCI port switching.").
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Tested-by: default avatarAlistair Grant <akgrant0710@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3372538a
    • Lu Baolu's avatar
      usb: xhci: handle Config Error Change (CEC) in xhci driver · 83961c54
      Lu Baolu authored
      [ Upstream commit 9425183d ]
      
      Linux xHCI driver doesn't report and handle port cofig error change.
      If Port Configure Error for root hub port occurs, CEC bit in PORTSC
      would be set by xHC and remains 1. This happends when the root port
      fails to configure its link partner, e.g. the port fails to exchange
      port capabilities information using Port Capability LMPs.
      
      Then the Port Status Change Events will be blocked until all status
      change bits(CEC is one of the change bits) are cleared('0') (refer to
      xHCI spec 4.19.2). Otherwise, the port status change event for this
      root port will not be generated anymore, then root port would look
      like dead for user and can't be recovered until a Host Controller
      Reset(HCRST).
      
      This patch is to check CEC bit in PORTSC in xhci_get_port_status()
      and set a Config Error in the return status if CEC is set. This will
      cause a ClearPortFeature request, where CEC bit is cleared in
      xhci_clear_port_change_bit().
      
      [The commit log is based on initial Marvell patch posted at
      http://marc.info/?l=linux-kernel&m=142323612321434&w=2]
      Reported-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Cc: stable <stable@vger.kernel.org> # v3.2+
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      83961c54
    • Thomas Schlichter's avatar
      cpuidle: ACPI: do not overwrite name and description of C0 · ae0f5de3
      Thomas Schlichter authored
      [ Upstream commit c7e8bdf5 ]
      
      Fix a bug that leads to showing the name and description of C-state C0
      as "<null>" in sysfs after the ACPI C-states changed (e.g. after AC->DC
      or DC->AC
      transition).
      
      The function poll_idle_init() in drivers/cpuidle/driver.c initializes the
      state 0 during cpuidle_register_driver(), so we better do not overwrite it
      again with '\0' during acpi_processor_cst_has_changed().
      Signed-off-by: default avatarThomas Schlichter <thomas.schlichter@web.de>
      Reviewed-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ae0f5de3
    • Bartlomiej Zolnierkiewicz's avatar
      cpuidle: remove state_count field from struct cpuidle_device · 87119123
      Bartlomiej Zolnierkiewicz authored
      [ Upstream commit d75e4af1 ]
      
      Thomas Schlichter reports the following issue on his Samsung NC20:
      
      "The C-states C1 and C2 to the OS when connected to AC, and additionally
       provides the C3 C-state when disconnected from AC.  However, the number
       of C-states shown in sysfs is fixed to the number of C-states present
       at boot.
         If I boot with AC connected, I always only see the C-states up to C2
         even if I disconnect AC.
      
         The reason is commit 130a5f69 (ACPI / cpuidle: remove dev->state_count
         setting).  It removes the update of dev->state_count, but sysfs uses
         exactly this variable to show the C-states.
      
         The fix is to use drv->state_count in sysfs.  As this is currently the
         last user of dev->state_count, this variable can be completely removed."
      
      Remove dev->state_count as per the above.
      Reported-by: default avatarThomas Schlichter <thomas.schlichter@web.de>
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarKyungmin Park <kyungmin.park@samsung.com>
      Acked-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      87119123
    • Andreas Werner's avatar
      can: flexcan: Deferred on Regulator return EPROBE_DEFER · a32d3f40
      Andreas Werner authored
      [ Upstream commit 555828ef ]
      
      Return EPROBE_DEFER if Regulator returns EPROBE_DEFER
      
      If the Flexcan driver is built into kernel and a regulator is used to
      enable the CAN transceiver, the Flexcan driver may not use the regulator.
      
      When initializing the Flexcan device with a regulator defined in the device
      tree, but not initialized, the regulator subsystem returns EPROBE_DEFER, hence
      the Flexcan init fails.
      
      The solution for this is to return EPROBE_DEFER if regulator is not initialized
      and wait until the regulator is initialized.
      Signed-off-by: default avatarAndreas Werner <kernel@andy89.org>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a32d3f40
    • Stefan Lippers-Hollmann's avatar
      x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk · 44d8cbec
      Stefan Lippers-Hollmann authored
      [ Upstream commit 80313b30 ]
      
      The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
      both BIOS and UEFI mode while rebooting unless reboot=pci is
      used. Add a quirk to reboot via the pci method.
      
      The problem is very intermittent and hard to debug, it might succeed
      rebooting just fine 40 times in a row - but fails half a dozen times
      the next day. It seems to be slightly less common in BIOS CSM mode
      than native UEFI (with the CSM disabled), but it does happen in either
      mode. Since I've started testing this patch in late january, rebooting
      has been 100% reliable.
      
      Most of the time it already hangs during POST, but occasionally it
      might even make it through the bootloader and the kernel might even
      start booting, but then hangs before the mode switch. The same symptoms
      occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
      kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
      Upgrading to the most current mainboard firmware of the ASRock
      Q1900DC-ITX, version 1.20, does not improve the situation.
      
      ( Searching the web seems to suggest that other Bay Trail-D mainboards
        might be affected as well. )
      --
      Signed-off-by: default avatarStefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: <stable@vger.kernel.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mirSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      44d8cbec
    • Felix Fietkau's avatar
      ath9k: fix tracking of enabled AP beacons · 7f676736
      Felix Fietkau authored
      [ Upstream commit 1cf48f22 ]
      
      sc->nbcnvifs tracks assigned beacon slots, not enabled beacons.
      Therefore, it cannot be used to decide if cur_conf->enable_beacon (bool)
      should be updated, or if beacons have been enabled already.
      With the current code (depending on the order of calls), beacons often
      do not get enabled in an AP+STA setup.
      To fix tracking of enabled beacons, convert cur_conf->enable_beacon to a
      bitmask of enabled beacon slots.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFelix Fietkau <nbd@openwrt.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7f676736
    • Peter Ujfalusi's avatar
      dmaengine: omap-dma: Fix memory leak when terminating running transfer · 9786bf2b
      Peter Ujfalusi authored
      [ Upstream commit 02d88b73 ]
      
      In omap_dma_start_desc the vdesc->node is removed from the virt-dma
      framework managed lists (to be precise from the desc_issued list).
      If a terminate_all comes before the transfer finishes the omap_desc will
      not be freed up because it is not in any of the lists and we stopped the
      DMA channel so the transfer will not going to complete.
      There is no special sequence for leaking memory when using cyclic (audio)
      transfer: with every start and stop of a cyclic transfer the driver leaks
      struct omap_desc worth of memory.
      
      Free up the allocated memory directly in omap_dma_terminate_all() since the
      framework will not going to do that for us.
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      CC: <stable@vger.kernel.org>
      CC: <linux-omap@vger.kernel.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9786bf2b
    • Petr Kulhavy's avatar
      dmaengine: edma: fix memory leak when terminating running transfers · ba1d2d92
      Petr Kulhavy authored
      [ Upstream commit 5ca9e7ce ]
      
      If edma_terminate_all() was called while a transfer was running (i.e. after
      edma_execute() but before edma_callback()) the echan->edesc was not freed.
      
      This was due to the fact that a running transfer is on none of the
      vchan lists: desc_submitted, desc_issued, desc_completed (edma_execute()
      removes it from the desc_issued list), so the vchan_dma_desc_free_list()
      called at the end of edma_terminate_all() didn't find it and didn't free it.
      
      This bug was found on an AM1808 based hardware (very similar to da850evm,
      however using the second MMC/SD controller), where intense operations on the SD
      card wasted the device 128MB RAM within a couple of days.
      
      Peter Ujfalusi:
      The issue is even more severe since it affects cyclic (audio) transfers as
      well. In this case starting/stopping audio will results memory leak.
      Signed-off-by: default avatarPetr Kulhavy <petr@barix.com>
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      CC: <stable@vger.kernel.org>
      CC: <linux-omap@vger.kernel.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ba1d2d92
    • Darshana Padmadas's avatar
      iio: imu: Use iio_trigger_get for indio_dev->trig assignment · cfb769a8
      Darshana Padmadas authored
      [ Upstream commit 4ce7ca89 ]
      
      This patch uses iio_trigger_get to increment the reference
      count of trigger device, to avoid incorrect assignment.
      Can result in a null pointer dereference during removal if the
      trigger has been changed before removal.
      
      This patch refers to a similar situation encountered through the
      following discussion:
      http://www.spinics.net/lists/linux-iio/msg13669.htmlSigned-off-by: default avatarDarshana Padmadas <darshanapadmadas@gmail.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cfb769a8
    • Stefan Agner's avatar
      iio: adc: vf610: use ADC clock within specification · dcf72cd6
      Stefan Agner authored
      [ Upstream commit f54e9f2b ]
      
      Depending on conversion mode used, the ADC clock (ADCK) needs
      to be below a maximum frequency. According to Vybrid's data
      sheet this is 20MHz for the low power conversion mode.
      
      The ADC clock is depending on input clock, which is the bus
      clock by default. Vybrid SoC are typically clocked at at 400MHz
      or 500MHz, which leads to 66MHz or 83MHz bus clock respectively.
      Hence, a divider of 8 is required to stay below the specified
      maximum clock of 20MHz.
      
      Due to the different bus clock speeds, the resulting sampling
      frequency is not static. Hence use the ADC clock and calculate
      the actual available sampling frequency dynamically.
      
      This fixes bogous values observed on some 500MHz clocked Vybrid
      SoC. The resulting value usually showed Bit 9 being stuck at 1,
      or 0, which lead to a value of +/-512.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Acked-by: default avatarFugang Duan <B38611@freescale.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dcf72cd6
    • Sathyanarayanan Kuppuswamy's avatar
      iio: bmc150: change sampling frequency · 1930ba06
      Sathyanarayanan Kuppuswamy authored
      [ Upstream commit 0ba8da96 ]
      
      Currently driver reports device bandwidth list as available
      sampling frequency. But sampling frequency is actually twice
      the device bandwidth. This patch fixes this issue.
      Signed-off-by: default avatarSathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@intel.com>
      Signed-off-by: default avatarOctavian Purdila <octavian.purdila@intel.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1930ba06
    • Martin Fuzzey's avatar
      iio: core: Fix double free. · 42ec319b
      Martin Fuzzey authored
      [ Upstream commit c1b03ab5 ]
      
      When an error occurred during event registration memory was freed twice
      resulting in kernel memory corruption and a crash in unrelated code.
      
      The problem was caused by
      	iio_device_unregister_eventset()
      	iio_device_unregister_sysfs()
      
      being called twice, once on the error path and then
      again via iio_dev_release().
      
      Fix this by making these two functions idempotent so they
      may be called multiple times.
      
      The problem was observed before applying
      	78b33216 iio:core: Handle error when mask type is not separate
      Signed-off-by: default avatarMartin Fuzzey <mfuzzey@parkeon.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      42ec319b
    • Viorel Suman's avatar
      iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo · 382fd035
      Viorel Suman authored
      [ Upstream commit 4dac0a8e ]
      
      A hardware fifo reset always imply an invalidation of the
      existing timestamps, so we'll clear timestamps fifo on
      successfull hardware fifo reset.
      Signed-off-by: default avatarViorel Suman <viorel.suman@gmail.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      382fd035
    • Bart Van Assche's avatar
      Defer processing of REQ_PREEMPT requests for blocked devices · f37f8f40
      Bart Van Assche authored
      [ Upstream commit bba0bdd7 ]
      
      SCSI transport drivers and SCSI LLDs block a SCSI device if the
      transport layer is not operational. This means that in this state
      no requests should be processed, even if the REQ_PREEMPT flag has
      been set. This patch avoids that a rescan shortly after a cable
      pull sporadically triggers the following kernel oops:
      
      BUG: unable to handle kernel paging request at ffffc9001a6bc084
      IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
      Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
      Call Trace:
       [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
       [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
       [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
       [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
       [<ffffffff81223b37>] __blk_run_queue+0x27/0x30
       [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
       [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
       [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
       [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
       [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
       [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
       [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
       [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
       [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
       [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
       [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
       [<ffffffff811589de>] vfs_write+0xce/0x140
       [<ffffffff81158b53>] sys_write+0x53/0xa0
       [<ffffffff81464592>] system_call_fastpath+0x16/0x1b
       [<00007f611c9d9300>] 0x7f611c9d92ff
      Reported-by: default avatarMax Gurtuvoy <maxg@mellanox.com>
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarMike Christie <michaelc@cs.wisc.edu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f37f8f40
    • Doug Goldstein's avatar
      USB: ftdi_sio: Use jtag quirk for SNAP Connect E10 · 4893f8b2
      Doug Goldstein authored
      [ Upstream commit b229a0f8 ]
      
      This patch uses the existing CALAO Systems ftdi_8u2232c_probe in order
      to avoid attaching a TTY to the JTAG port as this board is based on the
      CALAO Systems reference design and needs the same fix up.
      Signed-off-by: default avatarDoug Goldstein <cardoe@cardoe.com>
      CC: stable <stable@vger.kernel.org>
      [johan: clean up probe logic ]
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4893f8b2
    • Doug Goldstein's avatar
      USB: ftdi_sio: Added custom PID for Synapse Wireless product · 51348505
      Doug Goldstein authored
      [ Upstream commit 4899c054 ]
      
      Synapse Wireless uses the FTDI VID with a custom PID of 0x9090 for their
      SNAP Stick 200 product.
      Signed-off-by: default avatarDoug Goldstein <cardoe@cardoe.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      51348505
    • Nathaniel W Filardo's avatar
      USB: keyspan_pda: add new device id · b8cfabf9
      Nathaniel W Filardo authored
      [ Upstream commit 5e71fc86 ]
      
      Add USB VID/PID for Xircom PGMFHUB USB/serial component.  (The hub and SCSI
      bridge on that hardware are recognized out of the box by existing drivers.)
      Tested VID/PID using new_id and loopback connection and was met with
      success, but that's all the testing done.
      Signed-off-by: default avatarNathaniel Wesley Filardo <nwf@cs.jhu.edu>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b8cfabf9
    • David Miller's avatar
      radeon: Do not directly dereference pointers to BIOS area. · fde250a5
      David Miller authored
      [ Upstream commit f2c9e560 ]
      
      Use readb() and memcpy_fromio() accessors instead.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fde250a5
    • Tejun Heo's avatar
      writeback: fix possible underflow in write bandwidth calculation · 51e17d28
      Tejun Heo authored
      [ Upstream commit c72efb65 ]
      
      From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001
      From: Tejun Heo <tj@kernel.org>
      Date: Mon, 23 Mar 2015 00:08:19 -0400
      
      2f800fbd ("writeback: fix dirtied pages accounting on redirty")
      introduced account_page_redirty() which reverts stat updates for a
      redirtied page, making BDI_DIRTIED no longer monotonically increasing.
      
      bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
      basis for bandwidth calculation.  While unlikely, since the above
      patch, the newer value may be lower than the recorded past value and
      underflow the bandwidth calculation leading to a wild result.
      
      Fix it by subtracing min of the old and new values when calculating
      delta.  AFAIK, there hasn't been any report of it happening but the
      resulting erratic behavior would be non-critical and temporary, so
      it's possible that the issue is happening without being reported.  The
      risk of the fix is very low, so tagged for -stable.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Greg Thelen <gthelen@google.com>
      Fixes: 2f800fbd ("writeback: fix dirtied pages accounting on redirty")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      51e17d28
    • Tejun Heo's avatar
      writeback: add missing INITIAL_JIFFIES init in global_update_bandwidth() · 59881078
      Tejun Heo authored
      [ Upstream commit 7d70e154 ]
      
      global_update_bandwidth() uses static variable update_time as the
      timestamp for the last update but forgets to initialize it to
      INITIALIZE_JIFFIES.
      
      This means that global_dirty_limit will be 5 mins into the future on
      32bit and some large amount jiffies into the past on 64bit.  This
      isn't critical as the only effect is that global_dirty_limit won't be
      updated for the first 5 mins after booting on 32bit machines,
      especially given the auxiliary nature of global_dirty_limit's role -
      protecting against global dirty threshold's sudden dips; however, it
      does lead to unintended suboptimal behavior.  Fix it.
      
      Fixes: c42843f2 ("writeback: introduce smoothed global dirty limit")
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      59881078
    • Viresh Kumar's avatar
      cpufreq: Schedule work for the first-online CPU on resume · 5f44e397
      Viresh Kumar authored
      [ Upstream commit c75de0ac ]
      
      All CPUs leaving the first-online CPU are hotplugged out on suspend and
      and cpufreq core stops managing them.
      
      On resume, we need to call cpufreq_update_policy() for this CPU's policy
      to make sure its frequency is in sync with cpufreq's cached value, as it
      might have got updated by hardware during suspend/resume.
      
      The policies are always added to the top of the policy-list. So, in
      normal circumstances, CPU 0's policy will be the last one in the list.
      And so the code checks for the last policy.
      
      But there are cases where it will fail. Consider quad-core system, with
      policy-per core. If CPU0 is hotplugged out and added back again, the
      last policy will be on CPU1 :(
      
      To fix this in a proper way, always look for the policy of the first
      online CPU. That way we will be sure that we are calling
      cpufreq_update_policy() for the only CPU that wasn't hotplugged out.
      
      Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
      Fixes: 2f0aea93 ("cpufreq: suspend governors on system suspend/hibernate")
      Reported-by: default avatarSaravana Kannan <skannan@codeaurora.org>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: default avatarSaravana Kannan <skannan@codeaurora.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5f44e397
    • Brian Silverman's avatar
      sched: Fix RLIMIT_RTTIME when PI-boosting to RT · 42d921d3
      Brian Silverman authored
      [ Upstream commit 746db944 ]
      
      When non-realtime tasks get priority-inheritance boosted to a realtime
      scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
      counter used for checking this (the same one used for SCHED_RR
      timeslices) was not getting reset. This meant that tasks running with a
      non-realtime scheduling class which are repeatedly boosted to a realtime
      one, but never block while they are running realtime, eventually hit the
      timeout without ever running for a time over the limit. This patch
      resets the realtime timeslice counter when un-PI-boosting from an RT to
      a non-RT scheduling class.
      
      I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
      mutex which induces priority boosting and spins while boosted that gets
      killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
      applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
      does happen eventually with PREEMPT_VOLUNTARY kernels.
      Signed-off-by: default avatarBrian Silverman <brian@peloton-tech.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: austin@peloton-tech.com
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      42d921d3
    • Laura Abbott's avatar
      mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolate · a38edfb2
      Laura Abbott authored
      [ Upstream commit cfa86943 ]
      
      Commit 3c605096 ("mm/page_alloc: restrict max order of merging on
      isolated pageblock") changed the logic of unset_migratetype_isolate to
      check the buddy allocator and explicitly call __free_pages to merge.
      
      The page that is being freed in this path never had prep_new_page called
      so set_page_refcounted is called explicitly but there is no call to
      kernel_map_pages.  With the default kernel_map_pages this is mostly
      harmless but if kernel_map_pages does any manipulation of the page
      tables (unmapping or setting pages to read only) this may trigger a
      fault:
      
          alloc_contig_range test_pages_isolated(ceb00, ced00) failed
          Unable to handle kernel paging request at virtual address ffffffc0cec00000
          pgd = ffffffc045fc4000
          [ffffffc0cec00000] *pgd=0000000000000000
          Internal error: Oops: 9600004f [#1] PREEMPT SMP
          Modules linked in: exfatfs
          CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1
          task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000
          PC is at memset+0xc8/0x1c0
          LR is at kernel_map_pages+0x1ec/0x244
      
      Fix this by calling kernel_map_pages to ensure the page is set in the
      page table properly
      
      Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
      Signed-off-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a38edfb2
    • Gu Zheng's avatar
      mm/memory hotplug: postpone the reset of obsolete pgdat · ec4db266
      Gu Zheng authored
      [ Upstream commit b0dc3a34 ]
      
      Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under
      stress condition:
      
        BUG: unable to handle kernel paging request at 0000000000025f60
        IP: next_online_pgdat+0x1/0x50
        PGD 0
        Oops: 0000 [#1] SMP
        ACPI: Device does not support D3cold
        Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
        CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
        Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
        Workqueue: events vmstat_update
        task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
        RIP: 0010: next_online_pgdat+0x1/0x50
        RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
        RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
        RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
        RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
        R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
        R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
        FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
          refresh_cpu_vm_stats+0xd0/0x140
          vmstat_update+0x11/0x50
          process_one_work+0x194/0x3d0
          worker_thread+0x12b/0x410
          kthread+0xc6/0xd0
          ret_from_fork+0x7c/0xb0
      
      The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of
      try_offline_node, which will reset all the content of pgdat to 0, as the
      pgdat is accessed lock-free, so that the users still using the pgdat
      will panic, such as the vmstat_update routine.
      
      process A:				offline node XX:
      
      vmstat_updat()
         refresh_cpu_vm_stats()
           for_each_populated_zone()
             find online node XX
           cond_resched()
      					offline cpu and memory, then try_offline_node()
      					node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
             zone = next_zone(zone)
               pg_data_t *pgdat = zone->zone_pgdat;  // here pgdat is NULL now
                 next_online_pgdat(pgdat)
                   next_online_node(pgdat->node_id);  // NULL pointer access
      
      So the solution here is postponing the reset of obsolete pgdat from
      try_offline_node() to hotadd_new_pgdat(), and just resetting
      pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset
      0 to avoid breaking pointer information in pgdat.
      Signed-off-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarXishi Qiu <qiuxishi@huawei.com>
      Suggested-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ec4db266
    • Leon Yu's avatar
      mm: fix anon_vma->degree underflow in anon_vma endless growing prevention · 5bc650f0
      Leon Yu authored
      [ Upstream commit 3fe89b3e ]
      
      I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after
      upgrading to 3.19 and had no luck with 4.0-rc1 neither.
      
      So, after looking into new logic introduced by commit 7a3ef208 ("mm:
      prevent endless growth of anon_vma hierarchy"), I found chances are that
      unlink_anon_vmas() is called without incrementing dst->anon_vma->degree
      in anon_vma_clone() due to allocation failure.  If dst->anon_vma is not
      NULL in error path, its degree will be incorrectly decremented in
      unlink_anon_vmas() and eventually underflow when exiting as a result of
      another call to unlink_anon_vmas().  That's how "kernel BUG at
      mm/rmap.c:399!" is triggered for me.
      
      This patch fixes the underflow by dropping dst->anon_vma when allocation
      fails.  It's safe to do so regardless of original value of dst->anon_vma
      because dst->anon_vma doesn't have valid meaning if anon_vma_clone()
      fails.  Besides, callers don't care dst->anon_vma in such case neither.
      
      Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as
      anon_vma_clone() now does the work.
      
      [akpm@linux-foundation.org: tweak comment]
      Fixes: 7a3ef208 ("mm: prevent endless growth of anon_vma hierarchy")
      Signed-off-by: default avatarLeon Yu <chianglungyu@gmail.com>
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5bc650f0
    • Konstantin Khlebnikov's avatar
      mm: fix corner case in anon_vma endless growing prevention · f94faaaa
      Konstantin Khlebnikov authored
      [ Upstream commit b800c91a ]
      
      Fix for BUG_ON(anon_vma->degree) splashes in unlink_anon_vmas() ("kernel
      BUG at mm/rmap.c:399!") caused by commit 7a3ef208 ("mm: prevent
      endless growth of anon_vma hierarchy")
      
      Anon_vma_clone() is usually called for a copy of source vma in
      destination argument.  If source vma has anon_vma it should be already
      in dst->anon_vma.  NULL in dst->anon_vma is used as a sign that it's
      called from anon_vma_fork().  In this case anon_vma_clone() finds
      anon_vma for reusing.
      
      Vma_adjust() calls it differently and this breaks anon_vma reusing
      logic: anon_vma_clone() links vma to old anon_vma and updates degree
      counters but vma_adjust() overrides vma->anon_vma right after that.  As
      a result final unlink_anon_vmas() decrements degree for wrong anon_vma.
      
      This patch assigns ->anon_vma before calling anon_vma_clone().
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-and-tested-by: default avatarChris Clayton <chris2553@googlemail.com>
      Reported-and-tested-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Reported-and-tested-by: default avatarChih-Wei Huang <cwhuang@android-x86.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Daniel Forrest <dan.forrest@ssec.wisc.edu>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: stable@vger.kernel.org  # to match back-porting of 7a3ef208Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f94faaaa
    • Konstantin Khlebnikov's avatar
      mm: prevent endless growth of anon_vma hierarchy · 3f161800
      Konstantin Khlebnikov authored
      [ Upstream commit 7a3ef208 ]
      
      Constantly forking task causes unlimited grow of anon_vma chain.  Each
      next child allocates new level of anon_vmas and links vma to all
      previous levels because pages might be inherited from any level.
      
      This patch adds heuristic which decides to reuse existing anon_vma
      instead of forking new one.  It adds counter anon_vma->degree which
      counts linked vmas and directly descending anon_vmas and reuses anon_vma
      if counter is lower than two.  As a result each anon_vma has either vma
      or at least two descending anon_vmas.  In such trees half of nodes are
      leafs with alive vmas, thus count of anon_vmas is no more than two times
      bigger than count of vmas.
      
      This heuristic reuses anon_vmas as few as possible because each reuse
      adds false aliasing among vmas and rmap walker ought to scan more ptes
      when it searches where page is might be mapped.
      
      Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
      Fixes: 5beb4930 ("mm: change anon_vma linking to fix multi-process server scalability issue")
      [akpm@linux-foundation.org: fix typo, per Rik]
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: default avatarDaniel Forrest <dan.forrest@ssec.wisc.edu>
      Tested-by: default avatarMichal Hocko <mhocko@suse.cz>
      Tested-by: default avatarJerome Marchand <jmarchan@redhat.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>	[2.6.34+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3f161800
  2. 23 Apr, 2015 6 commits