1. 13 Feb, 2014 17 commits
  2. 06 Feb, 2014 23 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.10.29 · 15692657
      Greg Kroah-Hartman authored
      15692657
    • Borislav Petkov's avatar
      x86, cpu, amd: Add workaround for family 16h, erratum 793 · fcac46cc
      Borislav Petkov authored
      commit 3b564968 upstream.
      
      This adds the workaround for erratum 793 as a precaution in case not
      every BIOS implements it.  This addresses CVE-2013-6885.
      
      Erratum text:
      
      [Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
      document 51810 Rev. 3.04 November 2013]
      
      793 Specific Combination of Writes to Write Combined Memory Types and
      Locked Instructions May Cause Core Hang
      
      Description
      
      Under a highly specific and detailed set of internal timing
      conditions, a locked instruction may trigger a timing sequence whereby
      the write to a write combined memory type is not flushed, causing the
      locked instruction to stall indefinitely.
      
      Potential Effect on System
      
      Processor core hang.
      
      Suggested Workaround
      
      BIOS should set MSR
      C001_1020[15] = 1b.
      
      Fix Planned
      
      No fix planned
      
      [ hpa: updated description, fixed typo in MSR name ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
      Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcac46cc
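The workaround quoted in the erratum 793 entry above (set bit 15 of MSR C001_1020) is a read-modify-write of a single bit. The following is a minimal user-space sketch of that bit manipulation only. The names `ERRATUM_793_BIT`, `apply_erratum_793` and `erratum_793_applied` are hypothetical, and a plain 64-bit variable stands in for the MSR; the kernel would go through its MSR accessors rather than a variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: bit 15 of MSR C001_1020, as given in the erratum text. */
#define ERRATUM_793_BIT (UINT64_C(1) << 15)

/* Return the MSR value with the workaround bit set; other bits are untouched. */
static uint64_t apply_erratum_793(uint64_t msr_val)
{
    return msr_val | ERRATUM_793_BIT;
}

/* Report whether the bit is already set, e.g. because the BIOS applied it. */
static int erratum_793_applied(uint64_t msr_val)
{
    return (msr_val & ERRATUM_793_BIT) != 0;
}

/* Small self-check: setting the bit is idempotent and preserves other bits. */
static void erratum_793_selfcheck(void)
{
    assert(apply_erratum_793(0) == ERRATUM_793_BIT);
    assert(apply_erratum_793(ERRATUM_793_BIT) == ERRATUM_793_BIT);
    assert(apply_erratum_793(0x1) == (ERRATUM_793_BIT | 0x1));
    assert(!erratum_793_applied(0x7fff));
}
```

Because the operation is idempotent, applying it when the BIOS has already set the bit changes nothing, which fits the "as a precaution" wording of the commit message.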
    • Paul Mackerras's avatar
      powerpc: Make sure "cache" directory is removed when offlining cpu · 04c12b68
      Paul Mackerras authored
      commit 91b973f9 upstream.
      
      The code in remove_cache_dir() is supposed to remove the "cache"
      subdirectory from the sysfs directory for a CPU when that CPU is
      being offlined.  It tries to do this by calling kobject_put() on
      the kobject for the subdirectory.  However, the subdirectory only
      gets removed once the last reference goes away, and the reference
      being put here may well not be the last reference.  That means
      that the "cache" subdirectory may still exist when the offlining
      operation has finished.  If the same CPU subsequently gets onlined,
      the code tries to add a new "cache" subdirectory.  If the old
      subdirectory has not yet been removed, we get a WARN_ON in the
      sysfs code, with stack trace, and an error message printed on the
      console.  Further, we ultimately end up with an online cpu with no
      "cache" subdirectory.
      
      This fixes it by doing an explicit kobject_del() at the point where
      we want the subdirectory to go away.  kobject_del() removes the sysfs
      directory even though the object still exists in memory.  The object
      will get freed at some point in the future.  A subsequent onlining
      operation can create a new sysfs directory, even if the old object
      still exists in memory, without causing any problems.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      04c12b68
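The difference between dropping a reference and explicitly deleting the directory, as described in the powerpc "cache" directory entry above, can be illustrated with a toy model. This is a sketch under stated assumptions: `struct toy_kobj` and the `toy_*` functions are hypothetical stand-ins, not the kernel's kobject API, and "directory visible" is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kobject: a reference count plus "directory visible". */
struct toy_kobj {
    int  refcount;
    bool dir_visible;
};

/* Like kobject_put(): the directory only disappears with the last reference. */
static void toy_put(struct toy_kobj *k)
{
    assert(k->refcount > 0);
    if (--k->refcount == 0)
        k->dir_visible = false;
}

/* Like kobject_del(): unlink the directory now; the object itself lives on. */
static void toy_del(struct toy_kobj *k)
{
    k->dir_visible = false;
}

/* Onlining can add a new "cache" directory only if the old one is gone. */
static bool toy_can_add_dir(const struct toy_kobj *old)
{
    return !old->dir_visible;
}
```

With two outstanding references, `toy_put()` alone leaves the directory in place, so a later add would collide; calling `toy_del()` first removes the directory regardless of how many references remain.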
    • Srivatsa S. Bhat's avatar
      powerpc: Fix the setup of CPU-to-Node mappings during CPU online · df8042ba
      Srivatsa S. Bhat authored
      commit d4edc5b6 upstream.
      
      On POWER platforms, the hypervisor can notify the guest kernel about dynamic
      changes in the cpu-numa associativity (VPHN topology update). Hence the
      cpu-to-node mappings that we got from the firmware during boot, may no longer
      be valid after such updates. This is handled using the arch_update_cpu_topology()
      hook in the scheduler, and the sched-domains are rebuilt according to the new
      mappings.
      
      But unfortunately, at the moment, CPU hotplug ignores these updated mappings
      and instead queries the firmware for the cpu-to-numa relationships and uses
      them during CPU online. So the kernel can end up assigning wrong NUMA nodes
      to CPUs during subsequent CPU hotplug online operations (after booting).
      
      Further, a particularly problematic scenario can result from this bug:
      On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
      threads per core. The switch to Single-Threaded (ST) mode is performed by
      offlining all except the first CPU thread in each core. Switching back to
      SMT mode involves onlining those other threads back, in each core.
      
      Now consider this scenario:
      
      1. During boot, the kernel gets the cpu-to-node mappings from the firmware
         and assigns the CPUs to NUMA nodes appropriately, during CPU online.
      
      2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
         communicates this update to the kernel. The kernel in turn updates its
         cpu-to-node associations and rebuilds its sched domains. Everything is
         fine so far.
      
      3. Now, the user switches the machine from SMT to ST mode (say, by running
         ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
         core.
      
      4. The user then tries to switch back from ST to SMT mode (say, by running
         ppc64_cpu --smt=4), and this involves onlining those threads back. Since
         CPU hotplug ignores the new mappings, it queries the firmware and tries to
         associate the newly onlined sibling threads to the old NUMA nodes. This
         results in sibling threads within the same core getting associated with
         different NUMA nodes, which is incorrect.
      
         The scheduler's build-sched-domains code gets thoroughly confused with this
         and enters an infinite loop and causes soft-lockups, as explained in detail
         in commit 3be7db6a (powerpc: VPHN topology change updates all siblings).
      
      So to fix this, use the numa_cpu_lookup_table to remember the updated
      cpu-to-node mappings, and use them during CPU hotplug online operations.
      Further, we also need to ensure that all threads in a core are assigned to a
      common NUMA node, irrespective of whether all those threads were online during
      the topology update. To achieve this, we take care not to use cpu_sibling_mask()
      since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread()
      and set up the mappings manually using the 'threads_per_core' value for that
      particular platform. This helps us ensure that we don't hit this bug with any
      combination of CPU hotplug and SMT mode switching.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df8042ba
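The approach in the CPU-to-node entry above (remember the updated mapping for every thread of a core, whether or not it is online, and consult that table at CPU online time) can be sketched as follows. All names here are hypothetical illustrations: `numa_cpu_lookup`, `first_sibling`, `update_core_node` and `node_for_onlining` are toy stand-ins, and the sketch assumes CPUs are numbered so that the threads of one core are contiguous.

```c
#include <assert.h>

#define NR_CPUS 16

static int threads_per_core = 4;      /* platform constant, e.g. SMT4 */
static int numa_cpu_lookup[NR_CPUS];  /* toy cpu-to-node table, all node 0 */

/* First thread of the core containing cpu; independent of online state. */
static int first_sibling(int cpu)
{
    return cpu - (cpu % threads_per_core);
}

/* Record new_node for every thread of cpu's core, online or not. */
static void update_core_node(int cpu, int new_node)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    int first = first_sibling(cpu);
    for (int i = 0; i < threads_per_core; i++)
        numa_cpu_lookup[first + i] = new_node;
}

/* CPU online consults the remembered table instead of asking firmware. */
static int node_for_onlining(int cpu)
{
    return numa_cpu_lookup[cpu];
}
```

Because the loop walks `threads_per_core` entries from the first sibling rather than iterating over an online-sibling mask, threads that were offline during the topology update still end up on the same node as the rest of their core.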
    • David Sterba's avatar
      btrfs: restrict snapshotting to own subvolumes · f0cea52a
      David Sterba authored
      commit d0242061 upstream.
      
      Currently, any user can snapshot any subvolume if the path is accessible and
      thus indirectly create and keep files he does not own under his directories.
      This is not possible with traditional directories.
      
      In security context, a user can snapshot root filesystem and pin any
      potentially buggy binaries, even if the updates are applied.
      
      All the snapshots are visible to the administrator, so it's possible to
      verify if there are suspicious snapshots.
      
      Another, more practical problem is that any user can pin the space used
      by e.g. root and cause ENOSPC.
      
      Original report:
      https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0cea52a
    • Wang Shilong's avatar
      Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot() · 5c61a3d3
      Wang Shilong authored
      commit 90515e7f upstream.
      
      btrfs_drop_snapshot() may return early; we shouldn't call
      btrfs_std_err() in that case. Fix it.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c61a3d3
    • Andy Grover's avatar
      target/iscsi: Fix network portal creation race · 85c3c54b
      Andy Grover authored
      commit ee291e63 upstream.
      
      When creating network portals rapidly, such as when restoring a
      configuration, LIO's code to reuse existing portals can return a false
      negative if the thread hasn't run yet and set np_thread_state to
      ISCSI_NP_THREAD_ACTIVE. This causes an error in the network stack
      when attempting to bind to the same address/port.
      
      This patch sets NP_THREAD_ACTIVE before the np is placed on g_np_list,
      so even if the thread hasn't run yet, iscsit_get_np will return the
      existing np.
      
      Also, convert np_lock -> np_mutex + hold across adding new net portal
      to g_np_list to prevent a race where two threads may attempt to create
      the same network portal, resulting in one of them failing.
      
      (nab: Add missing mutex_unlocks in iscsit_add_np failure paths)
      (DanC: Fix incorrect spin_unlock -> spin_unlock_bh)
      Signed-off-by: Andy Grover <agrover@redhat.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85c3c54b
    • Asias He's avatar
      virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze · 26996fcd
      Asias He authored
      commit f466f753 upstream.
      
      vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
      unregistered. We will have a use-after-free usage when the notifier
      callback is called after virtscsi_freeze.
      
      Fixes: 285e71ea
      ("virtio-scsi: reset virtqueue affinity when doing cpu hotplug")
      Signed-off-by: Asias He <asias.hejun@gmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      26996fcd
    • Vijaya Mohan Guvva's avatar
      SCSI: bfa: Chinook quad port 16G FC HBA claim issue · ed9d61e9
      Vijaya Mohan Guvva authored
      commit dcaf9aed upstream.
      
      A bfa driver crash is observed while pushing the firmware onto a Chinook
      quad port card, due to an access to uninitialized bfi_image_ct2, which gets
      initialized only for CT2 ASIC based cards after request_firmware().
      For the quad port Chinook (CT2 ASIC based), bfi_image_ct2 is not
      initialized because there is no check for the Chinook PCI device ID before
      request_firmware(); instead bfi_image_cb is initialized, as it is the
      default case of the card type check.
      
      This patch includes changes to read the right firmware for quad port chinook.
      Signed-off-by: Vijaya Mohan Guvva <vmohan@brocade.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9d61e9
    • Thomas Pugliese's avatar
      usb: core: get config and string descriptors for unauthorized devices · 6f7c6ef1
      Thomas Pugliese authored
      commit 83e83ecb upstream.
      
      There is no need to skip querying the config and string descriptors for
      unauthorized WUSB devices when usb_new_device is called.  It is allowed
      by WUSB spec.  The only action that needs to be delayed until
      authorization time is the set config.  This change allows user mode
      tools to see the config and string descriptors earlier in enumeration
      which is needed for some WUSB devices to function properly on Android
      systems.  It also reduces the amount of divergent code paths needed
      for WUSB devices.
      Signed-off-by: Thomas Pugliese <thomas.pugliese@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f7c6ef1
    • Emmanuel Grumbach's avatar
      iwlwifi: pcie: fix interrupt coalescing for 7260 / 3160 · 6fb6cd45
      Emmanuel Grumbach authored
      commit 6960a059 upstream.
      
      We changed the timeout for the interrupt coalescing for
      calibration, but that wasn't effective since we changed
      that value back before loading the firmware. Since
      calibrations are notifications from the firmware and not Rx
      packets, this doesn't change anything anyway - the firmware
      will fire an interrupt straight away regardless of the
      interrupt coalescing value.
      Also, a HW issue has been discovered in the 7000 device series.
      The workaround is to disable the new interrupt coalescing
      timeout feature - do this by setting bit 31 in
      CSR_INT_COALESCING.
      This has been fixed in 7265, which means that we can't rely
      on the device family and must have a hint in the iwl_cfg
      structure.
      
      Fixes: 99cd4714 ("iwlwifi: add 7000 series device configuration")
      Reviewed-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fb6cd45
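The iwlwifi entry above describes a per-device hint that decides whether bit 31 of CSR_INT_COALESCING is set. A minimal sketch of that decision, with a plain 32-bit value standing in for the register, might look like this. `struct toy_cfg`, its `needs_timeout_disable` field, `TOY_COALESCING_TIMEOUT_DISABLE` and `toy_int_coalescing_value` are hypothetical names for illustration, not the driver's actual fields or macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 31 of the coalescing register. */
#define TOY_COALESCING_TIMEOUT_DISABLE (UINT32_C(1) << 31)

/* Hypothetical per-device configuration carrying the hint. */
struct toy_cfg {
    bool needs_timeout_disable;  /* true for affected devices only */
};

/* Compute the register value: timeout bits plus, if hinted, bit 31. */
static uint32_t toy_int_coalescing_value(const struct toy_cfg *cfg,
                                         uint32_t timeout)
{
    /* The caller's timeout must not already occupy bit 31. */
    assert((timeout & TOY_COALESCING_TIMEOUT_DISABLE) == 0);

    if (cfg->needs_timeout_disable)
        timeout |= TOY_COALESCING_TIMEOUT_DISABLE;
    return timeout;
}
```

Keying the decision on a configuration flag instead of the device family is what lets a fixed device in the same family (7265 in the commit message) leave the bit clear.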
    • Stephen Warren's avatar
      ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled · 2b1461bb
      Stephen Warren authored
      (This is upstream 75fae117 "ALSA: hda/hdmi - allow PIN_OUT to be
      dynamically enabled", backported to stable 3.10 through 3.12. 3.13 and
      later can take the original patch.)
      
      Commit 384a48d7 "ALSA: hda: HDMI: Support codecs with fewer cvts
      than pins" dynamically enabled each pin widget's PIN_OUT only when the
      pin was actively in use. This was required on certain NVIDIA CODECs for
      correct operation. Specifically, if multiple pin widgets each had their
      mux input select the same audio converter widget and each pin widget had
      PIN_OUT enabled, then only one of the pin widgets would actually receive
      the audio, and often not the one the user wanted!
      
      However, this apparently broke some Intel systems, and commit
      6169b673 "ALSA: hda - Always turn on pins for HDMI/DP" reverted the
      dynamic setting of PIN_OUT. This in turn broke the afore-mentioned NVIDIA
      CODECs.
      
      This change supports either dynamic or static handling of PIN_OUT,
      selected by a flag set up during CODEC initialization. This flag is
      enabled for all recent NVIDIA GPUs.
      Reported-by: Uosis <uosisl@gmail.com>
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b1461bb
    • Anssi Hannula's avatar
      ALSA: hda - hdmi: introduce patch_nvhdmi() · c6a3cab8
      Anssi Hannula authored
      (This is a backport of *part* of upstream 611885bc "ALSA: hda -
      hdmi: Disallow unsupported 2ch remapping on NVIDIA codecs" to stable
      3.10 through 3.12. Later stable already contain all of the original
      patch.)
      
      Mainline commit 611885bc "ALSA: hda - hdmi: Disallow unsupported 2ch
      remapping on NVIDIA codecs" introduces function patch_nvhdmi(). That
      function is edited by 75fae117 "ALSA: hda/hdmi - allow PIN_OUT to be
      dynamically enabled". In order to backport the PIN_OUT patch, I am first
      back-porting just the addition of function patch_nvhdmi(), so that the
      conflicts applying the PIN_OUT patch are simplified.
      
      Ideally, one might backport all of 611885bc. However, that commit
      doesn't apply to stable kernels, since it relies on a chain of other
      patches which implement new features.
      Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [swarren, extracted just a small part of the original patch]
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6a3cab8
    • Mihai Caraman's avatar
      KVM: PPC: e500: Fix bad address type in deliver_tlb_misss() · 5f03911e
      Mihai Caraman authored
      commit 70713fe3 upstream.
      
      Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f03911e
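The KVM e500 entry above replaces an `unsigned int` parameter with `gva_t`. The effect of a too-narrow parameter type can be shown in isolation: on a guest with 64-bit virtual addresses, a 32-bit parameter silently drops the upper bits. In this sketch `toy_gva_t`, `miss_addr_narrow` and `miss_addr_wide` are hypothetical names, and fixed-width types are used so the behaviour does not depend on the host.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_gva_t;  /* stand-in for gva_t on a 64-bit guest */

/* Buggy shape: a 32-bit parameter silently drops the upper address bits. */
static uint64_t miss_addr_narrow(uint32_t eaddr)
{
    return eaddr;
}

/* Fixed shape: the parameter is as wide as a guest virtual address. */
static uint64_t miss_addr_wide(toy_gva_t eaddr)
{
    return eaddr;
}

/* Small self-check contrasting the two shapes on one 64-bit address. */
static void gva_selfcheck(void)
{
    uint64_t addr = UINT64_C(0x0000000123456000);

    assert(miss_addr_narrow(addr) == UINT64_C(0x23456000)); /* truncated */
    assert(miss_addr_wide(addr) == addr);                   /* preserved */
}
```

The truncation is a well-defined unsigned conversion in C, so the compiler accepts the narrow version without complaint unless conversion warnings are enabled.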
    • Helge Deller's avatar
      parisc: fix cache-flushing · 64a00996
      Helge Deller authored
      commit 57737c49 upstream.
      
      This commit:
      f8dae006: parisc: Ensure full cache coherency for kmap/kunmap
      caused negative caching side-effects, e.g. hanging processes with expect and
      too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems.
      
      This patch now partly reverts it and has been in production use on our Debian buildd
      makeservers for a week without any major problems.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      64a00996
    • Emmanuel Grumbach's avatar
      iwlwifi: pcie: enable oscillator for L1 exit · 50920a1d
      Emmanuel Grumbach authored
      commit 2d93aee1 upstream.
      
      Enabling the oscillator consumes slightly more power (100uA)
      but ensures that we exit from L1 on time.
      
      Not doing so might lead to a PCIe specification violation
      since we might wake up from L1 at the wrong time.
      This issue has been identified on 3160 and 7260 only.
      On older NICs L1 off is not enabled, on newer NICs (7265),
      the issue is fixed.
      
      When the bug occurs the user sees that the NIC has
      disappeared from the PCI bridge, any access to the device
      returns 0xff.
      
      This fixes:
      	https://bugzilla.kernel.org/show_bug.cgi?id=64541
      
      and has been extensively discussed here:
      	http://markmail.org/thread/mfmpzqt3r333n4bo
      
      Fixes: 99cd4714 ("iwlwifi: add 7000 series device configuration")
      Reported-and-tested-by: wzyboy <wzyboy@wzyboy.org>
      Reviewed-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      50920a1d
    • Nicolas Dichtel's avatar
      ip6tnl: fix double free of fb_tnl_dev on exit · 38c963f2
      Nicolas Dichtel authored
      [ No relevant upstream commit. ]
      
      This problem was fixed upstream by commit 1e9f3d6f ("ip6tnl: fix use after
      free of fb_tnl_dev").
      The upstream patch depends on upstream commit 0bd87628 ("ip6tnl: add x-netns
      support"), which was not backported into 3.10 branch.
      
      First, explain the problem: when the ip6_tunnel module is unloaded,
      ip6_tunnel_cleanup() is called.
      rmmod ip6_tunnel
      => ip6_tunnel_cleanup()
        => rtnl_link_unregister()
          => __rtnl_kill_links()
            => for_each_netdev(net, dev) {
              if (dev->rtnl_link_ops == ops)
              	ops->dellink(dev, &list_kill);
              }
      At this point, the FB device is deleted (and all ip6tnl tunnels).
        => unregister_pernet_device()
          => unregister_pernet_operations()
            => ops_exit_list()
              => ip6_tnl_exit_net()
                => ip6_tnl_destroy_tunnels()
                  => t = rtnl_dereference(ip6n->tnls_wc[0]);
                     unregister_netdevice_queue(t->dev, &list);
      We delete the FB device a second time here!
      
      The previous fix removes these lines, which fixes this double free. But the patch
      introduces a memory leak when a netns is destroyed, because the FB device is
      never deleted. By adding an rtnl op which deletes all ip6tnl devices except
      the FB device, we can keep this explicit removal in ip6_tnl_destroy_tunnels().
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reported-by: Steven Rostedt <srostedt@redhat.com>
      Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
      Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Tested-by: John Kacur <jkacur@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      38c963f2
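The fix in the ip6tnl entry above hinges on the link-deletion pass skipping the FB device, so that the per-netns exit path remains the only place that removes it. A toy model of the two passes is sketched below. `struct toy_dev` and the `toy_*` functions are hypothetical stand-ins for the real call chain, and the exit path is simplified to handle only the FB device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
    bool is_fb;         /* the per-netns fallback (FB) device */
    int  unregistered;  /* how many times unregistration was queued */
};

/* The added dellink op: delete ordinary tunnels, leave the FB device. */
static void toy_dellink(struct toy_dev *dev)
{
    if (!dev->is_fb)
        dev->unregistered++;
}

/* Stand-in for the __rtnl_kill_links() walk on module unload. */
static void toy_kill_links(struct toy_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_dellink(&devs[i]);
}

/* Stand-in for the per-netns exit path: the FB device is removed here. */
static void toy_exit_net(struct toy_dev *fb)
{
    assert(fb->is_fb);
    fb->unregistered++;
}
```

Running the link walk and then the exit path leaves every device, including the FB device, unregistered exactly once. Without the `is_fb` test in `toy_dellink()`, the FB device would be counted twice, which corresponds to the double free in the commit message.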
    • Nicolas Dichtel's avatar
      Revert "ip6tnl: fix use after free of fb_tnl_dev" · 89ed31c6
      Nicolas Dichtel authored
      [ No relevant upstream commit. ]
      
      This reverts commit 22c3ec55.
      
      This patch is not the right fix, it introduces a memory leak when a netns is
      destroyed (the FB device is never deleted).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reported-by: Steven Rostedt <srostedt@redhat.com>
      Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
      Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Tested-by: John Kacur <jkacur@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      89ed31c6
    • Nicolas Dichtel's avatar
      sit: fix double free of fb_tunnel_dev on exit · 1b2a58ff
      Nicolas Dichtel authored
      [ No relevant upstream commit. ]
      
      This problem was fixed upstream by commit 9434266f ("sit: fix use after free
      of fb_tunnel_dev").
      The upstream patch depends on upstream commit 5e6700b3 ("sit: add support of
      x-netns"), which was not backported into 3.10 branch.
      
      First, explain the problem: when the sit module is unloaded, sit_cleanup() is
      called.
      rmmod sit
      => sit_cleanup()
        => rtnl_link_unregister()
          => __rtnl_kill_links()
            => for_each_netdev(net, dev) {
              if (dev->rtnl_link_ops == ops)
              	ops->dellink(dev, &list_kill);
              }
      At this point, the FB device is deleted (and all sit tunnels).
        => unregister_pernet_device()
          => unregister_pernet_operations()
            => ops_exit_list()
              => sit_exit_net()
                => sit_destroy_tunnels()
                In this function, no tunnel is found.
                => unregister_netdevice_queue(sitn->fb_tunnel_dev, &list);
      We delete the FB device a second time here!
      
      Because we cannot simply remove the second deletion (sit_exit_net() must remove
      the FB device when a netns is deleted), we add an rtnl op which deletes all sit
      devices except the FB device, and thus we can keep the explicit deletion in
      sit_exit_net().
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reported-by: Steven Rostedt <srostedt@redhat.com>
      Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
      Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Tested-by: John Kacur <jkacur@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b2a58ff
    • Annie Li's avatar
      xen-netfront: fix resource leak in netfront · 64cee83a
      Annie Li authored
      [ Upstream commit cefe0078 ]
      
      This patch removes grant transfer releasing code from netfront, and uses
      gnttab_end_foreign_access to end grant access since
      gnttab_end_foreign_access_ref may fail when the grant entry is
      currently used for reading or writing.
      
      * clean up grant transfer code kept from old netfront(2.6.18) which grants
      pages for access/map and transfer. But grant transfer is deprecated in current
      netfront, so remove corresponding release code for transfer.
      
      * fix resource leak, release grant access (through gnttab_end_foreign_access)
      and skb for tx/rx path, use get_page to ensure page is released when grant
      access is completed successfully.
      
      Xen-blkfront/xen-tpmfront/xen-pcifront also have similar issue, but patches
      for them will be created separately.
      
      V6: Correct subject line and commit message.
      
      V5: Remove unnecessary change in xennet_end_access.

      V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
      single patch.

      V3: Changes as suggested by David Vrabel, ensure pages are not freed until
      grant access is ended.
      
      V2: Improve patch comments.
      Signed-off-by: Annie Li <annie.li@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      64cee83a
    • Holger Eitzenberger's avatar
      net: Fix memory leak if TPROXY used with TCP early demux · 873c4941
      Holger Eitzenberger authored
      [ Upstream commit a452ce34 ]
      
      I see a memory leak when using a transparent HTTP proxy using TPROXY
      together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
      
      unreferenced object 0xffff88008cba4a40 (size 1696):
        comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
        hex dump (first 32 bytes):
          0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
          02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
          [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
          [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
          [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
          [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
          [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
          [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
          [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
          [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
          [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
          [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
          [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
          [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
          [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
          [<ffffffff8127c949>] net_rx_action+0xa7/0x200
          [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
      
      But there are many more, resulting in the machine going OOM after some
      days.
      
      From looking at the TPROXY code, and with help from Florian, I see
      that the memory leak is introduced in tcp_v4_early_demux():
      
        void tcp_v4_early_demux(struct sk_buff *skb)
        {
          /* ... */
      
          iph = ip_hdr(skb);
          th = tcp_hdr(skb);
      
          if (th->doff < sizeof(struct tcphdr) / 4)
              return;
      
          sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                             iph->saddr, th->source,
                             iph->daddr, ntohs(th->dest),
                             skb->skb_iif);
          if (sk) {
              skb->sk = sk;
      
      where the socket is assigned unconditionally to skb->sk, also bumping
      the refcnt on it.  This is problematic, because in our case the skb
      has already a socket assigned in the TPROXY target.  This then results
      in the leak I see.
      
      The very same issue seems to exist with IPv6, but I haven't tested it.
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      873c4941
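The TPROXY entry above attributes the leak to early demux assigning a socket to an skb that already carries one. The commit message does not spell out the change itself, so the following is only an assumed illustration of the underlying idea: skip the assignment (and the reference bump) when a socket is already attached. `struct toy_sock`, `struct toy_skb` and `toy_early_demux` are hypothetical and model only a reference count and a pointer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock { int refcnt; };
struct toy_skb  { struct toy_sock *sk; };

/*
 * Toy early demux: attach the looked-up socket and take a reference,
 * but only when the skb does not already carry a socket.
 */
static void toy_early_demux(struct toy_skb *skb, struct toy_sock *found)
{
    assert(skb != NULL);

    if (skb->sk != NULL)   /* already assigned, e.g. by the TPROXY target */
        return;

    if (found != NULL) {
        found->refcnt++;   /* reference held on behalf of the skb */
        skb->sk = found;
    }
}
```

In the leaking shape, the unconditional assignment would overwrite the pointer to the first socket while its reference was still held, so that reference could never be dropped.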
    • Oliver Hartkopp's avatar
      fib_frontend: fix possible NULL pointer dereference · 7dd52e5d
      Oliver Hartkopp authored
      [ Upstream commit a0065f26 ]
      
      The two commits 0115e8e3 (net: remove delay at device dismantle) and
      748e2d93 (net: reinstate rtnl in call_netdevice_notifiers()) silently
      removed a NULL pointer check for in_dev since Linux 3.7.
      
      This patch re-introduces the check, as its absence crashes the kernel when
      setting small MTU values on non-IP-capable netdevices.
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dd52e5d