1. 24 Oct, 2014 17 commits
    • Fabian Frederick's avatar
      lapb: move EXPORT_SYMBOL after functions. · 75da1469
      Fabian Frederick authored
      See Documentation/CodingStyle Chapter 6
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'berlin_ethernet' · 5f3619f2
      David S. Miller authored
      Sebastian Hesselbarth says:
      
      ====================
      Marvell PXA168 libphy handling and Berlin Ethernet
      
      This patch series deals with removing an IP feature that can be found
      on all currently supported Marvell Ethernet IP (pxa168_eth, mv643xx_eth,
      mvneta). The MAC IP can automatically perform PHY auto-negotiation
      without software interaction.
      
      However, this feature (a) fundamentally clashes with the way libphy works
      and (b) is unable to deal with quirky PHYs that require special treatment.
      In this series, the pxa168_eth driver is rewritten to completely disable
      that feature and properly deal with libphy-provided PHYs.
      
      As usual, a branch on top of v3.18-rc1 can be found at
      
      git://git.infradead.org/users/hesselba/linux-berlin.git devel/bg2-bg2cd-eth-v2
      
      Patches 1-5 should go through David's net tree; I'll pick up the DT patches
      6-9.
      
      Changes compared to the RFT:
      - added phy-connection-type property to BG2Q PHY DT node
      - bail out from pxa168_eth_adjust_link when there is no change in
        PHY parameters. Also, add a call to phy_print_status.
      Changes compared to v1:
      - move phy-connection-type to ethernet node instead of PHY node
      
      Patch 1 adds support for Marvell 88E3016 FastEthernet PHY that is also
      integrated in Marvell Berlin BG2/BG2CD SoCs.
      
      Patch 2 allows passing a phy_interface_t via pxa168_eth platform_data,
      which is only used by mach-mmp/gplug. From the board setup, I guessed
      that gplug's PHY is connected via RMII. The patch has not even been
      compile-tested.
      
      Patches 3-5 prepare proper libphy handling and finally remove all in-driver
      PHY mangling related to the feature explained above.
      
      Patches 6-9 add corresponding ethernet DT nodes to BG2, BG2CD, add a
      phy-connection-type property to BG2Q and enable ethernet on BG2-based Sony
      NSZ-GS7. I have tested all this on GS7 successfully with ip=dhcp on 100M FD.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sebastian Hesselbarth's avatar
      net: pxa168_eth: Remove in-driver PHY mangling · 9ff32fe1
      Sebastian Hesselbarth authored
      Now that libphy PHYs are used properly, remove the in-driver PHY
      mangling.
      Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sebastian Hesselbarth's avatar
      net: pxa168_eth: Remove HW auto-negotiaion · 1a149132
      Sebastian Hesselbarth authored
      Marvell Ethernet IP supports PHY negotiation driven by HW. This
      fundamentally clashes with libphy (software) driven negotiation and
      also cannot cope with quirky PHYs. Therefore, always disable any HW
      negotiation features and properly use libphy's phy_device.
      Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sebastian Hesselbarth's avatar
      net: pxa168_eth: Prepare proper libphy handling · 9d8ea73d
      Sebastian Hesselbarth authored
      Current libphy handling in pxa168_eth lacks proper phy_connect. Prepare
      to fix this by first moving phy properties from platform_data to private
      driver data.
      Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sebastian Hesselbarth's avatar
      net: pxa168_eth: Provide phy_interface mode on platform_data · e7de17ab
      Sebastian Hesselbarth authored
      The PXA168 Ethernet IP supports MII and RMII connections to its PHY.
      Currently, pxa168 platform_data does not provide a way to pass that,
      and there is only one user of pxa168 platform_data (mach-mmp/gplug).
      Given the pinctrl settings of gplug, it uses RMII, so add and pass
      a corresponding phy_interface_t.
      Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
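The idea behind this patch and the "Prepare proper libphy handling" patch (board code supplies the PHY interface mode; the driver copies PHY properties into its private data at probe time) can be sketched in userspace. Everything here is hypothetical: the `TOY_PHY_INTERFACE_MODE_*` values only mimic the kernel's `phy_interface_t`, and `struct toy_platform_data`, `struct toy_eth_priv` and `toy_probe()` are not the actual pxa168_eth definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Loose, hypothetical stand-in for a subset of phy_interface_t. */
typedef enum {
	TOY_PHY_INTERFACE_MODE_NA,
	TOY_PHY_INTERFACE_MODE_MII,
	TOY_PHY_INTERFACE_MODE_RMII,
} toy_phy_interface_t;

/* Hypothetical board-supplied platform data. */
struct toy_platform_data {
	int phy_addr;
	toy_phy_interface_t phy_intf;
};

/* Hypothetical driver-private data: probe copies what it needs, so the
 * rest of the driver no longer dereferences platform_data. */
struct toy_eth_priv {
	int phy_addr;
	toy_phy_interface_t phy_intf;
};

static int toy_probe(const struct toy_platform_data *pdata,
		     struct toy_eth_priv *priv)
{
	assert(priv != NULL);
	if (pdata == NULL)
		return -1;	/* no board data: nothing to connect to */

	priv->phy_addr = pdata->phy_addr;
	priv->phy_intf = pdata->phy_intf;
	return 0;
}

/* What a gplug-like board file would provide: an RMII-connected PHY.
 * The PHY address is made up for illustration. */
static const struct toy_platform_data toy_gplug_eth_data = {
	.phy_addr = 0,
	.phy_intf = TOY_PHY_INTERFACE_MODE_RMII,
};
```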
    • Sebastian Hesselbarth's avatar
      phy: marvell: Add support for 88E3016 FastEthernet PHY · 6b358aed
      Sebastian Hesselbarth authored
      Marvell 88E3016 is a FastEthernet PHY that can also be found in Marvell
      Berlin SoCs as an integrated PHY.
      Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Geert Uytterhoeven's avatar
      natsemi/macsonic: Remove superfluous interrupt disable/restore · d4c3363e
      Geert Uytterhoeven authored
      As of commit e4dc601b ("m68k: Disable/restore interrupts in
      hwreg_present()/hwreg_write()"), this is no longer needed.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Geert Uytterhoeven's avatar
      cirrus/mac89x0: Remove superfluous interrupt disable/restore · 7f30b742
      Geert Uytterhoeven authored
      As of commit e4dc601b ("m68k: Disable/restore interrupts in
      hwreg_present()/hwreg_write()"), this is no longer needed.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Rasmus Villemoes's avatar
      net: typhoon: Remove redundant casts · 00fd5d94
      Rasmus Villemoes authored
      Both image_data and typhoon_fw->data are const u8*, so the cast to u8*
      is unnecessary and confusing.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: David Dillow <dave@thedillows.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
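As a small illustration of why the cast is redundant: a `const u8 *` can be assigned to another `const u8 *` without any conversion, whereas a cast to plain `u8 *` only strips the `const` qualifier. The functions below are hypothetical and merely mimic the shape of firmware-loading code; they are not taken from typhoon.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Consumes read-only bytes, as firmware-download code does. */
static unsigned int byte_sum(const u8 *data, size_t len)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += data[i];
	return sum;
}

static unsigned int image_byte_sum(const u8 *fw_data, size_t len)
{
	/* No cast needed: both sides are const u8 *. Writing
	 * (u8 *)fw_data would also compile, but it only discards const,
	 * which lets accidental writes through the pointer compile
	 * without a diagnostic. */
	const u8 *image_data = fw_data;

	assert(image_data != NULL || len == 0);
	return byte_sum(image_data, len);
}
```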
    • Sébastien Barré's avatar
      Removed unused function sctp_addr_is_valid() · 16704b12
      Sébastien Barré authored
      sctp_addr_is_valid() was defined but never called.
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'ipv6_route' · fad71e4a
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route()
      
      This patch set reduces the number of fib6_lookup()
      calls from ip6_pol_route().
      
      I have adapted davem's udpflood and kbench_mod tests
      (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
      support IPv6, and here are the results:
      
      Before:
      [root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done
      
      real    0m34.190s
      user    0m3.047s
      sys     0m31.108s
      
      real    0m34.635s
      user    0m3.125s
      sys     0m31.475s
      
      real    0m34.517s
      user    0m3.034s
      sys     0m31.449s
      
      [root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
      [  660.160976] ip6_route_kbench: ip6_route_output tdiff: 933
      [  660.207261] ip6_route_kbench: ip6_route_output tdiff: 988
      [  660.253492] ip6_route_kbench: ip6_route_output tdiff: 896
      [  660.298862] ip6_route_kbench: ip6_route_output tdiff: 898
      
      After:
      [root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done
      
      real    0m32.695s
      user    0m2.925s
      sys     0m29.737s
      
      real    0m32.636s
      user    0m3.007s
      sys     0m29.596s
      
      real    0m32.797s
      user    0m2.866s
      sys     0m29.898s
      
      [root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
      [  881.220793] ip6_route_kbench: ip6_route_output tdiff: 684
      [  881.253477] ip6_route_kbench: ip6_route_output tdiff: 640
      [  881.286867] ip6_route_kbench: ip6_route_output tdiff: 630
      [  881.320749] ip6_route_kbench: ip6_route_output tdiff: 653
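The runs above can be summarized by averaging each set and computing the relative reduction. A short C helper that performs only that arithmetic on the printed values (nothing is re-measured; `mean()`, `pct_reduction()` and `print_summary()` are illustrative names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Values copied from the "Before" and "After" runs above. */
static const double real_before[]  = { 34.190, 34.635, 34.517 };
static const double real_after[]   = { 32.695, 32.636, 32.797 };
static const double sys_before[]   = { 31.108, 31.475, 31.449 };
static const double sys_after[]    = { 29.737, 29.596, 29.898 };
static const double tdiff_before[] = { 933, 988, 896, 898 };
static const double tdiff_after[]  = { 684, 640, 630, 653 };

static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative reduction from 'before' to 'after', in percent. */
static double pct_reduction(double before, double after)
{
	return 100.0 * (before - after) / before;
}

void print_summary(void)
{
	double rb = mean(real_before, ARRAY_LEN(real_before));
	double ra = mean(real_after, ARRAY_LEN(real_after));
	double sb = mean(sys_before, ARRAY_LEN(sys_before));
	double sa = mean(sys_after, ARRAY_LEN(sys_after));
	double tb = mean(tdiff_before, ARRAY_LEN(tdiff_before));
	double ta = mean(tdiff_after, ARRAY_LEN(tdiff_after));

	printf("udpflood real: %.3f s -> %.3f s (%.1f%% less)\n",
	       rb, ra, pct_reduction(rb, ra));
	printf("udpflood sys:  %.3f s -> %.3f s (%.1f%% less)\n",
	       sb, sa, pct_reduction(sb, sa));
	printf("ip6_route_output tdiff: %.2f -> %.2f (%.1f%% less)\n",
	       tb, ta, pct_reduction(tb, ta));
}
```

By this arithmetic, the mean udpflood wall-clock time drops by about 5% (34.447 s to 32.709 s) and the mean ip6_route_output() tick count by about 30% (928.75 to 651.75).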
      
      /****************************** udpflood.c ******************************/
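      /* It is an adaptation of Eric Dumazet's and David Miller's
       * udpflood tool, with IPv6 support added.
       */
      
      #include <arpa/inet.h>
      #include <errno.h>
      #include <netinet/in.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>
      
      typedef uint32_t u32;
      
      static int debug = 0;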
      
      /* Allow -fstrict-aliasing */
      typedef union sa_u {
      	struct sockaddr_storage a46;
      	struct sockaddr_in a4;
      	struct sockaddr_in6 a6;
      } sa_u;
      
      static int usage(void)
      {
      	printf("usage: udpflood [ -l count ] [ -s message_size ] [ -p port ] [ -c num_ip_addrs ] [ -d ] IP_ADDRESS\n");
      	return -1;
      }
      
      static u32 get_last32h(const sa_u *sa)
      {
      	if (sa->a46.ss_family == PF_INET)
      		return ntohl(sa->a4.sin_addr.s_addr);
      	else
      		return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
      }
      
      static void set_last32h(sa_u *sa, u32 last32h)
      {
      	if (sa->a46.ss_family == PF_INET)
      		sa->a4.sin_addr.s_addr = htonl(last32h);
      	else
      		sa->a6.sin6_addr.s6_addr32[3] = htonl(last32h);
      }
      
      static void print_saddr(const sa_u *sa, const char *msg)
      {
      	char buf[64];
      
      	if (!debug)
      		return;
      
      	switch (sa->a46.ss_family) {
      	case PF_INET:
      		inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf,
      			  sizeof(buf));
      		break;
      	case PF_INET6:
      		inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
      		break;
      	}
      
      	printf("%s: %s\n", msg, buf);
      }
      
      static int send_packets(const sa_u *sa, size_t num_addrs, int count, int msg_sz)
      {
      	char *msg = malloc(msg_sz);
      	sa_u saddr;
      	u32 start_addr32h, end_addr32h, cur_addr32h;
      	int fd, i, err;
      
      	if (!msg)
      		return -ENOMEM;
      
      	memset(msg, 0, msg_sz);
      
      	memcpy(&saddr, sa, sizeof(saddr));
      	cur_addr32h = start_addr32h = get_last32h(&saddr);
      	end_addr32h = start_addr32h + num_addrs;
      
      	fd = socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
      	if (fd < 0) {
      		perror("socket");
      		err = fd;
      		goto out_nofd;
      	}
      
      	/* connect to avoid the kernel spending time in figuring
      	 * out the source address (i.e pin the src address)
      	 */
      	err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
      	if (err < 0) {
      		perror("connect");
      		goto out;
      	}
      
      	print_saddr(&saddr, "start_addr");
      	for (i = 0; i < count; i++) {
      		print_saddr(&saddr, "sendto");
      		err = sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr,
      			     sizeof(saddr));
      		if (err < 0) {
      			perror("sendto");
      			goto out;
      		}
      
      		if (++cur_addr32h >= end_addr32h)
      			cur_addr32h = start_addr32h;
      		set_last32h(&saddr, cur_addr32h);
      	}
      
      	err = 0;
      out:
      	close(fd);
      out_nofd:
      	free(msg);
      	return err;
      }
      
      int main(int argc, char **argv, char **envp)
      {
      	int port, msg_sz, count, num_addrs, ret;
      
      	sa_u start_addr;
      
      	port = 6000;
      	msg_sz = 32;
      	count = 10000000;
      	num_addrs = 1;
      
      	while ((ret = getopt(argc, argv, "dl:s:p:c:")) >= 0) {
      		switch (ret) {
      		case 'l':
      			sscanf(optarg, "%d", &count);
      			break;
      		case 's':
      			sscanf(optarg, "%d", &msg_sz);
      			break;
      		case 'p':
      			sscanf(optarg, "%d", &port);
      			break;
      		case 'c':
      			sscanf(optarg, "%d", &num_addrs);
      			break;
      		case 'd':
      			debug = 1;
      			break;
      		case '?':
      			return usage();
      		}
      	}
      
      	if (num_addrs < 1)
      		return usage();
      
      	if (!argv[optind])
      		return usage();
      
      	memset(&start_addr, 0, sizeof(start_addr));
      	start_addr.a4.sin_port = htons(port);
      	if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr))
      		start_addr.a46.ss_family = PF_INET;
      	else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr.s6_addr))
      		start_addr.a46.ss_family = PF_INET6;
      	else
      		return usage();
      
      	return send_packets(&start_addr, num_addrs, count, msg_sz);
      }
      
      /****************** ip6_route_kbench_mod.c ******************/
      
      /* We can't just use "get_cycles()" as on some platforms, such
       * as sparc64, that gives system cycles rather than cpu clock
       * cycles.
       */
      
      #if defined(__sparc__)
      static inline unsigned long long get_tick(void)
      {
      	unsigned long long t;
      
      	__asm__ __volatile__("rd %%tick, %0" : "=r" (t));
      	return t;
      }
      #elif defined(__x86_64__) || defined(__i386__)
      static inline unsigned long long get_tick(void)
      {
      	unsigned long long t;
      
      	rdtscll(t);
      
      	return t;
      }
      #else
      static inline unsigned long long get_tick(void)
      {
      	return get_cycles();
      }
      #endif
      
      static int flow_oif = DEFAULT_OIF;
      static int flow_iif = DEFAULT_IIF;
      static u32 flow_mark = DEFAULT_MARK;
      static struct in6_addr flow_dst_ip_addr;
      static struct in6_addr flow_src_ip_addr;
      static int flow_tos = DEFAULT_TOS;
      
      static char dst_string[64];
      static char src_string[64];
      
      module_param_string(dst, dst_string, sizeof(dst_string), 0);
      module_param_string(src, src_string, sizeof(src_string), 0);
      
      static int __init flow_setup(void)
      {
      	if (dst_string[0] &&
      	    !in6_pton(dst_string, -1, &flow_dst_ip_addr.s6_addr[0], -1, NULL)) {
      		pr_info("cannot parse \"%s\"\n", dst_string);
      		return -1;
      	}
      
      	if (src_string[0] &&
      	    !in6_pton(src_string, -1, &flow_src_ip_addr.s6_addr[0], -1, NULL)) {
      		pr_info("cannot parse \"%s\"\n", src_string);
      		return -1;
      	}
      
      	return 0;
      }
      
      module_param_named(oif, flow_oif, int, 0);
      module_param_named(iif, flow_iif, int, 0);
      module_param_named(mark, flow_mark, uint, 0);
      module_param_named(tos, flow_tos, int, 0);
      
      static int warmup_count = DEFAULT_WARMUP_COUNT;
      module_param_named(count, warmup_count, int, 0);
      
      static void flow_init(struct flowi6 *fl6)
      {
      	memset(fl6, 0, sizeof(*fl6));
      	fl6->flowi6_proto = IPPROTO_ICMPV6;
      	fl6->flowi6_oif = flow_oif;
      	fl6->flowi6_iif = flow_iif;
      	fl6->flowi6_mark = flow_mark;
      	fl6->flowi6_tos = flow_tos;
      	fl6->daddr = flow_dst_ip_addr;
      	fl6->saddr = flow_src_ip_addr;
      }
      
      static struct sk_buff * fake_skb_get(void)
      {
      	struct ipv6hdr *hdr;
      	struct sk_buff *skb;
      
      	skb = alloc_skb(4096, GFP_KERNEL);
      	if (!skb) {
      		pr_info("Cannot alloc SKB for test\n");
      		return NULL;
      	}
      	skb->dev = __dev_get_by_index(&init_net, flow_iif);
      	if (skb->dev == NULL) {
      		pr_info("Input device (%d) does not exist\n", flow_iif);
      		goto err;
      	}
      
      	skb_reset_mac_header(skb);
      	skb_reset_network_header(skb);
      	skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr));
      	hdr = ipv6_hdr(skb);
      
      	hdr->priority = 0;
      	hdr->version = 6;
      	memset(hdr->flow_lbl, 0, sizeof(hdr->flow_lbl));
      	hdr->payload_len = htons(sizeof(struct icmp6hdr));
      	hdr->nexthdr = IPPROTO_ICMPV6;
      	hdr->saddr = flow_src_ip_addr;
      	hdr->daddr = flow_dst_ip_addr;
      	skb->protocol = htons(ETH_P_IPV6);
      	skb->mark = flow_mark;
      
      	return skb;
      err:
      	kfree_skb(skb);
      	return NULL;
      }
      
      static void do_full_output_lookup_bench(void)
      {
      	unsigned long long t1, t2, tdiff;
      	struct rt6_info *rt;
      	struct flowi6 fl6;
      	int i;
      
      	rt = NULL;
      
      	for (i = 0; i < warmup_count; i++) {
      		flow_init(&fl6);
      
      		rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
      		if (IS_ERR(rt))
      			break;
      		ip6_rt_put(rt);
      	}
      	if (IS_ERR(rt)) {
      		pr_info("ip_route_output_key: err=%ld\n", PTR_ERR(rt));
      		return;
      	}
      
      	flow_init(&fl6);
      
      	t1 = get_tick();
      	rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
      	t2 = get_tick();
      	if (!IS_ERR(rt))
      		ip6_rt_put(rt);
      
      	tdiff = t2 - t1;
      	pr_info("ip6_route_output tdiff: %llu\n", tdiff);
      }
      
      static void do_full_input_lookup_bench(void)
      {
      	unsigned long long t1, t2, tdiff;
      	struct sk_buff *skb;
      	struct rt6_info *rt;
      	int err, i;
      
      	skb = fake_skb_get();
      	if (skb == NULL)
      		goto out_free;
      
      	err = 0;
      	local_bh_disable();
      	for (i = 0; i < warmup_count; i++) {
      		ip6_route_input(skb);
      		rt = (struct rt6_info *)skb_dst(skb);
      		err = (!rt || rt == init_net.ipv6.ip6_null_entry);
      		skb_dst_drop(skb);
      		if (err)
      			break;
      	}
      	local_bh_enable();
      
      	if (err) {
      		pr_info("Input route lookup fails\n");
      		goto out_free;
      	}
      
      	local_bh_disable();
      	t1 = get_tick();
      	ip6_route_input(skb);
      	t2 = get_tick();
      	local_bh_enable();
      
      	rt = (struct rt6_info *)skb_dst(skb);
      	err = (!rt || rt == init_net.ipv6.ip6_null_entry);
      	skb_dst_drop(skb);
      	if (err) {
      		pr_info("Input route lookup fails\n");
      		goto out_free;
      	}
      
      	tdiff = t2 - t1;
      	pr_info("ip6_route_input tdiff: %llu\n", tdiff);
      
      out_free:
      	kfree_skb(skb);
      }
      
      static void do_full_lookup_bench(void)
      {
      	if (!flow_iif)
      		do_full_output_lookup_bench();
      	else
      		do_full_input_lookup_bench();
      }
      
      static void do_bench(void)
      {
      	do_full_lookup_bench();
      	do_full_lookup_bench();
      	do_full_lookup_bench();
      	do_full_lookup_bench();
      }
      
      static int __init kbench_init(void)
      {
      	if (flow_setup())
      		return -EINVAL;
      
      	pr_info("flow [IIF(%d),OIF(%d),MARK(0x%08x),D("IP6_FMT"),"
      		"S("IP6_FMT"),TOS(0x%02x)]\n",
      		flow_iif, flow_oif, flow_mark,
      		IP6_PRT(flow_dst_ip_addr),
      		IP6_PRT(flow_src_ip_addr),
      		flow_tos);
      
      	if (!cpu_has_tsc) {
      		pr_err("X86 TSC is required, but is unavailable.\n");
      		return -EINVAL;
      	}
      
      	pr_info("sizeof(struct rt6_info)==%zu\n", sizeof(struct rt6_info));
      
      	do_bench();
      
      	return -ENODEV;
      }
      
      static void __exit kbench_exit(void)
      {
      }
      
      module_init(kbench_init);
      module_exit(kbench_exit);
      MODULE_LICENSE("GPL");
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Martin KaFai Lau's avatar
      ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn · 367efcb9
      Martin KaFai Lau authored
      This patch saves the fn before doing rt6_backtrack.
      Hence, without redoing fib6_lookup(), saved_fn can be used
      to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.
      
      Some minor changes I think make sense to review as a single patch:
      * Remove the 'out:' goto label.
      * Remove the 'reachable' variable. Only use the 'strict' variable instead.
      
      After this patch, "failing ip6_ins_rt()" should be the only case that
      requires a redo of fib6_lookup().
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
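The control flow described in this patch (one lookup, then select and backtrack, then a retry from the saved node without the reachability requirement) can be modelled outside the kernel. A toy sketch on a drastically simplified tree: `struct toy_fn`, `toy_select()` and `toy_backtrack()` are hypothetical stand-ins for fib6_node, rt6_select() and fib6_backtrack(), NULL plays the role of ip6_null_entry, and the flow is written as a loop rather than with goto labels:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy route: only tracks whether its next hop is considered reachable. */
struct toy_rt {
	int id;
	bool reachable;
};

/* Toy tree node: parent pointer plus the routes stored at this node. */
struct toy_fn {
	const struct toy_fn *parent;
	const struct toy_rt *routes;
	size_t nroutes;
};

/* Stand-in for rt6_select(): first route at fn, restricted to reachable
 * ones when reachable_only is set; NULL means "null entry". */
static const struct toy_rt *toy_select(const struct toy_fn *fn, bool reachable_only)
{
	for (size_t i = 0; i < fn->nroutes; i++)
		if (!reachable_only || fn->routes[i].reachable)
			return &fn->routes[i];
	return NULL;
}

/* Stand-in for fib6_backtrack(): next ancestor holding routes, or NULL
 * once the root has been passed. */
static const struct toy_fn *toy_backtrack(const struct toy_fn *fn)
{
	while ((fn = fn->parent) != NULL)
		if (fn->nroutes > 0)
			return fn;
	return NULL;
}

/* One lookup only: on failure, retry from saved_fn with the
 * reachability requirement dropped instead of looking up again. */
static const struct toy_rt *toy_pol_route(const struct toy_fn *leaf, int *lookups)
{
	bool reachable_only = true;	/* RT6_LOOKUP_F_REACHABLE on */
	const struct toy_fn *fn, *saved_fn;
	const struct toy_rt *rt;

	(*lookups)++;			/* the single fib6_lookup() */
	fn = leaf;
	saved_fn = fn;

	for (;;) {
		rt = toy_select(fn, reachable_only);
		if (rt)
			return rt;
		fn = toy_backtrack(fn);
		if (fn)
			continue;
		if (!reachable_only)
			return NULL;		/* nothing usable at all */
		reachable_only = false;		/* RT6_LOOKUP_F_REACHABLE off */
		fn = saved_fn;			/* restart from the saved node */
	}
}
```

When no reachable route exists anywhere on the path, the walk falls back to the saved leaf with the requirement dropped, and the lookup counter still reads 1.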
    • Martin KaFai Lau's avatar
      ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case · 94c77bb4
      Martin KaFai Lau authored
      When there is an RTF_CACHE hit, there is no need to redo fib6_lookup()
      with reachable=0.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Martin KaFai Lau's avatar
      ipv6: Remove BACKTRACK macro · a3c00e46
      Martin KaFai Lau authored
      This is preparatory work to reduce the number of calls to fib6_lookup().
      
      The BACKTRACK macro is hard to read and error-prone due to
      its side effects (mainly goto).
      
      This patch:
      1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
         return values:
         * If it is backtrack-able, returns next fn for retry.
         * If it reaches the root, returns NULL.
      2. The caller needs to decide if a backtrack is needed (by testing
         rt == net->ipv6.ip6_null_entry).
      3. Rename the goto labels in ip6_pol_route() to make the next few
         patches easier to read.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Kenjiro Nakayama's avatar
      net: Remove trailing whitespace in tcp.h icmp.c syncookies.c · 105970f6
      Kenjiro Nakayama authored
      Remove trailing whitespace in tcp.h icmp.c syncookies.c
      Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Oct, 2014 13 commits
  3. 21 Oct, 2014 5 commits
    • Ying Xue's avatar
      tipc: fix lockdep warning when intra-node messages are delivered · 1a194c2d
      Ying Xue authored
      When running the tipcTC & tipcTS test suites, the lockdep unsafe
      locking scenario below is reported:
      
      [ 1109.997854]
      [ 1109.997988] =================================
      [ 1109.998290] [ INFO: inconsistent lock state ]
      [ 1109.998575] 3.17.0-rc1+ #113 Not tainted
      [ 1109.998762] ---------------------------------
      [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
      [ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
      [ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
      [ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
      [ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
      [ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
      [ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
      [ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
      [ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
      [ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
      [ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
      [ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
      [ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
      [ 1109.998762] irq event stamp: 241060
      [ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
      [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
      [ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
      [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
      [ 1109.998762]
      [ 1109.998762] other info that might help us debug this:
      [ 1109.998762]  Possible unsafe locking scenario:
      [ 1109.998762]
      [ 1109.998762]        CPU0
      [ 1109.998762]        ----
      [ 1109.998762]   lock(slock-AF_TIPC);
      [ 1109.998762]   <Interrupt>
      [ 1109.998762]     lock(slock-AF_TIPC);
      [ 1109.998762]
      [ 1109.998762]  *** DEADLOCK ***
      [ 1109.998762]
      [ 1109.998762] 2 locks held by swapper/7/0:
      [ 1109.998762]  #0:  (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
      [ 1109.998762]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
      [ 1109.998762]
      [ 1109.998762] stack backtrace:
      [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
      [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [ 1109.998762]  ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
      [ 1109.998762]  ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
      [ 1109.998762]  ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
      [ 1109.998762] Call Trace:
      [ 1109.998762]  <IRQ>  [<ffffffff81a209eb>] dump_stack+0x4e/0x68
      [ 1109.998762]  [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
      [ 1109.998762]  [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
      [ 1109.998762]  [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
      [ 1109.998762]  [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
      [ 1109.998762]  [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
      [ 1109.998762]  [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
      [ 1109.998762]  [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
      [ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
      [ 1109.998762]  [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
      [ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
      [ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
      [ 1109.998762]  [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
      [ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
      [ 1109.998762]  [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
      [ 1109.998762]  [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
      [ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
      [ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
      [ 1109.998762]  [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
      [ 1109.998762]  [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
      [ 1109.998762]  [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
      [ 1109.998762]  [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
      [ 1109.998762]  [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
      [ 1109.998762]  [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
      [ 1109.998762]  [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
      [ 1109.998762]  [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
      [ 1109.998762]  [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
      [ 1109.998762]  [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
      [ 1109.998762]  [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
      [ 1109.998762]  [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
      [ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
      [ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
      [ 1109.998762]  [<ffffffff817842b1>] net_rx_action+0x141/0x310
      [ 1109.998762]  [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
      [ 1109.998762]  [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
      [ 1109.998762]  [<ffffffff8105a626>] irq_exit+0x96/0xc0
      [ 1109.998762]  [<ffffffff81a30d07>] do_IRQ+0x67/0x110
      [ 1109.998762]  [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
      [ 1109.998762]  <EOI>  [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
      [ 1109.998762]  [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
      [ 1109.998762]  [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
      [ 1109.998762]  [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
      [ 1109.998762]  [<ffffffff81034c78>] start_secondary+0x188/0x1f0
      
      When intra-node messages are delivered from one process to another,
      tipc_link_xmit() does not disable BH before it directly calls
      tipc_sk_rcv() in process context to forward the messages to the
      destination socket. Meanwhile, if messages sent by a remote node arrive
      at this node with the same socket as destination, tipc_sk_rcv() running
      in process context may be interrupted by tipc_sk_rcv() running in BH
      context. The latter cannot obtain the socket lock because the former
      holds it, but the former has no chance to run while the latter owns
      the CPU, so a deadlock occurs. To avoid it, BH should always be
      disabled in tipc_sk_rcv().
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
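The deadlock described above is the classic case of a lock shared between process context and softirq context without disabling BH. A userspace analogy, assuming a single-threaded program in which a SIGUSR1 handler stands in for the BH-context tipc_sk_rcv() and a flag stands in for the socket spinlock: blocking the signal around the critical section plays the role that disabling BH plays in the fix. All function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the socket spinlock (single-threaded, so a flag suffices). */
static volatile sig_atomic_t lock_held;
static volatile sig_atomic_t handler_got_lock;
static volatile sig_atomic_t handler_would_deadlock;

/* Plays the role of tipc_sk_rcv() running in BH (softirq) context. */
static void bh_like_handler(int sig)
{
	(void)sig;
	if (lock_held) {
		/* A real spinlock would spin here forever: the holder it
		 * interrupted cannot run until this handler returns. */
		handler_would_deadlock = 1;
		return;
	}
	lock_held = 1;
	handler_got_lock = 1;
	lock_held = 0;
}

static void install_bh_like_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = bh_like_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);
}

/* Plays the role of tipc_sk_rcv() called from process context;
 * block_bh = true mimics taking the lock with BH disabled. */
static void process_context_rcv(bool block_bh)
{
	sigset_t set, old;

	assert(!lock_held);
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	if (block_bh)
		sigprocmask(SIG_BLOCK, &set, &old);

	lock_held = 1;
	raise(SIGUSR1);		/* a remote message "arrives" mid-critical-section */
	lock_held = 0;

	if (block_bh)
		sigprocmask(SIG_SETMASK, &old, NULL);	/* pending signal runs now */
}
```

Without blocking, the handler runs while the "lock" is held and would spin forever; with the signal blocked, it is delivered only after the lock has been released, when sigprocmask() restores the old mask.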
    • Ying Xue's avatar
      tipc: fix a potential deadlock · 7b8613e0
      Ying Xue authored
      The following possible unsafe locking scenario was detected:
      
                 CPU0                          CPU1
      T0:  tipc_named_rcv()                tipc_rcv()
      T1:  [grab nametbl write lock]*      [grab node lock]*
      T2:  tipc_update_nametbl()           tipc_node_link_up()
      T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
      T4:  [grab node lock]*               [grab nametbl write lock]*
      
      The opposite order in which the nametbl write lock and the node lock
      are taken on the two paths above may result in a deadlock. If the name
      table update that follows a link state change is moved out of the node
      lock, the reverse lock order is eliminated, and with it the deadlock
      risk.
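      The reordering can be pictured with a small userspace model: pthread
      mutexes stand in for the node lock and the name-table write lock, and
      every struct and function name below (model_node, model_node_link_up(),
      and so on) is hypothetical, not TIPC's actual code. The sketch assumes
      the fix amounts to releasing the node lock before touching the name
      table, so that only one lock-order edge (nametbl -> node) remains.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TIPC name table and node. */
struct model_nametbl {
	pthread_mutex_t lock;
	int publications;
};

struct model_node {
	pthread_mutex_t lock;
	bool link_up;
	bool subscribed;
};

static struct model_nametbl nametbl = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* CPU0's path (unchanged): name-table lock first, then node lock. */
static void model_named_rcv(struct model_node *n)
{
	pthread_mutex_lock(&nametbl.lock);
	pthread_mutex_lock(&n->lock);
	n->subscribed = true;          /* stands in for tipc_nodesub_subscribe() */
	pthread_mutex_unlock(&n->lock);
	pthread_mutex_unlock(&nametbl.lock);
}

/* CPU1's path after the reordering: the node lock and the name-table
 * lock are never held together, so no node -> nametbl edge exists and
 * the ABBA cycle with model_named_rcv() cannot form. */
static void model_node_link_up(struct model_node *n)
{
	pthread_mutex_lock(&n->lock);
	n->link_up = true;             /* node state changes under the node lock */
	pthread_mutex_unlock(&n->lock);

	pthread_mutex_lock(&nametbl.lock);
	nametbl.publications++;        /* stands in for tipc_nametbl_publish() */
	pthread_mutex_unlock(&nametbl.lock);
}
```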
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b8613e0
    • David S. Miller's avatar
      Merge branch 'enic' · 73829bf6
      David S. Miller authored
      Govindarajulu Varadarajan says:
      
      ====================
      enic: Bug fixes
      
      This series fixes the following problem.
      
      Please apply this to net.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73829bf6
    • Govindarajulu Varadarajan's avatar
      enic: Do not call napi_disable when preemption is disabled. · 39dc90c1
      Govindarajulu Varadarajan authored
      In enic_stop, we disable preemption using local_bh_disable() in order to
      wait for busy_poll to finish.
      
      napi_disable() must not be called here, as it might sleep.
      
      Move the napi_disable() call outside of the local_bh_disable() section.
      
      BUG: sleeping function called from invalid context at include/linux/netdevice.h:477
      in_atomic(): 1, irqs_disabled(): 0, pid: 443, name: ifconfig
      INFO: lockdep is turned off.
      Preemption disabled at:[<ffffffffa029c5c4>] enic_rfs_flw_tbl_free+0x34/0xd0 [enic]
      
      CPU: 31 PID: 443 Comm: ifconfig Not tainted 3.17.0-netnext-05504-g59f35b81 #268
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       ffff8800dac10000 ffff88020b8dfcb8 ffffffff8148a57c 0000000000000000
       ffff88020b8dfcd0 ffffffff8107e253 ffff8800dac12a40 ffff88020b8dfd10
       ffffffffa029305b ffff88020b8dfd48 ffff8800dac10000 ffff88020b8dfd48
      Call Trace:
       [<ffffffff8148a57c>] dump_stack+0x4e/0x7a
       [<ffffffff8107e253>] __might_sleep+0x123/0x1a0
       [<ffffffffa029305b>] enic_stop+0xdb/0x4d0 [enic]
       [<ffffffff8138ed7d>] __dev_close_many+0x9d/0xf0
       [<ffffffff8138ef81>] __dev_close+0x31/0x50
       [<ffffffff813974a8>] __dev_change_flags+0x98/0x160
       [<ffffffff81397594>] dev_change_flags+0x24/0x60
       [<ffffffff814085fd>] devinet_ioctl+0x63d/0x710
       [<ffffffff81139c16>] ? might_fault+0x56/0xc0
       [<ffffffff81409ef5>] inet_ioctl+0x65/0x90
       [<ffffffff813768e0>] sock_do_ioctl+0x20/0x50
       [<ffffffff81376ebb>] sock_ioctl+0x20b/0x2e0
       [<ffffffff81197250>] do_vfs_ioctl+0x2e0/0x500
       [<ffffffff81492619>] ? sysret_check+0x22/0x5d
       [<ffffffff81285f23>] ? __this_cpu_preempt_check+0x13/0x20
       [<ffffffff8109fe19>] ? trace_hardirqs_on_caller+0x119/0x270
       [<ffffffff811974ac>] SyS_ioctl+0x3c/0x80
       [<ffffffff814925ed>] system_call_fastpath+0x1a/0x1f
      Signed-off-by: default avatarGovindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      39dc90c1
    • Govindarajulu Varadarajan's avatar
      enic: fix possible deadlock in enic_stop/ enic_rfs_flw_tbl_free · b6931c9b
      Govindarajulu Varadarajan authored
      The following warning is shown when spinlock debugging is enabled.
      
      This occurs when the enic_flow_may_expire timer function is running and
      enic_stop is called on the same CPU.
      
      Fix this by using spin_lock_bh().
      
      =================================
      [ INFO: inconsistent lock state ]
      3.17.0-netnext-05504-g59f35b81 #268 Not tainted
      ---------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      ifconfig/443 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (&(&enic->rfs_h.lock)->rlock){+.?...}, at:
      enic_rfs_flw_tbl_free+0x34/0xd0 [enic]
      {IN-SOFTIRQ-W} state was registered at:
        [<ffffffff810a25af>] __lock_acquire+0x83f/0x21c0
        [<ffffffff810a45f2>] lock_acquire+0xa2/0xd0
        [<ffffffff814913fc>] _raw_spin_lock+0x3c/0x80
        [<ffffffffa029c3d5>] enic_flow_may_expire+0x25/0x130[enic]
        [<ffffffff810bcd07>] call_timer_fn+0x77/0x100
        [<ffffffff810bd8e3>] run_timer_softirq+0x1e3/0x270
        [<ffffffff8105f9ae>] __do_softirq+0x14e/0x280
        [<ffffffff8105fdae>] irq_exit+0x8e/0xb0
        [<ffffffff8103da0f>] smp_apic_timer_interrupt+0x3f/0x50
        [<ffffffff81493742>] apic_timer_interrupt+0x72/0x80
        [<ffffffff81018143>] default_idle+0x13/0x20
        [<ffffffff81018a6a>] arch_cpu_idle+0xa/0x10
        [<ffffffff81097676>] cpu_startup_entry+0x2c6/0x330
        [<ffffffff8103b7ad>] start_secondary+0x21d/0x290
      irq event stamp: 2997
      hardirqs last  enabled at (2997): [<ffffffff81491865>] _raw_spin_unlock_irqrestore+0x65/0x90
      hardirqs last disabled at (2996): [<ffffffff814915e6>] _raw_spin_lock_irqsave+0x26/0x90
      softirqs last  enabled at (2968): [<ffffffff813b57a3>] dev_deactivate_many+0x213/0x260
      softirqs last disabled at (2966): [<ffffffff813b5783>] dev_deactivate_many+0x1f3/0x260
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&enic->rfs_h.lock)->rlock);
        <Interrupt>
          lock(&(&enic->rfs_h.lock)->rlock);
      
       *** DEADLOCK ***
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarGovindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6931c9b
  4. 20 Oct, 2014 5 commits
    • David S. Miller's avatar
      Merge branch 'gso_encap_fixes' · d10845fc
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      net: minor gso encapsulation fixes
      
      The following series fixes a minor bug in the gso segmentation handlers
      when encapsulation offload is used.
      
      Theoretically this could cause a kernel panic when the stack tries
      to software-segment such a GRE offload packet, but there appears to be
      only one affected call site (the tbf scheduler), and it handles a NULL
      return value.
      
      I've included a followup patch to add IS_ERR_OR_NULL checks where needed.
      
      While looking into this, I also found that size computation of the individual
      segments is incorrect if skb->encapsulation is set.
      
      Please see individual patches for delta vs. v1.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d10845fc
    • Florian Westphal's avatar
      net: core: handle encapsulation offloads when computing segment lengths · f993bc25
      Florian Westphal authored
      If ->encapsulation is set, we have to use inner_tcp_hdrlen and also add
      the size of the inner network headers.
      
      This is 'mostly harmless': tbf might send an skb that is slightly over
      quota, or drop an skb even though it would have fit.
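      The per-segment length can be modeled with plain integers. struct
      seg_info and transport_seglen() are hypothetical simplifications (byte
      offsets instead of sk_buff header pointers, TCP-only GSO assumed); the
      point is that for an encapsulated skb everything between the outer and
      inner transport headers (e.g. GRE plus the inner IP header) is repeated
      in every segment, followed by the inner TCP header.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the fields the computation needs.
 * Offsets are in bytes from the start of the packet; TCP GSO assumed. */
struct seg_info {
	bool encapsulation;
	unsigned int transport_off;       /* outer transport header */
	unsigned int tcp_hdrlen;          /* TCP header length (non-encap case) */
	unsigned int inner_transport_off; /* inner transport header */
	unsigned int inner_tcp_hdrlen;    /* inner TCP header length */
	unsigned int gso_size;            /* payload bytes per segment */
};

/* Bytes of each segment from the outer transport header onward. */
static unsigned int transport_seglen(const struct seg_info *s)
{
	unsigned int thlen;

	if (s->encapsulation)
		/* Tunnel header + inner network header, then the inner TCP header. */
		thlen = (s->inner_transport_off - s->transport_off) +
			s->inner_tcp_hdrlen;
	else
		thlen = s->tcp_hdrlen;

	return thlen + s->gso_size;
}
```

      For example, with a 4-byte GRE header and a 20-byte inner IPv4 header
      starting at offset 34, an encapsulated segment carrying 1400 bytes of
      payload comes to 24 + 20 + 1400 = 1444 bytes past the outer transport
      header.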
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f993bc25
    • Florian Westphal's avatar
      net: make skb_gso_segment error handling more robust · 330966e5
      Florian Westphal authored
      skb_gso_segment has three possible return values:
      1. a pointer to the first segmented skb
      2. an errno value (IS_ERR())
      3. NULL.  This can happen when GSO is used for header verification.
      
      However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
      and would oops when NULL is returned.
      
      Note that these call sites should never actually see such a NULL return
      value; all callers mask out the GSO bits in the feature argument.
      
      However, there have been issues with some protocol handlers erroneously
      not respecting the specified feature mask.
      
      It is preferable to get 'have to turn off hw offloading, else slow' reports
      rather than 'kernel crashes'.
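      The three outcomes and the stricter check can be reproduced in
      userspace. ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL() below are
      re-creations modeled on the kernel's include/linux/err.h convention
      (the top MAX_ERRNO = 4095 addresses encode negative errno values);
      struct pkt, model_gso_segment() and model_xmit() are hypothetical, and
      mapping a NULL return to -EINVAL is an assumed choice for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

struct pkt { int len; };                 /* hypothetical stand-in for sk_buff */
static struct pkt first_segment = { 1500 };

/* Hypothetical segmenter returning one of the three possible outcomes. */
static struct pkt *model_gso_segment(int outcome)
{
	if (outcome < 0)
		return ERR_PTR(outcome);     /* 2. errno value */
	if (outcome == 0)
		return NULL;                 /* 3. NULL: nothing was segmented */
	return &first_segment;               /* 1. first segmented packet */
}

/* Caller: checking only IS_ERR() would dereference NULL below. */
static int model_xmit(int outcome)
{
	struct pkt *segs = model_gso_segment(outcome);

	if (IS_ERR_OR_NULL(segs))
		return segs ? (int)PTR_ERR(segs) : -EINVAL;
	return segs->len;
}
```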
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      330966e5
    • Florian Westphal's avatar
      net: gso: use feature flag argument in all protocol gso handlers · 1e16aa3d
      Florian Westphal authored
      skb_gso_segment() has a 'features' argument representing offload features
      available to the output path.
      
      A few handlers, e.g. GRE, re-fetch the features of skb->dev and use
      those rather than the provided ones when handling encapsulation/tunnels.
      
      Depending on dev->hw_enc_features of the output device, skb_gso_segment()
      can then return NULL even when the caller has disabled all GSO feature
      bits, because segmentation of the inner header assumes the device will
      take care of it.
      
      This affects, e.g., the tbf scheduler, which silently drops GRE-encapsulated
      GSO skbs that do not fit the remaining token quota, since software
      segmentation does not happen when the device supports the corresponding
      hw offload capabilities.
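      The difference can be reduced to which feature mask decides whether
      inner segmentation is left to the hardware. This is a toy bitmask
      model: model_features_t, MODEL_F_TSO and both functions are
      hypothetical, and the "new" variant assumes the handler intersects
      the caller's features with the device's hw_enc_features instead of
      ignoring the caller's argument.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_features_t;                 /* stand-in for netdev_features_t */
#define MODEL_F_TSO ((model_features_t)1 << 0)     /* hypothetical "hw segments TCP" bit */

/* Old behaviour: only the device's features count, so a caller that masked
 * out all GSO bits (like tbf) can still get "left to hardware" (NULL). */
static bool defers_to_hw_old(model_features_t caller_features,
			     model_features_t dev_hw_enc_features)
{
	(void)caller_features;                     /* argument ignored: the bug */
	return (dev_hw_enc_features & MODEL_F_TSO) != 0;
}

/* New behaviour: the caller's mask limits what the device may be asked to do. */
static bool defers_to_hw_new(model_features_t caller_features,
			     model_features_t dev_hw_enc_features)
{
	return (caller_features & dev_hw_enc_features & MODEL_F_TSO) != 0;
}
```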
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e16aa3d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ce8ec489
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      netfilter fixes for net
      
      The following patchset contains netfilter fixes for your net tree,
      they are:
      
      1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.
      
      2) Restrict nat and masq expressions to the nat chain type. Otherwise,
         users may crash their kernel if they attach a nat/masq rule to a
         non-nat chain.
      
      3) Fix hook validation in nft_compat when non-base chains are used.
         Basically, initialize hook_mask to zero.
      
      4) Make sure you use match/targets in nft_compat from the right chain
         type. The existing validation relies on the table name which can be
         avoided by
      
      5) Better netlink attribute validation in nft_nat. This expression has
         to reject the configuration when no address and proto configurations
         are specified.
      
      6) Interpret NFTA_NAT_REG_*_MAX only if NFTA_NAT_REG_*_MIN is set.
         Yet another sanity check to reject incorrect configurations from
         userspace.
      
      7) Conditional NAT attribute dumping depending on the existing
         configuration.
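      Items 5 and 6 can be read as the following validation, sketched here
      with hypothetical attribute-presence flags rather than real netlink
      parsing; struct model_nat_attrs, struct model_nat_cfg and
      model_nat_init() are illustrative names, and ignoring a MAX attribute
      whose MIN is absent is one assumed reading of item 6.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical presence flags for the NAT register attributes. */
struct model_nat_attrs {
	bool addr_min, addr_max;
	bool proto_min, proto_max;
};

struct model_nat_cfg {
	bool has_addr, addr_range;   /* addr_range: an explicit MAX was honoured */
	bool has_proto, proto_range;
};

static int model_nat_init(const struct model_nat_attrs *a,
			  struct model_nat_cfg *cfg)
{
	/* Item 5: no address and no proto configuration -> reject. */
	if (!a->addr_min && !a->proto_min)
		return -EINVAL;

	/* Item 6: a MAX attribute is only interpreted when its MIN is set. */
	cfg->has_addr = a->addr_min;
	cfg->addr_range = a->addr_min && a->addr_max;
	cfg->has_proto = a->proto_min;
	cfg->proto_range = a->proto_min && a->proto_max;
	return 0;
}
```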
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce8ec489