Commit f6775a28 authored by David S. Miller

Merge branch 'netvsc-transparent-VF-support'

Stephen Hemminger says:

====================
netvsc: transparent VF support

This patch set changes how SR-IOV Virtual Function devices are managed
in the Hyper-V network driver. This version is rebased onto current net-next.

Background

In Hyper-V, SR-IOV can be enabled (and disabled) by changing guest settings
on the host. When SR-IOV is enabled, a matching PCI device is hot plugged and
becomes visible in the guest. The VF device is an add-on to an existing netvsc
device, and has the same MAC address.

How is this different?

The original VF support relied on using the bonding driver in
active-backup mode to handle the VF device.

With the new netvsc VF logic, the Linux Hyper-V network
virtual driver directly manages the link to the SR-IOV VF device.
When a VF device is detected (hot plug), it is automatically made a
slave device of the netvsc device. The VF device state reflects
the state of the netvsc device; i.e. if netvsc is set down, then the
VF is set down. If netvsc is set up, then the VF is brought up.

Packet flow is independent of VF status; all packets are sent and
received as if they were associated with the netvsc device. If the VF is
removed or its link is down, then the synthetic VMBus path is used.

What was wrong with using the bonding script?

A lot of work went into getting the bonding script to work on all
distributions, but it was a major struggle. Linux network devices
can be configured in many, many ways and there is no one solution from
userspace to make it all work. The really hard case is when
configuration is attached to the synthetic device during boot (eth0) and
the same addresses and firewall rules then need to keep working later
when bonding is set up. The new code gets around all of this.

How does VF work during initialization?

Since all packets are sent and received through the logical netvsc
device, initialization is much easier. Just configure the regular
netvsc Ethernet device; when/if SR-IOV is enabled, it just
works. Provisioning and cloud-init only need to worry about setting up
the netvsc device (eth0). If SR-IOV is enabled (even as a later step), the
addresses and rules stay the same.

What devices show up?

Both netvsc and PCI devices are visible in the system. The netvsc
device is active and named in the usual manner (eth0). The PCI device is
visible to Linux and gets renamed by udev to a persistent name
(enP2p3s0). The PCI device name is now irrelevant.

The logic also sets the SLAVE flag on the PCI VF network device
so that network tools can see the relationship if they are smart
enough to understand how layered devices work.
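
As a rough illustration of how a tool could detect that relationship, the
sketch below tests the IFF_SLAVE bit (0x800 in the kernel's linux/if.h) in
the hexadecimal value that sysfs exposes in /sys/class/net/<dev>/flags.
The function names and the optional sysfs-root argument are hypothetical
helpers made up for this example; they are not part of the patch set.

```shell
# IFF_SLAVE bit as defined in the kernel's uapi header linux/if.h.
IFF_SLAVE=0x800

# Succeed if the given hexadecimal flags value has IFF_SLAVE set.
flags_have_slave() {
	[ $(( $1 & IFF_SLAVE )) -ne 0 ]
}

# Succeed if the named network device is marked as a slave.
# $1: device name; $2: sysfs root (hypothetical override, handy for testing).
dev_is_slave() {
	local flags
	flags=$(cat "${2:-/sys/class/net}/$1/flags" 2>/dev/null) || return 1
	flags_have_slave "$flags"
}

# Possible usage on a guest (device name taken from the text above):
#   dev_is_slave enP2p3s0 && echo "enP2p3s0 is a slave device"
```

On a guest using the new logic, one would expect the check to succeed for
the VF device and fail for eth0, but that depends on the running kernel.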

This is a lot like how I see Windows working.
The VF device is visible in Device Manager, but is not configured.

Is there any performance impact?

There is no visible change in performance. The bonding
and netvsc drivers both perform equivalent steps.

Is it compatible with old bonding script?

It turns out that if you use the old bonding script, then everything
still works but in a suboptimal manner. What happens is that bonding
is unable to steal the VF from the netvsc device, so it creates a
one-legged bond. Packet flow then is:
	bond0 <--> eth0 <--> VF (enP2p3s0).
In other words, if you get it wrong it still works, just
awkwardly and more slowly.

What if I add an address or firewall rule onto the VF?

The same problems occur now as already occur with bonding, bridging,
and teaming on Linux if the user incorrectly applies configuration to
an underlying slave device. It will sort of work: packets will come in
and out, but the Linux kernel gets confused and things like ARP don't
work right. There is no way to block manipulation of the slave
device, and I am sure someone will find some special use case where
they want it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 638ce0fc 12aa7469

Hyper-V network driver
======================

Compatibility
=============

This driver is compatible with Windows Server 2012 R2, 2016 and
Windows 10.

Features
========

Checksum offload
----------------
The netvsc driver supports checksum offload as long as the
Hyper-V host version does. Windows Server 2016 and Azure
support checksum offload for TCP and UDP for both IPv4 and
IPv6. Windows Server 2012 only supports checksum offload for TCP.

Receive Side Scaling
--------------------
Hyper-V supports receive side scaling. For TCP, packets are
distributed among available queues based on IP address and port
number. Current versions of the Hyper-V host distribute UDP
packets based only on the IP source and destination addresses.
The port number is not used as part of the hash value for UDP.
Fragmented IP packets are not distributed between queues;
all fragmented packets arrive on the first channel.

Generic Receive Offload, aka GRO
--------------------------------
The driver supports GRO and it is enabled by default. GRO coalesces
like packets and significantly reduces CPU usage under heavy Rx
load.

SR-IOV support
--------------
Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
is enabled in both the vSwitch and the guest configuration, then the
Virtual Function (VF) device is passed to the guest as a PCI
device. In this case, both a synthetic (netvsc) and a VF device are
visible in the guest OS and both NICs have the same MAC address.

The VF is enslaved by the netvsc device. The netvsc driver will
transparently switch the data path to the VF when it is available and up.

Network state (addresses, firewall, etc) should be applied only to the
netvsc device; the slave device should not be accessed directly in
most cases. The exception is when some special queue discipline or
flow direction is desired; these should be applied directly to the
VF slave device.
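
To see which device sits underneath a netvsc device, one option is to look
for the lower_<name> entries that the kernel's generic upper/lower device
linkage creates in sysfs. The sketch below assumes the VF is linked to the
netvsc device through that mechanism; the function name and its optional
sysfs-root argument are hypothetical and exist only for this example.

```shell
# Print the names of the devices enslaved under a network device, one per line.
# $1: upper device name (e.g. eth0)
# $2: sysfs root (hypothetical override, handy for testing)
lower_devs() {
	local link
	for link in "${2:-/sys/class/net}/$1"/lower_*; do
		# An unmatched glob stays literal, so check the entry exists.
		if [ -e "$link" ] || [ -L "$link" ]; then
			echo "${link##*/lower_}"
		fi
	done
	return 0
}

# Possible usage on a guest:
#   lower_devs eth0
```

If nothing is printed, either no VF is currently attached or the running
kernel does not expose the relationship this way.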

Receive Buffer
--------------
Packets are received into a receive area which is created when the device
is probed. The receive area is broken into MTU sized chunks and each may
contain one or more packets. The number of receive sections may be changed
via ethtool Rx ring parameters.

There is a similar send buffer which is used to aggregate packets for sending.
The send area is broken into chunks of 6144 bytes, each of which may
contain one or more packets. The send buffer is an optimization; the driver
will use a slower method to handle very large packets or if the send buffer
area is exhausted.
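
The section arithmetic can be sketched as follows. The 6144-byte section
size comes from the text above; the 1 MiB buffer size in the usage comment
is an arbitrary value chosen for illustration, not the driver's actual
buffer size, and the function names are hypothetical.

```shell
# Send section size in bytes, as stated in the documentation above.
SEND_SECTION_SIZE=6144

# Print how many whole send sections fit into a buffer of $1 bytes.
send_sections() {
	echo $(( $1 / SEND_SECTION_SIZE ))
}

# Succeed if a packet of $1 bytes fits into a single send section.
fits_send_section() {
	[ "$1" -le "$SEND_SECTION_SIZE" ]
}

# Possible usage:
#   send_sections 1048576      # sections in a hypothetical 1 MiB send area
#   fits_send_section 1514 && echo "packet can be copied into a section"
#
# The receive side is tuned with the standard ethtool ring option, e.g.:
#   ethtool -G eth0 rx 1024    # 1024 is an arbitrary example value
```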

@@ -6258,6 +6258,7 @@ M: Haiyang Zhang <haiyangz@microsoft.com>
 M: Stephen Hemminger <sthemmin@microsoft.com>
 L: devel@linuxdriverproject.org
 S: Maintained
+F: Documentation/networking/netvsc.txt
 F: arch/x86/include/asm/mshyperv.h
 F: arch/x86/include/uapi/asm/hyperv.h
 F: arch/x86/kernel/cpu/mshyperv.c
@@ -680,6 +680,15 @@ struct netvsc_ethtool_stats {
 	unsigned long tx_busy;
 };
+struct netvsc_vf_pcpu_stats {
+	u64 rx_packets;
+	u64 rx_bytes;
+	u64 tx_packets;
+	u64 tx_bytes;
+	struct u64_stats_sync syncp;
+	u32 tx_dropped;
+};
 struct netvsc_reconfig {
 	struct list_head list;
 	u32 event;
@@ -713,6 +722,9 @@ struct net_device_context {
 	/* State to manage the associated VF interface. */
 	struct net_device __rcu *vf_netdev;
+	struct netvsc_vf_pcpu_stats __percpu *vf_stats;
+	struct work_struct vf_takeover;
+	struct work_struct vf_notify;
 	/* 1: allocated, serial number is valid. 0: not allocated */
 	u32 vf_alloc;

#!/bin/bash

# This example script creates bonding network devices based on the synthetic
# NIC (the virtual network adapter usually provided by Hyper-V) and the
# matching VF NIC (SR-IOV virtual function), so that the synthetic NIC and
# the VF NIC can function as one network device and fail over to the
# synthetic NIC if the VF is down.
#
# Usage:
# - After configuring the vSwitch and vNIC with SR-IOV, start the Linux
#   virtual machine (VM).
# - Run this script on the VM. It will create configuration files in a
#   distro-specific directory.
# - Reboot the VM so that the bonding configuration takes effect.
#
# The config files use DHCP by default. You may edit them if you need to
# change to a static IP or change other settings.
#

sysdir=/sys/class/net
netvsc_cls={f8615163-df3e-46c5-913f-f2d2f965ed0e}
bondcnt=0

# Detect Distro
if [ -f /etc/redhat-release ];
then
	cfgdir=/etc/sysconfig/network-scripts
	distro=redhat
elif grep -q 'Ubuntu' /etc/issue
then
	cfgdir=/etc/network
	distro=ubuntu
elif grep -q 'SUSE' /etc/issue
then
	cfgdir=/etc/sysconfig/network
	distro=suse
else
	echo "Unsupported Distro"
	exit 1
fi

echo Detected Distro: $distro, or compatible

# Get a list of ethernet names
list_eth=(`cd $sysdir && ls -d */ | cut -d/ -f1 | grep -v bond`)
eth_cnt=${#list_eth[@]}

echo List of net devices:

# Get the MAC addresses
for (( i=0; i < $eth_cnt; i++ ))
do
	list_mac[$i]=`cat $sysdir/${list_eth[$i]}/address`
	echo ${list_eth[$i]}, ${list_mac[$i]}
done

# Find NIC with matching MAC
for (( i=0; i < $eth_cnt-1; i++ ))
do
	for (( j=i+1; j < $eth_cnt; j++ ))
	do
		if [ "${list_mac[$i]}" = "${list_mac[$j]}" ]
		then
			list_match[$i]=${list_eth[$j]}
			break
		fi
	done
done

# Red Hat: write ifcfg files for the slave NICs and the bond device
function create_eth_cfg_redhat {
	local fn=$cfgdir/ifcfg-$1

	rm -f $fn
	echo DEVICE=$1 >>$fn
	echo TYPE=Ethernet >>$fn
	echo BOOTPROTO=none >>$fn
	echo UUID=`uuidgen` >>$fn
	echo ONBOOT=yes >>$fn
	echo PEERDNS=yes >>$fn
	echo IPV6INIT=yes >>$fn
	echo MASTER=$2 >>$fn
	echo SLAVE=yes >>$fn
}

function create_eth_cfg_pri_redhat {
	create_eth_cfg_redhat $1 $2
}

function create_bond_cfg_redhat {
	local fn=$cfgdir/ifcfg-$1

	rm -f $fn
	echo DEVICE=$1 >>$fn
	echo TYPE=Bond >>$fn
	echo BOOTPROTO=dhcp >>$fn
	echo UUID=`uuidgen` >>$fn
	echo ONBOOT=yes >>$fn
	echo PEERDNS=yes >>$fn
	echo IPV6INIT=yes >>$fn
	echo BONDING_MASTER=yes >>$fn
	echo BONDING_OPTS=\"mode=active-backup miimon=100 primary=$2\" >>$fn
}

# Ubuntu: remove any existing stanzas for a NIC from the interfaces files
function del_eth_cfg_ubuntu {
	local mainfn=$cfgdir/interfaces
	local fnlist=( $mainfn )

	local dirlist=(`awk '/^[ \t]*source/{print $2}' $mainfn`)

	local i
	for i in "${dirlist[@]}"
	do
		fnlist+=(`ls $i 2>/dev/null`)
	done

	local tmpfl=$(mktemp)

	local nic_start='^[ \t]*(auto|iface|mapping|allow-.*)[ \t]+'$1
	local nic_end='^[ \t]*(auto|iface|mapping|allow-.*|source)'

	local fn
	for fn in "${fnlist[@]}"
	do
		awk "/$nic_end/{x=0} x{next} /$nic_start/{x=1;next} 1" \
			$fn >$tmpfl

		cp $tmpfl $fn
	done

	rm $tmpfl
}

function create_eth_cfg_ubuntu {
	local fn=$cfgdir/interfaces

	del_eth_cfg_ubuntu $1
	echo $'\n'auto $1 >>$fn
	echo iface $1 inet manual >>$fn
	echo bond-master $2 >>$fn
}

function create_eth_cfg_pri_ubuntu {
	local fn=$cfgdir/interfaces

	del_eth_cfg_ubuntu $1
	echo $'\n'allow-hotplug $1 >>$fn
	echo iface $1 inet manual >>$fn
	echo bond-master $2 >>$fn
	echo bond-primary $1 >>$fn
}

function create_bond_cfg_ubuntu {
	local fn=$cfgdir/interfaces

	del_eth_cfg_ubuntu $1

	echo $'\n'auto $1 >>$fn
	echo iface $1 inet dhcp >>$fn
	echo bond-mode active-backup >>$fn
	echo bond-miimon 100 >>$fn
	echo bond-slaves none >>$fn
}

# SUSE: write ifcfg files for the slave NICs and the bond device
function create_eth_cfg_suse {
	local fn=$cfgdir/ifcfg-$1

	rm -f $fn
	echo BOOTPROTO=none >>$fn
	echo STARTMODE=auto >>$fn
}

function create_eth_cfg_pri_suse {
	local fn=$cfgdir/ifcfg-$1

	rm -f $fn
	echo BOOTPROTO=none >>$fn
	echo STARTMODE=hotplug >>$fn
}

function create_bond_cfg_suse {
	local fn=$cfgdir/ifcfg-$1

	rm -f $fn
	echo BOOTPROTO=dhcp >>$fn
	echo STARTMODE=auto >>$fn
	echo BONDING_MASTER=yes >>$fn
	echo BONDING_SLAVE_0=$2 >>$fn
	echo BONDING_SLAVE_1=$3 >>$fn
	echo BONDING_MODULE_OPTS=\'mode=active-backup miimon=100 primary=$2\' >>$fn
}

# Create a bond from two NICs that share a MAC address. The NIC whose
# class_id matches netvsc becomes the secondary; the other (VF) is primary.
function create_bond {
	local bondname=bond$bondcnt
	local primary
	local secondary

	local class_id1=`cat $sysdir/$1/device/class_id 2>/dev/null`
	local class_id2=`cat $sysdir/$2/device/class_id 2>/dev/null`

	if [ "$class_id1" = "$netvsc_cls" ]
	then
		primary=$2
		secondary=$1
	elif [ "$class_id2" = "$netvsc_cls" ]
	then
		primary=$1
		secondary=$2
	else
		return 0
	fi

	echo $'\nBond name:' $bondname

	if [ $distro == ubuntu ]
	then
		local mainfn=$cfgdir/interfaces
		local s="^[ \t]*(auto|iface|mapping|allow-.*)[ \t]+${bondname}"

		grep -E "$s" $mainfn
		if [ $? -eq 0 ]
		then
			echo "WARNING: ${bondname} has been configured already"
			return
		fi
	elif [ $distro == redhat ] || [ $distro == suse ]
	then
		local fn=$cfgdir/ifcfg-$bondname
		if [ -f $fn ]
		then
			echo "WARNING: ${bondname} has been configured already"
			return
		fi
	else
		echo "Unsupported Distro: ${distro}"
		return
	fi

	echo configuring $primary
	create_eth_cfg_pri_$distro $primary $bondname

	echo configuring $secondary
	create_eth_cfg_$distro $secondary $bondname

	echo creating: $bondname with primary slave: $primary
	create_bond_cfg_$distro $bondname $primary $secondary
}

# Create one bond per pair of NICs with matching MAC addresses
for (( i=0; i < $eth_cnt-1; i++ ))
do
	if [ -n "${list_match[$i]}" ]
	then
		create_bond ${list_eth[$i]} ${list_match[$i]}
		let bondcnt=bondcnt+1
	fi
done