Commit ded67c0e
Authored Sep 09, 2008 by David S. Miller

Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6

Parents: f8ef6e44, 410e27a4
Showing 36 changed files with 2884 additions and 3971 deletions (+2884 -3971)
Documentation/networking/dccp.txt     +12   -42
include/linux/dccp.h                  +71   -51
include/net/tcp.h                     +0    -15
net/dccp/Kconfig                      +3    -0
net/dccp/Makefile                     +3    -2
net/dccp/ackvec.c                     +345  -274
net/dccp/ackvec.h                     +113  -91
net/dccp/ccid.c                       +24   -77
net/dccp/ccid.h                       +33   -80
net/dccp/ccids/Kconfig                +6    -24
net/dccp/ccids/ccid2.c                +372  -250
net/dccp/ccids/ccid2.h                +24   -39
net/dccp/ccids/ccid3.c                +461  -301
net/dccp/ccids/ccid3.h                +85   -68
net/dccp/ccids/lib/loss_interval.c    +14   -16
net/dccp/ccids/lib/loss_interval.h    +2    -2
net/dccp/ccids/lib/packet_history.c   +137  -145
net/dccp/ccids/lib/packet_history.h   +11   -67
net/dccp/ccids/lib/tfrc.h             +0    -16
net/dccp/ccids/lib/tfrc_equation.c    +4    -25
net/dccp/dccp.h                       +27   -77
net/dccp/diag.c                       +1    -1
net/dccp/feat.c                       +461  -1344
net/dccp/feat.h                       +24   -120
net/dccp/input.c                      +91   -73
net/dccp/ipv4.c                       +1    -3
net/dccp/ipv6.c                       +1    -3
net/dccp/minisocks.c                  +64   -23
net/dccp/options.c                    +163  -178
net/dccp/output.c                     +102  -177
net/dccp/probe.c                      +48   -27
net/dccp/proto.c                      +106  -175
net/dccp/qpolicy.c                    +0    -137
net/dccp/sysctl.c                     +34   -30
net/dccp/timer.c                      +26   -16
net/ipv4/tcp_input.c                  +15   -2
Documentation/networking/dccp.txt
@@ -45,25 +45,6 @@ http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree
 Socket options
 ==============
-DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
-a policy ID as argument and can only be set before the connection (i.e. changes
-during an established connection are not supported). Currently, two policies are
-defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
-and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
-u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
-a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
-be formatted using a cmsg(3) message header filled in as follows:
-	cmsg->cmsg_level = SOL_DCCP;
-	cmsg->cmsg_type = DCCP_SCM_PRIORITY;
-	cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */
-DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
-value is always interpreted as unbounded queue length. If different from zero,
-the interpretation of this parameter depends on the current dequeuing policy
-(see above): the "simple" policy will enforce a fixed queue size by returning
-EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
-lowest-priority packet first. The default value for this parameter is
-initialised from /proc/sys/net/dccp/default/tx_qlen.
 DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
 service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
@@ -76,24 +57,6 @@ can be set before calling bind().
 DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
 size (application payload size) in bytes, see RFC 4340, section 14.
-DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
-supported by the endpoint (see include/linux/dccp.h for symbolic constants).
-The caller needs to provide a sufficiently large (> 2) array of type uint8_t.
-DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
-time, combining the operation of the next two socket options. This option is
-preferrable over the latter two, since often applications will use the same
-type of CCID for both directions; and mixed use of CCIDs is not currently well
-understood. This socket option takes as argument at least one uint8_t value, or
-an array of uint8_t values, which must match available CCIDS (see above). CCIDs
-must be registered on the socket before calling connect() or listen().
-DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
-the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
-Please note that the getsockopt argument type here is `int', not uint8_t.
-DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.
 DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
 timewait state when closing the connection (RFC 4340, 8.3). The usual case is
 that the closing server sends a CloseReq, whereupon the client holds timewait
@@ -152,16 +115,23 @@ retries2
 	importance for retransmitted acknowledgments and feature negotiation,
 	data packets are never retransmitted. Analogue of tcp_retries2.
+send_ndp = 1
+	Whether or not to send NDP count options (sec. 7.7.2).
+send_ackvec = 1
+	Whether or not to send Ack Vector options (sec. 11.5).
+ack_ratio = 2
+	The default Ack Ratio (sec. 11.3) to use.
 tx_ccid = 2
-	Default CCID for the sender-receiver half-connection. Depending on the
-	choice of CCID, the Send Ack Vector feature is enabled automatically.
+	Default CCID for the sender-receiver half-connection.
 rx_ccid = 2
-	Default CCID for the receiver-sender half-connection; see tx_ccid.
+	Default CCID for the receiver-sender half-connection.
 seq_window = 100
-	The initial sequence window (sec. 7.5.2) of the sender. This influences
-	the local ackno validity and the remote seqno validity windows (7.5.1).
+	The initial sequence window (sec. 7.5.2).
 tx_qlen = 5
 	The size of the transmit buffer in packets. A value of 0 corresponds
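Note on the removed DCCP_SOCKOPT_QPOLICY_ID text: the three cmsg fields it lists can be filled in from user space roughly as follows. This is only a sketch. The helper name dccp_prio_cmsg_fill() is hypothetical, SOL_DCCP is assumed to be 269 where the libc headers do not define it, and DCCP_SCM_PRIORITY = 1 is copied from the enum that this commit removes from include/linux/dccp.h, so the code applies only to a tree that still carries the qpolicy patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values, see the note above. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SCM_PRIORITY
#define DCCP_SCM_PRIORITY 1
#endif

/*
 * Hypothetical helper: attach a u32 priority as ancillary data to @msg,
 * using the caller-provided control buffer @ctl of @ctl_len bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int dccp_prio_cmsg_fill(struct msghdr *msg, void *ctl, size_t ctl_len,
			       uint32_t prio)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && ctl != NULL);
	if (ctl_len < CMSG_SPACE(sizeof(prio)))
		return -1;

	memset(ctl, 0, ctl_len);
	msg->msg_control = ctl;
	msg->msg_controllen = CMSG_SPACE(sizeof(prio));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_DCCP;
	cmsg->cmsg_type = DCCP_SCM_PRIORITY;
	cmsg->cmsg_len = CMSG_LEN(sizeof(prio));
	memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
	return 0;
}
```

The filled-in msghdr would then be passed to sendmsg() on a socket whose dequeuing policy had been set to DCCPQ_POLICY_PRIO before connecting.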
include/linux/dccp.h
@@ -165,13 +165,9 @@ enum {
 	DCCPO_TIMESTAMP_ECHO = 42,
 	DCCPO_ELAPSED_TIME = 43,
 	DCCPO_MAX = 45,
-	DCCPO_MIN_RX_CCID_SPECIFIC = 128,	/* from sender to receiver */
-	DCCPO_MAX_RX_CCID_SPECIFIC = 191,
-	DCCPO_MIN_TX_CCID_SPECIFIC = 192,	/* from receiver to sender */
-	DCCPO_MAX_TX_CCID_SPECIFIC = 255,
+	DCCPO_MIN_CCID_SPECIFIC = 128,
+	DCCPO_MAX_CCID_SPECIFIC = 255,
 };
-/* maximum size of a single TLV-encoded DCCP option (sans type/len bytes) */
-#define DCCP_SINGLE_OPT_MAXLEN 253
 /* DCCP CCIDS */
 enum {
@@ -180,36 +176,27 @@ enum {
 };
 /* DCCP features (RFC 4340 section 6.4) */
-enum dccp_feature_numbers {
+enum {
 	DCCPF_RESERVED = 0,
 	DCCPF_CCID = 1,
-	DCCPF_SHORT_SEQNOS = 2,
+	DCCPF_SHORT_SEQNOS = 2,	/* XXX: not yet implemented */
 	DCCPF_SEQUENCE_WINDOW = 3,
-	DCCPF_ECN_INCAPABLE = 4,
+	DCCPF_ECN_INCAPABLE = 4,	/* XXX: not yet implemented */
 	DCCPF_ACK_RATIO = 5,
 	DCCPF_SEND_ACK_VECTOR = 6,
 	DCCPF_SEND_NDP_COUNT = 7,
 	DCCPF_MIN_CSUM_COVER = 8,
-	DCCPF_DATA_CHECKSUM = 9,
+	DCCPF_DATA_CHECKSUM = 9,	/* XXX: not yet implemented */
 	/* 10-127 reserved */
 	DCCPF_MIN_CCID_SPECIFIC = 128,
-	DCCPF_SEND_LEV_RATE = 192,	/* RFC 4342, sec. 8.4 */
 	DCCPF_MAX_CCID_SPECIFIC = 255,
 };
-/* DCCP socket control message types for cmsg */
-enum dccp_cmsg_type {
-	DCCP_SCM_PRIORITY = 1,
-	DCCP_SCM_QPOLICY_MAX = 0xFFFF,
-	/* ^-- Up to here reserved exclusively for qpolicy parameters */
-	DCCP_SCM_MAX
-};
-/* DCCP priorities for outgoing/queued packets */
-enum dccp_packet_dequeueing_policy {
-	DCCPQ_POLICY_SIMPLE,
-	DCCPQ_POLICY_PRIO,
-	DCCPQ_POLICY_MAX
+/* this structure is argument to DCCP_SOCKOPT_CHANGE_X */
+struct dccp_so_feat {
+	__u8 dccpsf_feat;
+	__u8 __user *dccpsf_val;
+	__u8 dccpsf_len;
 };
 /* DCCP socket options */
@@ -221,12 +208,6 @@ enum dccp_packet_dequeueing_policy {
 #define DCCP_SOCKOPT_SERVER_TIMEWAIT 6
 #define DCCP_SOCKOPT_SEND_CSCOV 10
 #define DCCP_SOCKOPT_RECV_CSCOV 11
-#define DCCP_SOCKOPT_AVAILABLE_CCIDS 12
-#define DCCP_SOCKOPT_CCID 13
-#define DCCP_SOCKOPT_TX_CCID 14
-#define DCCP_SOCKOPT_RX_CCID 15
-#define DCCP_SOCKOPT_QPOLICY_ID 16
-#define DCCP_SOCKOPT_QPOLICY_TXQLEN 17
 #define DCCP_SOCKOPT_CCID_RX_INFO 128
 #define DCCP_SOCKOPT_CCID_TX_INFO 192
@@ -374,13 +355,62 @@ static inline unsigned int dccp_hdr_len(const struct sk_buff *skb)
 	return __dccp_hdr_len(dccp_hdr(skb));
 }
+/* initial values for each feature */
+#define DCCPF_INITIAL_SEQUENCE_WINDOW 100
+#define DCCPF_INITIAL_ACK_RATIO 2
+#define DCCPF_INITIAL_CCID DCCPC_CCID2
+#define DCCPF_INITIAL_SEND_ACK_VECTOR 1
+/* FIXME: for now we're default to 1 but it should really be 0 */
+#define DCCPF_INITIAL_SEND_NDP_COUNT 1
+/**
+ * struct dccp_minisock - Minimal DCCP connection representation
+ *
+ * Will be used to pass the state from dccp_request_sock to dccp_sock.
+ *
+ * @dccpms_sequence_window - Sequence Window Feature (section 7.5.2)
+ * @dccpms_ccid - Congestion Control Id (CCID) (section 10)
+ * @dccpms_send_ack_vector - Send Ack Vector Feature (section 11.5)
+ * @dccpms_send_ndp_count - Send NDP Count Feature (7.7.2)
+ * @dccpms_ack_ratio - Ack Ratio Feature (section 11.3)
+ * @dccpms_pending - List of features being negotiated
+ * @dccpms_conf -
+ */
+struct dccp_minisock {
+	__u64 dccpms_sequence_window;
+	__u8 dccpms_rx_ccid;
+	__u8 dccpms_tx_ccid;
+	__u8 dccpms_send_ack_vector;
+	__u8 dccpms_send_ndp_count;
+	__u8 dccpms_ack_ratio;
+	struct list_head dccpms_pending;
+	struct list_head dccpms_conf;
+};
+struct dccp_opt_conf {
+	__u8 *dccpoc_val;
+	__u8 dccpoc_len;
+};
+struct dccp_opt_pend {
+	struct list_head dccpop_node;
+	__u8 dccpop_type;
+	__u8 dccpop_feat;
+	__u8 *dccpop_val;
+	__u8 dccpop_len;
+	int dccpop_conf;
+	struct dccp_opt_conf *dccpop_sc;
+};
+extern void dccp_minisock_init(struct dccp_minisock *dmsk);
 /**
  * struct dccp_request_sock - represent DCCP-specific connection request
  * @dreq_inet_rsk: structure inherited from
  * @dreq_iss: initial sequence number sent on the Response (RFC 4340, 7.1)
  * @dreq_isr: initial sequence number received on the Request
  * @dreq_service: service code present on the Request (there is just one)
- * @dreq_featneg: feature negotiation options for this connection
  * The following two fields are analogous to the ones in dccp_sock:
  * @dreq_timestamp_echo: last received timestamp to echo (13.1)
  * @dreq_timestamp_echo: the time of receiving the last @dreq_timestamp_echo
@@ -390,7 +420,6 @@ struct dccp_request_sock {
 	__u64 dreq_iss;
 	__u64 dreq_isr;
 	__be32 dreq_service;
-	struct list_head dreq_featneg;
 	__u32 dreq_timestamp_echo;
 	__u32 dreq_timestamp_time;
 };
@@ -462,28 +491,21 @@ struct dccp_ackvec;
  * @dccps_timestamp_time - time of receiving latest @dccps_timestamp_echo
  * @dccps_l_ack_ratio - feature-local Ack Ratio
  * @dccps_r_ack_ratio - feature-remote Ack Ratio
- * @dccps_l_seq_win - local Sequence Window (influences ack number validity)
- * @dccps_r_seq_win - remote Sequence Window (influences seq number validity)
  * @dccps_pcslen - sender partial checksum coverage (via sockopt)
  * @dccps_pcrlen - receiver partial checksum coverage (via sockopt)
- * @dccps_send_ndp_count - local Send NDP Count feature (7.7.2)
  * @dccps_ndp_count - number of Non Data Packets since last data packet
  * @dccps_mss_cache - current value of MSS (path MTU minus header sizes)
  * @dccps_rate_last - timestamp for rate-limiting DCCP-Sync (RFC 4340, 7.5.4)
- * @dccps_featneg - tracks feature-negotiation state (mostly during handshake)
+ * @dccps_minisock - associated minisock (accessed via dccp_msk)
  * @dccps_hc_rx_ackvec - rx half connection ack vector
  * @dccps_hc_rx_ccid - CCID used for the receiver (or receiving half-connection)
  * @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
  * @dccps_options_received - parsed set of retrieved options
- * @dccps_qpolicy - TX dequeueing policy, one of %dccp_packet_dequeueing_policy
- * @dccps_tx_qlen - maximum length of the TX queue
  * @dccps_role - role of this sock, one of %dccp_role
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
- * @dccps_sync_scheduled - flag which signals "send out-of-band message soon"
- * @dccps_xmitlet - tasklet scheduled by the TX CCID to dequeue data packets
- * @dccps_xmit_timer - used by the TX CCID to delay sending (rate-based pacing)
+ * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
 struct dccp_sock {
@@ -507,26 +529,19 @@ struct dccp_sock {
 	__u32 dccps_timestamp_time;
 	__u16 dccps_l_ack_ratio;
 	__u16 dccps_r_ack_ratio;
-	__u64 dccps_l_seq_win:48;
-	__u64 dccps_r_seq_win:48;
-	__u8 dccps_pcslen:4;
-	__u8 dccps_pcrlen:4;
-	__u8 dccps_send_ndp_count:1;
+	__u16 dccps_pcslen;
+	__u16 dccps_pcrlen;
 	__u64 dccps_ndp_count:48;
 	unsigned long dccps_rate_last;
-	struct list_head dccps_featneg;
+	struct dccp_minisock dccps_minisock;
 	struct dccp_ackvec *dccps_hc_rx_ackvec;
 	struct ccid *dccps_hc_rx_ccid;
 	struct ccid *dccps_hc_tx_ccid;
 	struct dccp_options_received dccps_options_received;
-	__u8 dccps_qpolicy;
-	__u32 dccps_tx_qlen;
 	enum dccp_role dccps_role:2;
 	__u8 dccps_hc_rx_insert_options:1;
 	__u8 dccps_hc_tx_insert_options:1;
 	__u8 dccps_server_timewait:1;
-	__u8 dccps_sync_scheduled:1;
-	struct tasklet_struct dccps_xmitlet;
 	struct timer_list dccps_xmit_timer;
 };
@@ -535,6 +550,11 @@ static inline struct dccp_sock *dccp_sk(const struct sock *sk)
 	return (struct dccp_sock *)sk;
 }
+static inline struct dccp_minisock *dccp_msk(const struct sock *sk)
+{
+	return (struct dccp_minisock *)&dccp_sk(sk)->dccps_minisock;
+}
 static inline const char *dccp_role(const struct sock *sk)
 {
 	switch (dccp_sk(sk)->dccps_role) {
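Note on the option-number change in the first hunk: the removed constants split the CCID-specific option space (128..255) into a sender-to-receiver half and a receiver-to-sender half, whereas the restored code only knows one undivided range. The sketch below shows how the removed split could be used to decide which half-connection CCID should parse an option. The enum values are copied from the removed lines; the function name and the result enum are hypothetical.

```c
#include <stdint.h>

/* Values copied from the lines removed in this hunk. */
enum {
	DCCPO_MIN_RX_CCID_SPECIFIC = 128,	/* from sender to receiver */
	DCCPO_MAX_RX_CCID_SPECIFIC = 191,
	DCCPO_MIN_TX_CCID_SPECIFIC = 192,	/* from receiver to sender */
	DCCPO_MAX_TX_CCID_SPECIFIC = 255,
};

/* Hypothetical result type: which CCID, if any, owns the option. */
enum ccid_opt_owner {
	CCID_OPT_NONE,	/* not a CCID-specific option */
	CCID_OPT_RX,	/* sent by the sender, parsed by the RX CCID */
	CCID_OPT_TX,	/* sent by the receiver, parsed by the TX CCID */
};

static enum ccid_opt_owner ccid_opt_owner(uint8_t opt)
{
	if (opt < DCCPO_MIN_RX_CCID_SPECIFIC)
		return CCID_OPT_NONE;
	if (opt <= DCCPO_MAX_RX_CCID_SPECIFIC)
		return CCID_OPT_RX;
	/* opt is a uint8_t, so the upper bound 255 holds automatically */
	return CCID_OPT_TX;
}
```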
include/net/tcp.h
@@ -782,21 +782,6 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 /* Use define here intentionally to get WARN_ON location shown at the caller */
 #define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out)
-/*
- * Convert RFC3390 larger initial windows into an equivalent number of packets.
- *
- * John Heffner states:
- *
- * The RFC specifies a window of no more than 4380 bytes
- * unless 2*MSS > 4380. Reading the pseudocode in the RFC
- * is a bit misleading because they use a clamp at 4380 bytes
- * rather than a multiplier in the relevant range.
- */
-static inline u32 rfc3390_bytes_to_packets(const u32 bytes)
-{
-	return bytes <= 1095 ? 4 : (bytes > 1460 ? 2 : 3);
-}
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
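Note on the removed rfc3390_bytes_to_packets(): the thresholds 1095 and 1460 are what make the result behave like a 4380-byte clamp, since 4 * 1095 = 4380 and 3 * 1460 = 4380. A user-space copy of the removed helper makes this easy to check. The second function, rfc3390_initial_window_bytes(), is a hypothetical addition for illustration and is not part of the kernel.

```c
#include <stdint.h>

/* User-space copy of the helper removed from include/net/tcp.h. */
static inline uint32_t rfc3390_bytes_to_packets(uint32_t bytes)
{
	return bytes <= 1095 ? 4 : (bytes > 1460 ? 2 : 3);
}

/* Hypothetical helper: initial window in bytes for a given MSS. */
static inline uint32_t rfc3390_initial_window_bytes(uint32_t mss)
{
	return rfc3390_bytes_to_packets(mss) * mss;
}
```

For an MSS of 1095 or 1460 bytes the window is exactly 4380 bytes; above 1460 the window is 2 * MSS, matching the "unless 2*MSS > 4380" clause quoted in the removed comment.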
net/dccp/Kconfig
@@ -25,6 +25,9 @@ config INET_DCCP_DIAG
 	def_tristate y if (IP_DCCP = y && INET_DIAG = y)
 	def_tristate m
+config IP_DCCP_ACKVEC
+	bool
 source "net/dccp/ccids/Kconfig"
 menu "DCCP Kernel Hacking"
net/dccp/Makefile
 obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o
-dccp-y := ccid.o feat.o input.o minisocks.o options.o \
-	  qpolicy.o output.o proto.o timer.o ackvec.o
+dccp-y := ccid.o feat.o input.o minisocks.o options.o output.o proto.o timer.o
 dccp_ipv4-y := ipv4.o
@@ -9,6 +8,8 @@ dccp_ipv4-y := ipv4.o
 obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += dccp_ipv6.o
 dccp_ipv6-y := ipv6.o
+dccp-$(CONFIG_IP_DCCP_ACKVEC) += ackvec.o
 obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
 obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
net/dccp/ackvec.c
 /*
  * net/dccp/ackvec.c
  *
- * An implementation of Ack Vectors for the DCCP protocol
- * Copyright (c) 2007 University of Aberdeen, Scotland, UK
+ * An implementation of the DCCP protocol
  * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
  * Free Software Foundation; version 2 of the License;
  */
+#include "ackvec.h"
 #include "dccp.h"
+#include <linux/dccp.h>
+#include <linux/init.h>
+#include <linux/errno.h>
 #include <linux/kernel.h>
+#include <linux/skbuff.h>
 #include <linux/slab.h>
+#include <net/sock.h>
 static struct kmem_cache *dccp_ackvec_slab;
 static struct kmem_cache *dccp_ackvec_record_slab;
-struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
+static struct dccp_ackvec_record *dccp_ackvec_record_new(void)
 {
-	struct dccp_ackvec *av = kmem_cache_zalloc(dccp_ackvec_slab, priority);
-	if (av != NULL) {
-		av->av_buf_head = av->av_buf_tail = DCCPAV_MAX_ACKVEC_LEN - 1;
-		INIT_LIST_HEAD(&av->av_records);
-	}
-	return av;
+	struct dccp_ackvec_record *avr =
+			kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC);
+	if (avr != NULL)
+		INIT_LIST_HEAD(&avr->avr_node);
+	return avr;
 }
-static void dccp_ackvec_purge_records(struct dccp_ackvec *av)
+static void dccp_ackvec_record_delete(struct dccp_ackvec_record *avr)
 {
-	struct dccp_ackvec_record *cur, *next;
-	list_for_each_entry_safe(cur, next, &av->av_records, avr_node)
-		kmem_cache_free(dccp_ackvec_record_slab, cur);
-	INIT_LIST_HEAD(&av->av_records);
+	if (unlikely(avr == NULL))
+		return;
+	/* Check if deleting a linked record */
+	WARN_ON(!list_empty(&avr->avr_node));
+	kmem_cache_free(dccp_ackvec_record_slab, avr);
 }
-void dccp_ackvec_free(struct dccp_ackvec *av)
+static void dccp_ackvec_insert_avr(struct dccp_ackvec *av,
+				   struct dccp_ackvec_record *avr)
 {
-	if (likely(av != NULL)) {
-		dccp_ackvec_purge_records(av);
-		kmem_cache_free(dccp_ackvec_slab, av);
+	/*
+	 * AVRs are sorted by seqno. Since we are sending them in order, we
+	 * just add the AVR at the head of the list.
+	 * -sorbo.
+	 */
+	if (!list_empty(&av->av_records)) {
+		const struct dccp_ackvec_record *head =
+					list_entry(av->av_records.next,
+						   struct dccp_ackvec_record,
+						   avr_node);
+		BUG_ON(before48(avr->avr_ack_seqno, head->avr_ack_seqno));
 	}
+	list_add(&avr->avr_node, &av->av_records);
 }
-/**
- * dccp_ackvec_update_records - Record information about sent Ack Vectors
- * @av: Ack Vector records to update
- * @seqno: Sequence number of the packet carrying the Ack Vector just sent
- * @nonce_sum: The sum of all buffer nonces contained in the Ack Vector
- */
-int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seqno, u8 nonce_sum)
+int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 {
+	struct dccp_sock *dp = dccp_sk(sk);
+	struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
+	/* Figure out how many options do we need to represent the ackvec */
+	const u16 nr_opts = DIV_ROUND_UP(av->av_vec_len, DCCP_MAX_ACKVEC_OPT_LEN);
+	u16 len = av->av_vec_len + 2 * nr_opts, i;
+	u32 elapsed_time;
+	const unsigned char *tail, *from;
+	unsigned char *to;
 	struct dccp_ackvec_record *avr;
-	avr = kmem_cache_alloc(dccp_ackvec_record_slab, GFP_ATOMIC);
+	suseconds_t delta;
+	if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN)
+		return -1;
+	delta = ktime_us_delta(ktime_get_real(), av->av_time);
+	elapsed_time = delta / 10;
+	if (elapsed_time != 0 &&
+	    dccp_insert_option_elapsed_time(sk, skb, elapsed_time))
+		return -1;
+	avr = dccp_ackvec_record_new();
 	if (avr == NULL)
-		return -ENOBUFS;
-	avr->avr_ack_seqno = seqno;
-	avr->avr_ack_ptr = av->av_buf_head;
-	avr->avr_ack_ackno = av->av_buf_ackno;
-	avr->avr_ack_nonce = nonce_sum;
-	avr->avr_ack_runlen = dccp_ackvec_runlen(av->av_buf + av->av_buf_head);
-	/*
-	 * When the buffer overflows, we keep no more than one record. This is
-	 * the simplest way of disambiguating sender-Acks dating from before the
-	 * overflow from sender-Acks which refer to after the overflow; a simple
-	 * solution is preferable here since we are handling an exception.
-	 */
-	if (av->av_overflow)
-		dccp_ackvec_purge_records(av);
+		return -1;
+	DCCP_SKB_CB(skb)->dccpd_opt_len += len;
+	to = skb_push(skb, len);
+	len = av->av_vec_len;
+	from = av->av_buf + av->av_buf_head;
+	tail = av->av_buf + DCCP_MAX_ACKVEC_LEN;
+	for (i = 0; i < nr_opts; ++i) {
+		int copylen = len;
+		if (len > DCCP_MAX_ACKVEC_OPT_LEN)
+			copylen = DCCP_MAX_ACKVEC_OPT_LEN;
+		*to++ = DCCPO_ACK_VECTOR_0;
+		*to++ = copylen + 2;
+		/* Check if buf_head wraps */
+		if (from + copylen > tail) {
+			const u16 tailsize = tail - from;
+			memcpy(to, from, tailsize);
+			to += tailsize;
+			len -= tailsize;
+			copylen -= tailsize;
+			from = av->av_buf;
+		}
+		memcpy(to, from, copylen);
+		from += copylen;
+		to += copylen;
+		len -= copylen;
+	}
 	/*
-	 * Since GSS is incremented for each packet, the list is automatically
-	 * arranged in descending order of @ack_seqno.
+	 * From RFC 4340, A.2:
+	 *
+	 * For each acknowledgement it sends, the HC-Receiver will add an
+	 * acknowledgement record. ack_seqno will equal the HC-Receiver
+	 * sequence number it used for the ack packet; ack_ptr will equal
+	 * buf_head; ack_ackno will equal buf_ackno; and ack_nonce will
+	 * equal buf_nonce.
 	 */
-	list_add(&avr->avr_node, &av->av_records);
-	dccp_pr_debug("Added Vector, ack_seqno=%llu, ack_ackno=%llu (rl=%u)\n",
+	avr->avr_ack_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
+	avr->avr_ack_ptr = av->av_buf_head;
+	avr->avr_ack_ackno = av->av_buf_ackno;
+	avr->avr_ack_nonce = av->av_buf_nonce;
+	avr->avr_sent_len = av->av_vec_len;
+	dccp_ackvec_insert_avr(av, avr);
+	dccp_pr_debug("%s ACK Vector 0, len=%d, ack_seqno=%llu, "
+		      "ack_ackno=%llu\n",
+		      dccp_role(sk), avr->avr_sent_len,
 		      (unsigned long long)avr->avr_ack_seqno,
-		      (unsigned long long)avr->avr_ack_ackno,
-		      avr->avr_ack_runlen);
+		      (unsigned long long)avr->avr_ack_ackno);
 	return 0;
 }
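Note on the restored dccp_insert_option_ackvec(): the Ack Vector is split over several options when it does not fit into one, and each option costs two extra bytes (type and length). The arithmetic can be sketched in user space as below. The constant ACKVEC_OPT_MAXLEN stands in for DCCP_MAX_ACKVEC_OPT_LEN and is assumed to be 253, which matches the DCCP_SINGLE_OPT_MAXLEN value visible in the include/linux/dccp.h diff but is not confirmed by this page for the restored ackvec.h. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: payload bytes that fit into one Ack Vector option. */
#define ACKVEC_OPT_MAXLEN 253u

/* Number of options needed to carry @vec_len Ack Vector cells. */
static uint16_t ackvec_nr_opts(uint16_t vec_len)
{
	/* same rounding as the kernel's DIV_ROUND_UP() */
	return (uint16_t)((vec_len + ACKVEC_OPT_MAXLEN - 1) / ACKVEC_OPT_MAXLEN);
}

/* Bytes pushed onto the skb: cells plus type/length bytes per option. */
static uint16_t ackvec_wire_len(uint16_t vec_len)
{
	return (uint16_t)(vec_len + 2 * ackvec_nr_opts(vec_len));
}
```

With these assumptions a 254-cell vector needs two options and 258 bytes of option space, which is the value the function compares against DCCP_MAX_OPT_LEN before pushing anything.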
-static struct dccp_ackvec_record *dccp_ackvec_lookup(struct list_head *av_list,
-						     const u64 ackno)
+struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority)
 {
-	struct dccp_ackvec_record *avr;
-	/*
-	 * Exploit that records are inserted in descending order of sequence
-	 * number, start with the oldest record first. If @ackno is `before'
-	 * the earliest ack_ackno, the packet is too old to be considered.
-	 */
-	list_for_each_entry_reverse(avr, av_list, avr_node) {
-		if (avr->avr_ack_seqno == ackno)
-			return avr;
-		if (before48(ackno, avr->avr_ack_seqno))
-			break;
+	struct dccp_ackvec *av = kmem_cache_alloc(dccp_ackvec_slab, priority);
+	if (av != NULL) {
+		av->av_buf_head = DCCP_MAX_ACKVEC_LEN - 1;
+		av->av_buf_ackno = UINT48_MAX + 1;
+		av->av_buf_nonce = 0;
+		av->av_time = ktime_set(0, 0);
+		av->av_vec_len = 0;
+		INIT_LIST_HEAD(&av->av_records);
 	}
-	return NULL;
+	return av;
 }
-/*
- * Buffer index and length computation using modulo-buffersize arithmetic.
- * Note that, as pointers move from right to left, head is `before' tail.
- */
-static inline u16 __ackvec_idx_add(const u16 a, const u16 b)
+void dccp_ackvec_free(struct dccp_ackvec *av)
 {
-	return (a + b) % DCCPAV_MAX_ACKVEC_LEN;
+	if (unlikely(av == NULL))
+		return;
+	if (!list_empty(&av->av_records)) {
+		struct dccp_ackvec_record *avr, *next;
+		list_for_each_entry_safe(avr, next, &av->av_records, avr_node) {
+			list_del_init(&avr->avr_node);
+			dccp_ackvec_record_delete(avr);
+		}
+	}
+	kmem_cache_free(dccp_ackvec_slab, av);
 }
-static inline u16 __ackvec_idx_sub(const u16 a, const u16 b)
+static inline u8 dccp_ackvec_state(const struct dccp_ackvec *av,
+				   const u32 index)
 {
-	return __ackvec_idx_add(a, DCCPAV_MAX_ACKVEC_LEN - b);
+	return av->av_buf[index] & DCCP_ACKVEC_STATE_MASK;
 }
-u16 dccp_ackvec_buflen(const struct dccp_ackvec *av)
+static inline u8 dccp_ackvec_len(const struct dccp_ackvec *av,
+				 const u32 index)
 {
-	if (unlikely(av->av_overflow))
-		return DCCPAV_MAX_ACKVEC_LEN;
-	return __ackvec_idx_sub(av->av_buf_tail, av->av_buf_head);
+	return av->av_buf[index] & DCCP_ACKVEC_LEN_MASK;
 }
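Note on the removed index helpers: __ackvec_idx_add(), __ackvec_idx_sub() and dccp_ackvec_buflen() treat the Ack Vector buffer as a ring in which head moves toward lower indices, so the live length is (tail - head) modulo the buffer size. A user-space sketch with a deliberately tiny buffer follows. AV_BUFLEN = 8 is a hypothetical value chosen for readability (the kernel constant DCCPAV_MAX_ACKVEC_LEN is defined in ackvec.h, which is not shown here), and the function names are stand-ins for the removed helpers.

```c
#include <stdint.h>

/* Hypothetical ring size, chosen small for illustration only. */
#define AV_BUFLEN 8u

static uint16_t av_idx_add(uint16_t a, uint16_t b)
{
	return (uint16_t)((a + b) % AV_BUFLEN);
}

/* Requires b <= AV_BUFLEN so that AV_BUFLEN - b does not wrap. */
static uint16_t av_idx_sub(uint16_t a, uint16_t b)
{
	return av_idx_add(a, (uint16_t)(AV_BUFLEN - b));
}

/*
 * Live length of the ring. head == tail is ambiguous (empty or full),
 * which is why the removed code keeps a separate overflow flag.
 */
static uint16_t av_buflen(uint16_t head, uint16_t tail, int overflow)
{
	if (overflow)
		return AV_BUFLEN;
	return av_idx_sub(tail, head);
}
```

Subtraction is expressed as addition of the complement so that the intermediate value never goes negative in unsigned arithmetic.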
/**
/*
* dccp_ackvec_update_old - Update previous state as per RFC 4340, 11.4.1
* If several packets are missing, the HC-Receiver may prefer to enter multiple
* @av: non-empty buffer to update
* bytes with run length 0, rather than a single byte with a larger run length;
* @distance: negative or zero distance of @seqno from buf_ackno downward
* this simplifies table updates if one of the missing packets arrives.
* @seqno: the (old) sequence number whose record is to be updated
* @state: state in which packet carrying @seqno was received
*/
*/
static
void
dccp_ackvec_update_old
(
struct
dccp_ackvec
*
av
,
s64
distance
,
static
inline
int
dccp_ackvec_set_buf_head_state
(
struct
dccp_ackvec
*
av
,
u64
seqno
,
enum
dccp_ackvec_states
state
)
const
unsigned
int
packets
,
const
unsigned
char
state
)
{
{
u16
ptr
=
av
->
av_buf_head
;
unsigned
int
gap
;
long
new_head
;
BUG_ON
(
distance
>
0
);
if
(
av
->
av_vec_len
+
packets
>
DCCP_MAX_ACKVEC_LEN
)
if
(
unlikely
(
dccp_ackvec_is_empty
(
av
)))
return
-
ENOBUFS
;
return
;
do
{
gap
=
packets
-
1
;
u8
runlen
=
dccp_ackvec_runlen
(
av
->
av_buf
+
ptr
)
;
new_head
=
av
->
av_buf_head
-
packets
;
if
(
distance
+
runlen
>=
0
)
{
if
(
new_head
<
0
)
{
/*
if
(
gap
>
0
)
{
* Only update the state if packet has not been received
memset
(
av
->
av_buf
,
DCCP_ACKVEC_STATE_NOT_RECEIVED
,
* yet. This is OK as per the second table in RFC 4340,
gap
+
new_head
+
1
);
* 11.4.1; i.e. here we are using the following table:
gap
=
-
new_head
;
* RECEIVED
* 0 1 3
* S +---+---+---+
* T 0 | 0 | 0 | 0 |
* O +---+---+---+
* R 1 | 1 | 1 | 1 |
* E +---+---+---+
* D 3 | 0 | 1 | 3 |
* +---+---+---+
* The "Not Received" state was set by reserve_seats().
*/
if
(
av
->
av_buf
[
ptr
]
==
DCCPAV_NOT_RECEIVED
)
av
->
av_buf
[
ptr
]
=
state
;
else
dccp_pr_debug
(
"Not changing %llu state to %u
\n
"
,
(
unsigned
long
long
)
seqno
,
state
);
break
;
}
}
new_head
+=
DCCP_MAX_ACKVEC_LEN
;
}
distance
+=
runlen
+
1
;
av
->
av_buf_head
=
new_head
;
ptr
=
__ackvec_idx_add
(
ptr
,
1
);
}
while
(
ptr
!=
av
->
av_buf_tail
);
if
(
gap
>
0
)
}
memset
(
av
->
av_buf
+
av
->
av_buf_head
+
1
,
DCCP_ACKVEC_STATE_NOT_RECEIVED
,
gap
);
/* Mark @num entries after buf_head as "Not yet received". */
av
->
av_buf
[
av
->
av_buf_head
]
=
state
;
static
void
dccp_ackvec_reserve_seats
(
struct
dccp_ackvec
*
av
,
u16
num
)
av
->
av_vec_len
+=
packets
;
{
return
0
;
u16
start
=
__ackvec_idx_add
(
av
->
av_buf_head
,
1
),
len
=
DCCPAV_MAX_ACKVEC_LEN
-
start
;
/* check for buffer wrap-around */
if
(
num
>
len
)
{
memset
(
av
->
av_buf
+
start
,
DCCPAV_NOT_RECEIVED
,
len
);
start
=
0
;
num
-=
len
;
}
if
(
num
)
memset
(
av
->
av_buf
+
start
,
DCCPAV_NOT_RECEIVED
,
num
);
}
}
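The wrap-around handling in `dccp_ackvec_reserve_seats()` can be tried outside the kernel. This is a sketch under stated assumptions: `BUF_LEN` is an arbitrary small size chosen for illustration, and the function takes the buffer and head index as plain parameters instead of a `struct dccp_ackvec`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN      8u      /* assumed size, for illustration only */
#define NOT_RECEIVED 0xC0u   /* value of DCCPAV_NOT_RECEIVED in ackvec.h */

/* Mark num cells following head as "not received", wrapping if needed. */
static void reserve_seats(uint8_t *buf, uint16_t head, uint16_t num)
{
	uint16_t start = (uint16_t)((head + 1) % BUF_LEN);
	uint16_t len = (uint16_t)(BUF_LEN - start);

	assert(num <= BUF_LEN);
	if (num > len) {        /* the run reaches past the end of the array */
		memset(buf + start, NOT_RECEIVED, len);
		start = 0;
		num -= len;
	}
	if (num)
		memset(buf + start, NOT_RECEIVED, num);
}
```

For example, reserving 4 seats with `head = 5` in an 8-cell buffer fills cells 6 and 7 first and then cells 0 and 1. At most two `memset()` calls are needed because a contiguous run in a circular buffer can cross the array boundary only once.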
/**
/*
* dccp_ackvec_add_new - Record one or more new entries in Ack Vector buffer
* Implements the RFC 4340, Appendix A
* @av: container of buffer to update (can be empty or non-empty)
* @num_packets: number of packets to register (must be >= 1)
* @seqno: sequence number of the first packet in @num_packets
* @state: state in which packet carrying @seqno was received
*/
*/
static
void
dccp_ackvec_add_new
(
struct
dccp_ackvec
*
av
,
u32
num_packets
,
int
dccp_ackvec_add
(
struct
dccp_ackvec
*
av
,
const
struct
sock
*
sk
,
u64
seqno
,
enum
dccp_ackvec_states
state
)
const
u64
ackno
,
const
u8
state
)
{
{
u32
num_cells
=
num_packets
;
/*
* Check at the right places if the buffer is full, if it is, tell the
* caller to start dropping packets till the HC-Sender acks our ACK
* vectors, when we will free up space in av_buf.
*
* We may well decide to do buffer compression, etc, but for now let's
* just drop.
*
* From Appendix A.1.1 (`New Packets'):
*
* Of course, the circular buffer may overflow, either when the
* HC-Sender is sending data at a very high rate, when the
* HC-Receiver's acknowledgements are not reaching the HC-Sender,
* or when the HC-Sender is forgetting to acknowledge those acks
* (so the HC-Receiver is unable to clean up old state). In this
* case, the HC-Receiver should either compress the buffer (by
* increasing run lengths when possible), transfer its state to
* a larger buffer, or, as a last resort, drop all received
* packets, without processing them whatsoever, until its buffer
* shrinks again.
*/
if
(
num_packets
>
DCCPAV_BURST_THRESH
)
{
/* See if this is the first ackno being inserted */
u32
lost_packets
=
num_packets
-
1
;
if
(
av
->
av_vec_len
==
0
)
{
av
->
av_buf
[
av
->
av_buf_head
]
=
state
;
av
->
av_vec_len
=
1
;
}
else
if
(
after48
(
ackno
,
av
->
av_buf_ackno
))
{
const
u64
delta
=
dccp_delta_seqno
(
av
->
av_buf_ackno
,
ackno
);
DCCP_WARN
(
"Warning: large burst loss (%u)
\n
"
,
lost_packets
);
/*
/*
* We received 1 packet and have a loss of size "num_packets-1"
* Look if the state of this packet is the same as the
* which we squeeze into num_cells-1 rather than reserving an
* previous ackno and if so if we can bump the head len.
* entire byte for each lost packet.
* The reason is that the vector grows in O(burst_length); when
* it grows too large there will be no room left for the payload.
* This is a trade-off: if a few packets out of the burst show
* up later, their state will not be changed; it is simply too
* costly to reshuffle/reallocate/copy the buffer each time.
* Should such problems persist, we will need to switch to a
* different underlying data structure.
*/
*/
for
(
num_packets
=
num_cells
=
1
;
lost_packets
;
++
num_cells
)
{
if
(
delta
==
1
&&
u8
len
=
min
(
lost_packets
,
(
u32
)
DCCPAV_MAX_RUNLEN
);
dccp_ackvec_state
(
av
,
av
->
av_buf_head
)
==
state
&&
dccp_ackvec_len
(
av
,
av
->
av_buf_head
)
<
DCCP_ACKVEC_LEN_MASK
)
av
->
av_buf_head
=
__ackvec_idx_sub
(
av
->
av_buf_head
,
1
);
av
->
av_buf
[
av
->
av_buf_head
]
++
;
av
->
av_buf
[
av
->
av_buf_head
]
=
DCCPAV_NOT_RECEIVED
|
len
;
else
if
(
dccp_ackvec_set_buf_head_state
(
av
,
delta
,
state
))
return
-
ENOBUFS
;
}
else
{
/*
* A.1.2. Old Packets
*
* When a packet with Sequence Number S <= buf_ackno
* arrives, the HC-Receiver will scan the table for
* the byte corresponding to S. (Indexing structures
* could reduce the complexity of this scan.)
*/
u64
delta
=
dccp_delta_seqno
(
ackno
,
av
->
av_buf_ackno
);
u32
index
=
av
->
av_buf_head
;
lost_packets
-=
len
;
while
(
1
)
{
const
u8
len
=
dccp_ackvec_len
(
av
,
index
);
const
u8
av_state
=
dccp_ackvec_state
(
av
,
index
);
/*
* valid packets not yet in av_buf have a reserved
* entry, with a len equal to 0.
*/
if
(
av_state
==
DCCP_ACKVEC_STATE_NOT_RECEIVED
&&
len
==
0
&&
delta
==
0
)
{
/* Found our reserved seat! */
dccp_pr_debug
(
"Found %llu reserved seat!
\n
"
,
(
unsigned
long
long
)
ackno
);
av
->
av_buf
[
index
]
=
state
;
goto
out
;
}
/* len == 0 means one packet */
if
(
delta
<
len
+
1
)
goto
out_duplicate
;
delta
-=
len
+
1
;
if
(
++
index
==
DCCP_MAX_ACKVEC_LEN
)
index
=
0
;
}
}
}
}
if
(
num_cells
+
dccp_ackvec_buflen
(
av
)
>=
DCCPAV_MAX_ACKVEC_LEN
)
{
av
->
av_buf_ackno
=
ackno
;
DCCP_CRIT
(
"Ack Vector buffer overflow: dropping old entries
\n
"
);
av
->
av_time
=
ktime_get_real
();
av
->
av_overflow
=
true
;
out:
}
return
0
;
av
->
av_buf_head
=
__ackvec_idx_sub
(
av
->
av_buf_head
,
num_packets
);
if
(
av
->
av_overflow
)
av
->
av_buf_tail
=
av
->
av_buf_head
;
av
->
av_buf
[
av
->
av_buf_head
]
=
state
;
av
->
av_buf_ackno
=
seqno
;
if
(
num_packets
>
1
)
out_duplicate:
dccp_ackvec_reserve_seats
(
av
,
num_packets
-
1
);
/* Duplicate packet */
dccp_pr_debug
(
"Received a dup or already considered lost "
"packet: %llu
\n
"
,
(
unsigned
long
long
)
ackno
);
return
-
EILSEQ
;
}
}
/**
static
void
dccp_ackvec_throw_record
(
struct
dccp_ackvec
*
av
,
* dccp_ackvec_input - Register incoming packet in the buffer
struct
dccp_ackvec_record
*
avr
)
*/
void
dccp_ackvec_input
(
struct
dccp_ackvec
*
av
,
struct
sk_buff
*
skb
)
{
{
u64
seqno
=
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
;
struct
dccp_ackvec_record
*
next
;
enum
dccp_ackvec_states
state
=
DCCPAV_RECEIVED
;
if
(
dccp_ackvec_is_empty
(
av
))
{
/* sort out vector length */
dccp_ackvec_add_new
(
av
,
1
,
seqno
,
state
);
if
(
av
->
av_buf_head
<=
avr
->
avr_ack_ptr
)
av
->
av_tail_ackno
=
seqno
;
av
->
av_vec_len
=
avr
->
avr_ack_ptr
-
av
->
av_buf_head
;
else
av
->
av_vec_len
=
DCCP_MAX_ACKVEC_LEN
-
1
-
av
->
av_buf_head
+
avr
->
avr_ack_ptr
;
}
else
{
/* free records */
s64
num_packets
=
dccp_delta_seqno
(
av
->
av_buf_ackno
,
seqno
);
list_for_each_entry_safe_from
(
avr
,
next
,
&
av
->
av_records
,
avr_node
)
{
u8
*
current_head
=
av
->
av_buf
+
av
->
av_buf_head
;
list_del_init
(
&
avr
->
avr_node
);
dccp_ackvec_record_delete
(
avr
);
if
(
num_packets
==
1
&&
}
dccp_ackvec_state
(
current_head
)
==
state
&&
}
dccp_ackvec_runlen
(
current_head
)
<
DCCPAV_MAX_RUNLEN
)
{
*
current_head
+=
1
;
void
dccp_ackvec_check_rcv_ackno
(
struct
dccp_ackvec
*
av
,
struct
sock
*
sk
,
av
->
av_buf_ackno
=
seqno
;
const
u64
ackno
)
{
struct
dccp_ackvec_record
*
avr
;
}
else
if
(
num_packets
>
0
)
{
/*
dccp_ackvec_add_new
(
av
,
num_packets
,
seqno
,
state
);
* If we traverse backwards, it should be faster when we have large
}
else
{
* windows. We will be receiving ACKs for stuff we sent a while back
dccp_ackvec_update_old
(
av
,
num_packets
,
seqno
,
state
);
* -sorbo.
}
*/
list_for_each_entry_reverse
(
avr
,
&
av
->
av_records
,
avr_node
)
{
if
(
ackno
==
avr
->
avr_ack_seqno
)
{
dccp_pr_debug
(
"%s ACK packet 0, len=%d, ack_seqno=%llu, "
"ack_ackno=%llu, ACKED!
\n
"
,
dccp_role
(
sk
),
1
,
(
unsigned
long
long
)
avr
->
avr_ack_seqno
,
(
unsigned
long
long
)
avr
->
avr_ack_ackno
);
dccp_ackvec_throw_record
(
av
,
avr
);
break
;
}
else
if
(
avr
->
avr_ack_seqno
>
ackno
)
break
;
/* old news */
}
}
}
}
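`dccp_ackvec_input()` decides between `dccp_ackvec_add_new()` and `dccp_ackvec_update_old()` from the signed distance between `av_buf_ackno` and the incoming sequence number. The sketch below shows one way such a distance can be computed in 48-bit circular sequence space. `delta48()` is a standalone stand-in written for this example, not the kernel's `dccp_delta_seqno()`, and `classify()` is a hypothetical helper that leaves out the run-length shortcut taken when the distance is exactly 1.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Signed distance from s1 to s2 in 48-bit circular sequence space. */
static int64_t delta48(uint64_t s1, uint64_t s2)
{
	uint64_t d = (s2 - s1) & SEQ48_MASK;

	if (d & (UINT64_C(1) << 47))    /* upper half: negative distance */
		return (int64_t)d - (INT64_C(1) << 48);
	return (int64_t)d;
}

enum av_action { AV_ADD_NEW, AV_UPDATE_OLD };

/* Packets ahead of buf_ackno extend the buffer; all others touch a cell
 * that is already recorded. */
static enum av_action classify(uint64_t buf_ackno, uint64_t seqno)
{
	return delta48(buf_ackno, seqno) > 0 ? AV_ADD_NEW : AV_UPDATE_OLD;
}
```

Under these assumptions a sequence number that wraps from 2^48 - 1 to 0 still counts as one step forward, so it would be treated as a new packet instead of a very old one.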
/**
static
void
dccp_ackvec_check_rcv_ackvector
(
struct
dccp_ackvec
*
av
,
* dccp_ackvec_clear_state - Perform house-keeping / garbage-collection
struct
sock
*
sk
,
u64
*
ackno
,
* This routine is called when the peer acknowledges the receipt of Ack Vectors
const
unsigned
char
len
,
* up to and including @ackno. While based on section A.3 of RFC 4340, here
const
unsigned
char
*
vector
)
* are additional precautions to prevent corrupted buffer state. In particular,
{
* we use tail_ackno to identify outdated records; it always marks the earliest
unsigned
char
i
;
* packet of group (2) in 11.4.2.
struct
dccp_ackvec_record
*
avr
;
*/
void
dccp_ackvec_clear_state
(
struct
dccp_ackvec
*
av
,
const
u64
ackno
)
{
struct
dccp_ackvec_record
*
avr
,
*
next
;
u8
runlen_now
,
eff_runlen
;
s64
delta
;
avr
=
dccp_ackvec_lookup
(
&
av
->
av_records
,
ackno
);
/* Check if we actually sent an ACK vector */
if
(
avr
==
NULL
)
if
(
list_empty
(
&
av
->
av_records
)
)
return
;
return
;
/*
* Deal with outdated acknowledgments: this arises when e.g. there are
* several old records and the acks from the peer come in slowly. In
* that case we may still have records that pre-date tail_ackno.
*/
delta
=
dccp_delta_seqno
(
av
->
av_tail_ackno
,
avr
->
avr_ack_ackno
);
if
(
delta
<
0
)
goto
free_records
;
/*
* Deal with overlapping Ack Vectors: don't subtract more than the
* number of packets between tail_ackno and ack_ackno.
*/
eff_runlen
=
delta
<
avr
->
avr_ack_runlen
?
delta
:
avr
->
avr_ack_runlen
;
runlen_now
=
dccp_ackvec_runlen
(
av
->
av_buf
+
avr
->
avr_ack_ptr
)
;
i
=
len
;
/*
/*
* The run length of Ack Vector cells does not decrease over time. If
* XXX
* the run length is the same as at the time the Ack Vector was sent, we
* I think it might be more efficient to work backwards. See comment on
* free the ack_ptr cell. That cell can however not be freed if the run
* rcv_ackno. -sorbo.
* length has increased: in this case we need to move the tail pointer
* backwards (towards higher indices), to its next-oldest neighbour.
*/
*/
if
(
runlen_now
>
eff_runlen
)
{
avr
=
list_entry
(
av
->
av_records
.
next
,
struct
dccp_ackvec_record
,
avr_node
);
while
(
i
--
)
{
const
u8
rl
=
*
vector
&
DCCP_ACKVEC_LEN_MASK
;
u64
ackno_end_rl
;
av
->
av_buf
[
avr
->
avr_ack_ptr
]
-=
eff_runlen
+
1
;
dccp_set_seqno
(
&
ackno_end_rl
,
*
ackno
-
rl
);
av
->
av_buf_tail
=
__ackvec_idx_add
(
avr
->
avr_ack_ptr
,
1
);
/* This move may not have cleared the overflow flag. */
if
(
av
->
av_overflow
)
av
->
av_overflow
=
(
av
->
av_buf_head
==
av
->
av_buf_tail
);
}
else
{
av
->
av_buf_tail
=
avr
->
avr_ack_ptr
;
/*
/*
* We have made sure that avr points to a valid cell within the
* If our AVR sequence number is greater than the ack, go
* buffer. This cell is either older than head, or equals head
* forward in the AVR list until it is not so.
* (empty buffer): in both cases we no longer have any overflow.
*/
*/
av
->
av_overflow
=
0
;
list_for_each_entry_from
(
avr
,
&
av
->
av_records
,
avr_node
)
{
}
if
(
!
after48
(
avr
->
avr_ack_seqno
,
*
ackno
))
goto
found
;
/*
}
* The peer has acknowledged up to and including ack_ackno. Hence the
/* End of the av_records list, not found, exit */
* first packet in group (2) of 11.4.2 is the successor of ack_ackno.
break
;
*/
found:
av
->
av_tail_ackno
=
ADD48
(
avr
->
avr_ack_ackno
,
1
);
if
(
between48
(
avr
->
avr_ack_seqno
,
ackno_end_rl
,
*
ackno
))
{
const
u8
state
=
*
vector
&
DCCP_ACKVEC_STATE_MASK
;
if
(
state
!=
DCCP_ACKVEC_STATE_NOT_RECEIVED
)
{
dccp_pr_debug
(
"%s ACK vector 0, len=%d, "
"ack_seqno=%llu, ack_ackno=%llu, "
"ACKED!
\n
"
,
dccp_role
(
sk
),
len
,
(
unsigned
long
long
)
avr
->
avr_ack_seqno
,
(
unsigned
long
long
)
avr
->
avr_ack_ackno
);
dccp_ackvec_throw_record
(
av
,
avr
);
break
;
}
/*
* If it wasn't received, continue scanning... we might
* find another one.
*/
}
free_records:
dccp_set_seqno
(
ackno
,
ackno_end_rl
-
1
);
list_for_each_entry_safe_from
(
avr
,
next
,
&
av
->
av_records
,
avr_node
)
{
++
vector
;
list_del
(
&
avr
->
avr_node
);
kmem_cache_free
(
dccp_ackvec_record_slab
,
avr
);
}
}
}
}
/*
int
dccp_ackvec_parse
(
struct
sock
*
sk
,
const
struct
sk_buff
*
skb
,
* Routines to keep track of Ack Vectors received in an skb
u64
*
ackno
,
const
u8
opt
,
const
u8
*
value
,
const
u8
len
)
*/
int
dccp_ackvec_parsed_add
(
struct
list_head
*
head
,
u8
*
vec
,
u8
len
,
u8
nonce
)
{
{
struct
dccp_ackvec_parsed
*
new
=
kmalloc
(
sizeof
(
*
new
),
GFP_ATOMIC
);
if
(
len
>
DCCP_MAX_ACKVEC_OPT_LEN
)
return
-
1
;
if
(
new
==
NULL
)
return
-
ENOBUFS
;
new
->
vec
=
vec
;
new
->
len
=
len
;
new
->
nonce
=
nonce
;
list_add_tail
(
&
new
->
node
,
head
);
/* dccp_ackvector_print(DCCP_SKB_CB(skb)->dccpd_ack_seq, value, len); */
dccp_ackvec_check_rcv_ackvector
(
dccp_sk
(
sk
)
->
dccps_hc_rx_ackvec
,
sk
,
ackno
,
len
,
value
);
return
0
;
return
0
;
}
}
EXPORT_SYMBOL_GPL
(
dccp_ackvec_parsed_add
);
void
dccp_ackvec_parsed_cleanup
(
struct
list_head
*
parsed_chunks
)
{
struct
dccp_ackvec_parsed
*
cur
,
*
next
;
list_for_each_entry_safe
(
cur
,
next
,
parsed_chunks
,
node
)
kfree
(
cur
);
INIT_LIST_HEAD
(
parsed_chunks
);
}
EXPORT_SYMBOL_GPL
(
dccp_ackvec_parsed_cleanup
);
int
__init
dccp_ackvec_init
(
void
)
int
__init
dccp_ackvec_init
(
void
)
{
{
...
@@ -379,9 +449,10 @@ int __init dccp_ackvec_init(void)
if
(
dccp_ackvec_slab
==
NULL
)
if
(
dccp_ackvec_slab
==
NULL
)
goto
out_err
;
goto
out_err
;
dccp_ackvec_record_slab
=
kmem_cache_create
(
"dccp_ackvec_record"
,
dccp_ackvec_record_slab
=
sizeof
(
struct
dccp_ackvec_record
),
kmem_cache_create
(
"dccp_ackvec_record"
,
0
,
SLAB_HWCACHE_ALIGN
,
NULL
);
sizeof
(
struct
dccp_ackvec_record
),
0
,
SLAB_HWCACHE_ALIGN
,
NULL
);
if
(
dccp_ackvec_record_slab
==
NULL
)
if
(
dccp_ackvec_record_slab
==
NULL
)
goto
out_destroy_slab
;
goto
out_destroy_slab
;
...
net/dccp/ackvec.h
...
@@ -3,134 +3,156 @@
/*
/*
* net/dccp/ackvec.h
* net/dccp/ackvec.h
*
*
* An implementation of Ack Vectors for the DCCP protocol
* An implementation of the DCCP protocol
* Copyright (c) 2007 University of Aberdeen, Scotland, UK
* Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@mandriva.com>
* Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@mandriva.com>
*
* This program is free software; you can redistribute it and/or modify it
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as
* under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
* published by the Free Software Foundation.
*/
*/
#include <linux/dccp.h>
#include <linux/compiler.h>
#include <linux/compiler.h>
#include <linux/ktime.h>
#include <linux/list.h>
#include <linux/list.h>
#include <linux/types.h>
#include <linux/types.h>
/*
/* Read about the ECN nonce to see why it is 253 */
* Ack Vector buffer space is static, in multiples of %DCCP_SINGLE_OPT_MAXLEN,
#define DCCP_MAX_ACKVEC_OPT_LEN 253
* the maximum size of a single Ack Vector. Setting %DCCPAV_NUM_ACKVECS to 1
/* We can spread an ack vector across multiple options */
* will be sufficient for most cases of low Ack Ratios, using a value of 2 gives
#define DCCP_MAX_ACKVEC_LEN (DCCP_MAX_ACKVEC_OPT_LEN * 2)
* more headroom if Ack Ratio is higher or when the sender acknowledges slowly.
* The maximum value is bounded by the u16 types for indices and functions.
*/
#define DCCPAV_NUM_ACKVECS 2
#define DCCPAV_MAX_ACKVEC_LEN (DCCP_SINGLE_OPT_MAXLEN * DCCPAV_NUM_ACKVECS)
/* Estimated minimum average Ack Vector length - used for updating MPS */
#define DCCPAV_MIN_OPTLEN 16
/* Threshold for coping with large bursts of losses */
#define DCCPAV_BURST_THRESH (DCCPAV_MAX_ACKVEC_LEN / 8)
enum
dccp_ackvec_states
{
DCCPAV_RECEIVED
=
0x00
,
DCCPAV_ECN_MARKED
=
0x40
,
DCCPAV_RESERVED
=
0x80
,
DCCPAV_NOT_RECEIVED
=
0xC0
};
#define DCCPAV_MAX_RUNLEN 0x3F
static
inline
u8
dccp_ackvec_runlen
(
const
u8
*
cell
)
#define DCCP_ACKVEC_STATE_RECEIVED 0
{
#define DCCP_ACKVEC_STATE_ECN_MARKED (1 << 6)
return
*
cell
&
DCCPAV_MAX_RUNLEN
;
#define DCCP_ACKVEC_STATE_NOT_RECEIVED (3 << 6)
}
static
inline
u8
dccp_ackvec_state
(
const
u8
*
cell
)
#define DCCP_ACKVEC_STATE_MASK 0xC0
/* 11000000 */
{
#define DCCP_ACKVEC_LEN_MASK 0x3F
/* 00111111 */
return
*
cell
&
~
DCCPAV_MAX_RUNLEN
;
}
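The two inline helpers above split one Ack Vector cell into a 2-bit state (top bits) and a 6-bit run length (low bits). Here is a small standalone sketch of that encoding. The state values and the `0x3F` mask are taken from the header; `av_cell()` and the `av_`-prefixed names are hypothetical and exist only in this example.

```c
#include <assert.h>
#include <stdint.h>

enum av_state {                 /* values as in enum dccp_ackvec_states */
	AV_RECEIVED     = 0x00,
	AV_ECN_MARKED   = 0x40,
	AV_RESERVED     = 0x80,
	AV_NOT_RECEIVED = 0xC0
};

#define AV_MAX_RUNLEN 0x3F

/* Hypothetical helper: pack a state and a run length into one cell. */
static uint8_t av_cell(enum av_state state, uint8_t runlen)
{
	assert(runlen <= AV_MAX_RUNLEN);
	return (uint8_t)(state | runlen);
}

/* Low six bits: run length. */
static uint8_t av_cell_runlen(const uint8_t *cell)
{
	return *cell & AV_MAX_RUNLEN;
}

/* Top two bits: state, returned unshifted as in dccp_ackvec_state(). */
static uint8_t av_cell_state(const uint8_t *cell)
{
	return *cell & (uint8_t)~AV_MAX_RUNLEN;
}
```

Packing `AV_NOT_RECEIVED` with a run length of 5 gives the byte `0xC5`. Because the state is kept unshifted, it can be compared directly against the enum constants.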
/** struct dccp_ackvec - Ack Vector main data structure
/** struct dccp_ackvec - ack vector
*
* This data structure is the one defined in RFC 4340, Appendix A.
*
*
* This implements a fixed-size circular buffer within an array and is largely
* @av_buf_head - circular buffer head
* based on Appendix A of RFC 4340.
* @av_buf_tail - circular buffer tail
* @av_buf_ackno - ack # of the most recent packet acknowledgeable in the
* buffer (i.e. %av_buf_head)
* @av_buf_nonce - the one-bit sum of the ECN Nonces on all packets acked
* by the buffer with State 0
*
*
* @av_buf: circular buffer storage area
* Additionally, the HC-Receiver must keep some information about the
* @av_buf_head: head index; begin of live portion in @av_buf
* Ack Vectors it has recently sent. For each packet sent carrying an
* @av_buf_tail: tail index; first index _after_ the live portion in @av_buf
* Ack Vector, it remembers four variables:
* @av_buf_ackno: highest seqno of acknowledgeable packet recorded in @av_buf
*
* @av_tail_ackno: lowest seqno of acknowledgeable packet recorded in @av_buf
* @av_records - list of dccp_ackvec_record
* @av_buf_nonce: ECN nonce sums, each covering subsequent segments of up to
* @av_ack_nonce - the one-bit sum of the ECN Nonces for all State 0.
* %DCCP_SINGLE_OPT_MAXLEN cells in the live portion of @av_buf
*
* @av_overflow: if 1 then buf_head == buf_tail indicates buffer wraparound
* @av_time - the time in usecs
* @av_records: list of %dccp_ackvec_record (Ack Vectors sent previously)
* @av_buf - circular buffer of acknowledgeable packets
*/
*/
struct
dccp_ackvec
{
struct
dccp_ackvec
{
u8
av_buf
[
DCCPAV_MAX_ACKVEC_LEN
];
u64
av_buf_ackno
;
u16
av_buf_head
;
u16
av_buf_tail
;
u64
av_buf_ackno
:
48
;
u64
av_tail_ackno
:
48
;
bool
av_buf_nonce
[
DCCPAV_NUM_ACKVECS
];
u8
av_overflow
:
1
;
struct
list_head
av_records
;
struct
list_head
av_records
;
ktime_t
av_time
;
u16
av_buf_head
;
u16
av_vec_len
;
u8
av_buf_nonce
;
u8
av_ack_nonce
;
u8
av_buf
[
DCCP_MAX_ACKVEC_LEN
];
};
};
/** struct dccp_ackvec_record - Records information about sent Ack Vectors
/** struct dccp_ackvec_record - ack vector record
*
*
* These list entries define the additional information which the HC-Receiver
* ACK vector record as defined in Appendix A of spec.
* keeps about recently-sent Ack Vectors; again refer to RFC 4340, Appendix A.
*
*
* @avr_node: the list node in @av_records
* The list is sorted by avr_ack_seqno
* @avr_ack_seqno: sequence number of the packet the Ack Vector was sent on
* @avr_ack_ackno: the Ack number that this record/Ack Vector refers to
* @avr_ack_ptr: pointer into @av_buf where this record starts
* @avr_ack_runlen: run length of @avr_ack_ptr at the time of sending
* @avr_ack_nonce: the sum of @av_buf_nonce's at the time this record was sent
*
*
* The list as a whole is sorted in descending order by @avr_ack_seqno.
* @avr_node - node in av_records
* @avr_ack_seqno - sequence number of the packet this record was sent on
* @avr_ack_ackno - sequence number being acknowledged
* @avr_ack_ptr - pointer into av_buf where this record starts
* @avr_ack_nonce - av_ack_nonce at the time this record was sent
* @avr_sent_len - length of the record in av_buf
*/
*/
struct
dccp_ackvec_record
{
struct
dccp_ackvec_record
{
struct
list_head
avr_node
;
struct
list_head
avr_node
;
u64
avr_ack_seqno
:
48
;
u64
avr_ack_seqno
;
u64
avr_ack_ackno
:
48
;
u64
avr_ack_ackno
;
u16
avr_ack_ptr
;
u16
avr_ack_ptr
;
u8
avr_ack_runlen
;
u16
avr_sent_len
;
u8
avr_ack_nonce
:
1
;
u8
avr_ack_nonce
;
};
};
extern
int
dccp_ackvec_init
(
void
);
struct
sock
;
struct
sk_buff
;
#ifdef CONFIG_IP_DCCP_ACKVEC
extern
int
dccp_ackvec_init
(
void
);
extern
void
dccp_ackvec_exit
(
void
);
extern
void
dccp_ackvec_exit
(
void
);
extern
struct
dccp_ackvec
*
dccp_ackvec_alloc
(
const
gfp_t
priority
);
extern
struct
dccp_ackvec
*
dccp_ackvec_alloc
(
const
gfp_t
priority
);
extern
void
dccp_ackvec_free
(
struct
dccp_ackvec
*
av
);
extern
void
dccp_ackvec_free
(
struct
dccp_ackvec
*
av
);
extern
void
dccp_ackvec_input
(
struct
dccp_ackvec
*
av
,
struct
sk_buff
*
skb
);
extern
int
dccp_ackvec_add
(
struct
dccp_ackvec
*
av
,
const
struct
sock
*
sk
,
extern
int
dccp_ackvec_update_records
(
struct
dccp_ackvec
*
av
,
u64
seq
,
u8
sum
);
const
u64
ackno
,
const
u8
state
);
extern
void
dccp_ackvec_clear_state
(
struct
dccp_ackvec
*
av
,
const
u64
ackno
);
extern
u16
dccp_ackvec_buflen
(
const
struct
dccp_ackvec
*
av
);
extern
void
dccp_ackvec_check_rcv_ackno
(
struct
dccp_ackvec
*
av
,
struct
sock
*
sk
,
const
u64
ackno
);
extern
int
dccp_ackvec_parse
(
struct
sock
*
sk
,
const
struct
sk_buff
*
skb
,
u64
*
ackno
,
const
u8
opt
,
const
u8
*
value
,
const
u8
len
);
static
inline
bool
dccp_ackvec_is_empty
(
const
struct
dccp_ackvec
*
av
)
extern
int
dccp_insert_option_ackvec
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
);
static
inline
int
dccp_ackvec_pending
(
const
struct
dccp_ackvec
*
av
)
{
return
av
->
av_vec_len
;
}
#else
/* CONFIG_IP_DCCP_ACKVEC */
static
inline
int
dccp_ackvec_init
(
void
)
{
{
return
av
->
av_overflow
==
0
&&
av
->
av_buf_head
==
av
->
av_buf_tail
;
return
0
;
}
}
/**
static
inline
void
dccp_ackvec_exit
(
void
)
* struct dccp_ackvec_parsed - Record offsets of Ack Vectors in skb
{
* @vec: start of vector (offset into skb)
}
* @len: length of @vec
* @nonce: whether @vec had an ECN nonce of 0 or 1
static
inline
struct
dccp_ackvec
*
dccp_ackvec_alloc
(
const
gfp_t
priority
)
* @node: FIFO - arranged in descending order of ack_ackno
{
* This structure is used by CCIDs to access Ack Vectors in a received skb.
return
NULL
;
*/
}
struct
dccp_ackvec_parsed
{
u8
*
vec
,
static
inline
void
dccp_ackvec_free
(
struct
dccp_ackvec
*
av
)
len
,
{
nonce:
1
;
}
struct
list_head
node
;
};
static
inline
int
dccp_ackvec_add
(
struct
dccp_ackvec
*
av
,
const
struct
sock
*
sk
,
const
u64
ackno
,
const
u8
state
)
{
return
-
1
;
}
extern
int
dccp_ackvec_parsed_add
(
struct
list_head
*
head
,
static
inline
void
dccp_ackvec_check_rcv_ackno
(
struct
dccp_ackvec
*
av
,
u8
*
vec
,
u8
len
,
u8
nonce
);
struct
sock
*
sk
,
const
u64
ackno
)
extern
void
dccp_ackvec_parsed_cleanup
(
struct
list_head
*
parsed_chunks
);
{
}
static
inline
int
dccp_ackvec_parse
(
struct
sock
*
sk
,
const
struct
sk_buff
*
skb
,
const
u64
*
ackno
,
const
u8
opt
,
const
u8
*
value
,
const
u8
len
)
{
return
-
1
;
}
static
inline
int
dccp_insert_option_ackvec
(
const
struct
sock
*
sk
,
const
struct
sk_buff
*
skb
)
{
return
-
1
;
}
static
inline
int
dccp_ackvec_pending
(
const
struct
dccp_ackvec
*
av
)
{
return
0
;
}
#endif
/* CONFIG_IP_DCCP_ACKVEC */
#endif
/* _ACKVEC_H */
#endif
/* _ACKVEC_H */
net/dccp/ccid.c
...
@@ -13,13 +13,6 @@
#include "ccid.h"
#include "ccid.h"
static
u8
builtin_ccids
[]
=
{
DCCPC_CCID2
,
/* CCID2 is supported by default */
#if defined(CONFIG_IP_DCCP_CCID3) || defined(CONFIG_IP_DCCP_CCID3_MODULE)
DCCPC_CCID3
,
#endif
};
static
struct
ccid_operations
*
ccids
[
CCID_MAX
];
static
struct
ccid_operations
*
ccids
[
CCID_MAX
];
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
static
atomic_t
ccids_lockct
=
ATOMIC_INIT
(
0
);
static
atomic_t
ccids_lockct
=
ATOMIC_INIT
(
0
);
...
@@ -93,47 +86,6 @@ static void ccid_kmem_cache_destroy(struct kmem_cache *slab)
}
}
}
}
/* check that up to @array_len members in @ccid_array are supported */
bool
ccid_support_check
(
u8
const
*
ccid_array
,
u8
array_len
)
{
u8
i
,
j
,
found
;
for
(
i
=
0
,
found
=
0
;
i
<
array_len
;
i
++
,
found
=
0
)
{
for
(
j
=
0
;
!
found
&&
j
<
ARRAY_SIZE
(
builtin_ccids
);
j
++
)
found
=
(
ccid_array
[
i
]
==
builtin_ccids
[
j
]);
if
(
!
found
)
return
false
;
}
return
true
;
}
/**
* ccid_get_builtin_ccids - Provide copy of `builtin' CCID array
* @ccid_array: pointer to copy into
* @array_len: value to return length into
* This function allocates memory - caller must see that it is freed after use.
*/
int
ccid_get_builtin_ccids
(
u8
**
ccid_array
,
u8
*
array_len
)
{
*
ccid_array
=
kmemdup
(
builtin_ccids
,
sizeof
(
builtin_ccids
),
gfp_any
());
if
(
*
ccid_array
==
NULL
)
return
-
ENOBUFS
;
*
array_len
=
ARRAY_SIZE
(
builtin_ccids
);
return
0
;
}
int
ccid_getsockopt_builtin_ccids
(
struct
sock
*
sk
,
int
len
,
char
__user
*
optval
,
int
__user
*
optlen
)
{
if
(
len
<
sizeof
(
builtin_ccids
))
return
-
EINVAL
;
if
(
put_user
(
sizeof
(
builtin_ccids
),
optlen
)
||
copy_to_user
(
optval
,
builtin_ccids
,
sizeof
(
builtin_ccids
)))
return
-
EFAULT
;
return
0
;
}
int
ccid_register
(
struct
ccid_operations
*
ccid_ops
)
int
ccid_register
(
struct
ccid_operations
*
ccid_ops
)
{
{
int
err
=
-
ENOBUFS
;
int
err
=
-
ENOBUFS
;
...
@@ -196,41 +148,22 @@ int ccid_unregister(struct ccid_operations *ccid_ops)
EXPORT_SYMBOL_GPL
(
ccid_unregister
);
EXPORT_SYMBOL_GPL
(
ccid_unregister
);
/**
* ccid_request_module - Pre-load CCID module for later use
* This should be called only from process context (e.g. during connection
* setup) and is necessary for later calls to ccid_new (typically in software
* interrupt), so that it has the modules available when they are needed.
*/
static
int
ccid_request_module
(
u8
id
)
{
if
(
!
in_atomic
())
{
ccids_read_lock
();
if
(
ccids
[
id
]
==
NULL
)
{
ccids_read_unlock
();
return
request_module
(
"net-dccp-ccid-%d"
,
id
);
}
ccids_read_unlock
();
}
return
0
;
}
int
ccid_request_modules
(
u8
const
*
ccid_array
,
u8
array_len
)
{
#ifdef CONFIG_KMOD
while
(
array_len
--
)
if
(
ccid_request_module
(
ccid_array
[
array_len
]))
return
-
1
;
#endif
return
0
;
}
struct
ccid
*
ccid_new
(
unsigned
char
id
,
struct
sock
*
sk
,
int
rx
,
gfp_t
gfp
)
struct
ccid
*
ccid_new
(
unsigned
char
id
,
struct
sock
*
sk
,
int
rx
,
gfp_t
gfp
)
{
{
struct
ccid_operations
*
ccid_ops
;
struct
ccid_operations
*
ccid_ops
;
struct
ccid
*
ccid
=
NULL
;
struct
ccid
*
ccid
=
NULL
;
ccids_read_lock
();
ccids_read_lock
();
#ifdef CONFIG_KMOD
if
(
ccids
[
id
]
==
NULL
)
{
/* We only try to load if in process context */
ccids_read_unlock
();
if
(
gfp
&
GFP_ATOMIC
)
goto
out
;
request_module
(
"net-dccp-ccid-%d"
,
id
);
ccids_read_lock
();
}
#endif
ccid_ops
=
ccids
[
id
];
ccid_ops
=
ccids
[
id
];
if
(
ccid_ops
==
NULL
)
if
(
ccid_ops
==
NULL
)
goto
out_unlock
;
goto
out_unlock
;
...
@@ -272,6 +205,20 @@ struct ccid *ccid_new(unsigned char id, struct sock *sk, int rx, gfp_t gfp)
EXPORT_SYMBOL_GPL
(
ccid_new
);
EXPORT_SYMBOL_GPL
(
ccid_new
);
struct
ccid
*
ccid_hc_rx_new
(
unsigned
char
id
,
struct
sock
*
sk
,
gfp_t
gfp
)
{
return
ccid_new
(
id
,
sk
,
1
,
gfp
);
}
EXPORT_SYMBOL_GPL
(
ccid_hc_rx_new
);
struct
ccid
*
ccid_hc_tx_new
(
unsigned
char
id
,
struct
sock
*
sk
,
gfp_t
gfp
)
{
return
ccid_new
(
id
,
sk
,
0
,
gfp
);
}
EXPORT_SYMBOL_GPL
(
ccid_hc_tx_new
);
static
void
ccid_delete
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
int
rx
)
static
void
ccid_delete
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
int
rx
)
{
{
struct
ccid_operations
*
ccid_ops
;
struct
ccid_operations
*
ccid_ops
;
...
net/dccp/ccid.h
...
@@ -60,18 +60,22 @@ struct ccid_operations {
void
(
*
ccid_hc_tx_exit
)(
struct
sock
*
sk
);
void
(
*
ccid_hc_tx_exit
)(
struct
sock
*
sk
);
void
(
*
ccid_hc_rx_packet_recv
)(
struct
sock
*
sk
,
void
(
*
ccid_hc_rx_packet_recv
)(
struct
sock
*
sk
,
struct
sk_buff
*
skb
);
struct
sk_buff
*
skb
);
int
(
*
ccid_hc_rx_parse_options
)(
struct
sock
*
sk
,
u8
pkt
,
int
(
*
ccid_hc_rx_parse_options
)(
struct
sock
*
sk
,
u8
opt
,
u8
*
val
,
u8
len
);
unsigned
char
option
,
unsigned
char
len
,
u16
idx
,
unsigned
char
*
value
);
int
(
*
ccid_hc_rx_insert_options
)(
struct
sock
*
sk
,
int
(
*
ccid_hc_rx_insert_options
)(
struct
sock
*
sk
,
struct
sk_buff
*
skb
);
struct
sk_buff
*
skb
);
void
(
*
ccid_hc_tx_packet_recv
)(
struct
sock
*
sk
,
void
(
*
ccid_hc_tx_packet_recv
)(
struct
sock
*
sk
,
struct
sk_buff
*
skb
);
struct
sk_buff
*
skb
);
int
(
*
ccid_hc_tx_parse_options
)(
struct
sock
*
sk
,
u8
pkt
,
int
(
*
ccid_hc_tx_parse_options
)(
struct
sock
*
sk
,
u8
opt
,
u8
*
val
,
u8
len
);
unsigned
char
option
,
unsigned
char
len
,
u16
idx
,
unsigned
char
*
value
);
int
(
*
ccid_hc_tx_send_packet
)(
struct
sock
*
sk
,
int
(
*
ccid_hc_tx_send_packet
)(
struct
sock
*
sk
,
struct
sk_buff
*
skb
);
struct
sk_buff
*
skb
);
void
(
*
ccid_hc_tx_packet_sent
)(
struct
sock
*
sk
,
void
(
*
ccid_hc_tx_packet_sent
)(
struct
sock
*
sk
,
unsigned
int
len
);
int
more
,
unsigned
int
len
);
void
(
*
ccid_hc_rx_get_info
)(
struct
sock
*
sk
,
void
(
*
ccid_hc_rx_get_info
)(
struct
sock
*
sk
,
struct
tcp_info
*
info
);
struct
tcp_info
*
info
);
void
(
*
ccid_hc_tx_get_info
)(
struct
sock
*
sk
,
void
(
*
ccid_hc_tx_get_info
)(
struct
sock
*
sk
,
...
@@ -99,78 +103,31 @@ static inline void *ccid_priv(const struct ccid *ccid)
return
(
void
*
)
ccid
->
ccid_priv
;
return
(
void
*
)
ccid
->
ccid_priv
;
}
}
extern
bool
ccid_support_check
(
u8
const
*
ccid_array
,
u8
array_len
);
extern
int
ccid_get_builtin_ccids
(
u8
**
ccid_array
,
u8
*
array_len
);
extern
int
ccid_getsockopt_builtin_ccids
(
struct
sock
*
sk
,
int
len
,
char
__user
*
,
int
__user
*
);
extern
int
ccid_request_modules
(
u8
const
*
ccid_array
,
u8
array_len
);
extern
struct
ccid
*
ccid_new
(
unsigned
char
id
,
struct
sock
*
sk
,
int
rx
,
extern
struct
ccid
*
ccid_new
(
unsigned
char
id
,
struct
sock
*
sk
,
int
rx
,
gfp_t
gfp
);
gfp_t
gfp
);
static
inline
int
ccid_get_current_rx_ccid
(
struct
dccp_sock
*
dp
)
extern
struct
ccid
*
ccid_hc_rx_new
(
unsigned
char
id
,
struct
sock
*
sk
,
{
gfp_t
gfp
);
struct
ccid
*
ccid
=
dp
->
dccps_hc_rx_ccid
;
extern
struct
ccid
*
ccid_hc_tx_new
(
unsigned
char
id
,
struct
sock
*
sk
,
gfp_t
gfp
);
if
(
ccid
==
NULL
||
ccid
->
ccid_ops
==
NULL
)
return
-
1
;
return
ccid
->
ccid_ops
->
ccid_id
;
}
static
inline
int
ccid_get_current_tx_ccid
(
struct
dccp_sock
*
dp
)
{
struct
ccid
*
ccid
=
dp
->
dccps_hc_tx_ccid
;
if
(
ccid
==
NULL
||
ccid
->
ccid_ops
==
NULL
)
return
-
1
;
return
ccid
->
ccid_ops
->
ccid_id
;
}
extern
void
ccid_hc_rx_delete
(
struct
ccid
*
ccid
,
struct
sock
*
sk
);
extern
void
ccid_hc_rx_delete
(
struct
ccid
*
ccid
,
struct
sock
*
sk
);
extern
void
ccid_hc_tx_delete
(
struct
ccid
*
ccid
,
struct
sock
*
sk
);
extern
void
ccid_hc_tx_delete
(
struct
ccid
*
ccid
,
struct
sock
*
sk
);
/*
* Congestion control of queued data packets via CCID decision.
*
* The TX CCID performs its congestion-control by indicating whether and when a
* queued packet may be sent, using the return code of ccid_hc_tx_send_packet().
* The following modes are supported via the symbolic constants below:
* - timer-based pacing (CCID returns a delay value in milliseconds);
* - autonomous dequeueing (CCID internally schedules dccps_xmitlet).
*/
enum
ccid_dequeueing_decision
{
CCID_PACKET_SEND_AT_ONCE
=
0x00000
,
/* "green light": no delay */
CCID_PACKET_DELAY_MAX
=
0x0FFFF
,
/* maximum delay in msecs */
CCID_PACKET_DELAY
=
0x10000
,
/* CCID msec-delay mode */
CCID_PACKET_WILL_DEQUEUE_LATER
=
0x20000
,
/* CCID autonomous mode */
CCID_PACKET_ERR
=
0xF0000
,
/* error condition */
};
static
inline
int
ccid_packet_dequeue_eval
(
const
int
return_code
)
{
if
(
return_code
<
0
)
return
CCID_PACKET_ERR
;
if
(
return_code
==
0
)
return
CCID_PACKET_SEND_AT_ONCE
;
if
(
return_code
<=
CCID_PACKET_DELAY_MAX
)
return
CCID_PACKET_DELAY
;
return
return_code
;
}
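The return-code convention of `ccid_packet_dequeue_eval()` can be tried outside the kernel. The following is a minimal user-space sketch, assuming the enum values shown in this hunk; `classify()` mirrors the kernel helper, while `delay_msecs()` is a hypothetical helper added only for illustration.

```c
#include <assert.h>

/* Values copied from enum ccid_dequeueing_decision in this hunk. */
enum ccid_dequeueing_decision {
	CCID_PACKET_SEND_AT_ONCE       = 0x00000, /* no delay */
	CCID_PACKET_DELAY_MAX          = 0x0FFFF, /* maximum delay in msecs */
	CCID_PACKET_DELAY              = 0x10000, /* msec-delay mode */
	CCID_PACKET_WILL_DEQUEUE_LATER = 0x20000, /* autonomous mode */
	CCID_PACKET_ERR                = 0xF0000, /* error condition */
};

/* Mirrors ccid_packet_dequeue_eval(): map a raw return code to a decision. */
static int classify(int return_code)
{
	if (return_code < 0)
		return CCID_PACKET_ERR;
	if (return_code == 0)
		return CCID_PACKET_SEND_AT_ONCE;
	if (return_code <= CCID_PACKET_DELAY_MAX)
		return CCID_PACKET_DELAY;
	return return_code;
}

/* Hypothetical helper: delay to wait in msecs, or 0 if no timer is needed. */
static int delay_msecs(int return_code)
{
	return classify(return_code) == CCID_PACKET_DELAY ? return_code : 0;
}
```

Any positive value up to `CCID_PACKET_DELAY_MAX` is read as a delay in milliseconds; larger values pass through unchanged, so a CCID can return `CCID_PACKET_WILL_DEQUEUE_LATER` directly.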
static
inline
int
ccid_hc_tx_send_packet
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
static
inline
int
ccid_hc_tx_send_packet
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
struct
sk_buff
*
skb
)
{
{
int
rc
=
0
;
if
(
ccid
->
ccid_ops
->
ccid_hc_tx_send_packet
!=
NULL
)
if
(
ccid
->
ccid_ops
->
ccid_hc_tx_send_packet
!=
NULL
)
return
ccid
->
ccid_ops
->
ccid_hc_tx_send_packet
(
sk
,
skb
);
rc
=
ccid
->
ccid_ops
->
ccid_hc_tx_send_packet
(
sk
,
skb
);
return
CCID_PACKET_SEND_AT_ONCE
;
return
rc
;
}
}
static
inline
void
ccid_hc_tx_packet_sent
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
static
inline
void
ccid_hc_tx_packet_sent
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
unsigned
int
len
)
int
more
,
unsigned
int
len
)
{
{
if
(
ccid
->
ccid_ops
->
ccid_hc_tx_packet_sent
!=
NULL
)
if
(
ccid
->
ccid_ops
->
ccid_hc_tx_packet_sent
!=
NULL
)
ccid
->
ccid_ops
->
ccid_hc_tx_packet_sent
(
sk
,
len
);
ccid
->
ccid_ops
->
ccid_hc_tx_packet_sent
(
sk
,
more
,
len
);
}
}
static
inline
void
ccid_hc_rx_packet_recv
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
static
inline
void
ccid_hc_rx_packet_recv
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
...
@@ -187,31 +144,27 @@ static inline void ccid_hc_tx_packet_recv(struct ccid *ccid, struct sock *sk,
ccid
->
ccid_ops
->
ccid_hc_tx_packet_recv
(
sk
,
skb
);
ccid
->
ccid_ops
->
ccid_hc_tx_packet_recv
(
sk
,
skb
);
}
}
/**
* ccid_hc_tx_parse_options - Parse CCID-specific options sent by the receiver
* @pkt: type of packet that @opt appears on (RFC 4340, 5.1)
* @opt: the CCID-specific option type (RFC 4340, 5.8 and 10.3)
* @val: value of @opt
* @len: length of @val in bytes
*/
static
inline
int
ccid_hc_tx_parse_options
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
static
inline
int
ccid_hc_tx_parse_options
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
u8
pkt
,
u8
opt
,
u8
*
val
,
u8
len
)
unsigned
char
option
,
unsigned
char
len
,
u16
idx
,
unsigned
char
*
value
)
{
{
if
(
ccid
->
ccid_ops
->
ccid_hc_tx_parse_options
==
NULL
)
int
rc
=
0
;
return
0
;
if
(
ccid
->
ccid_ops
->
ccid_hc_tx_parse_options
!=
NULL
)
return
ccid
->
ccid_ops
->
ccid_hc_tx_parse_options
(
sk
,
pkt
,
opt
,
val
,
len
);
rc
=
ccid
->
ccid_ops
->
ccid_hc_tx_parse_options
(
sk
,
option
,
len
,
idx
,
value
);
return
rc
;
}
}
/**
* ccid_hc_rx_parse_options - Parse CCID-specific options sent by the sender
* Arguments are analogous to ccid_hc_tx_parse_options()
*/
static
inline
int
ccid_hc_rx_parse_options
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
static
inline
int
ccid_hc_rx_parse_options
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
u8
pkt
,
u8
opt
,
u8
*
val
,
u8
len
)
unsigned
char
option
,
unsigned
char
len
,
u16
idx
,
unsigned
char
*
value
)
{
{
if
(
ccid
->
ccid_ops
->
ccid_hc_rx_parse_options
==
NULL
)
int
rc
=
0
;
return
0
;
if
(
ccid
->
ccid_ops
->
ccid_hc_rx_parse_options
!=
NULL
)
return
ccid
->
ccid_ops
->
ccid_hc_rx_parse_options
(
sk
,
pkt
,
opt
,
val
,
len
);
rc
=
ccid
->
ccid_ops
->
ccid_hc_rx_parse_options
(
sk
,
option
,
len
,
idx
,
value
);
return
rc
;
}
}
static
inline
int
ccid_hc_rx_insert_options
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
static
inline
int
ccid_hc_rx_insert_options
(
struct
ccid
*
ccid
,
struct
sock
*
sk
,
...
...
net/dccp/ccids/Kconfig
menu "DCCP CCIDs Configuration (EXPERIMENTAL)"
menu "DCCP CCIDs Configuration (EXPERIMENTAL)"
depends on EXPERIMENTAL
config IP_DCCP_CCID2
config IP_DCCP_CCID2
tristate "CCID2 (TCP-Like)"
tristate "CCID2 (TCP-Like) (EXPERIMENTAL)"
def_tristate IP_DCCP
def_tristate IP_DCCP
select IP_DCCP_ACKVEC
---help---
---help---
CCID 2, TCP-like Congestion Control, denotes Additive Increase,
CCID 2, TCP-like Congestion Control, denotes Additive Increase,
Multiplicative Decrease (AIMD) congestion control with behavior
Multiplicative Decrease (AIMD) congestion control with behavior
...
@@ -34,7 +36,7 @@ config IP_DCCP_CCID2_DEBUG
If in doubt, say N.
If in doubt, say N.
config IP_DCCP_CCID3
config IP_DCCP_CCID3
tristate "CCID3 (TCP-Friendly)"
tristate "CCID3 (TCP-Friendly) (EXPERIMENTAL)"
def_tristate IP_DCCP
def_tristate IP_DCCP
select IP_DCCP_TFRC_LIB
select IP_DCCP_TFRC_LIB
---help---
---help---
...
@@ -62,9 +64,9 @@ config IP_DCCP_CCID3
If in doubt, say M.
If in doubt, say M.
if IP_DCCP_CCID3
config IP_DCCP_CCID3_DEBUG
config IP_DCCP_CCID3_DEBUG
bool "CCID3 debugging messages"
bool "CCID3 debugging messages"
depends on IP_DCCP_CCID3
---help---
---help---
Enable CCID3-specific debugging messages.
Enable CCID3-specific debugging messages.
...
@@ -74,29 +76,10 @@ config IP_DCCP_CCID3_DEBUG
If in doubt, say N.
If in doubt, say N.
choice
prompt "Select method for measuring the packet size s"
default IP_DCCP_CCID3_MEASURE_S_AS_MPS
config IP_DCCP_CCID3_MEASURE_S_AS_MPS
bool "Always use MPS in place of s"
---help---
This use is recommended as it is consistent with the initialisation
of X and suggested when s varies (rfc3448bis, (1) in section 4.1).
config IP_DCCP_CCID3_MEASURE_S_AS_AVG
bool "Use moving average"
---help---
An alternative way of tracking s, also supported by rfc3448bis.
This used to be the default for CCID-3 in previous kernels.
config IP_DCCP_CCID3_MEASURE_S_AS_MAX
bool "Track the maximum payload length"
---help---
An experimental method based on tracking the maximum packet size.
endchoice
config IP_DCCP_CCID3_RTO
config IP_DCCP_CCID3_RTO
int "Use higher bound for nofeedback timer"
int "Use higher bound for nofeedback timer"
default 100
default 100
depends on IP_DCCP_CCID3 && EXPERIMENTAL
---help---
---help---
Use higher lower bound for nofeedback timer expiration.
Use higher lower bound for nofeedback timer expiration.
...
@@ -123,7 +106,6 @@ config IP_DCCP_CCID3_RTO
The purpose of the nofeedback timer is to slow DCCP down when there
The purpose of the nofeedback timer is to slow DCCP down when there
is serious network congestion: experimenting with larger values should
is serious network congestion: experimenting with larger values should
therefore not be performed on WANs.
therefore not be performed on WANs.
endif # IP_DCCP_CCID3
config IP_DCCP_TFRC_LIB
config IP_DCCP_TFRC_LIB
tristate
tristate
...
...
net/dccp/ccids/ccid2.c
...
@@ -25,7 +25,7 @@
/*
/*
* This implementation should follow RFC 4341
* This implementation should follow RFC 4341
*/
*/
#include "../feat.h"
#include "../ccid.h"
#include "../ccid.h"
#include "../dccp.h"
#include "../dccp.h"
#include "ccid2.h"
#include "ccid2.h"
...
@@ -34,8 +34,51 @@
#ifdef CONFIG_IP_DCCP_CCID2_DEBUG
#ifdef CONFIG_IP_DCCP_CCID2_DEBUG
static
int
ccid2_debug
;
static
int
ccid2_debug
;
#define ccid2_pr_debug(format, a...) DCCP_PR_DEBUG(ccid2_debug, format, ##a)
#define ccid2_pr_debug(format, a...) DCCP_PR_DEBUG(ccid2_debug, format, ##a)
static
void
ccid2_hc_tx_check_sanity
(
const
struct
ccid2_hc_tx_sock
*
hctx
)
{
int
len
=
0
;
int
pipe
=
0
;
struct
ccid2_seq
*
seqp
=
hctx
->
ccid2hctx_seqh
;
/* there is data in the chain */
if
(
seqp
!=
hctx
->
ccid2hctx_seqt
)
{
seqp
=
seqp
->
ccid2s_prev
;
len
++
;
if
(
!
seqp
->
ccid2s_acked
)
pipe
++
;
while
(
seqp
!=
hctx
->
ccid2hctx_seqt
)
{
struct
ccid2_seq
*
prev
=
seqp
->
ccid2s_prev
;
len
++
;
if
(
!
prev
->
ccid2s_acked
)
pipe
++
;
/* packets are sent sequentially */
BUG_ON
(
dccp_delta_seqno
(
seqp
->
ccid2s_seq
,
prev
->
ccid2s_seq
)
>=
0
);
BUG_ON
(
time_before
(
seqp
->
ccid2s_sent
,
prev
->
ccid2s_sent
));
seqp
=
prev
;
}
}
BUG_ON
(
pipe
!=
hctx
->
ccid2hctx_pipe
);
ccid2_pr_debug
(
"len of chain=%d
\n
"
,
len
);
do
{
seqp
=
seqp
->
ccid2s_prev
;
len
++
;
}
while
(
seqp
!=
hctx
->
ccid2hctx_seqh
);
ccid2_pr_debug
(
"total len=%d
\n
"
,
len
);
BUG_ON
(
len
!=
hctx
->
ccid2hctx_seqbufc
*
CCID2_SEQBUF_LEN
);
}
#else
#else
#define ccid2_pr_debug(format, a...)
#define ccid2_pr_debug(format, a...)
#define ccid2_hc_tx_check_sanity(hctx)
#endif
#endif
static
int
ccid2_hc_tx_alloc_seq
(
struct
ccid2_hc_tx_sock
*
hctx
)
static
int
ccid2_hc_tx_alloc_seq
(
struct
ccid2_hc_tx_sock
*
hctx
)
...
@@ -44,7 +87,8 @@ static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx)
int
i
;
int
i
;
/* check if we have space to preserve the pointer to the buffer */
/* check if we have space to preserve the pointer to the buffer */
if
(
hctx
->
seqbufc
>=
sizeof
(
hctx
->
seqbuf
)
/
sizeof
(
struct
ccid2_seq
*
))
if
(
hctx
->
ccid2hctx_seqbufc
>=
(
sizeof
(
hctx
->
ccid2hctx_seqbuf
)
/
sizeof
(
struct
ccid2_seq
*
)))
return
-
ENOMEM
;
return
-
ENOMEM
;
/* allocate buffer and initialize linked list */
/* allocate buffer and initialize linked list */
...
@@ -60,35 +104,38 @@ static int ccid2_hc_tx_alloc_seq(struct ccid2_hc_tx_sock *hctx)
seqp
->
ccid2s_prev
=
&
seqp
[
CCID2_SEQBUF_LEN
-
1
];
seqp
->
ccid2s_prev
=
&
seqp
[
CCID2_SEQBUF_LEN
-
1
];
/* This is the first allocation. Initiate the head and tail. */
/* This is the first allocation. Initiate the head and tail. */
if
(
hctx
->
seqbufc
==
0
)
if
(
hctx
->
ccid2hctx_seqbufc
==
0
)
hctx
->
seqh
=
hctx
->
seqt
=
seqp
;
hctx
->
ccid2hctx_seqh
=
hctx
->
ccid2hctx_seqt
=
seqp
;
else
{
else
{
/* link the existing list with the one we just created */
/* link the existing list with the one we just created */
hctx
->
seqh
->
ccid2s_next
=
seqp
;
hctx
->
ccid2hctx_seqh
->
ccid2s_next
=
seqp
;
seqp
->
ccid2s_prev
=
hctx
->
seqh
;
seqp
->
ccid2s_prev
=
hctx
->
ccid2hctx_seqh
;
hctx
->
seqt
->
ccid2s_prev
=
&
seqp
[
CCID2_SEQBUF_LEN
-
1
];
hctx
->
ccid2hctx_seqt
->
ccid2s_prev
=
&
seqp
[
CCID2_SEQBUF_LEN
-
1
];
seqp
[
CCID2_SEQBUF_LEN
-
1
].
ccid2s_next
=
hctx
->
seqt
;
seqp
[
CCID2_SEQBUF_LEN
-
1
].
ccid2s_next
=
hctx
->
ccid2hctx_seqt
;
}
}
/* store the original pointer to the buffer so we can free it */
/* store the original pointer to the buffer so we can free it */
hctx
->
seqbuf
[
hctx
->
seqbufc
]
=
seqp
;
hctx
->
ccid2hctx_seqbuf
[
hctx
->
ccid2hctx_seqbufc
]
=
seqp
;
hctx
->
seqbufc
++
;
hctx
->
ccid2hctx_seqbufc
++
;
return
0
;
return
0
;
}
}
static
int
ccid2_hc_tx_send_packet
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
int
ccid2_hc_tx_send_packet
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
if
(
ccid2_cwnd_network_limited
(
ccid2_hc_tx_sk
(
sk
)))
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
return
CCID_PACKET_WILL_DEQUEUE_LATER
;
return
CCID_PACKET_SEND_AT_ONCE
;
if
(
hctx
->
ccid2hctx_pipe
<
hctx
->
ccid2hctx_cwnd
)
return
0
;
return
1
;
/* XXX CCID should dequeue when ready instead of polling */
}
}
static
void
ccid2_change_l_ack_ratio
(
struct
sock
*
sk
,
u32
val
)
static
void
ccid2_change_l_ack_ratio
(
struct
sock
*
sk
,
u32
val
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
u32
max_ratio
=
DIV_ROUND_UP
(
ccid2_hc_tx_sk
(
sk
)
->
cwnd
,
2
);
u32
max_ratio
=
DIV_ROUND_UP
(
ccid2_hc_tx_sk
(
sk
)
->
ccid2hctx_cwnd
,
2
);
/*
/*
* Ensure that Ack Ratio does not exceed ceil(cwnd/2), which is (2) from
* Ensure that Ack Ratio does not exceed ceil(cwnd/2), which is (2) from
...
@@ -100,8 +147,8 @@ static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
DCCP_WARN
(
"Limiting Ack Ratio (%u) to %u
\n
"
,
val
,
max_ratio
);
DCCP_WARN
(
"Limiting Ack Ratio (%u) to %u
\n
"
,
val
,
max_ratio
);
val
=
max_ratio
;
val
=
max_ratio
;
}
}
if
(
val
>
DCCPF_ACK_RATIO_MAX
)
if
(
val
>
0xFFFF
)
/* RFC 4340, 11.3 */
val
=
DCCPF_ACK_RATIO_MAX
;
val
=
0xFFFF
;
if
(
val
==
dp
->
dccps_l_ack_ratio
)
if
(
val
==
dp
->
dccps_l_ack_ratio
)
return
;
return
;
...
@@ -110,77 +157,99 @@ static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
dp
->
dccps_l_ack_ratio
=
val
;
dp
->
dccps_l_ack_ratio
=
val
;
}
}
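The Ack Ratio limits applied in `ccid2_change_l_ack_ratio()` can be summarised in a few lines. This is a rough sketch only: the condition guarding the `DCCP_WARN()` call is elided in this diff, so the check used here (clamp only values above `ceil(cwnd/2)`) is an assumption, and `clamp_ack_ratio()` and `ACK_RATIO_MAX` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Ack Ratio is carried as a 16-bit value (RFC 4340, 11.3). */
#define ACK_RATIO_MAX 0xFFFFu

/* Hypothetical helper: return the Ack Ratio that would actually be used. */
static uint32_t clamp_ack_ratio(uint32_t val, uint32_t cwnd)
{
	/* ceil(cwnd / 2), written so that it cannot overflow */
	uint32_t max_ratio = cwnd / 2 + cwnd % 2;

	/* Assumption: the elided check only limits values above max_ratio. */
	if (val > max_ratio)
		val = max_ratio;
	/* Upper bound that appears as 0xFFFF / DCCPF_ACK_RATIO_MAX in the diff. */
	if (val > ACK_RATIO_MAX)
		val = ACK_RATIO_MAX;
	return val;
}
```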
static
void
ccid2_change_srtt
(
struct
ccid2_hc_tx_sock
*
hctx
,
long
val
)
{
ccid2_pr_debug
(
"change SRTT to %ld
\n
"
,
val
);
hctx
->
ccid2hctx_srtt
=
val
;
}
static
void
ccid2_start_rto_timer
(
struct
sock
*
sk
);
static
void
ccid2_hc_tx_rto_expire
(
unsigned
long
data
)
static
void
ccid2_hc_tx_rto_expire
(
unsigned
long
data
)
{
{
struct
sock
*
sk
=
(
struct
sock
*
)
data
;
struct
sock
*
sk
=
(
struct
sock
*
)
data
;
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
const
bool
sender_was_blocked
=
ccid2_cwnd_network_limited
(
hctx
)
;
long
s
;
bh_lock_sock
(
sk
);
bh_lock_sock
(
sk
);
if
(
sock_owned_by_user
(
sk
))
{
if
(
sock_owned_by_user
(
sk
))
{
sk_reset_timer
(
sk
,
&
hctx
->
rtotimer
,
jiffies
+
HZ
/
5
);
sk_reset_timer
(
sk
,
&
hctx
->
ccid2hctx_rtotimer
,
jiffies
+
HZ
/
5
);
goto
out
;
goto
out
;
}
}
ccid2_pr_debug
(
"RTO_EXPIRE
\n
"
);
ccid2_pr_debug
(
"RTO_EXPIRE
\n
"
);
ccid2_hc_tx_check_sanity
(
hctx
);
/* back-off timer */
/* back-off timer */
hctx
->
rto
<<=
1
;
hctx
->
ccid2hctx_rto
<<=
1
;
if
(
hctx
->
rto
>
DCCP_RTO_MAX
)
hctx
->
rto
=
DCCP_RTO_MAX
;
s
=
hctx
->
ccid2hctx_rto
/
HZ
;
if
(
s
>
60
)
hctx
->
ccid2hctx_rto
=
60
*
HZ
;
ccid2_start_rto_timer
(
sk
);
/* adjust pipe, cwnd etc */
/* adjust pipe, cwnd etc */
hctx
->
ssthresh
=
hctx
->
cwnd
/
2
;
hctx
->
ccid2hctx_ssthresh
=
hctx
->
ccid2hctx_cwnd
/
2
;
if
(
hctx
->
ssthresh
<
2
)
if
(
hctx
->
ccid2hctx_ssthresh
<
2
)
hctx
->
ssthresh
=
2
;
hctx
->
ccid2hctx_ssthresh
=
2
;
hctx
->
cwnd
=
1
;
hctx
->
ccid2hctx_cwnd
=
1
;
hctx
->
pipe
=
0
;
hctx
->
ccid2hctx_pipe
=
0
;
/* clear state about stuff we sent */
/* clear state about stuff we sent */
hctx
->
seqt
=
hctx
->
seqh
;
hctx
->
ccid2hctx_seqt
=
hctx
->
ccid2hctx_seqh
;
hctx
->
packets_acked
=
0
;
hctx
->
ccid2hctx_packets_acked
=
0
;
/* clear ack ratio state. */
/* clear ack ratio state. */
hctx
->
rpseq
=
0
;
hctx
->
ccid2hctx_rpseq
=
0
;
hctx
->
rpdupack
=
-
1
;
hctx
->
ccid2hctx_rpdupack
=
-
1
;
ccid2_change_l_ack_ratio
(
sk
,
1
);
ccid2_change_l_ack_ratio
(
sk
,
1
);
ccid2_hc_tx_check_sanity
(
hctx
);
/* if we were blocked before, we may now send cwnd=1 packet */
if
(
sender_was_blocked
)
tasklet_schedule
(
&
dccp_sk
(
sk
)
->
dccps_xmitlet
);
/* restart backed-off timer */
sk_reset_timer
(
sk
,
&
hctx
->
rtotimer
,
jiffies
+
hctx
->
rto
);
out:
out:
bh_unlock_sock
(
sk
);
bh_unlock_sock
(
sk
);
sock_put
(
sk
);
sock_put
(
sk
);
}
}
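The state changes made when the RTO timer fires can be isolated from the locking and timer plumbing. The sketch below is a user-space approximation under stated assumptions: `struct tx_state`, `on_rto_expire()` and the `rto_max` parameter are hypothetical stand-ins for the socket fields and for the cap (`DCCP_RTO_MAX` on one side of the diff, `60 * HZ` on the other).

```c
#include <assert.h>

/* Hypothetical subset of the CCID-2 sender state touched on RTO expiry. */
struct tx_state {
	unsigned long rto;      /* retransmission timeout in ticks */
	unsigned int  cwnd;     /* congestion window in packets */
	unsigned int  ssthresh; /* slow-start threshold in packets */
	unsigned int  pipe;     /* packets believed to be in flight */
};

static void on_rto_expire(struct tx_state *s, unsigned long rto_max)
{
	/* back-off timer: double the RTO, bounded by the cap */
	s->rto <<= 1;
	if (s->rto > rto_max)
		s->rto = rto_max;

	/* adjust pipe, cwnd etc: halve ssthresh (at least 2), restart at 1 */
	s->ssthresh = s->cwnd / 2;
	if (s->ssthresh < 2)
		s->ssthresh = 2;
	s->cwnd = 1;
	s->pipe = 0;
}
```

With `cwnd` reset to 1 and `pipe` to 0, a sender that was blocked by the window may transmit one packet again, which is why one side of the diff schedules `dccps_xmitlet` after the reset.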
static
void
ccid2_hc_tx_packet_sent
(
struct
sock
*
sk
,
unsigned
int
len
)
static
void
ccid2_start_rto_timer
(
struct
sock
*
sk
)
{
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
ccid2_pr_debug
(
"setting RTO timeout=%ld
\n
"
,
hctx
->
ccid2hctx_rto
);
BUG_ON
(
timer_pending
(
&
hctx
->
ccid2hctx_rtotimer
));
sk_reset_timer
(
sk
,
&
hctx
->
ccid2hctx_rtotimer
,
jiffies
+
hctx
->
ccid2hctx_rto
);
}
static
void
ccid2_hc_tx_packet_sent
(
struct
sock
*
sk
,
int
more
,
unsigned
int
len
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_seq
*
next
;
struct
ccid2_seq
*
next
;
hctx
->
pipe
++
;
hctx
->
ccid2hctx_pipe
++
;
hctx
->
seqh
->
ccid2s_seq
=
dp
->
dccps_gss
;
hctx
->
ccid2hctx_seqh
->
ccid2s_seq
=
dp
->
dccps_gss
;
hctx
->
seqh
->
ccid2s_acked
=
0
;
hctx
->
ccid2hctx_seqh
->
ccid2s_acked
=
0
;
hctx
->
seqh
->
ccid2s_sent
=
jiffies
;
hctx
->
ccid2hctx_seqh
->
ccid2s_sent
=
jiffies
;
next
=
hctx
->
seqh
->
ccid2s_next
;
next
=
hctx
->
ccid2hctx_seqh
->
ccid2s_next
;
/* check if we need to alloc more space */
/* check if we need to alloc more space */
if
(
next
==
hctx
->
seqt
)
{
if
(
next
==
hctx
->
ccid2hctx_seqt
)
{
if
(
ccid2_hc_tx_alloc_seq
(
hctx
))
{
if
(
ccid2_hc_tx_alloc_seq
(
hctx
))
{
DCCP_CRIT
(
"packet history - out of memory!"
);
DCCP_CRIT
(
"packet history - out of memory!"
);
/* FIXME: find a more graceful way to bail out */
/* FIXME: find a more graceful way to bail out */
return
;
return
;
}
}
next
=
hctx
->
seqh
->
ccid2s_next
;
next
=
hctx
->
ccid2hctx_seqh
->
ccid2s_next
;
BUG_ON
(
next
==
hctx
->
seqt
);
BUG_ON
(
next
==
hctx
->
ccid2hctx_seqt
);
}
}
hctx
->
seqh
=
next
;
hctx
->
ccid2hctx_seqh
=
next
;
ccid2_pr_debug
(
"cwnd=%d pipe=%d
\n
"
,
hctx
->
cwnd
,
hctx
->
pipe
);
ccid2_pr_debug
(
"cwnd=%d pipe=%d
\n
"
,
hctx
->
ccid2hctx_cwnd
,
hctx
->
ccid2hctx_pipe
);
/*
/*
* FIXME: The code below is broken and the variables have been removed
* FIXME: The code below is broken and the variables have been removed
...
@@ -203,12 +272,12 @@ static void ccid2_hc_tx_packet_sent(struct sock *sk, unsigned int len)
*/
*/
#if 0
#if 0
/* Ack Ratio. Need to maintain a concept of how many windows we sent */
/* Ack Ratio. Need to maintain a concept of how many windows we sent */
hctx->arsent++;
hctx->ccid2hctx_arsent++;
/* We had an ack loss in this window... */
/* We had an ack loss in this window... */
if (hctx->ackloss) {
if (hctx->ccid2hctx_ackloss) {
if (hctx->arsent >= hctx->cwnd) {
if (hctx->ccid2hctx_arsent >= hctx->ccid2hctx_cwnd) {
hctx->arsent = 0;
hctx->ccid2hctx_arsent = 0;
hctx->ackloss = 0;
hctx->ccid2hctx_ackloss = 0;
}
}
} else {
} else {
/* No acks lost up to now... */
/* No acks lost up to now... */
...
@@ -218,28 +287,28 @@ static void ccid2_hc_tx_packet_sent(struct sock *sk, unsigned int len)
int denom = dp->dccps_l_ack_ratio * dp->dccps_l_ack_ratio -
int denom = dp->dccps_l_ack_ratio * dp->dccps_l_ack_ratio -
dp->dccps_l_ack_ratio;
dp->dccps_l_ack_ratio;
denom = hctx->cwnd * hctx->cwnd / denom;
denom = hctx->ccid2hctx_cwnd * hctx->ccid2hctx_cwnd / denom;
if (hctx->arsent >= denom) {
if (hctx->ccid2hctx_arsent >= denom) {
ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio - 1);
ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio - 1);
hctx->arsent = 0;
hctx->ccid2hctx_arsent = 0;
}
}
} else {
} else {
/* we can't increase ack ratio further [1] */
/* we can't increase ack ratio further [1] */
hctx->arsent = 0; /* or maybe set it to cwnd*/
hctx->ccid2hctx_arsent = 0; /* or maybe set it to cwnd*/
}
}
}
}
#endif
#endif
/* setup RTO timer */
/* setup RTO timer */
if
(
!
timer_pending
(
&
hctx
->
rtotimer
))
if
(
!
timer_pending
(
&
hctx
->
ccid2hctx_rtotimer
))
sk_reset_timer
(
sk
,
&
hctx
->
rtotimer
,
jiffies
+
hctx
->
rto
);
ccid2_start_rto_timer
(
sk
);
#ifdef CONFIG_IP_DCCP_CCID2_DEBUG
#ifdef CONFIG_IP_DCCP_CCID2_DEBUG
do
{
do
{
struct
ccid2_seq
*
seqp
=
hctx
->
seqt
;
struct
ccid2_seq
*
seqp
=
hctx
->
ccid2hctx_seqt
;
while
(
seqp
!=
hctx
->
seqh
)
{
while
(
seqp
!=
hctx
->
ccid2hctx_seqh
)
{
ccid2_pr_debug
(
"out seq=%llu acked=%d time=%lu
\n
"
,
ccid2_pr_debug
(
"out seq=%llu acked=%d time=%lu
\n
"
,
(
unsigned
long
long
)
seqp
->
ccid2s_seq
,
(
unsigned
long
long
)
seqp
->
ccid2s_seq
,
seqp
->
ccid2s_acked
,
seqp
->
ccid2s_sent
);
seqp
->
ccid2s_acked
,
seqp
->
ccid2s_sent
);
...
@@ -247,158 +316,205 @@ static void ccid2_hc_tx_packet_sent(struct sock *sk, unsigned int len)
}
}
}
while
(
0
);
}
while
(
0
);
ccid2_pr_debug
(
"=========
\n
"
);
ccid2_pr_debug
(
"=========
\n
"
);
ccid2_hc_tx_check_sanity
(
hctx
);
#endif
#endif
}
}
/**
/* XXX Lame code duplication!
* ccid2_rtt_estimator - Sample RTT and compute RTO using RFC2988 algorithm
* returns -1 if none was found.
* This code is almost identical with TCP's tcp_rtt_estimator(), since
* else returns the next offset to use in the function call.
* - it has a higher sampling frequency (recommended by RFC 1323),
* - the RTO does not collapse into RTT due to RTTVAR going towards zero,
* - it is simple (cf. more complex proposals such as Eifel timer or research
* which suggests that the gain should be set according to window size),
* - in tests it was found to work well with CCID2 [gerrit].
*/
*/
static
void
ccid2_rtt_estimator
(
struct
sock
*
sk
,
const
long
mrtt
)
static
int
ccid2_ackvector
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
,
int
offset
,
unsigned
char
**
vec
,
unsigned
char
*
veclen
)
{
{
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
const
struct
dccp_hdr
*
dh
=
dccp_hdr
(
skb
);
long
m
=
mrtt
?
:
1
;
unsigned
char
*
options
=
(
unsigned
char
*
)
dh
+
dccp_hdr_len
(
skb
);
unsigned
char
*
opt_ptr
;
if
(
hctx
->
srtt
==
0
)
{
const
unsigned
char
*
opt_end
=
(
unsigned
char
*
)
dh
+
/* First measurement m */
(
dh
->
dccph_doff
*
4
);
hctx
->
srtt
=
m
<<
3
;
unsigned
char
opt
,
len
;
hctx
->
mdev
=
m
<<
1
;
unsigned
char
*
value
;
hctx
->
mdev_max
=
max
(
TCP_RTO_MIN
,
hctx
->
mdev
);
BUG_ON
(
offset
<
0
);
hctx
->
rttvar
=
hctx
->
mdev_max
;
options
+=
offset
;
hctx
->
rtt_seq
=
dccp_sk
(
sk
)
->
dccps_gss
;
opt_ptr
=
options
;
}
else
{
if
(
opt_ptr
>=
opt_end
)
/* Update scaled SRTT as SRTT += 1/8 * (m - SRTT) */
return
-
1
;
m
-=
(
hctx
->
srtt
>>
3
);
hctx
->
srtt
+=
m
;
while
(
opt_ptr
!=
opt_end
)
{
opt
=
*
opt_ptr
++
;
/* Similarly, update scaled mdev with regard to |m| */
len
=
0
;
if
(
m
<
0
)
{
value
=
NULL
;
m
=
-
m
;
m
-=
(
hctx
->
mdev
>>
2
);
/* Check if this isn't a single byte option */
if
(
opt
>
DCCPO_MAX_RESERVED
)
{
if
(
opt_ptr
==
opt_end
)
goto
out_invalid_option
;
len
=
*
opt_ptr
++
;
if
(
len
<
3
)
goto
out_invalid_option
;
/*
/*
* This neutralises RTO increase when RTT < SRTT - mdev
* Remove the type and len fields, leaving
* (see P. Sarolahti, A. Kuznetsov,"Congestion Control
* just the value size
* in Linux TCP", USENIX 2002, pp. 49-62).
*/
*/
if
(
m
>
0
)
len
-=
2
;
m
>>=
3
;
value
=
opt_ptr
;
}
else
{
opt_ptr
+=
len
;
m
-=
(
hctx
->
mdev
>>
2
);
}
hctx
->
mdev
+=
m
;
if
(
hctx
->
mdev
>
hctx
->
mdev_max
)
{
if
(
opt_ptr
>
opt_end
)
hctx
->
mdev_max
=
hctx
->
mdev
;
goto
out_invalid_option
;
if
(
hctx
->
mdev_max
>
hctx
->
rttvar
)
hctx
->
rttvar
=
hctx
->
mdev_max
;
}
}
/*
switch
(
opt
)
{
* Decay RTTVAR at most once per flight, exploiting that
case
DCCPO_ACK_VECTOR_0
:
* 1) pipe <= cwnd <= Sequence_Window = W (RFC 4340, 7.5.2)
case
DCCPO_ACK_VECTOR_1
:
* 2) AWL = GSS-W+1 <= GAR <= GSS (RFC 4340, 7.5.1)
*
vec
=
value
;
* GAR is a useful bound for FlightSize = pipe, AWL is probably
*
veclen
=
len
;
* too low as it over-estimates pipe.
return
offset
+
(
opt_ptr
-
options
);
*/
if
(
after48
(
dccp_sk
(
sk
)
->
dccps_gar
,
hctx
->
rtt_seq
))
{
if
(
hctx
->
mdev_max
<
hctx
->
rttvar
)
hctx
->
rttvar
-=
(
hctx
->
rttvar
-
hctx
->
mdev_max
)
>>
2
;
hctx
->
rtt_seq
=
dccp_sk
(
sk
)
->
dccps_gss
;
hctx
->
mdev_max
=
TCP_RTO_MIN
;
}
}
}
}
/*
return
-
1
;
* Set RTO from SRTT and RTTVAR
* Clock granularity is ignored since the minimum error for RTTVAR is
* clamped to 50msec (corresponding to HZ=20). This leads to a minimum
* RTO of 200msec. This agrees with TCP and RFC 4341, 5.: "Because DCCP
* does not retransmit data, DCCP does not require TCP's recommended
* minimum timeout of one second".
*/
hctx
->
rto
=
(
hctx
->
srtt
>>
3
)
+
hctx
->
rttvar
;
if
(
hctx
->
rto
>
DCCP_RTO_MAX
)
out_invalid_option:
hctx
->
rto
=
DCCP_RTO_MAX
;
DCCP_BUG
(
"Invalid option - this should not happen (previous parsing)!"
);
return
-
1
;
}
}
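The scaled arithmetic of `ccid2_rtt_estimator()` is easier to follow with concrete numbers. The following user-space sketch keeps the SRTT and mdev updates from the diff but deliberately leaves out the once-per-flight RTTVAR decay, since that depends on DCCP sequence numbers. `struct rtt_state`, `rtt_sample()` and the two tick constants are assumptions chosen for illustration, not kernel definitions.

```c
#include <assert.h>

#define RTO_MIN_TICKS 200L   /* assumed stand-in for TCP_RTO_MIN */
#define RTO_MAX_TICKS 64000L /* assumed stand-in for DCCP_RTO_MAX */

struct rtt_state {
	long srtt;     /* smoothed RTT, scaled by 8 */
	long mdev;     /* mean deviation, scaled by 4 */
	long mdev_max; /* largest mdev seen so far in this sketch */
	long rttvar;   /* variance term used for the RTO */
	long rto;      /* resulting retransmission timeout */
};

static void rtt_sample(struct rtt_state *s, long mrtt)
{
	long m = mrtt ? mrtt : 1; /* never feed a zero sample */

	if (s->srtt == 0) {
		/* First measurement m */
		s->srtt = m << 3;
		s->mdev = m << 1;
		s->mdev_max = s->mdev > RTO_MIN_TICKS ? s->mdev : RTO_MIN_TICKS;
		s->rttvar = s->mdev_max;
	} else {
		/* SRTT += 1/8 * (m - SRTT), on the scaled value */
		m -= s->srtt >> 3;
		s->srtt += m;

		/* mdev += 1/4 * (|m| - mdev), damped when RTT < SRTT - mdev */
		if (m < 0) {
			m = -m;
			m -= s->mdev >> 2;
			if (m > 0)
				m >>= 3;
		} else {
			m -= s->mdev >> 2;
		}
		s->mdev += m;

		if (s->mdev > s->mdev_max) {
			s->mdev_max = s->mdev;
			if (s->mdev_max > s->rttvar)
				s->rttvar = s->mdev_max;
		}
	}

	/* RTO = SRTT + RTTVAR, bounded above */
	s->rto = (s->srtt >> 3) + s->rttvar;
	if (s->rto > RTO_MAX_TICKS)
		s->rto = RTO_MAX_TICKS;
}
```

Because `rttvar` starts at no less than the minimum and only grows in this sketch, the RTO cannot collapse onto the RTT when samples are very stable, which is one of the properties the comment in the diff highlights.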
static
void
ccid2_new_ack
(
struct
sock
*
sk
,
struct
ccid2_seq
*
seqp
,
static
void
ccid2_hc_tx_kill_rto_timer
(
struct
sock
*
sk
)
unsigned
int
*
maxincr
)
{
{
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
if
(
hctx
->
cwnd
<
hctx
->
ssthresh
)
{
sk_stop_timer
(
sk
,
&
hctx
->
ccid2hctx_rtotimer
);
if
(
*
maxincr
>
0
&&
++
hctx
->
packets_acked
==
2
)
{
ccid2_pr_debug
(
"deleted RTO timer
\n
"
);
hctx
->
cwnd
+=
1
;
*
maxincr
-=
1
;
hctx
->
packets_acked
=
0
;
}
}
else
if
(
++
hctx
->
packets_acked
>=
hctx
->
cwnd
)
{
hctx
->
cwnd
+=
1
;
hctx
->
packets_acked
=
0
;
}
/*
* FIXME: RTT is sampled several times per acknowledgment (for each
* entry in the Ack Vector), instead of once per Ack (as in TCP SACK).
* This causes the RTT to be over-estimated, since the older entries
* in the Ack Vector have earlier sending times.
* The cleanest solution is to not use the ccid2s_sent field at all
* and instead use DCCP timestamps - need to be resolved at some time.
*/
ccid2_rtt_estimator
(
sk
,
jiffies
-
seqp
->
ccid2s_sent
);
}
}
static
void
ccid2_congestion_event
(
struct
sock
*
sk
,
struct
ccid2_seq
*
seqp
)
static
inline
void
ccid2_new_ack
(
struct
sock
*
sk
,
struct
ccid2_seq
*
seqp
,
unsigned
int
*
maxincr
)
{
{
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
if
(
time_before
(
seqp
->
ccid2s_sent
,
hctx
->
last_cong
))
{
if
(
hctx
->
ccid2hctx_cwnd
<
hctx
->
ccid2hctx_ssthresh
)
{
ccid2_pr_debug
(
"Multiple losses in an RTT---treating as one
\n
"
);
if
(
*
maxincr
>
0
&&
++
hctx
->
ccid2hctx_packets_acked
==
2
)
{
return
;
hctx
->
ccid2hctx_cwnd
+=
1
;
*
maxincr
-=
1
;
hctx
->
ccid2hctx_packets_acked
=
0
;
}
}
else
if
(
++
hctx
->
ccid2hctx_packets_acked
>=
hctx
->
ccid2hctx_cwnd
)
{
hctx
->
ccid2hctx_cwnd
+=
1
;
hctx
->
ccid2hctx_packets_acked
=
0
;
}
}
hctx
->
last_cong
=
jiffies
;
/* update RTO */
if
(
hctx
->
ccid2hctx_srtt
==
-
1
||
time_after
(
jiffies
,
hctx
->
ccid2hctx_lastrtt
+
hctx
->
ccid2hctx_srtt
))
{
unsigned
long
r
=
(
long
)
jiffies
-
(
long
)
seqp
->
ccid2s_sent
;
int
s
;
/* first measurement */
if
(
hctx
->
ccid2hctx_srtt
==
-
1
)
{
ccid2_pr_debug
(
"R: %lu Time=%lu seq=%llu
\n
"
,
r
,
jiffies
,
(
unsigned
long
long
)
seqp
->
ccid2s_seq
);
ccid2_change_srtt
(
hctx
,
r
);
hctx
->
ccid2hctx_rttvar
=
r
>>
1
;
}
else
{
/* RTTVAR */
long
tmp
=
hctx
->
ccid2hctx_srtt
-
r
;
long
srtt
;
if
(
tmp
<
0
)
tmp
*=
-
1
;
tmp
>>=
2
;
hctx
->
ccid2hctx_rttvar
*=
3
;
hctx
->
ccid2hctx_rttvar
>>=
2
;
hctx
->
ccid2hctx_rttvar
+=
tmp
;
/* SRTT */
srtt
=
hctx
->
ccid2hctx_srtt
;
srtt
*=
7
;
srtt
>>=
3
;
tmp
=
r
>>
3
;
srtt
+=
tmp
;
ccid2_change_srtt
(
hctx
,
srtt
);
}
s
=
hctx
->
ccid2hctx_rttvar
<<
2
;
/* clock granularity is 1 when based on jiffies */
if
(
!
s
)
s
=
1
;
hctx
->
ccid2hctx_rto
=
hctx
->
ccid2hctx_srtt
+
s
;
/* must be at least a second */
s
=
hctx
->
ccid2hctx_rto
/
HZ
;
/* DCCP doesn't require this [but I like it cuz my code sux] */
#if 1
if
(
s
<
1
)
hctx
->
ccid2hctx_rto
=
HZ
;
#endif
/* max 60 seconds */
if
(
s
>
60
)
hctx
->
ccid2hctx_rto
=
HZ
*
60
;
hctx
->
cwnd
=
hctx
->
cwnd
/
2
?
:
1U
;
hctx
->
ccid2hctx_lastrtt
=
jiffies
;
hctx
->
ssthresh
=
max
(
hctx
->
cwnd
,
2U
);
/* Avoid spurious timeouts resulting from Ack Ratio > cwnd */
ccid2_pr_debug
(
"srtt: %ld rttvar: %ld rto: %ld (HZ=%d) R=%lu
\n
"
,
if
(
dccp_sk
(
sk
)
->
dccps_l_ack_ratio
>
hctx
->
cwnd
)
hctx
->
ccid2hctx_srtt
,
hctx
->
ccid2hctx_rttvar
,
ccid2_change_l_ack_ratio
(
sk
,
hctx
->
cwnd
);
hctx
->
ccid2hctx_rto
,
HZ
,
r
);
}
/* we got a new ack, so re-start RTO timer */
ccid2_hc_tx_kill_rto_timer
(
sk
);
ccid2_start_rto_timer
(
sk
);
}
}
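The RTO update embedded in the longer variant of `ccid2_new_ack()` follows the familiar SRTT/RTTVAR recurrences with a one-second floor and a 60-second ceiling. Below is a minimal user-space sketch of just that arithmetic, assuming a tick rate of `HZ = 1000`; `struct rto_state` and `rto_update()` are hypothetical names, and the `lastrtt` sampling gate from the diff is omitted.

```c
#include <assert.h>

#define HZ 1000L /* assumed tick rate for this sketch */

struct rto_state {
	long srtt;   /* smoothed RTT in ticks, -1 before the first sample */
	long rttvar; /* RTT variance in ticks */
	long rto;    /* retransmission timeout in ticks */
};

static void rto_update(struct rto_state *s, long r)
{
	long var4, secs;

	if (s->srtt == -1) {
		/* first measurement */
		s->srtt = r;
		s->rttvar = r >> 1;
	} else {
		/* RTTVAR = 3/4 * RTTVAR + 1/4 * |SRTT - R| */
		long err = s->srtt - r;

		if (err < 0)
			err = -err;
		s->rttvar = ((s->rttvar * 3) >> 2) + (err >> 2);
		/* SRTT = 7/8 * SRTT + 1/8 * R */
		s->srtt = ((s->srtt * 7) >> 3) + (r >> 3);
	}

	/* RTO = SRTT + max(G, 4 * RTTVAR), clock granularity G = 1 tick */
	var4 = s->rttvar << 2;
	if (var4 == 0)
		var4 = 1;
	s->rto = s->srtt + var4;

	/* clamp to [1 s, 60 s], deciding on whole seconds as the diff does */
	secs = s->rto / HZ;
	if (secs < 1)
		s->rto = HZ;
	if (secs > 60)
		s->rto = 60 * HZ;
}
```

Note that with short RTTs the one-second floor dominates the result, which is the behaviour the source comment says DCCP does not strictly require.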
static
int
ccid2_hc_tx_parse_options
(
struct
sock
*
sk
,
u8
packet_type
,
static
void
ccid2_hc_tx_dec_pipe
(
struct
sock
*
sk
)
u8
option
,
u8
*
optval
,
u8
optlen
)
{
{
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
switch
(
option
)
{
if
(
hctx
->
ccid2hctx_pipe
==
0
)
case
DCCPO_ACK_VECTOR_0
:
DCCP_BUG
(
"pipe == 0"
);
case
DCCPO_ACK_VECTOR_1
:
else
return
dccp_ackvec_parsed_add
(
&
hctx
->
av_chunks
,
optval
,
optlen
,
hctx
->
ccid2hctx_pipe
--
;
option
-
DCCPO_ACK_VECTOR_0
);
if
(
hctx
->
ccid2hctx_pipe
==
0
)
ccid2_hc_tx_kill_rto_timer
(
sk
);
}
static
void
ccid2_congestion_event
(
struct
sock
*
sk
,
struct
ccid2_seq
*
seqp
)
{
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
if
(
time_before
(
seqp
->
ccid2s_sent
,
hctx
->
ccid2hctx_last_cong
))
{
ccid2_pr_debug
(
"Multiple losses in an RTT---treating as one
\n
"
);
return
;
}
}
return
0
;
hctx
->
ccid2hctx_last_cong
=
jiffies
;
hctx
->
ccid2hctx_cwnd
=
hctx
->
ccid2hctx_cwnd
/
2
?
:
1U
;
hctx
->
ccid2hctx_ssthresh
=
max
(
hctx
->
ccid2hctx_cwnd
,
2U
);
/* Avoid spurious timeouts resulting from Ack Ratio > cwnd */
if
(
dccp_sk
(
sk
)
->
dccps_l_ack_ratio
>
hctx
->
ccid2hctx_cwnd
)
ccid2_change_l_ack_ratio
(
sk
,
hctx
->
ccid2hctx_cwnd
);
}
}
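Taken together, `ccid2_new_ack()` and `ccid2_congestion_event()` implement the additive-increase, multiplicative-decrease behaviour of CCID-2. The sketch below reproduces only the window arithmetic; `struct cwnd_state`, `on_new_ack()` and `on_congestion_event()` are hypothetical names, and the "one reaction per RTT" time check and the Ack Ratio adjustment from the diff are left out.

```c
#include <assert.h>

/* Hypothetical subset of the sender state used for window growth. */
struct cwnd_state {
	unsigned int cwnd;          /* congestion window in packets */
	unsigned int ssthresh;      /* slow-start threshold in packets */
	unsigned int packets_acked; /* newly acked packets since last growth */
};

/* Called once per newly acknowledged packet. */
static void on_new_ack(struct cwnd_state *s, unsigned int *maxincr)
{
	if (s->cwnd < s->ssthresh) {
		/* slow start: +1 for every 2 acked packets, limited by maxincr */
		if (*maxincr > 0 && ++s->packets_acked == 2) {
			s->cwnd += 1;
			*maxincr -= 1;
			s->packets_acked = 0;
		}
	} else if (++s->packets_acked >= s->cwnd) {
		/* congestion avoidance: +1 per full window of acked packets */
		s->cwnd += 1;
		s->packets_acked = 0;
	}
}

/* Called when a loss or mark is detected: halve cwnd, never below 1. */
static void on_congestion_event(struct cwnd_state *s)
{
	s->cwnd = s->cwnd / 2 ? s->cwnd / 2 : 1;
	s->ssthresh = s->cwnd > 2 ? s->cwnd : 2;
}
```

In the diff, `maxincr` is initialised to `ceil(Ack Ratio / 2)` during slow start, so a single acknowledgement covering many packets cannot inflate the window by more than that amount.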
static
void
ccid2_hc_tx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
void
ccid2_hc_tx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
const
bool
sender_was_blocked
=
ccid2_cwnd_network_limited
(
hctx
);
struct
dccp_ackvec_parsed
*
avp
;
u64
ackno
,
seqno
;
u64
ackno
,
seqno
;
struct
ccid2_seq
*
seqp
;
struct
ccid2_seq
*
seqp
;
unsigned
char
*
vector
;
unsigned
char
veclen
;
int
offset
=
0
;
int
done
=
0
;
int
done
=
0
;
unsigned
int
maxincr
=
0
;
unsigned
int
maxincr
=
0
;
ccid2_hc_tx_check_sanity
(
hctx
);
/* check reverse path congestion */
/* check reverse path congestion */
seqno
=
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
;
seqno
=
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
;
...
@@ -407,21 +523,21 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
* -sorbo.
* -sorbo.
*/
*/
/* need to bootstrap */
/* need to bootstrap */
if
(
hctx
->
rpdupack
==
-
1
)
{
if
(
hctx
->
ccid2hctx_rpdupack
==
-
1
)
{
hctx
->
rpdupack
=
0
;
hctx
->
ccid2hctx_rpdupack
=
0
;
hctx
->
rpseq
=
seqno
;
hctx
->
ccid2hctx_rpseq
=
seqno
;
}
else
{
}
else
{
/* check if packet is consecutive */
/* check if packet is consecutive */
if
(
dccp_delta_seqno
(
hctx
->
rpseq
,
seqno
)
==
1
)
if
(
dccp_delta_seqno
(
hctx
->
ccid2hctx_rpseq
,
seqno
)
==
1
)
hctx
->
rpseq
=
seqno
;
hctx
->
ccid2hctx_rpseq
=
seqno
;
/* it's a later packet */
/* it's a later packet */
else
if
(
after48
(
seqno
,
hctx
->
rpseq
))
{
else
if
(
after48
(
seqno
,
hctx
->
ccid2hctx_rpseq
))
{
hctx
->
rpdupack
++
;
hctx
->
ccid2hctx_rpdupack
++
;
/* check if we got enough dupacks */
/* check if we got enough dupacks */
if
(
hctx
->
rpdupack
>=
NUMDUPACK
)
{
if
(
hctx
->
ccid2hctx_rpdupack
>=
NUMDUPACK
)
{
hctx
->
rpdupack
=
-
1
;
/* XXX lame */
hctx
->
ccid2hctx_rpdupack
=
-
1
;
/* XXX lame */
hctx
->
rpseq
=
0
;
hctx
->
ccid2hctx_rpseq
=
0
;
ccid2_change_l_ack_ratio
(
sk
,
2
*
dp
->
dccps_l_ack_ratio
);
ccid2_change_l_ack_ratio
(
sk
,
2
*
dp
->
dccps_l_ack_ratio
);
}
}
...
@@ -429,22 +545,27 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
}
}
/* check forward path congestion */
/* check forward path congestion */
if
(
dccp_packet_without_ack
(
skb
))
/* still didn't send out new data packets */
if
(
hctx
->
ccid2hctx_seqh
==
hctx
->
ccid2hctx_seqt
)
return
;
return
;
/* still didn't send out new data packets */
switch
(
DCCP_SKB_CB
(
skb
)
->
dccpd_type
)
{
if
(
hctx
->
seqh
==
hctx
->
seqt
)
case
DCCP_PKT_ACK
:
goto
done
;
case
DCCP_PKT_DATAACK
:
break
;
default:
return
;
}
ackno
=
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
;
ackno
=
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
;
if
(
after48
(
ackno
,
hctx
->
high_ack
))
if
(
after48
(
ackno
,
hctx
->
ccid2hctx_
high_ack
))
hctx
->
high_ack
=
ackno
;
hctx
->
ccid2hctx_
high_ack
=
ackno
;
seqp
=
hctx
->
seqt
;
seqp
=
hctx
->
ccid2hctx_
seqt
;
while
(
before48
(
seqp
->
ccid2s_seq
,
ackno
))
{
while
(
before48
(
seqp
->
ccid2s_seq
,
ackno
))
{
seqp
=
seqp
->
ccid2s_next
;
seqp
=
seqp
->
ccid2s_next
;
if
(
seqp
==
hctx
->
seqh
)
{
if
(
seqp
==
hctx
->
ccid2hctx_
seqh
)
{
seqp
=
hctx
->
seqh
->
ccid2s_prev
;
seqp
=
hctx
->
ccid2hctx_
seqh
->
ccid2s_prev
;
break
;
break
;
}
}
}
}
...
@@ -454,26 +575,26 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -454,26 +575,26 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
* packets per acknowledgement. Rounding up avoids that cwnd is not
* packets per acknowledgement. Rounding up avoids that cwnd is not
* advanced when Ack Ratio is 1 and gives a slight edge otherwise.
* advanced when Ack Ratio is 1 and gives a slight edge otherwise.
*/
*/
if (hctx->cwnd < hctx->ssthresh)
if (hctx->ccid2hctx_cwnd < hctx->ccid2hctx_ssthresh)
maxincr
=
DIV_ROUND_UP
(
dp
->
dccps_l_ack_ratio
,
2
);
maxincr
=
DIV_ROUND_UP
(
dp
->
dccps_l_ack_ratio
,
2
);
/* go through all ack vectors */
/* go through all ack vectors */
list_for_each_entry
(
avp
,
&
hctx
->
av_chunks
,
node
)
{
while
((
offset
=
ccid2_ackvector
(
sk
,
skb
,
offset
,
&
vector
,
&
veclen
))
!=
-
1
)
{
/* go through this ack vector */
/* go through this ack vector */
for
(;
avp
->
len
--
;
avp
->
vec
++
)
{
while
(
veclen
--
)
{
u64
ackno_end_rl
=
SUB48
(
ackno
,
const
u8
rl
=
*
vector
&
DCCP_ACKVEC_LEN_MASK
;
dccp_ackvec_runlen
(
avp
->
vec
)
);
u64
ackno_end_rl
=
SUB48
(
ackno
,
rl
);
ccid2_pr_debug
(
"ackvec
%llu |%u,%u|
\n
"
,
ccid2_pr_debug
(
"ackvec
start:%llu end:%llu
\n
"
,
(
unsigned
long
long
)
ackno
,
(
unsigned
long
long
)
ackno
,
dccp_ackvec_state
(
avp
->
vec
)
>>
6
,
(
unsigned
long
long
)
ackno_end_rl
);
dccp_ackvec_runlen
(
avp
->
vec
));
/* if the seqno we are analyzing is larger than the
/* if the seqno we are analyzing is larger than the
* current ackno, then move towards the tail of our
* current ackno, then move towards the tail of our
* seqnos.
* seqnos.
*/
*/
while
(
after48
(
seqp
->
ccid2s_seq
,
ackno
))
{
while
(
after48
(
seqp
->
ccid2s_seq
,
ackno
))
{
if
(
seqp
==
hctx
->
seqt
)
{
if
(
seqp
==
hctx
->
ccid2hctx_
seqt
)
{
done
=
1
;
done
=
1
;
break
;
break
;
}
}
...
@@ -486,24 +607,26 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -486,24 +607,26 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
* run length
* run length
*/
*/
while
(
between48
(
seqp
->
ccid2s_seq
,
ackno_end_rl
,
ackno
))
{
while
(
between48
(
seqp
->
ccid2s_seq
,
ackno_end_rl
,
ackno
))
{
const
u8
state
=
dccp_ackvec_state
(
avp
->
vec
);
const
u8
state
=
*
vector
&
DCCP_ACKVEC_STATE_MASK
;
/* new packet received or marked */
/* new packet received or marked */
if
(
state
!=
DCCP
AV
_NOT_RECEIVED
&&
if
(
state
!=
DCCP
_ACKVEC_STATE
_NOT_RECEIVED
&&
!
seqp
->
ccid2s_acked
)
{
!
seqp
->
ccid2s_acked
)
{
if
(
state
==
DCCPAV_ECN_MARKED
)
if
(
state
==
DCCP_ACKVEC_STATE_ECN_MARKED
)
{
ccid2_congestion_event
(
sk
,
ccid2_congestion_event
(
sk
,
seqp
);
seqp
);
else
}
else
ccid2_new_ack
(
sk
,
seqp
,
ccid2_new_ack
(
sk
,
seqp
,
&
maxincr
);
&
maxincr
);
seqp
->
ccid2s_acked
=
1
;
seqp
->
ccid2s_acked
=
1
;
ccid2_pr_debug
(
"Got ack for %llu
\n
"
,
ccid2_pr_debug
(
"Got ack for %llu
\n
"
,
(
unsigned
long
long
)
seqp
->
ccid2s_seq
);
(
unsigned
long
long
)
seqp
->
ccid2s_seq
);
hctx
->
pipe
--
;
ccid2_hc_tx_dec_pipe
(
sk
)
;
}
}
if
(
seqp
==
hctx
->
seqt
)
{
if
(
seqp
==
hctx
->
ccid2hctx_
seqt
)
{
done
=
1
;
done
=
1
;
break
;
break
;
}
}
...
@@ -513,6 +636,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -513,6 +636,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
break
;
break
;
ackno
=
SUB48
(
ackno_end_rl
,
1
);
ackno
=
SUB48
(
ackno_end_rl
,
1
);
vector
++
;
}
}
if
(
done
)
if
(
done
)
break
;
break
;
...
@@ -521,11 +645,11 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -521,11 +645,11 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
/* The state about what is acked should be correct now
/* The state about what is acked should be correct now
* Check for NUMDUPACK
* Check for NUMDUPACK
*/
*/
seqp
=
hctx
->
seqt
;
seqp
=
hctx
->
ccid2hctx_
seqt
;
while
(
before48
(
seqp
->
ccid2s_seq
,
hctx
->
high_ack
))
{
while
(
before48
(
seqp
->
ccid2s_seq
,
hctx
->
ccid2hctx_
high_ack
))
{
seqp
=
seqp
->
ccid2s_next
;
seqp
=
seqp
->
ccid2s_next
;
if
(
seqp
==
hctx
->
seqh
)
{
if
(
seqp
==
hctx
->
ccid2hctx_
seqh
)
{
seqp
=
hctx
->
seqh
->
ccid2s_prev
;
seqp
=
hctx
->
ccid2hctx_
seqh
->
ccid2s_prev
;
break
;
break
;
}
}
}
}
...
@@ -536,7 +660,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -536,7 +660,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
if
(
done
==
NUMDUPACK
)
if
(
done
==
NUMDUPACK
)
break
;
break
;
}
}
if
(
seqp
==
hctx
->
seqt
)
if
(
seqp
==
hctx
->
ccid2hctx_
seqt
)
break
;
break
;
seqp
=
seqp
->
ccid2s_prev
;
seqp
=
seqp
->
ccid2s_prev
;
}
}
...
@@ -557,34 +681,25 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -557,34 +681,25 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
* one ack vector.
* one ack vector.
*/
*/
ccid2_congestion_event
(
sk
,
seqp
);
ccid2_congestion_event
(
sk
,
seqp
);
hctx
->
pipe
--
;
ccid2_hc_tx_dec_pipe
(
sk
)
;
}
}
if
(
seqp
==
hctx
->
seqt
)
if
(
seqp
==
hctx
->
ccid2hctx_
seqt
)
break
;
break
;
seqp
=
seqp
->
ccid2s_prev
;
seqp
=
seqp
->
ccid2s_prev
;
}
}
hctx
->
seqt
=
last_acked
;
hctx
->
ccid2hctx_
seqt
=
last_acked
;
}
}
/* trim acked packets in tail */
/* trim acked packets in tail */
while
(
hctx
->
seqt
!=
hctx
->
seqh
)
{
while
(
hctx
->
ccid2hctx_seqt
!=
hctx
->
ccid2hctx_
seqh
)
{
if
(
!
hctx
->
seqt
->
ccid2s_acked
)
if
(
!
hctx
->
ccid2hctx_
seqt
->
ccid2s_acked
)
break
;
break
;
hctx
->
seqt
=
hctx
->
seqt
->
ccid2s_next
;
hctx
->
ccid2hctx_seqt
=
hctx
->
ccid2hctx_
seqt
->
ccid2s_next
;
}
}
/* restart RTO timer if not all outstanding data has been acked */
ccid2_hc_tx_check_sanity
(
hctx
);
if
(
hctx
->
pipe
==
0
)
sk_stop_timer
(
sk
,
&
hctx
->
rtotimer
);
else
sk_reset_timer
(
sk
,
&
hctx
->
rtotimer
,
jiffies
+
hctx
->
rto
);
done:
/* check if incoming Acks allow pending packets to be sent */
if
(
sender_was_blocked
&&
!
ccid2_cwnd_network_limited
(
hctx
))
tasklet_schedule
(
&
dccp_sk
(
sk
)
->
dccps_xmitlet
);
dccp_ackvec_parsed_cleanup
(
&
hctx
->
av_chunks
);
}
}
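The Ack Vector loop above masks each vector byte into a state and a run length, then steps the acknowledgement number backwards with 48-bit arithmetic. The following is a minimal user-space sketch of those primitives. The mask values follow the cell layout in RFC 4340, section 11.4 (state in the top two bits, run length in the low six); the helper names are hypothetical and are not the kernel's own macros.

```c
#include <stdint.h>

/* Assumed from RFC 4340, 11.4: one byte per cell, state in bits 7-6,
 * run length in bits 5-0. Names here are illustrative only. */
#define ACKVEC_STATE_MASK 0xC0u
#define ACKVEC_LEN_MASK   0x3Fu
#define SEQ48_MASK        ((1ULL << 48) - 1)

static uint8_t ackvec_state(uint8_t cell)  { return cell & ACKVEC_STATE_MASK; }
static uint8_t ackvec_runlen(uint8_t cell) { return cell & ACKVEC_LEN_MASK; }

/* Subtraction modulo 2^48, in the spirit of SUB48() in the diff. */
static uint64_t sub48(uint64_t a, uint64_t b)
{
    return (a - b) & SEQ48_MASK;
}

/* One way to test "seq1 is after seq2" in circular 48-bit space:
 * shift both into the top 48 bits and look at the sign of the difference. */
static int after48(uint64_t seq1, uint64_t seq2)
{
    return (int64_t)((seq2 << 16) - (seq1 << 16)) < 0;
}

/* Lowest sequence number covered by a cell that starts at ackno,
 * mirroring ackno_end_rl = SUB48(ackno, rl). */
static uint64_t cell_low_seqno(uint64_t ackno, uint8_t cell)
{
    return sub48(ackno, ackvec_runlen(cell));
}
```

For example, the cell 0xC3 decodes to state 0xC0 with run length 3, so it covers four consecutive sequence numbers ending at the current ackno.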
static
int
ccid2_hc_tx_init
(
struct
ccid
*
ccid
,
struct
sock
*
sk
)
static
int
ccid2_hc_tx_init
(
struct
ccid
*
ccid
,
struct
sock
*
sk
)
...
@@ -594,13 +709,17 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
...
@@ -594,13 +709,17 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
u32
max_ratio
;
u32
max_ratio
;
/* RFC 4341, 5: initialise ssthresh to arbitrarily high (max) value */
/* RFC 4341, 5: initialise ssthresh to arbitrarily high (max) value */
hctx
->
ssthresh
=
~
0U
;
hctx
->
ccid2hctx_ssthresh
=
~
0U
;
/* Use larger initial windows (RFC 3390, rfc2581bis) */
/*
hctx->cwnd = rfc3390_bytes_to_packets(dp->dccps_mss_cache);
* RFC 4341, 5: "The cwnd parameter is initialized to at most four
* packets for new connections, following the rules from [RFC3390]".
* We need to convert the bytes of RFC3390 into the packets of RFC 4341.
*/
hctx->ccid2hctx_cwnd = clamp(4380U / dp->dccps_mss_cache, 2U, 4U);
/* Make sure that Ack Ratio is enabled and within bounds. */
/* Make sure that Ack Ratio is enabled and within bounds. */
max_ratio = DIV_ROUND_UP(hctx->cwnd, 2);
max_ratio = DIV_ROUND_UP(hctx->ccid2hctx_cwnd, 2);
if
(
dp
->
dccps_l_ack_ratio
==
0
||
dp
->
dccps_l_ack_ratio
>
max_ratio
)
if
(
dp
->
dccps_l_ack_ratio
==
0
||
dp
->
dccps_l_ack_ratio
>
max_ratio
)
dp
->
dccps_l_ack_ratio
=
max_ratio
;
dp
->
dccps_l_ack_ratio
=
max_ratio
;
...
@@ -608,11 +727,15 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
...
@@ -608,11 +727,15 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
if
(
ccid2_hc_tx_alloc_seq
(
hctx
))
if
(
ccid2_hc_tx_alloc_seq
(
hctx
))
return
-
ENOMEM
;
return
-
ENOMEM
;
hctx
->
rto
=
DCCP_TIMEOUT_INIT
;
hctx
->
ccid2hctx_rto
=
3
*
HZ
;
hctx
->
rpdupack
=
-
1
;
ccid2_change_srtt
(
hctx
,
-
1
);
hctx
->
last_cong
=
jiffies
;
hctx
->
ccid2hctx_rttvar
=
-
1
;
setup_timer
(
&
hctx
->
rtotimer
,
ccid2_hc_tx_rto_expire
,
(
unsigned
long
)
sk
);
hctx
->
ccid2hctx_rpdupack
=
-
1
;
INIT_LIST_HEAD
(
&
hctx
->
av_chunks
);
hctx
->
ccid2hctx_last_cong
=
jiffies
;
setup_timer
(
&
hctx
->
ccid2hctx_rtotimer
,
ccid2_hc_tx_rto_expire
,
(
unsigned
long
)
sk
);
ccid2_hc_tx_check_sanity
(
hctx
);
return
0
;
return
0
;
}
}
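The initialisation above converts the RFC 3390 byte budget into a packet count with `clamp(4380U / mss, 2U, 4U)` and then bounds the Ack Ratio by `DIV_ROUND_UP(cwnd, 2)`. A small stand-alone sketch of that arithmetic follows; the function names are hypothetical, and `mss` is assumed to be non-zero.

```c
#include <stdint.h>

/* Initial congestion window in packets: 4380 bytes expressed in packets,
 * clamped to the range [2, 4]. Assumes mss != 0. */
static uint32_t initial_cwnd_packets(uint32_t mss)
{
    uint32_t w = 4380U / mss;

    if (w < 2U)
        return 2U;
    if (w > 4U)
        return 4U;
    return w;
}

/* Upper bound for the Ack Ratio: cwnd / 2, rounded up. */
static uint32_t max_ack_ratio(uint32_t cwnd)
{
    return (cwnd + 1U) / 2U;
}
```

With a 1460-byte MSS this gives an initial window of 3 packets and a maximum Ack Ratio of 2.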
...
@@ -621,11 +744,11 @@ static void ccid2_hc_tx_exit(struct sock *sk)
...
@@ -621,11 +744,11 @@ static void ccid2_hc_tx_exit(struct sock *sk)
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
struct
ccid2_hc_tx_sock
*
hctx
=
ccid2_hc_tx_sk
(
sk
);
int
i
;
int
i
;
sk_stop_timer
(
sk
,
&
hctx
->
rtotimer
);
ccid2_hc_tx_kill_rto_timer
(
sk
);
for
(
i
=
0
;
i
<
hctx
->
seqbufc
;
i
++
)
for
(
i
=
0
;
i
<
hctx
->
ccid2hctx_
seqbufc
;
i
++
)
kfree
(
hctx
->
seqbuf
[
i
]);
kfree
(
hctx
->
ccid2hctx_
seqbuf
[
i
]);
hctx
->
seqbufc
=
0
;
hctx
->
ccid2hctx_
seqbufc
=
0
;
}
}
static
void
ccid2_hc_rx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
void
ccid2_hc_rx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
...
@@ -636,28 +759,27 @@ static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -636,28 +759,27 @@ static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
switch
(
DCCP_SKB_CB
(
skb
)
->
dccpd_type
)
{
switch
(
DCCP_SKB_CB
(
skb
)
->
dccpd_type
)
{
case
DCCP_PKT_DATA
:
case
DCCP_PKT_DATA
:
case
DCCP_PKT_DATAACK
:
case
DCCP_PKT_DATAACK
:
hcrx
->
data
++
;
hcrx
->
ccid2hcrx_
data
++
;
if
(
hcrx
->
data
>=
dp
->
dccps_r_ack_ratio
)
{
if
(
hcrx
->
ccid2hcrx_
data
>=
dp
->
dccps_r_ack_ratio
)
{
dccp_send_ack
(
sk
);
dccp_send_ack
(
sk
);
hcrx
->
data
=
0
;
hcrx
->
ccid2hcrx_
data
=
0
;
}
}
break
;
break
;
}
}
}
}
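The receiver routine above counts data-carrying packets and sends one Ack whenever the count reaches the Ack Ratio. The sketch below models only that counter; `acks_for()` is a hypothetical helper that replaces `dccp_send_ack()` with a tally so the behaviour can be checked in user space.

```c
/* Number of Acks triggered by `packets` data packets at a given Ack Ratio,
 * using the same "count, compare, reset" pattern as the diff. */
static int acks_for(int packets, int ack_ratio)
{
    int data = 0;   /* data packets seen since the last Ack */
    int acks = 0;   /* stands in for calls to dccp_send_ack() */
    int i;

    for (i = 0; i < packets; i++) {
        data++;
        if (data >= ack_ratio) {
            acks++;
            data = 0;
        }
    }
    return acks;
}
```

Five data packets at an Ack Ratio of 2 produce two Acks, with one packet still waiting to be acknowledged.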
static
struct
ccid_operations
ccid2
=
{
static
struct
ccid_operations
ccid2
=
{
.
ccid_id
=
DCCPC_CCID2
,
.
ccid_id
=
DCCPC_CCID2
,
.
ccid_name
=
"TCP-like"
,
.
ccid_name
=
"TCP-like"
,
.
ccid_owner
=
THIS_MODULE
,
.
ccid_owner
=
THIS_MODULE
,
.
ccid_hc_tx_obj_size
=
sizeof
(
struct
ccid2_hc_tx_sock
),
.
ccid_hc_tx_obj_size
=
sizeof
(
struct
ccid2_hc_tx_sock
),
.
ccid_hc_tx_init
=
ccid2_hc_tx_init
,
.
ccid_hc_tx_init
=
ccid2_hc_tx_init
,
.
ccid_hc_tx_exit
=
ccid2_hc_tx_exit
,
.
ccid_hc_tx_exit
=
ccid2_hc_tx_exit
,
.
ccid_hc_tx_send_packet
=
ccid2_hc_tx_send_packet
,
.
ccid_hc_tx_send_packet
=
ccid2_hc_tx_send_packet
,
.
ccid_hc_tx_packet_sent
=
ccid2_hc_tx_packet_sent
,
.
ccid_hc_tx_packet_sent
=
ccid2_hc_tx_packet_sent
,
.
ccid_hc_tx_parse_options
=
ccid2_hc_tx_parse_options
,
.
ccid_hc_tx_packet_recv
=
ccid2_hc_tx_packet_recv
,
.
ccid_hc_tx_packet_recv
=
ccid2_hc_tx_packet_recv
,
.
ccid_hc_rx_obj_size
=
sizeof
(
struct
ccid2_hc_rx_sock
),
.
ccid_hc_rx_obj_size
=
sizeof
(
struct
ccid2_hc_rx_sock
),
.
ccid_hc_rx_packet_recv
=
ccid2_hc_rx_packet_recv
,
.
ccid_hc_rx_packet_recv
=
ccid2_hc_rx_packet_recv
,
};
};
#ifdef CONFIG_IP_DCCP_CCID2_DEBUG
#ifdef CONFIG_IP_DCCP_CCID2_DEBUG
...
...
net/dccp/ccids/ccid2.h
View file @
ded67c0e
...
@@ -42,49 +42,34 @@ struct ccid2_seq {
...
@@ -42,49 +42,34 @@ struct ccid2_seq {
/** struct ccid2_hc_tx_sock - CCID2 TX half connection
/** struct ccid2_hc_tx_sock - CCID2 TX half connection
*
*
* @{cwnd,ssthresh,pipe}: as per RFC 4341, section 5
* @ccid2hctx_{cwnd,ssthresh,pipe}: as per RFC 4341, section 5
* @packets_acked: Ack counter for deriving cwnd growth (RFC 3465)
* @ccid2hctx_packets_acked - Ack counter for deriving cwnd growth (RFC 3465)
* @srtt: smoothed RTT estimate, scaled by 2^3
* @ccid2hctx_lastrtt -time RTT was last measured
* @mdev: smoothed RTT variation, scaled by 2^2
* @ccid2hctx_rpseq - last consecutive seqno
* @mdev_max: maximum of @mdev during one flight
* @ccid2hctx_rpdupack - dupacks since rpseq
* @rttvar: moving average/maximum of @mdev_max
*/
* @rto: RTO value deriving from SRTT and RTTVAR (RFC 2988)
* @rtt_seq: to decay RTTVAR at most once per flight
* @rpseq: last consecutive seqno
* @rpdupack: dupacks since rpseq
* @av_chunks: list of Ack Vectors received on current skb
*/
struct
ccid2_hc_tx_sock
{
struct
ccid2_hc_tx_sock
{
u32
cwnd
;
u32
ccid2hctx_cwnd
;
u32
ssthresh
;
u32
ccid2hctx_ssthresh
;
u32
pipe
;
u32
ccid2hctx_pipe
;
u32
packets_acked
;
u32
ccid2hctx_packets_acked
;
struct
ccid2_seq
*
seqbuf
[
CCID2_SEQBUF_MAX
];
struct
ccid2_seq
*
ccid2hctx_seqbuf
[
CCID2_SEQBUF_MAX
];
int
seqbufc
;
int
ccid2hctx_seqbufc
;
struct
ccid2_seq
*
seqh
;
struct
ccid2_seq
*
ccid2hctx_seqh
;
struct
ccid2_seq
*
seqt
;
struct
ccid2_seq
*
ccid2hctx_seqt
;
/* RTT measurement: variables/principles are the same as in TCP */
long
ccid2hctx_rto
;
u32
srtt
,
long
ccid2hctx_srtt
;
mdev
,
long
ccid2hctx_rttvar
;
mdev_max
,
unsigned
long
ccid2hctx_lastrtt
;
rttvar
,
struct
timer_list
ccid2hctx_rtotimer
;
rto
;
u64
ccid2hctx_rpseq
;
u64
rtt_seq
:
48
;
int
ccid2hctx_rpdupack
;
struct
timer_list
rtotimer
;
unsigned
long
ccid2hctx_last_cong
;
u64
rpseq
;
u64
ccid2hctx_high_ack
;
int
rpdupack
;
unsigned
long
last_cong
;
u64
high_ack
;
struct
list_head
av_chunks
;
};
};
static
inline
bool
ccid2_cwnd_network_limited
(
struct
ccid2_hc_tx_sock
*
hctx
)
{
return
(
hctx
->
pipe
>=
hctx
->
cwnd
);
}
struct
ccid2_hc_rx_sock
{
struct
ccid2_hc_rx_sock
{
int
data
;
int
ccid2hcrx_
data
;
};
};
static
inline
struct
ccid2_hc_tx_sock
*
ccid2_hc_tx_sk
(
const
struct
sock
*
sk
)
static
inline
struct
ccid2_hc_tx_sock
*
ccid2_hc_tx_sk
(
const
struct
sock
*
sk
)
...
...
net/dccp/ccids/ccid3.c
View file @
ded67c0e
...
@@ -49,41 +49,75 @@ static int ccid3_debug;
...
@@ -49,41 +49,75 @@ static int ccid3_debug;
/*
/*
* Transmitter Half-Connection Routines
* Transmitter Half-Connection Routines
*/
*/
/* Oscillation Prevention/Reduction: recommended by rfc3448bis, on by default */
#ifdef CONFIG_IP_DCCP_CCID3_DEBUG
static
int
do_osc_prev
=
true
;
static
const
char
*
ccid3_tx_state_name
(
enum
ccid3_hc_tx_states
state
)
{
static
char
*
ccid3_state_names
[]
=
{
[
TFRC_SSTATE_NO_SENT
]
=
"NO_SENT"
,
[
TFRC_SSTATE_NO_FBACK
]
=
"NO_FBACK"
,
[
TFRC_SSTATE_FBACK
]
=
"FBACK"
,
[
TFRC_SSTATE_TERM
]
=
"TERM"
,
};
return
ccid3_state_names
[
state
];
}
#endif
static
void
ccid3_hc_tx_set_state
(
struct
sock
*
sk
,
enum
ccid3_hc_tx_states
state
)
{
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
enum
ccid3_hc_tx_states
oldstate
=
hctx
->
ccid3hctx_state
;
ccid3_pr_debug
(
"%s(%p) %-8.8s -> %s
\n
"
,
dccp_role
(
sk
),
sk
,
ccid3_tx_state_name
(
oldstate
),
ccid3_tx_state_name
(
state
));
WARN_ON
(
state
==
oldstate
);
hctx
->
ccid3hctx_state
=
state
;
}
/*
/*
* Compute the initial sending rate X_init in the manner of RFC 3390:
* Compute the initial sending rate X_init in the manner of RFC 3390:
*
*
* X_init = min(4 * MPS, max(2 * MPS, 4380 bytes)) / RTT
* X_init = min(4 * s, max(2 * s, 4380 bytes)) / RTT
*
*
* Note that RFC 3390 uses MSS, RFC 4342 refers to RFC 3390, and rfc3448bis
* (rev-02) clarifies the use of RFC 3390 with regard to the above formula.
* For consistency with other parts of the code, X_init is scaled by 2^6.
* For consistency with other parts of the code, X_init is scaled by 2^6.
*/
*/
static
inline
u64
rfc3390_initial_rate
(
struct
sock
*
sk
)
static
inline
u64
rfc3390_initial_rate
(
struct
sock
*
sk
)
{
{
const
u32
mps
=
dccp_sk
(
sk
)
->
dccps_mss_cache
,
const
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
w_init
=
clamp
(
4380U
,
2
*
mps
,
4
*
mps
);
const
__u32
w_init
=
clamp_t
(
__u32
,
4380U
,
2
*
hctx
->
ccid3hctx_s
,
4
*
hctx
->
ccid3hctx_s
);
return
scaled_div
(
w_init
<<
6
,
ccid3_hc_tx_sk
(
sk
)
->
rtt
);
return
scaled_div
(
w_init
<<
6
,
hctx
->
ccid3hctx_
rtt
);
}
}
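The X_init formula above can be checked with plain integer arithmetic. The sketch below assumes that `scaled_div(a, b)` computes `a * 1000000 / b` (so that an RTT in microseconds yields a per-second rate); that definition is an assumption about the helper, which is not shown in this part of the diff. The function name is hypothetical.

```c
#include <stdint.h>

/* X_init = min(4*s, max(2*s, 4380)) / RTT, returned in units of
 * 64 * bytes/second. rtt_us is in microseconds and must be non-zero. */
static uint64_t initial_rate_x64(uint32_t s, uint32_t rtt_us)
{
    uint32_t w_init = 4380U;

    /* clamp(4380, 2*s, 4*s) is the same as min(4*s, max(2*s, 4380)) */
    if (w_init < 2 * s)
        w_init = 2 * s;
    if (w_init > 4 * s)
        w_init = 4 * s;

    /* assumed scaled_div(): multiply by 10^6 before dividing by the RTT */
    return ((uint64_t)w_init << 6) * 1000000ULL / rtt_us;
}
```

For s = 1460 bytes and an RTT of 100 ms the result is 2803200, which is 43800 bytes/second after shifting right by 6.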
/**
/*
* ccid3_update_send_interval - Calculate new t_ipi = s / X
* Recalculate t_ipi and delta (should be called whenever X changes)
* This respects the granularity of X (64 * bytes/second) and enforces the
* scaled minimum of s * 64 / t_mbi = `s' bytes/second as per RFC 3448/4342.
*/
*/
static
void
ccid3_update_send_interval
(
struct
ccid3_hc_tx_sock
*
hctx
)
static
void
ccid3_update_send_interval
(
struct
ccid3_hc_tx_sock
*
hctx
)
{
{
if
(
unlikely
(
hctx
->
x
<=
hctx
->
s
))
/* Calculate new t_ipi = s / X_inst (X_inst is in 64 * bytes/second) */
hctx
->
x
=
hctx
->
s
;
hctx
->
ccid3hctx_t_ipi
=
scaled_div32
(((
u64
)
hctx
->
ccid3hctx_s
)
<<
6
,
hctx
->
t_ipi
=
scaled_div32
(((
u64
)
hctx
->
s
)
<<
6
,
hctx
->
x
);
hctx
->
ccid3hctx_x
);
/* Calculate new delta by delta = min(t_ipi / 2, t_gran / 2) */
hctx
->
ccid3hctx_delta
=
min_t
(
u32
,
hctx
->
ccid3hctx_t_ipi
/
2
,
TFRC_OPSYS_HALF_TIME_GRAN
);
ccid3_pr_debug
(
"t_ipi=%u, delta=%u, s=%u, X=%u
\n
"
,
hctx
->
ccid3hctx_t_ipi
,
hctx
->
ccid3hctx_delta
,
hctx
->
ccid3hctx_s
,
(
unsigned
)(
hctx
->
ccid3hctx_x
>>
6
));
}
}
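The inter-packet interval above is t_ipi = s / X, with X kept in units of 64 * bytes/second. A rough sketch of that computation follows, again assuming that the scaled division multiplies by 10^6 so the result is in microseconds. The floor `X >= s` appears on only one side of the diff; it is included here to show how it caps t_ipi at 64 seconds. The function name is hypothetical.

```c
#include <stdint.h>

/* t_ipi in microseconds for packet size s (bytes) and rate x64
 * (64 * bytes/second). */
static uint32_t t_ipi_us(uint32_t s, uint64_t x64)
{
    /* Floor taken from one side of the diff: never let X drop below
     * s/64 bytes/second, i.e. one packet every 64 seconds. */
    if (x64 <= s)
        x64 = s;

    return (uint32_t)((((uint64_t)s) << 6) * 1000000ULL / x64);
}
```

At 43800 bytes/second (x64 = 2803200) and s = 1460 this yields 33333 microseconds, i.e. about 30 packets per second.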
static
u32
ccid3_hc_tx_idle_rtt
(
struct
ccid3_hc_tx_sock
*
hctx
,
ktime_t
now
)
static
u32
ccid3_hc_tx_idle_rtt
(
struct
ccid3_hc_tx_sock
*
hctx
,
ktime_t
now
)
{
{
u32
delta
=
ktime_us_delta
(
now
,
hctx
->
t_last_win_count
);
u32
delta
=
ktime_us_delta
(
now
,
hctx
->
ccid3hctx_
t_last_win_count
);
return
delta
/
hctx
->
rtt
;
return
delta
/
hctx
->
ccid3hctx_
rtt
;
}
}
/**
/**
...
@@ -99,8 +133,8 @@ static u32 ccid3_hc_tx_idle_rtt(struct ccid3_hc_tx_sock *hctx, ktime_t now)
...
@@ -99,8 +133,8 @@ static u32 ccid3_hc_tx_idle_rtt(struct ccid3_hc_tx_sock *hctx, ktime_t now)
static
void
ccid3_hc_tx_update_x
(
struct
sock
*
sk
,
ktime_t
*
stamp
)
static
void
ccid3_hc_tx_update_x
(
struct
sock
*
sk
,
ktime_t
*
stamp
)
{
{
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
u64
min_rate
=
2
*
hctx
->
x_recv
;
__u64
min_rate
=
2
*
hctx
->
ccid3hctx_
x_recv
;
const
u64
old_x
=
hctx
->
x
;
const
__u64
old_x
=
hctx
->
ccid3hctx_
x
;
ktime_t
now
=
stamp
?
*
stamp
:
ktime_get_real
();
ktime_t
now
=
stamp
?
*
stamp
:
ktime_get_real
();
/*
/*
...
@@ -111,44 +145,50 @@ static void ccid3_hc_tx_update_x(struct sock *sk, ktime_t *stamp)
...
@@ -111,44 +145,50 @@ static void ccid3_hc_tx_update_x(struct sock *sk, ktime_t *stamp)
*/
*/
if
(
ccid3_hc_tx_idle_rtt
(
hctx
,
now
)
>=
2
)
{
if
(
ccid3_hc_tx_idle_rtt
(
hctx
,
now
)
>=
2
)
{
min_rate
=
rfc3390_initial_rate
(
sk
);
min_rate
=
rfc3390_initial_rate
(
sk
);
min_rate
=
max
(
min_rate
,
2
*
hctx
->
x_recv
);
min_rate
=
max
(
min_rate
,
2
*
hctx
->
ccid3hctx_
x_recv
);
}
}
if
(
hctx
->
p
>
0
)
{
if
(
hctx
->
ccid3hctx_
p
>
0
)
{
hctx
->
x
=
min
(((
u64
)
hctx
->
x_calc
)
<<
6
,
min_rate
);
hctx
->
ccid3hctx_x
=
min
(((
__u64
)
hctx
->
ccid3hctx_x_calc
)
<<
6
,
min_rate
);
hctx
->
ccid3hctx_x
=
max
(
hctx
->
ccid3hctx_x
,
(((
__u64
)
hctx
->
ccid3hctx_s
)
<<
6
)
/
TFRC_T_MBI
);
}
else
if
(
ktime_us_delta
(
now
,
hctx
->
t_ld
)
-
(
s64
)
hctx
->
rtt
>=
0
)
{
}
else
if
(
ktime_us_delta
(
now
,
hctx
->
ccid3hctx_t_ld
)
-
(
s64
)
hctx
->
ccid3hctx_rtt
>=
0
)
{
hctx
->
x
=
min
(
2
*
hctx
->
x
,
min_rate
);
hctx
->
ccid3hctx_x
=
min
(
2
*
hctx
->
ccid3hctx_x
,
min_rate
);
hctx
->
x
=
max
(
hctx
->
x
,
hctx
->
ccid3hctx_x
=
max
(
hctx
->
ccid3hctx_x
,
scaled_div
(((
u64
)
hctx
->
s
)
<<
6
,
hctx
->
rtt
));
scaled_div
(((
__u64
)
hctx
->
ccid3hctx_s
)
<<
6
,
hctx
->
t_ld
=
now
;
hctx
->
ccid3hctx_rtt
));
hctx
->
ccid3hctx_t_ld
=
now
;
}
}
if
(
hctx
->
x
!=
old_x
)
{
if
(
hctx
->
ccid3hctx_
x
!=
old_x
)
{
ccid3_pr_debug
(
"X_prev=%u, X_now=%u, X_calc=%u, "
ccid3_pr_debug
(
"X_prev=%u, X_now=%u, X_calc=%u, "
"X_recv=%u
\n
"
,
(
unsigned
)(
old_x
>>
6
),
"X_recv=%u
\n
"
,
(
unsigned
)(
old_x
>>
6
),
(
unsigned
)(
hctx
->
x
>>
6
),
hctx
->
x_calc
,
(
unsigned
)(
hctx
->
ccid3hctx_x
>>
6
),
(
unsigned
)(
hctx
->
x_recv
>>
6
));
hctx
->
ccid3hctx_x_calc
,
(
unsigned
)(
hctx
->
ccid3hctx_x_recv
>>
6
));
ccid3_update_send_interval
(
hctx
);
ccid3_update_send_interval
(
hctx
);
}
}
}
}
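The rate update above has two branches: with a non-zero loss event rate, X follows X_calc but is capped by twice the receive rate; without loss, X may at most double once per RTT. The sketch below reproduces only that core logic. It leaves out the idle-period adjustment of min_rate and the t_ld bookkeeping, assumes T_MBI is 64 seconds, and assumes the scaled division multiplies by 10^6. All names are hypothetical.

```c
#include <stdint.h>

#define T_MBI_SECS 64u  /* assumed maximum back-off interval in seconds */

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Rates ending in 64 are in units of 64 * bytes/second; x_calc is in
 * bytes/second. since_t_ld_us is the time since X was last doubled. */
static uint64_t next_rate_x64(uint64_t x64, uint32_t x_calc, uint64_t x_recv64,
                              uint32_t p, uint32_t s, uint32_t rtt_us,
                              int64_t since_t_ld_us)
{
    uint64_t min_rate = 2 * x_recv64;

    if (p > 0) {
        /* loss reported: follow the throughput equation, bounded below
         * by one packet per T_MBI */
        x64 = min_u64((uint64_t)x_calc << 6, min_rate);
        x64 = max_u64(x64, ((uint64_t)s << 6) / T_MBI_SECS);
    } else if (since_t_ld_us - (int64_t)rtt_us >= 0) {
        /* no loss yet: double at most once per RTT, bounded below by
         * one packet per RTT */
        x64 = min_u64(2 * x64, min_rate);
        x64 = max_u64(x64, ((uint64_t)s << 6) * 1000000ULL / rtt_us);
    }
    return x64;
}
```

If less than one RTT has passed since the last doubling and no loss has been reported, the rate is returned unchanged.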
/*
/*
* ccid3_hc_tx_measure_packet_size - Measuring the packet size `s' (sec 4.1)
* Track the mean packet size `s' (cf. RFC 4342, 5.3 and RFC 3448, 4.1)
* @new_len: DCCP payload size in bytes (not used by all methods)
* @len: DCCP packet payload size in bytes
*/
*/
static
u32
ccid3_hc_tx_measure_packet_size
(
struct
sock
*
sk
,
const
u16
new_
len
)
static
inline
void
ccid3_hc_tx_update_s
(
struct
ccid3_hc_tx_sock
*
hctx
,
int
len
)
{
{
#if defined(CONFIG_IP_DCCP_CCID3_MEASURE_S_AS_AVG)
const
u16
old_s
=
hctx
->
ccid3hctx_s
;
return
tfrc_ewma
(
ccid3_hc_tx_sk
(
sk
)
->
s
,
new_len
,
9
);
#elif defined(CONFIG_IP_DCCP_CCID3_MEASURE_S_AS_MAX)
hctx
->
ccid3hctx_s
=
tfrc_ewma
(
hctx
->
ccid3hctx_s
,
len
,
9
);
return
max
(
ccid3_hc_tx_sk
(
sk
)
->
s
,
new_len
);
#else
/* CONFIG_IP_DCCP_CCID3_MEASURE_S_AS_MPS */
if
(
hctx
->
ccid3hctx_s
!=
old_s
)
return
dccp_sk
(
sk
)
->
dccps_mss_cache
;
ccid3_update_send_interval
(
hctx
);
#endif
}
}
/*
/*
...
@@ -158,13 +198,13 @@ static u32 ccid3_hc_tx_measure_packet_size(struct sock *sk, const u16 new_len)
...
@@ -158,13 +198,13 @@ static u32 ccid3_hc_tx_measure_packet_size(struct sock *sk, const u16 new_len)
static
inline
void
ccid3_hc_tx_update_win_count
(
struct
ccid3_hc_tx_sock
*
hctx
,
static
inline
void
ccid3_hc_tx_update_win_count
(
struct
ccid3_hc_tx_sock
*
hctx
,
ktime_t
now
)
ktime_t
now
)
{
{
u32
delta
=
ktime_us_delta
(
now
,
hctx
->
t_last_win_count
),
u32
delta
=
ktime_us_delta
(
now
,
hctx
->
ccid3hctx_
t_last_win_count
),
quarter_rtts
=
(
4
*
delta
)
/
hctx
->
rtt
;
quarter_rtts
=
(
4
*
delta
)
/
hctx
->
ccid3hctx_
rtt
;
if
(
quarter_rtts
>
0
)
{
if
(
quarter_rtts
>
0
)
{
hctx
->
t_last_win_count
=
now
;
hctx
->
ccid3hctx_
t_last_win_count
=
now
;
hctx
->
last_win_count
+=
min
(
quarter_rtts
,
5U
);
hctx
->
ccid3hctx_
last_win_count
+=
min
(
quarter_rtts
,
5U
);
hctx
->
last_win_count
&=
0xF
;
/* mod 16 */
hctx
->
ccid3hctx_last_win_count
&=
0xF
;
/* mod 16 */
}
}
}
}
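The window counter update above advances a 4-bit counter by the number of quarter-RTTs elapsed, limited to 5 per step. A compact sketch of the same arithmetic as a pure function is shown below; the timestamp update is omitted and the function name is hypothetical.

```c
#include <stdint.h>

/* New window counter value after delta_us microseconds have elapsed.
 * rtt_us must be non-zero. */
static uint32_t win_count_after(uint32_t count, uint32_t delta_us,
                                uint32_t rtt_us)
{
    uint32_t quarter_rtts = (4 * delta_us) / rtt_us;

    if (quarter_rtts > 0) {
        count += quarter_rtts < 5U ? quarter_rtts : 5U;
        count &= 0xF;   /* mod 16 */
    }
    return count;
}
```

With an RTT of 100 ms, 60 ms of elapsed time counts as two quarter-RTTs, so a counter at 15 wraps around to 1.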
...
@@ -181,26 +221,25 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long data)
...
@@ -181,26 +221,25 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long data)
goto
restart_timer
;
goto
restart_timer
;
}
}
ccid3_pr_debug("%s(%p) entry with%s feedback\n", dccp_role(sk), sk,
ccid3_pr_debug("%s(%p, state=%s) - entry\n", dccp_role(sk), sk,
hctx->feedback ? "" : "out");
ccid3_tx_state_name(hctx->ccid3hctx_state));
/* Ignore and do not restart after leaving the established state */
if
(
hctx
->
ccid3hctx_state
==
TFRC_SSTATE_FBACK
)
if
((
1
<<
sk
->
sk_state
)
&
~
(
DCCPF_OPEN
|
DCCPF_PARTOPEN
))
ccid3_hc_tx_set_state
(
sk
,
TFRC_SSTATE_NO_FBACK
);
else
if
(
hctx
->
ccid3hctx_state
!=
TFRC_SSTATE_NO_FBACK
)
goto
out
;
goto
out
;
/* Reset feedback state to "no feedback received" */
hctx
->
feedback
=
false
;
/*
/*
* Determine new allowed sending rate X as per draft rfc3448bis-00, 4.4
* Determine new allowed sending rate X as per draft rfc3448bis-00, 4.4
* RTO is 0 if and only if no feedback has been received yet.
*/
*/
if
(
hctx
->
t_rto
==
0
||
hctx
->
p
==
0
)
{
if
(
hctx
->
ccid3hctx_t_rto
==
0
||
/* no feedback received yet */
hctx
->
ccid3hctx_p
==
0
)
{
/* halve send rate directly */
/* halve send rate directly */
hctx
->
x
/=
2
;
hctx
->
ccid3hctx_x
=
max
(
hctx
->
ccid3hctx_x
/
2
,
(((
__u64
)
hctx
->
ccid3hctx_s
)
<<
6
)
/
TFRC_T_MBI
);
ccid3_update_send_interval
(
hctx
);
ccid3_update_send_interval
(
hctx
);
}
else
{
}
else
{
/*
/*
* Modify the cached value of X_recv
* Modify the cached value of X_recv
...
@@ -212,41 +251,44 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long data)
...
@@ -212,41 +251,44 @@ static void ccid3_hc_tx_no_feedback_timer(unsigned long data)
*
*
* Note that X_recv is scaled by 2^6 while X_calc is not
* Note that X_recv is scaled by 2^6 while X_calc is not
*/
*/
BUG_ON
(
hctx
->
p
&&
!
hctx
->
x_calc
);
BUG_ON
(
hctx
->
ccid3hctx_p
&&
!
hctx
->
ccid3hctx_
x_calc
);
if
(
hctx
->
x_calc
>
(
hctx
->
x_recv
>>
5
))
if
(
hctx
->
ccid3hctx_x_calc
>
(
hctx
->
ccid3hctx_x_recv
>>
5
))
hctx
->
x_recv
/=
2
;
hctx
->
ccid3hctx_x_recv
=
max
(
hctx
->
ccid3hctx_x_recv
/
2
,
(((
__u64
)
hctx
->
ccid3hctx_s
)
<<
6
)
/
(
2
*
TFRC_T_MBI
));
else
{
else
{
hctx
->
x_recv
=
hctx
->
x_calc
;
hctx
->
ccid3hctx_x_recv
=
hctx
->
ccid3hctx_
x_calc
;
hctx
->
x_recv
<<=
4
;
hctx
->
ccid3hctx_
x_recv
<<=
4
;
}
}
ccid3_hc_tx_update_x
(
sk
,
NULL
);
ccid3_hc_tx_update_x
(
sk
,
NULL
);
}
}
ccid3_pr_debug
(
"Reduced X to %llu/64 bytes/sec
\n
"
,
ccid3_pr_debug
(
"Reduced X to %llu/64 bytes/sec
\n
"
,
(
unsigned
long
long
)
hctx
->
x
);
(
unsigned
long
long
)
hctx
->
ccid3hctx_
x
);
/*
/*
* Set new timeout for the nofeedback timer.
* Set new timeout for the nofeedback timer.
* See comments in packet_recv() regarding the value of t_RTO.
* See comments in packet_recv() regarding the value of t_RTO.
*/
*/
if (unlikely(hctx->t_rto == 0)) /* no feedback received yet */
if (unlikely(hctx->ccid3hctx_t_rto == 0)) /* no feedback yet */
t_nfb
=
TFRC_INITIAL_TIMEOUT
;
t_nfb
=
TFRC_INITIAL_TIMEOUT
;
else
else
t_nfb
=
max
(
hctx
->
t_rto
,
2
*
hctx
->
t_ipi
);
t_nfb
=
max
(
hctx
->
ccid3hctx_t_rto
,
2
*
hctx
->
ccid3hctx_
t_ipi
);
restart_timer:
restart_timer:
sk_reset_timer
(
sk
,
&
hctx
->
no_feedback_timer
,
sk_reset_timer
(
sk
,
&
hctx
->
ccid3hctx_
no_feedback_timer
,
jiffies
+
usecs_to_jiffies
(
t_nfb
));
jiffies
+
usecs_to_jiffies
(
t_nfb
));
out:
out:
bh_unlock_sock
(
sk
);
bh_unlock_sock
(
sk
);
sock_put
(
sk
);
sock_put
(
sk
);
}
}
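When the nofeedback timer fires without usable loss information, the code above halves the sending rate and then re-arms the timer with max(t_RTO, 2 * t_ipi), or with the initial timeout if no feedback has arrived yet. The sketch below illustrates both steps. The floor of one packet per 64 seconds appears on one side of the diff only, T_MBI = 64 seconds is an assumption, and the initial timeout is passed in as a parameter because its value is not shown here. All names are hypothetical.

```c
#include <stdint.h>

#define T_MBI_SECS 64u  /* assumed maximum back-off interval in seconds */

/* Halve X (in 64 * bytes/second) but keep at least one packet per T_MBI. */
static uint64_t halve_rate_x64(uint64_t x64, uint32_t s)
{
    uint64_t floor_x64 = ((uint64_t)s << 6) / T_MBI_SECS;
    uint64_t half = x64 / 2;

    return half > floor_x64 ? half : floor_x64;
}

/* Next nofeedback timeout in microseconds. t_rto_us == 0 means that no
 * feedback has been received yet. */
static uint32_t next_nofeedback_timeout_us(uint32_t t_rto_us,
                                           uint32_t t_ipi_us,
                                           uint32_t initial_timeout_us)
{
    if (t_rto_us == 0)
        return initial_timeout_us;
    return t_rto_us > 2 * t_ipi_us ? t_rto_us : 2 * t_ipi_us;
}
```

Taking the maximum with 2 * t_ipi keeps the timer from firing before the sender has had a chance to transmit at least two packets.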
/**
/*
* ccid3_hc_tx_send_packet - Delay-based dequeueing of TX packets
* returns
* @skb: next packet candidate to send on @sk
* > 0: delay (in msecs) that should pass before actually sending
* This function uses the convention of ccid_packet_dequeue_eval() and
* = 0: can send immediately
* returns a millisecond-delay value between 0 and t_mbi = 64000 msec.
* < 0: error condition; do not send packet
*/
*/
static
int
ccid3_hc_tx_send_packet
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
int
ccid3_hc_tx_send_packet
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
...
@@ -263,14 +305,18 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
...
@@ -263,14 +305,18 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
if
(
unlikely
(
skb
->
len
==
0
))
if
(
unlikely
(
skb
->
len
==
0
))
return
-
EBADMSG
;
return
-
EBADMSG
;
if
(
hctx
->
s
==
0
)
{
switch
(
hctx
->
ccid3hctx_state
)
{
sk_reset_timer
(
sk
,
&
hctx
->
no_feedback_timer
,
(
jiffies
+
case
TFRC_SSTATE_NO_SENT
:
sk_reset_timer
(
sk
,
&
hctx
->
ccid3hctx_no_feedback_timer
,
(
jiffies
+
usecs_to_jiffies
(
TFRC_INITIAL_TIMEOUT
)));
usecs_to_jiffies
(
TFRC_INITIAL_TIMEOUT
)));
hctx
->
last_win_count
=
0
;
hctx
->
ccid3hctx_last_win_count
=
0
;
hctx
->
t_last_win_count
=
now
;
hctx
->
ccid3hctx_
t_last_win_count
=
now
;
/* Set t_0 for initial packet */
/* Set t_0 for initial packet */
hctx
->
t_nom
=
now
;
hctx
->
ccid3hctx_t_nom
=
now
;
hctx
->
ccid3hctx_s
=
skb
->
len
;
/*
/*
* Use initial RTT sample when available: recommended by erratum
* Use initial RTT sample when available: recommended by erratum
...
@@ -279,9 +325,9 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
...
@@ -279,9 +325,9 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
*/
*/
if
(
dp
->
dccps_syn_rtt
)
{
if
(
dp
->
dccps_syn_rtt
)
{
ccid3_pr_debug
(
"SYN RTT = %uus
\n
"
,
dp
->
dccps_syn_rtt
);
ccid3_pr_debug
(
"SYN RTT = %uus
\n
"
,
dp
->
dccps_syn_rtt
);
hctx
->
rtt
=
dp
->
dccps_syn_rtt
;
hctx
->
ccid3hctx_
rtt
=
dp
->
dccps_syn_rtt
;
hctx
->
x
=
rfc3390_initial_rate
(
sk
);
hctx
->
ccid3hctx_
x
=
rfc3390_initial_rate
(
sk
);
hctx
->
t_ld
=
now
;
hctx
->
ccid3hctx_
t_ld
=
now
;
}
else
{
}
else
{
/*
/*
* Sender does not have RTT sample:
* Sender does not have RTT sample:
...
@@ -289,20 +335,17 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
...
@@ -289,20 +335,17 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
* is needed in several parts (e.g. window counter);
* is needed in several parts (e.g. window counter);
* - set sending rate X_pps = 1pps as per RFC 3448, 4.2.
* - set sending rate X_pps = 1pps as per RFC 3448, 4.2.
*/
*/
hctx
->
rtt
=
DCCP_FALLBACK_RTT
;
hctx
->
ccid3hctx_
rtt
=
DCCP_FALLBACK_RTT
;
hctx
->
x
=
dp
->
dccps_mss_cache
;
hctx
->
ccid3hctx_x
=
hctx
->
ccid3hctx_s
;
hctx
->
x
<<=
6
;
hctx
->
ccid3hctx_
x
<<=
6
;
}
}
/* Compute t_ipi = s / X */
hctx
->
s
=
ccid3_hc_tx_measure_packet_size
(
sk
,
skb
->
len
);
ccid3_update_send_interval
(
hctx
);
ccid3_update_send_interval
(
hctx
);
/* Seed value for Oscillation Prevention (sec. 4.5) */
ccid3_hc_tx_set_state
(
sk
,
TFRC_SSTATE_NO_FBACK
);
hctx
->
r_sqmean
=
tfrc_scaled_sqrt
(
hctx
->
rtt
)
;
break
;
case
TFRC_SSTATE_NO_FBACK
:
}
else
{
case
TFRC_SSTATE_FBACK
:
delay
=
ktime_us_delta
(
hctx
->
t_nom
,
now
);
delay
=
ktime_us_delta
(
hctx
->
ccid3hctx_
t_nom
,
now
);
ccid3_pr_debug
(
"delay=%ld
\n
"
,
(
long
)
delay
);
ccid3_pr_debug
(
"delay=%ld
\n
"
,
(
long
)
delay
);
/*
/*
* Scheduling of packet transmissions [RFC 3448, 4.6]
* Scheduling of packet transmissions [RFC 3448, 4.6]
...
@@ -312,80 +355,99 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
...
@@ -312,80 +355,99 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
* else
* else
* // send the packet in (t_nom - t_now) milliseconds.
* // send the packet in (t_nom - t_now) milliseconds.
*/
*/
if
(
delay
>=
TFRC_T_DELTA
)
if
(
delay
-
(
s64
)
hctx
->
ccid3hctx_delta
>=
1000
)
return
(
u32
)
delay
/
USEC_PER_MSEC
;
return
(
u32
)
delay
/
1000L
;
ccid3_hc_tx_update_win_count
(
hctx
,
now
);
ccid3_hc_tx_update_win_count
(
hctx
,
now
);
break
;
case
TFRC_SSTATE_TERM
:
DCCP_BUG
(
"%s(%p) - Illegal state TERM"
,
dccp_role
(
sk
),
sk
);
return
-
EINVAL
;
}
}
/* prepare to send now (add options etc.) */
/* prepare to send now (add options etc.) */
dp
->
dccps_hc_tx_insert_options
=
1
;
dp
->
dccps_hc_tx_insert_options
=
1
;
DCCP_SKB_CB
(
skb
)
->
dccpd_ccval
=
hctx
->
last_win_count
;
DCCP_SKB_CB
(
skb
)
->
dccpd_ccval
=
hctx
->
ccid3hctx_
last_win_count
;
/* set the nominal send time for the next following packet */
/* set the nominal send time for the next following packet */
hctx
->
t_nom
=
ktime_add_us
(
hctx
->
t_nom
,
hctx
->
t_ipi
);
hctx
->
ccid3hctx_t_nom
=
ktime_add_us
(
hctx
->
ccid3hctx_t_nom
,
return
CCID_PACKET_SEND_AT_ONCE
;
hctx
->
ccid3hctx_t_ipi
);
return
0
;
}
}
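The scheduling decision above compares the time remaining until the nominal send time t_nom against a tolerance: if the remaining delay is large enough, the function returns it in milliseconds, otherwise the packet is sent at once. The sketch below follows the variant that tests `delay - delta >= 1000`; timestamps are plain microsecond counters and the function name is hypothetical.

```c
#include <stdint.h>

/* Returns the delay in milliseconds before the next packet may be sent,
 * or 0 if it can go out immediately. All inputs are in microseconds. */
static int64_t send_delay_ms(int64_t t_nom_us, int64_t now_us, int64_t delta_us)
{
    int64_t delay = t_nom_us - now_us;

    /* wait only if more than delta plus one millisecond remains */
    if (delay - delta_us >= 1000)
        return delay / 1000;
    return 0;
}
```

A packet whose nominal send time has already passed yields a negative delay and is therefore sent immediately.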
static
void
ccid3_hc_tx_packet_sent
(
struct
sock
*
sk
,
unsigned
int
len
)
static
void
ccid3_hc_tx_packet_sent
(
struct
sock
*
sk
,
int
more
,
unsigned
int
len
)
{
{
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
/* Changes to s will become effective the next time X is computed */
ccid3_hc_tx_update_s
(
hctx
,
len
);
hctx
->
s
=
ccid3_hc_tx_measure_packet_size
(
sk
,
len
);
if
(
tfrc_tx_hist_add
(
&
hctx
->
hist
,
dccp_sk
(
sk
)
->
dccps_gss
))
if
(
tfrc_tx_hist_add
(
&
hctx
->
ccid3hctx_
hist
,
dccp_sk
(
sk
)
->
dccps_gss
))
DCCP_CRIT
(
"packet history - out of memory!"
);
DCCP_CRIT
(
"packet history - out of memory!"
);
}
}
static
void
ccid3_hc_tx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
void
ccid3_hc_tx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
tfrc_tx_hist_entry
*
acked
;
struct
ccid3_options_received
*
opt_recv
;
ktime_t
now
;
ktime_t
now
;
unsigned
long
t_nfb
;
unsigned
long
t_nfb
;
u32
r_sample
;
u32
pinv
,
r_sample
;
/* we are only interested in ACKs */
/* we are only interested in ACKs */
if
(
!
(
DCCP_SKB_CB
(
skb
)
->
dccpd_type
==
DCCP_PKT_ACK
||
if
(
!
(
DCCP_SKB_CB
(
skb
)
->
dccpd_type
==
DCCP_PKT_ACK
||
DCCP_SKB_CB
(
skb
)
->
dccpd_type
==
DCCP_PKT_DATAACK
))
DCCP_SKB_CB
(
skb
)
->
dccpd_type
==
DCCP_PKT_DATAACK
))
return
;
return
;
/* ... and only in the established state */
if (hctx->ccid3hctx_state != TFRC_SSTATE_FBACK &&
    hctx->ccid3hctx_state != TFRC_SSTATE_NO_FBACK)
	return;
opt_recv = &hctx->ccid3hctx_options_received;
now = ktime_get_real();
/*
 * Locate the acknowledged packet in the TX history.
 *
 * Returning "entry not found" here can for instance happen when
 *  - the host has not sent out anything (e.g. a passive server),
 *  - the Ack is outdated (packet with higher Ack number was received),
 *  - it is a bogus Ack (for a packet not sent on this connection).
 */
acked
=
tfrc_tx_hist_find_entry
(
hctx
->
hist
,
dccp_hdr_ack_seq
(
skb
));
/* Estimate RTT from history if ACK number is valid */
if
(
acked
==
NULL
)
r_sample
=
tfrc_tx_hist_rtt
(
hctx
->
ccid3hctx_hist
,
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
,
now
);
if
(
r_sample
==
0
)
{
DCCP_WARN
(
"%s(%p): %s with bogus ACK-%llu
\n
"
,
dccp_role
(
sk
),
sk
,
dccp_packet_name
(
DCCP_SKB_CB
(
skb
)
->
dccpd_type
),
(
unsigned
long
long
)
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
);
return
;
return
;
/* For the sake of RTT sampling, ignore/remove all older entries */
}
tfrc_tx_hist_purge
(
&
acked
->
next
);
/* Update the moving average for the RTT estimate (RFC 3448, 4.3) */
/* Update receive rate in units of 64 * bytes/second */
now
=
ktime_get_real
();
hctx
->
ccid3hctx_x_recv
=
opt_recv
->
ccid3or_receive_rate
;
r_sample
=
dccp_sample_rtt
(
sk
,
ktime_us_delta
(
now
,
acked
->
stamp
));
hctx
->
ccid3hctx_x_recv
<<=
6
;
hctx
->
rtt
=
tfrc_ewma
(
hctx
->
rtt
,
r_sample
,
9
);
/* Update loss event rate (which is scaled by 1e6) */
pinv
=
opt_recv
->
ccid3or_loss_event_rate
;
if
(
pinv
==
~
0U
||
pinv
==
0
)
/* see RFC 4342, 8.5 */
hctx
->
ccid3hctx_p
=
0
;
else
/* can not exceed 100% */
hctx
->
ccid3hctx_p
=
scaled_div
(
1
,
pinv
);
/*
* Validate new RTT sample and update moving average
*/
r_sample
=
dccp_sample_rtt
(
sk
,
r_sample
);
hctx
->
ccid3hctx_rtt
=
tfrc_ewma
(
hctx
->
ccid3hctx_rtt
,
r_sample
,
9
);
/*
/*
* Update allowed sending rate X as per draft rfc3448bis-00, 4.2/3
* Update allowed sending rate X as per draft rfc3448bis-00, 4.2/3
*/
*/
if
(
!
hctx
->
feedback
)
{
if
(
hctx
->
ccid3hctx_state
==
TFRC_SSTATE_NO_FBACK
)
{
hctx
->
feedback
=
true
;
ccid3_hc_tx_set_state
(
sk
,
TFRC_SSTATE_FBACK
)
;
if
(
hctx
->
t_rto
==
0
)
{
if
(
hctx
->
ccid3hctx_
t_rto
==
0
)
{
/*
/*
* Initial feedback packet: Larger Initial Windows (4.2)
* Initial feedback packet: Larger Initial Windows (4.2)
*/
*/
hctx
->
x
=
rfc3390_initial_rate
(
sk
);
hctx
->
ccid3hctx_
x
=
rfc3390_initial_rate
(
sk
);
hctx
->
t_ld
=
now
;
hctx
->
ccid3hctx_
t_ld
=
now
;
ccid3_update_send_interval
(
hctx
);
ccid3_update_send_interval
(
hctx
);
goto
done_computing_x
;
goto
done_computing_x
;
}
else
if
(
hctx
->
p
==
0
)
{
}
else
if
(
hctx
->
ccid3hctx_
p
==
0
)
{
/*
/*
* First feedback after nofeedback timer expiry (4.3)
* First feedback after nofeedback timer expiry (4.3)
*/
*/
...
@@ -394,52 +456,25 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -394,52 +456,25 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
}
}
/* Update sending rate (step 4 of [RFC 3448, 4.3]) */
/* Update sending rate (step 4 of [RFC 3448, 4.3]) */
if
(
hctx
->
p
>
0
)
if
(
hctx
->
ccid3hctx_p
>
0
)
hctx
->
x_calc
=
tfrc_calc_x
(
hctx
->
s
,
hctx
->
rtt
,
hctx
->
p
);
hctx
->
ccid3hctx_x_calc
=
tfrc_calc_x
(
hctx
->
ccid3hctx_s
,
hctx
->
ccid3hctx_rtt
,
hctx
->
ccid3hctx_p
);
ccid3_hc_tx_update_x
(
sk
,
&
now
);
ccid3_hc_tx_update_x
(
sk
,
&
now
);
done_computing_x:
done_computing_x:
ccid3_pr_debug("%s(%p), RTT=%uus (sample=%uus), s=%u, "
	       "p=%u, X_calc=%u, X_recv=%u, X=%u\n",
	       dccp_role(sk), sk, hctx->rtt, r_sample,
	       hctx->s, hctx->p, hctx->x_calc,
	       (unsigned)(hctx->x_recv >> 6),
	       (unsigned)(hctx->x >> 6));
ccid3_pr_debug("%s(%p), RTT=%uus (sample=%uus), s=%u, "
	       "p=%u, X_calc=%u, X_recv=%u, X=%u\n",
	       dccp_role(sk),
	       sk, hctx->ccid3hctx_rtt, r_sample,
	       hctx->ccid3hctx_s, hctx->ccid3hctx_p,
	       hctx->ccid3hctx_x_calc,
	       (unsigned)(hctx->ccid3hctx_x_recv >> 6),
	       (unsigned)(hctx->ccid3hctx_x >> 6));
/*
 * Oscillation Reduction (RFC 3448, 4.5) - modifying t_ipi according to
 * RTT changes, multiplying by X/X_inst = sqrt(R_sample)/R_sqmean. This
 * can be useful if few connections share a link, avoiding that buffer
 * fill levels (RTT) oscillate as a result of frequent adjustments to X.
 * A useful presentation with background information is in
 *    Joerg Widmer, "Equation-Based Congestion Control",
 *    MSc Thesis, University of Mannheim, Germany, 2000
 * (sec. 3.6.4), who calls this ISM ("Inter-packet Space Modulation").
 */
if
(
do_osc_prev
)
{
r_sample
=
tfrc_scaled_sqrt
(
r_sample
);
/*
* The modulation can work in both ways: increase/decrease t_ipi
* according to long-term increases/decreases of the RTT. The
* former is a useful measure, since it works against queue
* build-up. The latter temporarily increases the sending rate,
* so that buffers fill up more quickly. This in turn causes
* the RTT to increase, so that either later reduction becomes
* necessary or the RTT stays at a very high level. Decreasing
* t_ipi is therefore not supported.
* Furthermore, during the initial slow-start phase the RTT
* naturally increases, where using the algorithm would cause
* delays. Hence it is disabled during the initial slow-start.
*/
if
(
r_sample
>
hctx
->
r_sqmean
&&
hctx
->
p
>
0
)
hctx
->
t_ipi
=
div_u64
((
u64
)
hctx
->
t_ipi
*
(
u64
)
r_sample
,
hctx
->
r_sqmean
);
hctx
->
t_ipi
=
min_t
(
u32
,
hctx
->
t_ipi
,
TFRC_T_MBI
);
/* update R_sqmean _after_ computing the modulation factor */
hctx
->
r_sqmean
=
tfrc_ewma
(
hctx
->
r_sqmean
,
r_sample
,
9
);
}
/* unschedule no feedback timer */
/* unschedule no feedback timer */
sk_stop_timer
(
sk
,
&
hctx
->
no_feedback_timer
);
sk_stop_timer
(
sk
,
&
hctx
->
ccid3hctx_
no_feedback_timer
);
/*
/*
* As we have calculated new ipi, delta, t_nom it is possible
* As we have calculated new ipi, delta, t_nom it is possible
...
@@ -453,66 +488,95 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
...
@@ -453,66 +488,95 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
* This can help avoid triggering the nofeedback timer too
* This can help avoid triggering the nofeedback timer too
* often ('spinning') on LANs with small RTTs.
* often ('spinning') on LANs with small RTTs.
*/
*/
hctx->t_rto = max_t(u32, 4 * hctx->rtt,
		    (CONFIG_IP_DCCP_CCID3_RTO * (USEC_PER_SEC / 1000)));
hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
			      (CONFIG_IP_DCCP_CCID3_RTO * (USEC_PER_SEC / 1000)));
/*
/*
* Schedule no feedback timer to expire in
* Schedule no feedback timer to expire in
* max(t_RTO, 2 * s/X) = max(t_RTO, 2 * t_ipi)
* max(t_RTO, 2 * s/X) = max(t_RTO, 2 * t_ipi)
*/
*/
t_nfb
=
max
(
hctx
->
t_rto
,
2
*
hctx
->
t_ipi
);
t_nfb
=
max
(
hctx
->
ccid3hctx_t_rto
,
2
*
hctx
->
ccid3hctx_
t_ipi
);
ccid3_pr_debug
(
"%s(%p), Scheduled no feedback timer to "
ccid3_pr_debug
(
"%s(%p), Scheduled no feedback timer to "
"expire in %lu jiffies (%luus)
\n
"
,
"expire in %lu jiffies (%luus)
\n
"
,
dccp_role
(
sk
),
sk
,
usecs_to_jiffies
(
t_nfb
),
t_nfb
);
dccp_role
(
sk
),
sk
,
usecs_to_jiffies
(
t_nfb
),
t_nfb
);
sk_reset_timer
(
sk
,
&
hctx
->
no_feedback_timer
,
sk_reset_timer
(
sk
,
&
hctx
->
ccid3hctx_
no_feedback_timer
,
jiffies
+
usecs_to_jiffies
(
t_nfb
));
jiffies
+
usecs_to_jiffies
(
t_nfb
));
}
}
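The RTT smoothing and the nofeedback-timer arithmetic used in ccid3_hc_tx_packet_recv() can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: `ewma_u32`, `max_u32`, `nofeedback_timeout_us` and `rto_floor_ms` are hypothetical names, the sketch assumes that the third argument of `tfrc_ewma()` is a weight out of 10 and that an average of 0 means "no sample yet", and `rto_floor_ms` stands in for CONFIG_IP_DCCP_CCID3_RTO (taken to be in milliseconds, since it is multiplied by USEC_PER_SEC / 1000).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for tfrc_ewma(). ASSUMPTION: weight is out of 10,
 * and avg == 0 means that no sample has been recorded yet.
 */
static uint32_t ewma_u32(uint32_t avg, uint32_t sample, uint8_t weight)
{
	assert(weight <= 10);
	if (avg == 0)
		return sample;
	return (uint32_t)(((uint64_t)weight * avg +
			   (uint64_t)(10 - weight) * sample) / 10);
}

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * t_rto = max(4 * RTT, floor) and t_nfb = max(t_rto, 2 * t_ipi),
 * with all times in microseconds except the floor (milliseconds).
 */
static uint32_t nofeedback_timeout_us(uint32_t rtt_us, uint32_t t_ipi_us,
				      uint32_t rto_floor_ms)
{
	uint32_t t_rto = max_u32(4 * rtt_us, rto_floor_ms * 1000);

	return max_u32(t_rto, 2 * t_ipi_us);
}
```

With a small RTT the floor dominates, which is the "spinning on LANs" case that the comment in the diff describes; with a long inter-packet interval the 2 * t_ipi term dominates instead.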
static
int
ccid3_hc_tx_parse_options
(
struct
sock
*
sk
,
u8
packet_type
,
static
int
ccid3_hc_tx_parse_options
(
struct
sock
*
sk
,
unsigned
char
option
,
u8
option
,
u8
*
optval
,
u8
optlen
)
unsigned
char
len
,
u16
idx
,
unsigned
char
*
value
)
{
{
int
rc
=
0
;
const
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
ccid3_options_received
*
opt_recv
;
__be32
opt_val
;
__be32
opt_val
;
switch
(
option
)
{
opt_recv
=
&
hctx
->
ccid3hctx_options_received
;
case
TFRC_OPT_RECEIVE_RATE
:
case
TFRC_OPT_LOSS_EVENT_RATE
:
/* Must be ignored on Data packets, cf. RFC 4342 8.3 and 8.5 */
if
(
packet_type
==
DCCP_PKT_DATA
)
break
;
if
(
unlikely
(
optlen
!=
4
))
{
DCCP_WARN
(
"%s(%p), invalid len %d for %u
\n
"
,
dccp_role
(
sk
),
sk
,
optlen
,
option
);
return
-
EINVAL
;
}
opt_val
=
ntohl
(
get_unaligned
((
__be32
*
)
optval
));
if
(
option
==
TFRC_OPT_RECEIVE_RATE
)
{
if
(
opt_recv
->
ccid3or_seqno
!=
dp
->
dccps_gsr
)
{
/* Receive Rate is kept in units of 64 bytes/second */
opt_recv
->
ccid3or_seqno
=
dp
->
dccps_gsr
;
hctx
->
x_recv
=
opt_val
;
opt_recv
->
ccid3or_loss_event_rate
=
~
0
;
hctx
->
x_recv
<<=
6
;
opt_recv
->
ccid3or_loss_intervals_idx
=
0
;
opt_recv
->
ccid3or_loss_intervals_len
=
0
;
opt_recv
->
ccid3or_receive_rate
=
0
;
}
ccid3_pr_debug
(
"%s(%p), RECEIVE_RATE=%u
\n
"
,
switch
(
option
)
{
dccp_role
(
sk
),
sk
,
opt_val
);
case
TFRC_OPT_LOSS_EVENT_RATE
:
if
(
unlikely
(
len
!=
4
))
{
DCCP_WARN
(
"%s(%p), invalid len %d "
"for TFRC_OPT_LOSS_EVENT_RATE
\n
"
,
dccp_role
(
sk
),
sk
,
len
);
rc
=
-
EINVAL
;
}
else
{
}
else
{
/* Update the fixpoint Loss Event Rate fraction */
opt_val
=
get_unaligned
((
__be32
*
)
value
);
hctx
->
p
=
tfrc_invert_loss_event_rate
(
opt_val
);
opt_recv
->
ccid3or_loss_event_rate
=
ntohl
(
opt_val
);
ccid3_pr_debug
(
"%s(%p), LOSS_EVENT_RATE=%u
\n
"
,
ccid3_pr_debug
(
"%s(%p), LOSS_EVENT_RATE=%u
\n
"
,
dccp_role
(
sk
),
sk
,
opt_val
);
dccp_role
(
sk
),
sk
,
opt_recv
->
ccid3or_loss_event_rate
);
}
}
break
;
case
TFRC_OPT_LOSS_INTERVALS
:
opt_recv
->
ccid3or_loss_intervals_idx
=
idx
;
opt_recv
->
ccid3or_loss_intervals_len
=
len
;
ccid3_pr_debug
(
"%s(%p), LOSS_INTERVALS=(%u, %u)
\n
"
,
dccp_role
(
sk
),
sk
,
opt_recv
->
ccid3or_loss_intervals_idx
,
opt_recv
->
ccid3or_loss_intervals_len
);
break
;
case
TFRC_OPT_RECEIVE_RATE
:
if
(
unlikely
(
len
!=
4
))
{
DCCP_WARN
(
"%s(%p), invalid len %d "
"for TFRC_OPT_RECEIVE_RATE
\n
"
,
dccp_role
(
sk
),
sk
,
len
);
rc
=
-
EINVAL
;
}
else
{
opt_val
=
get_unaligned
((
__be32
*
)
value
);
opt_recv
->
ccid3or_receive_rate
=
ntohl
(
opt_val
);
ccid3_pr_debug
(
"%s(%p), RECEIVE_RATE=%u
\n
"
,
dccp_role
(
sk
),
sk
,
opt_recv
->
ccid3or_receive_rate
);
}
break
;
}
}
return
0
;
return
rc
;
}
}
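The two conversions applied to the received TFRC options (loss event rate and receive rate) can be sketched as below. The helper names `loss_rate_from_inverse` and `x_recv_scaled` are hypothetical, and the sketch assumes that `scaled_div(1, pinv)` amounts to 1000000 / pinv, which matches the "scaled by 1e6" comments in the diff but is not confirmed by the lines shown here.

```c
#include <stdint.h>

/*
 * Convert the received inverse loss event rate (1/p) into p scaled by 1e6.
 * Both 0 and ~0U are treated as "no loss yet" (p = 0), following the
 * "see RFC 4342, 8.5" branch in the diff.
 */
static uint32_t loss_rate_from_inverse(uint32_t pinv)
{
	if (pinv == 0 || pinv == UINT32_MAX)
		return 0;
	/* pinv >= 1, so the result can not exceed 1000000 (100%) */
	return (uint32_t)(1000000ULL / pinv);
}

/*
 * The receive rate arrives in bytes/second and is kept in units of
 * 64 * bytes/second, which is what the "<<= 6" in the diff does.
 */
static uint64_t x_recv_scaled(uint32_t bytes_per_sec)
{
	return (uint64_t)bytes_per_sec << 6;
}
```

For example, an option value of 100 (one loss event per 100 packets) yields p = 10000, i.e. 1% on the 1e6 scale.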
static
int
ccid3_hc_tx_init
(
struct
ccid
*
ccid
,
struct
sock
*
sk
)
static
int
ccid3_hc_tx_init
(
struct
ccid
*
ccid
,
struct
sock
*
sk
)
{
{
struct
ccid3_hc_tx_sock
*
hctx
=
ccid_priv
(
ccid
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid_priv
(
ccid
);
hctx
->
hist
=
NULL
;
hctx
->
ccid3hctx_state
=
TFRC_SSTATE_NO_SENT
;
setup_timer
(
&
hctx
->
no_feedback_timer
,
hctx
->
ccid3hctx_hist
=
NULL
;
ccid3_hc_tx_no_feedback_timer
,
(
unsigned
long
)
sk
);
setup_timer
(
&
hctx
->
ccid3hctx_no_feedback_timer
,
ccid3_hc_tx_no_feedback_timer
,
(
unsigned
long
)
sk
);
return
0
;
return
0
;
}
}
...
@@ -520,36 +584,42 @@ static void ccid3_hc_tx_exit(struct sock *sk)
...
@@ -520,36 +584,42 @@ static void ccid3_hc_tx_exit(struct sock *sk)
{
{
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
sk_stop_timer
(
sk
,
&
hctx
->
no_feedback_timer
);
ccid3_hc_tx_set_state
(
sk
,
TFRC_SSTATE_TERM
);
tfrc_tx_hist_purge
(
&
hctx
->
hist
);
sk_stop_timer
(
sk
,
&
hctx
->
ccid3hctx_no_feedback_timer
);
tfrc_tx_hist_purge
(
&
hctx
->
ccid3hctx_hist
);
}
}
static
void
ccid3_hc_tx_get_info
(
struct
sock
*
sk
,
struct
tcp_info
*
info
)
static
void
ccid3_hc_tx_get_info
(
struct
sock
*
sk
,
struct
tcp_info
*
info
)
{
{
info
->
tcpi_rto
=
ccid3_hc_tx_sk
(
sk
)
->
t_rto
;
struct
ccid3_hc_tx_sock
*
hctx
;
info
->
tcpi_rtt
=
ccid3_hc_tx_sk
(
sk
)
->
rtt
;
/* Listening sockets don't have a private CCID block */
if
(
sk
->
sk_state
==
DCCP_LISTEN
)
return
;
hctx
=
ccid3_hc_tx_sk
(
sk
);
info
->
tcpi_rto
=
hctx
->
ccid3hctx_t_rto
;
info
->
tcpi_rtt
=
hctx
->
ccid3hctx_rtt
;
}
}
static
int
ccid3_hc_tx_getsockopt
(
struct
sock
*
sk
,
const
int
optname
,
int
len
,
static
int
ccid3_hc_tx_getsockopt
(
struct
sock
*
sk
,
const
int
optname
,
int
len
,
u32
__user
*
optval
,
int
__user
*
optlen
)
u32
__user
*
optval
,
int
__user
*
optlen
)
{
{
const
struct
ccid3_hc_tx_sock
*
hctx
=
ccid3_hc_tx_sk
(
sk
);
const
struct
ccid3_hc_tx_sock
*
hctx
;
struct
tfrc_tx_info
tfrc
;
const
void
*
val
;
const
void
*
val
;
/* Listening sockets don't have a private CCID block */
if
(
sk
->
sk_state
==
DCCP_LISTEN
)
return
-
EINVAL
;
hctx
=
ccid3_hc_tx_sk
(
sk
);
switch
(
optname
)
{
switch
(
optname
)
{
case
DCCP_SOCKOPT_CCID_TX_INFO
:
case
DCCP_SOCKOPT_CCID_TX_INFO
:
if
(
len
<
sizeof
(
tfrc
))
if
(
len
<
sizeof
(
hctx
->
ccid3hctx_
tfrc
))
return
-
EINVAL
;
return
-
EINVAL
;
tfrc
.
tfrctx_x
=
hctx
->
x
;
len
=
sizeof
(
hctx
->
ccid3hctx_tfrc
);
tfrc
.
tfrctx_x_recv
=
hctx
->
x_recv
;
val
=
&
hctx
->
ccid3hctx_tfrc
;
tfrc
.
tfrctx_x_calc
=
hctx
->
x_calc
;
tfrc
.
tfrctx_rtt
=
hctx
->
rtt
;
tfrc
.
tfrctx_p
=
hctx
->
p
;
tfrc
.
tfrctx_rto
=
hctx
->
t_rto
;
tfrc
.
tfrctx_ipi
=
hctx
->
t_ipi
;
len
=
sizeof
(
tfrc
);
val
=
&
tfrc
;
break
;
break
;
default:
default:
return
-
ENOPROTOOPT
;
return
-
ENOPROTOOPT
;
...
@@ -564,82 +634,112 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len,
...
@@ -564,82 +634,112 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len,
/*
/*
* Receiver Half-Connection Routines
* Receiver Half-Connection Routines
*/
*/
/* CCID3 feedback types */
enum
ccid3_fback_type
{
CCID3_FBACK_NONE
=
0
,
CCID3_FBACK_INITIAL
,
CCID3_FBACK_PERIODIC
,
CCID3_FBACK_PARAM_CHANGE
};
#ifdef CONFIG_IP_DCCP_CCID3_DEBUG
static
const
char
*
ccid3_rx_state_name
(
enum
ccid3_hc_rx_states
state
)
{
static
char
*
ccid3_rx_state_names
[]
=
{
[
TFRC_RSTATE_NO_DATA
]
=
"NO_DATA"
,
[
TFRC_RSTATE_DATA
]
=
"DATA"
,
[
TFRC_RSTATE_TERM
]
=
"TERM"
,
};
return
ccid3_rx_state_names
[
state
];
}
#endif
static
void
ccid3_hc_rx_set_state
(
struct
sock
*
sk
,
enum
ccid3_hc_rx_states
state
)
{
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
enum
ccid3_hc_rx_states
oldstate
=
hcrx
->
ccid3hcrx_state
;
ccid3_pr_debug
(
"%s(%p) %-8.8s -> %s
\n
"
,
dccp_role
(
sk
),
sk
,
ccid3_rx_state_name
(
oldstate
),
ccid3_rx_state_name
(
state
));
WARN_ON
(
state
==
oldstate
);
hcrx
->
ccid3hcrx_state
=
state
;
}
static
void
ccid3_hc_rx_send_feedback
(
struct
sock
*
sk
,
static
void
ccid3_hc_rx_send_feedback
(
struct
sock
*
sk
,
const
struct
sk_buff
*
skb
,
const
struct
sk_buff
*
skb
,
enum
ccid3_fback_type
fbtype
)
enum
ccid3_fback_type
fbtype
)
{
{
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
ktime_t
now
;
s64
delta
=
0
;
if
(
unlikely
(
hcrx
->
ccid3hcrx_state
==
TFRC_RSTATE_TERM
))
return
;
now
=
ktime_get_real
();
switch
(
fbtype
)
{
switch
(
fbtype
)
{
case
CCID3_FBACK_INITIAL
:
case
CCID3_FBACK_INITIAL
:
hcrx
->
x_recv
=
0
;
hcrx
->
ccid3hcrx_
x_recv
=
0
;
hcrx
->
p_inverse
=
~
0U
;
/* see RFC 4342, 8.5 */
hcrx
->
ccid3hcrx_pinv
=
~
0U
;
/* see RFC 4342, 8.5 */
break
;
break
;
case
CCID3_FBACK_PARAM_CHANGE
:
case
CCID3_FBACK_PARAM_CHANGE
:
if
(
unlikely
(
hcrx
->
feedback
==
CCID3_FBACK_NONE
))
{
/*
* rfc3448bis-06, 6.3.1: First packet(s) lost or marked
* FIXME: in rfc3448bis the receiver returns X_recv=0
* here as it normally would in the first feedback packet.
* However this is not possible yet, since the code still
* uses RFC 3448, i.e.
* If (p > 0)
* Calculate X_calc using the TCP throughput equation.
* X = max(min(X_calc, 2*X_recv), s/t_mbi);
* would bring X down to s/t_mbi. That is why we return
* X_recv according to rfc3448bis-06 for the moment.
*/
u32
s
=
tfrc_rx_hist_packet_size
(
&
hcrx
->
hist
),
rtt
=
tfrc_rx_hist_rtt
(
&
hcrx
->
hist
);
hcrx
->
x_recv
=
scaled_div32
(
s
,
2
*
rtt
);
break
;
}
/*
 * When parameters change (new loss or p > p_prev), we do not
 * have a reliable estimate for R_m of [RFC 3448, 6.2] and so
 * always check whether at least RTT time units were covered.
 */
/*
 * When parameters change (new loss or p > p_prev), we do not
 * have a reliable estimate for R_m of [RFC 3448, 6.2] and so
 * need to reuse the previous value of X_recv. However, when
 * X_recv was 0 (due to early loss), this would kill X down to
 * s/t_mbi (i.e. one packet in 64 seconds).
 * To avoid such drastic reduction, we approximate X_recv as
 * the number of bytes since last feedback.
 * This is a safe fallback, since X is bounded above by X_calc.
 */
hcrx
->
x_recv
=
tfrc_rx_hist_x_recv
(
&
hcrx
->
hist
,
hcrx
->
x_recv
);
if
(
hcrx
->
ccid3hcrx_x_recv
>
0
)
break
;
break
;
/* fall through */
case CCID3_FBACK_PERIODIC:
	delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback);
	if (delta <= 0)
		DCCP_BUG("delta (%ld) <= 0", (long)delta);
	else
		hcrx->ccid3hcrx_x_recv =
			scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
	break;
case CCID3_FBACK_PERIODIC:
	/*
	 * Step (2) of rfc3448bis-06, 6.2:
	 * - if no data packets have been received, just restart timer
	 * - if data packets have been received, re-compute X_recv
	 */
	if (hcrx->hist.bytes_recvd == 0)
		goto prepare_for_next_time;
	hcrx->x_recv = tfrc_rx_hist_x_recv(&hcrx->hist, hcrx->x_recv);
	break;
default:
default:
return
;
return
;
}
}
ccid3_pr_debug
(
"X_recv=%u, 1/p=%u
\n
"
,
hcrx
->
x_recv
,
hcrx
->
p_inverse
);
ccid3_pr_debug
(
"Interval %ldusec, X_recv=%u, 1/p=%u
\n
"
,
(
long
)
delta
,
hcrx
->
ccid3hcrx_x_recv
,
hcrx
->
ccid3hcrx_pinv
);
dccp_sk
(
sk
)
->
dccps_hc_rx_insert_options
=
1
;
hcrx
->
ccid3hcrx_tstamp_last_feedback
=
now
;
dccp_send_ack
(
sk
);
hcrx
->
ccid3hcrx_last_counter
=
dccp_hdr
(
skb
)
->
dccph_ccval
;
hcrx
->
ccid3hcrx_bytes_recv
=
0
;
prepare_for_next_time:
dp
->
dccps_hc_rx_insert_options
=
1
;
tfrc_rx_hist_restart_byte_counter
(
&
hcrx
->
hist
);
dccp_send_ack
(
sk
);
hcrx
->
last_counter
=
dccp_hdr
(
skb
)
->
dccph_ccval
;
hcrx
->
feedback
=
fbtype
;
}
}
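The periodic-feedback branch that computes X_recv from a byte counter and an elapsed interval can be sketched as follows. `receive_rate_bps` is a hypothetical helper; it assumes `scaled_div32(bytes, delta)` means bytes * 1000000 / delta with delta in microseconds. Unlike the code in the diff, which reports a bug and leaves X_recv untouched for a non-positive interval, this sketch simply returns 0 in that case.

```c
#include <stdint.h>

/*
 * Bytes received over delta_us microseconds, expressed in bytes/second.
 * Returns 0 when the interval is not positive (a simplification).
 */
static uint32_t receive_rate_bps(uint32_t bytes_recv, int64_t delta_us)
{
	if (delta_us <= 0)
		return 0;
	return (uint32_t)((uint64_t)bytes_recv * 1000000ULL /
			  (uint64_t)delta_us);
}
```

Receiving 1500 bytes in half a second, for instance, gives 3000 bytes/second.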
static
int
ccid3_hc_rx_insert_options
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
int
ccid3_hc_rx_insert_options
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
const
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
)
;
const
struct
ccid3_hc_rx_sock
*
hcrx
;
__be32
x_recv
,
pinv
;
__be32
x_recv
,
pinv
;
if
(
!
(
sk
->
sk_state
==
DCCP_OPEN
||
sk
->
sk_state
==
DCCP_PARTOPEN
))
if
(
!
(
sk
->
sk_state
==
DCCP_OPEN
||
sk
->
sk_state
==
DCCP_PARTOPEN
))
return
0
;
return
0
;
hcrx
=
ccid3_hc_rx_sk
(
sk
);
if
(
dccp_packet_without_ack
(
skb
))
if
(
dccp_packet_without_ack
(
skb
))
return
0
;
return
0
;
x_recv
=
htonl
(
hcrx
->
x_recv
);
x_recv
=
htonl
(
hcrx
->
ccid3hcrx_
x_recv
);
pinv
=
htonl
(
hcrx
->
p_inverse
);
pinv
=
htonl
(
hcrx
->
ccid3hcrx_pinv
);
if
(
dccp_insert_option
(
sk
,
skb
,
TFRC_OPT_LOSS_EVENT_RATE
,
if
(
dccp_insert_option
(
sk
,
skb
,
TFRC_OPT_LOSS_EVENT_RATE
,
&
pinv
,
sizeof
(
pinv
))
||
&
pinv
,
sizeof
(
pinv
))
||
...
@@ -662,95 +762,171 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
...
@@ -662,95 +762,171 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
static
u32
ccid3_first_li
(
struct
sock
*
sk
)
static
u32
ccid3_first_li
(
struct
sock
*
sk
)
{
{
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
u32
s
=
tfrc_rx_hist_packet_size
(
&
hcrx
->
hist
),
u32
x_recv
,
p
,
delta
;
rtt
=
tfrc_rx_hist_rtt
(
&
hcrx
->
hist
),
x_recv
,
p
;
u64
fval
;
u64
fval
;
if (hcrx->ccid3hcrx_rtt == 0) {
	DCCP_WARN("No RTT estimate available, using fallback RTT\n");
	hcrx->ccid3hcrx_rtt = DCCP_FALLBACK_RTT;
}
/*
 * rfc3448bis-06, 6.3.1: First data packet(s) are marked or lost. Set p
 * to give the equivalent of X_target = s/(2*R). Thus fval = 2 and so p
 * is about 20.64%. This yields an interval length of 4.84 (rounded up).
 */
if (unlikely(hcrx->feedback == CCID3_FBACK_NONE))
	return 5;
x_recv
=
tfrc_rx_hist_x_recv
(
&
hcrx
->
hist
,
hcrx
->
x_recv
);
delta
=
ktime_to_us
(
net_timedelta
(
hcrx
->
ccid3hcrx_tstamp_last_feedback
));
if
(
x_recv
==
0
)
x_recv
=
scaled_div32
(
hcrx
->
ccid3hcrx_bytes_recv
,
delta
);
goto
failed
;
if
(
x_recv
==
0
)
{
/* would also trigger divide-by-zero */
DCCP_WARN
(
"X_recv==0
\n
"
);
if
((
x_recv
=
hcrx
->
ccid3hcrx_x_recv
)
==
0
)
{
DCCP_BUG
(
"stored value of X_recv is zero"
);
return
~
0U
;
}
}
fval
=
scaled_div32
(
scaled_div
(
s
,
rtt
),
x_recv
);
fval
=
scaled_div
(
hcrx
->
ccid3hcrx_s
,
hcrx
->
ccid3hcrx_rtt
);
fval
=
scaled_div32
(
fval
,
x_recv
);
p
=
tfrc_calc_x_reverse_lookup
(
fval
);
p
=
tfrc_calc_x_reverse_lookup
(
fval
);
ccid3_pr_debug
(
"%s(%p), receive rate=%u bytes/s, implied "
ccid3_pr_debug
(
"%s(%p), receive rate=%u bytes/s, implied "
"loss rate=%u
\n
"
,
dccp_role
(
sk
),
sk
,
x_recv
,
p
);
"loss rate=%u
\n
"
,
dccp_role
(
sk
),
sk
,
x_recv
,
p
);
if
(
p
>
0
)
return
p
==
0
?
~
0U
:
scaled_div
(
1
,
p
);
return
scaled_div
(
1
,
p
);
failed:
return
UINT_MAX
;
}
}
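The quantity that ccid3_first_li() passes to the reverse lookup, fval = s / (R * X_recv), can be reproduced with plain integer arithmetic. The sketch below is an assumption-laden illustration: `first_li_fval` is a hypothetical name, `scaled_div()` and `scaled_div32()` are both assumed to multiply the numerator by 1000000 before dividing, and the packet size is taken to fit in 16 bits so that the intermediate product stays within 64 bits.

```c
#include <stdint.h>

/*
 * fval = s / (R * X_recv), scaled by 1e6.
 *   s       packet size in bytes
 *   rtt_us  round-trip time R in microseconds
 *   x_recv  receive rate in bytes/second
 * Returns 0 if either divisor is zero (the kernel code handles these
 * cases separately before dividing).
 */
static uint64_t first_li_fval(uint16_t s, uint32_t rtt_us, uint32_t x_recv)
{
	uint64_t per_rtt;

	if (rtt_us == 0 || x_recv == 0)
		return 0;
	/* rate, in bytes/second, of sending one packet per RTT */
	per_rtt = (uint64_t)s * 1000000ULL / rtt_us;
	return per_rtt * 1000000ULL / x_recv;
}
```

With s = 1000 bytes, R = 100 ms and X_recv = s/(2*R) = 5000 bytes/second, the result is 2000000, i.e. fval = 2 on the 1e6 scale, which agrees with the "Thus fval = 2" remark in the old-side comment.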
static
void
ccid3_hc_rx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
void
ccid3_hc_rx_packet_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
enum
ccid3_fback_type
do_feedback
=
CCID3_FBACK_NONE
;
const
u64
ndp
=
dccp_sk
(
sk
)
->
dccps_options_received
.
dccpor_ndp
;
const
u64
ndp
=
dccp_sk
(
sk
)
->
dccps_options_received
.
dccpor_ndp
;
const
bool
is_data_packet
=
dccp_data_packet
(
skb
);
const
bool
is_data_packet
=
dccp_data_packet
(
skb
);
if
(
unlikely
(
hcrx
->
ccid3hcrx_state
==
TFRC_RSTATE_NO_DATA
))
{
if
(
is_data_packet
)
{
const
u32
payload
=
skb
->
len
-
dccp_hdr
(
skb
)
->
dccph_doff
*
4
;
do_feedback
=
CCID3_FBACK_INITIAL
;
ccid3_hc_rx_set_state
(
sk
,
TFRC_RSTATE_DATA
);
hcrx
->
ccid3hcrx_s
=
payload
;
/*
* Not necessary to update ccid3hcrx_bytes_recv here,
* since X_recv = 0 for the first feedback packet (cf.
* RFC 3448, 6.3) -- gerrit
*/
}
goto
update_records
;
}
if
(
tfrc_rx_hist_duplicate
(
&
hcrx
->
ccid3hcrx_hist
,
skb
))
return
;
/* done receiving */
if
(
is_data_packet
)
{
const
u32
payload
=
skb
->
len
-
dccp_hdr
(
skb
)
->
dccph_doff
*
4
;
/*
* Update moving-average of s and the sum of received payload bytes
*/
hcrx
->
ccid3hcrx_s
=
tfrc_ewma
(
hcrx
->
ccid3hcrx_s
,
payload
,
9
);
hcrx
->
ccid3hcrx_bytes_recv
+=
payload
;
}
/*
/*
* Perform loss detection and handle pending losses
* Perform loss detection and handle pending losses
*/
*/
if
(
tfrc_rx_congestion_event
(
&
hcrx
->
hist
,
&
hcrx
->
li_hist
,
if
(
tfrc_rx_handle_loss
(
&
hcrx
->
ccid3hcrx_hist
,
&
hcrx
->
ccid3hcrx_li_hist
,
skb
,
ndp
,
ccid3_first_li
,
sk
))
skb
,
ndp
,
ccid3_first_li
,
sk
))
{
ccid3_hc_rx_send_feedback
(
sk
,
skb
,
CCID3_FBACK_PARAM_CHANGE
);
do_feedback
=
CCID3_FBACK_PARAM_CHANGE
;
goto
done_receiving
;
}
if
(
tfrc_rx_hist_loss_pending
(
&
hcrx
->
ccid3hcrx_hist
))
return
;
/* done receiving */
/*
 * Feedback for first non-empty data packet (RFC 3448, 6.3)
 */
/*
 * Handle data packets: RTT sampling and monitoring p
 */
else
if
(
unlikely
(
hcrx
->
feedback
==
CCID3_FBACK_NONE
&&
is_data_packet
))
if
(
unlikely
(
!
is_data_packet
))
ccid3_hc_rx_send_feedback
(
sk
,
skb
,
CCID3_FBACK_INITIAL
);
goto
update_records
;
if
(
!
tfrc_lh_is_initialised
(
&
hcrx
->
ccid3hcrx_li_hist
))
{
const
u32
sample
=
tfrc_rx_hist_sample_rtt
(
&
hcrx
->
ccid3hcrx_hist
,
skb
);
/*
* Empty loss history: no loss so far, hence p stays 0.
* Sample RTT values, since an RTT estimate is required for the
* computation of p when the first loss occurs; RFC 3448, 6.3.1.
*/
if
(
sample
!=
0
)
hcrx
->
ccid3hcrx_rtt
=
tfrc_ewma
(
hcrx
->
ccid3hcrx_rtt
,
sample
,
9
);
}
else
if
(
tfrc_lh_update_i_mean
(
&
hcrx
->
ccid3hcrx_li_hist
,
skb
))
{
/*
* Step (3) of [RFC 3448, 6.1]: Recompute I_mean and, if I_mean
* has decreased (resp. p has increased), send feedback now.
*/
do_feedback
=
CCID3_FBACK_PARAM_CHANGE
;
}
/*
/*
* Check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3
* Check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3
*/
*/
else
if
(
!
tfrc_rx_hist_loss_pending
(
&
hcrx
->
hist
)
&&
is_data_packet
&&
if
(
SUB16
(
dccp_hdr
(
skb
)
->
dccph_ccval
,
hcrx
->
ccid3hcrx_last_counter
)
>
3
)
SUB16
(
dccp_hdr
(
skb
)
->
dccph_ccval
,
hcrx
->
last_counter
)
>
3
)
do_feedback
=
CCID3_FBACK_PERIODIC
;
ccid3_hc_rx_send_feedback
(
sk
,
skb
,
CCID3_FBACK_PERIODIC
);
update_records:
tfrc_rx_hist_add_packet
(
&
hcrx
->
ccid3hcrx_hist
,
skb
,
ndp
);
done_receiving:
if
(
do_feedback
)
ccid3_hc_rx_send_feedback
(
sk
,
skb
,
do_feedback
);
}
}
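The periodic-feedback test in ccid3_hc_rx_packet_recv() compares the 4-bit window counter of the incoming packet with the counter stored at the last feedback. A minimal sketch, assuming that `SUB16(a, b)` is a modulo-16 subtraction and that both values are already in the range 0..15 (`ccval_distance` and `periodic_feedback_due` are hypothetical names):

```c
#include <stdbool.h>
#include <stdint.h>

/* Distance between two 4-bit window counter values, modulo 16. */
static uint8_t ccval_distance(uint8_t now, uint8_t last)
{
	return (uint8_t)((now + 16 - last) & 0xF);
}

/* Feedback is due once the counter has advanced by more than 3 steps. */
static bool periodic_feedback_due(uint8_t ccval, uint8_t last_counter)
{
	return ccval_distance(ccval, last_counter) > 3;
}
```

The modulo arithmetic matters at wrap-around: a counter that moved from 14 to 2 has advanced by 4 steps, so feedback is due even though 2 < 14.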
static
int
ccid3_hc_rx_init
(
struct
ccid
*
ccid
,
struct
sock
*
sk
)
static
int
ccid3_hc_rx_init
(
struct
ccid
*
ccid
,
struct
sock
*
sk
)
{
{
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid_priv
(
ccid
);
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid_priv
(
ccid
);
tfrc_lh_init
(
&
hcrx
->
li_hist
);
hcrx
->
ccid3hcrx_state
=
TFRC_RSTATE_NO_DATA
;
return
tfrc_rx_hist_init
(
&
hcrx
->
hist
,
sk
);
tfrc_lh_init
(
&
hcrx
->
ccid3hcrx_li_hist
);
return
tfrc_rx_hist_alloc
(
&
hcrx
->
ccid3hcrx_hist
);
}
}
static
void
ccid3_hc_rx_exit
(
struct
sock
*
sk
)
static
void
ccid3_hc_rx_exit
(
struct
sock
*
sk
)
{
{
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
);
tfrc_rx_hist_purge
(
&
hcrx
->
hist
);
ccid3_hc_rx_set_state
(
sk
,
TFRC_RSTATE_TERM
);
tfrc_lh_cleanup
(
&
hcrx
->
li_hist
);
tfrc_rx_hist_purge
(
&
hcrx
->
ccid3hcrx_hist
);
tfrc_lh_cleanup
(
&
hcrx
->
ccid3hcrx_li_hist
);
}
}
static
void
ccid3_hc_rx_get_info
(
struct
sock
*
sk
,
struct
tcp_info
*
info
)
static
void
ccid3_hc_rx_get_info
(
struct
sock
*
sk
,
struct
tcp_info
*
info
)
{
{
const
struct
ccid3_hc_rx_sock
*
hcrx
;
/* Listening sockets don't have a private CCID block */
if
(
sk
->
sk_state
==
DCCP_LISTEN
)
return
;
hcrx
=
ccid3_hc_rx_sk
(
sk
);
info
->
tcpi_ca_state
=
hcrx
->
ccid3hcrx_state
;
info
->
tcpi_options
|=
TCPI_OPT_TIMESTAMPS
;
info
->
tcpi_options
|=
TCPI_OPT_TIMESTAMPS
;
info
->
tcpi_rcv_rtt
=
tfrc_rx_hist_rtt
(
&
ccid3_hc_rx_sk
(
sk
)
->
hist
)
;
info
->
tcpi_rcv_rtt
=
hcrx
->
ccid3hcrx_rtt
;
}
}
static
int
ccid3_hc_rx_getsockopt
(
struct
sock
*
sk
,
const
int
optname
,
int
len
,
static
int
ccid3_hc_rx_getsockopt
(
struct
sock
*
sk
,
const
int
optname
,
int
len
,
u32
__user
*
optval
,
int
__user
*
optlen
)
u32
__user
*
optval
,
int
__user
*
optlen
)
{
{
const
struct
ccid3_hc_rx_sock
*
hcrx
=
ccid3_hc_rx_sk
(
sk
)
;
const
struct
ccid3_hc_rx_sock
*
hcrx
;
struct
tfrc_rx_info
rx_info
;
struct
tfrc_rx_info
rx_info
;
const
void
*
val
;
const
void
*
val
;
/* Listening sockets don't have a private CCID block */
if
(
sk
->
sk_state
==
DCCP_LISTEN
)
return
-
EINVAL
;
hcrx
=
ccid3_hc_rx_sk
(
sk
);
switch
(
optname
)
{
switch
(
optname
)
{
case
DCCP_SOCKOPT_CCID_RX_INFO
:
case
DCCP_SOCKOPT_CCID_RX_INFO
:
if
(
len
<
sizeof
(
rx_info
))
if
(
len
<
sizeof
(
rx_info
))
return
-
EINVAL
;
return
-
EINVAL
;
rx_info
.
tfrcrx_x_recv
=
hcrx
->
x_recv
;
rx_info
.
tfrcrx_x_recv
=
hcrx
->
ccid3hcrx_x_recv
;
rx_info
.
tfrcrx_rtt
=
tfrc_rx_hist_rtt
(
&
hcrx
->
hist
);
rx_info
.
tfrcrx_rtt
=
hcrx
->
ccid3hcrx_rtt
;
rx_info
.
tfrcrx_p
=
tfrc_invert_loss_event_rate
(
hcrx
->
p_inverse
);
rx_info
.
tfrcrx_p
=
hcrx
->
ccid3hcrx_pinv
==
0
?
~
0U
:
scaled_div
(
1
,
hcrx
->
ccid3hcrx_pinv
);
len
=
sizeof
(
rx_info
);
len
=
sizeof
(
rx_info
);
val
=
&
rx_info
;
val
=
&
rx_info
;
break
;
break
;
...
@@ -786,9 +962,6 @@ static struct ccid_operations ccid3 = {
...
@@ -786,9 +962,6 @@ static struct ccid_operations ccid3 = {
.
ccid_hc_tx_getsockopt
=
ccid3_hc_tx_getsockopt
,
.
ccid_hc_tx_getsockopt
=
ccid3_hc_tx_getsockopt
,
};
};
module_param
(
do_osc_prev
,
bool
,
0644
);
MODULE_PARM_DESC
(
do_osc_prev
,
"Use Oscillation Prevention (RFC 3448, 4.5)"
);
#ifdef CONFIG_IP_DCCP_CCID3_DEBUG
#ifdef CONFIG_IP_DCCP_CCID3_DEBUG
module_param
(
ccid3_debug
,
bool
,
0644
);
module_param
(
ccid3_debug
,
bool
,
0644
);
MODULE_PARM_DESC
(
ccid3_debug
,
"Enable debug messages"
);
MODULE_PARM_DESC
(
ccid3_debug
,
"Enable debug messages"
);
...
@@ -796,19 +969,6 @@ MODULE_PARM_DESC(ccid3_debug, "Enable debug messages");
...
@@ -796,19 +969,6 @@ MODULE_PARM_DESC(ccid3_debug, "Enable debug messages");
static
__init
int
ccid3_module_init
(
void
)
static
__init
int
ccid3_module_init
(
void
)
{
{
struct
timespec
tp
;
/*
* Without a fine-grained clock resolution, RTTs/X_recv are not sampled
* correctly and feedback is sent either too early or too late.
*/
hrtimer_get_res
(
CLOCK_MONOTONIC
,
&
tp
);
if
(
tp
.
tv_sec
||
tp
.
tv_nsec
>
DCCP_TIME_RESOLUTION
*
NSEC_PER_USEC
)
{
printk
(
KERN_ERR
"%s: Timer too coarse (%ld usec), need %u-usec"
" resolution - check your clocksource.
\n
"
,
__func__
,
tp
.
tv_nsec
/
NSEC_PER_USEC
,
DCCP_TIME_RESOLUTION
);
return
-
ESOCKTNOSUPPORT
;
}
return
ccid_register
(
&
ccid3
);
return
ccid_register
(
&
ccid3
);
}
}
module_init
(
ccid3_module_init
);
module_init
(
ccid3_module_init
);
...
...
net/dccp/ccids/ccid3.h
...
@@ -47,22 +47,11 @@
...
@@ -47,22 +47,11 @@
/* Two seconds as per RFC 3448 4.2 */
/* Two seconds as per RFC 3448 4.2 */
#define TFRC_INITIAL_TIMEOUT (2 * USEC_PER_SEC)
#define TFRC_INITIAL_TIMEOUT (2 * USEC_PER_SEC)
/* Maximum backoff interval t_mbi (RFC 3448, 4.3) */
#define TFRC_T_MBI		   (64 * USEC_PER_SEC)
/* In usecs - half the scheduling granularity as per RFC3448 4.6 */
#define TFRC_OPSYS_HALF_TIME_GRAN  (USEC_PER_SEC / (2 * HZ))
/* Parameter t_mbi from [RFC 3448, 4.3]: backoff interval in seconds */
#define TFRC_T_MBI		   64
/*
 * The t_delta parameter (RFC 3448, 4.6): delays of less than %USEC_PER_MSEC are
 * rounded down to 0, since sk_reset_timer() here uses millisecond granularity.
 * Hence we can use a constant t_delta = %USEC_PER_MSEC when HZ >= 500. A coarse
 * resolution of HZ < 500 means that the error is below one timer tick (t_gran)
 * when using the constant t_delta = t_gran / 2 = %USEC_PER_SEC / (2 * HZ).
 */
#if (HZ >= 500)
# define TFRC_T_DELTA		   USEC_PER_MSEC
#else
# define TFRC_T_DELTA		   (USEC_PER_SEC / (2 * HZ))
#warning Coarse CONFIG_HZ resolution -- higher value recommended for TFRC.
#endif
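The compile-time choice of TFRC_T_DELTA can be mirrored by a small run-time function, which makes the resulting values for common HZ settings easy to check. This is only a sketch: `tfrc_t_delta_us` is a hypothetical name, the two constants are redefined locally so that the example is self-contained, and the zero-HZ guard is an addition that the preprocessor version does not need.

```c
#include <stdint.h>

#define USEC_PER_SEC  1000000u
#define USEC_PER_MSEC 1000u

/*
 * t_delta in microseconds for a given timer frequency:
 * one millisecond when HZ >= 500, otherwise half a timer tick.
 */
static uint32_t tfrc_t_delta_us(uint32_t hz)
{
	if (hz == 0)		/* guard against division by zero */
		return 0;
	if (hz >= 500)
		return USEC_PER_MSEC;
	return USEC_PER_SEC / (2 * hz);
}
```

For HZ = 250 this gives 2000 usecs and for HZ = 100 it gives 5000 usecs, i.e. half of a 4 ms or 10 ms tick respectively.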
enum
ccid3_options
{
enum
ccid3_options
{
TFRC_OPT_LOSS_EVENT_RATE
=
192
,
TFRC_OPT_LOSS_EVENT_RATE
=
192
,
...
@@ -70,43 +59,62 @@ enum ccid3_options {
...
@@ -70,43 +59,62 @@ enum ccid3_options {
TFRC_OPT_RECEIVE_RATE
=
194
,
TFRC_OPT_RECEIVE_RATE
=
194
,
};
};
struct
ccid3_options_received
{
u64
ccid3or_seqno
:
48
,
ccid3or_loss_intervals_idx:
16
;
u16
ccid3or_loss_intervals_len
;
u32
ccid3or_loss_event_rate
;
u32
ccid3or_receive_rate
;
};
/* TFRC sender states */
enum
ccid3_hc_tx_states
{
TFRC_SSTATE_NO_SENT
=
1
,
TFRC_SSTATE_NO_FBACK
,
TFRC_SSTATE_FBACK
,
TFRC_SSTATE_TERM
,
};
/** struct ccid3_hc_tx_sock - CCID3 sender half-connection socket
 *
 * @x - Current sending rate in 64 * bytes per second
 * @x_recv - Receive rate in 64 * bytes per second
 * @x_calc - Calculated rate in bytes per second
 * @rtt - Estimate of current round trip time in usecs
 * @r_sqmean - Estimate of long-term RTT (RFC 3448, 4.5)
 * @p - Current loss event rate (0-1) scaled by 1000000
 * @s - Packet size in bytes
 * @t_rto - Nofeedback Timer setting in usecs
 * @t_ipi - Interpacket (send) interval (RFC 3448, 4.6) in usecs
 * @feedback - Whether feedback has been received or not
 * @last_win_count - Last window counter sent
 * @t_last_win_count - Timestamp of earliest packet with
 *                     last_win_count value sent
 * @no_feedback_timer - Handle to no feedback timer
 * @t_ld - Time last doubled during slow start
 * @t_nom - Nominal send time of next packet
 * @hist - Packet history
 */
/** struct ccid3_hc_tx_sock - CCID3 sender half-connection socket
 *
 * @ccid3hctx_x - Current sending rate in 64 * bytes per second
 * @ccid3hctx_x_recv - Receive rate in 64 * bytes per second
 * @ccid3hctx_x_calc - Calculated rate in bytes per second
 * @ccid3hctx_rtt - Estimate of current round trip time in usecs
 * @ccid3hctx_p - Current loss event rate (0-1) scaled by 1000000
 * @ccid3hctx_s - Packet size in bytes
 * @ccid3hctx_t_rto - Nofeedback Timer setting in usecs
 * @ccid3hctx_t_ipi - Interpacket (send) interval (RFC 3448, 4.6) in usecs
 * @ccid3hctx_state - Sender state, one of %ccid3_hc_tx_states
 * @ccid3hctx_last_win_count - Last window counter sent
 * @ccid3hctx_t_last_win_count - Timestamp of earliest packet
 *                               with last_win_count value sent
 * @ccid3hctx_no_feedback_timer - Handle to no feedback timer
 * @ccid3hctx_t_ld - Time last doubled during slow start
 * @ccid3hctx_t_nom - Nominal send time of next packet
 * @ccid3hctx_delta - Send timer delta (RFC 3448, 4.6) in usecs
 * @ccid3hctx_hist - Packet history
 * @ccid3hctx_options_received - Parsed set of retrieved options
 */
struct
ccid3_hc_tx_sock
{
struct
ccid3_hc_tx_sock
{
u64
x
;
struct
tfrc_tx_info
ccid3hctx_tfrc
;
u64
x_recv
;
#define ccid3hctx_x ccid3hctx_tfrc.tfrctx_x
u32
x_calc
;
#define ccid3hctx_x_recv ccid3hctx_tfrc.tfrctx_x_recv
u32
rtt
;
#define ccid3hctx_x_calc ccid3hctx_tfrc.tfrctx_x_calc
u16
r_sqmean
;
#define ccid3hctx_rtt ccid3hctx_tfrc.tfrctx_rtt
u32
p
;
#define ccid3hctx_p ccid3hctx_tfrc.tfrctx_p
u32
t_rto
;
#define ccid3hctx_t_rto ccid3hctx_tfrc.tfrctx_rto
u32
t_ipi
;
#define ccid3hctx_t_ipi ccid3hctx_tfrc.tfrctx_ipi
u16
s
;
u16
ccid3hctx_s
;
bool
feedback
:
1
;
enum
ccid3_hc_tx_states
ccid3hctx_state
:
8
;
u8
last_win_count
;
u8
ccid3hctx_last_win_count
;
ktime_t
t_last_win_count
;
ktime_t
ccid3hctx_t_last_win_count
;
struct
timer_list
no_feedback_timer
;
struct
timer_list
ccid3hctx_no_feedback_timer
;
ktime_t
t_ld
;
ktime_t
ccid3hctx_t_ld
;
ktime_t
t_nom
;
ktime_t
ccid3hctx_t_nom
;
struct
tfrc_tx_hist_entry
*
hist
;
u32
ccid3hctx_delta
;
struct
tfrc_tx_hist_entry
*
ccid3hctx_hist
;
struct
ccid3_options_received
ccid3hctx_options_received
;
};
};
 static inline struct ccid3_hc_tx_sock *ccid3_hc_tx_sk(const struct sock *sk)
...
@@ -116,32 +124,41 @@ static inline struct ccid3_hc_tx_sock *ccid3_hc_tx_sk(const struct sock *sk)
 	return hctx;
 }
-enum ccid3_fback_type {
-	CCID3_FBACK_NONE = 0,
-	CCID3_FBACK_INITIAL,
-	CCID3_FBACK_PERIODIC,
-	CCID3_FBACK_PARAM_CHANGE
+/* TFRC receiver states */
+enum ccid3_hc_rx_states {
+	TFRC_RSTATE_NO_DATA = 1,
+	TFRC_RSTATE_DATA,
+	TFRC_RSTATE_TERM = 127,
 };
 /** struct ccid3_hc_rx_sock - CCID3 receiver half-connection socket
  *
- * @last_counter - Tracks window counter (RFC 4342, 8.1)
- * @feedback - The type of the feedback last sent
- * @x_recv - Receiver estimate of send rate (RFC 3448, sec. 4.3)
- * @tstamp_last_feedback - Time at which last feedback was sent
- * @hist - Packet history (loss detection + RTT sampling)
- * @li_hist - Loss Interval database
- * @p_inverse - Inverse of Loss Event Rate (RFC 4342, sec. 8.5)
+ * @ccid3hcrx_x_recv - Receiver estimate of send rate (RFC 3448 4.3)
+ * @ccid3hcrx_rtt - Receiver estimate of rtt (non-standard)
+ * @ccid3hcrx_p - Current loss event rate (RFC 3448 5.4)
+ * @ccid3hcrx_last_counter - Tracks window counter (RFC 4342, 8.1)
+ * @ccid3hcrx_state - Receiver state, one of %ccid3_hc_rx_states
+ * @ccid3hcrx_bytes_recv - Total sum of DCCP payload bytes
+ * @ccid3hcrx_x_recv - Receiver estimate of send rate (RFC 3448, sec. 4.3)
+ * @ccid3hcrx_rtt - Receiver estimate of RTT
+ * @ccid3hcrx_tstamp_last_feedback - Time at which last feedback was sent
+ * @ccid3hcrx_tstamp_last_ack - Time at which last feedback was sent
+ * @ccid3hcrx_hist - Packet history (loss detection + RTT sampling)
+ * @ccid3hcrx_li_hist - Loss Interval database
+ * @ccid3hcrx_s - Received packet size in bytes
+ * @ccid3hcrx_pinv - Inverse of Loss Event Rate (RFC 4342, sec. 8.5)
  */
 struct ccid3_hc_rx_sock {
-	u8 last_counter:4;
-	enum ccid3_fback_type feedback:4;
-	u32 x_recv;
-	ktime_t tstamp_last_feedback;
-	struct tfrc_rx_hist hist;
-	struct tfrc_loss_hist li_hist;
-#define p_inverse li_hist.i_mean
+	u8 ccid3hcrx_last_counter:4;
+	enum ccid3_hc_rx_states ccid3hcrx_state:8;
+	u32 ccid3hcrx_bytes_recv;
+	u32 ccid3hcrx_x_recv;
+	u32 ccid3hcrx_rtt;
+	ktime_t ccid3hcrx_tstamp_last_feedback;
+	struct tfrc_rx_hist ccid3hcrx_hist;
+	struct tfrc_loss_hist ccid3hcrx_li_hist;
+	u16 ccid3hcrx_s;
+#define ccid3hcrx_pinv ccid3hcrx_li_hist.i_mean
 };
 static inline struct ccid3_hc_rx_sock *ccid3_hc_rx_sk(const struct sock *sk)
...
net/dccp/ccids/lib/loss_interval.c
...
@@ -86,26 +86,21 @@ static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh)
 /**
  * tfrc_lh_update_i_mean - Update the `open' loss interval I_0
- * This updates I_mean as the sequence numbers increase. As a consequence, the
- * open loss interval I_0 increases, hence p = W_tot/max(I_tot0, I_tot1)
- * decreases, and thus there is no need to send renewed feedback.
+ * For recomputing p: returns `true' if p > p_prev <=> 1/p < 1/p_prev
  */
-void tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *skb)
+u8 tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *skb)
 {
 	struct tfrc_loss_interval *cur = tfrc_lh_peek(lh);
+	u32 old_i_mean = lh->i_mean;
 	s64 len;
 	if (cur == NULL)			/* not initialised */
-		return;
+		return 0;
-	/* FIXME: should probably also count non-data packets (RFC 4342, 6.1) */
-	if (!dccp_data_packet(skb))
-		return;
 	len = dccp_delta_seqno(cur->li_seqno, DCCP_SKB_CB(skb)->dccpd_seq) + 1;
 	if (len - (s64)cur->li_length <= 0)	/* duplicate or reordered */
-		return;
+		return 0;
 	if (SUB16(dccp_hdr(skb)->dccph_ccval, cur->li_ccval) > 4)
 		/*
...
@@ -119,11 +114,14 @@ void tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *skb)
 		cur->li_is_closed = 1;
 	if (tfrc_lh_length(lh) == 1)		/* due to RFC 3448, 6.3.1 */
-		return;
+		return 0;
 	cur->li_length = len;
 	tfrc_lh_calc_i_mean(lh);
+	return (lh->i_mean < old_i_mean);
 }
+EXPORT_SYMBOL_GPL(tfrc_lh_update_i_mean);
 /* Determine if `new_loss' does begin a new loss interval [RFC 4342, 10.2] */
 static inline u8 tfrc_lh_is_new_loss(struct tfrc_loss_interval *cur,
...
@@ -140,18 +138,18 @@ static inline u8 tfrc_lh_is_new_loss(struct tfrc_loss_interval *cur,
  * @sk: Used by @calc_first_li in caller-specific way (subtyping)
  * Updates I_mean and returns 1 if a new interval has in fact been added to @lh.
  */
-bool tfrc_lh_interval_add(struct tfrc_loss_hist *lh, struct tfrc_rx_hist *rh,
+int tfrc_lh_interval_add(struct tfrc_loss_hist *lh, struct tfrc_rx_hist *rh,
 			 u32 (*calc_first_li)(struct sock *), struct sock *sk)
 {
 	struct tfrc_loss_interval *cur = tfrc_lh_peek(lh), *new;
 	if (cur != NULL && !tfrc_lh_is_new_loss(cur, tfrc_rx_hist_loss_prev(rh)))
-		return false;
+		return 0;
 	new = tfrc_lh_demand_next(lh);
 	if (unlikely(new == NULL)) {
 		DCCP_CRIT("Cannot allocate/add loss record.");
-		return false;
+		return 0;
 	}
 	new->li_seqno = tfrc_rx_hist_loss_prev(rh)->tfrchrx_seqno;
...
@@ -169,7 +167,7 @@ bool tfrc_lh_interval_add(struct tfrc_loss_hist *lh, struct tfrc_rx_hist *rh,
 		tfrc_lh_calc_i_mean(lh);
 	}
-	return true;
+	return 1;
 }
 EXPORT_SYMBOL_GPL(tfrc_lh_interval_add);
...
net/dccp/ccids/lib/loss_interval.h
...
@@ -67,9 +67,9 @@ static inline u8 tfrc_lh_length(struct tfrc_loss_hist *lh)
 struct tfrc_rx_hist;
-extern bool tfrc_lh_interval_add(struct tfrc_loss_hist *, struct tfrc_rx_hist *,
+extern int tfrc_lh_interval_add(struct tfrc_loss_hist *, struct tfrc_rx_hist *,
 				 u32 (*first_li)(struct sock *), struct sock *);
-extern void tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *);
+extern u8 tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *);
 extern void tfrc_lh_cleanup(struct tfrc_loss_hist *lh);
 #endif /* _DCCP_LI_HIST_ */
net/dccp/ccids/lib/packet_history.c
...
@@ -40,6 +40,18 @@
 #include "packet_history.h"
 #include "../../dccp.h"
+/**
+ * tfrc_tx_hist_entry - Simple singly-linked TX history list
+ * @next: next oldest entry (LIFO order)
+ * @seqno: sequence number of this entry
+ * @stamp: send time of packet with sequence number @seqno
+ */
+struct tfrc_tx_hist_entry {
+	struct tfrc_tx_hist_entry *next;
+	u64 seqno;
+	ktime_t stamp;
+};
 /*
  * Transmitter History Routines
  */
...
@@ -61,6 +73,15 @@ void tfrc_tx_packet_history_exit(void)
 	}
 }
+static struct tfrc_tx_hist_entry *
+	tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
+{
+	while (head != NULL && head->seqno != seqno)
+		head = head->next;
+	return head;
+}
 int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno)
 {
 	struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist_slab, gfp_any());
...
@@ -90,6 +111,25 @@ void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp)
 }
 EXPORT_SYMBOL_GPL(tfrc_tx_hist_purge);
+u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno,
+		     const ktime_t now)
+{
+	u32 rtt = 0;
+	struct tfrc_tx_hist_entry *packet = tfrc_tx_hist_find_entry(head, seqno);
+	if (packet != NULL) {
+		rtt = ktime_us_delta(now, packet->stamp);
+		/*
+		 * Garbage-collect older (irrelevant) entries:
+		 */
+		tfrc_tx_hist_purge(&packet->next);
+	}
+	return rtt;
+}
+EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
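The TX history above is a LIFO list: new packets are pushed at the head, an RTT sample is taken by finding the acknowledged sequence number, and everything older than that entry is discarded. A minimal user-space sketch of the same idea follows. It assumes `malloc`/`free` in place of the kernel slab cache and plain microsecond integers in place of `ktime_t`; the `tx_*` names are hypothetical and not part of the kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct tx_entry {
	struct tx_entry *next;	/* next older entry (LIFO order) */
	uint64_t seqno;
	int64_t stamp_us;	/* send time in microseconds */
};

/* Push a new entry at the head; returns 0 on success, -1 if out of memory. */
static int tx_hist_add(struct tx_entry **headp, uint64_t seqno, int64_t now_us)
{
	struct tx_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->seqno = seqno;
	e->stamp_us = now_us;
	e->next = *headp;
	*headp = e;
	return 0;
}

/* Free the list starting at *headp and reset the pointer. */
static void tx_hist_purge(struct tx_entry **headp)
{
	struct tx_entry *head = *headp;

	while (head != NULL) {
		struct tx_entry *next = head->next;

		free(head);
		head = next;
	}
	*headp = NULL;
}

/* Returns now - send time of `seqno`, or 0 if `seqno` is not in the list. */
static uint32_t tx_hist_rtt(struct tx_entry *head, uint64_t seqno, int64_t now_us)
{
	while (head != NULL && head->seqno != seqno)
		head = head->next;
	if (head == NULL)
		return 0;
	tx_hist_purge(&head->next);	/* older entries are no longer needed */
	return (uint32_t)(now_us - head->stamp_us);
}
```

A return value of 0 doubles as "no sample", so a caller has to treat it as such, just as with the kernel function above.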
 /*
  * Receiver History Routines
  */
...
@@ -151,31 +191,14 @@ int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate);
-static void __tfrc_rx_hist_swap(struct tfrc_rx_hist *h, const u8 a, const u8 b)
-{
-	struct tfrc_rx_hist_entry *tmp = h->ring[a];
-	h->ring[a] = h->ring[b];
-	h->ring[b] = tmp;
-}
 static void tfrc_rx_hist_swap(struct tfrc_rx_hist *h, const u8 a, const u8 b)
 {
-	__tfrc_rx_hist_swap(h, tfrc_rx_hist_index(h, a),
-			       tfrc_rx_hist_index(h, b));
-}
-/**
- * tfrc_rx_hist_resume_rtt_sampling - Prepare RX history for RTT sampling
- * This is called after loss detection has finished, when the history entry
- * with the index of `loss_count' holds the highest-received sequence number.
- * RTT sampling requires this information at ring[0] (tfrc_rx_hist_sample_rtt).
- */
-static inline void tfrc_rx_hist_resume_rtt_sampling(struct tfrc_rx_hist *h)
-{
-	__tfrc_rx_hist_swap(h, 0, tfrc_rx_hist_index(h, h->loss_count));
-	h->loss_count = h->loss_start = 0;
+	const u8 idx_a = tfrc_rx_hist_index(h, a),
+		 idx_b = tfrc_rx_hist_index(h, b);
+	struct tfrc_rx_hist_entry *tmp = h->ring[idx_a];
+	h->ring[idx_a] = h->ring[idx_b];
+	h->ring[idx_b] = tmp;
 }
 /*
...
@@ -192,8 +215,10 @@ static void __do_track_loss(struct tfrc_rx_hist *h, struct sk_buff *skb, u64 n1)
 	u64 s0 = tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno,
 	    s1 = DCCP_SKB_CB(skb)->dccpd_seq;
-	if (!dccp_loss_free(s0, s1, n1))	/* gap between S0 and S1 */
+	if (!dccp_loss_free(s0, s1, n1)) {	/* gap between S0 and S1 */
 		h->loss_count = 1;
+		tfrc_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 1), skb, n1);
+	}
 }
 static void __one_after_loss(struct tfrc_rx_hist *h, struct sk_buff *skb, u32 n2)
...
@@ -215,7 +240,8 @@ static void __one_after_loss(struct tfrc_rx_hist *h, struct sk_buff *skb, u32 n2
 	if (dccp_loss_free(s2, s1, n1)) {
 		/* hole is filled: S0, S2, and S1 are consecutive */
-		tfrc_rx_hist_resume_rtt_sampling(h);
+		h->loss_count = 0;
+		h->loss_start = tfrc_rx_hist_index(h, 1);
 	} else
 		/* gap between S2 and S1: just update loss_prev */
 		tfrc_rx_hist_entry_from_skb(tfrc_rx_hist_loss_prev(h), skb, n2);
...
@@ -268,7 +294,8 @@ static int __two_after_loss(struct tfrc_rx_hist *h, struct sk_buff *skb, u32 n3)
 	if (dccp_loss_free(s1, s2, n2)) {
 		/* entire hole filled by S0, S3, S1, S2 */
-		tfrc_rx_hist_resume_rtt_sampling(h);
+		h->loss_start = tfrc_rx_hist_index(h, 2);
+		h->loss_count = 0;
 	} else {
 		/* gap remains between S1 and S2 */
 		h->loss_start = tfrc_rx_hist_index(h, 1);
...
@@ -312,7 +339,8 @@ static void __three_after_loss(struct tfrc_rx_hist *h)
 	if (dccp_loss_free(s2, s3, n3)) {
 		/* no gap between S2 and S3: entire hole is filled */
-		tfrc_rx_hist_resume_rtt_sampling(h);
+		h->loss_start = tfrc_rx_hist_index(h, 3);
+		h->loss_count = 0;
 	} else {
 		/* gap between S2 and S3 */
 		h->loss_start = tfrc_rx_hist_index(h, 2);
...
@@ -326,13 +354,13 @@ static void __three_after_loss(struct tfrc_rx_hist *h)
 }
 /**
- * tfrc_rx_congestion_event - Loss detection and further processing
- * @h: The non-empty RX history object
- * @lh: Loss Intervals database to update
- * @skb: Currently received packet
- * @ndp: The NDP count belonging to @skb
- * @first_li: Caller-dependent computation of first loss interval in @lh
- * @sk: Used by @calc_first_li (see tfrc_lh_interval_add)
+ * tfrc_rx_handle_loss - Loss detection and further processing
+ * @h:	The non-empty RX history object
+ * @lh:	Loss Intervals database to update
+ * @skb:	Currently received packet
+ * @ndp:	The NDP count belonging to @skb
+ * @calc_first_li:	Caller-dependent computation of first loss interval in @lh
+ * @sk:	Used by @calc_first_li (see tfrc_lh_interval_add)
  * Chooses action according to pending loss, updates LI database when a new
  * loss was detected, and does required post-processing. Returns 1 when caller
  * should send feedback, 0 otherwise.
...
@@ -340,20 +368,15 @@ static void __three_after_loss(struct tfrc_rx_hist *h)
  * records accordingly, the caller should not perform any more RX history
  * operations when loss_count is greater than 0 after calling this function.
  */
-bool tfrc_rx_congestion_event(struct tfrc_rx_hist *h,
+int tfrc_rx_handle_loss(struct tfrc_rx_hist *h,
 			struct tfrc_loss_hist *lh,
 			struct sk_buff *skb, const u64 ndp,
-			u32 (*first_li)(struct sock *), struct sock *sk)
+			u32 (*calc_first_li)(struct sock *), struct sock *sk)
 {
-	bool new_event = false;
-	if (tfrc_rx_hist_duplicate(h, skb))
-		return 0;
+	int is_new_loss = 0;
 	if (h->loss_count == 0) {
 		__do_track_loss(h, skb, ndp);
-		tfrc_rx_hist_sample_rtt(h, skb);
-		tfrc_rx_hist_add_packet(h, skb, ndp);
 	} else if (h->loss_count == 1) {
 		__one_after_loss(h, skb, ndp);
 	} else if (h->loss_count != 2) {
...
@@ -362,57 +385,34 @@ bool tfrc_rx_congestion_event(struct tfrc_rx_hist *h,
 		/*
 		 * Update Loss Interval database and recycle RX records
 		 */
-		new_event = tfrc_lh_interval_add(lh, h, first_li, sk);
+		is_new_loss = tfrc_lh_interval_add(lh, h, calc_first_li, sk);
 		__three_after_loss(h);
 	}
-	/*
-	 * Update moving-average of `s' and the sum of received payload bytes.
-	 */
-	if (dccp_data_packet(skb)) {
-		const u32 payload = skb->len - dccp_hdr(skb)->dccph_doff * 4;
-		h->packet_size = tfrc_ewma(h->packet_size, payload, 9);
-		h->bytes_recvd += payload;
-	}
-	/* RFC 3448, 6.1: update I_0, whose growth implies p <= p_prev */
-	if (!new_event)
-		tfrc_lh_update_i_mean(lh, skb);
-	return new_event;
+	return is_new_loss;
 }
-EXPORT_SYMBOL_GPL(tfrc_rx_congestion_event);
+EXPORT_SYMBOL_GPL(tfrc_rx_handle_loss);
-/* Compute the sending rate X_recv measured between feedback intervals */
-u32 tfrc_rx_hist_x_recv(struct tfrc_rx_hist *h, const u32 last_x_recv)
+int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
 {
-	u64 bytes = h->bytes_recvd, last_rtt = h->rtt_estimate;
-	s64 delta = ktime_to_us(net_timedelta(h->bytes_start));
-	WARN_ON(delta <= 0);
-	/*
-	 * Ensure that the sampling interval for X_recv is at least one RTT,
-	 * by extending the sampling interval backwards in time, over the last
-	 * R_(m-1) seconds, as per rfc3448bis-06, 6.2.
-	 * To reduce noise (e.g. when the RTT changes often), this is only
-	 * done when delta is smaller than RTT/2.
-	 */
-	if (last_x_recv > 0 && delta < last_rtt/2) {
-		tfrc_pr_debug("delta < RTT ==> %ld us < %u us\n",
-			      (long)delta, (unsigned)last_rtt);
-		delta = (bytes ? delta : 0) + last_rtt;
-		bytes += div_u64((u64)last_x_recv * last_rtt, USEC_PER_SEC);
-	}
-	if (unlikely(bytes == 0)) {
-		DCCP_WARN("X_recv == 0, using old value of %u\n", last_x_recv);
-		return last_x_recv;
-	}
-	return scaled_div32(bytes, delta);
+	int i;
+	for (i = 0; i <= TFRC_NDUPACK; i++) {
+		h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
+		if (h->ring[i] == NULL)
+			goto out_free;
+	}
+	h->loss_count = h->loss_start = 0;
+	return 0;
+out_free:
+	while (i-- != 0) {
+		kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]);
+		h->ring[i] = NULL;
+	}
+	return -ENOBUFS;
 }
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_x_recv);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc);
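The X_recv computation in tfrc_rx_hist_x_recv() above divides the bytes received by the length of the sampling interval and, when that interval is shorter than RTT/2, extends it backwards by one RTT using the previous rate. The sketch below restates that arithmetic in user space. It assumes that `scaled_div32(bytes, delta)` amounts to `bytes * 1000000 / delta` (bytes per second from a microsecond interval); the function name `x_recv_estimate` and its flat parameter list are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/*
 * bytes:        payload bytes counted since the interval started
 * delta_us:     length of the interval in microseconds (must be > 0)
 * last_x_recv:  previously reported rate in bytes per second
 * last_rtt_us:  receiver RTT estimate in microseconds
 */
static uint32_t x_recv_estimate(uint64_t bytes, int64_t delta_us,
				uint32_t last_x_recv, uint32_t last_rtt_us)
{
	assert(delta_us > 0);
	if (last_x_recv > 0 && delta_us < last_rtt_us / 2) {
		/* extend the interval backwards over the last RTT */
		delta_us = (bytes ? delta_us : 0) + last_rtt_us;
		bytes += (uint64_t)last_x_recv * last_rtt_us / USEC_PER_SEC;
	}
	if (bytes == 0)
		return last_x_recv;	/* nothing measured: keep old value */
	return (uint32_t)(bytes * USEC_PER_SEC / (uint64_t)delta_us);
}
```

For instance, 1000 bytes over 10 ms with a previous rate of 100000 bytes/s and an RTT of 100 ms becomes 11000 bytes over 110 ms, that is, again 100000 bytes/s rather than a noisy short-interval estimate.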
 void tfrc_rx_hist_purge(struct tfrc_rx_hist *h)
 {
...
@@ -426,81 +426,73 @@ void tfrc_rx_hist_purge(struct tfrc_rx_hist *h)
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
-static int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
+/**
+ * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h)
 {
-	int i;
-	memset(h, 0, sizeof(*h));
-	for (i = 0; i <= TFRC_NDUPACK; i++) {
-		h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
-		if (h->ring[i] == NULL) {
-			tfrc_rx_hist_purge(h);
-			return -ENOBUFS;
-		}
-	}
-	return 0;
+	return h->ring[0];
 }
-int tfrc_rx_hist_init(struct tfrc_rx_hist *h, struct sock *sk)
+/**
+ * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h)
 {
-	if (tfrc_rx_hist_alloc(h))
-		return -ENOBUFS;
-	/*
-	 * Initialise first entry with GSR to start loss detection as early as
-	 * possible. Code using this must not use any other fields. The entry
-	 * will be overwritten once the CCID updates its received packets.
-	 */
-	tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno = dccp_sk(sk)->dccps_gsr;
-	return 0;
+	return h->ring[h->rtt_sample_prev];
 }
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
 /**
  * tfrc_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal
- * Based on ideas presented in RFC 4342, 8.1. This function expects that no loss
- * is pending and uses the following history entries (via rtt_sample_prev):
- * - h->ring[0] contains the most recent history entry prior to @skb;
- * - h->ring[1] is an unused `dummy' entry when the current difference is 0;
+ * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able
+ * to compute a sample with given data - calling function should check this.
  */
-void tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb)
+u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb)
 {
-	struct tfrc_rx_hist_entry *last = h->ring[0];
-	u32 sample, delta_v;
-	/*
-	 * When not to sample:
-	 * - on non-data packets
-	 *   (RFC 4342, 8.1: CCVal only fully defined for data packets);
-	 * - when no data packets have been received yet
-	 *   (FIXME: using sampled packet size as indicator here);
-	 * - as long as there are gaps in the sequence space (pending loss).
-	 */
-	if (!dccp_data_packet(skb) || h->packet_size == 0 ||
-	    tfrc_rx_hist_loss_pending(h))
-		return;
-	h->rtt_sample_prev = 0;		/* reset previous candidate */
-	delta_v = SUB16(dccp_hdr(skb)->dccph_ccval, last->tfrchrx_ccval);
-	if (delta_v == 0) {		/* less than RTT/4 difference */
-		h->rtt_sample_prev = 1;
-		return;
-	}
-	sample = dccp_sane_rtt(ktime_to_us(net_timedelta(last->tfrchrx_tstamp)));
-	if (delta_v <= 4)		/* between RTT/4 and RTT */
-		sample *= 4 / delta_v;
-	else if (!(sample < h->rtt_estimate && sample > h->rtt_estimate/2))
-		/*
-		 * Optimisation: CCVal difference is greater than 1 RTT, yet the
-		 * sample is less than the local RTT estimate; which means that
-		 * the RTT estimate is too high.
-		 * To avoid noise, it is not done if the sample is below RTT/2.
-		 */
-		return;
-	/* Use a lower weight than usual to increase responsiveness */
-	h->rtt_estimate = tfrc_ewma(h->rtt_estimate, sample, 5);
+	u32 sample = 0,
+	    delta_v = SUB16(dccp_hdr(skb)->dccph_ccval,
+			    tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+	if (delta_v < 1 || delta_v > 4) {	/* unsuitable CCVal delta */
+		if (h->rtt_sample_prev == 2) {	/* previous candidate stored */
+			sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
+				       tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+			if (sample)
+				sample = 4 / sample *
+					 ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp,
+							tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp);
+			else	/*
+				 * FIXME: This condition is in principle not
+				 * possible but occurs when CCID is used for
+				 * two-way data traffic. I have tried to trace
+				 * it, but the cause does not seem to be here.
+				 */
+				DCCP_BUG("please report to dccp@vger.kernel.org"
+					 " => prev = %u, last = %u",
+					 tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
+					 tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+		} else if (delta_v < 1) {
+			h->rtt_sample_prev = 1;
+			goto keep_ref_for_next_time;
+		}
+	} else if (delta_v == 4) /* optimal match */
+		sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp));
+	else {			 /* suboptimal match */
+		h->rtt_sample_prev = 2;
+		goto keep_ref_for_next_time;
+	}
+	if (unlikely(sample > DCCP_SANE_RTT_MAX)) {
+		DCCP_WARN("RTT sample %u too large, using max\n", sample);
+		sample = DCCP_SANE_RTT_MAX;
+	}
+	h->rtt_sample_prev = 0;	       /* use current entry as next reference */
+keep_ref_for_next_time:
+	return sample;
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
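Both variants of tfrc_rx_hist_sample_rtt() above rest on the same observation from RFC 4342, 8.1: the window counter (CCVal) advances about once per quarter RTT, so a CCVal difference of delta_v over an elapsed time t suggests RTT ≈ 4 * t / delta_v. The sketch below illustrates that scaling together with the smoothing step. It is a simplification under stated assumptions: the names `ccval_rtt_sample` and `ewma_tenths` are hypothetical, differences outside 1..4 are simply rejected, and the EWMA is assumed to weight the old average by weight/10, as the "in units of 1/10" remark in tfrc.h suggests. Note that the sketch multiplies before dividing, whereas the removed code computes the integer factor `4 / delta_v` first, which yields 1 rather than 4/3 for delta_v = 3.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale an elapsed time to a full-RTT estimate from the CCVal difference.
 * Returns 0 when the difference is unusable (0 or more than 4 steps).
 */
static uint32_t ccval_rtt_sample(uint32_t elapsed_us, uint8_t delta_v)
{
	if (delta_v < 1 || delta_v > 4)
		return 0;
	return (uint32_t)((uint64_t)elapsed_us * 4 / delta_v);
}

/* Moving average: `weight` tenths of the old value, the rest of the new one. */
static uint32_t ewma_tenths(uint32_t avg, uint32_t newval, uint8_t weight)
{
	assert(weight <= 10);
	if (avg == 0)
		return newval;	/* first sample initialises the average */
	return (uint32_t)(((uint64_t)weight * avg +
			   (uint64_t)(10 - weight) * newval) / 10);
}
```

With a weight of 5, as used for rtt_estimate above, an estimate of 100 ms and a new sample of 50 ms give 75 ms.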
net/dccp/ccids/lib/packet_history.h
...
@@ -40,28 +40,12 @@
 #include <linux/slab.h>
 #include "tfrc.h"
-/**
- * tfrc_tx_hist_entry - Simple singly-linked TX history list
- * @next: next oldest entry (LIFO order)
- * @seqno: sequence number of this entry
- * @stamp: send time of packet with sequence number @seqno
- */
-struct tfrc_tx_hist_entry {
-	struct tfrc_tx_hist_entry *next;
-	u64 seqno;
-	ktime_t stamp;
-};
-static inline struct tfrc_tx_hist_entry *
-	tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
-{
-	while (head != NULL && head->seqno != seqno)
-		head = head->next;
-	return head;
-}
+struct tfrc_tx_hist_entry;
 extern int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno);
 extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp);
+extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head,
+			    const u64 seqno, const ktime_t now);
 /* Subtraction a-b modulo-16, respects circular wrap-around */
 #define SUB16(a, b) (((a) + 16 - (b)) & 0xF)
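To make the wrap-around behaviour of SUB16 concrete, here is a small sketch that applies the macro to 4-bit window-counter (CCVal) values. The wrapper `ccval_delta` is a hypothetical helper added for illustration; only the macro itself comes from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as in packet_history.h above. */
#define SUB16(a, b) (((a) + 16 - (b)) & 0xF)

/* Number of counter steps from `earlier` to `later`, both 4-bit values. */
static uint8_t ccval_delta(uint8_t later, uint8_t earlier)
{
	assert(later < 16 && earlier < 16);
	return SUB16(later, earlier);
}
```

A counter that moved from 15 to 1 has advanced two steps, so `ccval_delta(1, 15)` is 2, whereas a plain subtraction would give -14.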
...
@@ -91,22 +75,12 @@ struct tfrc_rx_hist_entry {
  * @loss_count: Number of entries in circular history
  * @loss_start: Movable index (for loss detection)
  * @rtt_sample_prev: Used during RTT sampling, points to candidate entry
- * @rtt_estimate: Receiver RTT estimate
- * @packet_size: Packet size in bytes (as per RFC 3448, 3.1)
- * @bytes_recvd: Number of bytes received since @bytes_start
- * @bytes_start: Start time for counting @bytes_recvd
  */
 struct tfrc_rx_hist {
 	struct tfrc_rx_hist_entry *ring[TFRC_NDUPACK + 1];
 	u8 loss_count:2,
 	   loss_start:2;
-	/* Receiver RTT sampling */
 #define rtt_sample_prev loss_start
-	u32 rtt_estimate;
-	/* Receiver sampling of application payload lengths */
-	u32 packet_size,
-	    bytes_recvd;
-	ktime_t bytes_start;
 };
 /**
...
@@ -150,50 +124,20 @@ static inline bool tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h)
 	return h->loss_count > 0;
 }
-/*
- * Accessor functions to retrieve parameters sampled by the RX history
- */
-static inline u32 tfrc_rx_hist_packet_size(const struct tfrc_rx_hist *h)
-{
-	if (h->packet_size == 0) {
-		DCCP_WARN("No sample for s, using fallback\n");
-		return TCP_MIN_RCVMSS;
-	}
-	return h->packet_size;
-}
-static inline u32 tfrc_rx_hist_rtt(const struct tfrc_rx_hist *h)
-{
-	if (h->rtt_estimate == 0) {
-		DCCP_WARN("No RTT estimate available, using fallback RTT\n");
-		return DCCP_FALLBACK_RTT;
-	}
-	return h->rtt_estimate;
-}
-static inline void tfrc_rx_hist_restart_byte_counter(struct tfrc_rx_hist *h)
-{
-	h->bytes_recvd = 0;
-	h->bytes_start = ktime_get_real();
-}
-extern u32 tfrc_rx_hist_x_recv(struct tfrc_rx_hist *h, const u32 last_x_recv);
 extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
 				    const struct sk_buff *skb, const u64 ndp);
 extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb);
 struct tfrc_loss_hist;
-extern bool tfrc_rx_congestion_event(struct tfrc_rx_hist *h,
+extern int tfrc_rx_handle_loss(struct tfrc_rx_hist *h,
 			       struct tfrc_loss_hist *lh,
 			       struct sk_buff *skb, const u64 ndp,
 			       u32 (*first_li)(struct sock *sk),
 			       struct sock *sk);
-extern void tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h,
+extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h,
 				   const struct sk_buff *skb);
-extern int tfrc_rx_hist_init(struct tfrc_rx_hist *h, struct sock *sk);
+extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h);
 extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h);
 #endif /* _DCCP_PKT_HIST_ */
net/dccp/ccids/lib/tfrc.h
...
@@ -47,21 +47,6 @@ static inline u32 scaled_div32(u64 a, u64 b)
 	return result;
 }
-/**
- * tfrc_scaled_sqrt - Compute scaled integer sqrt(x) for 0 < x < 2^22-1
- * Uses scaling to improve accuracy of the integer approximation of sqrt(). The
- * scaling factor of 2^10 limits the maximum @sample to 4e6; this is okay for
- * clamped RTT samples (dccp_sample_rtt).
- * Should best be used for expressions of type sqrt(x)/sqrt(y), since then the
- * scaling factor is neutralised. For this purpose, it avoids returning zero.
- */
-static inline u16 tfrc_scaled_sqrt(const u32 sample)
-{
-	const unsigned long non_zero_sample = sample ? : 1;
-	return int_sqrt(non_zero_sample << 10);
-}
/**
/**
* tfrc_ewma - Exponentially weighted moving average
* tfrc_ewma - Exponentially weighted moving average
* @weight: Weight to be used as damping factor, in units of 1/10
* @weight: Weight to be used as damping factor, in units of 1/10
@@ -73,7 +58,6 @@ static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
 extern u32 tfrc_calc_x(u16 s, u32 R, u32 p);
 extern u32 tfrc_calc_x_reverse_lookup(u32 fvalue);
-extern u32 tfrc_invert_loss_event_rate(u32 loss_event_rate);
 extern int tfrc_tx_packet_history_init(void);
 extern void tfrc_tx_packet_history_exit(void);
...
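The kernel-doc comment on the removed `tfrc_scaled_sqrt()` describes scaling the argument by 2^10 before taking an integer square root. A minimal user-space sketch of that idea follows. The names `isqrt_u64` and `scaled_sqrt` are illustrative stand-ins, not kernel symbols, and the square-root routine is a generic digit-by-digit implementation assumed here in place of the kernel's `int_sqrt()`.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of the integer square root (digit-by-digit method).
 * Hypothetical stand-in for the kernel's int_sqrt(). */
static uint32_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0, bit = (uint64_t)1 << 62;

	while (bit > x)
		bit >>= 2;
	while (bit != 0) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return (uint32_t)r;
}

/* floor(sqrt(sample * 2^10)), i.e. roughly 32 * sqrt(sample).
 * A zero sample is treated as 1 so the result is never 0. */
static uint16_t scaled_sqrt(uint32_t sample)
{
	uint64_t non_zero = sample ? sample : 1;

	/* samples below 2^22 keep the result within 16 bits */
	assert(non_zero < ((uint64_t)1 << 22));
	return (uint16_t)isqrt_u64(non_zero << 10);
}
```

In a ratio such as `scaled_sqrt(400) / scaled_sqrt(100)` the common factor of 32 cancels, which is why the comment recommends the helper for expressions of the form sqrt(x)/sqrt(y).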
net/dccp/ccids/lib/tfrc_equation.c
@@ -632,16 +632,8 @@ u32 tfrc_calc_x(u16 s, u32 R, u32 p)
 	if (p <= TFRC_CALC_X_SPLIT) {		/* 0.0000 < p <= 0.05 */
 		if (p < TFRC_SMALLEST_P) {	/* 0.0000 < p < 0.0001 */
-			/*
-			 * In the congestion-avoidance phase p decays towards 0
-			 * when there are no further losses, so this case is
-			 * natural. Truncating to p_min = 0.01% means that the
-			 * maximum achievable throughput is limited to about
-			 * X_calc_max = 122.4 * s/RTT (see RFC 3448, 3.1); e.g.
-			 * with s=1500 bytes, RTT=0.01 s: X_calc_max = 147 Mbps.
-			 */
-			tfrc_pr_debug("Value of p (%d) below resolution. "
-				      "Substituting %d\n", p, TFRC_SMALLEST_P);
+			DCCP_WARN("Value of p (%d) below resolution. "
+				  "Substituting %d\n", p, TFRC_SMALLEST_P);
 			index = 0;
 		} else				/* 0.0001 <= p <= 0.05 */
 			index = p / TFRC_SMALLEST_P - 1;
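The removed comment in this hunk quotes a throughput ceiling of X_calc_max = 122.4 * s/RTT once p is truncated to 0.01%. The arithmetic behind the quoted 147 Mbps figure can be checked with a small sketch. The helper names are hypothetical and the factor 122.4 is taken from that comment, not derived here.

```c
#include <assert.h>

/* Upper bound on the computed sending rate, in bytes per second,
 * using the factor 122.4 quoted in the comment for p_min = 0.01%. */
static double x_calc_max_bytes_per_sec(double s_bytes, double rtt_sec)
{
	assert(rtt_sec > 0.0);
	return 122.4 * s_bytes / rtt_sec;
}

/* Convert bytes per second to megabits per second (10^6 bits). */
static double bytes_per_sec_to_mbps(double bytes_per_sec)
{
	return bytes_per_sec * 8.0 / 1e6;
}
```

With s = 1500 bytes and RTT = 0.01 s this gives 18,360,000 bytes/s, or 146.88 Mbps, which rounds to the 147 Mbps stated in the comment.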
@@ -666,6 +658,7 @@ u32 tfrc_calc_x(u16 s, u32 R, u32 p)
 	result = scaled_div(s, R);
 	return scaled_div32(result, f);
 }
+
 EXPORT_SYMBOL_GPL(tfrc_calc_x);

 /**
@@ -700,19 +693,5 @@ u32 tfrc_calc_x_reverse_lookup(u32 fvalue)
 	index = tfrc_binsearch(fvalue, 0);
 	return (index + 1) * 1000000 / TFRC_CALC_X_ARRSIZE;
 }
-EXPORT_SYMBOL_GPL(tfrc_calc_x_reverse_lookup);

-/**
- * tfrc_invert_loss_event_rate - Compute p so that 10^6 corresponds to 100%
- * When @loss_event_rate is large, there is a chance that p is truncated to 0.
- * To avoid re-entering slow-start in that case, we set p = TFRC_SMALLEST_P > 0.
- */
-u32 tfrc_invert_loss_event_rate(u32 loss_event_rate)
-{
-	if (loss_event_rate == UINT_MAX)		/* see RFC 4342, 8.5 */
-		return 0;
-	if (unlikely(loss_event_rate == 0))		/* map 1/0 into 100% */
-		return 1000000;
-	return max_t(u32, scaled_div(1, loss_event_rate), TFRC_SMALLEST_P);
-}
-EXPORT_SYMBOL_GPL(tfrc_invert_loss_event_rate);
+EXPORT_SYMBOL_GPL(tfrc_calc_x_reverse_lookup);
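The removed `tfrc_invert_loss_event_rate()` maps a loss-interval value to a loss event rate p scaled so that 10^6 means 100%, with a floor that keeps p from truncating to zero. A user-space sketch of the same mapping is shown below. `SMALLEST_P` is an assumed stand-in for `TFRC_SMALLEST_P`, set to 100 (0.01% on the 10^6 scale) to match the `0.0001` bound in the code comments, and the division by `loss_event_rate` stands in for `scaled_div(1, loss_event_rate)`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 0.01% expressed in units of 10^-6. */
#define SMALLEST_P 100u

/* Invert a loss-interval value into p, where 1000000 means 100%. */
static uint32_t invert_loss_event_rate(uint32_t loss_event_rate)
{
	uint64_t p;

	if (loss_event_rate == UINT32_MAX)	/* marker for "no loss seen" */
		return 0;
	if (loss_event_rate == 0)		/* map 1/0 into 100% */
		return 1000000;

	p = 1000000ull / loss_event_rate;	/* 10^6-scaled reciprocal */
	return p < SMALLEST_P ? SMALLEST_P : (uint32_t)p;
}
```

For example, an interval of 5000 yields p = 200 (0.02%), while an interval of 20000 would give 50 and is therefore raised to the floor of 100.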
net/dccp/dccp.h
@@ -42,11 +42,9 @@
 extern int dccp_debug;
 #define dccp_pr_debug(format, a...) DCCP_PR_DEBUG(dccp_debug, format, ##a)
 #define dccp_pr_debug_cat(format, a...) DCCP_PRINTK(dccp_debug, format, ##a)
-#define dccp_debug(fmt, a...) dccp_pr_debug_cat(KERN_DEBUG fmt, ##a)
 #else
 #define dccp_pr_debug(format, a...)
 #define dccp_pr_debug_cat(format, a...)
-#define dccp_debug(format, a...)
 #endif

 extern struct inet_hashinfo dccp_hashinfo;
@@ -63,14 +61,11 @@ extern void dccp_time_wait(struct sock *sk, int state, int timeo);
  * - DCCP-Reset with ACK Subheader and 4 bytes of Reset Code fields
  * Hence a safe upper bound for the maximum option length is 1020-28 = 992
  */
-#define MAX_DCCP_SPECIFIC_HEADER (255 * sizeof(uint32_t))
+#define MAX_DCCP_SPECIFIC_HEADER (255 * sizeof(int))
 #define DCCP_MAX_PACKET_HDR 28
 #define DCCP_MAX_OPT_LEN (MAX_DCCP_SPECIFIC_HEADER - DCCP_MAX_PACKET_HDR)
 #define MAX_DCCP_HEADER (MAX_DCCP_SPECIFIC_HEADER + MAX_HEADER)
-/* Upper bound for initial feature-negotiation overhead (padded to 32 bits) */
-#define DCCP_FEATNEG_OVERHEAD (32 * sizeof(uint32_t))
 #define DCCP_TIMEWAIT_LEN (60 * HZ) /* how long to wait to destroy TIME-WAIT
 				     * state, about 60 seconds */
@@ -86,13 +81,10 @@ extern void dccp_time_wait(struct sock *sk, int state, int timeo);
  */
 #define DCCP_RTO_MAX ((unsigned)(64 * HZ))
-/* DCCP base time resolution - 10 microseconds (RFC 4340, 13.1 ... 13.3) */
-#define DCCP_TIME_RESOLUTION 10
 /*
  * RTT sampling: sanity bounds and fallback RTT value from RFC 4340, section 3.4
  */
-#define DCCP_SANE_RTT_MIN (10 * DCCP_TIME_RESOLUTION)
+#define DCCP_SANE_RTT_MIN 100
 #define DCCP_FALLBACK_RTT (USEC_PER_SEC / 5)
 #define DCCP_SANE_RTT_MAX (3 * USEC_PER_SEC)
@@ -103,6 +95,12 @@ extern void dccp_time_wait(struct sock *sk, int state, int timeo);
 extern int sysctl_dccp_request_retries;
 extern int sysctl_dccp_retries1;
 extern int sysctl_dccp_retries2;
+extern int sysctl_dccp_feat_sequence_window;
+extern int sysctl_dccp_feat_rx_ccid;
+extern int sysctl_dccp_feat_tx_ccid;
+extern int sysctl_dccp_feat_ack_ratio;
+extern int sysctl_dccp_feat_send_ack_vector;
+extern int sysctl_dccp_feat_send_ndp_count;
 extern int sysctl_dccp_tx_qlen;
 extern int sysctl_dccp_sync_ratelimit;
@@ -237,22 +235,8 @@ extern void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 extern void dccp_send_sync(struct sock *sk, const u64 seq,
 			   const enum dccp_pkt_type pkt_type);
-/*
- * TX Packet Dequeueing Interface
- */
-extern void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb);
-extern bool dccp_qpolicy_full(struct sock *sk);
-extern void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb);
-extern struct sk_buff *dccp_qpolicy_top(struct sock *sk);
-extern struct sk_buff *dccp_qpolicy_pop(struct sock *sk);
-extern bool dccp_qpolicy_param_ok(struct sock *sk, __be32 param);
-/*
- * TX Packet Output and TX Timers
- */
-extern void dccp_write_xmit(struct sock *sk);
+extern void dccp_write_xmit(struct sock *sk, int block);
 extern void dccp_write_space(struct sock *sk);
-extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
 extern void dccp_init_xmit_timers(struct sock *sk);
 static inline void dccp_clear_xmit_timers(struct sock *sk)
@@ -268,8 +252,7 @@ extern const char *dccp_state_name(const int state);
 extern void dccp_set_state(struct sock *sk, const int state);
 extern void dccp_done(struct sock *sk);
-extern int dccp_reqsk_init(struct request_sock *rq, struct dccp_sock const *dp,
-			   struct sk_buff const *skb);
+extern void dccp_reqsk_init(struct request_sock *req, struct sk_buff *skb);
 extern int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
@@ -334,14 +317,7 @@ extern struct sk_buff *dccp_ctl_make_reset(struct sock *sk,
 extern int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code);
 extern void dccp_send_close(struct sock *sk, const int active);
 extern int dccp_invalid_packet(struct sk_buff *skb);
-static inline u32 dccp_sane_rtt(long usec_sample)
-{
-	if (unlikely(usec_sample <= 0 || usec_sample > DCCP_SANE_RTT_MAX))
-		DCCP_WARN("RTT sample %ld out of bounds!\n", usec_sample);
-	return clamp_val(usec_sample, DCCP_SANE_RTT_MIN, DCCP_SANE_RTT_MAX);
-}
 extern u32 dccp_sample_rtt(struct sock *sk, long delta);
 static inline int dccp_bad_service_code(const struct sock *sk,
 					const __be32 service)
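The removed `dccp_sane_rtt()` helper clamps an RTT sample to the sanity bounds defined earlier in this header. A stand-alone sketch of the clamping step is given below. The constants are assumptions that mirror `DCCP_SANE_RTT_MIN` (100 microseconds) and `DCCP_SANE_RTT_MAX` (3 seconds), and the warning that the kernel helper emits is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SANE_RTT_MIN 100L	/* microseconds, assumed from DCCP_SANE_RTT_MIN */
#define SANE_RTT_MAX 3000000L	/* 3 seconds in microseconds */

/* Clamp a signed microsecond sample into [SANE_RTT_MIN, SANE_RTT_MAX]. */
static uint32_t sane_rtt(long usec_sample)
{
	if (usec_sample < SANE_RTT_MIN)
		return (uint32_t)SANE_RTT_MIN;
	if (usec_sample > SANE_RTT_MAX)
		return (uint32_t)SANE_RTT_MAX;
	return (uint32_t)usec_sample;
}
```

Note that in the kernel helper the warning fires only for samples that are non-positive or above the maximum; positive samples below the minimum are raised to it silently.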
@@ -435,62 +411,36 @@ static inline void dccp_hdr_set_ack(struct dccp_hdr_ack_bits *dhack,
 static inline void dccp_update_gsr(struct sock *sk, u64 seq)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
+	const struct dccp_minisock *dmsk = dccp_msk(sk);
 	dp->dccps_gsr = seq;
-	/* Sequence validity window depends on remote Sequence Window (7.5.1) */
-	dp->dccps_swl = SUB48(ADD48(dp->dccps_gsr, 1), dp->dccps_r_seq_win / 4);
-	/*
-	 * Adjust SWL so that it is not below ISR. In contrast to RFC 4340,
-	 * 7.5.1 we perform this check beyond the initial handshake: W/W' are
-	 * always > 32, so for the first W/W' packets in the lifetime of a
-	 * connection we always have to adjust SWL.
-	 * A second reason why we are doing this is that the window depends on
-	 * the feature-remote value of Sequence Window: nothing stops the peer
-	 * from updating this value while we are busy adjusting SWL for the
-	 * first W packets (we would have to count from scratch again then).
-	 * Therefore it is safer to always make sure that the Sequence Window
-	 * is not artificially extended by a peer who grows SWL downwards by
-	 * continually updating the feature-remote Sequence-Window.
-	 * If sequence numbers wrap it is bad luck. But that will take a while
-	 * (48 bit), and this measure prevents Sequence-number attacks.
-	 */
-	if (before48(dp->dccps_swl, dp->dccps_isr))
-		dp->dccps_swl = dp->dccps_isr;
-	dp->dccps_swh = ADD48(dp->dccps_gsr, (3 * dp->dccps_r_seq_win) / 4);
+	dccp_set_seqno(&dp->dccps_swl,
+		       dp->dccps_gsr + 1 - (dmsk->dccpms_sequence_window / 4));
+	dccp_set_seqno(&dp->dccps_swh,
+		       dp->dccps_gsr + (3 * dmsk->dccpms_sequence_window) / 4);
 }
 static inline void dccp_update_gss(struct sock *sk, u64 seq)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
-	dp->dccps_gss = seq;
-	/* Ack validity window depends on local Sequence Window value (7.5.1) */
-	dp->dccps_awl = SUB48(ADD48(dp->dccps_gss, 1), dp->dccps_l_seq_win);
-	/* Adjust AWL so that it is not below ISS - see comment above for SWL */
-	if (before48(dp->dccps_awl, dp->dccps_iss))
-		dp->dccps_awl = dp->dccps_iss;
-	dp->dccps_awh = dp->dccps_gss;
-}
-static inline int dccp_ackvec_pending(const struct sock *sk)
-{
-	return dccp_sk(sk)->dccps_hc_rx_ackvec != NULL &&
-	       !dccp_ackvec_is_empty(dccp_sk(sk)->dccps_hc_rx_ackvec);
+	dp->dccps_awh = dp->dccps_gss = seq;
+	dccp_set_seqno(&dp->dccps_awl,
+		       (dp->dccps_gss -
+			dccp_msk(sk)->dccpms_sequence_window + 1));
 }
 static inline int dccp_ack_pending(const struct sock *sk)
 {
-	return dccp_ackvec_pending(sk) || inet_csk_ack_scheduled(sk);
+	const struct dccp_sock *dp = dccp_sk(sk);
+	return dp->dccps_timestamp_echo != 0 ||
+#ifdef CONFIG_IP_DCCP_ACKVEC
+	       (dccp_msk(sk)->dccpms_send_ack_vector &&
+		dccp_ackvec_pending(dp->dccps_hc_rx_ackvec)) ||
+#endif
+	       inet_csk_ack_scheduled(sk);
 }
-extern int dccp_feat_signal_nn_change(struct sock *sk, u8 feat, u64 nn_val);
-extern int dccp_feat_finalise_settings(struct dccp_sock *dp);
-extern int dccp_feat_server_ccid_dependencies(struct dccp_request_sock *dreq);
-extern int dccp_feat_insert_opts(struct dccp_sock*, struct dccp_request_sock*,
-				 struct sk_buff *skb);
-extern int dccp_feat_activate_values(struct sock *sk, struct list_head *fn);
-extern void dccp_feat_list_purge(struct list_head *fn_list);
 extern int dccp_insert_options(struct sock *sk, struct sk_buff *skb);
 extern int dccp_insert_options_rsk(struct dccp_request_sock*, struct sk_buff*);
 extern int dccp_insert_option_elapsed_time(struct sock *sk,
...
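The removed side of `dccp_update_gsr()` derives the sequence validity window from GSR and the remote Sequence Window W, then raises SWL so that it never falls below ISR. The sketch below replays that computation in user space with 48-bit wrap-around. The helpers `add48`, `sub48` and `before48` are local stand-ins written for this example (modelled on the `ADD48`, `SUB48` and `before48` names used in the hunk), and `struct seq_window` is a hypothetical container.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ48_MASK 0xFFFFFFFFFFFFull	/* sequence numbers are 48 bits wide */

static uint64_t add48(uint64_t a, uint64_t b) { return (a + b) & SEQ48_MASK; }
static uint64_t sub48(uint64_t a, uint64_t b) { return (a - b) & SEQ48_MASK; }

/* Non-zero if seq1 precedes seq2 in circular 48-bit sequence space. */
static int before48(uint64_t seq1, uint64_t seq2)
{
	return (int64_t)((seq2 << 16) - (seq1 << 16)) > 0;
}

struct seq_window {
	uint64_t swl;	/* lowest acceptable sequence number  */
	uint64_t swh;	/* highest acceptable sequence number */
};

/* SWL = GSR + 1 - W/4, but not below ISR; SWH = GSR + 3W/4. */
static struct seq_window update_gsr(uint64_t gsr, uint64_t isr, uint64_t w)
{
	struct seq_window win;

	assert(gsr <= SEQ48_MASK && isr <= SEQ48_MASK);
	win.swl = sub48(add48(gsr, 1), w / 4);
	if (before48(win.swl, isr))
		win.swl = isr;
	win.swh = add48(gsr, (3 * w) / 4);
	return win;
}
```

With W = 100 and ISR = 1000, a GSR of 1010 would place SWL at 986, so it is raised to 1000; once GSR reaches 2000 the unclamped value 1976 is used.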
net/dccp/diag.c
@@ -29,7 +29,7 @@ static void dccp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_backoff = icsk->icsk_backoff;
 	info->tcpi_pmtu = icsk->icsk_pmtu_cookie;
-	if (dp->dccps_hc_rx_ackvec != NULL)
+	if (dccp_msk(sk)->dccpms_send_ack_vector)
 		info->tcpi_options |= TCPI_OPT_SACK;
 	ccid_hc_rx_get_info(dp->dccps_hc_rx_ccid, sk, info);
...
net/dccp/feat.c
 /*
  * net/dccp/feat.c
  *
- * Feature negotiation for the DCCP protocol (RFC 4340, section 6)
- *
- * Copyright (c) 2008 The University of Aberdeen, Scotland, UK
- * Copyright (c) 2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>
- * Rewrote from scratch, some bits from earlier code by
- * Copyright (c) 2005 Andrea Bittau <a.bittau@cs.ucl.ac.uk>
+ * An implementation of the DCCP protocol
+ * Andrea Bittau <a.bittau@cs.ucl.ac.uk>
  *
- *
  * ASSUMPTIONS
  * -----------
- * o Feature negotiation is coordinated with connection setup (as in TCP), wild
- *   changes of parameters of an established connection are not supported.
- * o Changing NN values (Ack Ratio only) is supported in state OPEN/PARTOPEN.
  * o All currently known SP features have 1-byte quantities. If in the future
  *   extensions of RFCs 4340..42 define features with item lengths larger than
  *   one byte, a feature-specific extension of the code will be required.
@@ -23,1510 +15,635 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
 #include <linux/module.h>
 #include "ccid.h"
 #include "feat.h"
-/* feature-specific sysctls - initialised to the defaults from RFC 4340, 6.4 */
-unsigned long sysctl_dccp_sequence_window __read_mostly = 100;
-int sysctl_dccp_rx_ccid __read_mostly = 2,
-    sysctl_dccp_tx_ccid __read_mostly = 2;
-/*
- * Feature activation handlers.
- *
- * These all use an u64 argument, to provide enough room for NN/SP features. At
- * this stage the negotiated values have been checked to be within their range.
- */
-static int dccp_hdlr_ccid(struct sock *sk, u64 ccid, bool rx)
+#define DCCP_FEAT_SP_NOAGREE (-123)
+int dccp_feat_change(struct dccp_minisock *dmsk, u8 type, u8 feature,
+		     u8 *val, u8 len, gfp_t gfp)
 {
-	struct dccp_sock *dp = dccp_sk(sk);
-	struct ccid *new_ccid = ccid_new(ccid, sk, rx, gfp_any());
-	if (new_ccid == NULL)
-		return -ENOMEM;
-	if (rx) {
-		ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk);
-		dp->dccps_hc_rx_ccid = new_ccid;
-	} else {
-		ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
-		dp->dccps_hc_tx_ccid = new_ccid;
+	struct dccp_opt_pend *opt;
+	dccp_feat_debug(type, feature, *val);
+	if (len > 3) {
+		DCCP_WARN("invalid length %d\n", len);
+		return -EINVAL;
+	}
+	/* XXX add further sanity checks */
+	/* check if that feature is already being negotiated */
+	list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) {
+		/* ok we found a negotiation for this option already */
+		if (opt->dccpop_feat == feature && opt->dccpop_type == type) {
+			dccp_pr_debug("Replacing old\n");
+			/* replace */
+			BUG_ON(opt->dccpop_val == NULL);
+			kfree(opt->dccpop_val);
+			opt->dccpop_val = val;
+			opt->dccpop_len = len;
+			opt->dccpop_conf = 0;
+			return 0;
+		}
 	}
-	return 0;
-}
-static int dccp_hdlr_seq_win(struct sock *sk, u64 seq_win, bool rx)
-{
-	struct dccp_sock *dp = dccp_sk(sk);
-	if (rx) {
-		dp->dccps_r_seq_win = seq_win;
-		/* propagate changes to update SWL/SWH */
-		dccp_update_gsr(sk, dp->dccps_gsr);
-	} else {
-		dp->dccps_l_seq_win = seq_win;
-		/* propagate changes to update AWL */
-		dccp_update_gss(sk, dp->dccps_gss);
-	}
-	return 0;
-}
-static int dccp_hdlr_ack_ratio(struct sock *sk, u64 ratio, bool rx)
-{
-#ifndef __CCID2_COPES_GRACEFULLY_WITH_DYNAMIC_ACK_RATIO_UPDATES__
-	/*
-	 * FIXME: This is required until several problems in the CCID-2 code are
-	 * resolved. The CCID-2 code currently does not cope well; using dynamic
-	 * Ack Ratios greater than 1 caused instabilities. These were manifest
-	 * in hangups and long RTO timeouts (1...3 seconds). Until this has been
-	 * stabilised, it is safer not to activate dynamic Ack Ratio changes.
-	 */
-	dccp_pr_debug("Not changing %s Ack Ratio from 1 to %u\n",
-		      rx ? "RX" : "TX", (u16)ratio);
-	ratio = 1;
-#endif
-	if (rx)
-		dccp_sk(sk)->dccps_r_ack_ratio = ratio;
-	else
-		dccp_sk(sk)->dccps_l_ack_ratio = ratio;
+	/* negotiation for a new feature */
+	opt = kmalloc(sizeof(*opt), gfp);
+	if (opt == NULL)
+		return -ENOMEM;
+	opt->dccpop_type = type;
+	opt->dccpop_feat = feature;
+	opt->dccpop_len = len;
+	opt->dccpop_val = val;
+	opt->dccpop_conf = 0;
+	opt->dccpop_sc = NULL;
+	BUG_ON(opt->dccpop_val == NULL);
+	list_add_tail(&opt->dccpop_node, &dmsk->dccpms_pending);
 	return 0;
 }
-static int dccp_hdlr_ackvec(struct sock *sk, u64 enable, bool rx)
+EXPORT_SYMBOL_GPL(dccp_feat_change);
+static int dccp_feat_update_ccid(struct sock *sk, u8 type, u8 new_ccid_nr)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
+	struct dccp_minisock *dmsk = dccp_msk(sk);
+	/* figure out if we are changing our CCID or the peer's */
+	const int rx = type == DCCPO_CHANGE_R;
+	const u8 ccid_nr = rx ? dmsk->dccpms_rx_ccid : dmsk->dccpms_tx_ccid;
+	struct ccid *new_ccid;
+	/* Check if nothing is being changed. */
+	if (ccid_nr == new_ccid_nr)
+		return 0;
+	new_ccid = ccid_new(new_ccid_nr, sk, rx, GFP_ATOMIC);
+	if (new_ccid == NULL)
+		return -ENOMEM;
 	if (rx) {
-		if (enable && dp->dccps_hc_rx_ackvec == NULL) {
-			dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(gfp_any());
-			if (dp->dccps_hc_rx_ackvec == NULL)
-				return -ENOMEM;
-		} else if (!enable) {
-			dccp_ackvec_free(dp->dccps_hc_rx_ackvec);
-			dp->dccps_hc_rx_ackvec = NULL;
-		}
-	}
-	return 0;
-}
-static int dccp_hdlr_ndp(struct sock *sk, u64 enable, bool rx)
-{
-	if (!rx)
-		dccp_sk(sk)->dccps_send_ndp_count = (enable > 0);
+		ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk);
+		dp->dccps_hc_rx_ccid = new_ccid;
+		dmsk->dccpms_rx_ccid = new_ccid_nr;
+	} else {
+		ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
+		dp->dccps_hc_tx_ccid = new_ccid;
+		dmsk->dccpms_tx_ccid = new_ccid_nr;
+	}
 	return 0;
 }
-/*
- * Minimum Checksum Coverage is located at the RX side (9.2.1). This means that
- * `rx' holds when the sending peer informs about his partial coverage via a
- * ChangeR() option. In the other case, we are the sender and the receiver
- * announces its coverage via ChangeL() options. The policy here is to honour
- * such communication by enabling the corresponding partial coverage - but only
- * if it has not been set manually before; the warning here means that all
- * packets will be dropped.
- */
-static int dccp_hdlr_min_cscov(struct sock *sk, u64 cscov, bool rx)
+static int dccp_feat_update(struct sock *sk, u8 type, u8 feat, u8 val)
 {
-	struct dccp_sock *dp = dccp_sk(sk);
-	if (rx)
-		dp->dccps_pcrlen = cscov;
-	else {
-		if (dp->dccps_pcslen == 0)
-			dp->dccps_pcslen = cscov;
-		else if (cscov > dp->dccps_pcslen)
-			DCCP_WARN("CsCov %u too small, peer requires >= %u\n",
-				  dp->dccps_pcslen, (u8)cscov);
+	dccp_feat_debug(type, feat, val);
+	switch (feat) {
+	case DCCPF_CCID:
+		return dccp_feat_update_ccid(sk, type, val);
+	default:
+		dccp_pr_debug("UNIMPLEMENTED: %s(%d, ...)\n",
+			      dccp_feat_typename(type), feat);
+		break;
 	}
 	return 0;
 }
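The comment on the removed `dccp_hdlr_min_cscov()` describes a small policy: on the RX side the announced coverage is stored directly, while on the TX side it is adopted only if no value was set by hand, and a conflict is merely reported. The sketch below restates that policy outside the kernel. `struct cscov_state` and `apply_min_cscov` are hypothetical names, and returning -1 in the conflict case is a choice made for this example; the kernel handler only logs a warning and returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cscov_state {
	uint8_t pcrlen;	/* coverage required on received packets     */
	uint8_t pcslen;	/* coverage applied when sending; 0 = unset  */
};

/* Returns 0 on success, -1 where the kernel handler would warn. */
static int apply_min_cscov(struct cscov_state *st, uint8_t cscov, int rx)
{
	assert(st != NULL);
	if (rx) {
		st->pcrlen = cscov;
		return 0;
	}
	if (st->pcslen == 0) {
		st->pcslen = cscov;	/* honour the peer's announcement */
		return 0;
	}
	if (cscov > st->pcslen)
		return -1;		/* manual setting is below the peer's minimum */
	return 0;
}
```

The conflict branch corresponds to the situation the comment warns about: the locally configured coverage stays in place, and the peer would drop every packet sent with it.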
-static const struct {
-	u8 feat_num;				/* DCCPF_xxx */
-	enum dccp_feat_type rxtx;		/* RX or TX */
-	enum dccp_feat_type reconciliation;	/* SP or NN */
-	u8 default_value;			/* as in 6.4 */
-	int (*activation_hdlr)(struct sock *sk, u64 val, bool rx);
-/*
- * Lookup table for location and type of features (from RFC 4340/4342)
- * +--------------------------+----+-----+----+----+---------+-----------+
- * | Feature                  | Location | Reconc. | Initial |  Section  |
- * |                          | RX | TX  | SP | NN |  Value  | Reference |
- * +--------------------------+----+-----+----+----+---------+-----------+
- * | DCCPF_CCID               |    |  X  | X  |    |   2     | 10        |
- * | DCCPF_SHORT_SEQNOS       |    |  X  | X  |    |   0     | 7.6.1     |
- * | DCCPF_SEQUENCE_WINDOW    |    |  X  |    | X  | 100     | 7.5.2     |
- * | DCCPF_ECN_INCAPABLE      | X  |     | X  |    |   0     | 12.1      |
- * | DCCPF_ACK_RATIO          |    |  X  |    | X  |   2     | 11.3      |
- * | DCCPF_SEND_ACK_VECTOR    | X  |     | X  |    |   0     | 11.5      |
- * | DCCPF_SEND_NDP_COUNT     |    |  X  | X  |    |   0     | 7.7.2     |
- * | DCCPF_MIN_CSUM_COVER     | X  |     | X  |    |   0     | 9.2.1     |
- * | DCCPF_DATA_CHECKSUM      | X  |     | X  |    |   0     | 9.3.1     |
- * | DCCPF_SEND_LEV_RATE      | X  |     | X  |    |   0     | 4342/8.4  |
- * +--------------------------+----+-----+----+----+---------+-----------+
- */
-} dccp_feat_table[] = {
-	{ DCCPF_CCID, FEAT_AT_TX, FEAT_SP, 2, dccp_hdlr_ccid },
-	{ DCCPF_SHORT_SEQNOS, FEAT_AT_TX, FEAT_SP, 0, NULL },
-	{ DCCPF_SEQUENCE_WINDOW, FEAT_AT_TX, FEAT_NN, 100, dccp_hdlr_seq_win },
-	{ DCCPF_ECN_INCAPABLE, FEAT_AT_RX, FEAT_SP, 0, NULL },
-	{ DCCPF_ACK_RATIO, FEAT_AT_TX, FEAT_NN, 2, dccp_hdlr_ack_ratio },
-	{ DCCPF_SEND_ACK_VECTOR, FEAT_AT_RX, FEAT_SP, 0, dccp_hdlr_ackvec },
-	{ DCCPF_SEND_NDP_COUNT, FEAT_AT_TX, FEAT_SP, 0, dccp_hdlr_ndp },
-	{ DCCPF_MIN_CSUM_COVER, FEAT_AT_RX, FEAT_SP, 0, dccp_hdlr_min_cscov },
-	{ DCCPF_DATA_CHECKSUM, FEAT_AT_RX, FEAT_SP, 0, NULL },
-	{ DCCPF_SEND_LEV_RATE, FEAT_AT_RX, FEAT_SP, 0, NULL },
-};
-#define DCCP_FEAT_SUPPORTED_MAX ARRAY_SIZE(dccp_feat_table)
-/**
- * dccp_feat_index - Hash function to map feature number into array position
- * Returns consecutive array index or -1 if the feature is not understood.
- */
-static int dccp_feat_index(u8 feat_num)
+static int dccp_feat_reconcile(struct sock *sk, struct dccp_opt_pend *opt,
+			       u8 *rpref, u8 rlen)
 {
-	/* The first 9 entries are occupied by the types from RFC 4340, 6.4 */
-	if (feat_num > DCCPF_RESERVED && feat_num <= DCCPF_DATA_CHECKSUM)
-		return feat_num - 1;
+	struct dccp_sock *dp = dccp_sk(sk);
+	u8 *spref, slen, *res = NULL;
+	int i, j, rc, agree = 1;
+	BUG_ON(rpref == NULL);
+	/* check if we are the black sheep */
+	if (dp->dccps_role == DCCP_ROLE_CLIENT) {
+		spref = rpref;
+		slen = rlen;
+		rpref = opt->dccpop_val;
+		rlen = opt->dccpop_len;
+	} else {
+		spref = opt->dccpop_val;
+		slen = opt->dccpop_len;
+	}
 	/*
-	 * Other features: add cases for new feature types here after adding
-	 * them to the above table.
+	 * Now we have server preference list in spref and client preference in
+	 * rpref
 	 */
-	switch (feat_num) {
-	case DCCPF_SEND_LEV_RATE:
-		return DCCP_FEAT_SUPPORTED_MAX - 1;
-	}
-	return -1;
-}
-static u8 dccp_feat_type(u8 feat_num)
-{
-	int idx = dccp_feat_index(feat_num);
-	if (idx < 0)
-		return FEAT_UNKNOWN;
-	return dccp_feat_table[idx].reconciliation;
-}
-static int dccp_feat_default_value(u8 feat_num)
-{
-	int idx = dccp_feat_index(feat_num);
-	return idx < 0 ? : dccp_feat_table[idx].default_value;
-}
-/*
- * Debugging and verbose-printing section
- */
-static const char *dccp_feat_fname(const u8 feat)
-{
-	static const char *feature_names[] = {
-		[DCCPF_RESERVED] = "Reserved",
-		[DCCPF_CCID] = "CCID",
-		[DCCPF_SHORT_SEQNOS] = "Allow Short Seqnos",
-		[DCCPF_SEQUENCE_WINDOW] = "Sequence Window",
-		[DCCPF_ECN_INCAPABLE] = "ECN Incapable",
-		[DCCPF_ACK_RATIO] = "Ack Ratio",
-		[DCCPF_SEND_ACK_VECTOR] = "Send ACK Vector",
-		[DCCPF_SEND_NDP_COUNT] = "Send NDP Count",
-		[DCCPF_MIN_CSUM_COVER] = "Min. Csum Coverage",
-		[DCCPF_DATA_CHECKSUM] = "Send Data Checksum",
-	};
-	if (feat > DCCPF_DATA_CHECKSUM && feat < DCCPF_MIN_CCID_SPECIFIC)
-		return feature_names[DCCPF_RESERVED];
-	if (feat == DCCPF_SEND_LEV_RATE)
-		return "Send Loss Event Rate";
-	if (feat >= DCCPF_MIN_CCID_SPECIFIC)
-		return "CCID-specific";
-	return feature_names[feat];
-}
-static const char *dccp_feat_sname[] = { "DEFAULT", "INITIALISING", "CHANGING",
-					 "UNSTABLE", "STABLE" };
-#ifdef CONFIG_IP_DCCP_DEBUG
-static const char *dccp_feat_oname(const u8 opt)
-{
-	switch (opt) {
-	case DCCPO_CHANGE_L: return "Change_L";
-	case DCCPO_CONFIRM_L: return "Confirm_L";
-	case DCCPO_CHANGE_R: return "Change_R";
-	case DCCPO_CONFIRM_R: return "Confirm_R";
+	BUG_ON(spref == NULL);
+	BUG_ON(rpref == NULL);
+	/* FIXME sanity check vals */
+	/* Are values in any order? XXX Lame "algorithm" here */
+	for (i = 0; i < slen; i++) {
+		for (j = 0; j < rlen; j++) {
+			if (spref[i] == rpref[j]) {
+				res = &spref[i];
+				break;
+			}
+		}
+		if (res)
+			break;
 	}
-	return NULL;
-}
-static void dccp_feat_printval(u8 feat_num, dccp_feat_val const *val)
-{
-	u8 i, type = dccp_feat_type(feat_num);
-	if (val == NULL || (type == FEAT_SP && val->sp.vec == NULL))
-		dccp_pr_debug_cat("(NULL)");
-	else if (type == FEAT_SP)
-		for (i = 0; i < val->sp.len; i++)
-			dccp_pr_debug_cat("%s%u", i ? " " : "", val->sp.vec[i]);
-	else if (type == FEAT_NN)
-		dccp_pr_debug_cat("%llu", (unsigned long long)val->nn);
-	else
-		dccp_pr_debug_cat("unknown type %u", type);
-}
-static void dccp_feat_printvals(u8 feat_num, u8 *list, u8 len)
-{
-	u8 type = dccp_feat_type(feat_num);
-	dccp_feat_val fval = { .sp.vec = list, .sp.len = len };
-	if (type == FEAT_NN)
-		fval.nn = dccp_decode_value_var(list, len);
-	dccp_feat_printval(feat_num, &fval);
-}
-static void dccp_feat_print_entry(struct dccp_feat_entry const *entry)
-{
-	dccp_debug(" * %s %s = ", entry->is_local ? "local" : "remote",
-		   dccp_feat_fname(entry->feat_num));
-	dccp_feat_printval(entry->feat_num, &entry->val);
-	dccp_pr_debug_cat(", state=%s %s\n", dccp_feat_sname[entry->state],
-			  entry->needs_confirm ? "(Confirm pending)" : "");
-}
-#define dccp_feat_print_opt(opt, feat, val, len, mandatory) do { \
-	dccp_pr_debug("%s(%s, ", dccp_feat_oname(opt), dccp_feat_fname(feat));\
-	dccp_feat_printvals(feat, val, len); \
-	dccp_pr_debug_cat(") %s\n", mandatory ? "!" : ""); } while (0)
-#define dccp_feat_print_fnlist(fn_list) { \
-	const struct dccp_feat_entry *___entry; \
-	\
-	dccp_pr_debug("List Dump:\n"); \
-	list_for_each_entry(___entry, fn_list, node) \
-		dccp_feat_print_entry(___entry); \
-}
-#else /* ! CONFIG_IP_DCCP_DEBUG */
-#define dccp_feat_print_opt(opt, feat, val, len, mandatory)
-#define dccp_feat_print_fnlist(fn_list)
-#endif
-static int __dccp_feat_activate(struct sock *sk, const int idx,
-				const bool is_local, dccp_feat_val const *fval)
-{
-	bool rx;
-	u64 val;
-	if (idx < 0 || idx >= DCCP_FEAT_SUPPORTED_MAX)
-		return -1;
-	if (dccp_feat_table[idx].activation_hdlr == NULL)
-		return 0;
-	if (fval == NULL) {
-		val = dccp_feat_table[idx].default_value;
-	} else if (dccp_feat_table[idx].reconciliation == FEAT_SP) {
-		if (fval->sp.vec == NULL) {
-			/*
-			 * This can happen when an empty Confirm is sent
-			 * for an SP (i.e. known) feature. In this case
-			 * we would be using the default anyway.
-			 */
-			DCCP_CRIT("Feature #%d undefined: using default", idx);
-			val = dccp_feat_table[idx].default_value;
-		} else {
-			val = fval->sp.vec[0];
+	/* we didn't agree on anything */
+	if (res == NULL) {
+		/* confirm previous value */
+		switch (opt->dccpop_feat) {
+		case DCCPF_CCID:
+			/* XXX did i get this right? =P */
+			if (opt->dccpop_type == DCCPO_CHANGE_L)
+				res = &dccp_msk(sk)->dccpms_tx_ccid;
+			else
+				res = &dccp_msk(sk)->dccpms_rx_ccid;
+			break;
+		default:
+			DCCP_BUG("Fell through, feat=%d", opt->dccpop_feat);
+			/* XXX implement res */
+			return -EFAULT;
+		}
+		dccp_pr_debug("Don't agree... reconfirming %d\n", *res);
+		agree = 0; /* this is used for mandatory options... */
+	}
+	/* need to put result and our preference list */
+	rlen = 1 + opt->dccpop_len;
+	rpref = kmalloc(rlen, GFP_ATOMIC);
+	if (rpref == NULL)
+		return -ENOMEM;
+	*rpref = *res;
+	memcpy(&rpref[1], opt->dccpop_val, opt->dccpop_len);
+	/* put it in the "confirm queue" */
+	if (opt->dccpop_sc == NULL) {
+		opt->dccpop_sc = kmalloc(sizeof(*opt->dccpop_sc), GFP_ATOMIC);
+		if (opt->dccpop_sc == NULL) {
+			kfree(rpref);
+			return -ENOMEM;
 		}
 	} else {
-		val = fval->nn;
+		/* recycle the confirm slot */
+		BUG_ON(opt->dccpop_sc->dccpoc_val == NULL);
+		kfree(opt->dccpop_sc->dccpoc_val);
+		dccp_pr_debug("recycling confirm slot\n");
+	}
+	memset(opt->dccpop_sc, 0, sizeof(*opt->dccpop_sc));
+	opt->dccpop_sc->dccpoc_val = rpref;
+	opt->dccpop_sc->dccpoc_len = rlen;
+	/* update the option on our side [we are about to send the confirm] */
+	rc = dccp_feat_update(sk, opt->dccpop_type, opt->dccpop_feat, *res);
+	if (rc) {
+		kfree(opt->dccpop_sc->dccpoc_val);
+		kfree(opt->dccpop_sc);
+		opt->dccpop_sc = NULL;
+		return rc;
 	}
/* Location is RX if this is a local-RX or remote-TX feature */
dccp_pr_debug
(
"Will confirm %d
\n
"
,
*
rpref
);
	rx = (is_local == (dccp_feat_table[idx].rxtx == FEAT_AT_RX));

	dccp_debug(" -> activating %s %s, %sval=%llu\n", rx ? "RX" : "TX",
		   dccp_feat_fname(dccp_feat_table[idx].feat_num),
		   fval ? "" : "default ", (unsigned long long)val);

	return dccp_feat_table[idx].activation_hdlr(sk, val, rx);
}
/**
* dccp_feat_activate - Activate feature value on socket
* @sk: fully connected DCCP socket (after handshake is complete)
* @feat_num: feature to activate, one of %dccp_feature_numbers
* @local: whether local (1) or remote (0) @feat_num is meant
* @fval: the value (SP or NN) to activate, or NULL to use the default value
* For general use this function is preferable over __dccp_feat_activate().
*/
static int dccp_feat_activate(struct sock *sk, u8 feat_num, bool local,
			      dccp_feat_val const *fval)
{
	return __dccp_feat_activate(sk, dccp_feat_index(feat_num), local, fval);
}
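The comment "Location is RX if this is a local-RX or remote-TX feature" in `__dccp_feat_activate()` reduces to comparing two booleans. The following is a minimal user-space sketch of that rule, not kernel code: `feat_location_is_rx()` and its `feat_at_rx` parameter are hypothetical names, with `feat_at_rx` standing in for the table test `dccp_feat_table[idx].rxtx == FEAT_AT_RX`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of feat.c): true when the value is
 * activated on the receive path, false when on the transmit path.
 */
static bool feat_location_is_rx(bool is_local, bool feat_at_rx)
{
	/* local RX feature or remote TX feature -> RX; otherwise TX */
	return is_local == feat_at_rx;
}

/* Walks through all four combinations of the two inputs. */
static void feat_location_selftest(void)
{
	assert(feat_location_is_rx(true, true));	/* local RX feature  */
	assert(feat_location_is_rx(false, false));	/* remote TX feature */
	assert(!feat_location_is_rx(true, false));	/* local TX feature  */
	assert(!feat_location_is_rx(false, true));	/* remote RX feature */
}
```

Writing the rule as an equality avoids a four-way branch; the self-check spells out the truth table that the one-line expression encodes.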
/* Test for "Req'd" feature (RFC 4340, 6.4) */
static inline int dccp_feat_must_be_understood(u8 feat_num)
{
	return	feat_num == DCCPF_CCID || feat_num == DCCPF_SHORT_SEQNOS ||
		feat_num == DCCPF_SEQUENCE_WINDOW;
}
/* copy constructor, fval must not already contain allocated memory */
/* say we want to change to X but we just got a confirm X, suppress our
static
int
dccp_feat_clone_sp_val
(
dccp_feat_val
*
fval
,
u8
const
*
val
,
u8
len
)
* change
{
*/
fval
->
sp
.
len
=
len
;
if
(
!
opt
->
dccpop_conf
)
{
if
(
fval
->
sp
.
len
>
0
)
{
if
(
*
opt
->
dccpop_val
==
*
res
)
fval
->
sp
.
vec
=
kmemdup
(
val
,
len
,
gfp_any
());
opt
->
dccpop_conf
=
1
;
if
(
fval
->
sp
.
vec
==
NULL
)
{
dccp_pr_debug
(
"won't ask for change of same feature
\n
"
);
fval
->
sp
.
len
=
0
;
return
-
ENOBUFS
;
}
}
}
return
0
;
}
static
void
dccp_feat_val_destructor
(
u8
feat_num
,
dccp_feat_val
*
val
)
return
agree
?
0
:
DCCP_FEAT_SP_NOAGREE
;
/* used for mandatory opts */
{
	if (unlikely(val == NULL))
		return;
	if (dccp_feat_type(feat_num) == FEAT_SP)
		kfree(val->sp.vec);
	memset(val, 0, sizeof(*val));
}
}
static
struct
dccp_feat_entry
*
static
int
dccp_feat_sp
(
struct
sock
*
sk
,
u8
type
,
u8
feature
,
u8
*
val
,
u8
len
)
dccp_feat_clone_entry
(
struct
dccp_feat_entry
const
*
original
)
{
{
struct
dccp_feat_entry
*
new
;
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
u8
type
=
dccp_feat_type
(
original
->
feat_num
);
struct
dccp_opt_pend
*
opt
;
int
rc
=
1
;
if
(
type
==
FEAT_UNKNOWN
)
u8
t
;
return
NULL
;
new
=
kmemdup
(
original
,
sizeof
(
struct
dccp_feat_entry
),
gfp_any
());
/*
if
(
new
==
NULL
)
* We received a CHANGE. We gotta match it against our own preference
return
NULL
;
* list. If we got a CHANGE_R it means it's a change for us, so we need
* to compare our CHANGE_L list.
*/
if
(
type
==
DCCPO_CHANGE_L
)
t
=
DCCPO_CHANGE_R
;
else
t
=
DCCPO_CHANGE_L
;
if
(
type
==
FEAT_SP
&&
dccp_feat_clone_sp_val
(
&
new
->
val
,
/* find our preference list for this feature */
original
->
val
.
sp
.
vec
,
list_for_each_entry
(
opt
,
&
dmsk
->
dccpms_pending
,
dccpop_node
)
{
original
->
val
.
sp
.
len
))
{
if
(
opt
->
dccpop_type
!=
t
||
opt
->
dccpop_feat
!=
feature
)
kfree
(
new
);
continue
;
return
NULL
;
}
return
new
;
}
static
void
dccp_feat_entry_destructor
(
struct
dccp_feat_entry
*
entry
)
/* find the winner from the two preference lists */
{
rc
=
dccp_feat_reconcile
(
sk
,
opt
,
val
,
len
);
if
(
entry
!=
NULL
)
{
break
;
dccp_feat_val_destructor
(
entry
->
feat_num
,
&
entry
->
val
);
kfree
(
entry
);
}
}
}
/*
/* We didn't deal with the change. This can happen if we have no
* List management functions
* preference list for the feature. In fact, it just shouldn't
*
* happen---if we understand a feature, we should have a preference list
* Feature negotiation lists rely on and maintain the following invariants:
* with at least the default value.
* - each feat_num in the list is known, i.e. we know its type and default value
*/
* - each feat_num/is_local combination is unique (old entries are overwritten)
BUG_ON
(
rc
==
1
);
* - SP values are always freshly allocated
* - list is sorted in increasing order of feature number (faster lookup)
*/
static struct dccp_feat_entry *dccp_feat_list_lookup(struct list_head *fn_list,
						     u8 feat_num, bool is_local)
{
	struct dccp_feat_entry *entry;

	list_for_each_entry(entry, fn_list, node)
return
rc
;
		if (entry->feat_num == feat_num && entry->is_local == is_local)
			return entry;
		else if (entry->feat_num > feat_num)
			break;
	return NULL;
}
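The lookup above relies on the invariant that the list is sorted by increasing feature number, so the walk can stop at the first larger entry. A minimal user-space sketch of the same idea follows; `struct fn_entry` and `fn_lookup()` are hypothetical stand-ins that use a plain singly linked list instead of the kernel's `list_head`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct dccp_feat_entry. */
struct fn_entry {
	unsigned char	 feat_num;
	bool		 is_local;
	struct fn_entry	*next;
};

/*
 * Find (feat_num, is_local) in a list sorted by increasing feat_num.
 * Once a larger feature number is seen, the wanted entry cannot follow.
 */
static struct fn_entry *fn_lookup(struct fn_entry *head,
				  unsigned char feat_num, bool is_local)
{
	struct fn_entry *e;

	for (e = head; e != NULL; e = e->next) {
		if (e->feat_num == feat_num && e->is_local == is_local)
			return e;
		if (e->feat_num > feat_num)
			break;
	}
	return NULL;
}

static void fn_lookup_selftest(void)
{
	struct fn_entry c = { 9, true, NULL };
	struct fn_entry b = { 4, false, &c };
	struct fn_entry a = { 4, true, &b };

	assert(fn_lookup(&a, 4, false) == &b);	/* same number, other side */
	assert(fn_lookup(&a, 9, true) == &c);
	assert(fn_lookup(&a, 5, true) == NULL);	/* stops early at 9 */
	assert(fn_lookup(NULL, 1, true) == NULL);
}
```

Entries with equal feature numbers but different `is_local` values sit next to each other, which is why the early exit tests for strictly greater rather than not equal.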
}
/**
static
int
dccp_feat_nn
(
struct
sock
*
sk
,
u8
type
,
u8
feature
,
u8
*
val
,
u8
len
)
* dccp_feat_entry_new - Central list update routine (called by all others)
* @head: list to add to
* @feat: feature number
* @local: whether the local (1) or remote feature with number @feat is meant
* This is the only constructor and serves to ensure the above invariants.
*/
static
struct
dccp_feat_entry
*
dccp_feat_entry_new
(
struct
list_head
*
head
,
u8
feat
,
bool
local
)
{
{
struct
dccp_feat_entry
*
entry
;
struct
dccp_opt_pend
*
opt
;
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
list_for_each_entry
(
entry
,
head
,
node
)
u8
*
copy
;
if
(
entry
->
feat_num
==
feat
&&
entry
->
is_local
==
local
)
{
int
rc
;
dccp_feat_val_destructor
(
entry
->
feat_num
,
&
entry
->
val
);
return
entry
;
}
else
if
(
entry
->
feat_num
>
feat
)
{
head
=
&
entry
->
node
;
break
;
}
entry
=
kmalloc
(
sizeof
(
*
entry
),
gfp_any
());
/* NN features must be Change L (sec. 6.3.2) */
if
(
entry
!=
NULL
)
{
if
(
type
!=
DCCPO_CHANGE_L
)
{
entry
->
feat_num
=
feat
;
dccp_pr_debug
(
"received %s for NN feature %d
\n
"
,
entry
->
is_local
=
local
;
dccp_feat_typename
(
type
),
feature
)
;
list_add_tail
(
&
entry
->
node
,
head
)
;
return
-
EFAULT
;
}
}
return
entry
;
}
/**
/* XXX sanity check opt val */
* dccp_feat_push_change - Add/overwrite a Change option in the list
* @fn_list: feature-negotiation list to update
* @feat: one of %dccp_feature_numbers
* @local: whether local (1) or remote (0) @feat is meant
* @mandatory: whether to use Mandatory feature negotiation options
* @fval: pointer to NN/SP value to be inserted (will be copied)
*/
static int dccp_feat_push_change(struct list_head *fn_list, u8 feat, u8 local,
				 u8 mandatory, dccp_feat_val *fval)
{
	struct dccp_feat_entry *new = dccp_feat_entry_new(fn_list, feat, local);

	if (new == NULL)
/* copy option so we can confirm it */
opt
=
kzalloc
(
sizeof
(
*
opt
),
GFP_ATOMIC
);
if
(
opt
==
NULL
)
return
-
ENOMEM
;
return
-
ENOMEM
;
new
->
feat_num
=
feat
;
copy
=
kmemdup
(
val
,
len
,
GFP_ATOMIC
);
new
->
is_local
=
local
;
if
(
copy
==
NULL
)
{
new
->
state
=
FEAT_INITIALISING
;
kfree
(
opt
);
new
->
needs_confirm
=
0
;
return
-
ENOMEM
;
new
->
empty_confirm
=
0
;
}
new
->
val
=
*
fval
;
new
->
needs_mandatory
=
mandatory
;
return
0
;
opt
->
dccpop_type
=
DCCPO_CONFIRM_R
;
/* NN can only confirm R */
}
opt
->
dccpop_feat
=
feature
;
opt
->
dccpop_val
=
copy
;
opt
->
dccpop_len
=
len
;
/**
/* change feature */
* dccp_feat_push_confirm - Add a Confirm entry to the FN list
rc
=
dccp_feat_update
(
sk
,
type
,
feature
,
*
val
);
* @fn_list: feature-negotiation list to add to
if
(
rc
)
{
* @feat: one of %dccp_feature_numbers
kfree
(
opt
->
dccpop_val
);
* @local: whether local (1) or remote (0) @feat is being confirmed
kfree
(
opt
);
* @fval: pointer to NN/SP value to be inserted or NULL
return
rc
;
* Returns 0 on success, a Reset code for further processing otherwise.
}
*/
static int dccp_feat_push_confirm(struct list_head *fn_list, u8 feat, u8 local,
				  dccp_feat_val *fval)
{
	struct dccp_feat_entry *new = dccp_feat_entry_new(fn_list, feat, local);

	if (new == NULL)
dccp_feat_debug
(
type
,
feature
,
*
copy
);
return
DCCP_RESET_CODE_TOO_BUSY
;
new
->
feat_num
=
feat
;
list_add_tail
(
&
opt
->
dccpop_node
,
&
dmsk
->
dccpms_conf
);
new
->
is_local
=
local
;
new
->
state
=
FEAT_STABLE
;
/* transition in 6.6.2 */
new
->
needs_confirm
=
1
;
new
->
empty_confirm
=
(
fval
==
NULL
);
new
->
val
.
nn
=
0
;
/* zeroes the whole structure */
if
(
!
new
->
empty_confirm
)
new
->
val
=
*
fval
;
new
->
needs_mandatory
=
0
;
return
0
;
return
0
;
}
}
static
int
dccp_push_empty_confirm
(
struct
list_head
*
fn_list
,
u8
feat
,
u8
local
)
static
void
dccp_feat_empty_confirm
(
struct
dccp_minisock
*
dmsk
,
u8
type
,
u8
feature
)
{
{
return
dccp_feat_push_confirm
(
fn_list
,
feat
,
local
,
NULL
);
/* XXX check if other confirms for that are queued and recycle slot */
}
struct
dccp_opt_pend
*
opt
=
kzalloc
(
sizeof
(
*
opt
),
GFP_ATOMIC
);
static
inline
void
dccp_feat_list_pop
(
struct
dccp_feat_entry
*
entry
)
if
(
opt
==
NULL
)
{
{
/* XXX what do we do? Ignoring should be fine. It's a change
list_del
(
&
entry
->
node
);
* after all =P
dccp_feat_entry_destructor
(
entry
);
*/
}
return
;
void dccp_feat_list_purge(struct list_head *fn_list)
{
	struct dccp_feat_entry *entry, *next;

	list_for_each_entry_safe(entry, next, fn_list, node)
		dccp_feat_entry_destructor(entry);
	INIT_LIST_HEAD(fn_list);
}
EXPORT_SYMBOL_GPL(dccp_feat_list_purge);
/* generate @to as full clone of @from - @to must not contain any nodes */
int dccp_feat_clone_list(struct list_head const *from, struct list_head *to)
{
	struct dccp_feat_entry *entry, *new;

	INIT_LIST_HEAD(to);
	list_for_each_entry(entry, from, node) {
		new = dccp_feat_clone_entry(entry);
		if (new == NULL)
			goto cloning_failed;
		list_add_tail(&new->node, to);
	}
}
return
0
;
cloning_failed:
switch
(
type
)
{
dccp_feat_list_purge
(
to
);
case
DCCPO_CHANGE_L
:
return
-
ENOMEM
;
opt
->
dccpop_type
=
DCCPO_CONFIRM_R
;
}
break
;
case
DCCPO_CHANGE_R
:
opt
->
dccpop_type
=
DCCPO_CONFIRM_L
;
break
;
default:
DCCP_WARN
(
"invalid type %d
\n
"
,
type
);
kfree
(
opt
);
return
;
}
opt
->
dccpop_feat
=
feature
;
opt
->
dccpop_val
=
NULL
;
opt
->
dccpop_len
=
0
;
/**
/* change feature */
* dccp_feat_valid_nn_length - Enforce length constraints on NN options
dccp_pr_debug
(
"Empty %s(%d)
\n
"
,
dccp_feat_typename
(
type
),
feature
);
* Length is between 0 and %DCCP_OPTVAL_MAXLEN. Used for outgoing packets only,
* incoming options are accepted as long as their values are valid.
*/
static u8 dccp_feat_valid_nn_length(u8 feat_num)
{
	if (feat_num == DCCPF_ACK_RATIO)	/* RFC 4340, 11.3 and 6.6.8 */
		return 2;
	if (feat_num == DCCPF_SEQUENCE_WINDOW)	/* RFC 4340, 7.5.2 and 6.5 */
		return 6;
	return 0;
}
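The lengths returned above (2 bytes for Ack Ratio, 6 bytes for Sequence Window) are what `dccp_feat_insert_opts()` hands to `dccp_encode_value_var()` to fill its `nn_in_nbo` buffer. The kernel helpers themselves are not shown on this page, so the following is only a user-space sketch of variable-length big-endian encoding under that assumption; `nn_encode()` and `nn_decode()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical counterpart of dccp_encode_value_var(): store the low `len`
 * bytes of `value` in `to`, most significant byte first.
 */
static void nn_encode(uint64_t value, uint8_t *to, unsigned int len)
{
	assert(len <= sizeof(value));
	while (len > 0) {
		to[--len] = (uint8_t)(value & 0xff);
		value >>= 8;
	}
}

/* Hypothetical counterpart of dccp_decode_value_var(). */
static uint64_t nn_decode(const uint8_t *from, unsigned int len)
{
	uint64_t value = 0;

	assert(len <= sizeof(value));
	while (len-- > 0)
		value = (value << 8) | *from++;
	return value;
}

static void nn_codec_selftest(void)
{
	uint8_t buf[6];

	nn_encode(0x0102, buf, 2);		/* 2-byte field, e.g. Ack Ratio */
	assert(buf[0] == 0x01 && buf[1] == 0x02);
	assert(nn_decode(buf, 2) == 0x0102);

	nn_encode(100, buf, 6);			/* 6-byte field, e.g. Sequence Window */
	assert(buf[0] == 0 && buf[5] == 100);
	assert(nn_decode(buf, 6) == 100);
}
```

Passing the length explicitly lets one routine serve every NN feature, with the per-feature width looked up separately as in `dccp_feat_valid_nn_length()`.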
static
u8
dccp_feat_is_valid_nn_val
(
u8
feat_num
,
u64
val
)
list_add_tail
(
&
opt
->
dccpop_node
,
&
dmsk
->
dccpms_conf
);
{
switch
(
feat_num
)
{
case
DCCPF_ACK_RATIO
:
return
val
<=
DCCPF_ACK_RATIO_MAX
;
case
DCCPF_SEQUENCE_WINDOW
:
return
val
>=
DCCPF_SEQ_WMIN
&&
val
<=
DCCPF_SEQ_WMAX
;
}
return
0
;
/* feature unknown - so we can't tell */
}
}
/* check that SP values are within the ranges defined in RFC 4340 */
static
void
dccp_feat_flush_confirm
(
struct
sock
*
sk
)
static
u8
dccp_feat_is_valid_sp_val
(
u8
feat_num
,
u8
val
)
{
{
switch
(
feat_num
)
{
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
case
DCCPF_CCID
:
/* Check if there is anything to confirm in the first place */
return
val
==
DCCPC_CCID2
||
val
==
DCCPC_CCID3
;
int
yes
=
!
list_empty
(
&
dmsk
->
dccpms_conf
);
/* Type-check Boolean feature values: */
case
DCCPF_SHORT_SEQNOS
:
case
DCCPF_ECN_INCAPABLE
:
case
DCCPF_SEND_ACK_VECTOR
:
case
DCCPF_SEND_NDP_COUNT
:
case
DCCPF_DATA_CHECKSUM
:
case
DCCPF_SEND_LEV_RATE
:
return
val
<
2
;
case
DCCPF_MIN_CSUM_COVER
:
return
val
<
16
;
}
return
0
;
/* feature unknown */
}
static
u8
dccp_feat_sp_list_ok
(
u8
feat_num
,
u8
const
*
sp_list
,
u8
sp_len
)
if
(
!
yes
)
{
{
struct
dccp_opt_pend
*
opt
;
if
(
sp_list
==
NULL
||
sp_len
<
1
)
return
0
;
while
(
sp_len
--
)
if
(
!
dccp_feat_is_valid_sp_val
(
feat_num
,
*
sp_list
++
))
return
0
;
return
1
;
}
/**
list_for_each_entry
(
opt
,
&
dmsk
->
dccpms_pending
,
dccpop_node
)
{
* dccp_feat_insert_opts - Generate FN options from current list state
if
(
opt
->
dccpop_conf
)
{
* @skb: next sk_buff to be sent to the peer
yes
=
1
;
* @dp: for client during handshake and general negotiation
break
;
* @dreq: used by the server only (all Changes/Confirms in LISTEN/RESPOND)
*/
int dccp_feat_insert_opts(struct dccp_sock *dp, struct dccp_request_sock *dreq,
			  struct sk_buff *skb)
{
	struct list_head *fn = dreq ? &dreq->dreq_featneg : &dp->dccps_featneg;
	struct dccp_feat_entry *pos, *next;
	u8 opt, type, len, *ptr, nn_in_nbo[DCCP_OPTVAL_MAXLEN];
	bool rpt;

	/* put entries into @skb in the order they appear in the list */
	list_for_each_entry_safe_reverse(pos, next, fn, node) {
		opt  = dccp_feat_genopt(pos);
		type = dccp_feat_type(pos->feat_num);
		rpt  = false;

		if (pos->empty_confirm) {
			len = 0;
			ptr = NULL;
		} else {
			if (type == FEAT_SP) {
				len = pos->val.sp.len;
				ptr = pos->val.sp.vec;
				rpt = pos->needs_confirm;
			} else if (type == FEAT_NN) {
				len = dccp_feat_valid_nn_length(pos->feat_num);
				ptr = nn_in_nbo;
				dccp_encode_value_var(pos->val.nn, ptr, len);
			} else {
				DCCP_BUG("unknown feature %u", pos->feat_num);
				return -1;
			}
		}
}
}
		dccp_feat_print_opt(opt, pos->feat_num, ptr, len, 0);

		if (dccp_insert_fn_opt(skb, opt, pos->feat_num, ptr, len, rpt))
			return -1;
		if (pos->needs_mandatory && dccp_insert_option_mandatory(skb))
			return -1;
		/*
		 * Enter CHANGING after transmitting the Change option (6.6.2).
		 */
		if (pos->state == FEAT_INITIALISING)
			pos->state = FEAT_CHANGING;
	}
}
return
0
;
}
/**
* __feat_register_nn - Register new NN value on socket
* @fn: feature-negotiation list to register with
* @feat: an NN feature from %dccp_feature_numbers
* @mandatory: use Mandatory option if 1
* @nn_val: value to register (restricted to 4 bytes)
* Note that NN features are local by definition (RFC 4340, 6.3.2).
*/
static int __feat_register_nn(struct list_head *fn, u8 feat,
			      u8 mandatory, u64 nn_val)
{
	dccp_feat_val fval = { .nn = nn_val };

	if (dccp_feat_type(feat) != FEAT_NN ||
	    !dccp_feat_is_valid_nn_val(feat, nn_val))
		return -EINVAL;

	/* Don't bother with default values, they will be activated anyway. */
	if (nn_val - (u64)dccp_feat_default_value(feat) == 0)
		return 0;

	return dccp_feat_push_change(fn, feat, 1, mandatory, &fval);
}
/**
* __feat_register_sp - Register new SP value/list on socket
* @fn: feature-negotiation list to register with
* @feat: an SP feature from %dccp_feature_numbers
* @is_local: whether the local (1) or the remote (0) @feat is meant
* @mandatory: use Mandatory option if 1
* @sp_val: SP value followed by optional preference list
* @sp_len: length of @sp_val in bytes
*/
static int __feat_register_sp(struct list_head *fn, u8 feat, u8 is_local,
			      u8 mandatory, u8 const *sp_val, u8 sp_len)
{
	dccp_feat_val fval;

	if (dccp_feat_type(feat) != FEAT_SP ||
if
(
!
yes
)
!
dccp_feat_sp_list_ok
(
feat
,
sp_val
,
sp_len
))
return
;
return
-
EINVAL
;
	/* Avoid negotiating alien CCIDs by only advertising supported ones */
	if (feat == DCCPF_CCID && !ccid_support_check(sp_val, sp_len))
		return -EOPNOTSUPP;

	if (dccp_feat_clone_sp_val(&fval, sp_val, sp_len))
		return -ENOMEM;

	return dccp_feat_push_change(fn, feat, is_local, mandatory, &fval);
/* OK there is something to confirm... */
/* XXX check if packet is in flight? Send delayed ack?? */
if
(
sk
->
sk_state
==
DCCP_OPEN
)
dccp_send_ack
(
sk
);
}
}
/**
int
dccp_feat_change_recv
(
struct
sock
*
sk
,
u8
type
,
u8
feature
,
u8
*
val
,
u8
len
)
* dccp_feat_register_sp - Register requests to change SP feature values
* @sk: client or listening socket
* @feat: one of %dccp_feature_numbers
* @is_local: whether the local (1) or remote (0) @feat is meant
* @list: array of preferred values, in descending order of preference
* @len: length of @list in bytes
*/
int dccp_feat_register_sp(struct sock *sk, u8 feat, u8 is_local,
			  u8 const *list, u8 len)
{	/* any changes must be registered before establishing the connection */
	if (sk->sk_state != DCCP_CLOSED)
		return -EISCONN;
	if (dccp_feat_type(feat) != FEAT_SP)
		return -EINVAL;
	return __feat_register_sp(&dccp_sk(sk)->dccps_featneg, feat, is_local,
				  0, list, len);
}
/* Analogous to dccp_feat_register_sp(), but for non-negotiable values */
int
dccp_feat_register_nn
(
struct
sock
*
sk
,
u8
feat
,
u64
val
)
{
{
/* any changes must be registered before establishing the connection */
int
rc
;
if
(
sk
->
sk_state
!=
DCCP_CLOSED
)
return
-
EISCONN
;
if
(
dccp_feat_type
(
feat
)
!=
FEAT_NN
)
return
-
EINVAL
;
return
__feat_register_nn
(
&
dccp_sk
(
sk
)
->
dccps_featneg
,
feat
,
0
,
val
);
}
/**
dccp_feat_debug
(
type
,
feature
,
*
val
);
* dccp_feat_signal_nn_change - Update NN values for an established connection
* @sk: DCCP socket of an established connection
* @feat: NN feature number from %dccp_feature_numbers
* @nn_val: the new value to use
* This function is used to communicate NN updates out-of-band. The difference
* to feature negotiation during connection setup is that values are activated
* immediately after validation, i.e. we don't wait for the Confirm: either the
* value is accepted by the peer (and then the waiting is futile), or it is not
* (Reset or empty Confirm). We don't accept empty Confirms - transmitted values
* are validated, and the peer "MUST accept any valid value" (RFC 4340, 6.3.2).
*/
int dccp_feat_signal_nn_change(struct sock *sk, u8 feat, u64 nn_val)
{
	struct list_head *fn = &dccp_sk(sk)->dccps_featneg;
	dccp_feat_val fval = { .nn = nn_val };
	struct dccp_feat_entry *entry;

	if (sk->sk_state != DCCP_OPEN && sk->sk_state != DCCP_PARTOPEN)
/* figure out if it's SP or NN feature */
return
0
;
switch
(
feature
)
{
/* deal with SP features */
case
DCCPF_CCID
:
rc
=
dccp_feat_sp
(
sk
,
type
,
feature
,
val
,
len
);
break
;
if
(
dccp_feat_type
(
feat
)
!=
FEAT_NN
||
/* deal with NN features */
!
dccp_feat_is_valid_nn_val
(
feat
,
nn_val
))
case
DCCPF_ACK_RATIO
:
return
-
EINVAL
;
rc
=
dccp_feat_nn
(
sk
,
type
,
feature
,
val
,
len
);
break
;
entry
=
dccp_feat_list_lookup
(
fn
,
feat
,
1
);
/* XXX implement other features */
if
(
entry
!=
NULL
)
{
default:
dccp_pr_debug
(
"Ignoring %llu, entry %llu exists in state %s
\n
"
,
dccp_pr_debug
(
"UNIMPLEMENTED: not handling %s(%d, ...)
\n
"
,
(
unsigned
long
long
)
nn_val
,
dccp_feat_typename
(
type
),
feature
);
(
unsigned
long
long
)
entry
->
val
.
nn
,
rc
=
-
EFAULT
;
dccp_feat_sname
[
entry
->
state
]);
break
;
return
0
;
}
}
if
(
dccp_feat_activate
(
sk
,
feat
,
1
,
&
fval
))
/* check if there were problems changing features */
return
-
EADV
;
if
(
rc
)
{
/* If we don't agree on SP, we sent a confirm for old value.
inet_csk_schedule_ack
(
sk
);
* However we propagate rc to caller in case option was
return
dccp_feat_push_change
(
fn
,
feat
,
1
,
0
,
&
fval
);
* mandatory
}
EXPORT_SYMBOL_GPL
(
dccp_feat_signal_nn_change
);
/*
* Tracking features whose values depend on the choice of CCID
*
* This is designed with an extension in mind so that a list walk could be done
* before activating any features. However, since the existing framework has
* worked satisfactorily so far, the automatic verification is left open.
* When adding new CCIDs, add a corresponding dependency table here.
*/
static
const
struct
ccid_dependency
*
dccp_feat_ccid_deps
(
u8
ccid
,
bool
is_local
)
{
static
const
struct
ccid_dependency
ccid2_dependencies
[
2
][
2
]
=
{
/*
* CCID2 mandates Ack Vectors (RFC 4341, 4.): as CCID is a TX
* feature and Send Ack Vector is an RX feature, `is_local'
* needs to be reversed.
*/
*/
{
/* Dependencies of the receiver-side (remote) CCID2 */
if
(
rc
!=
DCCP_FEAT_SP_NOAGREE
)
{
dccp_feat_empty_confirm
(
dccp_msk
(
sk
),
type
,
feature
);
				.dependent_feat	= DCCPF_SEND_ACK_VECTOR,
				.is_local	= true,
				.is_mandatory	= true,
				.val		= 1
			},
			{ 0, 0, 0, 0 }
		},
		{	/* Dependencies of the sender-side (local) CCID2 */
			{
				.dependent_feat	= DCCPF_SEND_ACK_VECTOR,
				.is_local	= false,
				.is_mandatory	= true,
				.val		= 1
			},
			{ 0, 0, 0, 0 }
		}
	};
	static const struct ccid_dependency ccid3_dependencies[2][5] = {
		{	/*
			 * Dependencies of the receiver-side CCID3
			 */
			{	/* locally disable Ack Vectors */
				.dependent_feat	= DCCPF_SEND_ACK_VECTOR,
				.is_local	= true,
				.is_mandatory	= false,
				.val		= 0
			},
			{	/* see below why Send Loss Event Rate is on */
				.dependent_feat	= DCCPF_SEND_LEV_RATE,
				.is_local	= true,
				.is_mandatory	= true,
				.val		= 1
			},
			{	/* NDP Count is needed as per RFC 4342, 6.1.1 */
				.dependent_feat	= DCCPF_SEND_NDP_COUNT,
				.is_local	= false,
				.is_mandatory	= true,
				.val		= 1
			},
			{ 0, 0, 0, 0 },
		},
		{	/*
			 * CCID3 at the TX side: we request that the HC-receiver
			 * will not send Ack Vectors (they will be ignored, so
			 * Mandatory is not set); we enable Send Loss Event Rate
			 * (Mandatory since the implementation does not support
			 * the Loss Intervals option of RFC 4342, 8.6).
			 * The last two options are for peer's information only.
			 */
			{
				.dependent_feat	= DCCPF_SEND_ACK_VECTOR,
				.is_local	= false,
				.is_mandatory	= false,
				.val		= 0
			},
			{
				.dependent_feat	= DCCPF_SEND_LEV_RATE,
				.is_local	= false,
				.is_mandatory	= true,
				.val		= 1
			},
			{	/* this CCID does not support Ack Ratio */
				.dependent_feat	= DCCPF_ACK_RATIO,
				.is_local	= true,
				.is_mandatory	= false,
				.val		= 0
			},
			{	/* tell receiver we are sending NDP counts */
				.dependent_feat	= DCCPF_SEND_NDP_COUNT,
				.is_local	= true,
				.is_mandatory	= false,
				.val		= 1
			},
			{ 0, 0, 0, 0 }
		}
	};
	switch (ccid) {
	case DCCPC_CCID2:
		return ccid2_dependencies[is_local];
	case DCCPC_CCID3:
		return ccid3_dependencies[is_local];
	default:
		return NULL;
	}
}
}
/**
/* generate the confirm [if required] */
* dccp_feat_propagate_ccid - Resolve dependencies of features on choice of CCID
dccp_feat_flush_confirm
(
sk
);
* @fn: feature-negotiation list to update
* @id: CCID number to track
* @is_local: whether TX CCID (1) or RX CCID (0) is meant
* This function needs to be called after registering all other features.
*/
static int dccp_feat_propagate_ccid(struct list_head *fn, u8 id, bool is_local)
{
	const struct ccid_dependency *table = dccp_feat_ccid_deps(id, is_local);
	int i, rc = (table == NULL);

	for (i = 0; rc == 0 && table[i].dependent_feat != DCCPF_RESERVED; i++)
		if (dccp_feat_type(table[i].dependent_feat) == FEAT_SP)
			rc = __feat_register_sp(fn, table[i].dependent_feat,
						table[i].is_local,
						table[i].is_mandatory,
						&table[i].val, 1);
		else
			rc = __feat_register_nn(fn, table[i].dependent_feat,
						table[i].is_mandatory,
						table[i].val);
	return rc;
}
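The loop in `dccp_feat_propagate_ccid()` walks a dependency table until it reaches an all-zero row, and stops early on the first registration error. Below is a minimal user-space sketch of that sentinel-terminated walk. Everything in it is illustrative: `struct dep`, `deps_propagate()`, the callback, and the feature numbers in `example_deps` are hypothetical, and the sentinel value 0 is an assumption mirroring the `{ 0, 0, 0, 0 }` rows above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FEAT_RESERVED 0	/* assumed sentinel, matching the all-zero rows */

struct dep {
	unsigned char feat;	/* FEAT_RESERVED terminates the table */
	bool is_local;
	bool is_mandatory;
	unsigned char val;
};

/* Illustrative table only; the feature numbers are placeholders. */
static const struct dep example_deps[] = {
	{ .feat = 6, .is_local = true,  .is_mandatory = true, .val = 1 },
	{ .feat = 7, .is_local = false, .is_mandatory = true, .val = 1 },
	{ 0, 0, 0, 0 }
};

/* Calls `reg` for each row until the sentinel or the first non-zero result. */
static int deps_propagate(const struct dep *table,
			  int (*reg)(const struct dep *row))
{
	int i, rc = (table == NULL);

	for (i = 0; rc == 0 && table[i].feat != FEAT_RESERVED; i++)
		rc = reg(&table[i]);
	return rc;
}

/* Example callback: counts the rows it is given and never fails. */
static int rows_seen;

static int count_row(const struct dep *row)
{
	(void)row;
	rows_seen++;
	return 0;
}

static void deps_selftest(void)
{
	rows_seen = 0;
	assert(deps_propagate(example_deps, count_row) == 0);
	assert(rows_seen == 2);			/* sentinel row is not visited */
	assert(deps_propagate(NULL, count_row) == 1);	/* unknown CCID */
}
```

Initialising `rc` from the NULL test means a missing table is reported as an error without the loop ever dereferencing it, because `rc == 0` is checked first in the loop condition.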
/**
* dccp_feat_finalise_settings - Finalise settings before starting negotiation
* @dp: client or listening socket (settings will be inherited)
* This is called after all registrations (socket initialisation, sysctls, and
* sockopt calls), and before sending the first packet containing Change options
* (i.e. client-Request or server-Response), to ensure internal consistency.
*/
int dccp_feat_finalise_settings(struct dccp_sock *dp)
{
	struct list_head *fn = &dp->dccps_featneg;
	struct dccp_feat_entry *entry;
	int i = 2, ccids[2] = { -1, -1 };
/*
return
rc
;
* Propagating CCIDs:
* 1) not useful to propagate CCID settings if this host advertises more
* than one CCID: the choice of CCID may still change - if this is
* the client, or if this is the server and the client sends
* singleton CCID values.
* 2) since propagate_ccid changes the list, we defer changing
* the sorted list until after the traversal.
*/
	list_for_each_entry(entry, fn, node)
		if (entry->feat_num == DCCPF_CCID && entry->val.sp.len == 1)
			ccids[entry->is_local] = entry->val.sp.vec[0];
	while (i--)
		if (ccids[i] > 0 && dccp_feat_propagate_ccid(fn, ccids[i], i))
			return -1;
	dccp_feat_print_fnlist(fn);
	return 0;
}
}
/**
EXPORT_SYMBOL_GPL
(
dccp_feat_change_recv
);
* dccp_feat_server_ccid_dependencies - Resolve CCID-dependent features
* It is the server which resolves the dependencies once the CCID has been
* fully negotiated. If no CCID has been negotiated, it uses the default CCID.
*/
int dccp_feat_server_ccid_dependencies(struct dccp_request_sock *dreq)
{
	struct list_head *fn = &dreq->dreq_featneg;
	struct dccp_feat_entry *entry;
	u8 is_local, ccid;

	for (is_local = 0; is_local <= 1; is_local++) {
		entry = dccp_feat_list_lookup(fn, DCCPF_CCID, is_local);

		if (entry != NULL && !entry->empty_confirm)
			ccid = entry->val.sp.vec[0];
		else
			ccid = dccp_feat_default_value(DCCPF_CCID);

		if (dccp_feat_propagate_ccid(fn, ccid, is_local))
			return -1;
	}
	return 0;
}
/* Select the first entry in @servlist that also occurs in @clilist (6.3.1) */
int
dccp_feat_confirm_recv
(
struct
sock
*
sk
,
u8
type
,
u8
feature
,
static
int
dccp_feat_preflist_match
(
u8
*
servlist
,
u8
slen
,
u8
*
clilist
,
u8
clen
)
u8
*
val
,
u8
len
)
{
{
u8
c
,
s
;
u8
t
;
struct
dccp_opt_pend
*
opt
;
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
int
found
=
0
;
int
all_confirmed
=
1
;
for
(
s
=
0
;
s
<
slen
;
s
++
)
dccp_feat_debug
(
type
,
feature
,
*
val
);
for
(
c
=
0
;
c
<
clen
;
c
++
)
if
(
servlist
[
s
]
==
clilist
[
c
])
return
servlist
[
s
];
return
-
1
;
}
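The selection rule above ("first entry in @servlist that also occurs in @clilist") gives priority to the server's ordering. A minimal user-space sketch of the same nested scan follows; `preflist_match()` is a hypothetical name and uses `uint8_t` in place of the kernel's `u8`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the first value of `servlist` that also occurs in `clilist`,
 * or -1 if the two lists share no value. The server's order decides.
 */
static int preflist_match(const uint8_t *servlist, uint8_t slen,
			  const uint8_t *clilist, uint8_t clen)
{
	uint8_t s, c;

	for (s = 0; s < slen; s++)
		for (c = 0; c < clen; c++)
			if (servlist[s] == clilist[c])
				return servlist[s];
	return -1;
}

static void preflist_selftest(void)
{
	const uint8_t serv[] = { 3, 2 };
	const uint8_t cli[]  = { 2, 3 };
	const uint8_t none[] = { 4 };

	assert(preflist_match(serv, 2, cli, 2) == 3);	/* server prefers 3 */
	assert(preflist_match(cli, 2, serv, 2) == 2);	/* roles swapped */
	assert(preflist_match(none, 1, cli, 2) == -1);	/* nothing shared */
}
```

Swapping the argument order is enough to express which side is the server, and this is how `dccp_feat_reconcile()` uses its `is_server` flag.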
/**
/* locate our change request */
* dccp_feat_prefer - Move preferred entry to the start of array
switch
(
type
)
{
* Reorder the @array_len elements in @array so that @preferred_value comes
case
DCCPO_CONFIRM_L
:
t
=
DCCPO_CHANGE_R
;
break
;
* first. Returns >0 to indicate that @preferred_value does occur in @array.
case
DCCPO_CONFIRM_R
:
t
=
DCCPO_CHANGE_L
;
break
;
*/
default:
DCCP_WARN
(
"invalid type %d
\n
"
,
type
);
static u8 dccp_feat_prefer(u8 preferred_value, u8 *array, u8 array_len)
return
1
;
{
	u8 i, does_occur = 0;

	if (array != NULL) {
		for (i = 0; i < array_len; i++)
			if (array[i] == preferred_value) {
				array[i] = array[0];
				does_occur++;
			}
		if (does_occur)
			array[0] = preferred_value;
	}
}
return
does_occur
;
/* XXX sanity check feature value */
}
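To see what the reordering in `dccp_feat_prefer()` does to a concrete list, here is a user-space transcription with a worked example. `prefer()` is a hypothetical name; the logic follows the kernel function above, with `uint8_t` in place of `u8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Move `preferred` to the front of `array`. Each matching slot receives the
 * old first element; the first slot is then overwritten with `preferred`.
 * Returns the number of matches (0 if the value does not occur).
 */
static uint8_t prefer(uint8_t preferred, uint8_t *array, uint8_t len)
{
	uint8_t i, does_occur = 0;

	if (array != NULL) {
		for (i = 0; i < len; i++)
			if (array[i] == preferred) {
				array[i] = array[0];
				does_occur++;
			}
		if (does_occur)
			array[0] = preferred;
	}
	return does_occur;
}

static void prefer_selftest(void)
{
	uint8_t list[] = { 2, 3, 4 };

	assert(prefer(3, list, 3) == 1);
	assert(list[0] == 3 && list[1] == 2 && list[2] == 4);

	assert(prefer(9, list, 3) == 0);	/* absent: list is untouched */
	assert(list[0] == 3 && list[1] == 2 && list[2] == 4);

	assert(prefer(3, NULL, 0) == 0);
}
```

The operation is a swap rather than a rotation, so the relative order of the remaining elements is not always preserved: preferring 3 in `{2, 4, 3}` yields `{3, 4, 2}`.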
/**
list_for_each_entry
(
opt
,
&
dmsk
->
dccpms_pending
,
dccpop_node
)
{
* dccp_feat_reconcile - Reconcile SP preference lists
if
(
!
opt
->
dccpop_conf
&&
opt
->
dccpop_type
==
t
&&
* @fv: SP list to reconcile into
opt
->
dccpop_feat
==
feature
)
{
* @arr: received SP preference list
found
=
1
;
* @len: length of @arr in bytes
dccp_pr_debug
(
"feature %d found
\n
"
,
opt
->
dccpop_feat
);
* @is_server: whether this side is the server (and @fv is the server's list)
* @reorder: whether to reorder the list in @fv after reconciling with @arr
* When successful, > 0 is returned and the reconciled list is in @fv.
* A value of 0 means that negotiation failed (no shared entry).
*/
static int dccp_feat_reconcile(dccp_feat_val *fv, u8 *arr, u8 len,
			       bool is_server, bool reorder)
{
	int rc;

	if (!fv->sp.vec || !arr) {
/* XXX do sanity check */
		DCCP_CRIT("NULL feature value or array");
		return 0;
	}

	if (is_server)
opt
->
dccpop_conf
=
1
;
		rc = dccp_feat_preflist_match(fv->sp.vec, fv->sp.len, arr, len);
	else
		rc = dccp_feat_preflist_match(arr, len, fv->sp.vec, fv->sp.len);

	if (!reorder)
		return rc;
	if (rc < 0)
		return 0;
/*
/* We got a confirmation---change the option */
* Reorder list: used for activating features and in dccp_insert_fn_opt.
dccp_feat_update
(
sk
,
opt
->
dccpop_type
,
*/
opt
->
dccpop_feat
,
*
val
);
return
dccp_feat_prefer
(
rc
,
fv
->
sp
.
vec
,
fv
->
sp
.
len
);
}
/**
/* XXX check the return value of dccp_feat_update */
* dccp_feat_change_recv - Process incoming ChangeL/R options
break
;
* @fn: feature-negotiation list to update
}
* @is_mandatory: whether the Change was preceded by a Mandatory option
* @opt: %DCCPO_CHANGE_L or %DCCPO_CHANGE_R
* @feat: one of %dccp_feature_numbers
* @val: NN value or SP value/preference list
* @len: length of @val in bytes
* @server: whether this node is the server (1) or the client (0)
*/
static u8 dccp_feat_change_recv(struct list_head *fn, u8 is_mandatory, u8 opt,
				u8 feat, u8 *val, u8 len, const bool server)
{
	u8 defval, type = dccp_feat_type(feat);
	const bool local = (opt == DCCPO_CHANGE_R);
	struct dccp_feat_entry *entry;
	dccp_feat_val fval;

	if (len == 0 || type == FEAT_UNKNOWN)		/* 6.1 and 6.6.8 */
		goto unknown_feature_or_value;

	dccp_feat_print_opt(opt, feat, val, len, is_mandatory);

	/*
	 * Negotiation of NN features: Change R is invalid, so there is no
	 * simultaneous negotiation; hence we do not look up in the list.
	 */
	if (type == FEAT_NN) {
		if (local || len > sizeof(fval.nn))
			goto unknown_feature_or_value;

		/* 6.3.2: "The feature remote MUST accept any valid value..." */
		fval.nn = dccp_decode_value_var(val, len);
		if (!dccp_feat_is_valid_nn_val(feat, fval.nn))
			goto unknown_feature_or_value;

		return dccp_feat_push_confirm(fn, feat, local, &fval);
if
(
!
opt
->
dccpop_conf
)
all_confirmed
=
0
;
}
}
/*
/* fix re-transmit timer */
* Unidirectional/simultaneous negotiation of SP features (6.3.1)
/* XXX gotta make sure that no option negotiation occurs during
* connection shutdown. Consider that the CLOSEREQ is sent and timer is
* on. if all options are confirmed it might kill timer which should
* remain alive until close is received.
*/
*/
entry
=
dccp_feat_list_lookup
(
fn
,
feat
,
local
);
if
(
all_confirmed
)
{
if
(
entry
==
NULL
)
{
dccp_pr_debug
(
"clear feat negotiation timer %p
\n
"
,
sk
);
/*
inet_csk_clear_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
);
* No particular preferences have been registered. We deal with
* this situation by assuming that all valid values are equally
* acceptable, and apply the following checks:
* - if the peer's list is a singleton, we accept a valid value;
* - if we are the server, we first try to see if the peer (the
* client) advertises the default value. If yes, we use it,
* otherwise we accept the preferred value;
* - else if we are the client, we use the first list element.
*/
		if (dccp_feat_clone_sp_val(&fval, val, 1))
			return DCCP_RESET_CODE_TOO_BUSY;

		if (len > 1 && server) {
			defval = dccp_feat_default_value(feat);
			if (dccp_feat_preflist_match(&defval, 1, val, len) > -1)
				fval.sp.vec[0] = defval;
		} else if (!dccp_feat_is_valid_sp_val(feat, fval.sp.vec[0])) {
			kfree(fval.sp.vec);
			goto unknown_feature_or_value;
		}

		/* Treat unsupported CCIDs like invalid values */
		if (feat == DCCPF_CCID && !ccid_support_check(fval.sp.vec, 1)) {
			kfree(fval.sp.vec);
			goto not_valid_or_not_known;
		}

		return dccp_feat_push_confirm(fn, feat, local, &fval);

	} else if (entry->state == FEAT_UNSTABLE) {	/* 6.6.2 */
		return 0;
	}
}
if
(
dccp_feat_reconcile
(
&
entry
->
val
,
val
,
len
,
server
,
true
))
{
if
(
!
found
)
entry
->
empty_confirm
=
0
;
dccp_pr_debug
(
"%s(%d, ...) never requested
\n
"
,
}
else
if
(
is_mandatory
)
{
dccp_feat_typename
(
type
),
feature
);
return
DCCP_RESET_CODE_MANDATORY_ERROR
;
	} else if (entry->state == FEAT_INITIALISING) {
		/*
		 * Failed simultaneous negotiation (server only): try to `save'
		 * the connection by checking whether entry contains the default
		 * value for @feat. If yes, send an empty Confirm to signal that
		 * the received Change was not understood - which implies using
		 * the default value.
		 * If this also fails, we use Reset as the last resort.
		 */
		WARN_ON(!server);
		defval = dccp_feat_default_value(feat);
		if (!dccp_feat_reconcile(&entry->val, &defval, 1, server, true))
			return DCCP_RESET_CODE_OPTION_ERROR;
		entry->empty_confirm = 1;
	}
	entry->needs_confirm   = 1;
	entry->needs_mandatory = 0;
	entry->state	       = FEAT_STABLE;
	return 0;
return
0
;
unknown_feature_or_value:
	if (!is_mandatory)
		return dccp_push_empty_confirm(fn, feat, local);

not_valid_or_not_known:
	return is_mandatory ? DCCP_RESET_CODE_MANDATORY_ERROR
			    : DCCP_RESET_CODE_OPTION_ERROR;
}
}
/**
EXPORT_SYMBOL_GPL
(
dccp_feat_confirm_recv
);
* dccp_feat_confirm_recv - Process received Confirm options
* @fn: feature-negotiation list to update
* @is_mandatory: whether @opt was preceded by a Mandatory option
* @opt: %DCCPO_CONFIRM_L or %DCCPO_CONFIRM_R
* @feat: one of %dccp_feature_numbers
* @val: NN value or SP value/preference list
* @len: length of @val in bytes
* @server: whether this node is server (1) or client (0)
*/
static u8 dccp_feat_confirm_recv(struct list_head *fn, u8 is_mandatory, u8 opt,
				 u8 feat, u8 *val, u8 len, const bool server)
{
	u8 *plist, plen, type = dccp_feat_type(feat);
	const bool local = (opt == DCCPO_CONFIRM_R);
	struct dccp_feat_entry *entry = dccp_feat_list_lookup(fn, feat, local);

	dccp_feat_print_opt(opt, feat, val, len, is_mandatory);

	if (entry == NULL) {	/* nothing queued: ignore or handle error */
		if (is_mandatory && type == FEAT_UNKNOWN)
			return DCCP_RESET_CODE_MANDATORY_ERROR;

		if (!local && type == FEAT_NN)		/* 6.3.2 */
			goto confirmation_failed;
		return 0;
	}

	if (entry->state != FEAT_CHANGING)		/* 6.6.2 */
		return 0;

	if (len == 0) {
		if (dccp_feat_must_be_understood(feat))	/* 6.6.7 */
			goto confirmation_failed;
		/*
		 * Empty Confirm during connection setup: this means reverting
		 * to the `old' value, which in this case is the default. Since
		 * we handle default values automatically when no other values
		 * have been set, we revert to the old value by removing this
		 * entry from the list.
		 */
		dccp_feat_list_pop(entry);
		return 0;
	}
if
(
type
==
FEAT_NN
)
{
void
dccp_feat_clean
(
struct
dccp_minisock
*
dmsk
)
if
(
len
>
sizeof
(
entry
->
val
.
nn
))
{
goto
confirmation_failed
;
struct
dccp_opt_pend
*
opt
,
*
next
;
if
(
entry
->
val
.
nn
==
dccp_decode_value_var
(
val
,
len
))
list_for_each_entry_safe
(
opt
,
next
,
&
dmsk
->
dccpms_pending
,
goto
confirmation_succeeded
;
dccpop_node
)
{
BUG_ON
(
opt
->
dccpop_val
==
NULL
);
kfree
(
opt
->
dccpop_val
);
DCCP_WARN
(
"Bogus Confirm for non-existing value
\n
"
);
if
(
opt
->
dccpop_sc
!=
NULL
)
{
goto
confirmation_failed
;
BUG_ON
(
opt
->
dccpop_sc
->
dccpoc_val
==
NULL
);
}
kfree
(
opt
->
dccpop_sc
->
dccpoc_val
);
kfree
(
opt
->
dccpop_sc
);
}
/*
kfree
(
opt
);
* Parsing SP Confirms: the first element of @val is the preferred
* SP value which the peer confirms, the remainder depends on @len.
* Note that only the confirmed value needs to be a valid SP value.
*/
		if (!dccp_feat_is_valid_sp_val(feat, *val))
			goto confirmation_failed;

		if (len == 1) {		/* peer didn't supply a preference list */
			plist = val;
			plen  = len;
		} else {		/* preferred value + preference list */
			plist = val + 1;
			plen  = len - 1;
		}
}
INIT_LIST_HEAD
(
&
dmsk
->
dccpms_pending
);
/* Check whether the peer got the reconciliation right (6.6.8) */
list_for_each_entry_safe
(
opt
,
next
,
&
dmsk
->
dccpms_conf
,
dccpop_node
)
{
if
(
dccp_feat_reconcile
(
&
entry
->
val
,
plist
,
plen
,
server
,
0
)
!=
*
val
)
{
BUG_ON
(
opt
==
NULL
);
DCCP_WARN
(
"Confirm selected the wrong value %u
\n
"
,
*
val
);
if
(
opt
->
dccpop_val
!=
NULL
)
return
DCCP_RESET_CODE_OPTION_ERROR
;
kfree
(
opt
->
dccpop_val
);
kfree
(
opt
);
}
}
entry
->
val
.
sp
.
vec
[
0
]
=
*
val
;
INIT_LIST_HEAD
(
&
dmsk
->
dccpms_conf
);
confirmation_succeeded:
	entry->state = FEAT_STABLE;
	return 0;

confirmation_failed:
	DCCP_WARN("Confirmation failed\n");
	return is_mandatory ? DCCP_RESET_CODE_MANDATORY_ERROR
			    : DCCP_RESET_CODE_OPTION_ERROR;
}
}
/**
EXPORT_SYMBOL_GPL
(
dccp_feat_clean
);
* dccp_feat_handle_nn_established - Fast-path reception of NN options
* @sk: socket of an established DCCP connection
/* this is to be called only when a listening sock creates its child. It is
* @mandatory: whether @opt was preceded by a Mandatory option
* assumed by the function---the confirm is not duplicated, but rather it is
* @opt: %DCCPO_CHANGE_L | %DCCPO_CONFIRM_R (NN only)
* "passed on".
* @feat: NN number, one of %dccp_feature_numbers
* @val: NN value
* @len: length of @val in bytes
* This function combines the functionality of change_recv/confirm_recv, with
* the following differences (reset codes are the same):
* - cleanup after receiving the Confirm;
* - values are directly activated after successful parsing;
* - deliberately restricted to NN features.
* The restriction to NN features is essential since SP features can have non-
* predictable outcomes (depending on the remote configuration), and are inter-
* dependent (CCIDs for instance cause further dependencies).
*/
*/
static
u8
dccp_feat_handle_nn_established
(
struct
sock
*
sk
,
u8
mandatory
,
u8
opt
,
int
dccp_feat_clone
(
struct
sock
*
oldsk
,
struct
sock
*
newsk
)
u8
feat
,
u8
*
val
,
u8
len
)
{
{
struct
list_head
*
fn
=
&
dccp_sk
(
sk
)
->
dccps_featneg
;
struct
dccp_minisock
*
olddmsk
=
dccp_msk
(
oldsk
);
const
bool
local
=
(
opt
==
DCCPO_CONFIRM_R
);
struct
dccp_minisock
*
newdmsk
=
dccp_msk
(
newsk
);
struct
dccp_feat_entry
*
entry
;
struct
dccp_opt_pend
*
opt
;
u8
type
=
dccp_feat_type
(
feat
);
int
rc
=
0
;
dccp_feat_val
fval
;
dccp_feat_print_opt
(
opt
,
feat
,
val
,
len
,
mandatory
);
INIT_LIST_HEAD
(
&
newdmsk
->
dccpms_pending
);
INIT_LIST_HEAD
(
&
newdmsk
->
dccpms_conf
);
/* Ignore non-mandatory unknown and non-NN features */
list_for_each_entry
(
opt
,
&
olddmsk
->
dccpms_pending
,
dccpop_node
)
{
if
(
type
==
FEAT_UNKNOWN
)
{
struct
dccp_opt_pend
*
newopt
;
if
(
local
&&
!
mandatory
)
/* copy the value of the option */
return
0
;
u8
*
val
=
kmemdup
(
opt
->
dccpop_val
,
opt
->
dccpop_len
,
GFP_ATOMIC
);
goto
fast_path_unknown
;
}
else
if
(
type
!=
FEAT_NN
)
{
return
0
;
}
/*
* We don't accept empty Confirms, since in fast-path feature
* negotiation the values are enabled immediately after sending
* the Change option.
* Empty Changes on the other hand are invalid (RFC 4340, 6.1).
*/
	if (len == 0 || len > sizeof(fval.nn))
		goto fast_path_unknown;

	if (opt == DCCPO_CHANGE_L) {
		fval.nn = dccp_decode_value_var(val, len);
		if (!dccp_feat_is_valid_nn_val(feat, fval.nn))
			goto fast_path_unknown;
if
(
dccp_feat_push_confirm
(
fn
,
feat
,
local
,
&
fval
)
||
if
(
val
==
NULL
)
dccp_feat_activate
(
sk
,
feat
,
local
,
&
fval
))
goto
out_clean
;
return
DCCP_RESET_CODE_TOO_BUSY
;
/* set the `Ack Pending' flag to piggyback a Confirm */
newopt
=
kmemdup
(
opt
,
sizeof
(
*
newopt
),
GFP_ATOMIC
);
inet_csk_schedule_ack
(
sk
);
if
(
newopt
==
NULL
)
{
kfree
(
val
);
}
else
if
(
opt
==
DCCPO_CONFIRM_R
)
{
goto
out_clean
;
entry
=
dccp_feat_list_lookup
(
fn
,
feat
,
local
);
if
(
entry
==
NULL
||
entry
->
state
!=
FEAT_CHANGING
)
return
0
;
fval
.
nn
=
dccp_decode_value_var
(
val
,
len
);
if
(
fval
.
nn
!=
entry
->
val
.
nn
)
{
DCCP_WARN
(
"Bogus Confirm for non-existing value
\n
"
);
goto
fast_path_failed
;
}
}
/* It has been confirmed - so remove the entry */
/* insert the option */
dccp_feat_list_pop
(
entry
);
newopt
->
dccpop_val
=
val
;
list_add_tail
(
&
newopt
->
dccpop_node
,
&
newdmsk
->
dccpms_pending
);
}
else
{
/* XXX what happens with backlogs and multiple connections at
DCCP_WARN
(
"Received illegal option %u
\n
"
,
opt
);
* once...
goto
fast_path_failed
;
*/
/* the master socket no longer needs to worry about confirms */
opt
->
dccpop_sc
=
NULL
;
/* it's not a memleak---new socket has it */
/* reset state for a new socket */
opt
->
dccpop_conf
=
0
;
}
}
return
0
;
fast_path_unknown:
/* XXX not doing anything about the conf queue */
if
(
!
mandatory
)
return
dccp_push_empty_confirm
(
fn
,
feat
,
local
);
out:
return
rc
;
fast_path_failed:
out_clean:
return
mandatory
?
DCCP_RESET_CODE_MANDATORY_ERROR
dccp_feat_clean
(
newdmsk
);
:
DCCP_RESET_CODE_OPTION_ERROR
;
rc
=
-
ENOMEM
;
goto
out
;
}
}
/**
EXPORT_SYMBOL_GPL
(
dccp_feat_clone
);
* dccp_feat_parse_options - Process Feature-Negotiation Options
* @sk: for general use and used by the client during connection setup
static
int
__dccp_feat_init
(
struct
dccp_minisock
*
dmsk
,
u8
type
,
u8
feat
,
* @dreq: used by the server during connection setup
u8
*
val
,
u8
len
)
* @mandatory: whether @opt was preceded by a Mandatory option
* @opt: %DCCPO_CHANGE_L | %DCCPO_CHANGE_R | %DCCPO_CONFIRM_L | %DCCPO_CONFIRM_R
* @feat: one of %dccp_feature_numbers
* @val: value contents of @opt
* @len: length of @val in bytes
* Returns 0 on success, a Reset code for ending the connection otherwise.
*/
int dccp_feat_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
			    u8 mandatory, u8 opt, u8 feat, u8 *val, u8 len)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
int
rc
=
-
ENOMEM
;
struct
list_head
*
fn
=
dreq
?
&
dreq
->
dreq_featneg
:
&
dp
->
dccps_featneg
;
u8
*
copy
=
kmemdup
(
val
,
len
,
GFP_KERNEL
);
bool
server
=
false
;
switch
(
sk
->
sk_state
)
{
if
(
copy
!=
NULL
)
{
/*
rc
=
dccp_feat_change
(
dmsk
,
type
,
feat
,
copy
,
len
,
GFP_KERNEL
);
* Negotiation during connection setup
if
(
rc
)
*/
kfree
(
copy
);
	case DCCP_LISTEN:
		server = true;			/* fall through */
	case DCCP_REQUESTING:
		switch (opt) {
		case DCCPO_CHANGE_L:
		case DCCPO_CHANGE_R:
			return dccp_feat_change_recv(fn, mandatory, opt, feat,
						     val, len, server);
		case DCCPO_CONFIRM_R:
		case DCCPO_CONFIRM_L:
			return dccp_feat_confirm_recv(fn, mandatory, opt, feat,
						      val, len, server);
		}
		break;
	/*
	 * Support for exchanging NN options on an established connection
	 * This is currently restricted to Ack Ratio (RFC 4341, 6.1.2)
	 */
	case DCCP_OPEN:
	case DCCP_PARTOPEN:
		return dccp_feat_handle_nn_established(sk, mandatory, opt, feat,
						       val, len);
	}
}
return
0
;
/* ignore FN options in all other states */
return
rc
;
}
}
/**
int
dccp_feat_init
(
struct
dccp_minisock
*
dmsk
)
* dccp_feat_init - Seed feature negotiation with host-specific defaults
* This initialises global defaults, depending on the value of the sysctls.
* These can later be overridden by registering changes via setsockopt calls.
* The last link in the chain is finalise_settings, to make sure that between
* here and the start of actual feature negotiation no inconsistencies enter.
*
* All features not appearing below use either defaults or are otherwise
* later adjusted through dccp_feat_finalise_settings().
*/
int
dccp_feat_init
(
struct
sock
*
sk
)
{
{
struct
list_head
*
fn
=
&
dccp_sk
(
sk
)
->
dccps_featneg
;
u8
on
=
1
,
off
=
0
;
int
rc
;
int
rc
;
struct
{
u8
*
val
;
u8
len
;
}
tx
,
rx
;
/* Non-negotiable (NN) features */
rc
=
__feat_register_nn
(
fn
,
DCCPF_SEQUENCE_WINDOW
,
0
,
sysctl_dccp_sequence_window
);
if
(
rc
)
return
rc
;
/* Server-priority (SP) features */
INIT_LIST_HEAD
(
&
dmsk
->
dccpms_pending
);
INIT_LIST_HEAD
(
&
dmsk
->
dccpms_conf
);
/* Advertise that short seqnos are not supported (7.6.1) */
rc
=
__feat_register_sp
(
fn
,
DCCPF_SHORT_SEQNOS
,
true
,
true
,
&
off
,
1
);
if
(
rc
)
return
rc
;
/* RFC 4340 12.1: "If a DCCP is not ECN capable, ..." */
/* CCID L */
rc
=
__feat_register_sp
(
fn
,
DCCPF_ECN_INCAPABLE
,
true
,
true
,
&
on
,
1
);
rc
=
__dccp_feat_init
(
dmsk
,
DCCPO_CHANGE_L
,
DCCPF_CCID
,
&
dmsk
->
dccpms_tx_ccid
,
1
);
if
(
rc
)
if
(
rc
)
return
rc
;
goto
out
;
/*
* We advertise the available list of CCIDs and reorder according to
* preferences, to avoid failure resulting from negotiating different
* singleton values (which always leads to failure).
* These settings can still (later) be overridden via sockopts.
*/
	if (ccid_get_builtin_ccids(&tx.val, &tx.len) ||
	    ccid_get_builtin_ccids(&rx.val, &rx.len))
		return -ENOBUFS;

	/* Pre-load all CCID modules that are going to be advertised */
	rc = -EUNATCH;
	if (ccid_request_modules(tx.val, tx.len))
		goto free_ccid_lists;

	if (!dccp_feat_prefer(sysctl_dccp_tx_ccid, tx.val, tx.len) ||
	    !dccp_feat_prefer(sysctl_dccp_rx_ccid, rx.val, rx.len))
		goto free_ccid_lists;

	rc = __feat_register_sp(fn, DCCPF_CCID, true, false, tx.val, tx.len);
/* CCID R */
rc
=
__dccp_feat_init
(
dmsk
,
DCCPO_CHANGE_R
,
DCCPF_CCID
,
&
dmsk
->
dccpms_rx_ccid
,
1
);
if
(
rc
)
if
(
rc
)
goto
free_ccid_lists
;
goto
out
;
rc
=
__feat_register_sp
(
fn
,
DCCPF_CCID
,
false
,
false
,
rx
.
val
,
rx
.
len
);
/* Ack ratio */
rc
=
__dccp_feat_init
(
dmsk
,
DCCPO_CHANGE_L
,
DCCPF_ACK_RATIO
,
free_ccid_lists:
&
dmsk
->
dccpms_ack_ratio
,
1
);
kfree
(
tx
.
val
);
out:
kfree
(
rx
.
val
);
return
rc
;
return
rc
;
}
}
int
dccp_feat_activate_values
(
struct
sock
*
sk
,
struct
list_head
*
fn_list
)
EXPORT_SYMBOL_GPL
(
dccp_feat_init
);
{
	struct dccp_sock *dp = dccp_sk(sk);
	struct dccp_feat_entry *cur, *next;
	int idx;
	dccp_feat_val *fvals[DCCP_FEAT_SUPPORTED_MAX][2] = {
		[0 ... DCCP_FEAT_SUPPORTED_MAX-1] = { NULL, NULL }
	};

	list_for_each_entry(cur, fn_list, node) {
		/*
		 * An empty Confirm means that either an unknown feature type
		 * or an invalid value was present. In the first case there is
		 * nothing to activate, in the other the default value is used.
		 */
		if (cur->empty_confirm)
			continue;

		idx = dccp_feat_index(cur->feat_num);
#ifdef CONFIG_IP_DCCP_DEBUG
if
(
idx
<
0
)
{
const
char
*
dccp_feat_typename
(
const
u8
type
)
DCCP_BUG
(
"Unknown feature %u"
,
cur
->
feat_num
);
{
goto
activation_failed
;
switch
(
type
)
{
}
case
DCCPO_CHANGE_L
:
return
(
"ChangeL"
);
if
(
cur
->
state
!=
FEAT_STABLE
)
{
case
DCCPO_CONFIRM_L
:
return
(
"ConfirmL"
);
DCCP_CRIT
(
"Negotiation of %s %s failed in state %s"
,
case
DCCPO_CHANGE_R
:
return
(
"ChangeR"
);
cur
->
is_local
?
"local"
:
"remote"
,
case
DCCPO_CONFIRM_R
:
return
(
"ConfirmR"
);
dccp_feat_fname
(
cur
->
feat_num
),
/* the following case must not appear in feature negotiation */
dccp_feat_sname
[
cur
->
state
]);
default:
dccp_pr_debug
(
"unknown type %d [BUG!]
\n
"
,
type
);
goto
activation_failed
;
}
fvals
[
idx
][
cur
->
is_local
]
=
&
cur
->
val
;
}
}
return
NULL
;
}
/*
EXPORT_SYMBOL_GPL
(
dccp_feat_typename
);
* Activate in decreasing order of index, so that the CCIDs are always
* activated as the last feature. This avoids the case where a CCID
* relies on the initialisation of one or more features that it depends
* on (e.g. Send NDP Count, Send Ack Vector, and Ack Ratio features).
*/
	for (idx = DCCP_FEAT_SUPPORTED_MAX; --idx >= 0;)
		if (__dccp_feat_activate(sk, idx, 0, fvals[idx][0]) ||
		    __dccp_feat_activate(sk, idx, 1, fvals[idx][1])) {
			DCCP_CRIT("Could not activate %d", idx);
			goto activation_failed;
		}
/* Clean up Change options which have been confirmed already */
const
char
*
dccp_feat_name
(
const
u8
feat
)
list_for_each_entry_safe
(
cur
,
next
,
fn_list
,
node
)
{
if
(
!
cur
->
needs_confirm
)
static
const
char
*
feature_names
[]
=
{
dccp_feat_list_pop
(
cur
);
[
DCCPF_RESERVED
]
=
"Reserved"
,
[
DCCPF_CCID
]
=
"CCID"
,
[
DCCPF_SHORT_SEQNOS
]
=
"Allow Short Seqnos"
,
[
DCCPF_SEQUENCE_WINDOW
]
=
"Sequence Window"
,
[
DCCPF_ECN_INCAPABLE
]
=
"ECN Incapable"
,
[
DCCPF_ACK_RATIO
]
=
"Ack Ratio"
,
[
DCCPF_SEND_ACK_VECTOR
]
=
"Send ACK Vector"
,
[
DCCPF_SEND_NDP_COUNT
]
=
"Send NDP Count"
,
[
DCCPF_MIN_CSUM_COVER
]
=
"Min. Csum Coverage"
,
[
DCCPF_DATA_CHECKSUM
]
=
"Send Data Checksum"
,
};
if
(
feat
>
DCCPF_DATA_CHECKSUM
&&
feat
<
DCCPF_MIN_CCID_SPECIFIC
)
return
feature_names
[
DCCPF_RESERVED
];
dccp_pr_debug
(
"Activation OK
\n
"
);
if
(
feat
>=
DCCPF_MIN_CCID_SPECIFIC
)
return
0
;
return
"CCID-specific"
;
activation_failed:
return
feature_names
[
feat
];
/*
* We clean up everything that may have been allocated, since
* it is difficult to track at which stage negotiation failed.
* This is ok, since all allocation functions below are robust
* against NULL arguments.
*/
	ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk);
	ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
	dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL;
	dccp_ackvec_free(dp->dccps_hc_rx_ackvec);
	dp->dccps_hc_rx_ackvec = NULL;
	return -1;
}
}
EXPORT_SYMBOL_GPL
(
dccp_feat_name
);
#endif
/* CONFIG_IP_DCCP_DEBUG */
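Review note: the SP branches above check values against the server-priority rule that the diff cites as RFC 4340, 6.3.1: the result is the first entry of the server's preference list that also occurs in the client's list. The following is a minimal userspace sketch of that rule only. The name sp_reconcile() and its signature are hypothetical; this is not the implementation of dccp_feat_reconcile(), whose body is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of net/dccp): server-priority
 * reconciliation in the sense of RFC 4340, 6.3.1.
 * Returns the first value in the server's preference list that also
 * appears in the client's list, or -1 if the lists do not intersect.
 */
static int sp_reconcile(const uint8_t *server, size_t slen,
			const uint8_t *client, size_t clen)
{
	size_t s, c;

	for (s = 0; s < slen; s++)		/* server order has priority */
		for (c = 0; c < clen; c++)
			if (server[s] == client[c])
				return server[s];
	return -1;				/* no common value */
}
```

With a server list {3, 2} and a client list {2, 3} the sketch yields 3, since only the server's ordering decides between common values.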
net/dccp/feat.h
@@ -3,134 +3,38 @@
/*
/*
* net/dccp/feat.h
* net/dccp/feat.h
*
*
* Feature negotiation for the DCCP protocol (RFC 4340, section 6)
* An implementation of the DCCP protocol
* Copyright (c) 2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>
* Copyright (c) 2005 Andrea Bittau <a.bittau@cs.ucl.ac.uk>
* Copyright (c) 2005 Andrea Bittau <a.bittau@cs.ucl.ac.uk>
*
*
*
This program is free software; you can redistribute it and/or modify it
*
This program is free software; you can redistribute it and/or modify it
*
under the terms of the GNU General Public License version 2 as
*
under the terms of the GNU General Public License version 2 as
*
published by the Free Software Foundation.
*
published by the Free Software Foundation.
*/
*/
#include <linux/types.h>
#include <linux/types.h>
#include "dccp.h"
#include "dccp.h"
/*
#ifdef CONFIG_IP_DCCP_DEBUG
* Known limit values
extern
const
char
*
dccp_feat_typename
(
const
u8
type
);
*/
extern
const
char
*
dccp_feat_name
(
const
u8
feat
);
/* Ack Ratio takes 2-byte integer values (11.3) */
#define DCCPF_ACK_RATIO_MAX 0xFFFF
/* Wmin=32 and Wmax=2^46-1 from 7.5.2 */
#define DCCPF_SEQ_WMIN 32
#define DCCPF_SEQ_WMAX 0x3FFFFFFFFFFFull
/* Maximum number of SP values that fit in a single (Confirm) option */
#define DCCP_FEAT_MAX_SP_VALS (DCCP_SINGLE_OPT_MAXLEN - 2)
enum dccp_feat_type {
	FEAT_AT_RX   = 1,	/* located at RX side of half-connection */
	FEAT_AT_TX   = 2,	/* located at TX side of half-connection */
	FEAT_SP      = 4,	/* server-priority reconciliation (6.3.1) */
	FEAT_NN      = 8,	/* non-negotiable reconciliation (6.3.2) */
	FEAT_UNKNOWN = 0xFF	/* not understood or invalid feature */
};

enum dccp_feat_state {
	FEAT_DEFAULT = 0,	/* using default values from 6.4 */
	FEAT_INITIALISING,	/* feature is being initialised */
	FEAT_CHANGING,		/* Change sent but not confirmed yet */
	FEAT_UNSTABLE,		/* local modification in state CHANGING */
	FEAT_STABLE		/* both ends (think they) agree */
};
/**
static
inline
void
dccp_feat_debug
(
const
u8
type
,
const
u8
feat
,
const
u8
val
)
* dccp_feat_val - Container for SP or NN feature values
* @nn: single NN value
* @sp.vec: single SP value plus optional preference list
* @sp.len: length of @sp.vec in bytes
*/
typedef union {
	u64 nn;
	struct {
		u8	*vec;
		u8	len;
	}   sp;
} dccp_feat_val;
/**
* struct feat_entry - Data structure to perform feature negotiation
* @feat_num: one of %dccp_feature_numbers
* @val: feature's current value (SP features may have preference list)
* @state: feature's current state
* @needs_mandatory: whether Mandatory options should be sent
* @needs_confirm: whether to send a Confirm instead of a Change
* @empty_confirm: whether to send an empty Confirm (depends on @needs_confirm)
* @is_local: feature-local (1) or feature-remote (0)
* @node: list pointers, entries arranged in FIFO order
*/
struct dccp_feat_entry {
	u8			feat_num;
	dccp_feat_val		val;
	enum dccp_feat_state	state:8;
	bool			needs_mandatory:1,
				needs_confirm:1,
				empty_confirm:1,
				is_local:1;

	struct list_head	node;
};
static
inline
u8
dccp_feat_genopt
(
struct
dccp_feat_entry
*
entry
)
{
{
if
(
entry
->
needs_confirm
)
dccp_pr_debug
(
"%s(%s (%d), %d)
\n
"
,
dccp_feat_typename
(
type
),
return
entry
->
is_local
?
DCCPO_CONFIRM_L
:
DCCPO_CONFIRM_R
;
dccp_feat_name
(
feat
),
feat
,
val
);
return
entry
->
is_local
?
DCCPO_CHANGE_L
:
DCCPO_CHANGE_R
;
}
}
#else
#define dccp_feat_debug(type, feat, val)
#endif
/* CONFIG_IP_DCCP_DEBUG */
extern int  dccp_feat_change(struct dccp_minisock *dmsk, u8 type, u8 feature,
			     u8 *val, u8 len, gfp_t gfp);
extern int  dccp_feat_change_recv(struct sock *sk, u8 type, u8 feature,
				  u8 *val, u8 len);
extern int  dccp_feat_confirm_recv(struct sock *sk, u8 type, u8 feature,
				   u8 *val, u8 len);
extern void dccp_feat_clean(struct dccp_minisock *dmsk);
extern int  dccp_feat_clone(struct sock *oldsk, struct sock *newsk);
extern int  dccp_feat_init(struct dccp_minisock *dmsk);
/**
* struct ccid_dependency - Track changes resulting from choosing a CCID
* @dependent_feat: one of %dccp_feature_numbers
* @is_local: local (1) or remote (0) @dependent_feat
* @is_mandatory: whether presence of @dependent_feat is mission-critical or not
* @val: corresponding default value for @dependent_feat (u8 is sufficient here)
*/
struct ccid_dependency {
	u8	dependent_feat;
	bool	is_local:1,
		is_mandatory:1;
	u8	val;
};
/*
* Sysctls to seed defaults for feature negotiation
*/
extern unsigned long sysctl_dccp_sequence_window;
extern int	     sysctl_dccp_rx_ccid;
extern int	     sysctl_dccp_tx_ccid;

extern int  dccp_feat_init(struct sock *sk);
extern void dccp_feat_initialise_sysctls(void);
extern int  dccp_feat_register_sp(struct sock *sk, u8 feat, u8 is_local,
				  u8 const *list, u8 len);
extern int  dccp_feat_register_nn(struct sock *sk, u8 feat, u64 val);
extern int  dccp_feat_parse_options(struct sock *, struct dccp_request_sock *,
				    u8 mand, u8 opt, u8 feat, u8 *val, u8 len);
extern int  dccp_feat_clone_list(struct list_head const *, struct list_head *);
/*
* Encoding variable-length options and their maximum length.
*
* This affects NN options (SP options are all u8) and other variable-length
* options (see table 3 in RFC 4340). The limit is currently given by the Sequence
* Window NN value (sec. 7.5.2) and the NDP count (sec. 7.7) option; all other
* options consume less than 6 bytes (timestamps are 4 bytes).
* When updating this constant (e.g. due to new internet drafts / RFCs), make
* sure that you also update all code which refers to it.
*/
#define DCCP_OPTVAL_MAXLEN 6
extern void dccp_encode_value_var(const u64 value, u8 *to, const u8 len);
extern u64  dccp_decode_value_var(const u8 *bf, const u8 len);

extern int  dccp_insert_option_mandatory(struct sk_buff *skb);
extern int  dccp_insert_fn_opt(struct sk_buff *skb, u8 type, u8 feat,
			       u8 *val, u8 len, bool repeat_first);
#endif
/* _DCCP_FEAT_H */
#endif
/* _DCCP_FEAT_H */
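Review note: the DCCP_OPTVAL_MAXLEN comment in this header limits variable-length option values to 6 bytes. As a rough illustration of what encoding and decoding such a value involves, the sketch below serialises the low-order len bytes of a 64-bit value most-significant byte first (DCCP transmits multi-byte option arguments in network byte order). The names var_encode(), var_decode() and OPTVAL_MAXLEN are hypothetical; the bodies of dccp_encode_value_var() and dccp_decode_value_var() are not part of this diff and may differ.

```c
#include <assert.h>
#include <stdint.h>

#define OPTVAL_MAXLEN 6	/* assumption: mirrors DCCP_OPTVAL_MAXLEN */

/* Write the low-order @len bytes of @value to @to, big-endian. */
static void var_encode(uint64_t value, uint8_t *to, uint8_t len)
{
	uint8_t i;

	assert(len >= 1 && len <= OPTVAL_MAXLEN);
	for (i = 0; i < len; i++)
		to[i] = (uint8_t)(value >> (8 * (len - 1 - i)));
}

/* Read a big-endian value of @len bytes from @bf. */
static uint64_t var_decode(const uint8_t *bf, uint8_t len)
{
	uint64_t value = 0;
	uint8_t i;

	assert(len >= 1 && len <= OPTVAL_MAXLEN);
	for (i = 0; i < len; i++)
		value = (value << 8) | bf[i];
	return value;
}
```

Encoding DCCPF_SEQ_WMAX (0x3FFFFFFFFFFF) with len 6 and decoding it again returns the same value, which is why a 6-byte limit is enough for the Sequence Window feature.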
net/dccp/input.c
@@ -159,15 +159,13 @@ static void dccp_rcv_reset(struct sock *sk, struct sk_buff *skb)
dccp_time_wait
(
sk
,
DCCP_TIME_WAIT
,
0
);
dccp_time_wait
(
sk
,
DCCP_TIME_WAIT
,
0
);
}
}
static
void
dccp_
handle_ackvec_processing
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
void
dccp_
event_ack_recv
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
{
{
struct
dccp_
ackvec
*
av
=
dccp_sk
(
sk
)
->
dccps_hc_rx_ackvec
;
struct
dccp_
sock
*
dp
=
dccp_sk
(
sk
)
;
if
(
av
==
NULL
)
if
(
dccp_msk
(
sk
)
->
dccpms_send_ack_vector
)
return
;
dccp_ackvec_check_rcv_ackno
(
dp
->
dccps_hc_rx_ackvec
,
sk
,
if
(
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
!=
DCCP_PKT_WITHOUT_ACK_SEQ
)
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
);
dccp_ackvec_clear_state
(
av
,
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
);
dccp_ackvec_input
(
av
,
skb
);
}
}
static
void
dccp_deliver_input_to_ccids
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
static
void
dccp_deliver_input_to_ccids
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
)
@@ -366,13 +364,22 @@ static int __dccp_rcv_established(struct sock *sk, struct sk_buff *skb,
int
dccp_rcv_established
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
,
int
dccp_rcv_established
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
,
const
struct
dccp_hdr
*
dh
,
const
unsigned
len
)
const
struct
dccp_hdr
*
dh
,
const
unsigned
len
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
if
(
dccp_check_seqno
(
sk
,
skb
))
if
(
dccp_check_seqno
(
sk
,
skb
))
goto
discard
;
goto
discard
;
if
(
dccp_parse_options
(
sk
,
NULL
,
skb
))
if
(
dccp_parse_options
(
sk
,
NULL
,
skb
))
return
1
;
return
1
;
dccp_handle_ackvec_processing
(
sk
,
skb
);
if
(
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
!=
DCCP_PKT_WITHOUT_ACK_SEQ
)
dccp_event_ack_recv
(
sk
,
skb
);
if
(
dccp_msk
(
sk
)
->
dccpms_send_ack_vector
&&
dccp_ackvec_add
(
dp
->
dccps_hc_rx_ackvec
,
sk
,
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
,
DCCP_ACKVEC_STATE_RECEIVED
))
goto
discard
;
dccp_deliver_input_to_ccids
(
sk
,
skb
);
dccp_deliver_input_to_ccids
(
sk
,
skb
);
return
__dccp_rcv_established
(
sk
,
skb
,
dh
,
len
);
return
__dccp_rcv_established
(
sk
,
skb
,
dh
,
len
);
@@ -414,33 +421,40 @@ static int dccp_rcv_request_sent_state_process(struct sock *sk,
goto
out_invalid_packet
;
goto
out_invalid_packet
;
}
}
/*
* If option processing (Step 8) failed, return 1 here so that
* dccp_v4_do_rcv() sends a Reset. The Reset code depends on
* the option type and is set in dccp_parse_options().
*/
if
(
dccp_parse_options
(
sk
,
NULL
,
skb
))
if
(
dccp_parse_options
(
sk
,
NULL
,
skb
))
return
1
;
goto
out_invalid_packet
;
/* Obtain usec RTT sample from SYN exchange (used by CCID 3) */
/* Obtain usec RTT sample from SYN exchange (used by CCID 3) */
if
(
likely
(
dp
->
dccps_options_received
.
dccpor_timestamp_echo
))
if
(
likely
(
dp
->
dccps_options_received
.
dccpor_timestamp_echo
))
dp
->
dccps_syn_rtt
=
dccp_sample_rtt
(
sk
,
10
*
(
tstamp
-
dp
->
dccps_syn_rtt
=
dccp_sample_rtt
(
sk
,
10
*
(
tstamp
-
dp
->
dccps_options_received
.
dccpor_timestamp_echo
));
dp
->
dccps_options_received
.
dccpor_timestamp_echo
));
if
(
dccp_msk
(
sk
)
->
dccpms_send_ack_vector
&&
dccp_ackvec_add
(
dp
->
dccps_hc_rx_ackvec
,
sk
,
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
,
DCCP_ACKVEC_STATE_RECEIVED
))
goto
out_invalid_packet
;
/* FIXME: change error code */
/* Stop the REQUEST timer */
/* Stop the REQUEST timer */
inet_csk_clear_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
);
inet_csk_clear_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
);
WARN_ON
(
sk
->
sk_send_head
==
NULL
);
WARN_ON
(
sk
->
sk_send_head
==
NULL
);
kfree_skb
(
sk
->
sk_send_head
);
kfree_skb
(
sk
->
sk_send_head
);
sk
->
sk_send_head
=
NULL
;
sk
->
sk_send_head
=
NULL
;
dp
->
dccps_isr
=
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
;
dccp_update_gsr
(
sk
,
dp
->
dccps_isr
);
/*
/*
* Set ISR, GSR from packet. ISS was set in dccp_v{4,6}_connect
* SWL and AWL are initially adjusted so that they are not less than
* and GSS in dccp_transmit_skb(). Setting AWL/AWH and SWL/SWH
* the initial Sequence Numbers received and sent, respectively:
* is done as part of activating the feature values below, since
* SWL := max(GSR + 1 - floor(W/4), ISR),
* these settings depend on the local/remote Sequence Window
* AWL := max(GSS - W' + 1, ISS).
* features, which were undefined or not confirmed until now.
* These adjustments MUST be applied only at the beginning of the
* connection.
*
* AWL was adjusted in dccp_v4_connect -acme
*/
*/
dp
->
dccps_gsr
=
dp
->
dccps_isr
=
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
;
dccp_set_seqno
(
&
dp
->
dccps_swl
,
max48
(
dp
->
dccps_swl
,
dp
->
dccps_isr
));
dccp_sync_mss
(
sk
,
icsk
->
icsk_pmtu_cookie
);
dccp_sync_mss
(
sk
,
icsk
->
icsk_pmtu_cookie
);
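Review note: one side of this hunk carries the comment "SWL := max(GSR + 1 - floor(W/4), ISR), AWL := max(GSS - W' + 1, ISS)". The sketch below works through those two formulas in 48-bit circular sequence space. All helper names are hypothetical and the comparison rule (a value is "after" another if it lies less than 2^47 ahead of it) is an assumption of this sketch, not a copy of the kernel's max48() or dccp_set_seqno().

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Addition and subtraction modulo 2^48. */
static uint64_t add48(uint64_t a, uint64_t b) { return (a + b) & SEQ48_MASK; }
static uint64_t sub48(uint64_t a, uint64_t b) { return (a - b) & SEQ48_MASK; }

/* Non-zero if a comes after b in circular 48-bit sequence space. */
static int after48(uint64_t a, uint64_t b)
{
	uint64_t d = sub48(a, b);

	return d != 0 && d < (UINT64_C(1) << 47);
}

static uint64_t seq_max48(uint64_t a, uint64_t b)
{
	return after48(a, b) ? a : b;
}

/* SWL := max(GSR + 1 - floor(W/4), ISR) */
static uint64_t initial_swl(uint64_t gsr, uint64_t isr, uint64_t w)
{
	return seq_max48(sub48(add48(gsr, 1), w / 4), isr);
}

/* AWL := max(GSS - W' + 1, ISS) */
static uint64_t initial_awl(uint64_t gss, uint64_t iss, uint64_t w_prime)
{
	return seq_max48(add48(sub48(gss, w_prime), 1), iss);
}
```

Right after the handshake GSR equals ISR, so the max() clamps SWL to ISR; the window's lower edge only starts moving once GSR has advanced by more than W/4.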
@@ -461,15 +475,6 @@ static int dccp_rcv_request_sent_state_process(struct sock *sk,
*/
*/
dccp_set_state
(
sk
,
DCCP_PARTOPEN
);
dccp_set_state
(
sk
,
DCCP_PARTOPEN
);
	/*
	 * If feature negotiation was successful, activate features now;
	 * an activation failure means that this host could not activate
	 * one or more features (e.g. insufficient memory), which would
	 * leave at least one feature in an undefined state.
	 */
	if (dccp_feat_activate_values(sk, &dp->dccps_featneg))
		goto unable_to_proceed;
/* Make sure socket is routed, for correct metrics. */
/* Make sure socket is routed, for correct metrics. */
icsk
->
icsk_af_ops
->
rebuild_header
(
sk
);
icsk
->
icsk_af_ops
->
rebuild_header
(
sk
);
@@ -504,16 +509,6 @@ static int dccp_rcv_request_sent_state_process(struct sock *sk,
/* dccp_v4_do_rcv will send a reset */
/* dccp_v4_do_rcv will send a reset */
DCCP_SKB_CB
(
skb
)
->
dccpd_reset_code
=
DCCP_RESET_CODE_PACKET_ERROR
;
DCCP_SKB_CB
(
skb
)
->
dccpd_reset_code
=
DCCP_RESET_CODE_PACKET_ERROR
;
return
1
;
return
1
;
unable_to_proceed:
	DCCP_SKB_CB(skb)->dccpd_reset_code = DCCP_RESET_CODE_ABORTED;
	/*
	 * We mark this socket as no longer usable, so that the loop in
	 * dccp_sendmsg() terminates and the application gets notified.
	 */
	dccp_set_state(sk, DCCP_CLOSED);
	sk->sk_err = ECOMM;
	return 1;
}
}
static
int
dccp_rcv_respond_partopen_state_process
(
struct
sock
*
sk
,
static
int
dccp_rcv_respond_partopen_state_process
(
struct
sock
*
sk
,
@@ -595,6 +590,8 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
if
(
inet_csk
(
sk
)
->
icsk_af_ops
->
conn_request
(
sk
,
if
(
inet_csk
(
sk
)
->
icsk_af_ops
->
conn_request
(
sk
,
skb
)
<
0
)
skb
)
<
0
)
return
1
;
return
1
;
/* FIXME: do congestion control initialization */
goto
discard
;
goto
discard
;
}
}
if
(
dh
->
dccph_type
==
DCCP_PKT_RESET
)
if
(
dh
->
dccph_type
==
DCCP_PKT_RESET
)
@@ -603,35 +600,29 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
/* Caller (dccp_v4_do_rcv) will send Reset */
/* Caller (dccp_v4_do_rcv) will send Reset */
dcb
->
dccpd_reset_code
=
DCCP_RESET_CODE_NO_CONNECTION
;
dcb
->
dccpd_reset_code
=
DCCP_RESET_CODE_NO_CONNECTION
;
return
1
;
return
1
;
}
else
if
(
sk
->
sk_state
==
DCCP_CLOSED
)
{
dcb
->
dccpd_reset_code
=
DCCP_RESET_CODE_NO_CONNECTION
;
return
1
;
}
}
/* Step 6: Check sequence numbers (omitted in LISTEN/REQUEST state) */
if
(
sk
->
sk_state
!=
DCCP_REQUESTING
)
{
if
(
sk
->
sk_state
!=
DCCP_REQUESTING
&&
dccp_check_seqno
(
sk
,
skb
))
if
(
dccp_check_seqno
(
sk
,
skb
))
goto
discard
;
goto
discard
;
/*
/*
* Step 7: Check for unexpected packet types
* Step 8: Process options and mark acknowledgeable
* If (S.is_server and P.type == Response)
*/
* or (S.is_client and P.type == Request)
if
(
dccp_parse_options
(
sk
,
NULL
,
skb
))
* or (S.state == RESPOND and P.type == Data),
return
1
;
* Send Sync packet acknowledging P.seqno
* Drop packet and return
*/
	if ((dp->dccps_role != DCCP_ROLE_CLIENT &&
	     dh->dccph_type == DCCP_PKT_RESPONSE) ||
	    (dp->dccps_role == DCCP_ROLE_CLIENT &&
	     dh->dccph_type == DCCP_PKT_REQUEST) ||
	    (sk->sk_state == DCCP_RESPOND &&
	     dh->dccph_type == DCCP_PKT_DATA)) {
		dccp_send_sync(sk, dcb->dccpd_seq, DCCP_PKT_SYNC);
		goto discard;
	}
/* Step 8: Process options */
if
(
dcb
->
dccpd_ack_seq
!=
DCCP_PKT_WITHOUT_ACK_SEQ
)
if
(
dccp_parse_options
(
sk
,
NULL
,
skb
))
dccp_event_ack_recv
(
sk
,
skb
);
return
1
;
if
(
dccp_msk
(
sk
)
->
dccpms_send_ack_vector
&&
dccp_ackvec_add
(
dp
->
dccps_hc_rx_ackvec
,
sk
,
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
,
DCCP_ACKVEC_STATE_RECEIVED
))
goto
discard
;
dccp_deliver_input_to_ccids
(
sk
,
skb
);
}
/*
/*
* Step 9: Process Reset
* Step 9: Process Reset
@@ -640,22 +631,44 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
* S.state := TIMEWAIT
* S.state := TIMEWAIT
* Set TIMEWAIT timer
* Set TIMEWAIT timer
* Drop packet and return
* Drop packet and return
*/
*/
if
(
dh
->
dccph_type
==
DCCP_PKT_RESET
)
{
if
(
dh
->
dccph_type
==
DCCP_PKT_RESET
)
{
dccp_rcv_reset
(
sk
,
skb
);
dccp_rcv_reset
(
sk
,
skb
);
return
0
;
return
0
;
}
else
if
(
dh
->
dccph_type
==
DCCP_PKT_CLOSEREQ
)
{
/* Step 13 */
/*
* Step 7: Check for unexpected packet types
* If (S.is_server and P.type == Response)
* or (S.is_client and P.type == Request)
* or (S.state == RESPOND and P.type == Data),
* Send Sync packet acknowledging P.seqno
* Drop packet and return
*/
}
else
if
((
dp
->
dccps_role
!=
DCCP_ROLE_CLIENT
&&
dh
->
dccph_type
==
DCCP_PKT_RESPONSE
)
||
(
dp
->
dccps_role
==
DCCP_ROLE_CLIENT
&&
dh
->
dccph_type
==
DCCP_PKT_REQUEST
)
||
(
sk
->
sk_state
==
DCCP_RESPOND
&&
dh
->
dccph_type
==
DCCP_PKT_DATA
))
{
dccp_send_sync
(
sk
,
dcb
->
dccpd_seq
,
DCCP_PKT_SYNC
);
goto
discard
;
}
else
if
(
dh
->
dccph_type
==
DCCP_PKT_CLOSEREQ
)
{
if
(
dccp_rcv_closereq
(
sk
,
skb
))
if
(
dccp_rcv_closereq
(
sk
,
skb
))
return
0
;
return
0
;
goto
discard
;
goto
discard
;
}
else
if
(
dh
->
dccph_type
==
DCCP_PKT_CLOSE
)
{
/* Step 14 */
}
else
if
(
dh
->
dccph_type
==
DCCP_PKT_CLOSE
)
{
if
(
dccp_rcv_close
(
sk
,
skb
))
if
(
dccp_rcv_close
(
sk
,
skb
))
return
0
;
return
0
;
goto
discard
;
goto
discard
;
}
}
switch
(
sk
->
sk_state
)
{
switch
(
sk
->
sk_state
)
{
case
DCCP_CLOSED
:
dcb
->
dccpd_reset_code
=
DCCP_RESET_CODE_NO_CONNECTION
;
return
1
;
case
DCCP_REQUESTING
:
case
DCCP_REQUESTING
:
/* FIXME: do congestion control initialization */
queued
=
dccp_rcv_request_sent_state_process
(
sk
,
skb
,
dh
,
len
);
queued
=
dccp_rcv_request_sent_state_process
(
sk
,
skb
,
dh
,
len
);
if
(
queued
>=
0
)
if
(
queued
>=
0
)
return
queued
;
return
queued
;
@@ -663,12 +676,8 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 		__kfree_skb(skb);
 		return 0;
-	case DCCP_PARTOPEN:
-		/* Step 8: if using Ack Vectors, mark packet acknowledgeable */
-		dccp_handle_ackvec_processing(sk, skb);
-		dccp_deliver_input_to_ccids(sk, skb);
-		/* fall through */
 	case DCCP_RESPOND:
+	case DCCP_PARTOPEN:
 		queued = dccp_rcv_respond_partopen_state_process(sk, skb,
 								 dh, len);
 		break;
@@ -707,7 +716,16 @@ u32 dccp_sample_rtt(struct sock *sk, long delta)
 	/* dccpor_elapsed_time is either zeroed out or set and > 0 */
 	delta -= dccp_sk(sk)->dccps_options_received.dccpor_elapsed_time * 10;
-	return dccp_sane_rtt(delta);
+	if (unlikely(delta <= 0)) {
+		DCCP_WARN("unusable RTT sample %ld, using min\n", delta);
+		return DCCP_SANE_RTT_MIN;
+	}
+	if (unlikely(delta > DCCP_SANE_RTT_MAX)) {
+		DCCP_WARN("RTT sample %ld too large, using max\n", delta);
+		return DCCP_SANE_RTT_MAX;
+	}
+	return delta;
 }
 EXPORT_SYMBOL_GPL(dccp_sample_rtt);
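The new side of the dccp_sample_rtt() hunk replaces the dccp_sane_rtt() call with explicit bounds checks. Below is a minimal userspace sketch of that clamping. The bounds of 100 and 3000000 microseconds are placeholder assumptions, because the values of DCCP_SANE_RTT_MIN and DCCP_SANE_RTT_MAX are not shown in this diff. The helper name sample_rtt_clamped is hypothetical, and the "units of 10 microseconds" reading of the elapsed-time field is inferred from the `* 10` in the hunk.

```c
#include <stdio.h>

/* Placeholder bounds in microseconds; assumptions for this sketch only. */
#define SANE_RTT_MIN 100L
#define SANE_RTT_MAX 3000000L

/*
 * Hypothetical userspace analogue of the new dccp_sample_rtt() tail:
 * subtract the peer-reported elapsed time (assumed to be in units of
 * 10 usec), then clamp the result into [SANE_RTT_MIN, SANE_RTT_MAX].
 */
static long sample_rtt_clamped(long delta_usec, long elapsed_time_10usec)
{
	delta_usec -= elapsed_time_10usec * 10;

	if (delta_usec <= 0) {
		fprintf(stderr, "unusable RTT sample %ld, using min\n", delta_usec);
		return SANE_RTT_MIN;
	}
	if (delta_usec > SANE_RTT_MAX) {
		fprintf(stderr, "RTT sample %ld too large, using max\n", delta_usec);
		return SANE_RTT_MAX;
	}
	return delta_usec;
}
```

A sample that goes negative after the elapsed-time correction is replaced by the minimum rather than discarded, which matches the warning text in the hunk.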
net/dccp/ipv4.c
@@ -545,7 +545,6 @@ static void dccp_v4_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb)
 static void dccp_v4_reqsk_destructor(struct request_sock *req)
 {
-	dccp_feat_list_purge(&dccp_rsk(req)->dreq_featneg);
 	kfree(inet_rsk(req)->opt);
 }
@@ -596,8 +595,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (req == NULL)
 		goto drop;
-	if (dccp_reqsk_init(req, dccp_sk(sk), skb))
-		goto drop_and_free;
+	dccp_reqsk_init(req, skb);
 	dreq = dccp_rsk(req);
 	if (dccp_parse_options(sk, dreq, skb))
net/dccp/ipv6.c
@@ -302,7 +302,6 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req)
 static void dccp_v6_reqsk_destructor(struct request_sock *req)
 {
-	dccp_feat_list_purge(&dccp_rsk(req)->dreq_featneg);
 	if (inet6_rsk(req)->pktopts != NULL)
 		kfree_skb(inet6_rsk(req)->pktopts);
 }
@@ -425,8 +424,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (req == NULL)
 		goto drop;
-	if (dccp_reqsk_init(req, dccp_sk(sk), skb))
-		goto drop_and_free;
+	dccp_reqsk_init(req, skb);
 	dreq = dccp_rsk(req);
 	if (dccp_parse_options(sk, dreq, skb))
net/dccp/minisocks.c
@@ -42,6 +42,16 @@ struct inet_timewait_death_row dccp_death_row = {
 EXPORT_SYMBOL_GPL(dccp_death_row);
+void dccp_minisock_init(struct dccp_minisock *dmsk)
+{
+	dmsk->dccpms_sequence_window = sysctl_dccp_feat_sequence_window;
+	dmsk->dccpms_rx_ccid = sysctl_dccp_feat_rx_ccid;
+	dmsk->dccpms_tx_ccid = sysctl_dccp_feat_tx_ccid;
+	dmsk->dccpms_ack_ratio = sysctl_dccp_feat_ack_ratio;
+	dmsk->dccpms_send_ack_vector = sysctl_dccp_feat_send_ack_vector;
+	dmsk->dccpms_send_ndp_count = sysctl_dccp_feat_send_ndp_count;
+}
 void dccp_time_wait(struct sock *sk, int state, int timeo)
 {
 	struct inet_timewait_sock *tw = NULL;
@@ -102,9 +112,10 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
...
@@ -102,9 +112,10 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
struct
sock
*
newsk
=
inet_csk_clone
(
sk
,
req
,
GFP_ATOMIC
);
struct
sock
*
newsk
=
inet_csk_clone
(
sk
,
req
,
GFP_ATOMIC
);
if
(
newsk
!=
NULL
)
{
if
(
newsk
!=
NULL
)
{
struct
dccp_request_sock
*
dreq
=
dccp_rsk
(
req
);
const
struct
dccp_request_sock
*
dreq
=
dccp_rsk
(
req
);
struct
inet_connection_sock
*
newicsk
=
inet_csk
(
newsk
);
struct
inet_connection_sock
*
newicsk
=
inet_csk
(
newsk
);
struct
dccp_sock
*
newdp
=
dccp_sk
(
newsk
);
struct
dccp_sock
*
newdp
=
dccp_sk
(
newsk
);
struct
dccp_minisock
*
newdmsk
=
dccp_msk
(
newsk
);
newdp
->
dccps_role
=
DCCP_ROLE_SERVER
;
newdp
->
dccps_role
=
DCCP_ROLE_SERVER
;
newdp
->
dccps_hc_rx_ackvec
=
NULL
;
newdp
->
dccps_hc_rx_ackvec
=
NULL
;
...
@@ -114,32 +125,65 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
...
@@ -114,32 +125,65 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
newdp
->
dccps_timestamp_time
=
dreq
->
dreq_timestamp_time
;
newdp
->
dccps_timestamp_time
=
dreq
->
dreq_timestamp_time
;
newicsk
->
icsk_rto
=
DCCP_TIMEOUT_INIT
;
newicsk
->
icsk_rto
=
DCCP_TIMEOUT_INIT
;
INIT_LIST_HEAD
(
&
newdp
->
dccps_featneg
);
if
(
dccp_feat_clone
(
sk
,
newsk
))
goto
out_free
;
if
(
newdmsk
->
dccpms_send_ack_vector
)
{
newdp
->
dccps_hc_rx_ackvec
=
dccp_ackvec_alloc
(
GFP_ATOMIC
);
if
(
unlikely
(
newdp
->
dccps_hc_rx_ackvec
==
NULL
))
goto
out_free
;
}
newdp
->
dccps_hc_rx_ccid
=
ccid_hc_rx_new
(
newdmsk
->
dccpms_rx_ccid
,
newsk
,
GFP_ATOMIC
);
newdp
->
dccps_hc_tx_ccid
=
ccid_hc_tx_new
(
newdmsk
->
dccpms_tx_ccid
,
newsk
,
GFP_ATOMIC
);
if
(
unlikely
(
newdp
->
dccps_hc_rx_ccid
==
NULL
||
newdp
->
dccps_hc_tx_ccid
==
NULL
))
{
dccp_ackvec_free
(
newdp
->
dccps_hc_rx_ackvec
);
ccid_hc_rx_delete
(
newdp
->
dccps_hc_rx_ccid
,
newsk
);
ccid_hc_tx_delete
(
newdp
->
dccps_hc_tx_ccid
,
newsk
);
out_free:
/* It is still raw copy of parent, so invalidate
* destructor and make plain sk_free() */
newsk
->
sk_destruct
=
NULL
;
sk_free
(
newsk
);
return
NULL
;
}
/*
/*
* Step 3: Process LISTEN state
* Step 3: Process LISTEN state
*
*
* Choose S.ISS (initial seqno) or set from Init Cookies
* Choose S.ISS (initial seqno) or set from Init Cookies
* Initialize S.GAR := S.ISS
* Initialize S.GAR := S.ISS
* Set S.ISR, S.GSR from packet (or Init Cookies)
* Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookies
*
* Setting AWL/AWH and SWL/SWH happens as part of the feature
* activation below, as these windows all depend on the local
* and remote Sequence Window feature values (7.5.2).
*/
*/
newdp
->
dccps_gss
=
newdp
->
dccps_iss
=
dreq
->
dreq_iss
;
newdp
->
dccps_gar
=
newdp
->
dccps_iss
;
/* See dccp_v4_conn_request */
newdp
->
dccps_gsr
=
newdp
->
dccps_isr
=
dreq
->
dreq_isr
;
newdmsk
->
dccpms_sequence_window
=
req
->
rcv_wnd
;
newdp
->
dccps_gar
=
newdp
->
dccps_iss
=
dreq
->
dreq_iss
;
dccp_update_gss
(
newsk
,
dreq
->
dreq_iss
);
newdp
->
dccps_isr
=
dreq
->
dreq_isr
;
dccp_update_gsr
(
newsk
,
dreq
->
dreq_isr
);
/*
/*
* Activate features: initialise CCIDs, sequence windows etc.
* SWL and AWL are initially adjusted so that they are not less than
* the initial Sequence Numbers received and sent, respectively:
* SWL := max(GSR + 1 - floor(W/4), ISR),
* AWL := max(GSS - W' + 1, ISS).
* These adjustments MUST be applied only at the beginning of the
* connection.
*/
*/
if
(
dccp_feat_activate_values
(
newsk
,
&
dreq
->
dreq_featneg
))
{
dccp_set_seqno
(
&
newdp
->
dccps_swl
,
/* It is still raw copy of parent, so invalidate
max48
(
newdp
->
dccps_swl
,
newdp
->
dccps_isr
));
* destructor and make plain sk_free() */
dccp_set_seqno
(
&
newdp
->
dccps_awl
,
newsk
->
sk_destruct
=
NULL
;
max48
(
newdp
->
dccps_awl
,
newdp
->
dccps_iss
));
sk_free
(
newsk
);
return
NULL
;
}
dccp_init_xmit_timers
(
newsk
);
dccp_init_xmit_timers
(
newsk
);
DCCP_INC_STATS_BH
(
DCCP_MIB_PASSIVEOPENS
);
DCCP_INC_STATS_BH
(
DCCP_MIB_PASSIVEOPENS
);
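The comment added on the new side of this hunk gives the initial window bounds as SWL := max(GSR + 1 - floor(W/4), ISR) and AWL := max(GSS - W' + 1, ISS). The following is a userspace sketch of those two formulas over a 48-bit circular sequence space. All helper names (seq48_add, seq48_sub, seq48_after, seq48_max, initial_swl, initial_awl) are hypothetical. The "after" test is one common way to compare circular sequence numbers and is an assumption here, since the kernel's max48() is not shown in this diff.

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Addition and subtraction on the 48-bit circular sequence space. */
static uint64_t seq48_add(uint64_t a, uint64_t b) { return (a + b) & SEQ48_MASK; }
static uint64_t seq48_sub(uint64_t a, uint64_t b) { return (a - b) & SEQ48_MASK; }

/* True if a is strictly after b: 0 < (a - b) mod 2^48 < 2^47. */
static int seq48_after(uint64_t a, uint64_t b)
{
	uint64_t d = seq48_sub(a, b);

	return d != 0 && d < (UINT64_C(1) << 47);
}

/* Circular maximum: whichever of a, b lies later in sequence space. */
static uint64_t seq48_max(uint64_t a, uint64_t b)
{
	return seq48_after(a, b) ? a : b;
}

/* SWL := max(GSR + 1 - floor(W/4), ISR) */
static uint64_t initial_swl(uint64_t gsr, uint64_t isr, uint64_t w)
{
	return seq48_max(seq48_sub(seq48_add(gsr, 1), w / 4), isr);
}

/* AWL := max(GSS - W' + 1, ISS) */
static uint64_t initial_awl(uint64_t gss, uint64_t iss, uint64_t w_remote)
{
	return seq48_max(seq48_add(seq48_sub(gss, w_remote), 1), iss);
}
```

At connection start GSR equals ISR, so the unclamped lower bound would fall before the first sequence number ever received. Taking the circular maximum pins SWL to ISR in that case, and AWL to ISS by the same argument.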
@@ -260,17 +304,14 @@ void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 EXPORT_SYMBOL_GPL(dccp_reqsk_send_ack);
-int dccp_reqsk_init(struct request_sock *req,
-		    struct dccp_sock const *dp, struct sk_buff const *skb)
+void dccp_reqsk_init(struct request_sock *req, struct sk_buff *skb)
 {
 	struct dccp_request_sock *dreq = dccp_rsk(req);
 	inet_rsk(req)->rmt_port = dccp_hdr(skb)->dccph_sport;
 	inet_rsk(req)->acked = 0;
+	req->rcv_wnd = sysctl_dccp_feat_sequence_window;
 	dreq->dreq_timestamp_echo = 0;
-	/* inherit feature negotiation options from listening socket */
-	return dccp_feat_clone_list(&dp->dccps_featneg, &dreq->dreq_featneg);
 }
 EXPORT_SYMBOL_GPL(dccp_reqsk_init);
net/dccp/options.c
@@ -23,20 +23,23 @@
 #include "dccp.h"
 #include "feat.h"
-u64 dccp_decode_value_var(const u8 *bf, const u8 len)
+int sysctl_dccp_feat_sequence_window = DCCPF_INITIAL_SEQUENCE_WINDOW;
+int sysctl_dccp_feat_rx_ccid = DCCPF_INITIAL_CCID;
+int sysctl_dccp_feat_tx_ccid = DCCPF_INITIAL_CCID;
+int sysctl_dccp_feat_ack_ratio = DCCPF_INITIAL_ACK_RATIO;
+int sysctl_dccp_feat_send_ack_vector = DCCPF_INITIAL_SEND_ACK_VECTOR;
+int sysctl_dccp_feat_send_ndp_count = DCCPF_INITIAL_SEND_NDP_COUNT;
+static u32 dccp_decode_value_var(const unsigned char *bf, const u8 len)
 {
-	u64 value = 0;
+	u32 value = 0;
-	if (len >= DCCP_OPTVAL_MAXLEN)
-		value += ((u64)*bf++) << 40;
-	if (len > 4)
-		value += ((u64)*bf++) << 32;
 	if (len > 3)
-		value += ((u64)*bf++) << 24;
+		value += *bf++ << 24;
 	if (len > 2)
-		value += ((u64)*bf++) << 16;
+		value += *bf++ << 16;
 	if (len > 1)
-		value += ((u64)*bf++) << 8;
+		value += *bf++ << 8;
 	if (len > 0)
 		value += *bf;
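The new side of dccp_decode_value_var() reads a value of up to four bytes, most significant byte first. Here is a self-contained userspace sketch of that decoding, paired with a matching encoder. The function names decode_value_var and encode_value_var are hypothetical. The encoder's lower two branches are an assumption that continues the pattern visible in the dccp_encode_value_var() hunk, which is cut off after `if (len > 2)`. The sketch casts each byte to uint32_t before shifting, so a top byte of 0x80 or above cannot overflow a signed int.

```c
#include <stdint.h>

/* Decode up to 4 bytes, most significant byte first. */
static uint32_t decode_value_var(const unsigned char *bf, uint8_t len)
{
	uint32_t value = 0;

	if (len > 3)
		value += (uint32_t)*bf++ << 24;
	if (len > 2)
		value += (uint32_t)*bf++ << 16;
	if (len > 1)
		value += (uint32_t)*bf++ << 8;
	if (len > 0)
		value += *bf;
	return value;
}

/*
 * Hypothetical inverse: write the low `len` bytes of value, most
 * significant byte first. The last two branches are assumed, not
 * taken from the diff.
 */
static void encode_value_var(uint32_t value, unsigned char *to, unsigned int len)
{
	if (len > 3)
		*to++ = (value & 0xFF000000) >> 24;
	if (len > 2)
		*to++ = (value & 0xFF0000) >> 16;
	if (len > 1)
		*to++ = (value & 0xFF00) >> 8;
	if (len > 0)
		*to = value & 0xFF;
}
```

With a length below four, only the low-order bytes survive a round trip, so callers have to pick a length wide enough for the value being sent.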
@@ -54,6 +57,7 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 	struct dccp_sock *dp = dccp_sk(sk);
 	const struct dccp_hdr *dh = dccp_hdr(skb);
 	const u8 pkt_type = DCCP_SKB_CB(skb)->dccpd_type;
+	u64 ackno = DCCP_SKB_CB(skb)->dccpd_ack_seq;
 	unsigned char *options = (unsigned char *)dh + dccp_hdr_len(skb);
 	unsigned char *opt_ptr = options;
 	const unsigned char *opt_end = (unsigned char *)dh +
@@ -95,11 +99,18 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
...
@@ -95,11 +99,18 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
}
}
/*
/*
* CCID-Specific Options (from RFC 4340, sec. 10.3):
*
* Option numbers 128 through 191 are for options sent from the
* HC-Sender to the HC-Receiver; option numbers 192 through 255
* are for options sent from the HC-Receiver to the HC-Sender.
*
* CCID-specific options are ignored during connection setup, as
* CCID-specific options are ignored during connection setup, as
* negotiation may still be in progress (see RFC 4340, 10.3).
* negotiation may still be in progress (see RFC 4340, 10.3).
* The same applies to Ack Vectors, as these depend on the CCID.
* The same applies to Ack Vectors, as these depend on the CCID.
*
*/
*/
if
(
dreq
!=
NULL
&&
(
opt
>=
DCCPO_MIN_RX_CCID_SPECIFIC
||
if
(
dreq
!=
NULL
&&
(
opt
>=
128
||
opt
==
DCCPO_ACK_VECTOR_0
||
opt
==
DCCPO_ACK_VECTOR_1
))
opt
==
DCCPO_ACK_VECTOR_0
||
opt
==
DCCPO_ACK_VECTOR_1
))
goto
ignore_option
;
goto
ignore_option
;
@@ -120,13 +131,43 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 			dccp_pr_debug("%s opt: NDP count=%llu\n", dccp_role(sk),
 				      (unsigned long long)opt_recv->dccpor_ndp);
 			break;
-		case DCCPO_CHANGE_L ... DCCPO_CONFIRM_R:
-			if (pkt_type == DCCP_PKT_DATA)	/* RFC 4340, 6 */
+		case DCCPO_CHANGE_L:
+			/* fall through */
+		case DCCPO_CHANGE_R:
+			if (pkt_type == DCCP_PKT_DATA)
 				break;
-			rc = dccp_feat_parse_options(sk, dreq, mandatory, opt,
-						    *value, value + 1, len - 1);
-			if (rc)
-				goto out_featneg_failed;
+			if (len < 2)
+				goto out_invalid_option;
+			rc = dccp_feat_change_recv(sk, opt, *value, value + 1,
+						   len - 1);
+			/*
+			 * When there is a change error, change_recv is
+			 * responsible for dealing with it. i.e. reply with an
+			 * empty confirm.
+			 * If the change was mandatory, then we need to die.
+			 */
+			if (rc && mandatory)
+				goto out_invalid_option;
+			break;
+		case DCCPO_CONFIRM_L:
+			/* fall through */
+		case DCCPO_CONFIRM_R:
+			if (pkt_type == DCCP_PKT_DATA)
+				break;
+			if (len < 2)	/* FIXME this disallows empty confirm */
+				goto out_invalid_option;
+			if (dccp_feat_confirm_recv(sk, opt, *value,
+						   value + 1, len - 1))
+				goto out_invalid_option;
+			break;
+		case DCCPO_ACK_VECTOR_0:
+		case DCCPO_ACK_VECTOR_1:
+			if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
+				break;
+			if (dccp_msk(sk)->dccpms_send_ack_vector &&
+			    dccp_ackvec_parse(sk, skb, &ackno, opt, value, len))
+				goto out_invalid_option;
 			break;
 		case DCCPO_TIMESTAMP:
 			if (len != 4)
@@ -154,8 +195,6 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 				      dccp_role(sk), ntohl(opt_val),
 				      (unsigned long long)
 				      DCCP_SKB_CB(skb)->dccpd_ack_seq);
-			/* schedule an Ack in case this sender is quiescent */
-			inet_csk_schedule_ack(sk);
 			break;
 		case DCCPO_TIMESTAMP_ECHO:
 			if (len != 4 && len != 6 && len != 8)
@@ -212,25 +251,23 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 			dccp_pr_debug("%s rx opt: ELAPSED_TIME=%d\n",
 				      dccp_role(sk), elapsed_time);
 			break;
-		case DCCPO_MIN_RX_CCID_SPECIFIC ... DCCPO_MAX_RX_CCID_SPECIFIC:
+		case 128 ... 191: {
+			const u16 idx = value - options;
 			if (ccid_hc_rx_parse_options(dp->dccps_hc_rx_ccid, sk,
-						     pkt_type, opt, value, len))
+						     opt, len, idx,
+						     value) != 0)
 				goto out_invalid_option;
+		}
 			break;
-		case DCCPO_ACK_VECTOR_0:
-		case DCCPO_ACK_VECTOR_1:
-			if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
-				break;
-			/*
-			 * Ack vectors are processed by the TX CCID if it is
-			 * interested. The RX CCID need not parse Ack Vectors,
-			 * since it is only interested in clearing old state.
-			 * Fall through.
-			 */
-		case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
+		case 192 ... 255: {
+			const u16 idx = value - options;
 			if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk,
-						     pkt_type, opt, value, len))
+						     opt, len, idx,
+						     value) != 0)
 				goto out_invalid_option;
+		}
 			break;
 		default:
 			DCCP_CRIT("DCCP(%p): option %d(len=%d) not "
@@ -252,10 +289,8 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 out_invalid_option:
 	DCCP_INC_STATS_BH(DCCP_MIB_INVALIDOPT);
-	rc = DCCP_RESET_CODE_OPTION_ERROR;
-out_featneg_failed:
-	DCCP_WARN("DCCP(%p): Option %d (len=%d) error=%u\n", sk, opt, len, rc);
-	DCCP_SKB_CB(skb)->dccpd_reset_code = rc;
+	DCCP_SKB_CB(skb)->dccpd_reset_code = DCCP_RESET_CODE_OPTION_ERROR;
+	DCCP_WARN("DCCP(%p): invalid option %d, len=%d", sk, opt, len);
 	DCCP_SKB_CB(skb)->dccpd_reset_data[0] = opt;
 	DCCP_SKB_CB(skb)->dccpd_reset_data[1] = len > 0 ? value[0] : 0;
 	DCCP_SKB_CB(skb)->dccpd_reset_data[2] = len > 1 ? value[1] : 0;
@@ -264,12 +299,9 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 EXPORT_SYMBOL_GPL(dccp_parse_options);
-void dccp_encode_value_var(const u64 value, u8 *to, const u8 len)
+static void dccp_encode_value_var(const u32 value, unsigned char *to,
+				  const unsigned int len)
 {
-	if (len >= DCCP_OPTVAL_MAXLEN)
-		*to++ = (value & 0xFF0000000000ull) >> 40;
-	if (len > 4)
-		*to++ = (value & 0xFF00000000ull) >> 32;
 	if (len > 3)
 		*to++ = (value & 0xFF000000) >> 24;
 	if (len > 2)
@@ -429,140 +461,92 @@ static int dccp_insert_option_timestamp_echo(struct dccp_sock *dp,
 	return 0;
 }
-static int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
-{
-	struct dccp_sock *dp = dccp_sk(sk);
-	struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
-	struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
-	const u16 buflen = dccp_ackvec_buflen(av);
-	/* Figure out how many options do we need to represent the ackvec */
-	const u8 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN);
-	u16 len = buflen + 2 * nr_opts;
-	u8 i, nonce = 0;
-	const unsigned char *tail, *from;
-	unsigned char *to;
-	if (dcb->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
-		DCCP_WARN("Lacking space for %u bytes on %s packet\n", len,
-			  dccp_packet_name(dcb->dccpd_type));
-		return -1;
-	}
-	/*
-	 * Since Ack Vectors are variable-length, we can not always predict
-	 * their size. To catch exception cases where the space is running out
-	 * on the skb, a separate Sync is scheduled to carry the Ack Vector.
-	 */
-	if (len > DCCPAV_MIN_OPTLEN &&
-	    len + dcb->dccpd_opt_len + skb->len > dp->dccps_mss_cache) {
-		DCCP_WARN("No space left for Ack Vector (%u) on skb (%u+%u), "
-			  "MPS=%u ==> reduce payload size?\n", len, skb->len,
-			  dcb->dccpd_opt_len, dp->dccps_mss_cache);
-		dp->dccps_sync_scheduled = 1;
-		return 0;
-	}
-	dcb->dccpd_opt_len += len;
-	to = skb_push(skb, len);
-	len = buflen;
-	from = av->av_buf + av->av_buf_head;
-	tail = av->av_buf + DCCPAV_MAX_ACKVEC_LEN;
-	for (i = 0; i < nr_opts; ++i) {
-		int copylen = len;
-		if (len > DCCP_SINGLE_OPT_MAXLEN)
-			copylen = DCCP_SINGLE_OPT_MAXLEN;
-		/*
-		 * RFC 4340, 12.2: Encode the Nonce Echo for this Ack Vector via
-		 * its type; ack_nonce is the sum of all individual buf_nonce's.
-		 */
-		nonce ^= av->av_buf_nonce[i];
-		*to++ = DCCPO_ACK_VECTOR_0 + av->av_buf_nonce[i];
-		*to++ = copylen + 2;
-		/* Check if buf_head wraps */
-		if (from + copylen > tail) {
-			const u16 tailsize = tail - from;
-			memcpy(to, from, tailsize);
-			to += tailsize;
-			len -= tailsize;
-			copylen -= tailsize;
-			from = av->av_buf;
-		}
-		memcpy(to, from, copylen);
-		from += copylen;
-		to += copylen;
-		len -= copylen;
-	}
-	/*
-	 * Each sent Ack Vector is recorded in the list, as per A.2 of RFC 4340.
-	 */
-	if (dccp_ackvec_update_records(av, dcb->dccpd_seq, nonce))
-		return -ENOBUFS;
-	return 0;
-}
-/**
- * dccp_insert_option_mandatory - Mandatory option (5.8.2)
- * Note that since we are using skb_push, this function needs to be called
- * _after_ inserting the option it is supposed to influence (stack order).
- */
-int dccp_insert_option_mandatory(struct sk_buff *skb)
-{
-	if (DCCP_SKB_CB(skb)->dccpd_opt_len >= DCCP_MAX_OPT_LEN)
-		return -1;
-	DCCP_SKB_CB(skb)->dccpd_opt_len++;
-	*skb_push(skb, 1) = DCCPO_MANDATORY;
-	return 0;
-}
-/**
- * dccp_insert_fn_opt - Insert single Feature-Negotiation option into @skb
- * @type: %DCCPO_CHANGE_L, %DCCPO_CHANGE_R, %DCCPO_CONFIRM_L, %DCCPO_CONFIRM_R
- * @feat: one out of %dccp_feature_numbers
- * @val: NN value or SP array (preferred element first) to copy
- * @len: true length of @val in bytes (excluding first element repetition)
- * @repeat_first: whether to copy the first element of @val twice
- * The last argument is used to construct Confirm options, where the preferred
- * value and the preference list appear separately (RFC 4340, 6.3.1). Preference
- * lists are kept such that the preferred entry is always first, so we only need
- * to copy twice, and avoid the overhead of cloning into a bigger array.
- */
-int dccp_insert_fn_opt(struct sk_buff *skb, u8 type, u8 feat,
-		       u8 *val, u8 len, bool repeat_first)
-{
-	u8 tot_len, *to;
-	/* take the `Feature' field and possible repetition into account */
-	if (len > (DCCP_SINGLE_OPT_MAXLEN - 2)) {
-		DCCP_WARN("length %u for feature %u too large\n", len, feat);
-		return -1;
-	}
-	if (unlikely(val == NULL || len == 0))
-		len = repeat_first = 0;
-	tot_len = 3 + repeat_first + len;
-	if (DCCP_SKB_CB(skb)->dccpd_opt_len + tot_len > DCCP_MAX_OPT_LEN) {
-		DCCP_WARN("packet too small for feature %d option!\n", feat);
-		return -1;
-	}
-	DCCP_SKB_CB(skb)->dccpd_opt_len += tot_len;
-	to = skb_push(skb, tot_len);
-	*to++ = type;
-	*to++ = tot_len;
-	*to++ = feat;
-	if (repeat_first)
-		*to++ = *val;
-	if (len)
-		memcpy(to, val, len);
+static int dccp_insert_feat_opt(struct sk_buff *skb, u8 type, u8 feat,
+				u8 *val, u8 len)
+{
+	u8 *to;
+	if (DCCP_SKB_CB(skb)->dccpd_opt_len + len + 3 > DCCP_MAX_OPT_LEN) {
+		DCCP_WARN("packet too small for feature %d option!\n", feat);
+		return -1;
+	}
+	DCCP_SKB_CB(skb)->dccpd_opt_len += len + 3;
+	to = skb_push(skb, len + 3);
+	*to++ = type;
+	*to++ = len + 3;
+	*to++ = feat;
+	if (len)
+		memcpy(to, val, len);
+	dccp_pr_debug("%s(%s (%d), ...), length %d\n",
+		      dccp_feat_typename(type),
+		      dccp_feat_name(feat), feat, len);
+	return 0;
+}
+static int dccp_insert_options_feat(struct sock *sk, struct sk_buff *skb)
+{
+	struct dccp_sock *dp = dccp_sk(sk);
+	struct dccp_minisock *dmsk = dccp_msk(sk);
+	struct dccp_opt_pend *opt, *next;
+	int change = 0;
+	/* confirm any options [NN opts] */
+	list_for_each_entry_safe(opt, next, &dmsk->dccpms_conf, dccpop_node) {
+		dccp_insert_feat_opt(skb, opt->dccpop_type,
+				     opt->dccpop_feat, opt->dccpop_val,
+				     opt->dccpop_len);
+		/* fear empty confirms */
+		if (opt->dccpop_val)
+			kfree(opt->dccpop_val);
+		kfree(opt);
+	}
+	INIT_LIST_HEAD(&dmsk->dccpms_conf);
+	/* see which features we need to send */
+	list_for_each_entry(opt, &dmsk->dccpms_pending, dccpop_node) {
+		/* see if we need to send any confirm */
+		if (opt->dccpop_sc) {
+			dccp_insert_feat_opt(skb, opt->dccpop_type + 1,
+					     opt->dccpop_feat,
+					     opt->dccpop_sc->dccpoc_val,
+					     opt->dccpop_sc->dccpoc_len);
+			BUG_ON(!opt->dccpop_sc->dccpoc_val);
+			kfree(opt->dccpop_sc->dccpoc_val);
+			kfree(opt->dccpop_sc);
+			opt->dccpop_sc = NULL;
+		}
+		/* any option not confirmed, re-send it */
+		if (!opt->dccpop_conf) {
+			dccp_insert_feat_opt(skb, opt->dccpop_type,
+					     opt->dccpop_feat, opt->dccpop_val,
+					     opt->dccpop_len);
+			change++;
+		}
+	}
+	/* Retransmit timer.
+	 * If this is the master listening sock, we don't set a timer on it. It
+	 * should be fine because if the dude doesn't receive our RESPONSE
+	 * [which will contain the CHANGE] he will send another REQUEST which
+	 * will "retrnasmit" the change.
+	 */
+	if (change && dp->dccps_role != DCCP_ROLE_LISTEN) {
+		dccp_pr_debug("reset feat negotiation timer %p\n", sk);
+		/* XXX don't reset the timer on re-transmissions. I.e. reset it
+		 * only when sending new stuff i guess. Currently the timer
+		 * never backs off because on re-transmission it just resets it!
+		 */
+		inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
+					  inet_csk(sk)->icsk_rto, DCCP_RTO_MAX);
+	}
 	return 0;
 }
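Both versions of the feature-option writer in this hunk emit the same three-byte header (option type, total option length, feature number) followed by the value bytes. A minimal userspace sketch of that layout follows. It assumes a plain byte buffer that is appended to, whereas the kernel code prepends with skb_push(). The names opt_buf, append_feat_opt and the capacity MAX_OPT_LEN are hypothetical and do not reflect the kernel's DCCP_MAX_OPT_LEN.

```c
#include <stdint.h>
#include <string.h>

#define MAX_OPT_LEN 64	/* placeholder capacity for this sketch */

struct opt_buf {
	uint8_t data[MAX_OPT_LEN];
	unsigned int len;	/* bytes used so far */
};

/*
 * Append one feature-negotiation option:
 *   byte 0: option type, byte 1: total length (len + 3),
 *   byte 2: feature number, bytes 3..: value.
 * Returns 0 on success, -1 if the option does not fit.
 */
static int append_feat_opt(struct opt_buf *ob, uint8_t type, uint8_t feat,
			   const uint8_t *val, uint8_t len)
{
	uint8_t *to;

	if (ob->len + len + 3 > MAX_OPT_LEN)
		return -1;

	to = ob->data + ob->len;
	ob->len += len + 3;

	*to++ = type;
	*to++ = len + 3;
	*to++ = feat;
	if (len)
		memcpy(to, val, len);
	return 0;
}
```

The space check runs before anything is written, so a failed call leaves the buffer unchanged, which mirrors the early `return -1` in the diff.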
@@ -581,30 +565,19 @@ static void dccp_insert_option_padding(struct sk_buff *skb)
 int dccp_insert_options(struct sock *sk, struct sk_buff *skb)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
+	struct dccp_minisock *dmsk = dccp_msk(sk);
 	DCCP_SKB_CB(skb)->dccpd_opt_len = 0;
-	if (dp->dccps_send_ndp_count && dccp_insert_option_ndp(sk, skb))
+	if (dmsk->dccpms_send_ndp_count &&
+	    dccp_insert_option_ndp(sk, skb))
 		return -1;
-	if (DCCP_SKB_CB(skb)->dccpd_type != DCCP_PKT_DATA) {
-		/* Feature Negotiation */
-		if (dccp_feat_insert_opts(dp, NULL, skb))
-			return -1;
-		if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_REQUEST) {
-			/*
-			 * Obtain RTT sample from Request/Response exchange.
-			 * This is currently used in CCID 3 initialisation.
-			 */
-			if (dccp_insert_option_timestamp(sk, skb))
-				return -1;
-		} else if (dccp_ackvec_pending(sk) &&
-			   dccp_insert_option_ackvec(sk, skb)) {
-				return -1;
-		}
-	}
+	if (!dccp_packet_without_ack(skb)) {
+		if (dmsk->dccpms_send_ack_vector &&
+		    dccp_ackvec_pending(dp->dccps_hc_rx_ackvec) &&
+		    dccp_insert_option_ackvec(sk, skb))
+			return -1;
+	}
 	if (dp->dccps_hc_rx_insert_options) {
@@ -613,6 +586,21 @@ int dccp_insert_options(struct sock *sk, struct sk_buff *skb)
 		dp->dccps_hc_rx_insert_options = 0;
 	}
+	/* Feature negotiation */
+	/* Data packets can't do feat negotiation */
+	if (DCCP_SKB_CB(skb)->dccpd_type != DCCP_PKT_DATA &&
+	    DCCP_SKB_CB(skb)->dccpd_type != DCCP_PKT_DATAACK &&
+	    dccp_insert_options_feat(sk, skb))
+		return -1;
+	/*
+	 * Obtain RTT sample from Request/Response exchange.
+	 * This is currently used in CCID 3 initialisation.
+	 */
+	if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_REQUEST &&
+	    dccp_insert_option_timestamp(sk, skb))
+		return -1;
 	if (dp->dccps_timestamp_echo != 0 &&
 	    dccp_insert_option_timestamp_echo(dp, NULL, skb))
 		return -1;
@@ -625,9 +613,6 @@ int dccp_insert_options_rsk(struct dccp_request_sock *dreq, struct sk_buff *skb)
 {
 	DCCP_SKB_CB(skb)->dccpd_opt_len = 0;
-	if (dccp_feat_insert_opts(NULL, dreq, skb))
-		return -1;
 	if (dreq->dreq_timestamp_echo != 0 &&
 	    dccp_insert_option_timestamp_echo(NULL, dreq, skb))
 		return -1;
net/dccp/output.c
@@ -26,13 +26,11 @@ static inline void dccp_event_ack_sent(struct sock *sk)
 	inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
 }
-/* enqueue @skb on sk_send_head for retransmission, return clone to send now */
-static struct sk_buff *dccp_skb_entail(struct sock *sk, struct sk_buff *skb)
+static void dccp_skb_entail(struct sock *sk, struct sk_buff *skb)
 {
 	skb_set_owner_w(skb, sk);
 	WARN_ON(sk->sk_send_head);
 	sk->sk_send_head = skb;
-	return skb_clone(sk->sk_send_head, gfp_any());
 }
 /*
...
@@ -163,27 +161,21 @@ unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu)
...
@@ -163,27 +161,21 @@ unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu)
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
u32
ccmps
=
dccp_determine_ccmps
(
dp
);
u32
ccmps
=
dccp_determine_ccmps
(
dp
);
u32
cur_mps
=
ccmps
?
min
(
pmtu
,
ccmps
)
:
pmtu
;
int
cur_mps
=
ccmps
?
min
(
pmtu
,
ccmps
)
:
pmtu
;
/* Account for header lengths and IPv4/v6 option overhead */
/* Account for header lengths and IPv4/v6 option overhead */
cur_mps
-=
(
icsk
->
icsk_af_ops
->
net_header_len
+
icsk
->
icsk_ext_hdr_len
+
cur_mps
-=
(
icsk
->
icsk_af_ops
->
net_header_len
+
icsk
->
icsk_ext_hdr_len
+
sizeof
(
struct
dccp_hdr
)
+
sizeof
(
struct
dccp_hdr_ext
));
sizeof
(
struct
dccp_hdr
)
+
sizeof
(
struct
dccp_hdr_ext
));
-	/*
-	 * Leave enough headroom for common DCCP header options.
-	 * This only considers options which may appear on DCCP-Data packets, as
-	 * per table 3 in RFC 4340, 5.8. When running out of space for other
-	 * options (eg. Ack Vector which can take up to 255 bytes), it is better
-	 * to schedule a separate Ack. Thus we leave headroom for the following:
-	 *  - 1 byte for Slow Receiver (11.6)
-	 *  - 6 bytes for Timestamp (13.1)
-	 *  - 10 bytes for Timestamp Echo (13.3)
-	 *  - 8 bytes for NDP count (7.7, when activated)
-	 *  - 6 bytes for Data Checksum (9.3)
-	 *  - %DCCPAV_MIN_OPTLEN bytes for Ack Vector size (11.4, when enabled)
-	 */
-	cur_mps -= roundup(1 + 6 + 10 + dp->dccps_send_ndp_count * 8 + 6 +
-			   (dp->dccps_hc_rx_ackvec ? DCCPAV_MIN_OPTLEN : 0), 4);
+	/*
+	 * FIXME: this should come from the CCID infrastructure, where, say,
+	 * TFRC will say it wants TIMESTAMPS, ELAPSED time, etc, for now lets
+	 * put a rough estimate for NDP + TIMESTAMP + TIMESTAMP_ECHO + ELAPSED
+	 * TIME + TFRC_OPT_LOSS_EVENT_RATE + TFRC_OPT_RECEIVE_RATE + padding to
+	 * make it a multiple of 4
+	 */
+	cur_mps -= ((5 + 6 + 10 + 6 + 6 + 6 + 3) / 4) * 4;
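The two headroom computations in this hunk can be compared with a small standalone sketch. It is not kernel code: `ROUNDUP` is a local stand-in written like the kernel's `roundup()` macro, and `headroom_per_option`, `headroom_fixed` and the `ackvec_min_optlen` parameter are hypothetical names. The value of `DCCPAV_MIN_OPTLEN` is not shown in this diff, so the caller passes it in.

```c
#include <assert.h>

/* Local stand-in for the kernel's roundup(): round x up to a multiple of y. */
#define ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Per-option estimate, following the roundup() variant in the hunk.
 * send_ndp_count and ackvec_enabled are 0/1 flags; ackvec_min_optlen
 * is an assumption supplied by the caller.
 */
static unsigned int headroom_per_option(unsigned int send_ndp_count,
					unsigned int ackvec_enabled,
					unsigned int ackvec_min_optlen)
{
	return ROUNDUP(1 + 6 + 10 + send_ndp_count * 8 + 6 +
		       (ackvec_enabled ? ackvec_min_optlen : 0), 4);
}

/* Fixed rough estimate, following the constant variant in the hunk. */
static unsigned int headroom_fixed(void)
{
	/* 42 / 4 truncates to 10, so the result is 40 */
	return ((5 + 6 + 10 + 6 + 6 + 6 + 3) / 4) * 4;
}
```

With no NDP count and no Ack Vector, the per-option estimate reserves 24 bytes (23 rounded up to a multiple of 4), whereas the fixed estimate always reserves 40 bytes.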
/* And store cached results */
/* And store cached results */
icsk
->
icsk_pmtu_cookie
=
pmtu
;
icsk
->
icsk_pmtu_cookie
=
pmtu
;
...
@@ -208,158 +200,95 @@ void dccp_write_space(struct sock *sk)
}
}
/**
/**
* dccp_wait_for_ccid
- Await CCID send permission
* dccp_wait_for_ccid
- Wait for ccid to tell us we can send a packet
* @sk: socket to wait for
* @sk: socket to wait for
* @delay: timeout in jiffies
* @skb: current skb to pass on for waiting
* This is used by CCIDs which need to delay the send time in process context.
* @delay: sleep timeout in milliseconds (> 0)
* This function is called by default when the socket is closed, and
* when a non-zero linger time is set on the socket. For consistency
*/
*/
static
int
dccp_wait_for_ccid
(
struct
sock
*
sk
,
unsigned
long
delay
)
static
int
dccp_wait_for_ccid
(
struct
sock
*
sk
,
struct
sk_buff
*
skb
,
int
delay
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
DEFINE_WAIT
(
wait
);
DEFINE_WAIT
(
wait
);
long
remaining
;
unsigned
long
jiffdelay
;
int
rc
;
prepare_to_wait
(
sk
->
sk_sleep
,
&
wait
,
TASK_INTERRUPTIBLE
);
sk
->
sk_write_pending
++
;
release_sock
(
sk
);
remaining
=
schedule_timeout
(
delay
);
do
{
dccp_pr_debug
(
"delayed send by %d msec
\n
"
,
delay
);
lock_sock
(
sk
);
jiffdelay
=
msecs_to_jiffies
(
delay
);
sk
->
sk_write_pending
--
;
finish_wait
(
sk
->
sk_sleep
,
&
wait
);
if
(
signal_pending
(
current
)
||
sk
->
sk_err
)
prepare_to_wait
(
sk
->
sk_sleep
,
&
wait
,
TASK_INTERRUPTIBLE
);
return
-
1
;
return
remaining
;
}
/**
* dccp_xmit_packet - Send data packet under control of CCID
* Transmits next-queued payload and informs CCID to account for the packet.
*/
static
void
dccp_xmit_packet
(
struct
sock
*
sk
)
{
int
err
,
len
;
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
sk_buff
*
skb
=
dccp_qpolicy_pop
(
sk
);
if
(
unlikely
(
skb
==
NULL
))
sk
->
sk_write_pending
++
;
return
;
release_sock
(
sk
);
len
=
skb
->
len
;
schedule_timeout
(
jiffdelay
);
lock_sock
(
sk
);
sk
->
sk_write_pending
--
;
if
(
sk
->
sk_state
==
DCCP_PARTOPEN
)
{
if
(
sk
->
sk_err
)
const
u32
cur_mps
=
dp
->
dccps_mss_cache
-
DCCP_FEATNEG_OVERHEAD
;
goto
do_error
;
/*
if
(
signal_pending
(
current
))
* See 8.1.5 - Handshake Completion.
goto
do_interrupted
;
*
* For robustness we resend Confirm options until the client has
* entered OPEN. During the initial feature negotiation, the MPS
* is smaller than usual, reduced by the Change/Confirm options.
*/
if
(
!
list_empty
(
&
dp
->
dccps_featneg
)
&&
len
>
cur_mps
)
{
DCCP_WARN
(
"Payload too large (%d) for featneg.
\n
"
,
len
);
dccp_send_ack
(
sk
);
dccp_feat_list_purge
(
&
dp
->
dccps_featneg
);
}
inet_csk_schedule_ack
(
sk
);
rc
=
ccid_hc_tx_send_packet
(
dp
->
dccps_hc_tx_ccid
,
sk
,
skb
);
inet_csk_reset_xmit_timer
(
sk
,
ICSK_TIME_DACK
,
}
while
((
delay
=
rc
)
>
0
);
inet_csk
(
sk
)
->
icsk_rto
,
out:
DCCP_RTO_MAX
);
finish_wait
(
sk
->
sk_sleep
,
&
wait
);
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_DATAACK
;
return
rc
;
}
else
if
(
dccp_ack_pending
(
sk
))
{
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_DATAACK
;
do_error:
}
else
{
rc
=
-
EPIPE
;
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_DATA
;
goto
out
;
}
do_interrupted:
rc
=
-
EINTR
;
err
=
dccp_transmit_skb
(
sk
,
skb
);
goto
out
;
if
(
err
)
dccp_pr_debug
(
"transmit_skb() returned err=%d
\n
"
,
err
);
/*
* Register this one as sent even if an error occurred. To the remote
* end a local packet drop is indistinguishable from network loss, i.e.
* any local drop will eventually be reported via receiver feedback.
*/
ccid_hc_tx_packet_sent
(
dp
->
dccps_hc_tx_ccid
,
sk
,
len
);
/*
* If the CCID needs to transfer additional header options out-of-band
* (e.g. Ack Vectors or feature-negotiation options), it activates this
* flag to schedule a Sync. The Sync will automatically incorporate all
* currently pending header options, thus clearing the backlog.
*/
if
(
dp
->
dccps_sync_scheduled
)
dccp_send_sync
(
sk
,
dp
->
dccps_gsr
,
DCCP_PKT_SYNC
);
}
}
/**
void
dccp_write_xmit
(
struct
sock
*
sk
,
int
block
)
* dccp_flush_write_queue - Drain queue at end of connection
* Since dccp_sendmsg queues packets without waiting for them to be sent, it may
* happen that the TX queue is not empty at the end of a connection. We give the
* HC-sender CCID a grace period of up to @time_budget jiffies. If this function
* returns with a non-empty write queue, it will be purged later.
*/
void
dccp_flush_write_queue
(
struct
sock
*
sk
,
long
*
time_budget
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
sk_buff
*
skb
;
struct
sk_buff
*
skb
;
long
delay
,
rc
;
while
(
*
time_budget
>
0
&&
(
skb
=
skb_peek
(
&
sk
->
sk_write_queue
)))
{
rc
=
ccid_hc_tx_send_packet
(
dp
->
dccps_hc_tx_ccid
,
sk
,
skb
);
switch
(
ccid_packet_dequeue_eval
(
rc
))
{
while
((
skb
=
skb_peek
(
&
sk
->
sk_write_queue
)))
{
case
CCID_PACKET_WILL_DEQUEUE_LATER
:
int
err
=
ccid_hc_tx_send_packet
(
dp
->
dccps_hc_tx_ccid
,
sk
,
skb
);
/*
* If the CCID determines when to send, the next sending
if
(
err
>
0
)
{
* time is unknown or the CCID may not even send again
if
(
!
block
)
{
* (e.g. remote host crashes or lost Ack packets).
sk_reset_timer
(
sk
,
&
dp
->
dccps_xmit_timer
,
*/
msecs_to_jiffies
(
err
)
+
jiffies
);
DCCP_WARN
(
"CCID did not manage to send all packets
\n
"
);
break
;
return
;
}
else
case
CCID_PACKET_DELAY
:
err
=
dccp_wait_for_ccid
(
sk
,
skb
,
err
);
delay
=
msecs_to_jiffies
(
rc
);
if
(
err
&&
err
!=
-
EINTR
)
if
(
delay
>
*
time_budget
)
DCCP_BUG
(
"err=%d after dccp_wait_for_ccid"
,
err
);
return
;
rc
=
dccp_wait_for_ccid
(
sk
,
delay
);
if
(
rc
<
0
)
return
;
*
time_budget
-=
(
delay
-
rc
);
/* check again if we can send now */
break
;
case
CCID_PACKET_SEND_AT_ONCE
:
dccp_xmit_packet
(
sk
);
break
;
case
CCID_PACKET_ERR
:
skb_dequeue
(
&
sk
->
sk_write_queue
);
kfree_skb
(
skb
);
dccp_pr_debug
(
"packet discarded due to err=%ld
\n
"
,
rc
);
}
}
}
}
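The grace-period accounting used by dccp_flush_write_queue() can be sketched in isolation. `charge_wait` is a hypothetical helper, not a function from this patch. It assumes, as the hunk suggests, that the wait returns the unslept remainder in jiffies, or a negative value on error or signal.

```c
#include <assert.h>

/*
 * Charge one CCID-imposed wait against the remaining time budget.
 *   delay:     how long the CCID asked the sender to wait (jiffies)
 *   remaining: what the wait returned; < 0 means error or signal
 * Returns 1 if the caller may try to send again, 0 if it must stop.
 *
 * In the patch the first check runs before sleeping; both checks are
 * folded into one call here purely for illustration.
 */
static int charge_wait(long *time_budget, long delay, long remaining)
{
	if (delay > *time_budget)	/* would overrun the grace period */
		return 0;
	if (remaining < 0)		/* interrupted or socket error */
		return 0;
	*time_budget -= (delay - remaining);	/* charge only the time slept */
	return 1;
}
```

Charging `delay - remaining` rather than `delay` means a wait that is woken early costs only the time actually spent sleeping.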
void
dccp_write_xmit
(
struct
sock
*
sk
)
skb_dequeue
(
&
sk
->
sk_write_queue
);
{
if
(
err
==
0
)
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_skb_cb
*
dcb
=
DCCP_SKB_CB
(
skb
);
struct
sk_buff
*
skb
;
const
int
len
=
skb
->
len
;
while
((
skb
=
dccp_qpolicy_top
(
sk
)))
{
if
(
sk
->
sk_state
==
DCCP_PARTOPEN
)
{
int
rc
=
ccid_hc_tx_send_packet
(
dp
->
dccps_hc_tx_ccid
,
sk
,
skb
);
/* See 8.1.5. Handshake Completion */
inet_csk_schedule_ack
(
sk
);
switch
(
ccid_packet_dequeue_eval
(
rc
))
{
inet_csk_reset_xmit_timer
(
sk
,
ICSK_TIME_DACK
,
case
CCID_PACKET_WILL_DEQUEUE_LATER
:
inet_csk
(
sk
)
->
icsk_rto
,
return
;
DCCP_RTO_MAX
);
case
CCID_PACKET_DELAY
:
dcb
->
dccpd_type
=
DCCP_PKT_DATAACK
;
sk_reset_timer
(
sk
,
&
dp
->
dccps_xmit_timer
,
}
else
if
(
dccp_ack_pending
(
sk
))
jiffies
+
msecs_to_jiffies
(
rc
));
dcb
->
dccpd_type
=
DCCP_PKT_DATAACK
;
return
;
else
case
CCID_PACKET_SEND_AT_ONCE
:
dcb
->
dccpd_type
=
DCCP_PKT_DATA
;
dccp_xmit_packet
(
sk
);
break
;
err
=
dccp_transmit_skb
(
sk
,
skb
);
case
CCID_PACKET_ERR
:
ccid_hc_tx_packet_sent
(
dp
->
dccps_hc_tx_ccid
,
sk
,
0
,
len
);
dccp_qpolicy_drop
(
sk
,
skb
);
if
(
err
)
dccp_pr_debug
(
"packet discarded due to err=%d
\n
"
,
rc
);
DCCP_BUG
(
"err=%d after ccid_hc_tx_packet_sent"
,
err
);
}
else
{
dccp_pr_debug
(
"packet discarded due to err=%d
\n
"
,
err
);
kfree_skb
(
skb
);
}
}
}
}
}
}
...
@@ -410,12 +339,10 @@ struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst,
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_RESPONSE
;
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_RESPONSE
;
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
=
dreq
->
dreq_iss
;
DCCP_SKB_CB
(
skb
)
->
dccpd_seq
=
dreq
->
dreq_iss
;
/* Resolve feature dependencies resulting from choice of CCID */
if
(
dccp_insert_options_rsk
(
dreq
,
skb
))
{
if
(
dccp_feat_server_ccid_dependencies
(
dreq
))
kfree_skb
(
skb
);
goto
response_failed
;
return
NULL
;
}
if
(
dccp_insert_options_rsk
(
dreq
,
skb
))
goto
response_failed
;
/* Build and checksum header */
/* Build and checksum header */
dh
=
dccp_zeroed_hdr
(
skb
,
dccp_header_size
);
dh
=
dccp_zeroed_hdr
(
skb
,
dccp_header_size
);
...
@@ -436,9 +363,6 @@ struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst,
inet_rsk
(
req
)
->
acked
=
1
;
inet_rsk
(
req
)
->
acked
=
1
;
DCCP_INC_STATS
(
DCCP_MIB_OUTSEGS
);
DCCP_INC_STATS
(
DCCP_MIB_OUTSEGS
);
return
skb
;
return
skb
;
response_failed:
kfree_skb
(
skb
);
return
NULL
;
}
}
EXPORT_SYMBOL_GPL
(
dccp_make_response
);
EXPORT_SYMBOL_GPL
(
dccp_make_response
);
...
@@ -523,9 +447,8 @@ int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code)
/*
/*
* Do all connect socket setups that can be done AF independent.
* Do all connect socket setups that can be done AF independent.
*/
*/
int
dccp_connec
t
(
struct
sock
*
sk
)
static
inline
void
dccp_connect_ini
t
(
struct
sock
*
sk
)
{
{
struct
sk_buff
*
skb
;
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dst_entry
*
dst
=
__sk_dst_get
(
sk
);
struct
dst_entry
*
dst
=
__sk_dst_get
(
sk
);
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
...
@@ -535,13 +458,19 @@ int dccp_connect(struct sock *sk)
dccp_sync_mss
(
sk
,
dst_mtu
(
dst
));
dccp_sync_mss
(
sk
,
dst_mtu
(
dst
));
/* do not connect if feature negotiation setup fails */
if
(
dccp_feat_finalise_settings
(
dccp_sk
(
sk
)))
return
-
EPROTO
;
/* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
/* Initialise GAR as per 8.5; AWL/AWH are set in dccp_transmit_skb() */
dp
->
dccps_gar
=
dp
->
dccps_iss
;
dp
->
dccps_gar
=
dp
->
dccps_iss
;
icsk
->
icsk_retransmits
=
0
;
}
int
dccp_connect
(
struct
sock
*
sk
)
{
struct
sk_buff
*
skb
;
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
dccp_connect_init
(
sk
);
skb
=
alloc_skb
(
sk
->
sk_prot
->
max_header
,
sk
->
sk_allocation
);
skb
=
alloc_skb
(
sk
->
sk_prot
->
max_header
,
sk
->
sk_allocation
);
if
(
unlikely
(
skb
==
NULL
))
if
(
unlikely
(
skb
==
NULL
))
return
-
ENOBUFS
;
return
-
ENOBUFS
;
...
@@ -551,11 +480,11 @@ int dccp_connect(struct sock *sk)
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_REQUEST
;
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_REQUEST
;
dccp_transmit_skb
(
sk
,
dccp_skb_entail
(
sk
,
skb
));
dccp_skb_entail
(
sk
,
skb
);
dccp_transmit_skb
(
sk
,
skb_clone
(
skb
,
GFP_KERNEL
));
DCCP_INC_STATS
(
DCCP_MIB_ACTIVEOPENS
);
DCCP_INC_STATS
(
DCCP_MIB_ACTIVEOPENS
);
/* Timer for repeating the REQUEST until an answer. */
/* Timer for repeating the REQUEST until an answer. */
icsk
->
icsk_retransmits
=
0
;
inet_csk_reset_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
,
inet_csk_reset_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
,
icsk
->
icsk_rto
,
DCCP_RTO_MAX
);
icsk
->
icsk_rto
,
DCCP_RTO_MAX
);
return
0
;
return
0
;
...
@@ -642,12 +571,6 @@ void dccp_send_sync(struct sock *sk, const u64 ackno,
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
pkt_type
;
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
pkt_type
;
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
=
ackno
;
DCCP_SKB_CB
(
skb
)
->
dccpd_ack_seq
=
ackno
;
/*
* Clear the flag in case the Sync was scheduled for out-of-band data,
* such as carrying a long Ack Vector.
*/
dccp_sk
(
sk
)
->
dccps_sync_scheduled
=
0
;
dccp_transmit_skb
(
sk
,
skb
);
dccp_transmit_skb
(
sk
,
skb
);
}
}
...
@@ -676,7 +599,9 @@ void dccp_send_close(struct sock *sk, const int active)
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_CLOSE
;
DCCP_SKB_CB
(
skb
)
->
dccpd_type
=
DCCP_PKT_CLOSE
;
if
(
active
)
{
if
(
active
)
{
skb
=
dccp_skb_entail
(
sk
,
skb
);
dccp_write_xmit
(
sk
,
1
);
dccp_skb_entail
(
sk
,
skb
);
dccp_transmit_skb
(
sk
,
skb_clone
(
skb
,
prio
));
/*
/*
* Retransmission timer for active-close: RFC 4340, 8.3 requires
* Retransmission timer for active-close: RFC 4340, 8.3 requires
* to retransmit the Close/CloseReq until the CLOSING/CLOSEREQ
* to retransmit the Close/CloseReq until the CLOSING/CLOSEREQ
...
@@ -689,6 +614,6 @@ void dccp_send_close(struct sock *sk, const int active)
*/
*/
inet_csk_reset_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
,
inet_csk_reset_xmit_timer
(
sk
,
ICSK_TIME_RETRANS
,
DCCP_TIMEOUT_INIT
,
DCCP_RTO_MAX
);
DCCP_TIMEOUT_INIT
,
DCCP_RTO_MAX
);
}
}
else
dccp_transmit_skb
(
sk
,
skb
);
dccp_transmit_skb
(
sk
,
skb
);
}
}
net/dccp/probe.c
...
@@ -46,54 +46,75 @@ static struct {
struct
kfifo
*
fifo
;
struct
kfifo
*
fifo
;
spinlock_t
lock
;
spinlock_t
lock
;
wait_queue_head_t
wait
;
wait_queue_head_t
wait
;
ktime_t
start
;
struct
timespec
t
start
;
}
dccpw
;
}
dccpw
;
static
void
jdccp_write_xmit
(
struct
sock
*
sk
)
static
void
printl
(
const
char
*
fmt
,
...
)
{
{
const
struct
inet_sock
*
inet
=
inet_sk
(
sk
);
va_list
args
;
struct
ccid3_hc_tx_sock
*
hctx
=
NULL
;
int
len
;
struct
timespec
tv
;
struct
timespec
now
;
char
buf
[
256
];
char
tbuf
[
256
];
int
len
,
ccid
=
ccid_get_current_tx_ccid
(
dccp_sk
(
sk
));
if
(
ccid
==
DCCPC_CCID3
)
va_start
(
args
,
fmt
);
hctx
=
ccid3_hc_tx_sk
(
sk
);
getnstimeofday
(
&
now
);
if
(
!
port
||
ntohs
(
inet
->
dport
)
==
port
||
ntohs
(
inet
->
sport
)
==
port
)
{
now
=
timespec_sub
(
now
,
dccpw
.
tstart
);
tv
=
ktime_to_timespec
(
ktime_sub
(
ktime_get
(),
dccpw
.
start
));
len
=
sprintf
(
tbuf
,
"%lu.%06lu "
,
len
=
sprintf
(
buf
,
"%lu.%09lu %d.%d.%d.%d:%u %d.%d.%d.%d:%u %d"
,
(
unsigned
long
)
now
.
tv_sec
,
(
unsigned
long
)
tv
.
tv_sec
,
(
unsigned
long
)
now
.
tv_nsec
/
NSEC_PER_USEC
);
(
unsigned
long
)
tv
.
tv_nsec
,
len
+=
vscnprintf
(
tbuf
+
len
,
sizeof
(
tbuf
)
-
len
,
fmt
,
args
);
NIPQUAD
(
inet
->
saddr
),
ntohs
(
inet
->
sport
),
va_end
(
args
);
NIPQUAD
(
inet
->
daddr
),
ntohs
(
inet
->
dport
),
ccid
);
kfifo_put
(
dccpw
.
fifo
,
tbuf
,
len
);
wake_up
(
&
dccpw
.
wait
);
}
static
int
jdccp_sendmsg
(
struct
kiocb
*
iocb
,
struct
sock
*
sk
,
struct
msghdr
*
msg
,
size_t
size
)
{
const
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
const
struct
inet_sock
*
inet
=
inet_sk
(
sk
);
const
struct
ccid3_hc_tx_sock
*
hctx
;
if
(
dmsk
->
dccpms_tx_ccid
==
DCCPC_CCID3
)
hctx
=
ccid3_hc_tx_sk
(
sk
);
else
hctx
=
NULL
;
if
(
port
==
0
||
ntohs
(
inet
->
dport
)
==
port
||
ntohs
(
inet
->
sport
)
==
port
)
{
if
(
hctx
)
if
(
hctx
)
len
+=
sprintf
(
buf
+
len
,
" %d %d %d %u %u %u %d"
,
printl
(
"%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %d %d %d %u "
hctx
->
s
,
hctx
->
rtt
,
hctx
->
p
,
hctx
->
x_calc
,
"%llu %llu %d
\n
"
,
(
unsigned
)(
hctx
->
x_recv
>>
6
),
NIPQUAD
(
inet
->
saddr
),
ntohs
(
inet
->
sport
),
(
unsigned
)(
hctx
->
x
>>
6
),
hctx
->
t_ipi
);
NIPQUAD
(
inet
->
daddr
),
ntohs
(
inet
->
dport
),
size
,
hctx
->
ccid3hctx_s
,
hctx
->
ccid3hctx_rtt
,
len
+=
sprintf
(
buf
+
len
,
"
\n
"
);
hctx
->
ccid3hctx_p
,
hctx
->
ccid3hctx_x_calc
,
kfifo_put
(
dccpw
.
fifo
,
buf
,
len
);
hctx
->
ccid3hctx_x_recv
>>
6
,
wake_up
(
&
dccpw
.
wait
);
hctx
->
ccid3hctx_x
>>
6
,
hctx
->
ccid3hctx_t_ipi
);
else
printl
(
"%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d
\n
"
,
NIPQUAD
(
inet
->
saddr
),
ntohs
(
inet
->
sport
),
NIPQUAD
(
inet
->
daddr
),
ntohs
(
inet
->
dport
),
size
);
}
}
jprobe_return
();
jprobe_return
();
return
0
;
}
}
static
struct
jprobe
dccp_send_probe
=
{
static
struct
jprobe
dccp_send_probe
=
{
.
kp
=
{
.
kp
=
{
.
symbol_name
=
"dccp_
write_xmit
"
,
.
symbol_name
=
"dccp_
sendmsg
"
,
},
},
.
entry
=
jdccp_
write_xmit
,
.
entry
=
jdccp_
sendmsg
,
};
};
static
int
dccpprobe_open
(
struct
inode
*
inode
,
struct
file
*
file
)
static
int
dccpprobe_open
(
struct
inode
*
inode
,
struct
file
*
file
)
{
{
kfifo_reset
(
dccpw
.
fifo
);
kfifo_reset
(
dccpw
.
fifo
);
dccpw
.
start
=
ktime_get
(
);
getnstimeofday
(
&
dccpw
.
tstart
);
return
0
;
return
0
;
}
}
...
net/dccp/proto.c
...
@@ -67,9 +67,6 @@ void dccp_set_state(struct sock *sk, const int state)
case
DCCP_OPEN
:
case
DCCP_OPEN
:
if
(
oldstate
!=
DCCP_OPEN
)
if
(
oldstate
!=
DCCP_OPEN
)
DCCP_INC_STATS
(
DCCP_MIB_CURRESTAB
);
DCCP_INC_STATS
(
DCCP_MIB_CURRESTAB
);
/* Client retransmits all Confirm options until entering OPEN */
if
(
oldstate
==
DCCP_PARTOPEN
)
dccp_feat_list_purge
(
&
dccp_sk
(
sk
)
->
dccps_featneg
);
break
;
break
;
case
DCCP_CLOSED
:
case
DCCP_CLOSED
:
...
@@ -178,25 +175,63 @@ EXPORT_SYMBOL_GPL(dccp_state_name);
int
dccp_init_sock
(
struct
sock
*
sk
,
const
__u8
ctl_sock_initialized
)
int
dccp_init_sock
(
struct
sock
*
sk
,
const
__u8
ctl_sock_initialized
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
dccp_minisock_init
(
&
dp
->
dccps_minisock
);
icsk
->
icsk_rto
=
DCCP_TIMEOUT_INIT
;
icsk
->
icsk_rto
=
DCCP_TIMEOUT_INIT
;
icsk
->
icsk_syn_retries
=
sysctl_dccp_request_retries
;
icsk
->
icsk_syn_retries
=
sysctl_dccp_request_retries
;
sk
->
sk_state
=
DCCP_CLOSED
;
sk
->
sk_state
=
DCCP_CLOSED
;
sk
->
sk_write_space
=
dccp_write_space
;
sk
->
sk_write_space
=
dccp_write_space
;
icsk
->
icsk_sync_mss
=
dccp_sync_mss
;
icsk
->
icsk_sync_mss
=
dccp_sync_mss
;
dp
->
dccps_mss_cache
=
TCP_MIN_RCVMSS
;
dp
->
dccps_mss_cache
=
536
;
dp
->
dccps_rate_last
=
jiffies
;
dp
->
dccps_rate_last
=
jiffies
;
dp
->
dccps_role
=
DCCP_ROLE_UNDEFINED
;
dp
->
dccps_role
=
DCCP_ROLE_UNDEFINED
;
dp
->
dccps_service
=
DCCP_SERVICE_CODE_IS_ABSENT
;
dp
->
dccps_service
=
DCCP_SERVICE_CODE_IS_ABSENT
;
dp
->
dccps_
tx_qlen
=
sysctl_dccp_tx_qlen
;
dp
->
dccps_
l_ack_ratio
=
dp
->
dccps_r_ack_ratio
=
1
;
dccp_init_xmit_timers
(
sk
);
dccp_init_xmit_timers
(
sk
);
INIT_LIST_HEAD
(
&
dp
->
dccps_featneg
);
/*
/* control socket doesn't need feat nego */
* FIXME: We're hardcoding the CCID, and doing this at this point makes
if
(
likely
(
ctl_sock_initialized
))
* the listening (master) sock get CCID control blocks, which is not
return
dccp_feat_init
(
sk
);
* necessary, but for now, to not mess with the test userspace apps,
* lets leave it here, later the real solution is to do this in a
* setsockopt(CCIDs-I-want/accept). -acme
*/
if
(
likely
(
ctl_sock_initialized
))
{
int
rc
=
dccp_feat_init
(
dmsk
);
if
(
rc
)
return
rc
;
if
(
dmsk
->
dccpms_send_ack_vector
)
{
dp
->
dccps_hc_rx_ackvec
=
dccp_ackvec_alloc
(
GFP_KERNEL
);
if
(
dp
->
dccps_hc_rx_ackvec
==
NULL
)
return
-
ENOMEM
;
}
dp
->
dccps_hc_rx_ccid
=
ccid_hc_rx_new
(
dmsk
->
dccpms_rx_ccid
,
sk
,
GFP_KERNEL
);
dp
->
dccps_hc_tx_ccid
=
ccid_hc_tx_new
(
dmsk
->
dccpms_tx_ccid
,
sk
,
GFP_KERNEL
);
if
(
unlikely
(
dp
->
dccps_hc_rx_ccid
==
NULL
||
dp
->
dccps_hc_tx_ccid
==
NULL
))
{
ccid_hc_rx_delete
(
dp
->
dccps_hc_rx_ccid
,
sk
);
ccid_hc_tx_delete
(
dp
->
dccps_hc_tx_ccid
,
sk
);
if
(
dmsk
->
dccpms_send_ack_vector
)
{
dccp_ackvec_free
(
dp
->
dccps_hc_rx_ackvec
);
dp
->
dccps_hc_rx_ackvec
=
NULL
;
}
dp
->
dccps_hc_rx_ccid
=
dp
->
dccps_hc_tx_ccid
=
NULL
;
return
-
ENOMEM
;
}
}
else
{
/* control socket doesn't need feat nego */
INIT_LIST_HEAD
(
&
dmsk
->
dccpms_pending
);
INIT_LIST_HEAD
(
&
dmsk
->
dccpms_conf
);
}
return
0
;
return
0
;
}
}
...
@@ -205,6 +240,7 @@ EXPORT_SYMBOL_GPL(dccp_init_sock);
void
dccp_destroy_sock
(
struct
sock
*
sk
)
void
dccp_destroy_sock
(
struct
sock
*
sk
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_minisock
*
dmsk
=
dccp_msk
(
sk
);
/*
/*
* DCCP doesn't use sk_write_queue, just sk_send_head
* DCCP doesn't use sk_write_queue, just sk_send_head
...
@@ -222,7 +258,7 @@ void dccp_destroy_sock(struct sock *sk)
kfree
(
dp
->
dccps_service_list
);
kfree
(
dp
->
dccps_service_list
);
dp
->
dccps_service_list
=
NULL
;
dp
->
dccps_service_list
=
NULL
;
if
(
d
p
->
dccps_hc_rx_ackvec
!=
NULL
)
{
if
(
d
msk
->
dccpms_send_ack_vector
)
{
dccp_ackvec_free
(
dp
->
dccps_hc_rx_ackvec
);
dccp_ackvec_free
(
dp
->
dccps_hc_rx_ackvec
);
dp
->
dccps_hc_rx_ackvec
=
NULL
;
dp
->
dccps_hc_rx_ackvec
=
NULL
;
}
}
...
@@ -231,7 +267,7 @@ void dccp_destroy_sock(struct sock *sk)
dp
->
dccps_hc_rx_ccid
=
dp
->
dccps_hc_tx_ccid
=
NULL
;
dp
->
dccps_hc_rx_ccid
=
dp
->
dccps_hc_tx_ccid
=
NULL
;
/* clean up feature negotiation state */
/* clean up feature negotiation state */
dccp_feat_
list_purge
(
&
dp
->
dccps_featneg
);
dccp_feat_
clean
(
dmsk
);
}
}
EXPORT_SYMBOL_GPL
(
dccp_destroy_sock
);
EXPORT_SYMBOL_GPL
(
dccp_destroy_sock
);
...
@@ -241,9 +277,6 @@ static inline int dccp_listen_start(struct sock *sk, int backlog)
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
dp
->
dccps_role
=
DCCP_ROLE_LISTEN
;
dp
->
dccps_role
=
DCCP_ROLE_LISTEN
;
/* do not start to listen if feature negotiation setup fails */
if
(
dccp_feat_finalise_settings
(
dp
))
return
-
EPROTO
;
return
inet_csk_listen_start
(
sk
,
backlog
);
return
inet_csk_listen_start
(
sk
,
backlog
);
}
}
...
@@ -433,70 +466,42 @@ static int dccp_setsockopt_service(struct sock *sk, const __be32 service,
return
0
;
return
0
;
}
}
static
int
dccp_setsockopt_cscov
(
struct
sock
*
sk
,
int
cscov
,
bool
rx
)
/* byte 1 is feature. the rest is the preference list */
static
int
dccp_setsockopt_change
(
struct
sock
*
sk
,
int
type
,
struct
dccp_so_feat
__user
*
optval
)
{
{
u8
*
list
,
len
;
struct
dccp_so_feat
opt
;
int
i
,
rc
;
u8
*
val
;
int
rc
;
if
(
c
scov
<
0
||
cscov
>
15
)
if
(
c
opy_from_user
(
&
opt
,
optval
,
sizeof
(
opt
))
)
return
-
E
INVAL
;
return
-
E
FAULT
;
/*
/*
* Populate a list of permissible values, in the range cscov...15. This
* rfc4340: 6.1. Change Options
* is necessary since feature negotiation of single values only works if
* both sides incidentally choose the same value. Since the list starts
* lowest-value first, negotiation will pick the smallest shared value.
*/
*/
if
(
cscov
==
0
)
if
(
opt
.
dccpsf_len
<
1
)
return
0
;
len
=
16
-
cscov
;
list
=
kmalloc
(
len
,
GFP_KERNEL
);
if
(
list
==
NULL
)
return
-
ENOBUFS
;
for
(
i
=
0
;
i
<
len
;
i
++
)
list
[
i
]
=
cscov
++
;
rc
=
dccp_feat_register_sp
(
sk
,
DCCPF_MIN_CSUM_COVER
,
rx
,
list
,
len
);
if
(
rc
==
0
)
{
if
(
rx
)
dccp_sk
(
sk
)
->
dccps_pcrlen
=
cscov
;
else
dccp_sk
(
sk
)
->
dccps_pcslen
=
cscov
;
}
kfree
(
list
);
return
rc
;
}
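The list construction described in the comment of dccp_setsockopt_cscov() can be sketched outside the kernel as follows. Both function names are hypothetical, and `first_shared` is only a simplified model of picking a common value from two preference lists; it is not DCCP's actual feature-negotiation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill list[] with cscov, cscov + 1, ..., 15 and return the number of
 * entries. Returns 0 when cscov is 0 (nothing to negotiate) or out of
 * range. list must have room for 15 entries.
 */
static size_t cscov_build_list(int cscov, uint8_t *list)
{
	size_t i, len;

	if (cscov <= 0 || cscov > 15)
		return 0;
	len = 16 - cscov;
	for (i = 0; i < len; i++)
		list[i] = (uint8_t)(cscov + i);
	return len;
}

/* First entry of a[] that also occurs in b[], or -1 if none is shared. */
static int first_shared(const uint8_t *a, size_t alen,
			const uint8_t *b, size_t blen)
{
	size_t i, j;

	for (i = 0; i < alen; i++)
		for (j = 0; j < blen; j++)
			if (a[i] == b[j])
				return a[i];
	return -1;
}
```

Because each list is ascending, the first shared entry is also the smallest shared value. The sketch indexes with `i` instead of incrementing `cscov`, so the caller's value is left untouched.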
static
int
dccp_setsockopt_ccid
(
struct
sock
*
sk
,
int
type
,
char
__user
*
optval
,
int
optlen
)
{
u8
*
val
;
int
rc
=
0
;
if
(
optlen
<
1
||
optlen
>
DCCP_FEAT_MAX_SP_VALS
)
return
-
EINVAL
;
return
-
EINVAL
;
val
=
kmalloc
(
optlen
,
GFP_KERNEL
);
val
=
kmalloc
(
opt
.
dccpsf_
len
,
GFP_KERNEL
);
if
(
val
==
NULL
)
if
(
!
val
)
return
-
ENOMEM
;
return
-
ENOMEM
;
if
(
copy_from_user
(
val
,
opt
val
,
opt
len
))
{
if
(
copy_from_user
(
val
,
opt
.
dccpsf_val
,
opt
.
dccpsf_
len
))
{
kfree
(
val
)
;
rc
=
-
EFAULT
;
return
-
EFAULT
;
goto
out_free_val
;
}
}
lock_sock
(
sk
);
rc
=
dccp_feat_change
(
dccp_msk
(
sk
),
type
,
opt
.
dccpsf_feat
,
if
(
type
==
DCCP_SOCKOPT_TX_CCID
||
type
==
DCCP_SOCKOPT_CCID
)
val
,
opt
.
dccpsf_len
,
GFP_KERNEL
);
rc
=
dccp_feat_register_sp
(
sk
,
DCCPF_CCID
,
1
,
val
,
optlen
);
if
(
rc
)
goto
out_free_val
;
if
(
!
rc
&&
(
type
==
DCCP_SOCKOPT_RX_CCID
||
type
==
DCCP_SOCKOPT_CCID
))
out:
rc
=
dccp_feat_register_sp
(
sk
,
DCCPF_CCID
,
0
,
val
,
optlen
);
return
rc
;
release_sock
(
sk
);
out_free_val:
kfree
(
val
);
kfree
(
val
);
return
rc
;
goto
out
;
}
}
static
int
do_dccp_setsockopt
(
struct
sock
*
sk
,
int
level
,
int
optname
,
static
int
do_dccp_setsockopt
(
struct
sock
*
sk
,
int
level
,
int
optname
,
...
@@ -505,21 +510,7 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname,
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
int
val
,
err
=
0
;
int
val
,
err
=
0
;
switch
(
optname
)
{
if
(
optlen
<
sizeof
(
int
))
case
DCCP_SOCKOPT_PACKET_SIZE
:
DCCP_WARN
(
"sockopt(PACKET_SIZE) is deprecated: fix your app
\n
"
);
return
0
;
case
DCCP_SOCKOPT_CHANGE_L
:
case
DCCP_SOCKOPT_CHANGE_R
:
DCCP_WARN
(
"sockopt(CHANGE_L/R) is deprecated: fix your app
\n
"
);
return
0
;
case
DCCP_SOCKOPT_CCID
:
case
DCCP_SOCKOPT_RX_CCID
:
case
DCCP_SOCKOPT_TX_CCID
:
return
dccp_setsockopt_ccid
(
sk
,
optname
,
optval
,
optlen
);
}
if
(
optlen
<
(
int
)
sizeof
(
int
))
return
-
EINVAL
;
return
-
EINVAL
;
if
(
get_user
(
val
,
(
int
__user
*
)
optval
))
if
(
get_user
(
val
,
(
int
__user
*
)
optval
))
...
@@ -530,38 +521,53 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname,
lock_sock
(
sk
);
lock_sock
(
sk
);
switch
(
optname
)
{
switch
(
optname
)
{
case
DCCP_SOCKOPT_PACKET_SIZE
:
DCCP_WARN
(
"sockopt(PACKET_SIZE) is deprecated: fix your app
\n
"
);
err
=
0
;
break
;
case
DCCP_SOCKOPT_CHANGE_L
:
if
(
optlen
!=
sizeof
(
struct
dccp_so_feat
))
err
=
-
EINVAL
;
else
err
=
dccp_setsockopt_change
(
sk
,
DCCPO_CHANGE_L
,
(
struct
dccp_so_feat
__user
*
)
optval
);
break
;
case
DCCP_SOCKOPT_CHANGE_R
:
if
(
optlen
!=
sizeof
(
struct
dccp_so_feat
))
err
=
-
EINVAL
;
else
err
=
dccp_setsockopt_change
(
sk
,
DCCPO_CHANGE_R
,
(
struct
dccp_so_feat
__user
*
)
optval
);
break
;
case
DCCP_SOCKOPT_SERVER_TIMEWAIT
:
case
DCCP_SOCKOPT_SERVER_TIMEWAIT
:
if
(
dp
->
dccps_role
!=
DCCP_ROLE_SERVER
)
if
(
dp
->
dccps_role
!=
DCCP_ROLE_SERVER
)
err
=
-
EOPNOTSUPP
;
err
=
-
EOPNOTSUPP
;
else
else
dp
->
dccps_server_timewait
=
(
val
!=
0
);
dp
->
dccps_server_timewait
=
(
val
!=
0
);
break
;
break
;
case
DCCP_SOCKOPT_SEND_CSCOV
:
case
DCCP_SOCKOPT_SEND_CSCOV
:
/* sender side, RFC 4340, sec. 9.2 */
err
=
dccp_setsockopt_cscov
(
sk
,
val
,
false
);
if
(
val
<
0
||
val
>
15
)
break
;
case
DCCP_SOCKOPT_RECV_CSCOV
:
err
=
dccp_setsockopt_cscov
(
sk
,
val
,
true
);
break
;
case
DCCP_SOCKOPT_QPOLICY_ID
:
if
(
sk
->
sk_state
!=
DCCP_CLOSED
)
err
=
-
EISCONN
;
else
if
(
val
<
0
||
val
>=
DCCPQ_POLICY_MAX
)
err
=
-
EINVAL
;
err
=
-
EINVAL
;
else
else
dp
->
dccps_
qpolicy
=
val
;
dp
->
dccps_
pcslen
=
val
;
break
;
break
;
case
DCCP_SOCKOPT_
QPOLICY_TXQLEN
:
case
DCCP_SOCKOPT_
RECV_CSCOV
:
/* receiver side, RFC 4340 sec. 9.2.1 */
if
(
val
<
0
)
if
(
val
<
0
||
val
>
15
)
err
=
-
EINVAL
;
err
=
-
EINVAL
;
else
else
{
dp
->
dccps_tx_qlen
=
val
;
dp
->
dccps_pcrlen
=
val
;
/* FIXME: add feature negotiation,
* ChangeL(MinimumChecksumCoverage, val) */
}
break
;
break
;
default:
default:
err
=
-
ENOPROTOOPT
;
err
=
-
ENOPROTOOPT
;
break
;
break
;
}
}
release_sock
(
sk
);
release_sock
(
sk
);
return
err
;
return
err
;
}
}
...
@@ -642,18 +648,6 @@ static int do_dccp_getsockopt(struct sock *sk, int level, int optname,
case
DCCP_SOCKOPT_GET_CUR_MPS
:
case
DCCP_SOCKOPT_GET_CUR_MPS
:
val
=
dp
->
dccps_mss_cache
;
val
=
dp
->
dccps_mss_cache
;
break
;
break
;
case
DCCP_SOCKOPT_AVAILABLE_CCIDS
:
return
ccid_getsockopt_builtin_ccids
(
sk
,
len
,
optval
,
optlen
);
case
DCCP_SOCKOPT_TX_CCID
:
val
=
ccid_get_current_tx_ccid
(
dp
);
if
(
val
<
0
)
return
-
ENOPROTOOPT
;
break
;
case
DCCP_SOCKOPT_RX_CCID
:
val
=
ccid_get_current_rx_ccid
(
dp
);
if
(
val
<
0
)
return
-
ENOPROTOOPT
;
break
;
case
DCCP_SOCKOPT_SERVER_TIMEWAIT
:
case
DCCP_SOCKOPT_SERVER_TIMEWAIT
:
val
=
dp
->
dccps_server_timewait
;
val
=
dp
->
dccps_server_timewait
;
break
;
break
;
...
@@ -663,12 +657,6 @@ static int do_dccp_getsockopt(struct sock *sk, int level, int optname,
case
DCCP_SOCKOPT_RECV_CSCOV
:
case
DCCP_SOCKOPT_RECV_CSCOV
:
val
=
dp
->
dccps_pcrlen
;
val
=
dp
->
dccps_pcrlen
;
break
;
break
;
case
DCCP_SOCKOPT_QPOLICY_ID
:
val
=
dp
->
dccps_qpolicy
;
break
;
case
DCCP_SOCKOPT_QPOLICY_TXQLEN
:
val
=
dp
->
dccps_tx_qlen
;
break
;
case
128
...
191
:
case
128
...
191
:
return
ccid_hc_rx_getsockopt
(
dp
->
dccps_hc_rx_ccid
,
sk
,
optname
,
return
ccid_hc_rx_getsockopt
(
dp
->
dccps_hc_rx_ccid
,
sk
,
optname
,
len
,
(
u32
__user
*
)
optval
,
optlen
);
len
,
(
u32
__user
*
)
optval
,
optlen
);
...
@@ -711,47 +699,6 @@ int compat_dccp_getsockopt(struct sock *sk, int level, int optname,
EXPORT_SYMBOL_GPL
(
compat_dccp_getsockopt
);
EXPORT_SYMBOL_GPL
(
compat_dccp_getsockopt
);
#endif
#endif
static int dccp_msghdr_parse(struct msghdr *msg, struct sk_buff *skb)
{
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);

	/*
	 * Assign an (opaque) qpolicy priority value to skb->priority.
	 *
	 * We are overloading this skb field for use with the qpolicy subsystem.
	 * The skb->priority is normally used for the SO_PRIORITY option, which
	 * is initialised from sk_priority. Since the assignment of sk_priority
	 * to skb->priority happens later (on layer 3), we overload this field
	 * for use with queueing priorities as long as the skb is on layer 4.
	 * The default priority value (if nothing is set) is 0.
	 */
	skb->priority = 0;

	for (; cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {

		if (!CMSG_OK(msg, cmsg))
			return -EINVAL;

		if (cmsg->cmsg_level != SOL_DCCP)
			continue;

		if (cmsg->cmsg_type <= DCCP_SCM_QPOLICY_MAX &&
		    !dccp_qpolicy_param_ok(skb->sk, cmsg->cmsg_type))
			return -EINVAL;

		switch (cmsg->cmsg_type) {
		case DCCP_SCM_PRIORITY:
			if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32)))
				return -EINVAL;
			skb->priority = *(__u32 *)CMSG_DATA(cmsg);
			break;
		default:
			return -EINVAL;
		}
	}
	return 0;
}
int
dccp_sendmsg
(
struct
kiocb
*
iocb
,
struct
sock
*
sk
,
struct
msghdr
*
msg
,
int
dccp_sendmsg
(
struct
kiocb
*
iocb
,
struct
sock
*
sk
,
struct
msghdr
*
msg
,
size_t
len
)
size_t
len
)
{
{
...
@@ -767,7 +714,8 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
lock_sock
(
sk
);
lock_sock
(
sk
);
if
(
dccp_qpolicy_full
(
sk
))
{
if
(
sysctl_dccp_tx_qlen
&&
(
sk
->
sk_write_queue
.
qlen
>=
sysctl_dccp_tx_qlen
))
{
rc
=
-
EAGAIN
;
rc
=
-
EAGAIN
;
goto
out_release
;
goto
out_release
;
}
}
...
@@ -795,12 +743,8 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
if
(
rc
!=
0
)
if
(
rc
!=
0
)
goto
out_discard
;
goto
out_discard
;
rc
=
dccp_msghdr_parse
(
msg
,
skb
);
skb_queue_tail
(
&
sk
->
sk_write_queue
,
skb
);
if
(
rc
!=
0
)
dccp_write_xmit
(
sk
,
0
);
goto
out_discard
;
dccp_qpolicy_push
(
sk
,
skb
);
dccp_write_xmit
(
sk
);
out_release:
out_release:
release_sock
(
sk
);
release_sock
(
sk
);
return
rc
?
:
len
;
return
rc
?
:
len
;
...
@@ -1023,22 +967,9 @@ void dccp_close(struct sock *sk, long timeout)
/* Check zero linger _after_ checking for unread data. */
/* Check zero linger _after_ checking for unread data. */
sk
->
sk_prot
->
disconnect
(
sk
,
0
);
sk
->
sk_prot
->
disconnect
(
sk
,
0
);
}
else
if
(
sk
->
sk_state
!=
DCCP_CLOSED
)
{
}
else
if
(
sk
->
sk_state
!=
DCCP_CLOSED
)
{
/*
* Normal connection termination. May need to wait if there are
* still packets in the TX queue that are delayed by the CCID.
*/
dccp_flush_write_queue
(
sk
,
&
timeout
);
dccp_terminate_connection
(
sk
);
dccp_terminate_connection
(
sk
);
}
}
/*
* Flush write queue. This may be necessary in several cases:
* - we have been closed by the peer but still have application data;
* - abortive termination (unread data or zero linger time),
* - normal termination but queue could not be flushed within time limit
*/
__skb_queue_purge
(
&
sk
->
sk_write_queue
);
sk_stream_wait_close
(
sk
,
timeout
);
sk_stream_wait_close
(
sk
,
timeout
);
adjudge_to_death:
adjudge_to_death:
...
...
net/dccp/qpolicy.c
deleted
100644 → 0
View file @
f8ef6e44
/*
* net/dccp/qpolicy.c
*
* Policy-based packet dequeueing interface for DCCP.
*
* Copyright (c) 2008 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License v2
* as published by the Free Software Foundation.
*/
#include "dccp.h"
/*
* Simple Dequeueing Policy:
* If tx_qlen is different from 0, enqueue up to tx_qlen elements.
*/
static void qpolicy_simple_push(struct sock *sk, struct sk_buff *skb)
{
	skb_queue_tail(&sk->sk_write_queue, skb);
}

static bool qpolicy_simple_full(struct sock *sk)
{
	return dccp_sk(sk)->dccps_tx_qlen &&
	       sk->sk_write_queue.qlen >= dccp_sk(sk)->dccps_tx_qlen;
}

static struct sk_buff *qpolicy_simple_top(struct sock *sk)
{
	return skb_peek(&sk->sk_write_queue);
}
/*
* Priority-based Dequeueing Policy:
* If tx_qlen is different from 0 and the queue has reached its upper bound
* of tx_qlen elements, replace older packets lowest-priority-first.
*/
static struct sk_buff *qpolicy_prio_best_skb(struct sock *sk)
{
	struct sk_buff *skb, *best = NULL;

	skb_queue_walk(&sk->sk_write_queue, skb)
		if (best == NULL || skb->priority > best->priority)
			best = skb;
	return best;
}

static struct sk_buff *qpolicy_prio_worst_skb(struct sock *sk)
{
	struct sk_buff *skb, *worst = NULL;

	skb_queue_walk(&sk->sk_write_queue, skb)
		if (worst == NULL || skb->priority < worst->priority)
			worst = skb;
	return worst;
}

static bool qpolicy_prio_full(struct sock *sk)
{
	if (qpolicy_simple_full(sk))
		dccp_qpolicy_drop(sk, qpolicy_prio_worst_skb(sk));
	return false;
}
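The selection rule used by the priority policy can be illustrated outside the kernel. The following is a minimal user-space sketch, not kernel code: it assumes the queue is a plain array of priorities, and the names `prio_best`, `prio_worst` and `demo_queue` are hypothetical. It mirrors the strict `>` and `<` comparisons above, so among equal priorities the element queued first wins.

```c
#include <stddef.h>

/* Example data only: priorities in queue order (oldest first). */
static const unsigned int demo_queue[] = { 3, 7, 1, 7, 1 };

/* Index of the first element with the highest priority, or -1 if empty.
 * This is the element the policy would dequeue next. */
static int prio_best(const unsigned int *prio, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (best < 0 || prio[i] > prio[best])
			best = (int)i;
	return best;
}

/* Index of the first element with the lowest priority, or -1 if empty.
 * This is the element the policy would drop when the queue is full. */
static int prio_worst(const unsigned int *prio, size_t n)
{
	int worst = -1;

	for (size_t i = 0; i < n; i++)
		if (worst < 0 || prio[i] < prio[worst])
			worst = (int)i;
	return worst;
}
```

Both walks are linear in the queue length, which matches the cost of the `skb_queue_walk()` loops they imitate.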
/**
* struct dccp_qpolicy_operations - TX Packet Dequeueing Interface
* @push: add a new @skb to the write queue
* @full: indicates that no more packets will be admitted
* @top: peeks at whatever the queueing policy defines as its `top'
*/
static struct dccp_qpolicy_operations {
	void		(*push)(struct sock *sk, struct sk_buff *skb);
	bool		(*full)(struct sock *sk);
	struct sk_buff	*(*top)(struct sock *sk);
	__be32		params;
} qpol_table[DCCPQ_POLICY_MAX] = {
	[DCCPQ_POLICY_SIMPLE] = {
		.push   = qpolicy_simple_push,
		.full   = qpolicy_simple_full,
		.top    = qpolicy_simple_top,
		.params = 0,
	},
	[DCCPQ_POLICY_PRIO] = {
		.push   = qpolicy_simple_push,
		.full   = qpolicy_prio_full,
		.top    = qpolicy_prio_best_skb,
		.params = DCCP_SCM_PRIORITY,
	},
};
/*
* Externally visible interface
*/
void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb)
{
	qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb);
}

bool dccp_qpolicy_full(struct sock *sk)
{
	return qpol_table[dccp_sk(sk)->dccps_qpolicy].full(sk);
}

void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb)
{
	if (skb != NULL) {
		skb_unlink(skb, &sk->sk_write_queue);
		kfree_skb(skb);
	}
}

struct sk_buff *dccp_qpolicy_top(struct sock *sk)
{
	return qpol_table[dccp_sk(sk)->dccps_qpolicy].top(sk);
}
struct sk_buff *dccp_qpolicy_pop(struct sock *sk)
{
	struct sk_buff *skb = dccp_qpolicy_top(sk);

	/* The queue may be empty, so test skb before touching it */
	if (skb != NULL) {
		/* Clear any skb fields that we used internally */
		skb->priority = 0;
		skb_unlink(skb, &sk->sk_write_queue);
	}
	return skb;
}
bool dccp_qpolicy_param_ok(struct sock *sk, __be32 param)
{
	/* check if exactly one bit is set */
	if (!param || (param & (param - 1)))
		return false;
	return (qpol_table[dccp_sk(sk)->dccps_qpolicy].params & param) == param;
}
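The "exactly one bit" test in `dccp_qpolicy_param_ok()` relies on the fact that `x & (x - 1)` clears the lowest set bit of `x`. A minimal user-space sketch of that check follows; the helper names `has_single_bit` and `param_ok` are hypothetical, and plain `uint32_t` stands in for the kernel's `__be32`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when exactly one bit of x is set.
 * x & (x - 1) clears the lowest set bit, so the result is zero only
 * when x had at most one bit set; zero itself is excluded explicitly. */
static bool has_single_bit(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical stand-in for the table lookup: a parameter is accepted
 * when it is a single bit and that bit is present in the policy mask. */
static bool param_ok(uint32_t policy_mask, uint32_t param)
{
	return has_single_bit(param) && (policy_mask & param) == param;
}
```

With a mask of 0, as the simple policy uses for `.params`, every parameter is rejected, which matches the intent that the simple policy accepts no per-packet parameters.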
net/dccp/sysctl.c
View file @
ded67c0e
...
@@ -18,72 +18,76 @@
...
@@ -18,72 +18,76 @@
#error This file should not be compiled without CONFIG_SYSCTL defined
#error This file should not be compiled without CONFIG_SYSCTL defined
#endif
#endif
/* Boundary values */
static
int
zero
=
0
,
u8_max
=
0xFF
;
static
unsigned
long
seqw_min
=
32
;
static
struct
ctl_table
dccp_default_table
[]
=
{
static
struct
ctl_table
dccp_default_table
[]
=
{
{
{
.
procname
=
"seq_window"
,
.
procname
=
"seq_window"
,
.
data
=
&
sysctl_dccp_sequence_window
,
.
data
=
&
sysctl_dccp_
feat_
sequence_window
,
.
maxlen
=
sizeof
(
sysctl_dccp_sequence_window
),
.
maxlen
=
sizeof
(
sysctl_dccp_
feat_
sequence_window
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_doulongvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
seqw_min
,
/* RFC 4340, 7.5.2 */
},
},
{
{
.
procname
=
"rx_ccid"
,
.
procname
=
"rx_ccid"
,
.
data
=
&
sysctl_dccp_rx_ccid
,
.
data
=
&
sysctl_dccp_
feat_
rx_ccid
,
.
maxlen
=
sizeof
(
sysctl_dccp_rx_ccid
),
.
maxlen
=
sizeof
(
sysctl_dccp_
feat_
rx_ccid
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
zero
,
.
extra2
=
&
u8_max
,
/* RFC 4340, 10. */
},
},
{
{
.
procname
=
"tx_ccid"
,
.
procname
=
"tx_ccid"
,
.
data
=
&
sysctl_dccp_tx_ccid
,
.
data
=
&
sysctl_dccp_feat_tx_ccid
,
.
maxlen
=
sizeof
(
sysctl_dccp_tx_ccid
),
.
maxlen
=
sizeof
(
sysctl_dccp_feat_tx_ccid
),
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec
,
},
{
.
procname
=
"ack_ratio"
,
.
data
=
&
sysctl_dccp_feat_ack_ratio
,
.
maxlen
=
sizeof
(
sysctl_dccp_feat_ack_ratio
),
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec
,
},
{
.
procname
=
"send_ackvec"
,
.
data
=
&
sysctl_dccp_feat_send_ack_vector
,
.
maxlen
=
sizeof
(
sysctl_dccp_feat_send_ack_vector
),
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec
,
},
{
.
procname
=
"send_ndp"
,
.
data
=
&
sysctl_dccp_feat_send_ndp_count
,
.
maxlen
=
sizeof
(
sysctl_dccp_feat_send_ndp_count
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
zero
,
.
extra2
=
&
u8_max
,
/* RFC 4340, 10. */
},
},
{
{
.
procname
=
"request_retries"
,
.
procname
=
"request_retries"
,
.
data
=
&
sysctl_dccp_request_retries
,
.
data
=
&
sysctl_dccp_request_retries
,
.
maxlen
=
sizeof
(
sysctl_dccp_request_retries
),
.
maxlen
=
sizeof
(
sysctl_dccp_request_retries
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
zero
,
.
extra2
=
&
u8_max
,
},
},
{
{
.
procname
=
"retries1"
,
.
procname
=
"retries1"
,
.
data
=
&
sysctl_dccp_retries1
,
.
data
=
&
sysctl_dccp_retries1
,
.
maxlen
=
sizeof
(
sysctl_dccp_retries1
),
.
maxlen
=
sizeof
(
sysctl_dccp_retries1
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
zero
,
.
extra2
=
&
u8_max
,
},
},
{
{
.
procname
=
"retries2"
,
.
procname
=
"retries2"
,
.
data
=
&
sysctl_dccp_retries2
,
.
data
=
&
sysctl_dccp_retries2
,
.
maxlen
=
sizeof
(
sysctl_dccp_retries2
),
.
maxlen
=
sizeof
(
sysctl_dccp_retries2
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
zero
,
.
extra2
=
&
u8_max
,
},
},
{
{
.
procname
=
"tx_qlen"
,
.
procname
=
"tx_qlen"
,
.
data
=
&
sysctl_dccp_tx_qlen
,
.
data
=
&
sysctl_dccp_tx_qlen
,
.
maxlen
=
sizeof
(
sysctl_dccp_tx_qlen
),
.
maxlen
=
sizeof
(
sysctl_dccp_tx_qlen
),
.
mode
=
0644
,
.
mode
=
0644
,
.
proc_handler
=
proc_dointvec_minmax
,
.
proc_handler
=
proc_dointvec
,
.
extra1
=
&
zero
,
},
},
{
{
.
procname
=
"sync_ratelimit"
,
.
procname
=
"sync_ratelimit"
,
...
...
net/dccp/timer.c
View file @
ded67c0e
...
@@ -87,6 +87,17 @@ static void dccp_retransmit_timer(struct sock *sk)
...
@@ -87,6 +87,17 @@ static void dccp_retransmit_timer(struct sock *sk)
{
{
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
struct
inet_connection_sock
*
icsk
=
inet_csk
(
sk
);
/* retransmit timer is used for feature negotiation throughout
* connection. In this case, no packet is re-transmitted, but rather an
* ack is generated and pending changes are placed into its options.
*/
if
(
sk
->
sk_send_head
==
NULL
)
{
dccp_pr_debug
(
"feat negotiation retransmit timeout %p
\n
"
,
sk
);
if
(
sk
->
sk_state
==
DCCP_OPEN
)
dccp_send_ack
(
sk
);
goto
backoff
;
}
/*
/*
* More than 4MSL (8 minutes) has passed, a RESET(aborted) was
* More than 4MSL (8 minutes) has passed, a RESET(aborted) was
* sent, no need to retransmit, this sock is dead.
* sent, no need to retransmit, this sock is dead.
...
@@ -115,6 +126,7 @@ static void dccp_retransmit_timer(struct sock *sk)
...
@@ -115,6 +126,7 @@ static void dccp_retransmit_timer(struct sock *sk)
return
;
return
;
}
}
backoff:
icsk
->
icsk_backoff
++
;
icsk
->
icsk_backoff
++
;
icsk
->
icsk_rto
=
min
(
icsk
->
icsk_rto
<<
1
,
DCCP_RTO_MAX
);
icsk
->
icsk_rto
=
min
(
icsk
->
icsk_rto
<<
1
,
DCCP_RTO_MAX
);
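The two statements at the `backoff` label implement a capped exponential backoff: each timeout doubles the retransmission timeout until it reaches a ceiling. A minimal sketch of that update rule, assuming unsigned 32-bit tick counts and a hypothetical function name `rto_backoff` (the cap is passed as a parameter rather than taken from `DCCP_RTO_MAX`):

```c
#include <stdint.h>

/* Hypothetical helper: double the timeout, but never exceed rto_max.
 * Assumes rto is small enough that rto << 1 does not overflow. */
static uint32_t rto_backoff(uint32_t rto, uint32_t rto_max)
{
	uint32_t doubled = rto << 1;

	return doubled < rto_max ? doubled : rto_max;
}
```

Starting from 100 ticks with a cap of 1000, successive timeouts would yield 200, 400, 800, 1000, 1000, and so on.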
...
@@ -237,35 +249,32 @@ static void dccp_delack_timer(unsigned long data)
...
@@ -237,35 +249,32 @@ static void dccp_delack_timer(unsigned long data)
sock_put
(
sk
);
sock_put
(
sk
);
}
}
/**
/* Transmit-delay timer: used by the CCIDs to delay actual send time */
* dccp_write_xmitlet - Workhorse for CCID packet dequeueing interface
static
void
dccp_write_xmit_timer
(
unsigned
long
data
)
* See the comments above %ccid_dequeueing_decision for supported modes.
*/
static
void
dccp_write_xmitlet
(
unsigned
long
data
)
{
{
struct
sock
*
sk
=
(
struct
sock
*
)
data
;
struct
sock
*
sk
=
(
struct
sock
*
)
data
;
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
bh_lock_sock
(
sk
);
bh_lock_sock
(
sk
);
if
(
sock_owned_by_user
(
sk
))
if
(
sock_owned_by_user
(
sk
))
sk_reset_timer
(
sk
,
&
d
ccp_sk
(
sk
)
->
dccps_xmit_timer
,
jiffies
+
1
);
sk_reset_timer
(
sk
,
&
d
p
->
dccps_xmit_timer
,
jiffies
+
1
);
else
else
dccp_write_xmit
(
sk
);
dccp_write_xmit
(
sk
,
0
);
bh_unlock_sock
(
sk
);
bh_unlock_sock
(
sk
);
sock_put
(
sk
);
}
}
static
void
dccp_
write_xmit_timer
(
unsigned
long
data
)
static
void
dccp_
init_write_xmit_timer
(
struct
sock
*
sk
)
{
{
dccp_write_xmitlet
(
data
);
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
sock_put
((
struct
sock
*
)
data
);
setup_timer
(
&
dp
->
dccps_xmit_timer
,
dccp_write_xmit_timer
,
(
unsigned
long
)
sk
);
}
}
void
dccp_init_xmit_timers
(
struct
sock
*
sk
)
void
dccp_init_xmit_timers
(
struct
sock
*
sk
)
{
{
struct
dccp_sock
*
dp
=
dccp_sk
(
sk
);
dccp_init_write_xmit_timer
(
sk
);
tasklet_init
(
&
dp
->
dccps_xmitlet
,
dccp_write_xmitlet
,
(
unsigned
long
)
sk
);
setup_timer
(
&
dp
->
dccps_xmit_timer
,
dccp_write_xmit_timer
,
(
unsigned
long
)
sk
);
inet_csk_init_xmit_timers
(
sk
,
&
dccp_write_timer
,
&
dccp_delack_timer
,
inet_csk_init_xmit_timers
(
sk
,
&
dccp_write_timer
,
&
dccp_delack_timer
,
&
dccp_keepalive_timer
);
&
dccp_keepalive_timer
);
}
}
...
@@ -281,7 +290,8 @@ u32 dccp_timestamp(void)
...
@@ -281,7 +290,8 @@ u32 dccp_timestamp(void)
{
{
s64
delta
=
ktime_us_delta
(
ktime_get_real
(),
dccp_timestamp_seed
);
s64
delta
=
ktime_us_delta
(
ktime_get_real
(),
dccp_timestamp_seed
);
return
div_u64
(
delta
,
DCCP_TIME_RESOLUTION
);
do_div
(
delta
,
10
);
return
delta
;
}
}
EXPORT_SYMBOL_GPL
(
dccp_timestamp
);
EXPORT_SYMBOL_GPL
(
dccp_timestamp
);
...
...
net/ipv4/tcp_input.c
View file @
ded67c0e
...
@@ -811,12 +811,25 @@ void tcp_update_metrics(struct sock *sk)
...
@@ -811,12 +811,25 @@ void tcp_update_metrics(struct sock *sk)
}
}
}
}
/* Numbers are taken from RFC3390.
*
* John Heffner states:
*
* The RFC specifies a window of no more than 4380 bytes
* unless 2*MSS > 4380. Reading the pseudocode in the RFC
* is a bit misleading because they use a clamp at 4380 bytes
* rather than use a multiplier in the relevant range.
*/
__u32
tcp_init_cwnd
(
struct
tcp_sock
*
tp
,
struct
dst_entry
*
dst
)
__u32
tcp_init_cwnd
(
struct
tcp_sock
*
tp
,
struct
dst_entry
*
dst
)
{
{
__u32
cwnd
=
(
dst
?
dst_metric
(
dst
,
RTAX_INITCWND
)
:
0
);
__u32
cwnd
=
(
dst
?
dst_metric
(
dst
,
RTAX_INITCWND
)
:
0
);
if
(
!
cwnd
)
if
(
!
cwnd
)
{
cwnd
=
rfc3390_bytes_to_packets
(
tp
->
mss_cache
);
if
(
tp
->
mss_cache
>
1460
)
cwnd
=
2
;
else
cwnd
=
(
tp
->
mss_cache
>
1095
)
?
3
:
4
;
}
return
min_t
(
__u32
,
cwnd
,
tp
->
snd_cwnd_clamp
);
return
min_t
(
__u32
,
cwnd
,
tp
->
snd_cwnd_clamp
);
}
}
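The thresholds 1095 and 1460 in the hunk above follow from the RFC 3390 bound on the initial window, min(4*MSS, max(2*MSS, 4380 bytes)), once it is converted to whole packets. The sketch below is a user-space illustration with hypothetical function names, not the kernel implementation. It computes the packet count both ways so the two can be compared, and assumes a non-zero MSS.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* RFC 3390 upper bound on the initial window, in bytes. */
static uint32_t rfc3390_window_bytes(uint32_t mss)
{
	return min_u32(4 * mss, max_u32(2 * mss, 4380));
}

/* The same bound expressed in whole packets (mss must be non-zero). */
static uint32_t init_cwnd_from_formula(uint32_t mss)
{
	return rfc3390_window_bytes(mss) / mss;
}

/* The threshold form used in the hunk above. */
static uint32_t init_cwnd_from_thresholds(uint32_t mss)
{
	if (mss > 1460)
		return 2;
	return mss > 1095 ? 3 : 4;
}
```

For example, an MSS of 1460 gives 4380 / 1460 = 3 packets, while an MSS of 1461 already drops to 2, which is where the `> 1460` comparison comes from.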
...
...