Internet Engineering Task Force                             A. Ford, Ed.
Internet-Draft                                       Roke Manor Research
Intended status: Experimental                                  C. Raiciu
Expires: November 8, 2009                                     M. Handley
                                               University College London
                                                                S. Barre
                                        Universite catholique de Louvain
                                                             May 7, 2009

     TCP Extensions for Multipath Operation with Multiple Addresses
                   draft-ford-mptcp-multiaddressed-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 8, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Ford, et al.            Expires November 8, 2009                [Page 1]

Internet-Draft                Multipath TCP                     May 2009

Abstract

   Often endpoints are connected by multiple paths, but the nature of TCP/IP restricts communications to a single path per socket. Resource usage within the network would be more efficient were these multiple paths able to be used concurrently.
   This should enhance user experience through higher throughput and improved resilience to network failure. This document presents extensions to TCP in order to transparently provide this multipath functionality at the transport layer, if at least one endpoint is multi-addressed.

Table of Contents

   1.  Introduction
     1.1.  Motivations
     1.2.  Design Assumptions
     1.3.  Layered Representation
     1.4.  Operation Summary
     1.5.  Open Issues
     1.6.  Requirements Language
   2.  Terminology
   3.  Semantic Issues
   4.  MPTCP Protocol
     4.1.  Session Initiation
     4.2.  Address Knowledge Exchange (Path Management)
       4.2.1.  Explicit Path Management
         4.2.1.1.  Adding Addresses
         4.2.1.2.  Remove Address
       4.2.2.  Implicit Path Management
         4.2.2.1.  Request-SYN
         4.2.2.2.  Request-FIN (Remove Address)
     4.3.  Starting a New Subflow
     4.4.  General MPTCP Operation
       4.4.1.  Subflow Policy
       4.4.2.  Retransmissions
       4.4.3.  Resync Packet
     4.5.  Closing a Connection
     4.6.  Error Handling
   5.  Security Considerations
   6.  Interactions with Middleboxes
   7.  Interfaces
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Functional Separation
     A.1.  Motivations
     A.2.  TCP Performance
     A.3.  Architecture overview
     A.4.  PM/MPS interface
   Appendix B.  Notes on use of TCP Options
   Authors' Addresses

1.  Introduction

   This section describes the motivations behind the design of Multipath TCP (henceforth referred to as MPTCP), a set of extensions to regular TCP [RFC0793] that allow one TCP connection to be spread across multiple paths. The following sections go on to describe the extensions themselves and their operation.

1.1.  Motivations

   As the Internet evolves, demands on Internet resources are ever-increasing, but often these resources (in particular, bandwidth) cannot be fully utilised due to protocol constraints on both the end-systems and within the network. By the application of resource pooling [WISCHIK], these resources can be 'pooled' such that they appear as a single logical resource to the user.
   Multipath TCP achieves resource pooling by combining multiple TCP sessions running over multiple paths, and presenting them as a single TCP connection to the application. This form of resource pooling brings two key benefits:

   o  To increase the efficiency of resource usage, and thus increase the network capacity available to end hosts.

   o  To increase the resilience of connectivity by providing multiple paths, protecting end hosts from the failure of any single one of them.

   The protocol presented in this document still follows the same service model as TCP [RFC0793]: byte-oriented, in-order, reliable delivery. This leads to a high-level goal for the resulting protocol: 'subflows' on different paths should function independently of one another, i.e. the failure of one path should not result in reduced throughput on the other paths.

1.2.  Design Assumptions

   In order to limit the potentially huge design space, the authors imposed two key constraints on the multipath TCP design presented in this document:

   o  It must be backwards-compatible with current, regular TCP, to increase its chances of deployment.

   o  It can be assumed that one or both endpoints are multihomed and multiaddressed.

   To simplify the design, we assume that the presence of multiple addresses at an endpoint is sufficient to indicate the existence of multiple paths. These paths need not be entirely disjoint: they may share one or many routers between them. Even in such a situation, making use of multiple paths will improve resource utilisation.

   There are three aspects to the backwards-compatibility listed above:

   External Constraints:  The protocol must function through the vast majority of existing middleboxes such as NATs, firewalls and proxies, and as such must resemble existing TCP as far as possible on the wire. For the same reason, we cannot rely on the TCP packets (both headers and payloads) remaining unchanged end-to-end.
   Application Constraints:  The protocol must be usable with no change to existing applications that use the standard TCP API (although it is reasonable that not all features would be available to such legacy applications).

   Fall-back:  The protocol should be able to fall back to standard TCP without user intervention, to be able to communicate with legacy hosts.

   Areas for further study:

   o  In theory, since this is purely a TCP extension, it should be possible to use MPTCP with both IPv4 and IPv6 on dual-stack hosts, thus having the additional possible benefit of aiding transition.

   o  Some features of the design presented here could be extended to work with non-multi-addressed hosts by using packet marking or partial multipath.

   o  Some features of the design presented here could be combined with mechanisms such as shim6 [I-D.ietf-shim6-proto].

   It is important to note that this document deliberately avoids any discussion of algorithms for coupling congestion windows in order to achieve optimum performance. Work in this area is ongoing and will be presented in separate documents; considerable discussion can be found in [I-D.van-beijnum-1e-mp-tcp-00].

1.3.  Layered Representation

   MPTCP operates at the transport layer and aims to be transparent to both higher and lower layers. It is a set of additional features on top of standard TCP, and as such MPTCP is designed to be usable by legacy applications with no changes. A possible implementation would be for such a feature to be a system-wide setting: "Use multipath TCP by default? Y/N". Multipath-aware applications would be able to use an extended sockets API to have further influence on the behaviour of MPTCP. Figure 1 illustrates this architecture.
                                   +-------------------------------+
                                   |          Application          |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            |      TCP      |      TCP      |
      +---------------+            +-------------------------------+
      |      IP       |            |      IP       |      IP       |
      +---------------+            +-------------------------------+

     Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

   Detailed discussion of an architecture for developing a multipath TCP implementation, especially regarding the functional separation by which different components should be developed, is given in Appendix A.

1.4.  Operation Summary

   This section gives a brief, high-level summary of normal MPTCP operation, illustrated by the scenario shown in Figure 2. A detailed description of operation is given in Section 4.

   o  To a non-MPTCP-aware application, MPTCP will be indistinguishable from normal TCP. All MPTCP operation is handled by the MPTCP implementation, although extended APIs could provide additional control. An application begins by opening a TCP socket in the normal way.

   o  An MPTCP connection begins as a single TCP session. This is illustrated in Figure 2 as being between Addresses A1 and B1 on Hosts A and B respectively.

   o  If extra paths are available, additional TCP sessions are created on these paths, and are combined with the existing session, which continues to appear as a single connection to the applications at both ends. The creation of the additional TCP session is illustrated between Address A2 on Host A and Address B1 on Host B.

   o  MPTCP identifies multiple paths by the presence of multiple addresses at endpoints. Combinations of these multiple addresses equate to the additional paths. In the example, other potential paths that could be set up are A1<->B2 and A2<->B2.
      Although this additional session is shown as being initiated from A2, it could equally have been initiated from B1.

   o  The discovery and setup of additional TCP sessions (termed 'subflows') can be achieved through alternative mechanisms, two of which are described in this document for comment.

   o  The exact properties of these logically bonded TCP sessions depend upon the congestion and flow control characteristics of the endpoints' MPTCP implementations.

   o  MPTCP adds connection-level sequence numbers in order to reassemble the data stream in order from multiple subflows. Connections are terminated by connection-level FIN packets as well as those relating to the individual subflows.

            Host A                               Host B
   ------------------------             ------------------------
   Address A1    Address A2             Address B1    Address B2
   ----------    ----------             ----------    ----------
       |             |                      |             |
       |     (initial connection setup)     |             |
       |----------------------------------->|             |
       |<-----------------------------------|             |
       |             |                      |             |
       |             | (additional subflow setup)         |
       |             |--------------------->|             |
       |             |<---------------------|             |
       |             |                      |             |
       |             |                      |             |

                 Figure 2: Example MPTCP Usage Scenario

1.5.  Open Issues

   This specification is a work-in-progress, and as such there are many issues that are still to be resolved. This section lists many of the key open issues within this specification; these are discussed in more detail in the appropriate sections throughout this document.

   o  Congestion control, and especially mechanisms by which congestion windows should be coupled to best respond to congestion on a path. This is also related to retransmission algorithms, in particular how to decide when to retransmit packets on the same or different paths.

   o  Correct path/address management scheme. There are two schemes (implicit and explicit) presented in this document.
      The authors generally tend towards the implicit scheme for simplicity; however, both are presented to solicit feedback. Other alternatives are also welcome!

   o  Best handshake mechanisms. This document contains a proposed scheme by which connections and subflows can be set up. It is felt that, although this is "no worse than regular TCP", there could be opportunities for significant improvements in security that could be included (potentially optionally) within this protocol.

   o  Issues around simultaneous opens, where both ends attempt to create a new subflow simultaneously, need to be investigated and behaviour specified.

   o  Appropriate mechanisms for controlling policy of subflow usage. The ECN signal is currently proposed but other alternatives, including path property options, could be employed instead.

1.6.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Terminology

   Path:  A sequence of links between a sender and a receiver, defined in this context by a source and destination address pair.

   Subflow:  A stream of TCP packets sent over a path. A subflow is a component part of a connection between two endpoints.

   Connection:  A collection of one or more subflows, over which an application can communicate between two endpoints. There is a one-to-one mapping between a connection and a socket.

   Token:  A unique identifier given to a multipath connection by an endpoint. May also be referred to as a "Connection ID".

   Endpoint:  A host operating an MPTCP implementation, and either initiating or terminating an MPTCP connection.

3.  Semantic Issues

   In order to support multipath operation, the semantics of some TCP components have changed. To aid clarity, this section collects these semantic changes as a reference.
   Sequence Number:  The TCP sequence number is subflow-specific; a separate data sequence number is used to reassemble the data stream for higher layers.

   FIN:  The FIN only applies to a subflow, not to a connection. For a connection-level FIN, use the DATA FIN option.

   ACK:  The ACK acknowledges the subflow sequence number only, and the mapping to the data sequence number is handled out-of-band.

   RST:  The RST only applies to a subflow. There is no connection-level RST, since it would be impossible to distinguish the two, as the link between a subflow and a connection is established at the SYN handshake. A connection is considered reset if every subflow sends a RST in response.

   Address List:  The address management is handled per-connection to permit the application of per-connection local policy.

   IP Address:  The IP address presented to the application layer in a non-multipath-aware application is that of the first address connected to, even if that address has since been removed from the connection.

4.  MPTCP Protocol

   This section describes the operation of the MPTCP protocol, and is subdivided into sections for each key part of the protocol operation.

   All MPTCP operations are signalled using optional TCP header fields. These TCP Options will have option numbers allocated by IANA, as discussed in Section 9, and are defined throughout the following subsections.

   This document currently presents two alternatives for management of addresses to set up additional subflows:

   Explicit Path Management:  Each endpoint shares a list of addresses on which it can be reached. Either endpoint can then initiate new subflows between any pair of these addresses.

   Implicit Path Management:  A multihomed endpoint starts additional subflows by connecting from an address not currently in use in the connection to a destination that is in use in an existing subflow.
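   To make the difference between the two schemes concrete, the following Python sketch enumerates the address pairs on which a host could itself open a new subflow under each scheme. It is illustrative only. The function names, the use of plain strings for addresses, and the representation of existing subflows as a set of (local, remote) pairs are assumptions made for this example.

```python
from itertools import product


def explicit_candidates(local_addrs, remote_addrs, in_use):
    """Explicit path management (hypothetical helper): any pairing of a local
    address with an announced remote address may carry a new subflow, apart
    from the pairs that already do."""
    return sorted(p for p in product(local_addrs, remote_addrs) if p not in in_use)


def implicit_candidates(local_addrs, in_use):
    """Implicit path management (hypothetical helper): connect from a local
    address not yet used by the connection to a remote address that an
    existing subflow already uses. No address list is exchanged, so only the
    remote addresses already in use are known."""
    used_local = {src for src, _ in in_use}
    used_remote = {dst for _, dst in in_use}
    return sorted(
        (src, dst)
        for src in local_addrs
        if src not in used_local
        for dst in used_remote
    )


# Scenario of Figure 2, seen from Host A: one subflow A1 <-> B1 is open.
subflows = {("A1", "B1")}
print(explicit_candidates(["A1", "A2"], ["B1", "B2"], subflows))
# [('A1', 'B2'), ('A2', 'B1'), ('A2', 'B2')]
print(implicit_candidates(["A1", "A2"], subflows))
# [('A2', 'B1')]
```

   The explicit result assumes Host B has already announced B2. Under the implicit scheme, Host B could symmetrically open B2 <-> A1. The sketch shows only Host A's side.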
   We present these alternatives in order to solicit feedback on the most appropriate mechanism to use for maximum compatibility and thus likelihood of take-up. Briefly, the key differences are that explicit path management provides additional flexibility in the ability of endpoints to use any combination of addresses (not just those already active), whereas implicit path management is relatively simpler (requiring fewer TCP options), and also has functionality to work around NATs.

4.1.  Session Initiation

   Session Initiation begins with a SYN, SYN/ACK exchange on a single path. Each of these packets will additionally feature the Multipath Capable TCP option (Figure 3), which declares the sender's locally unique 32-bit token for this connection, and a version field.

   The "Multipath Capable" option declares an endpoint to be capable of operating Multipath TCP (or, more accurately, a desire to operate Multipath TCP on this particular connection). As well as this declaration, the option carries a token, which is used when adding additional subflows to this connection. This token is generated by the sender and has local meaning only, but it must be unique for the sender. The token should be difficult for an attacker to guess, and thus it is recommended to be generated randomly. (However, see further discussions about security in Section 5.)

   This option is only present in packets with the SYN flag set. It is only used in the first TCP session of a connection, in order to identify the connection; all subsequent subflows will use path management techniques to join the existing connection.
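   As an illustration, here is a minimal Python sketch that builds and parses the Multipath Capable option laid out in Figure 3, including the random token selection recommended above. The option kind has not yet been allocated (see Section 9), so OPT_MPC is a placeholder chosen only for this example. The helper names, and the set used to track tokens this host has already issued, are also assumptions. Following the field order in Figure 3, the sketch treats the reserved bits as the high-order four bits of the third octet and the version as the low-order four bits.

```python
import secrets
import struct

OPT_MPC = 0xFE  # PLACEHOLDER option kind: the real value is to be allocated by IANA
MPC_LENGTH = 7  # kind (1) + length (1) + reserved/version (1) + token (4)


def new_token(tokens_in_use):
    """Draw a random 32-bit token that is unique among this host's connections."""
    while True:
        token = secrets.randbits(32)
        if token not in tokens_in_use:
            tokens_in_use.add(token)
            return token


def encode_mpc(token, version=0):
    # "!" = network byte order, no padding: B, B, B, I -> 1 + 1 + 1 + 4 = 7 octets.
    # The reserved (high) nibble is sent as zero; the version sits in the low nibble.
    return struct.pack("!BBBI", OPT_MPC, MPC_LENGTH, version & 0x0F, token)


def decode_mpc(option):
    kind, length, ver_byte, token = struct.unpack("!BBBI", option)
    if kind != OPT_MPC or length != MPC_LENGTH:
        raise ValueError("not a Multipath Capable option")
    return ver_byte & 0x0F, token  # reserved bits are ignored


tokens = set()
opt = encode_mpc(new_token(tokens))
print(len(opt), decode_mpc(opt)[0])  # 7 0
```

   The sketch uses the secrets module rather than random because the token is meant to be hard for an attacker to guess.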
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_MPC  |  Length = 7   |(resvd)|Version| Sender Token  :
   +---------------+---------------+-------------------------------+
   :  Sender Token (continued - 4 octets total)   |
   +-----------------------------------------------+

                   Figure 3: Multipath Capable option

   The version field represents the version of MPTCP in use. The version provided in this specification is 0. The reserved bits may be used for connection-specific flags in later versions.

   If a SYN contains a "Multipath Capable" option but the SYN/ACK does not, it is assumed that the recipient is not multipath capable and thus the MPTCP session will operate as regular, single-path TCP. If a SYN does not contain a "Multipath Capable" option, the SYN/ACK MUST NOT contain one in response.

   If these packets are unacknowledged, it is up to local policy to decide how to respond. It is expected that a sender will eventually fall back to single-path TCP (i.e. without the Multipath Capable option), in order to work around middleboxes that may drop packets with unknown options; however, the number of multipath-capable attempts that are made first will be up to local policy. In the case of out-of-order packets, i.e. if a multipath-capable SYN/ACK is received in response to a multipath-capable SYN after a standard SYN has been sent, it is again up to the sender to choose how to behave. For example, the sender could respond to new connections using the previously declared token, or it could simply drop any new multipath options within the flow. If an endpoint is known to be multiaddressed (e.g. through multiple addresses returned in a DNS lookup), alternative destination addresses should be tried first, before falling back to regular TCP.

4.2.  Address Knowledge Exchange (Path Management)

   This section presents two alternative path management techniques, as introduced at the start of Section 4.

4.2.1.  Explicit Path Management

   With explicit path management, the addresses over which a host is accessible are announced to the other party through in-band signalling, and then hosts can set up new TCP subflows on any subset of combinations of (source, destination) address pairs. Either endpoint can initiate the creation of a new subflow.

4.2.1.1.  Adding Addresses

   Announcing additional addresses that an endpoint can be reached on will be undertaken by the Add Address TCP Option (Figure 4), where an (index, address) pair can be announced to the other endpoint. Several addresses can be added if there is sufficient TCP option space; otherwise, multiple TCP messages containing this option must be sent. This option can be used at any time during a connection, not just during the initial SYN/ACK exchange.

   The Add Address option announces a list of alternative IP addresses, beyond the current one in use, that the sender can be contacted on. This option can be used multiple times until all available addresses have been announced, in order to get around TCP option space limits. It should be noted that every address has an index which can be used for address removal, and therefore endpoints must cache the mapping between index and address. The index must be unique to the sender, and although it is expected to be sequential this is not mandated.

   This option is shown for IPv4, where each entry occupies 6 octets and the length of the option is 2 + (6 * number_of_entries). For IPv6, the IPVer field will read 6, and the length of the address will be 16 octets rather than 4, and thus the length of the option will be 2 + (18 * number_of_entries). Multiple addresses can be included, each Index/IPVer/Address entry following immediately after the previous address, and their existence can be inferred through the option length and version fields.
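   The following Python sketch encodes and decodes an Add Address option (Figure 4) carrying one or more Index/IPVer/Address entries, and checks the option lengths given above: 2 octets of header, plus 6 octets per IPv4 entry (1 + 1 + 4) or 18 per IPv6 entry (1 + 1 + 16). OPT_ADDR is a placeholder kind and the helper names are hypothetical. The 40-octet ceiling is the maximum TCP option space in a single segment; a real implementation would also have to share that space with other options.

```python
import ipaddress
import struct

OPT_ADDR = 0xFD        # PLACEHOLDER option kind: the real value is to be allocated by IANA
MAX_OPTION_SPACE = 40  # maximum TCP option space in one segment (60 - 20 octet header)


def encode_add_address(entries):
    """entries: list of (index, address string). Each entry is Index (1 octet),
    IPVer in the high nibble with the reserved low nibble zero (1 octet), then
    the packed address (4 or 16 octets)."""
    body = b""
    for index, addr in entries:
        ip = ipaddress.ip_address(addr)
        body += struct.pack("!BB", index, ip.version << 4) + ip.packed
    length = 2 + len(body)
    if length > MAX_OPTION_SPACE:
        raise ValueError("too many addresses; send them in further segments")
    return struct.pack("!BB", OPT_ADDR, length) + body


def decode_add_address(option):
    kind, length = option[0], option[1]
    if kind != OPT_ADDR or length != len(option):
        raise ValueError("malformed Add Address option")
    entries, pos = [], 2
    while pos < length:
        index, version = option[pos], option[pos + 1] >> 4
        size = {4: 4, 6: 16}.get(version)
        if size is None or pos + 2 + size > length:
            raise ValueError("bad IPVer or truncated entry")
        addr = ipaddress.ip_address(option[pos + 2 : pos + 2 + size])
        entries.append((index, str(addr)))
        pos += 2 + size
    return entries


v4 = encode_add_address([(1, "192.0.2.1"), (2, "198.51.100.7")])
v6 = encode_add_address([(3, "2001:db8::1")])
print(len(v4), len(v6))  # 14 20
print(decode_add_address(v6))  # [(3, '2001:db8::1')]
```

   The decoder reads each entry's IPVer nibble to decide how many address octets follow. This is how the number of entries is inferred from the option length and version fields.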
   NB: by having an IPVer field, we get four free reserved bits. These could be used in later versions of this protocol, e.g. one bit for "use now" or similar, to differentiate between subflows for backup purposes and those for throughput.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+---------------+-------+-------+
   | Kind=OPT_ADDR |    Length     |     Index     | IPVer |(resvd)|
   +---------------+---------------+---------------+-------+-------+
   |                   Address (IPv4 - 4 octets)                   |
   +---------------------------------------------------------------+
          ( ... further Index/Version/Address fields as required ... )

                 Figure 4: Add Address option (for IPv4)

   If an index is already in use, it should be treated as a request to remove the existing address (see Section 4.2.1.2) followed by a new addition at that index.

4.2.1.2.  Remove Address

   If, during the lifetime of an MPTCP connection, a previously-announced address becomes invalid (e.g. if the interface disappears), the affected endpoint should announce this so that the other endpoint can remove subflows related to this address.

   This is achieved through the Remove Address option (Figure 5), which will remove a previously-added address (or list of addresses) from a connection and terminate any subflows currently using it. The sending and receipt of this message should trigger the sending of FINs by both endpoints on the affected subflow(s) (if possible), as a courtesy to help clean up middlebox state, but endpoints may clean up their internal state without a long timeout.

   If there is no address at the requested indices, the receiver will silently ignore the request.

   Address removal is undertaken by index, so as to permit the use of (MPTCP-aware) NATs and other middleboxes, in the cases where new connections have been initiated but now want to be removed.
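   Below is a Python sketch of one way to keep the per-connection index-to-address cache that a receiver of Add Address and Remove Address options needs. It applies the two rules stated above. First, announcing an address at an index that is already in use removes the old address before adding the new one. Second, a removal naming an unknown index is silently ignored. The class and method names are assumptions made for illustration, as is the choice to return the (local, remote) subflows that should now be closed, ideally with FINs. OPT_REMADR is again a placeholder kind.

```python
import struct

OPT_REMADR = 0xFC  # PLACEHOLDER option kind: the real value is to be allocated by IANA


def encode_remove_address(indices):
    # Kind, Length = 2 + n, then one octet per index to remove (Figure 5).
    return struct.pack("!BB", OPT_REMADR, 2 + len(indices)) + bytes(indices)


class PeerAddressTable:
    """Hypothetical per-connection cache of the peer's announced addresses."""

    def __init__(self):
        self.by_index = {}     # index -> remote address
        self.subflows = set()  # (local address, remote address) pairs in use

    def add(self, index, addr):
        # An index already in use means: remove the old address, then add.
        closed = self.remove([index]) if index in self.by_index else []
        self.by_index[index] = addr
        return closed

    def remove(self, indices):
        closed = []
        for index in indices:
            addr = self.by_index.pop(index, None)
            if addr is None:
                continue  # no address at this index: silently ignore
            doomed = sorted(s for s in self.subflows if s[1] == addr)
            self.subflows.difference_update(doomed)
            closed.extend(doomed)
        return closed


table = PeerAddressTable()
table.add(1, "192.0.2.1")
table.subflows = {("10.0.0.1", "192.0.2.1"), ("10.0.0.2", "192.0.2.1")}
print(table.remove([1, 9]))
# [('10.0.0.1', '192.0.2.1'), ('10.0.0.2', '192.0.2.1')]
print(encode_remove_address([1, 9]).hex())  # fc040109
```

   Index 9 in the example has no address, so the table ignores it rather than treating it as an error.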
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+---------------+
   |Kind=OPT_REMADR| Length = 2+n  |     Index     |  ...
   +---------------+---------------+---------------+

                     Figure 5: Remove Address option

4.2.2.  Implicit Path Management

   As opposed to the explicit path management presented above, this method does not exchange a list of addresses which are then (independently) used to set up subflows. Instead, knowledge of an endpoint's alternative addresses is obtained only when an additional subflow is being set up (see Section 4.3). Subflows are started, joining a pre-existing connection, with no pre-negotiation. A "Request-SYN" option can also be used to request a SYN in the reverse direction, in order to get around middleboxes, notably NATs.

   The implicit mechanism makes use of SYNs and connection identifiers in order to add new subflows to an existing connection. The following is an example of how this should work:

   o  An endpoint that is multihomed starts an additional TCP session to an address/port pair that is already in use on the other endpoint, using a token to identify the flow (Section 4.3). (A multihomed destination may open a new subflow from its new address to the source address and port, or a multihomed source may open a new subflow from its new address to the existing destination address and port.)

   o  To expand upon this, say a connection is initiated from host "A" on (address, port) combination A1 to destination (address, port) B1 on host "B". If host A is multihomed, it starts an additional connection from new (address, port) A2 to B1, using B's previously declared token. Alternatively, if B is multihomed, it will try to set up a new TCP connection from B2 to A1, using A's previously declared token.

   o  Simultaneously, a "Request-SYN" option is sent on an existing TCP connection, asking the recipient to try to open a connection to the sender's additional address. This is intended to permit new sessions to be opened if one endpoint is behind a NAT.

   o  Using the previous notation, this would be a Request-SYN packet sent from A1 to B1 requesting a SYN to be sent from B1 to A2.

   As can be seen, implicit path management is designed for ease of deployment and operation through middleboxes such as NATs. The main drawback is that new subflows can only be started with one of the two addresses being part of an existing subflow, since there is no separate exchange of addresses. This improves security and simplicity but limits the flexibility and speed of being able to set up entirely disjoint subflows immediately on an address list exchange. However, once multiple addresses exist at one endpoint, the other endpoint can target new connections at any or all of these.

4.2.2.1.  Request-SYN

   This packet requests the recipient to send a SYN (with a join option, discussed in Section 4.3) to the presented IP address to initiate a new subflow. The motivation for this is to get around NATs and firewalls that may block SYN packets in the forward direction. This packet could be seen as fulfilling the same function as "Add Address" for explicit path management.

   This option is shown for IPv4, where the length of the option is 7. For IPv6, the IPVer field will read 6, and the length of the address will be 16 octets rather than 4, and thus the length of the option will be 19.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-------+---------------+
   |Kind=OPT_REQSYN|    Length     | IPVer |(resvd)|  Address...   :
   +---------------+---------------+-------+-------+---------------+
   :   ... Address (4 octets - IPv4 version only)  |
   +-----------------------------------------------+

               Figure 6: Request-SYN option (IPv4 version)

   OPEN ISSUES: Must the recipient reply from the same address?
   Can a nonce be used for security here by echoing it in the "join" option in the SYN? Do we need anything to prevent DoS here? We will also need to define the logic of responding to this versus having already sent a SYN (related to the simultaneous open issue).

4.2.2.2.  Request-FIN (Remove Address)

   OPEN ISSUE: Do we want to be able to do Request-FIN? It would be used to clean up other subflows, e.g. when an interface becomes unavailable (i.e. like "Remove Address" for explicit path management). Assuming we need this option, we need some way to identify the existing subflows. This is particularly difficult when there is no subflow identifier.

   The primary reason for this message is to allow a sender to tell its receiver that a particular interface has been unexpectedly lost, and thus it should close any connections associated with it. Although this is purely an efficiency optimisation and not essential to the operation of the protocol, it would nevertheless be useful to deploy such a mechanism. As currently proposed, this option will not work through non-MPTCP-aware NATs, so endpoints should not rely on receiving it.

   This option works by a sender identifying the source address that is no longer valid. A Request-FIN requests the recipient to send a FIN on the affected subflow(s), and then it can close the subflows with a short timeout. The sender should also send FINs; however, the Request-FIN is used to help clean up state on middleboxes on subflows that have unexpectedly broken.

   This option is shown for IPv4, where the length of the option is 7. For IPv6, the IPVer field will read 6, and the length of the address will be 16 octets rather than 4, and thus the length of the option will be 19.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-------+---------------+
   |Kind=OPT_REQFIN|    Length     | IPVer |(resvd)|  Address...   :
   +---------------+---------------+-------+-------+---------------+
   :   ... Address (4 octets - IPv4 version only)  |
   +-----------------------------------------------+

               Figure 7: Request-FIN option (IPv4 version)

4.3.  Starting a New Subflow

   Endpoints have knowledge of their own multiple addresses, and can become aware of the other endpoint's addresses through a path management technique as described in Section 4.2. Once this knowledge has been gathered, an endpoint will want to initiate a new subflow over a currently unused pair of addresses.

   A new subflow is started as a normal TCP SYN/ACK exchange, to (or from) a different address to one already in use. The following TCP option is used to identify which connection the new subflow should become part of. The token used is the locally unique token of the destination for the connection, as defined by the Multipath Capable option received in the first SYN/ACK exchange.

   It should be noted that, in theory, additional subflows can exist between any pair of ports, and as such it is this token that is used for demuxing at the receiver. A receiver must store some mapping state, of (source_addr, dest_addr, source_port, dest_port) to its token, using information from the initial SYN exchange, in order to enable this. In practice, however, it is envisaged that most new subflows will connect to a port that is already in use as the source or destination port of an existing subflow, in order to have a greater chance of getting through firewalls and other middleboxes, and to support traffic engineering of the flows.

   This option can only be present when the SYN flag is set.
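   The receiver-side demultiplexing described above can be sketched in Python as follows. The initial subflow's 4-tuple is recorded against the local token declared in the Multipath Capable option. A later SYN carrying a Join Connection option (Figure 8) is then attached to whichever connection owns the token it names. OPT_JOIN is a placeholder kind and the class is hypothetical. The text does not say how to answer a join that names an unknown token, so this sketch simply declines to attach the subflow.

```python
import struct

OPT_JOIN = 0xFB  # PLACEHOLDER option kind: the real value is to be allocated by IANA


def encode_join(receiver_token):
    # Kind, Length = 6, then the receiver's 4-octet token.
    return struct.pack("!BBI", OPT_JOIN, 6, receiver_token)


class SubflowDemux:
    """Hypothetical receiver state mapping subflows to connections by token."""

    def __init__(self):
        self.subflows_by_token = {}  # local token -> list of subflow 4-tuples
        self.token_by_tuple = {}     # (src_addr, dst_addr, src_port, dst_port) -> token

    def on_initial_syn(self, four_tuple, local_token):
        self.subflows_by_token[local_token] = [four_tuple]
        self.token_by_tuple[four_tuple] = local_token

    def on_join_syn(self, four_tuple, join_option):
        kind, length, token = struct.unpack("!BBI", join_option)
        if kind != OPT_JOIN or length != 6 or token not in self.subflows_by_token:
            return None  # unknown token: behaviour unspecified, so do not attach
        self.subflows_by_token[token].append(four_tuple)
        self.token_by_tuple[four_tuple] = token
        return token

    def lookup(self, four_tuple):
        """Find the connection token for an incoming segment's 4-tuple."""
        return self.token_by_tuple.get(four_tuple)


demux = SubflowDemux()
demux.on_initial_syn(("192.0.2.1", "198.51.100.1", 40000, 80), 0xAABBCCDD)
# A new subflow from a second client address to the same server port.
joined = demux.on_join_syn(("192.0.2.2", "198.51.100.1", 40001, 80), encode_join(0xAABBCCDD))
print(hex(joined))  # 0xaabbccdd
```

   Recording the full 4-tuple once the join succeeds lets later segments on that subflow be matched to their connection without carrying the option again. This mirrors the mapping state the receiver is required to keep.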
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_JOIN |  Length = 6   |Receiver Token (4 octets total):
   +---------------+---------------+-------------------------------+
   :   Receiver Token (continued)  |
   +-------------------------------+

                    Figure 8: Join Connection option

4.4. General MPTCP Operation

This section discusses operation of MPTCP for data transfer, independent of the path management mechanism used.

At a high level, an MPTCP implementation will take one input data stream from an application, and split it across one or more subflows. The data stream as a whole can be reassembled through the use of the Data Sequence Number option (Figure 9), which gives the position in the data stream of the first octet of the packet's payload; the receiver uses this to ensure in-order delivery to the application layer. Meanwhile, the subflow-level sequence numbers (i.e. the regular TCP header sequence numbers) have subflow-only relevance.

The only acknowledgements are those at the subflow level, so the sender must be able to map these acknowledgements to the data sequence numbers that were contained in the relevant packets. Thus, if subflow data goes unacknowledged, the sender knows which part of the original data stream this corresponds to, and hence what data must be retransmitted. It is expected (but not mandated) that SACK [RFC2018] is used as an efficiency measure at the subflow level. Each subflow will maintain its own congestion window.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_DSN  |     Length    |    Data Sequence Number...    :
   +---------------+---------------+-------------------------------+
   : ... ( (length-2) octets )     |
   +-------------------------------+

                 Figure 9: Data Sequence Number option

As a TCP option contains a length field, the length of the Data Sequence Number can be declared implicitly. Although it is expected that initial implementations will use 32-bit sequence numbers (i.e. 4 octets, so a length field of 6), setting the length field to 10 and including a 64-bit sequence number (of eight octets) MUST be considered valid and processed appropriately. This may also have useful security implications, discussed in Section 5. As with the standard TCP sequence number, the data sequence number should not start at zero, but at a random value, to make session hijacking harder.

The Data Sequence Number is included in every MPTCP packet that contains data (or a DATA FIN, see Section 4.5), even if only one path is in use, provided that the MPTCP handshake has been completed and the endpoints have therefore agreed to use MPTCP.

The MPTCP data-level and subflow-level sequence numbering could be said to be analogous to that used in SACK, but there are subtle differences. The key similarity is that it is possible to have temporary "holes" in the received data sequence space: later data may have arrived earlier (most likely on a different subflow), and does not need to be retransmitted; the "holes" are filled in later. The key difference, however, is that while SACK can rely on the regular TCP cumulative acknowledgements to indicate how much data has been successfully received (with no holes), MPTCP has no equivalent data-level cumulative acknowledgement. Instead, the sender must keep track of the subflow acknowledgements to derive what data has been successfully received. This leads to some oddities, especially with session termination (see Section 4.5).

4.4.1. Subflow Policy

Within a local MPTCP implementation, a host may use any policy it wishes to decide how to share the traffic to be sent over the available paths.

In the typical use case, where the goal is to maximise throughput, it is necessary to couple the congestion windows in use on each subflow, in order to react in the most appropriate way to congestion on subflows. This is the subject of significant theoretical and practical research outside the scope of this document.

In other use cases, a user may split traffic across available subflows according to local policy. Typically such cases would take an 'all-or-nothing' approach, i.e. having a second path ready for use in the event of failure of the first path, but alternatives could include entirely saturating one path before using an additional path (the 'overflow' case). Such choices would most likely be based on the monetary cost of links, but may also be based on properties such as delay or bandwidth, in cases where the additional paths are significantly worse and not worth including in the base operation. Metrics such as these could be combined into an overall "cost" metric for a link.

The ability to make effective choices at the sender requires full knowledge of the path characteristics, which is unlikely to be available. There is no mechanism in MPTCP for a receiver to signal its own preferences for paths, yet this is a necessary feature, since receivers will often be the multihomed party, for example laptop computers with wired and wireless connectivity. Instead of incorporating complex signalling, it is proposed to use existing TCP features to signal priority implicitly.

If a receiver wishes to keep a path active as a backup but wishes to prevent data being sent on that path, this could be achieved by the receiver not sending ACKs for any data it receives on that path. The sender would interpret this as severe congestion or a broken path and stop using it.
We do not advocate this method, however, since it is brutal and naive, and will result in unnecessary retransmissions. Therefore, it is proposed to use ECN [RFC3168] to provide fake congestion signals on paths that a receiver wishes to stop being used for data. This has the benefit of causing the sender to back off without needing to retransmit data unnecessarily, as happens when ACKs are withheld.

This should be sufficient to allow a receiver to express its policy, although it does not permit a rapid increase in throughput when switching to such a path. A potential solution to this would be that, if there is significant congestion, or the set of available paths has changed, MPTCP should wipe all subflow state and restart the multiplicative increase on all paths that appear uncongested. ECN will then immediately stop use of any paths that are still not required, while the receiver's desired backup path will be in use and its throughput will increase quickly. This proposal should be no worse than current TCP.

4.4.2. Retransmissions

This protocol specification does not mandate any mechanisms for handling retransmissions in the event of path failures, and much will depend upon local policy (as discussed in Section 4.4.1). The data sequence number, as given in a TCP option, is used to reassemble the incoming streams before presentation to the application layer, so a sender is free to re-send data with the same data sequence number on a different subflow. When doing this, it may be necessary to use the resync packet (Section 4.4.3) in order to skip over the subflow sequence numbers that were not retransmitted on the original subflow.

Of course, such a retransmission will only occur if this is what local policy suggests. Indeed, it may be equally valid to retransmit on the same subflow if alternative paths have considerably worse quality of service, or are only kept for backup purposes.
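As a rough illustration of the sender-side bookkeeping described in Sections 4.4 and 4.4.2, the Python sketch below records, per subflow, which data sequence range each transmitted segment carried. A cumulative subflow ACK then tells the sender which data-level octets were delivered. If the subflow is abandoned, the outstanding data can be re-sent with unchanged data sequence numbers on another subflow, and the sketch reports the inclusive subflow range that a Resync option (Section 4.4.3) would ask the receiver to skip. All class and function names are hypothetical. The sketch ignores partial ACKs, SACK and sequence-number wraparound, and it is not a normative algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

DataRange = Tuple[int, int]  # (data sequence number of first octet, length)


@dataclass
class SentSegment:
    subflow_seq: int  # subflow-level (TCP header) sequence number of first octet
    dsn: int          # data sequence number of first octet (DSN option)
    length: int       # payload length in octets


@dataclass
class SubflowSendState:
    """Hypothetical per-subflow record mapping subflow sequence space to DSNs."""
    next_seq: int
    outstanding: List[SentSegment] = field(default_factory=list)

    def send(self, dsn: int, length: int) -> SentSegment:
        seg = SentSegment(self.next_seq, dsn, length)
        self.outstanding.append(seg)
        self.next_seq += length
        return seg

    def on_ack(self, ack: int) -> List[DataRange]:
        """Apply a cumulative subflow ACK; return the data ranges it confirms.

        Simplification: a segment counts only once the ACK covers all of it.
        """
        done = [s for s in self.outstanding if s.subflow_seq + s.length <= ack]
        self.outstanding = [s for s in self.outstanding if s.subflow_seq + s.length > ack]
        return [(s.dsn, s.length) for s in done]

    def abandon(self) -> Tuple[List[DataRange], Tuple[int, int]]:
        """Stop waiting for unacknowledged data (e.g. after a path failure).

        Returns the data ranges to re-send elsewhere with unchanged DSNs, and
        the inclusive subflow range a Resync option would ask the receiver to
        skip. Assumes at least one segment is outstanding.
        """
        ranges = [(s.dsn, s.length) for s in self.outstanding]
        skip = (self.outstanding[0].subflow_seq, self.next_seq - 1)
        self.outstanding = []
        return ranges, skip


def apply_resync(expected_seq: int, start: int, end: int) -> int:
    """Receiver rule of Section 4.4.3 for the inclusive range start..end."""
    if start <= expected_seq <= end:
        return end + 1
    return expected_seq  # otherwise the option is ignored


if __name__ == "__main__":
    a = SubflowSendState(next_seq=1000)
    b = SubflowSendState(next_seq=5000)
    a.send(dsn=1, length=100)     # data octets 1..100, subflow A seq 1000..1099
    a.send(dsn=101, length=100)   # data octets 101..200, subflow A seq 1100..1199
    print(a.on_ack(1100))         # [(1, 100)]
    ranges, skip = a.abandon()    # subflow A is treated as failed
    for dsn, length in ranges:
        b.send(dsn, length)       # same DSNs, new subflow B sequence numbers
    print(ranges, skip)           # [(101, 100)] (1100, 1199)
    print(apply_resync(1100, *skip))  # 1200
```

In this example, if new data is later sent on subflow A, it would carry a Resync option for subflow sequence numbers 1100 to 1199, so that the receiver's next expected sequence number on A moves to 1200.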
Similarly, local implementation/policy will also determine how to modify the treatment of paths after packet loss, for example how long to wait before treating a path as the preferred path again. Additionally, some implementations may be able to receive signals from lower layers when there are problems with the paths, so that more appropriate responses can occur.

4.4.3. Resync Packet

The resync packet is used in certain circumstances when a sender needs to instruct the receiver to skip over certain subflow sequence numbers (i.e. to treat the specified sequence space as having been received and acknowledged).

The typical use of this option will be when packets are retransmitted on different subflows, after failing to be acknowledged on the original subflow. In such a case, it becomes necessary to move the original subflow's sequence numbering forward so as not to later transmit different data with a previously used sequence number (i.e. when more data comes to be transmitted on the original subflow, it will be different data, and so must not be sent with previously used, but unacknowledged, sequence numbers).

The rationale for this is two-fold. Firstly, ACKs are received for the subflow only, and the sender infers from them which data was delivered; if the same sequence space could be occupied by different data, the sender would not know whether the intended data was received. Secondly, certain classes of middleboxes may cache data and not forward new data sent with a previously seen sequence number.

Therefore, it is necessary to 're-sync' the expected sequence numbering at the receiving end of a subflow, using the following TCP option. This packet declares a sequence number space (inclusive) which the receiving node should skip over, i.e.
if the receiver's next expected sequence number was previously within the range start_seq_num to end_seq_num, move it forward to end_seq_num + 1.

This option will be used on the first new packet on the subflow that needs its sequence numbering re-synchronised. It will continue to be included on every packet sent on this subflow until a packet containing this option has been acknowledged (i.e. until subflow acknowledgements exist for packets beyond the end sequence number). If the end sequence number is earlier than the current expected sequence number (i.e. if a resync packet has already been received), this option should be ignored.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   |Kind=OPT_RESYNC|  Length = 10  |     Start Sequence Number     :
   +---------------+---------------+-------------------------------+
   :          (4 octets)           |      End Sequence Number      :
   +-------------------------------+-------------------------------+
   :          (4 octets)           |
   +-------------------------------+

                       Figure 10: Resync option

4.5. Closing a Connection

Under single-path TCP, a FIN signifies that the sender has no more data to send. In order to allow subflows to operate independently, however, and with as little change from regular TCP as possible, a FIN in MPTCP will only affect the subflow on which it is sent. This allows nodes to exercise considerable freedom over which paths are in use at any one time. The semantics of a FIN remain as for regular TCP, i.e. the subflow is not fully closed until both sides have ACKed each other's FINs.

When an application calls close() on a socket, this indicates that it has no more data to send, and for regular TCP this would result in a FIN on the connection. For MPTCP, an equivalent mechanism is needed, and this is the DATA FIN. This option, shown in Figure 11, is attached to a segment carrying a regular FIN on a subflow.
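The option layouts of Figures 9 and 11, and the requirement stated in this section that a DATA FIN be accompanied by a Data Sequence Number option, can be illustrated with the Python sketch below. The option kind values are hypothetical placeholders, since Table 1 leaves the real values to be confirmed. The sketch also assumes that, in a FIN segment with no payload, the DSN option names the DATA FIN's own octet; it performs no TCP option padding or alignment.

```python
import struct

# Hypothetical placeholder kinds: Table 1 lists the real values as "(tbc)".
OPT_DSN = 0xF0
OPT_DFIN = 0xF1


def encode_dsn(dsn: int, wide: bool = False) -> bytes:
    """Data Sequence Number option (Figure 9).

    Length 6 carries a 32-bit DSN; length 10 carries a 64-bit DSN.
    """
    if wide:
        return struct.pack("!BBQ", OPT_DSN, 10, dsn)
    return struct.pack("!BBI", OPT_DSN, 6, dsn)


def decode_dsn(option: bytes) -> int:
    """Parse a DSN option, taking its width implicitly from the length field."""
    if len(option) < 2 or option[0] != OPT_DSN or option[1] not in (6, 10):
        raise ValueError("not a Data Sequence Number option")
    if len(option) < option[1]:
        raise ValueError("truncated Data Sequence Number option")
    return int.from_bytes(option[2:option[1]], "big")


def encode_data_fin() -> bytes:
    """DATA FIN option (Figure 11): kind and length only."""
    return struct.pack("!BB", OPT_DFIN, 2)


def data_fin_options(data_fin_dsn: int, wide: bool = False) -> bytes:
    """Options for a FIN segment with no payload (assumption: the DSN option
    then names the DATA FIN's own octet of data sequence space)."""
    return encode_dsn(data_fin_dsn, wide) + encode_data_fin()


if __name__ == "__main__":
    opts = data_fin_options(1010)
    print(opts.hex())             # f006000003f2f102
    print(decode_dsn(opts[:6]))   # 1010
```

Both DSN widths are accepted on receipt, as Section 4.4 requires, because the decoder takes the width from the length field rather than from a fixed format.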
A DATA FIN is an indication that the sender has no more data to send, and as such can be used as a rapid indication of the end of data from a sender. It can therefore be used as an optimisation to clean up state associated with an MPTCP connection, especially when some subflows may have failed. Specifically, when a DATA FIN has been received, and if all data has been successfully received, timeouts on all subflows MAY be reduced. Similarly, when sending a DATA FIN, once all data (including the DATA FIN) has been acknowledged, FINs must be sent on every subflow. This applies to both endpoints, and is required in order to clean up state in middleboxes.

There are complex interactions, however, between a DATA FIN and subflow properties:

o  A DATA FIN can only be sent on a packet which also has the FIN flag set.

o  A DATA FIN occupies one octet (the final octet) of Data Sequence Number space. Therefore, even if there is no user data, a Data Sequence Number option must be added to a packet containing the DATA FIN option. This allows the receiver to easily determine the last data sequence number that should have been received.

o  There is a one-to-one mapping between the DATA FIN and the subflow's FIN flag (and its associated sequence space and thus its acknowledgement). In other words, when a subflow's FIN flag has been acknowledged, the associated DATA FIN is also acknowledged.

o  As such, the acknowledgement of a FIN and DATA FIN DOES NOT indicate that all data has been successfully received; the sender can only conclude this once the outstanding data on all subflows has been acknowledged.

It should be noted that an endpoint may also send a FIN on an individual subflow to shut it down, but its impact is limited to the subflow in question. If all subflows have been closed with a FIN, that is equivalent to having closed the connection with a DATA FIN.
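The receiver-side consequences of these rules, together with the "holes" in data sequence space described in Section 4.4, can be sketched as follows. This hypothetical Python buffer (`DataReassembler` is an illustrative name) reorders payloads by data sequence number regardless of the subflow they arrived on, discards duplicates caused by retransmission on another subflow, and reports completion only once all data up to the DATA FIN has been delivered, which is a separate question from whether any subflow FIN has been acknowledged. It assumes, by analogy with the TCP FIN flag, that the DATA FIN octet immediately follows any payload carried in the same packet, and it ignores wraparound and buffer limits.

```python
from typing import Dict, Optional


class DataReassembler:
    """Hypothetical receiver buffer that restores data order across subflows."""

    def __init__(self, initial_dsn: int) -> None:
        self.next_dsn = initial_dsn           # next octet owed to the application
        self.pending: Dict[int, bytes] = {}   # out-of-order payloads, keyed by DSN
        self.data_fin_dsn: Optional[int] = None
        self.delivered = bytearray()          # stands in for the socket read queue

    def receive(self, dsn: int, payload: bytes, data_fin: bool = False) -> None:
        """Accept a segment from any subflow, then deliver whatever is in order."""
        if data_fin:
            # Assumption: the DATA FIN octet directly follows this payload.
            self.data_fin_dsn = dsn + len(payload)
        if payload and dsn + len(payload) > self.next_dsn:  # skip pure duplicates
            self.pending[dsn] = payload
        self._deliver()

    def _deliver(self) -> None:
        while True:
            ready = [d for d in self.pending if d <= self.next_dsn]
            if not ready:
                return  # nothing starts at or before next_dsn: a "hole" remains
            dsn = min(ready)
            payload = self.pending.pop(dsn)
            fresh = payload[self.next_dsn - dsn:]  # drop already-delivered prefix
            self.delivered += fresh
            self.next_dsn += len(fresh)

    def complete(self) -> bool:
        """True once every data octet before the DATA FIN has been delivered."""
        return self.data_fin_dsn is not None and self.next_dsn == self.data_fin_dsn


if __name__ == "__main__":
    r = DataReassembler(initial_dsn=1000)
    r.receive(1005, b"world", data_fin=True)  # e.g. arrives first, on subflow 2
    print(r.complete())                       # False: octets 1000..1004 missing
    r.receive(1000, b"hello")                 # e.g. arrives later, on subflow 1
    print(bytes(r.delivered), r.complete())   # b'helloworld' True
```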
                        1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +---------------+---------------+
   | Kind=OPT_DFIN |  Length = 2   |
   +---------------+---------------+

       Figure 11: DATA FIN option

4.6. Error Handling

TBD

Unknown token in MPTCP SYN should equate to an unknown port, e.g. a TCP reset? We should make this as silent and tolerant as possible.

Where possible, we should keep this close to the semantics of TCP. The amount of error handling required may also have an impact on the choice of path management schemes.

Issues may include odd cases where a data sequence number is missing from a subflow. Errors will definitely be needed in those cases.

5. Security Considerations

TBD (Token generation, handshake mechanisms, new subflow authentication, etc...)

The development of a TCP extension such as this will bring with it many additional security concerns. We have set out here to produce a solution that is "no worse" than current TCP, with the possibility that more secure extensions could be proposed later.

The primary area of concern will be around the handshake to start new subflows which join existing connections. The proposal set out in Section 4.1 and Section 4.3 is for the initiator of the new subflow to include the token of the other endpoint in the handshake. The purpose of this is to indicate that the sender of this token was the same entity that received this token at the initial handshake.

One area of concern is that the token could simply be brute-forced. The token must be hard to guess, and as such could be randomly generated. This may still not be strong enough, however, and so using 64 bits for the token would alleviate this somewhat.

Use of these tokens only provides an indication that the token is the same as at the initial handshake, and does not say anything about the current sender of the token.
Therefore, another approach would be to bring a new measure of freshness into the handshake, so that instead of using the initial token, a sender could request a new token from the receiver to use in the next handshake.

Yet another alternative would be for the SYN packet to include a data sequence number. This could be used as a passive identifier to indicate an awareness of the current data sequence number (although a reasonable window would have to be allowed for delays). Or, the SYN could form part of the data sequence space, but this would cause issues in the event of lost SYNs (if a new subflow is never established), thus causing unnecessary delays for retransmissions.

The "Request-FIN" option (if included) is possibly vulnerable to TCP-Reset-style attacks; however, the presence of the subflow-level and data-level sequence numbers should provide some level of freshness verification.

6. Interactions with Middleboxes

TBD

How we get around NATs and firewalls. Problems with TCP proxies. How to make an MPTCP-aware middlebox, ...

7. Interfaces

TBD

Interface with applications, interface with TCP, interface with lower layers...

8. Acknowledgements

The authors are supported by Trilogy (http://www.trilogy-project.org), a research project (ICT-216372) partially funded by the European Community under its Seventh Framework Program. The views expressed here are those of the author(s) only. The European Commission is not liable for any use that may be made of the information in this document.

The authors gratefully acknowledge significant input into this document from many members of the Trilogy project, notably Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo Braun, Robert Hancock, Pasi Sarolahti, Olivier Bonaventure, Toby Moncaster, Philip Eardley and Andrew McDonald.

9. IANA Considerations

This document will make a request to IANA to allocate new values for TCP Option identifiers, as follows:

   +------------+-----------------+----------+-----------------+-------+
   | Symbol     | Name            | PM       | Ref             | Value |
   +------------+-----------------+----------+-----------------+-------+
   | OPT_MPC    | Multipath       | -        | Section 4.1     | (tbc) |
   |            | Capable         |          |                 |       |
   | OPT_ADDR   | Add Address     | Explicit | Section 4.2.1.1 | (tbc) |
   | OPT_REMADR | Remove Address  | Explicit | Section 4.2.1.2 | (tbc) |
   | OPT_REQSYN | Request-SYN     | Implicit | Section 4.2.2.1 | (tbc) |
   | OPT_REQFIN | Request-FIN     | Implicit | Section 4.2.2.2 | (tbc) |
   | OPT_JOIN   | Join Connection | -        | Section 4.3     | (tbc) |
   | OPT_DSN    | Data Sequence   | -        | Section 4.4     | (tbc) |
   |            | Number          |          |                 |       |
   | OPT_RESYNC | Re-sync         | -        | Section 4.4.3   | (tbc) |
   | OPT_DFIN   | DATA FIN        | -        | Section 4.5     | (tbc) |
   +------------+-----------------+----------+-----------------+-------+

                     Table 1: TCP Options for MPTCP

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

[I-D.eddy-tcp-loo] Eddy, W. and A. Langley, "Extending the Space Available for TCP Options", draft-eddy-tcp-loo-04 (work in progress), July 2008.

[I-D.ietf-shim6-proto] Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming Shim Protocol for IPv6", draft-ietf-shim6-proto-12 (work in progress), February 2009.

[I-D.van-beijnum-1e-mp-tcp-00] van Beijnum, I., "One-ended Multipath TCP", draft-van-beijnum-1e-mp-tcp-00 (work in progress), May 2009.

[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RFC5061] Stewart, R., Xie, Q., Tuexen, M., Maruyama, S., and M. Kozuka, "Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration", RFC 5061, September 2007.

[RFC5062] Stewart, R., Tuexen, M., and G. Camarillo, "Security Attacks Found Against the Stream Control Transmission Protocol (SCTP) and Current Countermeasures", RFC 5062, September 2007.

[WISCHIK] Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52, October 2008.

Appendix A. Functional Separation

[Potential to move to separate architectural document]

This section describes the functional separation that drives the design of the MPTCP protocol. Its main goal is to separate MPTCP into two parts that communicate through a well-defined interface. We first provide the motivations for this functional separation, then we describe in more detail the two main components of the MPTCP architecture.

A.1. Motivations

The major goal behind MPTCP is to send data over different paths at the same time. This assumes that an MPTCP implementation must be able to discover and use the multiple paths that connect two given hosts, when they exist. However, different mechanisms can be envisioned for multipath discovery and use. Examples are as follows:

Use multiple addresses: This is the method currently proposed in this document; if hosts are multi-addressed, different address pairs may take different routes.

Use a path selector value: An end-host might be able to tag packets with a path selector value, or "colour".
If some network nodes are able to read the colour and use it as a path selector, the host can influence the outgoing path of the packet.

Next-hop selection: In a network configuration where multiple next-hops can offer to forward packets, a host may decide to send some of its packets through one next-hop, and some through another.

The above list is not exhaustive, and could grow as new network technologies are deployed.

A.2. TCP Performance

In addition to simply sending data over multiple paths, MPTCP must do so in a way that does not harm TCP performance. This raises the need for an efficient multipath congestion control algorithm. While this specification does not mandate the use of any particular algorithm for congestion control, it ensures that the protocol is designed in such a way that any congestion control algorithm can be used, independently of the particular path management mechanism available to the host.

Consequently, our architecture for MPTCP decouples the policy from the mechanism. The policy is the decision of which path to use for each packet to be sent. It is mainly driven by the implementation-dependent congestion control algorithm. The mechanism is the technology used to ensure that a packet will be sent on the desired path. This separation is intended to be relatively future-proof, by allowing these components to evolve at different speeds.

A.3. Architecture Overview

                  Control plane <-- | --> Data plane
   +---------------------------------------------------------------+
   |                   Multipath Scheduler (MPS)                   |
   +---------------------------------------------------------------+
           ^                        |               |
           |                        |               |
           |Announcing new          |        +-------------+
           |paths. (referred        |        | Data packet |<--Path idx:3
           |to as path indices)     |        +-------------+   attached
           |                        |               |          by MPS
           |                        |               V
   +-------------------------------------------------\-------------+
   |              Path Manager (PM)                   \_zzzzz      |
   +---------------------------------------------------------\-----+
      /                   \         |                         \
     /---------------------\        |        /"\     /"\      /"\
     | Path key   Action   |        |        | |     | |      | |
     |    1       xxxxx    |        |        | |     | |      | |
     |    2       yyyyy    |        |        \./     \./      \./
     |    3       zzzzz    |        |       path1   path2    path3
     +---------------------+

                 Figure 12: Overview of MPTCP architecture

A general overview of the architecture is provided in Figure 12. The Multipath Scheduler (MPS) learns about the number of available paths through notifications received from the Path Manager (PM). From the point of view of the Multipath Scheduler, a path is just a number, called a Path Index. Notifications from the PM to the MPS MAY contain supporting information about the paths, if relevant, so that the MPS can make more intelligent decisions about where to route traffic.

When the Multipath Scheduler initiates a communication to a new host, it can only send the packets on the default path. But since the Path Manager is layered below the MPS, it can detect that a new communication is happening, and tell the MPS about the other paths it knows about. From then on, it is possible for the MPS to attach a Path Index to the control structure of its packets (internal to the MPTCP implementation), so that the Path Manager can map this Path Index to the corresponding action (see the table in the lower left part of Figure 12). The particular action depends on the network mechanism used to select a path. Examples are address rewriting, tunnelling or setting a path selector value inside the packet.

The applicability of the architecture is not limited to the MPTCP protocol. While we define in this document an MPTCP MPS (MPTCP Multipath Scheduler), other Multipath Schedulers can be defined. For example, if an appropriate socket interface is designed, applications could behave as a Multipath Scheduler and decide where to send any particular data. In this document we concentrate on the MPTCP case, however.

In this specification, we define the core protocol for Multipath TCP. The core protocol is not dependent on the Path Management technique that is chosen, and MUST be implemented in any MPTCP MPS. We also provide a default Path Manager that is based on declaring IP addresses, and carries control information in TCP options. An implementation of Multipath TCP can use any Path Manager, but it MUST be able to fall back to the default PM in case the other end does not support the custom PM. Alternative Path Managers may be specified in separate documents in the future.

A.4. PM/MPS Interface

The minimal set of requirements for a Path Manager is as follows:

o  Outgoing untagged packets: Any outgoing packet flowing through the Path Manager is either tagged with a path index (by the MPS) or untagged. If it is untagged, the packet is sent normally to the Internet, as if no multipath support were present. Untagged packets can be used to trigger a path discovery procedure; that is, a Path Manager can listen to untagged packets and decide at some point to check whether any path other than the default one is usable for the corresponding host pair. Note that any other criteria could be used to decide when to start discovering available paths. Note also that MPS scheduling will not be possible until the Path Manager has notified the MPS of the available paths. The PM is thus the first entity to come into action.

o  Outgoing tagged packets: The Path Manager maintains a table mapping path indices to actions. The action is the operation that allows a particular path to be used. Examples of possible actions are route selection, interface selection or packet transformation. When the PM sees a packet tagged with a path index, it looks up its table to find the appropriate action for that packet. The tag is purely local: it is removed before the packet is transmitted.

o  Incoming packets: A Path Manager MUST ensure that each incoming path is mapped unambiguously to exactly one outgoing path. Note that this requirement implies that the same number of incoming and outgoing paths must be established. Moreover, a PM MUST tag any incoming path with the same Path Index as the one used for the corresponding outgoing path. This is necessary for MPTCP to know which outgoing path is acknowledged by an incoming packet.

o  Module interface: A PM MUST be able to notify the MPS about the number of available paths. Such notifications MUST contain the path indices that are legal for use by the MPS. In case the PM decides to stop providing service for one path, it MUST notify the MPS about path deletion. Additionally, a PM MAY provide complementary path information when available, such as link quality or preference level.

Appendix B. Notes on Use of TCP Options

The TCP option space is limited due to the length of the Data Offset field in the TCP header (4 bits), which defines the TCP header length in 32-bit words. A 4-bit field can express at most 15 words, i.e. a 60-byte header; with the standard TCP header being 20 bytes, this leaves a maximum of 40 bytes for options, and many of these may already be used by options such as timestamp and SACK.

As such, when doing address list manipulation, not all data may fit. This can be mitigated in one of two ways:

o  Using an option to extend the option space, such as that proposed in [I-D.eddy-tcp-loo], which provides a 16-bit header length field. Such an option could only be used between nodes that support it, however, and so long options could not be used until a handshake is complete.
o  Alternatively, since at least one IP address option field should be able to fit per packet, address list manipulation can be undertaken with one address per packet. One method could be to wait for data to send, and then append one new address per packet. This would seem reasonable if the TCP session begins rapidly, but if it is required that the multipath session is ready before the first data is to be sent, address list manipulation would be required on empty (signalling-only) packets. Issues may arise regarding acknowledged delivery of signalling versus data; this is discussed in Section 3.

Authors' Addresses

Alan Ford (editor)
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire SO51 0ZN
UK

Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk

Costin Raiciu
University College London

Email: c.raiciu@cs.ucl.ac.uk

Mark Handley
University College London

Sebastien Barre
Universite catholique de Louvain
Pl. Ste Barbe, 2
Louvain-la-Neuve 1348
Belgium

Phone: +32 10 47 91 03
Email: sebastien.barre@uclouvain.be