Internet Engineering Task Force (IETF)                      W. Eddy, Ed.
STD: 7                                                       MTI Systems
Request for Comments: 9293                                   August 2022
Obsoletes: 793, 879, 2873, 6093, 6429, 6528,
           6691
Updates: 1011, 1122, 5961
Category: Standards Track
ISSN: 2070-1721
Transmission Control Protocol (TCP)
Abstract
This document specifies the Transmission Control Protocol (TCP). TCP
is an important transport-layer protocol in the Internet protocol
stack, and it has continuously evolved over decades of use and growth
of the Internet. Over this time, a number of changes have been made
to TCP as it was specified in RFC 793, though these have only been
documented in a piecemeal fashion. This document collects and brings
those changes together with the protocol specification from RFC 793.
This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093,
6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs
1011 and 1122, and it should be considered as a replacement for the
portions of those documents dealing with TCP requirements. It also
updates RFC 5961 by adding a small clarification in reset handling
while in the SYN-RECEIVED state. The TCP header control bits from
RFC 793 have also been updated based on RFC 3168.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9293.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Revised BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
1. Purpose and Scope
2. Introduction
2.1. Requirements Language
2.2. Key TCP Concepts
3. Functional Specification
3.1. Header Format
3.2. Specific Option Definitions
3.2.1. Other Common Options
3.2.2. Experimental TCP Options
3.3. TCP Terminology Overview
3.3.1. Key Connection State Variables
3.3.2. State Machine Overview
3.4. Sequence Numbers
3.4.1. Initial Sequence Number Selection
3.4.2. Knowing When to Keep Quiet
3.4.3. The TCP Quiet Time Concept
3.5. Establishing a Connection
3.5.1. Half-Open Connections and Other Anomalies
3.5.2. Reset Generation
3.5.3. Reset Processing
3.6. Closing a Connection
3.6.1. Half-Closed Connections
3.7. Segmentation
3.7.1. Maximum Segment Size Option
3.7.2. Path MTU Discovery
3.7.3. Interfaces with Variable MTU Values
3.7.4. Nagle Algorithm
3.7.5. IPv6 Jumbograms
3.8. Data Communication
3.8.1. Retransmission Timeout
3.8.2. TCP Congestion Control
3.8.3. TCP Connection Failures
3.8.4. TCP Keep-Alives
3.8.5. The Communication of Urgent Information
3.8.6. Managing the Window
3.9. Interfaces
3.9.1. User/TCP Interface
3.9.2. TCP/Lower-Level Interface
3.10. Event Processing
3.10.1. OPEN Call
3.10.2. SEND Call
3.10.3. RECEIVE Call
3.10.4. CLOSE Call
3.10.5. ABORT Call
3.10.6. STATUS Call
3.10.7. SEGMENT ARRIVES
3.10.8. Timeouts
4. Glossary
5. Changes from RFC 793
6. IANA Considerations
7. Security and Privacy Considerations
8. References
8.1. Normative References
8.2. Informative References
Appendix A. Other Implementation Notes
A.1. IP Security Compartment and Precedence
A.1.1. Precedence
A.1.2. MLS Systems
A.2. Sequence Number Validation
A.3. Nagle Modification
A.4. Low Watermark Settings
Appendix B. TCP Requirement Summary
Acknowledgments
Author's Address
1. Purpose and Scope
In 1981, RFC 793 [16] was released, documenting the Transmission
Control Protocol (TCP) and replacing earlier published specifications
for TCP.
Since then, TCP has been widely implemented, and it has been used as
a transport protocol for numerous applications on the Internet.
For several decades, RFC 793 plus a number of other documents have
combined to serve as the core specification for TCP [49]. Over time,
a number of errata have been filed against RFC 793. There have also
been deficiencies found and resolved in security, performance, and
many other aspects. The number of enhancements has grown over time
across many separate documents. These were never accumulated
together into a comprehensive update to the base specification.
The purpose of this document is to bring together all of the IETF
Standards Track changes and other clarifications that have been made
to the base TCP functional specification (RFC 793) and to unify them
into an updated version of the specification.
Some companion documents are referenced for important algorithms that
are used by TCP (e.g., for congestion control) but have not been
completely included in this document. This is a conscious choice, as
this base specification can be used with multiple additional
algorithms that are developed and incorporated separately. This
document focuses on the common basis that all TCP implementations
must support in order to interoperate. Since some additional TCP
features have become quite complicated themselves (e.g., advanced
loss recovery and congestion control), future companion documents may
attempt to similarly bring these together.
In addition to the protocol specification that describes the TCP
segment format, generation, and processing rules that are to be
implemented in code, RFC 793 and other updates also contain
informative and descriptive text for readers to understand aspects of
the protocol design and operation. This document does not attempt to
alter or update this informative text and is focused only on updating
the normative protocol specification. This document preserves
references to the documentation containing the important explanations
and rationale, where appropriate.
This document is intended to be useful both in checking existing TCP
implementations for conformance purposes, as well as in writing new
implementations.
2. Introduction
RFC 793 contains a discussion of the TCP design goals and provides
examples of its operation, including examples of connection
establishment, connection termination, and packet retransmission to
repair losses.
This document describes the basic functionality expected in modern
TCP implementations and replaces the protocol specification in RFC
793. It does not replicate or attempt to update the introduction and
philosophy content in Sections 1 and 2 of RFC 793. Other documents
are referenced to provide explanations of the theory of operation,
rationale, and detailed discussion of design decisions. This
document only focuses on the normative behavior of the protocol.
The "TCP Roadmap" [49] provides a more extensive guide to the RFCs
that define TCP and describe various important algorithms. The TCP
Roadmap contains sections on strongly encouraged enhancements that
improve performance and other aspects of TCP beyond the basic
operation specified in this document. As one example, implementing
congestion control (e.g., [8]) is a TCP requirement, but it is a
complex topic on its own and not described in detail in this
document, as there are many options and possibilities that do not
impact basic interoperability. Similarly, most TCP implementations
today include the high-performance extensions in [47], but these are
not strictly required or discussed in this document. Multipath
considerations for TCP are also specified separately in [59].
A list of changes from RFC 793 is contained in Section 5.
2.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [3] [12] when, and only when, they appear in all capitals, as
shown here.
Each use of RFC 2119 keywords in the document is individually labeled
and referenced in Appendix B, which summarizes implementation
requirements.
Sentences using "MUST" are labeled as "MUST-X" with X being a numeric
identifier enabling the requirement to be located easily when
referenced from Appendix B.
Similarly, sentences using "SHOULD" are labeled with "SHLD-X", "MAY"
with "MAY-X", and "RECOMMENDED" with "REC-X".
For the purposes of this labeling, "SHOULD NOT" and "MUST NOT" are
labeled the same as "SHOULD" and "MUST" instances.
2.2. Key TCP Concepts
TCP provides a reliable, in-order, byte-stream service to
applications.
The application byte-stream is conveyed over the network via TCP
segments, with each TCP segment sent as an Internet Protocol (IP)
datagram.
TCP reliability consists of detecting packet losses (via sequence
numbers) and errors (via per-segment checksums), as well as
correction via retransmission.
TCP supports unicast delivery of data. There are anycast
applications that can successfully use TCP without modifications,
though there is some risk of instability due to changes of lower-
layer forwarding behavior [46].
TCP is connection oriented, though it does not inherently include a
liveness detection capability.
Data flow is supported bidirectionally over TCP connections, though
applications are free to send data only unidirectionally, if they so
choose.
TCP uses port numbers to identify application services and to
multiplex distinct flows between hosts.
A more detailed description of TCP features compared to other
transport protocols can be found in Section 3.1 of [52]. Further
description of the motivations for developing TCP and its role in the
Internet protocol stack can be found in Section 2 of [16] and earlier
versions of the TCP specification.
3. Functional Specification
3.1. Header Format
TCP segments are sent as internet datagrams. The Internet Protocol
(IP) header carries several information fields, including the source
and destination host addresses [1] [13]. A TCP header follows the IP
headers, supplying information specific to TCP. This division allows
for the existence of host-level protocols other than TCP. In the
early development of the Internet suite of protocols, the IP header
fields had been a part of TCP.
This document describes TCP, which uses TCP headers.
A TCP header, followed by any user data in the segment, is formatted
as follows, using the style from [66]:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |       |C|E|U|A|P|R|S|F|                               |
| Offset| Rsrvd |W|C|R|C|S|S|Y|I|            Window             |
|       |       |R|E|G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           [Options]                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               :
:                             Data                              :
:                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Note that one tick mark represents one bit position.
Figure 1: TCP Header Format
where:
Source Port: 16 bits
The source port number.
Destination Port: 16 bits
The destination port number.
Sequence Number: 32 bits
The sequence number of the first data octet in this segment (except
when the SYN flag is set). If SYN is set, the sequence number is
the initial sequence number (ISN) and the first data octet is
ISN+1.
Acknowledgment Number: 32 bits
If the ACK control bit is set, this field contains the value of the
next sequence number the sender of the segment is expecting to
receive. Once a connection is established, this is always sent.
Data Offset (DOffset): 4 bits
The number of 32-bit words in the TCP header. This indicates where
the data begins. The TCP header (even one including options) is an
integer multiple of 32 bits long.
Reserved (Rsrvd): 4 bits
A set of control bits reserved for future use. Must be zero in
generated segments and must be ignored in received segments if the
corresponding future features are not implemented by the sending or
receiving host.
Control bits: The control bits are also known as "flags".
Assignment is managed by IANA from the "TCP Header Flags" registry
[62]. The currently assigned control bits are CWR, ECE, URG, ACK,
PSH, RST, SYN, and FIN.
CWR: 1 bit
Congestion Window Reduced (see [6]).
ECE: 1 bit
ECN-Echo (see [6]).
URG: 1 bit
Urgent pointer field is significant.
ACK: 1 bit
Acknowledgment field is significant.
PSH: 1 bit
Push function (see the Send Call description in Section 3.9.1).
RST: 1 bit
Reset the connection.
SYN: 1 bit
Synchronize sequence numbers.
FIN: 1 bit
No more data from sender.
Window: 16 bits
The number of data octets beginning with the one indicated in the
acknowledgment field that the sender of this segment is willing to
accept. The value is shifted when the window scaling extension is
used [47].
The window size MUST be treated as an unsigned number, or else
large window sizes will appear like negative windows and TCP will
not work (MUST-1). It is RECOMMENDED that implementations will
reserve 32-bit fields for the send and receive window sizes in the
connection record and do all window computations with 32 bits (REC-
1).
Checksum: 16 bits
The checksum field is the 16-bit ones' complement of the ones'
complement sum of all 16-bit words in the header and text. The
checksum computation needs to ensure the 16-bit alignment of the
data being summed. If a segment contains an odd number of header
and text octets, alignment can be achieved by padding the last
octet with zeros on its right to form a 16-bit word for checksum
purposes. The pad is not transmitted as part of the segment.
While computing the checksum, the checksum field itself is replaced
with zeros.
The checksum also covers a pseudo-header (Figure 2) conceptually
prefixed to the TCP header. The pseudo-header is 96 bits for IPv4
and 320 bits for IPv6. Including the pseudo-header in the checksum
gives the TCP connection protection against misrouted segments.
This information is carried in IP headers and is transferred across
the TCP/network interface in the arguments or results of calls by
the TCP implementation on the IP layer.
+--------+--------+--------+--------+
|           Source Address          |
+--------+--------+--------+--------+
|         Destination Address       |
+--------+--------+--------+--------+
|  zero  |  PTCL  |    TCP Length   |
+--------+--------+--------+--------+
Figure 2: IPv4 Pseudo-header
Pseudo-header components for IPv4:
Source Address: the IPv4 source address in network byte order
Destination Address: the IPv4 destination address in network
byte order
zero: bits set to zero
PTCL: the protocol number from the IP header
TCP Length: the TCP header length plus the data length in octets
(this is not an explicitly transmitted quantity but is
computed), and it does not count the 12 octets of the pseudo-
header.
For IPv6, the pseudo-header is defined in Section 8.1 of RFC 8200
[13] and contains the IPv6 Source Address and Destination Address,
an Upper-Layer Packet Length (a 32-bit value otherwise equivalent
to TCP Length in the IPv4 pseudo-header), three bytes of zero
padding, and a Next Header value, which differs from the IPv6
header value if there are extension headers present between IPv6
and TCP.
The TCP checksum is never optional. The sender MUST generate it
(MUST-2) and the receiver MUST check it (MUST-3).
Urgent Pointer: 16 bits
This field communicates the current value of the urgent pointer as
a positive offset from the sequence number in this segment. The
urgent pointer points to the sequence number of the octet following
the urgent data. This field is only to be interpreted in segments
with the URG control bit set.
Options: [TCP Option]; size(Options) == (DOffset-5)*32; present only
when DOffset > 5. Note that this size expression also includes any
padding trailing the actual options present.
Options may occupy space at the end of the TCP header and are a
multiple of 8 bits in length. All options are included in the
checksum. An option may begin on any octet boundary. There are
two cases for the format of an option:
Case 1: A single octet of option-kind.
Case 2: An octet of option-kind (Kind), an octet of option-length,
and the actual option-data octets.
The option-length counts the two octets of option-kind and option-
length as well as the option-data octets.
Note that the list of options may be shorter than the Data Offset
field might imply. The content of the header beyond the End of
Option List Option MUST be header padding of zeros (MUST-69).
The list of all currently defined options is managed by IANA [62],
and each option is defined in other RFCs, as indicated there. That
set includes experimental options that can be extended to support
multiple concurrent usages [45].
A given TCP implementation can support any currently defined
options, but the following options MUST be supported (MUST-4 --
note Maximum Segment Size Option support is also part of MUST-14 in
Section 3.7.1):
+======+========+============================+
| Kind | Length | Meaning                    |
+======+========+============================+
| 0    | -      | End of Option List Option. |
+------+--------+----------------------------+
| 1    | -      | No-Operation.              |
+------+--------+----------------------------+
| 2    | 4      | Maximum Segment Size.      |
+------+--------+----------------------------+
Table 1: Mandatory Option Set
These options are specified in detail in Section 3.2.
A TCP implementation MUST be able to receive a TCP Option in any
segment (MUST-5).
A TCP implementation MUST (MUST-6) ignore without error any TCP
Option it does not implement, assuming that the option has a length
field. All TCP Options except End of Option List Option (EOL) and
No-Operation (NOP) MUST have length fields, including all future
options (MUST-68). TCP implementations MUST be prepared to handle
an illegal option length (e.g., zero); a suggested procedure is to
reset the connection and log the error cause (MUST-7).
Note: There is ongoing work to extend the space available for TCP
Options, such as [65].
Data: variable length
User data carried by the TCP segment.
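As a non-normative illustration of the field layout and checksum
rules above, the following Python sketch unpacks the fixed 20-octet
header and computes the checksum over the IPv4 pseudo-header and the
segment. The names TcpHeader, parse_tcp_header, ones_complement_sum,
and tcp_checksum_ipv4 are hypothetical and are not defined by this
document. The sketch assumes IPv4 only, takes addresses as 4-octet
byte strings, and leaves the option octets uninterpreted.

```python
import struct
from dataclasses import dataclass

# Control bits in header order; CWR is the most significant bit of the octet.
FLAG_NAMES = ("CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN")
TCP_PROTOCOL_NUMBER = 6  # PTCL value carried in the pseudo-header for TCP


@dataclass
class TcpHeader:  # hypothetical container, not part of the specification
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int  # header length in 32-bit words
    flags: frozenset  # names of the control bits that are set
    window: int       # unsigned 16-bit value, before any window scaling
    checksum: int
    urgent_pointer: int
    options: bytes    # (DOffset-5)*4 octets, including any padding
    data: bytes


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Split a raw TCP segment into its header fields and data."""
    if len(segment) < 20:
        raise ValueError("segment is shorter than the fixed 20-octet header")
    (src, dst, seq, ack, offset_rsrvd, flag_bits,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    data_offset = offset_rsrvd >> 4  # upper 4 bits; Rsrvd bits are ignored
    header_len = data_offset * 4
    if data_offset < 5 or header_len > len(segment):
        raise ValueError("Data Offset is inconsistent with the segment length")
    flags = frozenset(name for bit, name in enumerate(FLAG_NAMES)
                      if flag_bits & (0x80 >> bit))
    return TcpHeader(src, dst, seq, ack, data_offset, flags, window,
                     checksum, urgent, segment[20:header_len],
                     segment[header_len:])


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum, padding an odd octet with zeros."""
    if len(data) % 2:
        data += b"\x00"  # pad for the computation only; never transmitted
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ipv4(src_addr: bytes, dst_addr: bytes, segment: bytes) -> int:
    """Checksum over the 12-octet IPv4 pseudo-header plus the segment.

    To generate a checksum, the caller zeroes the checksum field
    (octets 16-17 of the segment) before calling this function.
    """
    pseudo = src_addr + dst_addr + struct.pack(
        "!BBH", 0, TCP_PROTOCOL_NUMBER, len(segment))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

In this sketch, a receiver can check a segment by calling
tcp_checksum_ipv4 on the segment exactly as received, checksum field
included; a result of zero indicates that the checksum matches.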
3.2. Specific Option Definitions
A TCP Option, in the mandatory option set, is one of an End of Option
List Option, a No-Operation Option, or a Maximum Segment Size Option.
An End of Option List Option is formatted as follows:
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|       0       |
+-+-+-+-+-+-+-+-+
where:
Kind: 1 byte; Kind == 0.
This option code indicates the end of the option list. This might
not coincide with the end of the TCP header according to the Data
Offset field. This is used at the end of all options, not the end
of each option, and need only be used if the end of the options
would not otherwise coincide with the end of the TCP header.
A No-Operation Option is formatted as follows:
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|       1       |
+-+-+-+-+-+-+-+-+
where:
Kind: 1 byte; Kind == 1.
This option code can be used between options, for example, to align
the beginning of a subsequent option on a word boundary. There is
no guarantee that senders will use this option, so receivers MUST
be prepared to process options even if they do not begin on a word
boundary (MUST-64).
A Maximum Segment Size Option is formatted as follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       2       |     Length    |   Maximum Segment Size (MSS)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
where:
Kind: 1 byte; Kind == 2.
If this option is present, then it communicates the maximum receive
segment size at the TCP endpoint that sends this segment. This
value is limited by the IP reassembly limit. This field may be
sent in the initial connection request (i.e., in segments with the
SYN control bit set) and MUST NOT be sent in other segments (MUST-
65). If this option is not used, any segment size is allowed. A
more complete description of this option is provided in
Section 3.7.1.
Length: 1 byte; Length == 4.
Length of the option in bytes.
Maximum Segment Size (MSS): 2 bytes.
The maximum receive segment size at the TCP endpoint that sends
this segment.
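The three mandatory option formats above, together with the rule that
unimplemented options carrying a length field are skipped without
error, can be sketched as a small parser. This is a non-normative
Python illustration; parse_options and mss_from_options are
hypothetical names. The sketch assumes the caller passes only the
option octets of the header, and it raises ValueError on an illegal
option length so that the caller can decide how to react (for
example, by resetting the connection and logging the cause).

```python
KIND_EOL = 0  # End of Option List
KIND_NOP = 1  # No-Operation
KIND_MSS = 2  # Maximum Segment Size


def parse_options(options: bytes) -> list:
    """Return (kind, option-data) pairs for every option that has a length.

    EOL stops parsing (the remainder is padding) and NOP octets are
    skipped; neither appears in the result.
    """
    found = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("option is truncated before its length octet")
        length = options[i + 1]
        # The length counts the kind and length octets, so it is at least 2.
        if length < 2 or i + length > len(options):
            raise ValueError(f"illegal length {length} for option kind {kind}")
        found.append((kind, options[i + 2:i + length]))
        i += length
    return found


def mss_from_options(options: bytes):
    """Return the advertised MSS, or None if the option is absent."""
    for kind, data in parse_options(options):
        if kind == KIND_MSS:
            if len(data) != 2:
                raise ValueError("Maximum Segment Size option must have Length 4")
            return int.from_bytes(data, "big")
    return None
```

Options of any other kind are returned uninterpreted, so a caller
that does not implement them can simply ignore those entries.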
3.2.1. Other Common Options
Additional RFCs define some other commonly used options that are
recommended to implement for high performance but are not necessary
for basic TCP interoperability. These are the TCP Selective
Acknowledgment (SACK) Option [22] [26], TCP Timestamp (TS) Option
[47], and TCP Window Scale (WS) Option [47].
3.2.2. Experimental TCP Options
Experimental TCP Option values are defined in [30], and [45]
describes the current recommended usage for these experimental
values.
3.3. TCP Terminology Overview
This section includes an overview of key terms needed to understand
the detailed protocol operation in the rest of the document. There
is a glossary of terms in Section 4.
3.3.1. Key Connection State Variables
Before we can discuss the operation of the TCP implementation in
detail, we need to introduce some detailed terminology. The
maintenance of a TCP connection requires maintaining state for
several variables. We conceive of these variables being stored in a
connection record called a Transmission Control Block or TCB. Among
the variables stored in the TCB are the local and remote IP addresses
and port numbers, the IP security level, and compartment of the
connection (see Appendix A.1), pointers to the user's send and
receive buffers, pointers to the retransmit queue and to the current
segment. In addition, several variables relating to the send and
receive sequence numbers are stored in the TCB.
+==========+=====================================================+
| Variable | Description                                         |
+==========+=====================================================+
| SND.UNA  | send unacknowledged                                 |
+----------+-----------------------------------------------------+
| SND.NXT  | send next                                           |
+----------+-----------------------------------------------------+
| SND.WND  | send window                                         |
+----------+-----------------------------------------------------+
| SND.UP   | send urgent pointer                                 |
+----------+-----------------------------------------------------+
| SND.WL1  | segment sequence number used for last window update |
+----------+-----------------------------------------------------+
| SND.WL2  | segment acknowledgment number used for last window  |
|          | update                                              |
+----------+-----------------------------------------------------+
| ISS      | initial send sequence number                        |
+----------+-----------------------------------------------------+
Table 2: Send Sequence Variables
+==========+=================================+
| Variable | Description                     |
+==========+=================================+
| RCV.NXT  | receive next                    |
+----------+---------------------------------+
| RCV.WND  | receive window                  |
+----------+---------------------------------+
| RCV.UP   | receive urgent pointer          |
+----------+---------------------------------+
| IRS      | initial receive sequence number |
+----------+---------------------------------+
Table 3: Receive Sequence Variables
The following diagrams may help to relate some of these variables to
the sequence space.
     1         2          3          4
----------|----------|----------|----------
       SND.UNA    SND.NXT    SND.UNA
                            +SND.WND
1 - old sequence numbers that have been acknowledged
2 - sequence numbers of unacknowledged data
3 - sequence numbers allowed for new data transmission
4 - future sequence numbers that are not yet allowed
Figure 3: Send Sequence Space
The send window is the portion of the sequence space labeled 3 in
Figure 3.
     1          2          3
----------|----------|----------
       RCV.NXT    RCV.NXT
                 +RCV.WND
1 - old sequence numbers that have been acknowledged
2 - sequence numbers allowed for new reception
3 - future sequence numbers that are not yet allowed
Figure 4: Receive Sequence Space
The receive window is the portion of the sequence space labeled 2 in
Figure 4.
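The regions in Figures 3 and 4 can be expressed as comparisons on
offsets from SND.UNA and RCV.NXT. The following Python sketch is
non-normative; the function names are hypothetical. It assumes that
all arithmetic is performed modulo 2**32, since sequence numbers
occupy a 32-bit header field, and it assumes that a sequence number
lying more than half the sequence space ahead of SND.UNA is treated
as old (region 1) rather than future (region 4). That tie-breaking
convention belongs to this sketch and is not taken from the figures.

```python
SEQ_MOD = 1 << 32  # sequence numbers occupy a 32-bit field


def seq_offset(seq: int, base: int) -> int:
    """Distance from base forward to seq, modulo 2**32."""
    return (seq - base) % SEQ_MOD


def send_region(seq: int, snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Return the Figure 3 region (1 to 4) that seq falls in."""
    offset = seq_offset(seq, snd_una)
    in_flight = seq_offset(snd_nxt, snd_una)
    if offset < in_flight:
        return 2  # sent but not yet acknowledged
    if offset < snd_wnd:
        return 3  # allowed for new data transmission
    # Assumption of this sketch: the far half of the space counts as old.
    return 1 if offset >= SEQ_MOD // 2 else 4


def usable_send_window(snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Number of octets in region 3, i.e. SND.UNA + SND.WND - SND.NXT."""
    return max(0, snd_wnd - seq_offset(snd_nxt, snd_una))


def in_receive_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in region 2 of Figure 4."""
    return seq_offset(seq, rcv_nxt) < rcv_wnd
```

Working with offsets rather than comparing raw values keeps the
results correct when the sequence space wraps from 2**32 - 1 back to
zero between SND.UNA and SND.NXT, or between RCV.NXT and the right
edge of the receive window.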
There are also some variables used frequently in the discussion that
take their values from the fields of the current segment.
+==========+===============================+
| Variable | Description                   |
+==========+===============================+
| SEG.SEQ  | segment sequence number       |
+----------+-------------------------------+
| SEG.ACK  | segment acknowledgment number |
+----------+-------------------------------+
| SEG.LEN  | segment length                |
+----------+-------------------------------+
| SEG.WND  | segment window                |
+----------+-------------------------------+
| SEG.UP   | segment urgent pointer        |
+----------+-------------------------------+
Table 4: Current Segment Variables
3.3.2. State Machine Overview
A connection progresses through a series of states during its
lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED,
ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK,
TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional
because it represents the state when there is no TCB, and therefore,
no connection. Briefly, the meanings of the states are:
LISTEN - represents waiting for a connection request from any remote
TCP peer and port.
SYN-SENT - represents waiting for a matching connection request
after having sent a connection request.
SYN-RECEIVED - represents waiting for a confirming connection
request acknowledgment after having both received and sent a
connection request.
ESTABLISHED - represents an open connection; data received can be
delivered to the user. This is the normal state for the data
transfer phase of the connection.
FIN-WAIT-1 - represents waiting for a connection termination request
from the remote TCP peer, or an acknowledgment of the connection
termination request previously sent.
FIN-WAIT-2 - represents waiting for a connection termination request
from the remote TCP peer.
CLOSE-WAIT - represents waiting for a connection termination request
from the local user.
CLOSING - represents waiting for a connection termination request
acknowledgment from the remote TCP peer.
LAST-ACK - represents waiting for an acknowledgment of the
connection termination request previously sent to the remote TCP
peer (this termination request sent to the remote TCP peer already
included an acknowledgment of the termination request sent from
the remote TCP peer).
TIME-WAIT - represents waiting for enough time to pass to be sure
the remote TCP peer received the acknowledgment of its connection
termination request and to avoid new connections being impacted by
delayed segments from previous connections.
CLOSED - represents no connection state at all.
A TCP connection progresses from one state to another in response to
events. The events are the user calls (OPEN, SEND, RECEIVE, CLOSE,
ABORT, and STATUS); the incoming segments, particularly those
containing the SYN, ACK, RST, and FIN flags; and timeouts.
The OPEN call specifies whether connection establishment is to be
actively pursued, or to be passively waited for.
A passive OPEN request means that the process wants to accept
incoming connection requests, in contrast to an active OPEN
attempting to initiate a connection.
The state diagram in Figure 5 illustrates only state changes,
together with the causing events and resulting actions, but addresses
neither error conditions nor actions that are not connected with
state changes. In a later section, more detail is offered with
respect to the reaction of the TCP implementation to events. Some
state names are abbreviated or hyphenated differently in the diagram
from how they appear elsewhere in the document.
NOTA BENE: This diagram is only a summary and must not be taken as
the total specification. Many details are not included.
                            +---------+ ---------\      active OPEN
                            |  CLOSED |            \    -----------
                            +---------+<---------\   \   create TCB
                              |     ^              \   \  snd SYN
                 passive OPEN |     |   CLOSE        \   \
                 ------------ |     | ----------       \   \
                  create TCB  |     | delete TCB         \   \
                              V     |                      \   \
          rcv RST (note 1)  +---------+            CLOSE    |    \
       -------------------->|  LISTEN |          ---------- |     |
      /                     +---------+          delete TCB |     |
     /           rcv SYN      |     |     SEND              |     |
    /           -----------   |     |    -------            |     V
+--------+      snd SYN,ACK  /       \   snd SYN          +--------+
|        |<-----------------           ------------------>|        |
|  SYN   |                    rcv SYN                     |  SYN   |
|  RCVD  |<-----------------------------------------------|  SENT  |
|        |                  snd SYN,ACK                   |        |
|        |------------------           -------------------|        |
+--------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +--------+
   |         --------------   |     |   -----------
   |                x         |     |     snd ACK
   |                          V     V
   |  CLOSE                 +---------+
   | -------                |  ESTAB  |
   | snd FIN                +---------+
   |                 CLOSE    |     |    rcv FIN
   V                -------   |     |    -------
+---------+         snd FIN  /       \   snd ACK         +---------+
|  FIN    |<----------------          ------------------>|  CLOSE  |
| WAIT-1  |------------------                            |   WAIT  |
+---------+          rcv FIN  \                          +---------+
  | rcv ACK of FIN   -------   |                          CLOSE  |
  | --------------   snd ACK   |                         ------- |
  V        x                   V                         snd FIN V
+---------+               +---------+                    +---------+
|FINWAIT-2|               | CLOSING |                    | LAST-ACK|
+---------+               +---------+                    +---------+
  |              rcv ACK of FIN |                 rcv ACK of FIN |
  |  rcv FIN     -------------- |    Timeout=2MSL -------------- |
  |  -------            x       V    ------------        x       V
   \ snd ACK              +---------+delete TCB          +---------+
     -------------------->|TIME-WAIT|------------------->| CLOSED  |
                          +---------+                    +---------+
Figure 5: TCP Connection State Diagram
The following notes apply to Figure 5:
Note 1: The transition from SYN-RECEIVED to LISTEN on receiving a
RST is conditional on having reached SYN-RECEIVED after a passive
OPEN.
Note 2: The figure omits a transition from FIN-WAIT-1 to TIME-WAIT
if a FIN is received and the local FIN is also acknowledged.
Note 3: A RST can be sent from any state with a corresponding
transition to TIME-WAIT (see [70] for rationale). These
transitions are not explicitly shown; otherwise, the diagram would
become very difficult to read. Similarly, receipt of a RST from
any state results in a transition to LISTEN or CLOSED, though this
is also omitted from the diagram for legibility.
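As an informal illustration, the arrows of Figure 5 can be written down
as a lookup table keyed by (state, event). The sketch below is not part
of the specification: the event strings, the TRANSITIONS table, and the
step() and run() helpers are names assumed here for illustration. It
omits the RST transitions covered by Notes 1 and 3, the transition
described in Note 2, and all error handling.

```python
# Illustrative sketch only: Figure 5 as a (state, event) -> (next, action) table.
# Event strings mirror the labels in the diagram; None means no action ("x").
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): ("LISTEN", "create TCB"),
    ("CLOSED", "active OPEN"): ("SYN-SENT", "create TCB, snd SYN"),
    ("LISTEN", "CLOSE"): ("CLOSED", "delete TCB"),
    ("LISTEN", "rcv SYN"): ("SYN-RECEIVED", "snd SYN,ACK"),
    ("LISTEN", "SEND"): ("SYN-SENT", "snd SYN"),
    ("SYN-SENT", "CLOSE"): ("CLOSED", "delete TCB"),
    ("SYN-SENT", "rcv SYN"): ("SYN-RECEIVED", "snd SYN,ACK"),
    ("SYN-SENT", "rcv SYN,ACK"): ("ESTABLISHED", "snd ACK"),
    ("SYN-RECEIVED", "rcv ACK of SYN"): ("ESTABLISHED", None),
    ("SYN-RECEIVED", "CLOSE"): ("FIN-WAIT-1", "snd FIN"),
    ("ESTABLISHED", "CLOSE"): ("FIN-WAIT-1", "snd FIN"),
    ("ESTABLISHED", "rcv FIN"): ("CLOSE-WAIT", "snd ACK"),
    ("FIN-WAIT-1", "rcv ACK of FIN"): ("FIN-WAIT-2", None),
    ("FIN-WAIT-1", "rcv FIN"): ("CLOSING", "snd ACK"),
    ("FIN-WAIT-2", "rcv FIN"): ("TIME-WAIT", "snd ACK"),
    ("CLOSE-WAIT", "CLOSE"): ("LAST-ACK", "snd FIN"),
    ("CLOSING", "rcv ACK of FIN"): ("TIME-WAIT", None),
    ("LAST-ACK", "rcv ACK of FIN"): ("CLOSED", None),
    ("TIME-WAIT", "Timeout=2MSL"): ("CLOSED", "delete TCB"),
}


def step(state, event):
    """Return (next_state, action) for one arrow of the diagram."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # The diagram does not cover this combination (e.g., errors, RST).
        raise ValueError(f"no transition for {event!r} in state {state}") from None


def run(state, events):
    """Apply a sequence of events; collect the actions that were taken."""
    actions = []
    for event in events:
        state, action = step(state, event)
        if action is not None:
            actions.append(action)
    return state, actions


# Active open followed by a locally initiated close.
final_state, taken = run(
    "CLOSED",
    ["active OPEN", "rcv SYN,ACK", "CLOSE",
     "rcv ACK of FIN", "rcv FIN", "Timeout=2MSL"],
)
```

A table of this kind is a summary in the same sense as the diagram: it
records state changes only and must not be taken as the total
specification.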
3.4. Sequence Numbers
A fundamental notion in the design is that every octet of data sent
over a TCP connection has a sequence number. Since every octet is
sequenced, each of them can be acknowledged. The acknowledgment
mechanism employed is cumulative so that an acknowledgment of
sequence number X indicates that all octets up to but not including X
have been received. This mechanism allows for straightforward
duplicate detection in the presence of retransmission. The numbering
scheme of octets within a segment is as follows: the first data octet
immediately following the header is the lowest numbered, and the
following octets are numbered consecutively.
It is essential to remember that the actual sequence number space is
finite, though large. This space ranges from 0 to 2^32 - 1. Since
the space is finite, all arithmetic dealing with sequence numbers
must be performed modulo 2^32. This unsigned arithmetic preserves
the relationship of sequence numbers as they cycle from 2^32 - 1 to 0
again. There are some subtleties to computer modulo arithmetic, so
great care should be taken in programming the comparison of such
values. The symbol "=<" means "less than or equal" (modulo 2^32).
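As a non-normative illustration of these subtleties, one common
implementation technique (assumed here, not prescribed by this
document) is to compare two sequence numbers by the sign of their
difference modulo 2^32. The helper names seq_diff, seq_lt, seq_leq,
and seq_add are hypothetical. The comparison is meaningful only when
the two values are less than 2^31 apart.

```python
MOD = 1 << 32        # size of the sequence number space
HALF = 1 << 31       # comparisons assume values are less than this far apart


def seq_diff(a: int, b: int) -> int:
    """Signed distance from b to a, in the range [-2^31, 2^31)."""
    d = (a - b) % MOD
    return d - MOD if d >= HALF else d


def seq_lt(a: int, b: int) -> bool:
    """a < b, modulo 2^32."""
    return seq_diff(a, b) < 0


def seq_leq(a: int, b: int) -> bool:
    """a =< b, modulo 2^32."""
    return seq_diff(a, b) <= 0


def seq_add(a: int, n: int) -> int:
    """Advance sequence number a by n octets, wrapping at 2^32."""
    return (a + n) % MOD


# 0xFFFFFFF0 precedes 0x00000010 even though it is numerically larger,
# because the space has wrapped between them.
wrapped_before = seq_lt(0xFFFFFFF0, 0x00000010)
```

A plain integer comparison would give the opposite answer for the
wrapped pair above, which is the kind of error the preceding paragraph
warns about.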
The typical kinds of sequence number comparisons that the TCP
implementation must perform include:
(a) Determining that an acknowledgment refers to some sequence
number sent but not yet acknowledged.
(b) Determining that all sequence numbers occupied by a segment have
been acknowledged (e.g., to remove the segment from a
retransmission queue).
(c) Determining that an incoming segment contains sequence numbers
that are expected (i.e., that the segment "overlaps" the receive
window).
In response to sending data, the TCP endpoint will receive
acknowledgments. The following comparisons are needed to process the
acknowledgments:
SND.UNA = oldest unacknowledged sequence number
SND.NXT = next sequence number to be sent
SEG.ACK = acknowledgment from the receiving TCP peer (next
sequence number expected by the receiving TCP peer)
SEG.SEQ = first sequence number of a segment
SEG.LEN = the number of octets occupied by the data in the segment
(counting SYN and FIN)
SEG.SEQ+SEG.LEN-1 = last sequence number of a segment
A new acknowledgment (called an "acceptable ack") is one for which
the inequality below holds:
SND.UNA < SEG.ACK =< SND.NXT
A segment on the retransmission queue is fully acknowledged if the
sum of its sequence number and length is less than or equal to the
acknowledgment value in the incoming segment.
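The two acknowledgment checks above can be sketched as follows. This
is an illustrative sketch, not a definitive implementation: the
function names are hypothetical, and the use of offsets relative to
SND.UNA (and of a signed difference for the retransmission queue test)
is one possible way to carry out the comparisons modulo 2^32.

```python
MOD = 1 << 32   # size of the sequence number space
HALF = 1 << 31


def is_acceptable_ack(snd_una: int, snd_nxt: int, seg_ack: int) -> bool:
    """Test SND.UNA < SEG.ACK =< SND.NXT, modulo 2^32."""
    outstanding = (snd_nxt - snd_una) % MOD   # octets sent but unacknowledged
    newly_acked = (seg_ack - snd_una) % MOD   # octets this ACK would cover
    return 0 < newly_acked <= outstanding


def is_fully_acked(seg_seq: int, seg_len: int, seg_ack: int) -> bool:
    """Test SEG.SEQ + SEG.LEN =< SEG.ACK for a queued segment, modulo 2^32.

    Assumes the two values are less than 2^31 apart.
    """
    end = (seg_seq + seg_len) % MOD
    return (seg_ack - end) % MOD < HALF


# The send window straddles the wrap point: SND.UNA is near 2^32 - 1
# and SND.NXT has wrapped around to a small value.
ok = is_acceptable_ack(snd_una=0xFFFFFF00, snd_nxt=0x00000100, seg_ack=0x00000010)
```

An acknowledgment equal to SND.UNA is a duplicate rather than a new
acknowledgment, so is_acceptable_ack returns False for it.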
When data is received, the following comparisons are needed:
RCV.NXT = next sequence number expected on an incoming segment,
and is the left or lower edge of the receive window
RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming
segment, and is the right or upper edge of the receive window
SEG.SEQ = first sequence number occupied by the incoming segment
SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming
segment
A segment is judged to occupy a portion of valid receive sequence
space if
RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
or
RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND
The first part of this test checks whether the beginning of the
segment falls in the window; the second part checks whether the end
of the segment falls in the window. If the segment passes either
part of the test, it contains data in the window.
The full test is slightly more complicated than this. Because of
zero windows and zero-length segments, there are four cases for the
acceptability of an incoming segment:
+=========+=========+======================================+
| Segment | Receive | Test                                 |
| Length  | Window  |                                      |
+=========+=========+======================================+
| 0       | 0       | SEG.SEQ = RCV.NXT                    |
+---------+---------+--------------------------------------+
| 0       | >0      | RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND |
+---------+---------+--------------------------------------+
| >0      | 0       | not acceptable                       |
+---------+---------+--------------------------------------+
| >0      | >0      | RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND |
|         |         |                                      |
|         |         | or                                   |
|         |         |                                      |
|         |         | RCV.NXT =< SEG.SEQ+SEG.LEN-1 <       |
|         |         | RCV.NXT+RCV.WND                      |
+---------+---------+--------------------------------------+
Table 5: Segment Acceptability Tests
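The four cases of Table 5 can be sketched as a single function. This
is an illustrative sketch under stated assumptions: the names
in_window and segment_acceptable are hypothetical, and the window
membership test is done by taking the offset from RCV.NXT modulo 2^32,
which is one way of evaluating the inequalities in the table.

```python
MOD = 1 << 32   # size of the sequence number space


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Test RCV.NXT =< seq < RCV.NXT+RCV.WND, modulo 2^32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the acceptability test from Table 5."""
    if seg_len == 0:
        if rcv_wnd == 0:
            # Zero-length segment, zero window.
            return seg_seq == rcv_nxt
        # Zero-length segment, open window.
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        # Segment occupies sequence space but the window is closed.
        return False
    # Either the first or the last occupied sequence number is in the window.
    last = (seg_seq + seg_len - 1) % MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))


# A segment that starts just before the window but whose tail falls
# inside it is acceptable (second half of the ">0, >0" test).
tail_overlap = segment_acceptable(seg_seq=0xFFFFFFF8, seg_len=16,
                                  rcv_nxt=0, rcv_wnd=100)
```

The function only decides acceptability; trimming a partially
overlapping segment to the window is a separate step not shown here.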
Note that when the receive window is zero, no segments should be
acceptable except ACK segments. Thus, it is possible for a TCP
implementation to maintain a zero receive window while transmitting
data and receiving ACKs. A TCP receiver MUST process the RST and URG
fields of all incoming segments, even when the receive window is zero
(MUST-66).
We have taken advantage of the numbering scheme to protect certain
control information as well. This is achieved by implicitly
including some control flags in the sequence space so they can be
retransmitted and acknowledged without confusion (i.e., one and only
one copy of the control will be acted upon). Control information is
not physically carried in the segment data space. Consequently, we
must adopt rules for implicitly assigning sequence numbers to
control. The SYN and FIN are the only controls requiring this
protection, and these controls are used only at connection opening
and closing. For sequence number purposes, the SYN is considered to
occur before the first actual data octet of the segment in which it
occurs, while the FIN is considered to occur after the last actual
data octet in a segment in which it occurs. The segment length
(SEG.LEN) includes both data and sequence space-occupying controls.
When a SYN is present, then SEG.SEQ is the sequence number of the
SYN.
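The implicit numbering of SYN and FIN can be illustrated with a short
sketch. It is not part of the specification; the function names
seg_len and sequence_layout are hypothetical and are introduced only
to show where each control and data octet falls in sequence space.

```python
MOD = 1 << 32   # size of the sequence number space


def seg_len(data_len: int, syn: bool = False, fin: bool = False) -> int:
    """SEG.LEN: data octets plus one for each of SYN and FIN, if present."""
    return data_len + int(syn) + int(fin)


def sequence_layout(seg_seq: int, data_len: int,
                    syn: bool = False, fin: bool = False):
    """Return (SYN number, data octet numbers, FIN number) for a segment.

    Entries for absent controls are None.
    """
    # A SYN, when present, takes SEG.SEQ itself and precedes the data.
    syn_seq = seg_seq if syn else None
    first_data = (seg_seq + int(syn)) % MOD
    data_seqs = [(first_data + i) % MOD for i in range(data_len)]
    # A FIN, when present, follows the last data octet.
    fin_seq = (first_data + data_len) % MOD if fin else None
    return syn_seq, data_seqs, fin_seq


# A SYN segment carrying three data octets occupies four sequence numbers.
layout = sequence_layout(seg_seq=1000, data_len=3, syn=True)
length = seg_len(3, syn=True)
```

Because the SYN and FIN each consume one sequence number, the
acknowledgment that covers such a segment is SEG.SEQ+SEG.LEN, exactly
as for a segment that carries only data.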
3.4.1. Initial Sequence Number Selection
A connection is defined by a pair of sockets. Connections can be
reused. New instances of a connection will be referred to as
incarnations of the connection. The problem that arises from this is