Document: FSC-0065 Version: 001 Date: 02-Aug-1992 Type 3 ASCII: A proposal ========================= Mark Kimes FidoNet 1:380/16 Status of this document: This FSC suggests a proposed protocol for the FidoNet(r) community, and requests discussion and suggestions for improvements. Distribution of this document is unlimited. Fido and FidoNet are registered marks of Tom Jennings and Fido Software. Introduction: ============ This document describes a type of mail packet called type 3 ASCII. Type 3 ASCII was designed with how Fidonet Technology Networks (FTNs) handle mail (netmail, echomail, groupmail) in mind. It was also designed to allow new distribution methods to be introduced. For instance, it is possible to combine the best of echomail and groupmail methods using type 3 ASCII packets. Finally, type 3 ASCII provides reliability, space and speed advantages over the current mail packet type 2 (see "Type 3 ASCII vs. Type 2" section below). Packet structure: ================ (See "Definitions" section below for the meaning of any arcane symbols) Type 3 ASCII packets and archived bundles will ride existing transport services (mailers) as attached files. Type 2 mail and type 3 ASCII mail can both be sent to a node without conflicts. Naturally, the receiving node should be able to process type 3 ASCII mail before it is sent. Type 3 ASCII packets are named <.><3KT> when sent to a remote site. Archives containing type 3 packets are named <.><3?A> when sent to remote sites. How these files are stored or named locally is not within the scope of this document. A type 3 ASCII packet consists of a packet header, followed by a carriage return, followed by zero or more messages, followed by a NUL. A type 3 ASCII message consists of a message header, followed by a carriage return, followed by zero or more characters of message text, followed by a NUL. Diagramatically speaking, (Text in brackets [] indicates optional data) Type 3 ASCII packet: header [messagehdr1 [text] NUL messagehdr2 [text] NUL ... messagehdrn [text] NUL ] NUL Breakdown: ========= (See "Description of Fields" section below for information on individual fields.) Packet header: ============= <3ASCII> From [To] Creator [Password] [Area] [Tag1data1] [Tag2[data2]] ... [Tagn[datan]] Message header: ============== From [To] [Subject] Date [Area] ID [Ref] [Tag1data1] [Tag2[data2]] ... [Tagn[datan]] Message body: ============ Free-flowing, NUL-terminated text. May be composed of any combination of ASCII characters > 31 (from the space character, ASCII character 32, onward) and may include as a "paragraph terminator." Systems which display message text should wrap long lines to suit their application. To be in compliance with this document, implementations must be able to forward messages with at least 131,072 (128K) characters of text (including the terminating NUL). Network politics may outlaw messages of lesser size, but that is beyond the scope of this document. If a compliant implementation encounters a message longer than the 128K limit, it may truncate the message text before forwarding. However, since it is easy to support messages of a length limited only by available disk space, it is encouraged that you do so and not impose artificial restrictions. The purpose of this limit is to guarantee a minimum size that will be passed, _not_ to restrict implementations to the "minimum." Line feeds (ASCII character 10) are reserved and should not normally appear in message text. Future plans call for their use as "escape codes." So called "soft carriage returns" (ASCII character 141) should not be contained in transmitted message text unless the actual character itself is desired. Tabs (ASCII character 9) should not be used in message text as their use often leads to unreadable messages. How many spaces should be used at a remote site to represent them? Description of Fields: ===================== Note: the maximum length of any field line (excluding, of course, message text) is 255 characters including the terminating . In practice, a bit of restraint should be practiced to keep fields as small as possible. The maximum length of any header is 32767 bytes, including terminating . In practice, this limit should never be approached. Date: ==== YYYYMMDDhhmmss where YYYY = year with century, as in 1991 or 2001 MM = month, as in 01 to 12 DD = day of month, as in 01 or 28 hh = hour of day, as in 00 to 23 mm = minute of hour, as in 00 to 59 ss = second of minute, as in 00 to 59 = offset from GMT in 15 min. increments (i.e. "+4" (sans quotes) for GMT + one hour) All numbers are represented in decimal. Samples: 19990419143200 (April 19, 1999 at 2:32:00 pm) 19921223020303+8 (December 23, 1922 at 02:03:03 GMT + 2 hours) The Date field is required. From and To: =========== The From field contains the writer's name followed by a valid FTN network address. For the purposes of this document and current implementations of type 3 ASCII packets, the format of a valid FTN network address is: Domain<#>Zone<:>NetNode[<.>Point] where Domain is a text string from 1 to 8 characters in length containing only alphabetical [A-Za-z] and/or numerical [0-9] characters. Zone is a decimal number from 1 to 65533. Net is a decimal number from 1 to 65533. Node is a decimal number from 0 to 65533. Point is a decimal number from 0 to 65535 (may be omitted if 0). The FTSC or whatever body guards tech specs may change this definition in the future as it sees fit. The full format of a type 3 ASCII From or To field is: [User Name<@>]Domain<#>Zone<:>NetNode[<.>Point] If User Name<@> is missing, assume user name is Sysop. User Name may be composed of any combination of ASCII characters > 31 (from the space character, ASCII character 32, onward) excluding <@>. If <.>point is missing, assume point 0. The To field contains the recepient's name and address as above. The To field is optional. If it is missing, message/packet is broadcast mail (no definite, single recipient). In this case there must be an area (if the To field is omitted in the packet header, there must be an area in the packet header and all messages must be broadcast mail for that area. If omitted in the message header, the message or the packet must have an area and message may be displayed as being addressed to "All@Anywhere"). A must still be present as a "space holder." In broadcast mail, it is permissible to give only the name of the user (without following address) in a message header; however, the name must end with <@> (to distinguish it from an address with no User Name). Note this means a single broadcast mail packet can be sent to many nodes. The From field is required. In the case of From and To fields in the packet header, [user name<@>] is probably unimportant. In the interests of saving space, domains such as "Fidonet.org" should be replaced with just "Fidonet," as the ".org" modifier has no meaning to an FTN site. Domains should be treated case insensitively. Sample: John Doe@Fidonet#1:380/16 (User "John Doe" in domain "Fidonet" zone 1 net 380 node 16, implied point 0) Creator: ======= Name of the product that produced the packet. This field is required. Password: ======== A password to use for security. This field is optional. If omitted, a must still be present as a "space holder." How this field is used is implementation-defined. Subject: ======= The subject field should contain text hinting at the subject of the message text. It may be composed of any combination of ASCII characters > 31 (from the space character, ASCII character 32, onward). The subject field is optional. If omitted, a must still be present as a "space holder." Area: ==== Area fields consist of a string of alphanumeric characters plus space, "-" and "_" (ASCII characters 32, 45 and 95 respectively). Area fields are optional with the following consequences: If the area field in a packet header is missing, the messages in the packet will have area fields present for broadcast mail, omitted for personal mail. If the area field in a packet header is present, all the messages in the packet will be broadcast mail for the area specified in the packet header. The message area fields will not be present. When an area field is omitted, a must still be present as a "space holder." ID: == An ID consists of the originating address of the message plus a serial number, in the form: origaddrserialno The originating address should be specified in a form that constitutes a valid return address for the originating network. If the originating address is enclosed in double-quotes, the entire string between the beginning and ending double-quotes is considered to be the orginating address. A double-quote character within a quoted address is represented by by two consecutive double-quote characters. The serial number may be any eight character hexadecimal number, as long as it is unique - no two messages from a given system may have the same serial number within one year. The manner in which this serial number is generated is left to the implementor. Notes: The "old" format of Zone<:>NetNode[<.>Point][<@>Domain] for FTN addresses is allowed in this field. The address portion of the ID may be omitted if it is exactly the same as the From address (less User Name<@>). In this case, the ID field should begin with a space followed immediately by the serial number. In the case of foreign network addresses, this address gives you the "true" origin, and the From address gives you the gateway at which the message entered FTN territory. This allows you to gate replies to "foreign" sites. Samples: some.other.net.addr ABCDEF12 12345ABC (Assume From field of second sample contained "Joe Blow@Fidonet#1:380/16",so complete constructed ID would be Fidonet#1:380/16 12345ABC Note address would be copied exactly from the From field.) The ID field is required. Ref: === A Ref consists of the ID of the original message to which this message refers (usually as a reply). Sample: Fidonet#1:380/16 12345ABC (would reference the second ID sample above) The reference field is optional. If omitted, a must still be present as a "place holder." TagData: =========== The tag+data lines are type 3 ASCII's method of automatically expanding its headers. A tag consists of a sequence of uppercase alphabetic (A-Z inclusive) and/or numeric sequence of characters and possibly a hyphen (ASCII character 45) and/or underline (ASCII character 95), up to 12 characters in length (a name). A tag name can be followed optionally by a space (ASCII character 32) and data. Data may be composed of any combination of ASCII characters > 31 (from the space character, ASCII character 32, onward). To aid in developer experimentation with tags in type 3 ASCII, it is guaranteed that the FTSC or whatever body guards tech specs will never "canonize" a tag beginning with the two characters "X-" (ASCII character 88 followed immediately by ASCII character 45). Thus, tags may use this combination before tag names to guarantee uniqueness. Experimental tags may be stripped by conforming implementations during message passthrough. This helps prevent experimental tags from escaping from test sites. Samples (tag names are invented): FOLLOW AFILEN.AME X-TAG SOMEDATA LONETAG Tagdata fields are optional and may be completely omitted when creating a packet. Exception: all tagdata fields except, possibly, experimental fields, should be passed through with a message being forwarded. Predefined tags: =============== Tag Where Data Meaning --- ----- ---- ------- PRIV Msg Hdr None Message is private FOROK Pkt Hdr None Packet may be forwarded without unpacking -- all messages are to the To: address in the packet header Type 3 ASCII vs. Type 2: ======================= Type 3 ASCII saves between 6% to 11% in raw packet size over type 2 (using Tiny Seenbys with the type 2 packets to make the test as fair as possible), depending on how area tags for echos are used in the type 3 ASCII packet (in packet header vs. message headers). 7% smaller would be the norm for the way we do echomail business now. The tests conducted were most unscientific but should be close to everyday echomail-oriented reality. Compressed packets are a slightly different story. Type 3 ASCII compresses the same as type 2 when using area tags for echos in the message headers. Type 2 compresses approximately 2.5% better when area tags are used in the type 3 ASCII packet headers instead. Either way, compressed type 3 ASCII packets are smaller than comparable type 2 packets due to the smaller raw packet size. Even compression ratios would be the norm for the way we do echomail business now. Type 3 ASCII imports between 2% to 5% faster (depending on algorithms used). There is no discernable difference on export. Keep in mind that this particular test has too many variables (software, hardware, relative efficiency of code, etc.) to be considered a real benchmark. Most of the speed savings is in not having to process SEEN-BY and PATH lines. The lack of end-of-text control information is a real boon. Type 2 has no method for reliably obtaining the full 5-D origin address of a message. Type 3 ASCII provides a reliable method of obtaining full origin address information for both the true origin (in whatever network) and the gateway which brought the message into FTN territory (if from a foreign network). This means that even if a message originated in a network with which your software has no idea how to communicate, you can still send a reply to an FTN node for gating. Type 2 has no reliable method for stopping dupes. Type 3 ASCII has a mandatory ID field, very similar to type 2's optional MSGID, which can be used for reliable dupe checking. Type 2 echomail has control information scattered throughout the message body, including SEEN-BY and PATH information at the end of the message. This causes problems for developers, who often opt for fixed-length buffers and arbitrary message length limits. All control information for Type 3 ASCII is in the extensible message header. Moreover, type 3 ASCII has generous set limits to which programmers can work, and which users can therefore rely on. Definitions: =========== Except where noted otherwise, numbers are in decimal. Although the ASCII character set is normally defined as being limited to characters from 0 to 127, this document acknowledges the existence of an eighth bit in most bytes and uses the term (loosely) to mean characters from 0-255. Network politics may or may not "outlaw" the use of some of those bytes; that is outside the scope of this document. Note: text in brackets [] indicates an optional field. See "Definitions" section below for meaning of text in <>. See "Description of Fields" section below for information on individual fields. Alphabetic: ========== A-Z and a-z, ASCII characters 65 to 90 and 97 to 122 inclusive. Numeric: ======= 0-9, ASCII characters 48 to 57 inclusive. Alphanumeric: ============ All characters alphabetic and numeric. Hexadecimal: =========== 0-9 and A-F (or a-f), ASCII characters 48 to 57 and 65 to 70 (or 97 to 102) inclusive. NUL: === ASCII character 0. : ==== Carriage return, ASCII character 13. : ==== Line feed, ASCII character 10. : ==== Space, ASCII character 32. <@>: === @, ASCII character 64. <#>: === #, ASCII character 35. <:>: === :, ASCII character 58. : === /, ASCII character 47. <.>: === ., ASCII character 46. <3ASCII>: ======== The literal string "3ASCII" (not including quotation marks). This text, followed by a , identifies a type 3 ASCII packet. Implementations should *not* processes a file unless this identifier is found on the first line, but should probably log the occurrence. : ========== Eight alphanumeric characters that serve as the "root" of a filename. <3KT>: ===== The literal string "3KT" (not including quotation marks). <3?A>: ===== The literal string "3?A" (not including quotation marks) with the question mark (?) being replaced by a decimal integer from 0 to 9 (ASCII 48 to 57 inclusive). Miscellaneous notes: =================== jim nutt invented MSGIDs and REPLYids (ref. FTS-0009), which were lifted very nearly whole to become IDs and Refs in this document. Tom Jennings invented Fido and Fidonet from whole cloth and RAM chips. NET_DEV's continual foolishness inspired me to do instead of whine. Let's see if this cuts down on the whining... -end- Mark Kimes 1:380/16.0@Fidonet (318)222-3455 data 542 Merrick Shreveport, LA, USA 71104