| Document: FSC-0084
| Version:  001
| Date:     03 September 1995
|
| Denis Bider, FidoNet#2:380/129.0

/*
  Document: Electronic Data Exchange standard level 1
  File: EDX1.TXT
  Purpose: a straightforward data exchange standard with space to expand
  Author: denis bider, ofs->FidoNet#2:380/129.0
  Copyright (C) 1994-1995 by denis bider. See DISCLAIM.TXT.

  Send *any* comments to one of my addresses as listed above.

========================================================================
  Introduction
========================================================================

After a year of development and all sorts of improvements, EDX has
finally reached the state where it has nearly everything currently
wanted from a mail format, and it is being released to the general
public. My opinion is that it was well worth the wait; anyway, that is
up to you to decide.

EDX is meant as a standard for electronic computer networks that
exchange messages, files and similar data. It redesigns the existing
chaos from the beginning and tries not to repeat the mistakes other
similar standards made. It does its own work, others do theirs.

EDX is not necessarily better than other such standards. It might also
be the worst of all. This document will try to convince you of neither.
It will simply describe the standard from beginning to end. Due to my
relatively poor English, I may not succeed in the "easy to understand"
part, but well, you'll just have to get along with it. Please mail me
any comments you might have.

======================================================================
  Notes, definitions
======================================================================

Null:            ASCII 0
CR:              Carriage Return (Enter) - ASCII 13
a long:          a 32-bit (4-byte) signed value.
an int:          a 16-bit (2-byte) signed value.
a char(acter):   an 8-bit (1-byte) value.
a ulong:         an unsigned long.
a uint:          an unsigned int.
A subfield:      a variable-length data field, most commonly used
                 inside other data fields. Consists of a subfield ID
                 (a uint), a subfield data length ("datlen") identifier
                 (a ulong) and datlen bytes of data. Ie:

                   ulong datlen
                   ulong ID
                   char  data[datlen]

0x:              value in hexadecimal (base 16).
Lowercase:       When a string or character is said to be "lowercase",
                 any characters in the range ASCII 'A'..'Z' (inclusive)
                 are represented as their 'a'..'z' counterparts. The
                 conversion applies to *no other characters in any
                 national alphabets*.

* All mentioned CRCs are, as in Zmodem, 0xffffffff based

* All multi-byte items (words, longs) mentioned are expressed in Intel
  format, which means that the least significant byte (LSB) is
  presented first. (Eg, 0xff11 should be presented as 0x11 0xff)

======================================================================
  Views
======================================================================

The network
=============

In my opinion, the most basic set of layers into which all computer
network technologies can be divided contains the following:

  1: Physical point-to-point connection layer
  2: Physical network layer
  3: Logical point-to-point connection layer
  4: Logical network layer

Let's explain that using the example of FidoNet, a typical
over-the-phone network technology. In this case, the physical
point-to-point connections are telephone wires; the physical network is
all those point-to-point connections combined; the logical
point-to-point connections are modem dial-up connections; and the
logical network is, roughly, all those point-to-point connections
combined.

The same applies to, say, the Internet telnet feature: the physical
point-to-point connections are the low-level connections between
Internet-connected computers, the physical network is all of these
combined, and the logical connection is the telnet feature itself. There
is, of course, no logical network layer. The same goes for a connection
to a local BBS.

EDX is a standard that defines the fourth, logical network layer. A
"Recommendations" chapter is provided in which a sample interaction
between the fourth and the third network layer is defined; however, that
chapter should not be treated as a part of EDX itself.

The site
==========

In everyday practice, I encounter many inconsistencies in how systems
are generally referred to. Often, one says "BBS" meaning "mail system",
or even meaning the entire site. So let's define these terms.

1. The site is all the hardware, software and peopleware, and is often
   referred to as "system".

2. The mail system is the part of the site that deals with networks,
   with "external relations". If you're in an OFT network and run
   SomeScan in combination with OtherMail, these two programs are your
   mail system from the viewpoint of the network you're in.

3. The BBS is the part of the site that deals with human callers, and
   has nothing to do with the part of the site called the "mail system",
   except that the two parts can and usually do exchange data (messages,
   files).

My opinion is that the mail system and the BBS part of a site should be
kept separate, but often that is not the case. Take QWK networks for
example, where not only are the two concepts totally mixed up, but
networks also quite often mess with things that are none of their
business; a network as an organization should care about the systems,
not about the BBSes or even the entire sites, but that is a mistake
often made.

The points
============

In networks like FidoNet, a user often installs mailing software and
becomes what is called "a point". A point system is, in EDX, treated as
any other system. Indeed, *every* system is actually a point system;
it's only that those systems that are talked about as "nodes" have a
point number of zero. See below for a disclaimer in which you will read
that in EDX, if OFT addresses are used, all fields must be present, zero
or not.
Therefore, when an application receives or sends mail from/to a point,
the "point" system must be treated as any other system. In EDX terms,
points are full-fledged systems and that is exactly how they must be
treated; they are included in SENTTO and TRACE subfields as well. The
limitation of a point being linked to a single system (ie, what the
former organization called "a boss") is gone and buried; as said, EDX
does not distinguish point systems from any other type of system. Any
differences in the treatment of point systems in other parts of a
network do not affect how EDX treats them.

========================================================================
  Addresses in EDX
========================================================================

EDX uses E-Addressing for maximum compatibility with various addressing
systems and to stay independent of the addressing scheme used by the
underlying network. However, only and exclusively site E-Addresses are
used in EDX; usage of a user E-Address in any field of an EDX message is
considered a violation of the specifications.

The general format of a site E-Address is a format identifier, followed
by "->", followed by the address itself. The format identifier specifies
the format of the address field.

An E-Address is assumed not to contain any whitespace. E-Addresses may
or may not be case sensitive, depending on the contents of the field;
for that reason, when passing E-Addresses on, their case should be left
untouched. For now, all known types of E-Addresses are case INsensitive.

The following formats are recognized:

  Format identifier: "ofs" (Traditional FTN style)
  format: "#" ":" "/" "."
  Example addresses: ofs->FidoNet#2:380/129.0
  ALL ADDRESS COMPONENTS ARE REQUIRED. NO EXCEPTIONS.

  Format identifier: "itn" (Internet e-mail style)
  format: {"." }
  Example addresses: itn->f129.n380.z2.fidonet.org
                     itn->ixtas.fer.uni-lj.si

All format identifiers are and will be three characters in length.

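The addressing rules above can be illustrated with a small sketch. It is
not part of EDX; the names EAddress, parse_eaddress, split_ofs and
same_site are hypothetical, and the component names used for "ofs"
addresses (domain, zone, net, node, point) are an assumption based on
the example address and on common FTN usage. Comparing the format
identifier exactly, rather than case-insensitively, is likewise an
assumption.

```python
import re
import string
from dataclasses import dataclass

# ASCII-only case folding, following the "Lowercase" definition in the Notes:
# only 'A'..'Z' are converted, no national characters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Separators "#", ":", "/" and "." come from the "ofs" format line; the group
# names (domain, zone, net, node, point) are assumed, not taken from the text.
_OFS = re.compile(r"^([^#]+)#([0-9]+):([0-9]+)/([0-9]+)\.([0-9]+)$")


@dataclass(frozen=True)
class EAddress:
    fmt: str      # three-character format identifier, e.g. "ofs" or "itn"
    address: str  # everything after "->", with its case left untouched


def parse_eaddress(text: str) -> EAddress:
    """Split a site E-Address into its format identifier and address."""
    fmt, sep, address = text.partition("->")
    if not sep:
        raise ValueError('missing "->" separator')
    if len(fmt) != 3:
        raise ValueError("format identifier must be three characters long")
    if not address or any(ch.isspace() for ch in text):
        raise ValueError("address is empty or contains whitespace")
    return EAddress(fmt, address)


def split_ofs(addr: EAddress) -> tuple:
    """Return (domain, zone, net, node, point); every component is required."""
    if addr.fmt != "ofs":
        raise ValueError("not an ofs address")
    match = _OFS.match(addr.address)
    if match is None:
        raise ValueError("all ofs address components are required")
    domain, zone, net, node, point = match.groups()
    return domain, int(zone), int(net), int(node), int(point)


def same_site(a: EAddress, b: EAddress) -> bool:
    """Case-insensitive comparison, valid for the currently known formats."""
    return (a.fmt == b.fmt and
            a.address.translate(_ASCII_LOWER) == b.address.translate(_ASCII_LOWER))
```

Note that parse_eaddress never changes the case of what it is given;
folding happens only inside same_site, so an address can still be passed
on exactly as it was received.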

========================================================================
  The logical network layer
========================================================================

This chapter describes the logical network layer, which is independent
of the lower layers. One way to actually pass what is defined in this
chapter from one system to another is described in the Recommendations
chapter.

The reason for this separation is that EDX is exclusively a layer 4
protocol definition and does not want to mix with other network layers;
ie., a network must itself choose or define the layer 3, 2 and 1
protocols it is going to use with EDX. However, in order to standardize
EDX-related matters, a chapter with some recommendations is provided
towards the end of the document.

The idea of this independent part of the logical network layer is
similar to the way in which messages are stored in the JAM message base
format; each message consists of a binary header for fixed-length data
and an arbitrary number of subfields that contain other, variable-length
data.

An EDX subfield consists of, as outlined in the Notes section, a datlen
identifier, an ID and data.

Subfields with an unknown ID should be left untouched when exported to
other systems.

========================================================================
  The message
========================================================================

EDX messages differ a little from other network types' messages: in EDX,
messages need not consist of text only, or of text at all, and a message
can have more than one receiver.

True crossposting and other goodies
========================================================================

For quite a while at first, true crossposting (a single physical message
belonging to more than one echo) was a part of the EDX specifications.
However, it is my opinion that, in the current state of things, it would
cause many more problems than it would solve, so this "feature" has been
removed. Utypia-style ROUTE directions were formerly present as well,
and have been removed for the same reason.

Message header
========================================================================

The binary message header layout follows:

  char  signature[8]   // Must match <_>
  uint  hdrlen         // The size of the header
  int   utcoffset      // UTC offset, *signed*; see timestamp
  ulong timestamp      // Local time of message's creation
  ulong subflen        // Length of the subfields that follow
  ulong attribute1     // Message attributes
  ulong seqno          // Message's sequential number

hdrlen specifies the size of the header, from and including the first
byte of the signature field to and including the last byte of the last
present field. It is used mainly to ensure downward compatibility with
hypothetical EDX levels higher than 1. Should an application encounter
an hdrlen higher than it supports, it should only process the fields it
supports and skip the others. Should it encounter an hdrlen lower than
it supports, it should only process fields up to hdrlen bytes.

Note that the hdrlen field cannot be just arbitrarily picked! When
creating a header, always include the whole contents of the highest
header revision you support; otherwise, it is perfectly all right for a
processing application to dismiss the message in its entirety.

timestamp contains the local date and time when the message was written
or, if that information isn't available, when it joined network flow. It
is expressed as the number of seconds elapsed since 00:00:00, January
1st 1970; the time should be (= must be) represented in UTC.

The UTC offset of the site that generated timestamp as described above
is stored in the utcoffset field. Eg: if the UTC offset is -0230, the
utcoffset field should read, simply, -230; +0200 => 200; and so forth.

The seqno field is the message's sequential number.

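The header layout and the hdrlen rules above can be sketched as follows.
This is only an illustration under stated assumptions: the fields are
taken to be packed with no padding between them (which gives a level 1
header of 28 bytes), the subfields are taken to start right after hdrlen
bytes, and the timestamp is read as UTC seconds, with utcoffset used to
recover the sender's local time. The names decode_header,
utcoffset_to_timedelta and creation_time are hypothetical. The signature
is read but not checked.

```python
import struct
from datetime import datetime, timedelta, timezone

# Level 1 header fields in order, as (name, struct code). The "<" prefix used
# below selects little-endian ("Intel format") with no padding; the absence of
# padding between fields is an assumption of this sketch.
_FIELDS = [
    ("signature", "8s"), ("hdrlen", "H"), ("utcoffset", "h"),
    ("timestamp", "L"), ("subflen", "L"), ("attribute1", "L"), ("seqno", "L"),
]
LEVEL1_HDRLEN = sum(struct.calcsize("<" + code) for _, code in _FIELDS)  # 28


def decode_header(buf: bytes) -> tuple[dict, int]:
    """Decode the known header fields; return (fields, offset after header)."""
    _, hdrlen = struct.unpack_from("<8sH", buf, 0)
    if hdrlen > len(buf):
        raise ValueError("buffer is shorter than hdrlen")
    fields, offset = {}, 0
    for name, code in _FIELDS:
        size = struct.calcsize("<" + code)
        if offset + size > hdrlen:  # shorter header: stop at hdrlen bytes
            break
        (fields[name],) = struct.unpack_from("<" + code, buf, offset)
        offset += size
    # A longer header (hdrlen > LEVEL1_HDRLEN) carries fields this sketch does
    # not know; returning hdrlen as the next offset skips them.
    return fields, hdrlen


def utcoffset_to_timedelta(value: int) -> timedelta:
    """Turn the stored form (-230 for -0230, 200 for +0200) into a timedelta."""
    sign = -1 if value < 0 else 1
    hours, minutes = divmod(abs(value), 100)
    return sign * timedelta(hours=hours, minutes=minutes)


def creation_time(fields: dict) -> datetime:
    """Sender's local creation time, assuming timestamp holds UTC seconds."""
    offset = utcoffset_to_timedelta(fields["utcoffset"])
    utc = datetime.fromtimestamp(fields["timestamp"], tz=timezone.utc)
    return utc.astimezone(timezone(offset))
```

decode_header deliberately keeps going when hdrlen differs from
LEVEL1_HDRLEN; an application is equally free to dismiss a message whose
header is shorter than the revision it expects.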

For each area an EDX system is linked to, it maintains a count of the
messages it has exported from that area. When the next message is
exported, that number is incremented by 1 and is also assigned to the
message as its serial number. The main use of this serial number is that
one can quickly see whether all the messages from a particular system in
a particular area have been received; if they haven't, messages are
getting lost somewhere. The serial number might also be used as a means
of dupe-link detection; however, even if the serial numbers of two
messages don't match, one of them can still be a dupe of the other,
because the system might have exported the message twice. Therefore, you
should stick to the MSGID subfield for duplicate message checking; the
serial numbers of duplicate messages can be used to determine the cause
of duplication.

Message attributes
========================================================================

The following bits for attribute1 are defined:

  HasFiles   0x01L  The message has files attached
  IsReply    0x02L  The message is a reply
  ReceiptRq  0x04L  (netmail messages only) A return receipt should be
                    generated for the message when it is received by
                    the destination system.
  ConfirmRq  0x08L  (netmail only) A return receipt should be generated
                    for the message when it is read by each of its
                    addressees.
  IsReceipt  0x10L  (netmail only) The message is a return receipt.
  Echoed     0x20L  If set, the message contains an ECHO subfield. If
                    not set, the message contains a DEST subfield.

Other bits should be set to 0. IsReceipt cannot be set in combination
with ReceiptRq and/or ConfirmRq.

Subfields
========================================================================

A short list of subfields and their IDs:

  DEST (0), ORIGIN (1), AUTHOR (2), ECHO (3), WHOTO (4), TRACE (5),
  CHARSET (6), SUBJECT (7), CREATOR (8), EXPORTER (9), SENTTO (10),
  MSGID (11), REPLYID (12), TEXT (1000), FILE (1001)

Each subfield is an independent unit in itself.

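The attribute rules above, together with the DEST and ECHO IDs from the
list, lend themselves to a simple consistency check. The following is a
minimal sketch, not a required part of an EDX implementation; the
function name attribute_problems and the idea of returning a list of
problem descriptions are assumptions made for illustration. It expects
the IDs of the subfields already found in the message.

```python
HAS_FILES = 0x01
IS_REPLY = 0x02
RECEIPT_RQ = 0x04   # netmail only
CONFIRM_RQ = 0x08   # netmail only
IS_RECEIPT = 0x10   # netmail only
ECHOED = 0x20

DEFINED_BITS = HAS_FILES | IS_REPLY | RECEIPT_RQ | CONFIRM_RQ | IS_RECEIPT | ECHOED
NETMAIL_ONLY = RECEIPT_RQ | CONFIRM_RQ | IS_RECEIPT

DEST_ID = 0  # subfield IDs from the short list above
ECHO_ID = 3


def attribute_problems(attribute1: int, subfield_ids: list) -> list:
    """Return human-readable descriptions of every rule the message breaks."""
    problems = []
    if attribute1 & ~DEFINED_BITS:
        problems.append("undefined attribute bits are set")
    if (attribute1 & IS_RECEIPT) and (attribute1 & (RECEIPT_RQ | CONFIRM_RQ)):
        problems.append("IsReceipt combined with ReceiptRq and/or ConfirmRq")
    echoed = bool(attribute1 & ECHOED)
    if echoed and (attribute1 & NETMAIL_ONLY):
        problems.append("netmail-only bit set on an echoed message")
    if DEST_ID in subfield_ids and ECHO_ID in subfield_ids:
        problems.append("both DEST and ECHO subfields are present")
    needed = ECHO_ID if echoed else DEST_ID
    if needed not in subfield_ids:
        name = "ECHO" if echoed else "DEST"
        problems.append("Echoed bit does not match: no " + name + " subfield")
    return problems
```

An empty list means the attributes and the subfields agree; for example,
attribute_problems(ECHOED, [3, 1, 5]) reports nothing, while a message
with IsReceipt and ReceiptRq both set is reported.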

However, to make EDX handling code simpler and more readable, two major
types of subfields are recognized, "simple" and "complex".

The "simple" subfields are simply subfields that have a maximum length
of 100 characters. They usually contain a stream of textual characters.
Please note that if a simple subfield contains text, it is *not*
null-terminated. Its length is determined by the "datlen" identifier in
the subfield header. As said, the maximum length for simple subfields is
100 characters; all data beyond the 100th character can be ignored.
Simple subfields have IDs in the range 0..999.

The "complex" subfields are all other subfields. Their maximum size and
other attributes are specific to each of them. Their IDs range from 1000
on.

Note: read what the subfield descriptions say. If, for example, the
Presence field says "exactly one", that means that *exactly one*
subfield of this type should be inserted in the message, no more, no
less. The same applies to other fields and to everything else in the
document as well.

SUBFIELD: DEST (simple)
ID:       0
Presence: Either one DEST subfield or one ECHO subfield

The DEST subfield stores the address of the system to route the message
to. It is up to the systems that are passing the message to decide if
and how to actually route the message there.

For historical reasons, messages with a DEST subfield are called
"netmail". Messages with an ECHO subfield are called "echomail".

A netmail message is considered private between its authors and its
addressees.

SUBFIELD: ORIGIN (simple)
ID:       1
Presence: Exactly one

Contains:
  * the E-Address of the system that generated the message
  * a NULL character
  * the name of the person that wrote the message

Gating: see Origin supplementary line.
Also, as opposed to, for example, FidoNet, the gating system does not
insert its own address in the ORIGIN subfield when a message is gated to
EDX, but instead converts the original origination address to E-Address
format and puts it here. The address of the gating system itself is
stored as a part of a gated TRACE subfield. (See TRACE subfield)

SUBFIELD: AUTHOR (simple)
ID:       2
Presence: Zero or more

Format of contents:
  * the E-Address of the system where the person can be reached
  * a NULL character
  * the name of the person

Each AUTHOR subfield lists one of the message's authors if there is more
than one, or if the message's author is not the message's physical
sender. All of the message's authors should be listed, whether or not
any of them already "resides" in the ORIGIN subfield.

Gating to network formats that only support a sender name (like QWK or
OFT): use Author supplementary lines.

SUBFIELD: ECHO (simple)
ID:       3
Presence: Either an ECHO subfield or a DEST subfield

The subfield specifies the name of the echo area to which the message
has been posted. The contents of the ECHO subfield should be treated as
case insensitive. For the echo area name, all characters from ASCII 33
to 126 are allowed, with the exception that '-', '+' and '%' must not be
the first character of the area name and that '*' and '?' must not be
present at all.

If there is no DEST or ECHO subfield in a message, the message should be
shown to the sysop and its distribution among systems stopped.

An echoed message is considered public.

SUBFIELD: WHOTO (simple)
ID:       4
Presence: Zero or more

Each WHOTO subfield specifies the name of a person whose attention
should be drawn to the message. The WHOTO subfield is, by its function,
very much the same as To: lines in FidoNet and similar networks, except
that EDX allows a message to have more than one addressee. (..
by allowing multiple WHOTO subfields to be present)

If no WHOTO subfield is present in a message with an ECHO subfield, the
message should be assumed to be of equal importance to everybody. (Ie,
the same as "To: All" in the analogy above)

If no WHOTO subfield is present in a message with a DEST subfield, the
message is assumed to be addressed to the operator of the system it is
destined for.

Gating to networks that don't support as many message addressees as the
gated message has: use Whoto supplementary lines.

SUBFIELD: TRACE (simple)
ID:       5
Presence: Exactly one

There are three formats for a TRACE subfield, "prevnet", "gated" and
"native".

The gated and prevnet formats are used only when converting a message
from a parallel format to EDX and should not be used otherwise.

The prevnet format reads: "<= " It is used to store TRACE information of
the previous network.

The gated format reads: "++