unmdplyr's blog

A case against text protocols

Human-computer interaction is very different from computer-computer interaction. For a human to communicate with a computer, English-like commands in a text shell, or graphical user interfaces built on the point-and-click model, work great. But once humans are out of the loop and two or more computers have to talk to each other, text protocols may not be all that helpful. Yet text-based computer-to-computer protocols are extremely common, seemingly ubiquitous even in places where an average human could never interact with them.

Here are some shaky arguments you'll come across on the internet that glorify text-based protocols:

  • Simpler to use text
  • You can type out messages
  • It's easy to debug
  • Parsing
  • Extensibility
  • Error recovery

Argument: Simpler to use text

...or any variation of it.

To be clear, historically the byte had no fixed width: sizes anywhere from 1 to 48 bits were used across vendors and models. Even text itself was encoded differently from one computer to another: BCD-based codes, IBM's EBCDIC, various ISO character sets, and hundreds more. On any recent Linux or BSD system, run iconv -l to list the text encodings it supports. The encoding problem was settled only fairly recently: ASCII dates back to the 1960s, but it took Unicode and its UTF-8/16/32 encodings, arriving in the 1990s, to fix it. Text was never as portable as it's believed to be.

It was only in the early 90s that ISO/IEC 2382-1:1993 came along, documenting the byte as 8 bits, a convenient power of two. And nearly all CPUs in use today represent signed integers in two's complement, which makes binary messages a simpler alternative to text formats.

Argument: You can type out messages

Surely, it's not like you'd open a terminal on your cell phone and type out SIP like this to call a friend:

INVITE sip:1001@10.0.0.1:2780;transport=udp SIP/2.0
Via: SIP/2.0/UDP 10.0.1.12:5060;rport
From: <sip:2001@10.0.1.12>;tag=a1b2c3d4e5
To: "1001"<sip:1001@10.0.1.12>;tag=z1y2x3w4v5
Contact: <sip:2001@10.0.1.12>
Call-ID: OWYwZTg2NDRkOTZjYjc4NjUzYTE1ZGY5ZGY3ZGVkMmQ.
CSeq: 102 INVITE
User-Agent: Asterisk PBX
Max-Forwards: 70
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO
Supported: replaces
Content-Type: application/sdp
Content-Length: 210
.
v=0
o=root 6668 6669 IN IP4 10.0.0.2
s=session
c=IN IP4 10.0.0.2
t=0 0
m=audio 5004 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=sendrecv

People hardly dial actual numbers anymore; they search for a name in their contacts and tap the "Dial" button. Arguments like this usually stem from the idea that speaking a text protocol is as easy as opening a socket and typing out a command. That may be true for FTP or Telnet, but much less so for HTTP or any other modern text-based protocol.

Argument: Parsing/Easy to debug

...or any variation of it.

Without a visualiser (even a rudimentary one), the message above and the control-plane exchange that follows it are not easy to debug. Sure, you can view them in a terminal or edit them in your favourite text editor, but what next? How are you going to feed them into the call controller machinery without extensive debug hooks into the state machine library?

As for parsing, take NTP as an example:

#include <stdint.h>

struct ntp_ts {
    uint32_t secs;            /* Seconds since 1900-01-01 00:00 UTC */
    uint32_t frac;            /* Fraction of a second, in units of 2^-32 s */
};

struct ntp_packet {
    uint8_t li_vn_mode;       /* Leap Indicator (2 bits), Version number (3 bits)
                                 and Mode (3 bits), most significant bits first.
                                 A plain byte is used because the order of C
                                 bit-fields within a byte is implementation-defined. */

    uint8_t stratum;          /* Stratum level of the local clock */
    int8_t  poll;             /* Maximum interval between successive messages, log2 s */
    int8_t  precision;        /* Precision of the local clock, log2 s */

    uint32_t root_delay;      /* Total round trip delay time */
    uint32_t root_dispersion; /* Maximum error allowed from primary clock source */
    uint32_t ref_id;          /* Reference clock identifier */

    struct ntp_ts ref_ts;     /* Reference time-stamp */
    struct ntp_ts org_ts;     /* Originate time-stamp */
    struct ntp_ts rx_ts;      /* Receive time-stamp */
    struct ntp_ts tx_ts;      /* Important field for client: Transmit time-stamp */
} __attribute__((packed));    /* GCC/Clang: no padding, 48 bytes in total */

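It is as simple as writing struct ntp_packet onto the wire and reading it back off: no parsing is involved beyond calling ntohl()/htonl() on the 32-bit fields (the first byte, holding li, vn and mode, and the other single-byte fields need no byte swapping). With SIP or HTTP, by contrast, even a simple line like Content-Length: 300 requires a good bit of string handling: matching the header name case-insensitively, skipping whitespace around the colon, converting the digits while guarding against overflow, rejecting trailing garbage and, in SIP, also accepting the compact form l: 300.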

Do you think JSON/XML would look nicer? If someone wrote a parser for NTP over JSON using a DOM or stream parser, it wouldn't be apparent what is being parsed just by looking at the source code. Of course, you can add a few extra layers on top: some kind of style sheet or meta-XML, processed during the build, that gives you an NTPPacket class with several get/set functions and a cute-looking serializer function, or perhaps a serializer class too. Parsing is then no longer an O(1) problem but a complex O(wtf) problem with equally complex memory usage. But because that expression can somehow be written in terms of n, people have a cartoonish light-bulb moment and conclude that the parsing algorithm scales linearly. And since hardware is cheap, more hardware can be added to handle more messages, so it all becomes a matter of client-side economics, with supposedly nothing to worry about in terms of code readability or long-term maintainability. Any attempt to use a struct again gets branded as backward thinking or a futile attempt at premature optimisation.

Argument: Extensibility

When people talk about extensibility, they usually mean adding or removing fields at will. They have probably never heard of the Tag-Length-Value technique or the more modern CBOR. I mean sure, you can add a new Content-Type: application/batmobile, but if the other end doesn't recognise the value, or even the entire field, it isn't much use anyway.

Argument: Error recovery and resilience

How many HTTP/XML/JSON parsers out there recover from a bad token and go on to read the rest of the content? And how useful would that data be anyway? Most popular parsers fail on the first error and report it, and protocols built on them usually just send an error message back. So there isn't much error recovery either. Arguments around this usually suggest that text protocols are somehow resilient to isolated, one-off errors, so that the message as a whole is not corrupted. This argument does hold for syslog versus journalctl, but not so much for communication protocols.

Conclusion

This is not to say text protocols are wrong; they have their place. They are very useful in programming languages that cannot offset into a memory chunk directly without unholy, bloated APIs. But using them in telecommunication systems, which are predominantly machine-to-machine, is just overkill. Hardware is getting cheaper by the day, and throwing more memory and more CPUs at a task may well be economical. Nevertheless, binary protocols are both space- and time-efficient: they have less overhead in parsing and building, and are slightly easier to maintain.

#protocols
