Thursday, 10 March 2011
In the previous posts I explained semantics, syntax, and the fastest, cheapest and easiest way to get from diverse IT applications to one uniform business language. This post takes a deep dive into message formats such as flat file, EDIFACT, XML and JSON.
Ever wondered about the pros and cons of XML? Of JSON? What they really are? What other message syntaxes exist?
Once semantics and a logical structure have been defined, and functional groups and fields have been identified, just about any message format can be chosen.
Choosing message formats (or rather, predefining them) is a pragmatic measure that narrows the room for discussion when it comes to choosing a form for your common language.
Message formats can be classified by two characteristics: length and dimension.
Length can be fixed-width or delimited, and dimension can be horizontal or vertical. Basically, that's all there is to it.
Fixed-length messages have records and fields with a fixed width. The same type of information is always found at the same location (position) within a record. Fields that are not fully used (the information in them is shorter than the maximum length of the field) are padded with spaces or zeroes, depending on the data type of the field.
E.g. an empty record consisting of 5 fields, with a total length of 500 characters, will always be 500 characters long.
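To make the padding concrete, here is a minimal sketch of packing and unpacking one fixed-width record. The layout (field names, widths and the "num"/"text" kinds) is made up for illustration and not taken from any real standard; I assume numeric fields are zero-padded on the left and text fields space-padded on the right.

```python
# Hypothetical layout for illustration: (field name, width, kind).
LAYOUT = [
    ("customer_id", 8, "num"),
    ("name", 20, "text"),
    ("city", 15, "text"),
    ("amount", 10, "num"),
    ("currency", 3, "text"),
]
RECORD_WIDTH = sum(width for _, width, _ in LAYOUT)  # 56 characters


def pack_fixed(record):
    """Serialise a dict into one fixed-width line, padding every field."""
    parts = []
    for name, width, kind in LAYOUT:
        value = str(record.get(name, ""))
        if len(value) > width:
            raise ValueError(f"{name!r} exceeds {width} characters")
        # Assumed convention: zeroes on the left for numbers, spaces on the right for text.
        parts.append(value.rjust(width, "0") if kind == "num" else value.ljust(width))
    return "".join(parts)


def unpack_fixed(line):
    """Slice a line back into fields purely by position, then strip the padding."""
    if len(line) != RECORD_WIDTH:
        raise ValueError(f"expected {RECORD_WIDTH} characters, got {len(line)}")
    record, pos = {}, 0
    for name, width, kind in LAYOUT:
        raw = line[pos:pos + width]
        record[name] = raw.lstrip("0") if kind == "num" else raw.rstrip(" ")
        pos += width
    return record
```

Note that `pack_fixed({})` is still a full-width line, and that after stripping the padding a numeric zero can no longer be told apart from an empty field. That ambiguity comes with this kind of padding, not with the sketch.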
Delimited or variable-length messages have records and fields whose width is determined by so-called delimiters, i.e. commonly agreed characters that mark the end of a record or field.
The same type of information is always found after the same delimiter. Fields are not padded: a field takes up only as many characters as the information in it, and an unused field is simply empty.
E.g. an empty record consisting of 5 fields, with a maximum total length of 500 characters, will be just 5 characters (the delimiters) long.
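A minimal sketch of a delimited record, assuming a semicolon closes every field and that values never contain the delimiter themselves (real formats need an escape mechanism for that). The field names are hypothetical.

```python
# Hypothetical field order; the delimiter choice is an assumption.
FIELDS = ["customer_id", "name", "city", "amount", "currency"]
DELIMITER = ";"  # closes every field


def pack_delimited(record):
    """Serialise a dict into one delimited line; unused fields stay empty."""
    parts = []
    for name in FIELDS:
        value = str(record.get(name, ""))
        if DELIMITER in value or "\n" in value:
            raise ValueError(f"{name!r} contains a reserved character")
        parts.append(value + DELIMITER)
    return "".join(parts)


def unpack_delimited(line):
    """Split a delimited line back into fields by counting delimiters."""
    values = line.split(DELIMITER)
    # A well-formed line ends with a delimiter, so the last split item is empty.
    if len(values) != len(FIELDS) + 1 or values[-1] != "":
        raise ValueError("unexpected number of fields")
    return dict(zip(FIELDS, values))
```

With these assumptions an empty record packs to `;;;;;`, i.e. exactly the five delimiters, and a field is identified by how many delimiters precede it rather than by its position in characters.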
Horizontal messages look much like traditional files: fields within a record are found on the same line, one after another. This message format is the most common and resembles files or database tables.
Vertical messages put each field of a record on its own line, one line after another. This format is not widely used but solves the fixed-width problem in an easy way: each field starts on a new line, preceded by a so-called “tag” that identifies the function of the field. The field ends at the last character on that line.
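The difference between the two dimensions can be sketched with one and the same record. The tags, the colon between tag and value, and the semicolon in the horizontal form are all assumptions chosen for illustration.

```python
TAG_SEPARATOR = ":"  # assumed separator between tag and value


def to_horizontal(record, delimiter=";"):
    """All fields on one line; meaning is implied by the order of the fields."""
    return delimiter.join(record.values())


def to_vertical(record):
    """One field per line, each preceded by the tag that names it."""
    return "\n".join(f"{tag}{TAG_SEPARATOR}{value}" for tag, value in record.items())


def from_vertical(text):
    """Read tagged lines back; a field simply ends where its line ends."""
    record = {}
    for line in text.splitlines():
        # Split on the first separator only, so values may contain it.
        tag, sep, value = line.partition(TAG_SEPARATOR)
        if not sep:
            raise ValueError(f"line without a tag: {line!r}")
        record[tag] = value
    return record
```

Because every line carries its own tag, the vertical form needs neither fixed widths nor a fixed field order; the price is that the tags are repeated in every message.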
Nowadays EDIFACT is the most commonly used language, mainly because of its global approach, its broad set of business document standards (over 200 different functional messages) and the fact that its delimited type allows for minimal message size. It has been in existence since 1986 and evolved from ANSI X12 (1979). ANSI X12 is in use in the United States, whereas EDIFACT is widely used in the rest of the world.
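To show what "delimited" means in EDIFACT terms, here is a rough sketch that splits a message into segments, data elements and components, using the default EDIFACT service characters (apostrophe ends a segment, plus separates elements, colon separates components, question mark is the release character). It ignores the envelope segments and a UNA override of those characters, and the sample segments in the usage note are made up for illustration.

```python
SEGMENT_END, ELEMENT_SEP, COMPONENT_SEP, RELEASE = "'", "+", ":", "?"


def split_on(text, separator):
    """Split on a separator, leaving release-escaped characters untouched."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape for the next level
            i += 2
        elif text[i] == separator:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    """Drop release characters, keeping the character each one protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def parse_segment(segment):
    """Return (tag, elements), where every element is a list of components."""
    elements = split_on(segment, ELEMENT_SEP)
    data = [[unescape(c) for c in split_on(e, COMPONENT_SEP)] for e in elements[1:]]
    return elements[0], data


def parse_message(message):
    """Split a message into segments and parse each non-empty one."""
    segments = split_on(message, SEGMENT_END)
    return [parse_segment(s.strip()) for s in segments if s.strip()]
```

For example, `parse_message("NAD+BY+5412345000013::9'")` yields one `NAD` segment whose second element has three components, the middle one empty. The empty component costs exactly one character, which is where the small message size comes from.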
XML was gaining ground, but due to a lack of standards, and the fact that it is an unlucky combination of a delimited interface with fixed-width field names, it will not play a large role any time soon, unless it is pushed and supported by organisations such as the UNECE. The evangelising of Web Services as having to be encapsulated in XML has, however, sped up its evolution.
An additional push comes from most new applications offering XML support nowadays when it comes to disclosing information.
A pull in the opposite direction comes from Google, Twitter and Facebook, which are moving away from the format or never adopted it: Facebook is on the verge of moving away, Google has never used XML, and Twitter deprecated its use last year. Both Twitter and Facebook have embraced JSON, a far simpler and more concise message structure that serves the same purpose.
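The difference in size is easy to see by writing the same record in both syntaxes. This is a small sketch using only the Python standard library; the record, its field names and the root tag "customer" are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for this comparison.
RECORD = {"customer_id": "42", "name": "Acme Ltd", "city": "Delft"}


def to_xml(record, root_tag="customer"):
    """Every field name appears twice: once as opening tag, once as closing tag."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(record):
    """Every field name appears once, followed by its value."""
    return json.dumps(record, separators=(",", ":"))


xml_text = to_xml(RECORD)
json_text = to_json(RECORD)
print(len(xml_text), xml_text)
print(len(json_text), json_text)
```

Both strings carry exactly the same information, and both can be read back with the standard parsers (`ET.fromstring` and `json.loads`). The XML version is longer mainly because each field name is spelled out in an opening and a closing tag.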
IDoc is a standard invented by SAP. It is just a simple fixed-width interface, but it is supported by large organisations. Needless to say, as SAP implementations differ all over the world, so do the IDocs.
Information exchange via physical files goes back decades. A decade ago links started being made via COM, CORBA, DCOM etc., whereby information was exchanged between systems not via physical messages but via in-memory objects.
This in fact meant that applications were tightly coupled via a virtual point-to-point interface. Development and maintenance of such solutions have proven to be (much) too expensive.
There’s a lot of talk about loosely coupled things these days: the old tight coupling (point-to-point interfacing, COM, CORBA, etc.) turned out to be very time-consuming and costly with regard to development and maintenance.
Nowadays it’s architects who use the term loosely coupled: some of them think that applications should be loosely coupled towards the architecture, instead of vice versa. These are never business architects, but IT architects. Business architects know that applications change at high speed, and that fast-changing business needs should be supported by IT, which must be able to move applications or systems in or out when need be - the European Parliament approach as sketched in the previous post.
Loosely coupled means that applications and systems should plug in and out of the enterprise almost overnight. Plugging in an application should happen as quickly and cheaply as possible. Regard it as just another politician speaking in Brussels: just another (relatively cheap) translator has to be found, after which business can continue as usual.
So, applications must always speak their native language. This is the simplest and most cost-efficient idea, leaving the application to do what it’s best at: provide functionality. The translator provides its own specialty, which is translating messages from any language into any other language by way of the common language that serves your company as a uniform, tool- and platform-independent business language that is relatively timeless and understandable to all.
For an Oracle Siebel CRM application this usually means speaking Siebel XML, for SAP it usually means speaking IDoc; a mainframe will feel most comfortable with fixed or delimited flat files, etc.
If you visit China, you'll speak Chinese, in Greece it's Greek, and if you want to speak English you had better distinguish between UK English, US English, Australian English or what is generally used on social media networks: global English.
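The translator idea can be sketched as a set of adapters around one common record. Everything named here is hypothetical: the common fields, the two "native" formats (a semicolon-delimited line and JSON) and the `translate` function merely stand in for whatever your applications and integration tooling actually speak.

```python
import json

# Hypothetical common language: a plain dict with these fields.
COMMON_FIELDS = ["id", "name", "city"]


def parse_delimited(message):
    """Native delimited line -> common record."""
    values = message.split(";")
    if len(values) != len(COMMON_FIELDS):
        raise ValueError("unexpected number of fields")
    return dict(zip(COMMON_FIELDS, values))


def write_delimited(record):
    """Common record -> native delimited line."""
    return ";".join(record.get(field, "") for field in COMMON_FIELDS)


def parse_json(message):
    """Native JSON -> common record (values normalised to strings)."""
    data = json.loads(message)
    return {field: str(data.get(field, "")) for field in COMMON_FIELDS}


def write_json(record):
    """Common record -> native JSON."""
    return json.dumps({field: record.get(field, "") for field in COMMON_FIELDS})


# One (parse, write) pair per native language; applications never see each other.
ADAPTERS = {
    "delimited": (parse_delimited, write_delimited),
    "json": (parse_json, write_json),
}


def translate(message, source, target):
    """Translate via the common record instead of directly between formats."""
    parse, _ = ADAPTERS[source]
    _, write = ADAPTERS[target]
    return write(parse(message))
```

Calling `translate("42;Acme Ltd;Delft", "delimited", "json")` returns the same record as JSON. Plugging in a new application then means adding one adapter pair, rather than writing a separate translation to and from every application already in place.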
This was the deep dive into messaging formats, concluding the information exchange form: messaging. Now let's resurface and address the information exchange method: transportation.