Thursday, 8 September 2011
The other day my attention was drawn to a very large national company that claimed to have a performance problem: sometimes it would take ages for messages to reach their destination, and entire applications would come to a screeching halt.
After a few questions and answers, it was clear that they didn't have a performance problem: they had an architectural problem or, in a nutshell, a very unfortunate design
Queueing has become a much-appreciated message-transport system in the last decades.
Why? It was a perfect way to benefit from the change from Pull to Push (not in a John Hagel sense, but in an Integration sense). Pull is what you do when you expect a message to be there, and go look for it: you basically verify your assumption.
Two decades ago, B2B connections were relatively expensive, as they all involved ISDN telephone lines, and each Pull meant you dialed in, logged in to some transport protocol (usually FTP), did your stuff and then logged out and disconnected again
So, on average, there would be minutes and more likely hours between Pull attempts, for an average B2B chain - but then again real-time wasn't much in demand back then
Ever since cost went down and demand for real-time or at least near real-time went up (whichever came first), Pull has been replaced by Push: instead of system A checking for a trigger message from system B at regular intervals, system B now immediately sends that message to A when it's ready (the message, that is; not system A)
The very best way to Push is via queues: they stand eagerly awaiting any message and will receive and forward it almost immediately. What's more, they will assign unique IDs to every message and wait for the response to a request if need be.
And that's where it can go wrong
If you have an ESB and only 1 queue, you'll get the benefits and disadvantages of that: all your messages will be handled in the exact same order as they were placed on the ESB, because any queue is always 1 message wide. So, if the application on the other side takes a little while to handle that message, your entire queue will notice that.
It's the usual trade-off, and bad design for an ESB. You want a highway for your ESB, not a single-lane interstate
Having said that, you don't actually want one single highway. Whether you have a double-lane or even a 10-lane highway, whenever there's a large enough accident, everything comes to a full stop or at least one very, very large traffic jam
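The single-queue effect described above can be sketched in a few lines. This is a toy model, not a measurement of any real ESB: the message names, the `slow_app` and `fast_app` destinations, the handling times and the `completion_times` helper are all made up for illustration, and it assumes every message is enqueued at the same moment and each lane has exactly one consumer.

```python
from collections import defaultdict


def completion_times(messages, lane_of):
    """Return {message id: finish time in ms} when every lane is a FIFO
    queue drained by a single consumer.

    messages: list of (msg_id, destination, handling_ms), all enqueued at t=0
    lane_of:  function mapping a destination to the lane (queue) it travels on
    """
    lane_clock = defaultdict(int)  # per lane: time at which its consumer is free
    finished = {}
    for msg_id, destination, handling_ms in messages:
        lane = lane_of(destination)
        # A message waits for everything that is ahead of it in the same lane.
        lane_clock[lane] += handling_ms
        finished[msg_id] = lane_clock[lane]
    return finished


# Hypothetical traffic: one slow receiver, followed by two quick messages.
messages = [
    ("m1", "slow_app", 30_000),
    ("m2", "fast_app", 100),
    ("m3", "fast_app", 100),
]

# One queue for everything: the quick messages sit behind the slow one.
single_queue = completion_times(messages, lambda destination: "the_only_queue")

# Separate lanes: the slow receiver only delays its own lane.
separate_lanes = completion_times(messages, lambda destination: destination)

print(single_queue)
print(separate_lanes)
```

In the single-queue run, the two quick messages only finish after the slow one has been handled; with separate lanes they are done long before. The same arithmetic explains why one accident on a shared highway stops everybody.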
The other extreme is having one queue per application-to-application connection, in each direction. For those of you in the know, that does indeed mean point-to-point connections on the application level, now solidified into the infrastructure level. Situations like these are usually made even worse by guidelines to put error messages on the applicable error queue and failure messages on the applicable failure queue. If you ask me, that is the greatest sin one can commit in IT, because it takes both business failures and technical failures and demotes them to the infrastructural level and into the hands of Operations: sorry for riding my hobby horse again, but just like most religious and political systems, that's just willingly handing over full responsibility without getting anything in return: out of sight, out of mind
So what's the optimal solution? As usual the answer lies somewhere in the middle. Again, looking at the everyday world, the solution is there and has been proven. It's cost-efficient and effective, and offers the best solution for all. There are paths for pedestrians, roads for cars and motorcycles, interstates for those that need to drive a long distance and highways that allow for all kinds of vehicles and passengers to get somewhere fast. Ideally one should allow for a separate road for trucks (the heavy batch loads) and double all those to get some redundancy: 2 pedestrian paths, 2 low-volume roads, 2 interstates, 2 truck roads and 2 highways
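As a rough sketch of that road network: a handful of lanes per kind of traffic, each one doubled for redundancy. Everything here is hypothetical (the `LaneRouter` class, the lane names, the `mark_down` call); a real ESB or message broker would supply its own queues and failover, so read this as an illustration of the layout rather than as an implementation.

```python
from collections import deque

# Lane names borrowed from the road analogy; purely illustrative.
LANES = ("pedestrian", "low_volume", "interstate", "truck", "highway")


class LaneRouter:
    """Keeps `copies` redundant queues per lane and spreads messages over them."""

    def __init__(self, lanes, copies=2):
        self.queues = {lane: [deque() for _ in range(copies)] for lane in lanes}
        self.down = set()                          # (lane, index) pairs out of service
        self._next = {lane: 0 for lane in lanes}   # round-robin pointer per lane

    def mark_down(self, lane, index):
        """Take one copy of a lane out of service (the 'accident')."""
        self.down.add((lane, index))

    def send(self, lane, message):
        """Put the message on the next available copy of its lane; return its index."""
        copies = self.queues[lane]
        for offset in range(len(copies)):
            index = (self._next[lane] + offset) % len(copies)
            if (lane, index) not in self.down:
                copies[index].append(message)
                self._next[lane] = (index + 1) % len(copies)
                return index
        raise RuntimeError(f"no queue available for lane {lane!r}")


router = LaneRouter(LANES)

# Heavy batch loads stay on the truck lanes and alternate between the two copies.
first_batch = router.send("truck", "nightly batch, part 1")
second_batch = router.send("truck", "nightly batch, part 2")

# An accident on one highway copy: urgent traffic moves to the other copy,
# and no other lane is affected.
router.mark_down("highway", 0)
urgent = router.send("highway", "urgent order")
```

The point of the layout is that a jam or an outage stays inside its own lane: the trucks never block the highway, and losing one copy of a lane does not stop that lane.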
But then you have to take care of message sequence yourself, right? Yes. Making sure that transactions are executed in the right order (also called Straight-Through Processing) is a responsibility that you shouldn't hand over, period. A Finite State Machine will help you here. Does that look tough? Well maybe, but it isn't: consider it simply workflow on the event level, as opposed to the process (step) level
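To show how small such a state machine can be, here is a minimal sketch. The order lifecycle, the event names and the `OrderStateMachine` class are invented for the example; the only idea taken from the text is that the receiver, not the queue, decides which event may be applied next. Events that arrive too early are parked and applied once the state allows it.

```python
# Allowed transitions: (current state, event) -> next state.
# The order lifecycle is a made-up example.
TRANSITIONS = {
    ("new", "order_placed"): "placed",
    ("placed", "payment_received"): "paid",
    ("paid", "goods_shipped"): "shipped",
}


class OrderStateMachine:
    def __init__(self):
        self.state = "new"
        self.parked = []  # events that arrived before their turn

    def handle(self, event):
        """Apply the event if it is allowed now, otherwise park it.
        After every change, retry the parked events. Returns the current state."""
        self.parked.append(event)
        progressed = True
        while progressed:
            progressed = False
            for pending in list(self.parked):
                next_state = TRANSITIONS.get((self.state, pending))
                if next_state is not None:
                    self.state = next_state
                    self.parked.remove(pending)
                    progressed = True
        return self.state


# Events arrive out of order, for instance because they travelled on different lanes.
order = OrderStateMachine()
after_shipping_event = order.handle("goods_shipped")    # too early: parked
after_order_event = order.handle("order_placed")        # applied
after_payment_event = order.handle("payment_received")  # applied, then the parked event follows
```

With this in place the order in which the queues deliver the messages stops mattering: the state machine only moves forward along the transitions it knows, whatever lane the events came in on.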
A performance problem is never really a performance problem if you can't solve it with more iron. It's rather easy, really, to recognise bad design from afar