Scaling XML to High Volume -- Dos and Don'ts
Developers looking to scale their pilot XML projects to embrace more volume or to link to more systems should carefully evaluate whether today's popular approaches will truly serve their needs.
A recent whitepaper from ZapThink, a web services consultancy in Waltham, Mass., found drawbacks with all of the top XML performance-tuning options, including:
(a) XSL (Extensible Stylesheet Language);
(b) using smaller element names;
(c) using non-standard XML parsers; and even
(d) rewriting XML rules and/or business logic.
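The element-name shortening option can be sketched in a few lines of Python. The record and tag names below are invented for illustration; the point is the trade-off Schmelzer describes -- shrinking tags does cut bytes, but every peer must then know the non-standard mapping.

```python
# A record with descriptive, self-documenting element names
verbose = b"<customerOrder><customerAccountNumber>10442</customerAccountNumber></customerOrder>"

# The same record with abbreviated, non-standard element names.
# Smaller on the wire, but every receiving system must know that
# <o> means customerOrder and <a> means customerAccountNumber.
terse = b"<o><a>10442</a></o>"

print(len(verbose), len(terse))  # the terse form is a fraction of the size
```

The byte savings are real, but the mapping from terse to verbose names is effectively a private protocol, which is exactly the interoperability risk discussed later in the article.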
"Developers need to start their planning from the basic assumption that XML is inefficient. If developers don't spend time thinking clearly about what that means to their systems environment, they could run into challenges," ZapThink analyst Ronald Schmelzer told Integration Developer News.
Overcoming Hazards of XML Shortcuts
The problem with scaling XML, Schmelzer said, is that it's difficult to determine just where the inefficiencies will crop up. Because many XML projects are low- and medium-volume projects designed to be small pilot tests, he said, "at first, often these XML inefficiencies don't really show up."
Schmelzer describes XML's inherent challenges this way in a recent ZapThink column:
"XML is a text-based, human-readable, and metadata-encoded markup language that operates on the principle that the metadata that describes a message's meaning and context accompanies the content of the message. As a result, XML document sizes can easily be ten to twenty times larger than an equivalent binary representation of the same information."
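The size overhead is easy to reproduce. The minimal Python sketch below (the order fields are invented for illustration) serializes one record both as XML and as a fixed-layout binary struct, then compares the byte counts:

```python
import struct
import xml.etree.ElementTree as ET

# A hypothetical order record: id, quantity, unit price
order_id, quantity, price = 10442, 7, 19.99

# XML: the metadata describing each field travels with every message
root = ET.Element("order")
ET.SubElement(root, "orderId").text = str(order_id)
ET.SubElement(root, "quantity").text = str(quantity)
ET.SubElement(root, "unitPrice").text = str(price)
xml_bytes = ET.tostring(root)

# Equivalent fixed-layout binary: two 4-byte ints plus an 8-byte double
bin_bytes = struct.pack("<iid", order_id, quantity, price)

print(len(xml_bytes), len(bin_bytes))  # XML is several times larger
```

Even this tiny record comes out several times larger as XML; richer documents with namespaces, attributes, and whitespace push toward the ten-to-twenty-fold figure Schmelzer cites.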
Schmelzer says these "inefficiencies" can crop up and bite a developer in three main areas:
1. Bandwidth -- Simply transferring XML documents -- even without XML schema transformations -- can eat up a lot of bandwidth. A network might need up to 10-20 MB of bandwidth for high-volume transfers, Schmelzer said.
2. Processor overload -- While a growing number of developers are using XSL (Extensible Stylesheet Language, http://www.w3.org/Style/XSL/) to help with XML throughput, Schmelzer says it's not a true answer. "XSL taxes a processor quite a lot, especially if you're doing 100 XSL transactions per second." And these XSL latencies can multiply, especially as developers begin to construct XML storage solutions.
3. Storage -- The more XML documents (or parts of documents) developers need to store, the more they may look to use XSL in many places. "Once you make a decision to use XSL, you may find that code will grow like wildfire," Schmelzer said.
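The bandwidth point (item 1 above) is simple arithmetic. With assumed figures -- a 25 KB XML document and 100 messages per second, neither taken from the article -- a back-of-envelope Python estimate lands in the range Schmelzer cites:

```python
# Assumed figures for illustration (not from the article)
msg_size_kb = 25      # one XML document, already 10-20x its binary equivalent
msgs_per_sec = 100    # a modest "high-volume" transaction rate

bits_per_sec = msg_size_kb * 1024 * 8 * msgs_per_sec
print(f"{bits_per_sec / 1e6:.2f} Mbit/s")  # prints "20.48 Mbit/s"
```

A pilot doing a few messages per second never notices this; it only surfaces when volumes climb, which matches Schmelzer's observation that the inefficiencies "don't really show up" at first.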
Once the bottleneck (or potential bottleneck) is diagnosed, developers should take care in applying a solution, Schmelzer said. "In the push for better performance, developers are doing things to XML that aren't exactly standard," he said. Schmelzer admits there are "no established Best Practices" for speeding up XML, and so "People are doing their own thing. Sometimes it works and sometimes it doesn't," he said.
Performance and Interoperability Trade-offs
Even though these techniques seem to solve today's performance problem, Schmelzer warned that widespread use of non-standard solutions will probably constrain the interoperability of XML systems -- both inside and outside the firewall.
"These solutions may speed up some performance to a point, but they don't work all the time for every bottleneck," Schmelzer said. "Developers need to realize that when doing XML data sharing or document transfers, the other side of the communication has to understand these compressed formats or rewritten element names. Naturally, one consequence is that while your first project may work just fine, the more systems you add, the more likely you'll lose interoperability."
In Search of Solutions
Schmelzer admits there aren't many standard answers to these problems -- at least not today. He vocally suggests that the W3C, OASIS, or another web services standards body take up the issue of making high-volume XML more efficient. "As companies begin to roll out bigger and bigger XML projects, it will become much more evident that this problem merits standards attention," he told IDN.
But in the meantime, he suggests the boost in performance should come from tuning the hardware or the application server -- not the XML itself. "To avoid problems down the line, the developer should implement XML as a standard as much as he can, and optimize his hardware, software application or network for performance," he said.
Schmelzer posed the following question: "At what point does a compressed, stripped-down, non-validating 'XML-like' format leave the standards behind and represent a proprietary data format?"
To guard against tweaking XML out of standards compliance, Schmelzer also suggests converting the XML before it's dropped onto the network wire, and then compressing that post-XML data format. XSLT processors are one option for such conversion, but even this technique needs to be evaluated against the type of XML traffic being pushed through the network, because XSLT processing can slow application servers with additional pre- or post-processing, he added.
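Compressing the serialized XML just before it hits the wire can be sketched with Python's standard gzip module; the batch of order records below is made up for illustration. The standard document stays standard, and only the transport layer changes -- though both ends pay the CPU cost and must agree on the scheme:

```python
import gzip

# A made-up batch of repetitive order records
xml_doc = b"<orders>" + b"".join(
    b"<order><orderId>%d</orderId><quantity>%d</quantity></order>" % (i, i % 9)
    for i in range(200)
) + b"</orders>"

compressed = gzip.compress(xml_doc)
print(len(xml_doc), len(compressed))  # repetitive markup compresses well

# The receiving side must decompress before parsing
assert gzip.decompress(compressed) == xml_doc
```

Because XML's element names repeat constantly, generic compression recovers much of the verbosity without inventing a proprietary format -- which is the distinction Schmelzer's question above is driving at.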
"There should be somebody else worrying about how to squeeze more performance out of the XML traffic on the pipe so that developers don't need to write specialized parsers," he said. "That's the way EDI did it, with everyone writing their own parsers, and it definitely made things less interoperable."
The full text of Schmelzer's ZapThink column, "Breaking XML To Optimize Performance," is available.