Tuning Your XML for Clean Web Services
by Adam Kolawa and Marvin Gapultos
Crap in, crap out -- it's an axiom that applies to many aspects of enterprise development, but none more so than building reliable and robust web applications and integration projects with Extensible Markup Language (XML). Since its inception, XML has been seen as the cure-all for every problem related to web applications and integration projects. However, "crappy," poorly written XML can either slow down an integration project, or worse, cause the integration project to collapse.
If it is written and used properly, XML does offer many advantages that can support reliable and robust web applications. The key to successfully using XML in an integration project is to first understand the inefficiencies that may cause poorly written XML, and then apply a rule-based system that establishes policies that can be adhered to.
Writing XML -- A 7 Keys Use Case
The Extensible Markup Language (XML) is a family of technologies that describe structured data. Using XML, companies can create common information formats and share this information on the World Wide Web. For example, a company can create an XML document to exchange information about its products over the Internet. The following is a simple example of an XML document:
apples are great for making caramel apples. The sour taste
apple and the sweet taste of the caramel blend well together.
hearts are the tastiest part of the artichoke. When you
the heart, it's love at first bite!
is easy to cook.
are the main ingredient in guacamole.
monkeys eat bananas. Are you smart?
Avoid XML Inefficiencies
Although the example XML document appears to be written correctly, how can developers be completely sure that the code is valid and well-formed, comprehensible to other developers, and adheres to specific standards? In other words, how can developers be sure that the XML they write isn't crap? The answer to this question lies in a rule-based system that can establish team policies and practices to prevent poorly written XML.
The following sections will outline some of the inefficiencies that can lead to crappy XML, and will address how a rule-based system can "cut the crap" and prevent the use of poorly written XML in integration projects. After all, system performance is only as good as the data received and the instructions given. If errors are contained in the XML, it is more likely than not that the system will crash.
One of XML's main benefits is that it provides mechanisms for verifying document validity. There are two basic mechanisms: Document Type Definition (DTD) and XML Schema. When creating an XML document, developers can reference either of these mechanisms from within the document itself. The DTD or schema that is referenced specifies which elements and attributes the document may contain and the order in which those elements and attributes should be listed.
The following is an example of a simple DTD that can be referenced by an XML document:
file CDATA #REQUIRED
id CDATA #REQUIRED
isFruit (true|false) 'true'>
To reference this DTD from an XML document, the following header can be added to the beginning of the XML document:
A DTD is a specification based on the rules of the Standard Generalized Markup Language (SGML) and provides basic verification of XML documents. DTDs provide mechanisms for expressing which elements are allowed and what the composition of each element can be. Legal attributes can be defined per element type, and legal attribute values can be defined per attribute.
The following is an example of a simple schema that can be referenced by an XML document:
To reference this schema from an XML document, the attribute in the element can be specified with the following header:
An XML schema, like a DTD, defines a set of legal elements, attributes and attribute values. However, XML schemas provide more robust verification of XML documents. XML schemas are namespace aware and also cover data types, data bounds, schema class inheritance and context-sensitive data values -- none of which are covered by DTDs.
While referencing a DTD or schema makes it possible to validate an XML document, there is no requirement that developers use headers to reference DTDs or schemas at all. In fact, developers need only follow simple syntax rules for an XML document to be "well-formed." However, a well-formed document is not necessarily a valid document. Without a reference to either a DTD or a schema, there is no way to verify whether the XML document is valid. Therefore, measures must be taken to ensure that XML documents do, in fact, reference a DTD or schema.
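The distinction between well-formed and valid can be illustrated with a minimal sketch. It assumes Python's standard-library parser, which checks syntax only and does not validate against a DTD or schema. The function name and the sample documents are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML.

    This checks syntax only; it says nothing about validity against
    a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents, not taken from the article.
GOOD = "<produce><item id='1'>apple</item></produce>"
BAD = "<produce><item id='1'>apple</produce>"  # <item> is never closed

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```

Note that the first document passes even though nothing has checked its elements or attributes against any definition, which is exactly the gap a DTD or schema reference is meant to close.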
To guarantee that an XML document references a DTD or schema, development teams can adopt a rule-based system that can detect and prevent errors within the XML code. Developers can create rules that impose constraints on XML documents to verify validity.
For example, a rule can be created that requires an XML document to contain the sample schema header:
If the document is missing the specified header, an error will occur, alerting the developer to the violation.
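As a rough illustration of such a rule, the sketch below reports a violation when a document references neither a DTD nor a schema. It is not how any particular tool implements its rules. It assumes that a DTD reference appears as a DOCTYPE declaration and that a schema reference appears as an `xsi:schemaLocation` or `xsi:noNamespaceSchemaLocation` attribute on the root element. The function names, file names, and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Namespace of the standard XML Schema instance attributes.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def has_doctype(xml_text: str) -> bool:
    """Return True if the document contains a DOCTYPE declaration."""
    found = []
    parser = expat.ParserCreate()
    # expat calls this handler once when it meets <!DOCTYPE ...>.
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset: found.append(name)
    )
    parser.Parse(xml_text, True)
    return bool(found)


def has_schema_reference(xml_text: str) -> bool:
    """Return True if the root element carries an xsi schema-location attribute."""
    root = ET.fromstring(xml_text)
    return any(
        f"{{{XSI}}}{attr}" in root.attrib
        for attr in ("schemaLocation", "noNamespaceSchemaLocation")
    )


def check_reference_rule(xml_text: str) -> list:
    """Return a list of violation messages; an empty list means the rule passes."""
    if has_doctype(xml_text) or has_schema_reference(xml_text):
        return []
    return ["document does not reference a DTD or schema"]


# Hypothetical documents; produce.dtd and produce.xsd are made-up file names.
WITH_DTD = '<!DOCTYPE produce SYSTEM "produce.dtd"><produce/>'
WITH_XSD = (
    '<produce xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="produce.xsd"/>'
)
BARE = "<produce/>"

for doc in (WITH_DTD, WITH_XSD, BARE):
    print(check_reference_rule(doc))
```

A check like this only confirms that a reference is present; actually validating the document against the referenced DTD or schema requires a validating parser.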
As seen in the sample XML document, XML is human-readable. In other words, XML is created in plain text and utilizes actual words or phrases that have specific meanings to developers. However, even though XML can be read and written by humans, it does not necessarily mean that humans can understand XML -- developers can still create crappy, unreadable XML code. An element that has a specific meaning to one developer may be of no use, or make no sense, to another developer.
For instance, developers can create XML that is completely unintelligible to one another -- consider XML tags that are written in Polish or Japanese. Code need not be written in another country to be crappy, either; ambiguous code written in the same tongue that is to be shared between companies can be quite cryptic, as well. For example, the element
Naming conventions can be established, along with rules that verify whether code follows them. Such rules can cover anything from W3C guidelines for a specific language, to team naming standards, to project-specific design requirements, to the proper usage of custom XML tags.
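A naming rule of this kind might be sketched as follows. The convention itself (element names start with a lowercase letter, use only letters and digits, and are at least three characters long) is an assumption chosen for illustration, as are the function name and the sample document.

```python
import re
import xml.etree.ElementTree as ET

# Assumed team convention: lowercase first letter, letters and digits only,
# at least three characters in total.
NAME_RULE = re.compile(r"[a-z][a-zA-Z0-9]{2,}")


def naming_violations(xml_text: str) -> list:
    """Return the element names, in document order, that break NAME_RULE."""
    violations = []
    for element in ET.fromstring(xml_text).iter():
        # Drop a "{namespace}" prefix if the parser added one.
        local_name = element.tag.rsplit("}", 1)[-1]
        if not NAME_RULE.fullmatch(local_name):
            violations.append(local_name)
    return violations


# Hypothetical document: <nm> is too terse, <IsFruit> starts with a capital.
DOC = "<produce><item><nm>apple</nm><IsFruit>true</IsFruit></item></produce>"

print(naming_violations(DOC))  # ['nm', 'IsFruit']
```

A pattern check cannot tell whether a name is meaningful to another developer, but it can catch the mechanical cases, such as abbreviations that are too short or inconsistent capitalization, before the XML is shared.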
Although the W3C has made an effort to establish a common language, vocabulary and protocol for XML, these standards are still in development and are constantly changing. Companies that adhere to proposed standards that are not yet fully mature must be prepared to keep up with any future changes to those standards. For example, a standard that is in existence today may not exist six months from now. Without any stability in XML standards, developers are forced to either keep up with the rapid changes or fall behind. In spite of the chaos of standards that may exist for XML development, the recent release of Basic Profile 1.0 provides some guidance to developers seeking a widely used XML standard. The Web Services Interoperability Organization (WS-I) Basic Profile 1.0 consists of specifications that establish a baseline for interoperable web services.
These specifications include guidelines that cover XML 1.0. So, developers can now depend on Basic Profile 1.0 as a common framework for implementing XML and building integration projects. There are more than 25 WS-I member companies that support Basic Profile 1.0. Therefore, developers can be confident that the XML standards they use will not be subject to constant flux and change.
SOAPtest Boosts XML Efficiencies, Enforces Rules
As seen in the points outlined above, the inefficiencies of XML provide ample opportunities for developers to create poorly written XML documents. However, developers can be sure to create valid XML by addressing its inefficiencies and adopting a rule-based system. Products such as Parasoft SOAPtest 2.1 can be utilized to create and enforce such a rule-based system.
SOAPtest provides developers with a number of tools that can help to prevent crappy XML. The RuleWizard tool includes an XML dictionary of the standard elements (attributes, elements, document type, etc.) used to construct rules and queries. In addition, developers can create their own dictionaries for any type of XML document and can express patterns in terms of nodes in these dictionaries.
The CodeWizard tool can be configured to check any number or combination of custom rules and built-in rules. Custom rules can verify application-specific design and content requirements, enforce custom coding standards, enforce naming conventions, identify text (such as an exception message) that signals a problem, query XML data, or even perform custom file transformations.
Parasoft's SOAPtest 2.1 also conforms to Basic Profile 1.0. Therefore, by establishing rules and validating XML with SOAPtest 2.1, developers can check that the XML they create follows the standard set of guidelines put forth by WS-I.
SOAPtest provides the higher-level checking that goes beyond simply validating the XML and enforces application-specific rules and team-specific coding standards. By automating the verification process at each level, developers can increase their confidence that the Web application's usage of XML is consistently correct and clean -- cutting the crap out of the process.
Toward Cleaner Code
Whether you use SOAPtest, another XML tool, or check your own work by hand against your rules, the message is the same: as XML becomes more pervasive in your enterprise, you and your dev team will need to make sure it is the cleanest, most efficient and least ambiguous XML you can write. Simply using XML extensions with your dev tool or IDE and then deploying that XML directly into your enterprise can lead to slower processing, hard-to-update code or even error-prone operations. The key to avoiding these troubles is to make sure that the XML you deploy today isn't crappy code that you just slapped into place. The first step toward sparkling code is simple education, and the awareness to take a second look at your XML practices and see whether you or your team are making any of the above mistakes.