Tuning Your XML for Clean Web Services

If you've worked on more than one project with XML, especially for integration or web services, some experts say you need to make sure your XML keeps up with your system's needs for throughput and performance. This week, IDN brings devs an XML checklist for cleaning up code that may have been auto-generated by a variety of tools. If your enterprise is becoming an XML house, it's worth a look. It's easy, quick -- and may even be fun.

Tags: XML, Developers, XML Document, Schema, Standards, DTD, Reference


by Adam Kolawa and Marvin Gapultos

Crap in, crap out -- it's an axiom that applies to many aspects of enterprise development, but none more so than building reliable and robust web applications and integration projects with Extensible Markup Language (XML). Since its inception, XML has been seen as the cure-all for every problem related to web applications and integration projects. However, "crappy," poorly written XML can slow down an integration project or, worse, cause it to collapse.



If it is written and used properly, XML does offer many advantages that can support reliable and robust web applications. The key to successfully using XML in an integration project is to first understand the inefficiencies that may cause poorly written XML, and then apply a rule-based system that establishes policies that can be adhered to.


Writing XML -- A 7 Keys Use Case

The Extensible Markup Language (XML) is a family of technologies that describe structured data. Using XML, companies can create common information formats and share this information on the World Wide Web. For example, a company can create an XML document to exchange information about its products over the Internet. The following is a simple example of an XML document:



<?xml version="1.0" encoding="US-ASCII"?>
<ProductList>
  <Product color="green" file="apple_fruit_green.gif" id="0" isFruit="true">
    Green apples are great for making caramel apples. The sour taste of the
    apple and the sweet taste of the caramel blend well together.
  </Product>
  <Product color="green" file="artichoke_veg_green.gif" id="1" isFruit="false">
    Artichoke hearts are the tastiest part of the artichoke. When you eat
    the heart, it's love at first bite!
  </Product>
  <Product color="green" file="asparagus_veg_green.gif" id="2" isFruit="false">
    Asparagus is easy to cook.
  </Product>
  <Product color="green" file="avocado_fruit_green.gif" id="3" isFruit="true">
    Avocados are the main ingredient in guacamole.
  </Product>
  <Product color="yellow" file="banana_fruit_yellow.gif" id="4" isFruit="true">
    Smart monkeys eat bananas. Are you smart?
  </Product>
</ProductList>






Avoid XML Inefficiencies

Although the example XML document appears to be written correctly, how can developers be completely sure that the code is valid and well-formed, comprehensible to other developers, and adheres to specific standards? In other words, how can developers be sure that the XML they write isn't crap? The answer to this question lies in a rule-based system that can establish team policies and practices to prevent poorly written XML.



The following sections outline some of the inefficiencies that can lead to crappy XML, and address how a rule-based system can "cut the crap" and prevent the use of poorly written XML in integration projects. After all, system performance is only as good as the data received and the instructions given. If the XML contains errors, the system that consumes it is more likely than not to fail.


  • 1. Validate XML

    One of XML's main benefits is its provision of mechanisms for verifying document validity. There are two basic mechanisms: Document Type Definition (DTD) and XML Schema. When creating an XML document, developers can reference either of these mechanisms from within the document itself. The referenced DTD or schema specifies which elements and attributes the document may contain and the order in which the elements must appear.


  • 2. Define DTDs

    The following is an example of a simple DTD that can be referenced by an XML document:














    <!ELEMENT ProductList (Product*)>
    <!ELEMENT Product (#PCDATA)>
    <!ATTLIST Product
        color ... #REQUIRED
        file CDATA #REQUIRED
        id CDATA #REQUIRED
        isFruit (true|false) 'true'>


    To reference this DTD from an XML document, the following header can be added to the beginning of the XML document:






    <?xml version="1.0" encoding="US-ASCII"?>
    <!DOCTYPE ProductList PUBLIC "-//OnlineGrocer//ProductList//EN" "ProductList.dtd">






    A DTD is a specification based on the rules of the Standard Generalized Markup Language (SGML) and provides basic verification of XML documents. DTDs provide mechanisms for expressing which elements are allowed and what the composition of each element can be. Legal attributes can be defined per element type, and legal attribute values can be defined per attribute.



  • 3. Define Schemas

    The following is an example of a simple schema that can be referenced by an XML document:






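    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="ProductList">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="Product" minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="Product">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:string">
              <xsd:attribute name="color" use="required">
                ...
              </xsd:attribute>
              <xsd:attribute name="file" type="xsd:string" use="required"/>
              <xsd:attribute name="id" type="xsd:nonNegativeInteger" use="required"/>
              <xsd:attribute name="isFruit" type="xsd:boolean" default="true"/>
            </xsd:extension>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>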

















    To reference this schema from an XML document, the xsi:noNamespaceSchemaLocation attribute can be specified in the root element, as in the following header:








    <ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="ProductList.xsd">





    An XML schema, like a DTD, defines a set of legal elements, attributes and attribute values. However, XML schemas provide more robust verification for XML documents. XML schemas are namespace-aware and also cover data types, data bounds, type inheritance and context-sensitive data values -- none of which DTDs cover.
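
To make the data-type point concrete, here is a minimal sketch of the kind of checks that the schema's xsd:nonNegativeInteger and xsd:boolean types express for the id and isFruit attributes. It is an illustration only: it assumes Python and its standard-library xml.etree.ElementTree module (neither appears in the original article), the function names are hypothetical, and the checks are hand-rolled approximations rather than real schema validation.

```python
import re
import xml.etree.ElementTree as ET

# Lexical forms accepted by xsd:boolean.
XSD_BOOLEAN_VALUES = {"true", "false", "1", "0"}


def is_non_negative_integer(value):
    """Approximate xsd:nonNegativeInteger: optional sign, ASCII digits, value >= 0."""
    value = value.strip()
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None and int(value) >= 0


def check_product_types(xml_text):
    """Return a list of type violations found on Product elements (hypothetical helper)."""
    violations = []
    for product in ET.fromstring(xml_text).iter("Product"):
        product_id = product.get("id")
        if product_id is None or not is_non_negative_integer(product_id):
            violations.append(f"id={product_id!r} is not a non-negative integer")
        # The sample schema declares "true" as the default for isFruit.
        is_fruit = product.get("isFruit", "true")
        if is_fruit.strip() not in XSD_BOOLEAN_VALUES:
            violations.append(f"isFruit={is_fruit!r} is not a boolean")
    return violations
```

A DTD that declares id as CDATA would accept a value such as "seven"; a check like the one above, or a validating parser working from the schema, would reject it.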


  • 4. Avoid Lack of DTD/Schema Enforcement

    While validating against a referenced DTD or schema can establish the validity of XML documents, nothing requires developers to use headers to reference DTDs or schemas at all. In fact, developers need only follow simple syntax rules for an XML document to be "well-formed." However, a well-formed document is not necessarily a valid document. Without a reference to either a DTD or a schema, there is no way to verify whether the XML document is valid. Therefore, measures must be taken to ensure that XML documents do, in fact, reference a DTD or schema.
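
The gap between "well-formed" and "valid" can be shown with a short sketch. It assumes Python's standard-library xml.etree.ElementTree parser, which checks syntax only and does not consult any DTD or schema; the function name and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML. No DTD or schema is consulted."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Well-formed, yet it breaks the product-list rules: "id" is missing and
# "isFruit" is not a boolean. A syntax-only check sees neither problem.
well_formed_but_invalid = (
    '<ProductList><Product color="green" file="apple_fruit_green.gif" '
    'isFruit="maybe">Green apples</Product></ProductList>'
)

# Not well-formed: the Product element is never closed.
not_well_formed = '<ProductList><Product color="green">Green apples</ProductList>'

print(is_well_formed(well_formed_but_invalid))  # True
print(is_well_formed(not_well_formed))          # False
```

The first document passes the syntax check even though it would fail validation, which is why a reference to a DTD or schema matters.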


  • 5. Enforce Document Validity

    To guarantee that an XML document references a DTD or schema, development teams can adopt a rule-based system that can detect and prevent errors within the XML code. Developers can create rules that impose constraints on XML documents to verify validity.



    For example, a rule can be created that requires an XML document to contain the sample schema header:








    <ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="ProductList.xsd">


    If the document is missing the specified header, an error is reported, alerting the developer to the violation.
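
A rule of this kind might look like the following sketch. It is not how SOAPtest or any particular tool implements its rules; it assumes Python's standard-library xml.etree.ElementTree, and the function name and the expected file name are illustrative. The rule only confirms that the header is present; it does not validate the document against the schema.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
SCHEMA_ATTR = f"{{{XSI_NS}}}noNamespaceSchemaLocation"


def check_schema_header(xml_text, expected="ProductList.xsd"):
    """Return a list of rule violations; an empty list means the rule passes."""
    root = ET.fromstring(xml_text)
    location = root.get(SCHEMA_ATTR)
    if location is None:
        return [f"root element <{root.tag}> does not reference a schema"]
    if location != expected:
        return [f"schema reference is {location!r}, expected {expected!r}"]
    return []


with_header = (
    '<ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="ProductList.xsd"/>'
)
without_header = "<ProductList/>"

print(check_schema_header(with_header))     # []
print(check_schema_header(without_header))  # one violation reported
```

Returning a list of violations rather than raising on the first one lets a team run several rules over the same document and report everything at once.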


  • 6. Ensure "Human-Readable" Makes Sense

    As seen in the sample XML document, XML is human-readable. In other words, XML is created in plain text and utilizes actual words or phrases that have specific meanings to developers. However, even though XML can be read and written by humans, it does not necessarily mean that humans can understand XML -- developers can still create crappy, unreadable XML code. An element that has a specific meaning to one developer may be of no use, or make no sense, to another developer.



    For instance, developers can create XML that is completely unintelligible to one another -- consider XML tags that are written in Polish or Japanese. Code need not be written in another language to be crappy, either; ambiguous code written in the same tongue and shared between companies can be quite cryptic, as well. For example, the element <trans> can mean anything from transform to transaction to Trans-Am, depending on the developer and the application. To prevent ambiguous XML code, development teams must mutually agree upon a standard XML vocabulary. With a standard language in place, developers within a team will be more apt to understand each other's code.



    Rules can then be established to verify that code follows anything from W3C guidelines for a specific language, to team naming standards, to project-specific design requirements, to the proper usage of custom XML tags.
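
As a rough sketch of a naming-convention rule, the check below flags any element or attribute that is not in an agreed vocabulary. The vocabulary sets are hypothetical (they simply mirror the product-list sample), the function name is illustrative, and the code assumes Python's standard-library xml.etree.ElementTree rather than any specific rule engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical team vocabulary, taken from the product-list sample.
ALLOWED_ELEMENTS = {"ProductList", "Product"}
ALLOWED_ATTRIBUTES = {"color", "file", "id", "isFruit"}


def check_vocabulary(xml_text):
    """Return a violation for every element or attribute outside the vocabulary."""
    violations = []
    # iter() walks the tree in document order, starting with the root.
    for element in ET.fromstring(xml_text).iter():
        if element.tag not in ALLOWED_ELEMENTS:
            violations.append(f"unknown element <{element.tag}>")
        for name in element.attrib:
            if name not in ALLOWED_ATTRIBUTES:
                violations.append(f"unknown attribute '{name}' on <{element.tag}>")
    return violations


sample = (
    '<ProductList><Product color="green" file="a.gif" id="0"/>'
    '<trans clr="x"/></ProductList>'
)
for problem in check_vocabulary(sample):
    print(problem)
```

For the sample above, the check reports the ambiguous trans element and its clr attribute, while the agreed Product markup passes untouched.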


  • 7. Sidestep Standards Chaos

    Although the W3C has made an effort to establish a common language, vocabulary and protocol for XML, these standards are still in development and are constantly changing. Companies that adhere to proposed standards that are not yet fully mature must be prepared to keep up with any future changes to those standards. For example, a standard that is in existence today may not exist six months from now. Without any stability in XML standards, developers are forced to either keep up with the rapid changes or fall behind. In spite of the chaos of standards that may exist for XML development, the recent release of Basic Profile 1.0 provides some guidance to developers seeking a widely used XML standard. The Web Services Interoperability Organization (WS-I) Basic Profile 1.0 consists of specifications that establish a baseline for interoperable web services.



    These specifications include guidelines that cover XML 1.0. So, developers can now depend on Basic Profile 1.0 as a common framework for implementing XML and building integration projects. There are more than 25 WS-I member companies that support Basic Profile 1.0. Therefore, developers can be confident that the XML standards they use will not be subject to constant flux and change.


    SOAPtest Boosts XML Efficiencies, Enforces Rules

    As seen in the points outlined above, the inefficiencies of XML provide ample opportunities for developers to create poorly written XML documents. However, developers can be sure to create valid XML by addressing its inefficiencies and adopting a rule-based system. Products such as Parasoft SOAPtest 2.1 can be utilized to create and enforce such a rule-based system.



    SOAPtest provides a number of tools to developers that can help to prevent crappy XML. The RuleWizard tool includes a standard XML dictionary of the standard elements (attributes, elements, document type, etc.) to construct rules and queries. In addition, developers can create their own dictionaries for any type of XML document and can express patterns in terms of nodes in these dictionaries.



    The CodeWizard tool can be configured to check any number or combination of custom rules and built-in rules. Custom rules can verify application-specific design and content requirements, enforce custom coding standards, enforce naming conventions, identify text (such as an exception message) that signals a problem, query XML data, or even perform custom file transformations.



    Parasoft's SOAPtest 2.1 also conforms to Basic Profile 1.0. Therefore, by establishing rules and validating XML with SOAPtest 2.1, developers can be certain that the XML they create follows the standard set of guidelines put forth by WS-I.



    SOAPtest provides the higher-level checking that goes beyond simply validating the XML and enforces application-specific rules and team-specific coding standards. By automating the verification process at each level, developers can increase their confidence that the Web application's usage of XML is consistently correct and clean -- cutting the crap out of the process.



    Toward Cleaner Code

    Whether you use SOAPtest, another XML tool, or check your own work by hand against your rules, the message here is that as XML becomes more pervasive in your enterprise, you and your dev team will need to assure yourselves it is the cleanest, most efficient and least ambiguous XML you can write. That means that simply using XML extensions with your dev tool or IDE and then deploying that XML directly into your enterprise may soon create hazards in the form of slower processing, hard-to-update code or even error-prone operations. The key to avoiding these troubles in the future is to make sure that the XML you deploy today isn't crappy code that you just slap-dabbed into place. The first step toward sparkling code is simple education, and the awareness to take a second look at your XML practices and see if you or your team may be making any of the above mistakes.




    More XML/XSD Resources:

  • Microsoft's MSDN provides a very thorough Developers' Guide to XSD, setting out steps and sample code for how developers can define the structure and data types for XML documents. This set of XSD resources at MSDN reviews highlights and tips for W3C's XML Schema Part 1: Structures Recommendation for the XML Schema Definition Language; and Part 2: Datatypes Recommendation (for defining data types used in XML schemas).


  • O'Reilly's XML.com presents a very thorough article that shows developers how to use XSD correctly to navigate "controlled vocabularies" -- or schema changes to "enumerated lists of element-attribute values." In his article, "Managing Enumerations in W3C XML Schemas," author Anthony Coates contends that when outside interests control schema definitions -- even well-intentioned standards groups (like ISO, W3C, etc.) -- developers must be on the defense against the need to rewrite, retest and reassign the schemas in their code. Coates provides real examples with real code samples that developers can use.


  • XML-Deviant Kendall Grant Clark says the topics of conversation on XML-DEV revolve around two camps of people: one which thinks aspect N of XML is a wart, the other which thinks N is an elegance. See which is which, and why even in a world of auto-generated XML, it still matters to core XML-based web services development at http://www.xml.com/pub/a/2003/03/19/deviant.html.


  • Developers are shown how to transform XML in steps using a preliminary XSLT to transform XSD into XML that is easier to transform into generated code. In an article from Visual Studio Magazine, Denver-based MVP Kathleen Dollard shows how using XSLT's DataSet and DataTable lets developers "hide some of the ugliness of restructuring data." Dollard also shows how XSD is used to create XML files containing metadata, and then to perform the code-generating transformation.


  • In this Top XML Tutorial by Kurt Cagle, learn how to navigate the main XSD Structures, including Conceptual Framework (to abstract the process of defining schemas); Schema Components (to create formal schemas from a strictly formalistic standpoint); XML Implementation (to implement XML representation of the formal grammar); and Constraints (which deal with ways that a given element or attribute can be constrained to work within a subset of its original domain).


  • XSD Validation using SAX (or the DOM API) is illustrated in this piece by PerfectXML.com's Managing Editor Darshan Singh. This thorough step-by-step approach (with code samples) takes developers through the whole process, from deriving stub implementation classes to declaring the methods used for XSD validation and for counting XML elements.


  • Download the latest upgrade to XMLSPY (v 5) from Altova. XMLSPY is an XDE (XML Development Environment) for Java and .Net developers looking for help in designing, editing and debugging enterprise-class applications involving XML, XML Schema, XSL/XSLT, SOAP, WSDL and Web Service technologies.



