The eXtensible Markup Language (XML) is the core technology in the current generation of service-oriented architecture mechanisms for building distributed applications, and provides the basis for effectively managing information in large enterprise-wide environments. XML enables electronic data to be tagged and processed in a standard way, and omits the more complex and less-used syntax of its parent language, the Standard Generalized Markup Language (SGML).
Information passed between programs and computer systems is becoming richer and more complex as new web- and network-based applications are being developed. Such information must be self-describing, so that client software can successfully interpret and utilize external data. XML is an ideal data format for storing structured and semi-structured content.
An XML document contains special markup, called tags, which enclose identifiable parts of the document and can also carry specific characteristics (attributes) of the data. These tags describe the nature of an object, as opposed to how it should be displayed or printed. Because the information is self-describing, it can be easily located, extracted, manipulated, and exchanged as desired. In addition, since XML is a subset of SGML, it also supports the parent-child relationships used in SGML (i.e., a “child” is a sub-element of a “parent” element).
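The ideas of tags, attributes, and parent-child nesting can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The document below, including the order, customer, and item names, is hypothetical and invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all element and attribute names are invented.
doc = """
<order id="1001">
  <customer>Ada Lovelace</customer>
  <item sku="A-17" quantity="2">Widget</item>
</order>
""".strip()

root = ET.fromstring(doc)


def describe(element):
    """Return (tag, attributes, text) for each direct child of an element."""
    return [(c.tag, dict(c.attrib), (c.text or "").strip()) for c in element]


# The root element is the parent; its tag and attributes describe the data.
print(root.tag, root.attrib)

# Each nested element is a child of the element that encloses it.
children = describe(root)
for tag, attributes, text in children:
    print(tag, attributes, text)
```

Because the tags name what each value is, the program can pick out the customer or the item quantity without knowing anything about how the document is laid out on the page.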
XML can be used to describe both the structure of a document and the specific data contained within it. However, XML is not restricted to documents; it can also describe the structure and content of objects found in databases. This idea, coupled with XML support in browsers, parsers, connectivity layers, application interfaces, databases, programming languages, and IDEs, has created a solid basis for practical strategies that use XML to support both small and large information frameworks.
XML is considered a data exchange language. However, it goes far beyond the bounds of the data exchange formats used in the past (e.g., comma-separated values (CSV) and fixed-width formats). Combined with a schema, XML can enforce standard mappings of data, so that one business can use the data obtained from another business. This type of data exchange can be supported between businesses in the same industry or in unrelated industries, as long as an agreed-upon standard (for example, an XML Schema) has been set up to support the exchange. This concept of exchanging business data with other businesses is sometimes referred to as B2B (business-to-business).
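To make the contrast with CSV concrete, the following is a minimal sketch (not a production converter) that turns CSV rows into XML using Python's standard csv and xml.etree.ElementTree modules. The product data, the csv_to_xml helper, and the products/product tag names are all hypothetical, and the sketch assumes every CSV column name is already a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export; column names and values are invented.
CSV_TEXT = "sku,name,price\nA-17,Widget,9.50\nB-02,Gadget,24.00\n"


def csv_to_xml(csv_text, root_tag="products", row_tag="product"):
    """Build an XML tree in which each CSV column becomes a named child element.

    Assumes each column name is a valid XML element name.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return root


root = csv_to_xml(CSV_TEXT)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In the CSV form, the meaning of each value depends on its position and on a header line that may or may not be present; in the XML form, every value carries its own name, which is what makes the data self-describing.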
It is often thought that XML is only for the Internet, a so-called extension of HTML. Holding fast to this notion can blindside the IT division(s) within an organization. IT departments are commonly involved in exchanging data from one information system to another, often mapping data from spreadsheets or flat files to databases, or the converse. This type of data exchange is usually custom-mapped for the specific application at hand and is generally not reusable. However, establishing standards within an organization for data exchange and management can greatly reduce data inconsistencies and the IT effort associated with this type of activity. In addition, data exchange standards provide a means to exchange and share selected information with other businesses or industries, and to gain greater visibility alongside your market competitors.
In this information age, the ability to exchange, analyze, and utilize data from a large variety of different organizations can greatly increase our understanding of the industries that propel us through the 21st century and beyond.
XML is one of a class of markup languages: notations for describing some form of content. The most widely used (and best known) markup language is HTML, which is used to represent most web content. HTML is quite a restricted language in practice, and it describes content in terms of appearance. For example, most of the tags in HTML describe (or at least strongly hint at) what a particular piece of content should look like when rendered on a web page. It should be said that many of the most display-related tags are now deprecated in favor of cascading style sheets. However, HTML is still essentially about appearance. XML, on the other hand, uses tags (in fundamentally the same way as HTML) that do not describe appearance: they simply describe the content (and structure) of documents.
Unlike HTML, XML has no pre-defined set of tags. That is, you can 'make up your own'. For example, here is a fragment of XML representing Computer Science modules:
<modules>
<level num='1'>
<module name='Principles and Practice of Programming' code='CS_141'/>
<module name='Languages to Hardware' code='CS_113'/>
</level>
<level num='2'>
<module name='System Specification' code='CS_213'/>
</level>
<level num='3'>
<module name='Internet Computing' code='CS_338'/>
<module name='High Performance Microprocessors' code='CS_313'/>
</level>
<level num='M'>
<module name='Writing Web and Web Service Applications' code='CS_M68'/>
</level>
</modules>
In the example above, we have introduced three tags: modules, level, and module. Two of the tags have attributes: level has one (num) and module has two (name and code). In this example, we have used attributes to encode some information; however, we could have just used more tags. For example, we could rewrite one of the modules like this:
<module>
<name>Principles and Practice of Programming</name>
<code>CS_141</code>...
</module>
(The choice between using attributes or nested tags to represent content and structure is often fairly arbitrary. However, many people would argue that attributes should only be used to represent meta-data: that is, data about data. These people would argue against the first version above, because clearly 'name' and 'code' are not meta-data.)
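The practical effect of choosing attributes or nested tags shows up in how a program reads the data. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the read_module helper is hypothetical, and the module name and code are taken from the fragment above.

```python
import xml.etree.ElementTree as ET

# The same module written both ways.
attribute_style = "<module name='Internet Computing' code='CS_338'/>"
element_style = (
    "<module><name>Internet Computing</name><code>CS_338</code></module>"
)


def read_module(xml_text):
    """Return (name, code), whether stored as attributes or as child elements."""
    module = ET.fromstring(xml_text)
    # get() reads an attribute; findtext() reads the text of a child element.
    name = module.get("name") or module.findtext("name")
    code = module.get("code") or module.findtext("code")
    return name, code


print(read_module(attribute_style))
print(read_module(element_style))
```

Both calls yield the same pair of values, which is why the choice is often a matter of convention. One practical difference is that nested elements can themselves contain further structure, whereas an attribute value is always a flat string.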
Some basic rules regarding legal XML documents:
Some of the advantages of using the XML format include:
XML is a simple, very flexible text format that has been designed to meet the challenges of large-scale electronic publishing, and today is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.