What is XML?
- A W3C approved global standard for the Web: Extensible Markup Language.
- A meta language for describing data, regardless of what the data is.
- The evolutionary refinement of the ideas first fully standardized in SGML.
- A simple, open, fully extensible standard for data.
- XML is a markup language used to impose identification and structure on information.
- XML is extensible, which means it is not really a markup language at all!
- Instead, XML is a set of rules you can use to create your own markup language.
- Basing markup languages on a common set of syntax rules allows:
- use of generic processing tools,
- evolving markup vocabularies, as needs change, and
- "multi-lingual" systems, which use multiple vocabularies.
- A way to separate data from its display properties.
- A better, faster, cheaper and much more flexible way of dealing with data.
- A way to free applications from data storage architecture characteristics.
- A way to leverage industry-standard data definitions and naming conventions.
- A standard way to describe ALL data, serialized and optimized for use over the Web.
- A facility for universal data exchange using a vendor-independent data format understood by browsers.
- The "global currency" for the Information Economy.
XML Main Points:
- Character-based data format.
- Serializes information into hierarchy.
- Extensible: unbounded descriptive power.
- Separates content from processing—one document, many outputs.
- W3C standard, derived from the ISO standard SGML (ISO 8879).
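The "one document, many outputs" point can be illustrated with a small sketch. It assumes a hypothetical inventory vocabulary (the element names, sample data, and function names are illustrative, not part of any standard) and uses only Python's standard library. The content is read once and then rendered two ways.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data; the element names are illustrative only.
CATALOG_XML = """<?xml version="1.0"?>
<Inventory>
  <Item><name>Ink Printer</name><price>100.00</price></Item>
  <Item><name>Laser Printer</name><price>250.00</price></Item>
</Inventory>"""


def read_items(xml_text):
    """Return (name, price) pairs from the content, ignoring presentation."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("name"), item.findtext("price"))
            for item in root.findall("Item")]


def to_html(items):
    """Render the items as an HTML list for display."""
    rows = "".join(f"<li>{escape(n)}: {escape(p)}</li>" for n, p in items)
    return f"<ul>{rows}</ul>"


def to_csv(items):
    """Render the same items as CSV for a spreadsheet or batch job."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(items)
    return buffer.getvalue()


items = read_items(CATALOG_XML)
print(to_html(items))
print(to_csv(items))
```

Neither output function knows anything about the XML; each works only from the extracted content, which is what keeps the content separate from its processing.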
Basic XML Philosophy:
- Information is primary, not programs.
- Applications exist to serve our information, not vice-versa.
- Therefore business rules should be coupled to information instead of to applications.
Independence of information from applications.
- Changing the binding between applications and information should not be excessively difficult or prohibitively expensive.
Long-term view of data.
- XML information is accessible in the long term as well as the short term.
- Facilitates authorized "publishing" directly to the Web with faster turnaround and lower IT costs.
- Reduces application development cycle time by removing data file definition and access coding.
- Increases "reusability" of code by modularizing commonly used functions.
- Establishes installation nomenclature standards.
- Leverages industry standardized naming conventions.
- Enables application-to-application interoperability between databases, systems and enterprises.
- Reduces change time by removing data definition characteristics/management.
- Minimizes applications affected by data format changes.
Storage/Retrieval of Data
- Enables a single view of all data regardless of database organization.
- Facilitates Web-access of legacy data systems.
- Supports rich, hierarchical, structured and semi-structured forms of data.
- Avoids the inefficiencies and costliness of EDI and VAN.
- Reduces search time substantially.
- Results in more accurate searches.
- Enables more effective search.
- Reduces redundant data flow.
- Eliminates need to retrieve entire records.
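As a rough illustration of retrieving one field rather than a whole record, the sketch below uses the limited path syntax supported by Python's standard xml.etree.ElementTree module. The two-item inventory document and its element names are hypothetical. This version still parses the whole document into memory, so it shows only the selection step, not a storage-level optimization.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are illustrative only.
INVENTORY_XML = """<Inventory>
  <Item><name>Ink Printer</name><price>100.00</price></Item>
  <Item><name>Laser Printer</name><price>250.00</price></Item>
</Inventory>"""

root = ET.fromstring(INVENTORY_XML)

# Select only the price of the item whose <name> child has this exact text.
prices = [p.text for p in root.findall("./Item[name='Laser Printer']/price")]
print(prices)
```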
How XML Works:
XML is a way to use character strings to serialize structured data in a platform- and implementation-independent way. Given a definition of the data content you want to capture, you define a set of structures, called elements, that contain the data and label it with descriptive “tags”. These tags label the meaning or purpose of the information. For example, to represent the information in a product inventory, you might define the element types “Inventory”, “Item”, “name”, “price”, “description”, and so on. These elements reflect the semantic structures in the data, that is, what the data means as opposed to how it looks on a screen or page.
Elements can contain other elements, creating hierarchical structures. The hierarchical structures of XML were originally designed to support the structuring of traditional documents: sections, subsections, paragraphs, and so on. But the mechanism is general enough that it can be applied to data of any type.
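A minimal sketch of such nesting, using Python's standard xml.etree.ElementTree module. The vocabulary here ("document", "section", "subsection", and so on) is hypothetical and chosen only to mirror the traditional document structure described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for a traditional document.
doc = ET.Element("document")
section = ET.SubElement(doc, "section")
ET.SubElement(section, "title").text = "Overview"
subsection = ET.SubElement(section, "subsection")
ET.SubElement(subsection, "paragraph").text = "Elements nest inside elements."


def depth(element):
    """Return the number of levels in the tree rooted at element."""
    return 1 + max((depth(child) for child in element), default=0)


# Serialize the tree back to a character string, then report its depth.
print(ET.tostring(doc, encoding="unicode"))
print(depth(doc))
```

The same two operations, adding a child and walking the children, would work unchanged for an inventory or any other kind of data, which is the sense in which the mechanism is general.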
Elements are grouped into documents, each document being a separate file or storage object that contains a single tree of elements. For example, an inventory document using the element types mentioned above might look like this:
<?xml version="1.0"?>
<Inventory>
  <Item>
    <name>Ink Printer</name>
    <price>100.00</price>
    <description>Multi-color Ink Jet Printer!</description>
  </Item>
</Inventory>
This is a single XML document (indicated as such by the XML declaration '<?xml version="1.0"?>' at its start). It is stored as a string of characters.
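To make the "string of characters" point concrete, the following sketch parses a document along the lines of the inventory example with Python's standard xml.etree.ElementTree module and prints the element tree it contains. The walk function is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# The document is nothing more than a character string until it is parsed.
INVENTORY_XML = """<?xml version="1.0"?>
<Inventory>
  <Item>
    <name>Ink Printer</name>
    <price>100.00</price>
    <description>Multi-color Ink Jet Printer!</description>
  </Item>
</Inventory>"""


def walk(element, level=0):
    """Yield (level, tag, text) for each element in document order."""
    text = (element.text or "").strip()
    yield level, element.tag, text
    for child in element:
        yield from walk(child, level + 1)


root = ET.fromstring(INVENTORY_XML)
for level, tag, text in walk(root):
    print("  " * level + tag + (f": {text}" if text else ""))
```

Element names in XML are case-sensitive, so a parser treats "description" and "Description" as two different element types.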
In this example, there is no explicit binding between the document instance shown and the formal definitions of the element types it uses (the document's vocabulary or document type). However, most XML documents are governed by a formal definition of their document type, which may be expressed as a machine-readable document type definition (DTD), an XML Schema, or another formal syntax, as well as by human-readable documentation, such as a data dictionary or document type reference manual. Given machine-readable document type rules, XML documents can be validated for conformance to these rules. This allows processors to ensure that the documents they are processing meet at least the syntactic requirements of the document type (there may be other requirements that cannot be specified in a DTD or schema).
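The idea of validation can be sketched as follows. Python's standard library does not include a validating XML parser, so this example is not DTD or XML Schema validation; it hand-codes, as a stand-in, the kind of structural rules a DTD would state declaratively. The CONTENT_MODEL table and the validate function are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each element type maps to the exact sequence of
# child element types it must contain (an empty list means no children).
CONTENT_MODEL = {
    "Item": ["name", "price", "description"],
    "name": [],
    "price": [],
    "description": [],
}


def validate(xml_text):
    """Return a list of problems; an empty list means the document conforms."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A document that is not well-formed cannot be checked any further.
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "Inventory":
        problems.append(f"root must be Inventory, found {root.tag}")
    if any(child.tag != "Item" for child in root):
        problems.append("Inventory may contain only Item elements")
    for element in root.iter():
        if element is root:
            continue
        expected = CONTENT_MODEL.get(element.tag)
        if expected is None:
            problems.append(f"undeclared element type: {element.tag}")
        elif [child.tag for child in element] != expected:
            problems.append(f"{element.tag} must contain {expected}")
    return problems


good = ("<Inventory><Item><name>Ink Printer</name><price>100.00</price>"
        "<description>Ink jet</description></Item></Inventory>")
bad = "<Inventory><Item><name>Ink Printer</name></Item></Inventory>"
print(validate(good))
print(validate(bad))
```

Note that these checks are purely structural. A rule such as "price must be a positive number" is an example of a requirement that a DTD cannot express and that would need a schema language or application code.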
You can think of a document type as a way to map business objects to the serialization of those business objects as XML. The document type definition should also map the business rules governing the business objects to the XML-based processing of those business objects. Examples of typical XML-based processing include generating HTML for Web display, loading XML data into a non-XML database or application, and transforming XML in one document type to XML in another document type for further interchange or processing.
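One of the processing examples mentioned above, loading XML data into a non-XML database, might be sketched like this. The table layout, the load_inventory function, and the sample data are all assumptions made for illustration; the sketch uses Python's standard sqlite3 and xml.etree.ElementTree modules with an in-memory database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are illustrative only.
INVENTORY_XML = """<Inventory>
  <Item><name>Ink Printer</name><price>100.00</price></Item>
  <Item><name>Laser Printer</name><price>250.00</price></Item>
</Inventory>"""


def load_inventory(xml_text, connection):
    """Insert one row per Item element and return the number of rows added."""
    root = ET.fromstring(xml_text)
    rows = [(item.findtext("name"), float(item.findtext("price")))
            for item in root.findall("Item")]
    connection.executemany("INSERT INTO item (name, price) VALUES (?, ?)", rows)
    connection.commit()
    return len(rows)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE item (name TEXT NOT NULL, price REAL NOT NULL)")
print(load_inventory(INVENTORY_XML, connection))
print(connection.execute("SELECT name, price FROM item ORDER BY price").fetchall())
```

The mapping from element types to table columns is the part a document type definition and its documentation would be expected to pin down; here it is simply hard-coded.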
As a way to capture structured data as simple strings with rigorously defined parsing and processing rules, XML is immediately useful even in the context of a single application or task. However, the real power of XML derives from the ability of people and enterprises to exchange documents with some hope of being able to process them reliably and accurately. Communities of interest can define shared document types for the information they want to interchange. The textbook example of this is the use of XML for business-to-business communication. A number of organizations, including the United Nations, have defined industry-standard document types for business documents of all types (purchase orders, fulfillment requests, contracts, and so on) and are using them to enable widespread interchange of business data.
The act of defining a shared document type helps to produce clear consensus on what the information content and rules are. The use of formal, machine-based rules for document types enables validation of documents and guides the implementation of supporting processors. Because XML is completely platform- and implementation-independent, members of a community of interest can use a variety of platforms, tools, and techniques to process the XML documents the community interchanges.
One side effect of XML's popularity as a data representation and interchange standard is that a very wide and deep set of supporting tools has been developed and deployed, both commercial and open-source. There are robust XML support packages for every major programming language and operating platform. This means that even when XML may not, in the abstract, be the optimal technology for a particular task, it is often more cost-effective to use an XML solution simply because the supporting infrastructure is so widely available. It also means that there is a large and ever-growing pool of skilled XML practitioners available to develop and maintain XML-based solutions.
However, this does not mean that XML is always the best solution to a particular business problem. There is a lot of hype around XML and it is important to weigh the use of XML as carefully as you would weigh the use of any other technology. It is also the case that many of the standards and practices evolving from XML are still quite new—not all of them are necessarily ideas that will stand the test of time.