Introduction to XML, Gus Bjorklund

The Extensible Markup Language (XML) is a data format for structured document interchange on the Web. It is hardware architecture neutral, application-independent, flexible, yet simple and powerful.

XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. The World Wide Web Consortium's official recommendation for XML and a variety of related materials can be found at the following URL: http://www.w3.org/XML/.

XML is a subset of another markup language called SGML, which was adopted as an international standard in 1986 [ISO 8879]. SGML is based on a markup language called GML, which was developed by researchers at IBM in 1969. SGML is quite complex and the XML subset was created to eliminate the complexity while keeping the value.

XML describes a class of data objects called "XML documents" and partially describes the behavior of computer programs that process them. XML is an "application profile" or restricted form of SGML. By construction, XML documents are conforming SGML documents.

XML documents are made up of storage units called "entities", which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form "markup". Markup encodes a description of a document's content and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

A software module called an "XML processor" is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of an application. The XML specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

XML Documents
XML documents are made up of two parts: a prologue and a body. The
optional prologue may contain the XML version the document conforms
to, information about the character encoding used to encode the
contents of the document, and a "document type definition" (DTD)
which describes the grammar and vocabulary of the document. The body
may contain elements, entity references, and other markup information.

Elements represent the logical components of documents. They can
contain data or other elements. For example, a customer element can
contain a number of column (field) elements and each column element a
data value. Here is an example of an element:
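A minimal sketch of such an element, assuming the content Mary (borrowed from the attribute example later in this section). Python's standard-library xml.etree.ElementTree stands in for the XML processor here; that choice is an assumption of this sketch, not something the article prescribes.

```python
import xml.etree.ElementTree as ET

# Assumed sample element: the start tag <name>, the content "Mary",
# and the end tag </name>.
xml_text = "<name>Mary</name>"

element = ET.fromstring(xml_text)

element_name = element.tag      # the name given in the start and end tags
element_content = element.text  # the characters between the two tags

print(element_name, element_content)
```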
Note that the element begins with the construct <name> and ends with
</name>. These are delimiters called "tags" that specify the
beginning and end of the element and what its name is. The characters
between the delimiters form the element's contents or data. XML does
not have any predefined tags. We are free to use whatever tags we
wish, as long as the names abide by a few simple restrictions imposed
by the XML recommendation.

Elements can have additional information called "attributes" attached
to them. Attributes are used to describe properties of elements.
Here is an example of an element with an attribute:

<name emp-num="1">Mary</name>

Here is an example of elements that contain other elements nested
within them:
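A sketch of nested elements, following the customer example described earlier: a customer element that contains several column (field) elements, each holding one data value. The element names and values are hypothetical, and Python's standard-library ElementTree is assumed as the XML processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer record: one outer element with three
# column (field) elements nested inside it.
xml_text = """
<customer>
  <name>Mary</name>
  <city>Bedford</city>
  <balance>100</balance>
</customer>
"""

customer = ET.fromstring(xml_text.strip())

# Iterating over an element yields its child elements in document order.
columns = {child.tag: child.text for child in customer}

print(customer.tag, columns)
```

Note that every value comes back as character data; XML itself attaches no data type to the contents of the balance element.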
Document Type Definitions
There is an infinite variety of possible kinds of documents, such as
the repair manual for a vehicle, a dictionary, a telephone directory,
an order for equipment, an invoice, and so forth. Each kind of
document can have unique structure and organization that can be used
over and over.
The descriptions of classes of documents are called "Document Type
Definitions" or DTDs. DTDs are sets of rules that define the
required and optional elements that can be used in a document, and
what the relationships among the various elements are. A DTD can be
included as part of the content of an XML document, or it can be
separate from it and referred to by the document.

Here is an example of a small document that includes a DTD in its prologue.
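A minimal sketch of such a document, with a hypothetical customer vocabulary. The prologue carries the XML version, the character encoding, and an internal DTD; the body is the customer element. Python's standard-library xml.dom.minidom is assumed as the processor. It is non-validating, so it reads the DTD but does not check the body against it.

```python
from xml.dom import minidom

# Prologue: XML declaration plus an internal DTD that declares the
# grammar (customer contains name, then city). Body: the customer element.
# All element names here are illustrative.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customer [
  <!ELEMENT customer (name, city)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
]>
<customer>
  <name>Mary</name>
  <city>Bedford</city>
</customer>
"""

doc = minidom.parseString(xml_text)

doctype_name = doc.doctype.name          # name given in <!DOCTYPE ...>
root_name = doc.documentElement.tagName  # name of the outermost element

print(doctype_name, root_name)
```

The name in the DOCTYPE declaration matches the outermost element of the body, which is what a validating processor would require.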
The DOM
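As a concrete preview of the kind of interface this section describes, here is a minimal sketch using Python's standard-library xml.dom.minidom, which is one DOM implementation among many. The sample document is hypothetical. The sketch navigates a small tree, then modifies, adds, and deletes content.

```python
from xml.dom import minidom

# Parse a small hypothetical document into a tree of node objects.
doc = minidom.parseString("<customer><name>Mary</name></customer>")
root = doc.documentElement

# Navigate: find the first <name> element and read its text node.
name = root.getElementsByTagName("name")[0]
original_text = name.firstChild.data

# Modify: change the character data held by the text node.
name.firstChild.data = "Maria"
modified_xml = name.toxml()

# Add: create a new element with text content and attach it to the root.
city = doc.createElement("city")
city.appendChild(doc.createTextNode("Bedford"))
root.appendChild(city)

# Delete: remove the <name> element from the tree.
root.removeChild(name)

final_xml = root.toxml()
print(final_xml)
```

Every step operates on the in-memory tree; nothing is written back to the original text unless the application serializes the tree again, as toxml() does here.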
The Document Object Model (DOM) is an application programming
interface (API) for HTML and XML documents. It defines the logical
structure of documents and the way a document is accessed and
manipulated. The DOM is an object model that represents XML
documents in a platform-neutral and application-independent form as
a "tree" of objects of various types.

The W3C's official recommendation for the DOM and a variety of related
materials may be found at the following URL:
http://www.w3.org/DOM/.

In the DOM specification, the term "document" is used in the broad
sense - increasingly, XML is being used as a way of representing many
different kinds of information that may be stored in diverse
systems. Much of this would traditionally be seen as data rather
than as documents. Nevertheless, XML presents this data as
documents, and the DOM may be used to manipulate this data.

With the Document Object Model, programmers can build documents,
navigate their structure, and add, modify, or delete elements and
content. Anything found in an HTML or XML document can be accessed,
changed, deleted, or added using the Document Object Model, with a
few exceptions - in particular, the DOM interfaces for the XML
internal and external subsets have not yet been specified.

The SAX API
The DOM API allows manipulation of an XML document after it has been
completely parsed. Once the entire document has been converted into
its "DOM tree" representation, an application can use it. There is
another API called the "Simple API for XML" (SAX) under development.

With the SAX API, the XML parser generates events via a callback
mechanism. As the document is being parsed, events are generated for
the start and end of the document and for the start and end of the
elements contained within it. SAX provides an alternative programming
model for working with XML documents. It is simpler than the DOM
API, but does not include all of the functionality the DOM provides.

HTML and XML
HTML is useful for describing the visualization of text documents and
related images on the World Wide Web. It has a number of deficiencies:
XML addresses these deficiencies, and others. Since XML is
extensible, one of the natural extensions is to describe HTML in
terms of XML. This is relatively straightforward because both are
derived from SGML. It is expected that the HTML 4.0 specification will
be superseded by XHTML 1.0 in the near future. Once this occurs,
there will be gradual conversion of many HTML documents to XHTML.
However, HTML in its current form will probably be supported for many
years to come.

Summary
XML is:
XML is useful for many things but it is not the solution to every
problem. It will make solving certain problems easier than it might
be without it. Data interchange among applications is an area where
XML can be tremendously useful. This is so because when XML is used to
encode messages, exchanging data becomes simpler than when messages
are encoded in some binary form.

The next several years should be interesting times.

regards,