XML

  Definition:

  • XML stands for Extensible Markup Language. It is a self-describing language in which both the data and the structure of the data are placed within an XML document.
  • An XML document is a Unicode text file that contains data together with markup that defines the structure of that data.
  • XML is a metalanguage in which we write data and data about data.
  • An XML document can be interpreted easily because it carries the information on how to interpret it.
  XML Document
An XML document normally consists of a prolog and a document body.
  1. Prolog: provides the information needed to interpret the contents of the document body. It has two components, which must appear in this order:
    1. XML declaration (optional in XML 1.0 but recommended; required in XML 1.1): it consists of
         i.   version (required): the XML version that applies to the document
         ii.  encoding (optional): specifies the particular Unicode character encoding used in the document
         iii. standalone (optional): yes or no; states whether the document depends on external markup declarations
    2. Document type declaration (optional): specifies the markup declarations for the elements used in the document body.
  2. Document body: contains the data. It comprises one or more elements, where each element is defined by a start tag and an end tag. There is always a single root element that contains all the other elements.
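The parts of the XML declaration can be inspected with a parser. The following is a minimal sketch in Python, assuming the standard library expat binding; the function name read_xml_declaration is only illustrative and not part of any library.

```python
from xml.parsers import expat


def read_xml_declaration(data: bytes) -> dict:
    """Return the version, encoding and standalone values of the XML declaration."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no) or -1 (omitted)
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)
    return found


declaration = read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><catalog/>'
)
print(declaration)
```

If the document has no XML declaration, the handler is never called and the function returns an empty dictionary.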

   Parts of XML

    Elements and Attribute

Now see the following example.

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
     <book>
          <title>XML </title>
          <author>M. K. Johan</author>
     </book>
</catalog>

Here catalog is the root element. Every XML document must have exactly one root element; the other elements here are book, title and author.
Attributes: Attributes associate name-value pairs with elements and are declared on start tags.


<book quantity="2">
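To see how a program reads elements and attributes, here is a small sketch in Python using the standard library ElementTree module. It assumes the catalog example with the quantity attribute added to the book element.

```python
import xml.etree.ElementTree as ET

CATALOG = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
     <book quantity="2">
          <title>XML </title>
          <author>M. K. Johan</author>
     </book>
</catalog>"""

root = ET.fromstring(CATALOG)          # root element of the document
book = root.find("book")               # first child element named book
child_names = [child.tag for child in book]

print(root.tag)                        # name of the root element
print(child_names)                     # names of the elements nested in book
print(book.get("quantity"))            # attribute values are always strings
```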
 


      Rules for declaring element name

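Names can contain letters, digits, and other characters such as hyphens, underscores and periods
Names must start with a letter or an underscore; they must not start with a digit, a hyphen or a period
Names should not start with the letters xml (in any combination of case, e.g. xml, XML or Xml), because such names are reserved
Names must not contain spaces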
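The naming rules can be approximated with a regular expression. This is only a sketch in Python: it assumes ASCII names, while the XML specification also allows many non-ASCII letters, and the function name check_element_name is hypothetical.

```python
import re

# Simplified, ASCII-only version of the XML name rule:
# first character is a letter, underscore or colon, the rest may also
# include digits, periods and hyphens.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def check_element_name(name: str) -> str:
    """Classify a candidate element name as 'ok', 'reserved' or 'invalid'."""
    if not _NAME.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"   # names beginning with xml are reserved
    return "ok"


for candidate in ["book", "_book", "2book", "my book", "XmlData"]:
    print(candidate, "->", check_element_name(candidate))
```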

Empty tags:

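An empty element does not contain data. It can be written as a start tag immediately followed by an end tag, or as a single self-closing tag.

<book></book>
<book/>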
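A parser treats both ways of writing an empty element the same. A short Python sketch with the standard library ElementTree module illustrates this:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<book></book>")
short_form = ET.fromstring("<book/>")

# Neither element has text or child elements.
print(long_form.text, len(long_form))
print(short_form.text, len(short_form))

# Serializing them again gives identical output.
same_output = ET.tostring(long_form) == ET.tostring(short_form)
print(same_output)
```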

Processing instruction (PI)

Processing instructions carry information for the application that processes the document; the XML processor passes them through unchanged. A processing instruction may appear in the prolog, inside the document body, or after the root element. A processing instruction has the following syntax.
<?target instruction?>
for example,
<?xml-stylesheet href="mystyle.css" type="text/css"?>
An XML document usually starts with an XML declaration, which looks like a processing instruction but is formally a separate construct:
<?xml version="1.0"?>
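The following Python sketch, assuming the standard library minidom module, lists the processing instructions found at the top level of a document. Note that the XML declaration does not show up as a processing instruction node.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="mystyle.css" type="text/css"?>'
    "<catalog/>"
)

# Collect (target, instruction) pairs for every processing instruction
# that is a direct child of the document.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```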

    Comments

Comments are delimited by <!-- and -->, for example,

<!-- This is a comment. -->

Comments must not contain the string --, because a document that does so is not well-formed. Comments should only be information for human readers. They should never contain hidden commands. Use processing instructions for commands.
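A small Python sketch, assuming the standard library minidom module, shows how a program can read comments and what happens when a comment contains the string --. The helper name parses is only illustrative.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError

doc = minidom.parseString("<catalog><!-- This is a comment. --><book/></catalog>")

# minidom keeps comments as nodes; their text is in the data attribute.
comments = [
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.COMMENT_NODE
]
print(comments)


def parses(text: str) -> bool:
    """Return True if the parser accepts the text, False otherwise."""
    try:
        minidom.parseString(text).unlink()
        return True
    except ExpatError:
        return False


print(parses("<a><!-- fine --></a>"))
print(parses("<a><!-- not -- allowed --></a>"))
```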

    Well formed document

When an XML document is said to be well-formed, it just means that it conforms to the rules for writing XML as defined by the XML specification. Essentially, an XML document is well-formed if its prolog and body are consistent with the rules for creating these. In a well-formed document there must be only one root element, and all elements must be properly nested. I will summarize more specifically what is required to make a document well-formed a little later in this chapter, after you have looked into the rules for writing XML.
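A simple way to test whether a document is well-formed is to hand it to a parser and see whether it reports an error. The sketch below uses Python with the standard library ElementTree module; the function name is_well_formed is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<catalog><book><title>XML</title></book></catalog>"))
print(is_well_formed("<catalog><book></catalog></book>"))   # badly nested
print(is_well_formed("<catalog/><catalog/>"))               # two root elements
```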

  XML Processor (Parser)

An XML processor is a software module that is used by an application to read an XML document and gain access to the data and its structure. An XML processor also determines whether an XML document is well-formed or not. Processing instructions are passed through to an application without any checking or analysis by the XML processor. The XML specification describes how an XML processor should behave when reading XML documents, including what information should be made available to an application for various types of document content.
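To illustrate how a processor hands information to an application, here is a sketch in Python using the standard library expat binding. The application registers handler functions and the parser calls them as it reads the document. The print-hint processing instruction is a made-up example.

```python
from xml.parsers import expat

events = []

parser = expat.ParserCreate()
# The application tells the processor which callbacks to use.
parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
parser.EndElementHandler = lambda name: events.append(("end", name))
parser.ProcessingInstructionHandler = (
    lambda target, data: events.append(("pi", target, data))
)

parser.Parse('<catalog><?print-hint landscape?><book quantity="2"/></catalog>', True)

for event in events:
    print(event)
```

The processing instruction is delivered to the application with its target and data, without the parser trying to interpret it.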

     Entity

An entity is a data object that represents some content in an XML document. Entities act as named storage units for content, and they are the essential building blocks of the physical structure of XML. The data referred to by an entity is physically located somewhere, such as in the document itself, in a file on a disk drive or in a field of a database. Each entity consists of a name and a value. The name works as an identifier, and the value, sometimes called the content of the entity, is the data for that identifier. The value is either the data of the entity itself or a pointer to the data. Each entity's name is mapped to its corresponding value or content.

Entities may be classified based on how their data (contents) is processed:
§         Parsed entities: the content is text that the XML processor parses as part of the document; it replaces each reference to the entity.
§         Unparsed entities: the content is not parsed by the XML processor. It is often binary data such as an image, and it is passed on to the application as it is.

From the reference point of view, entities are further classified into two categories.
§         Internal entity: the content is given in the entity declaration itself, within the same document.
§         External entity: the content is stored outside the document, for example in a separate file, and the declaration points to it.
Parsed entities may be either internal or external; unparsed entities are always external.

Entity declaration:
An entity is declared with the ENTITY keyword as follows

<!ENTITY course "Master of Computer Application">

Parsed entities used within the document content are called general entities. You reference general entities by the name of the entity beginning with an ampersand (&) and ending with a semicolon (;).

e.g.
 &course;
This is called an entity reference. When you use an entity reference in a document, the reference is replaced by the entity value (content or data). The value of an entity may itself contain an entity reference.
e.g.

<!ENTITY qualification "sixth semester, &course;">
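The replacement of entity references can be observed with a parser. The sketch below is in Python with the standard library ElementTree module. It declares both entities in an internal DTD subset; the student root element is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE student [
  <!ENTITY course "Master of Computer Application">
  <!ENTITY qualification "sixth semester, &course;">
]>
<student>&qualification;</student>"""

student = ET.fromstring(DOC)

# The parser has already replaced &qualification; and the nested &course;
print(student.text)
```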



     Document Type Definition (DTD)

A Document Type Definition (DTD) defines how valid elements are constructed for a particular type of document. A DTD defines the structure of the content of a document: the elements, attributes, entities, and notations that can be used in it. A DTD is used to validate an XML document. It consists of a set of rules against which the document is checked. Validating parsers use the DTD to ensure that an XML document is valid, that is, well-formed and also consistent with the rules of the DTD.

      Declaring DTD

You use a document type declaration (a DOCTYPE declaration) in the prolog of an XML document to specify the DTD for the document. If we want to attach a DTD to the catalog XML document, we use the following syntax,

<!DOCTYPE catalog SYSTEM "http://docs/dtds/CatalogDoc.dtd">

Here:
§         DOCTYPE is the keyword that introduces the document type declaration for this document.
§         catalog is the name of the root element of the XML document to which this DTD applies. The name following the DOCTYPE keyword must always match the root element name in the document.
§         SYSTEM is a keyword indicating that the DTD is external and is located by the system identifier (a URI) that follows. The PUBLIC keyword can be used instead to give a public identifier, which is then followed by a system identifier. An internal DTD is written between [ and ] inside the DOCTYPE declaration.
§         Between the quotation marks we specify the location (URL) of the DTD file.
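A parser exposes the pieces of the DOCTYPE declaration to the application. The following Python sketch assumes the standard library minidom module, which records the declaration but does not download or apply the external DTD.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<!DOCTYPE catalog SYSTEM "http://docs/dtds/CatalogDoc.dtd">'
    "<catalog/>"
)
doctype = doc.doctype

print(doctype.name)       # name given after the DOCTYPE keyword
print(doctype.systemId)   # system identifier (location of the DTD)

# The DOCTYPE name is expected to match the root element name.
names_match = doctype.name == doc.documentElement.tagName
print(names_match)
```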

Defining DTD

Suppose we want to define DTD for the following XML file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact SYSTEM "contact.dtd">
<contact>
     <name>Abc</name>
     <email>Abc@yahoo.com</email>
     <phone>47834358</phone>
     <mobile>434534345</mobile>
</contact>
First we declare the root element, which is contact.
            <!ELEMENT contact (name,email,phone+,mobile)>
Here we are defining the contact element. The contact element has nested elements (i.e. name, email, phone and mobile), which must appear in this order.
Next we define the name element like
            <!ELEMENT name (#PCDATA)>
Here #PCDATA (parsed character data) indicates that the name element contains text, which the parser reads as character data.
Similarly we define all the elements, and the resulting DTD looks like
           
<!ELEMENT contact (name,email,phone+,mobile)>
     <!ELEMENT name (#PCDATA)>
     <!ELEMENT email (#PCDATA)>
     <!ELEMENT phone (#PCDATA)>
     <!ELEMENT mobile (#PCDATA)>

We can place the above DTD inside the XML file, where it is called an internal DTD, or in a separate file, where it is called an external DTD.
Cardinality
The cardinality of an element specifies how many times it may occur in the XML document. For example, we have used the + symbol with the phone element. An element with no symbol must appear exactly once. The XML specification defines the following occurrence indicators, together with the | operator for choice:

     |    choice: exactly one from a set of elements, e.g. (a|b) means a or b
     +    1 or more
     *    0 or more
     ?    0 or 1 (may be omitted or appear only once)
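The occurrence indicators behave much like the same symbols in regular expressions. The sketch below is in Python and checks the children of a contact element against the content model (name,email,phone+,mobile) by hand. It is not a validating parser; the Python standard library parsers used here do not validate against a DTD, and the function name matches_contact_model is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (name,email,phone+,mobile) rewritten as a regular expression over a
# comma-terminated list of child element names.
CONTACT_MODEL = re.compile(r"name,email,(phone,)+mobile,")


def matches_contact_model(xml_text: str) -> bool:
    """Check the order and count of the children of a contact element."""
    contact = ET.fromstring(xml_text)
    if contact.tag != "contact":
        return False
    child_tags = "".join(child.tag + "," for child in contact)
    return CONTACT_MODEL.fullmatch(child_tags) is not None


one_phone = "<contact><name/><email/><phone/><mobile/></contact>"
two_phones = "<contact><name/><email/><phone/><phone/><mobile/></contact>"
no_phone = "<contact><name/><email/><mobile/></contact>"

print(matches_contact_model(one_phone))
print(matches_contact_model(two_phones))
print(matches_contact_model(no_phone))
```

All three documents are well-formed, but the last one does not satisfy the content model because phone+ requires at least one phone element.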

    XML advantages over HTML

Don't make the mistake of thinking that XML is merely "HTML on steroids." Although we approach it from HTML to make it easy to explain, XML does much more than HTML does. XML offers the following advantages:
·         It is an archival representation of data. Because the format is plain text and the markup is carried around with the data, the description of the data is not easily lost. That contrasts with binary representations of a file, which all too easily become outdated. If this was all it did, it would be enough to justify its existence.
·         It provides a way to web-publish files that can be directly processed by computer, rather than merely human-readable text and pictures.
·         It is plain text, so it can be read by people without special tools.
·         It can easily be transformed into HTML, or PDF, or data structures internal to a program, or any other format yet to be dreamed up, so it is "future-proof."
·         It's portable, open, and a standard, which makes it a great fit with Java.

      Differences between SGML and XML

SGML                                                              | XML
It is designed to structure the contents of electronic documents. | It is designed to represent data only.
DTD is mandatory.                                                 | DTD is optional.
It needs modification to support internationalization.            | It supports internationalization by default.
It is not supported by all browsers.                              | It is supported by almost all popular browsers.

      Differences between HTML and XML

HTML                             | XML
Not case sensitive               | Case sensitive
No need to balance each tag      | Each tag must be balanced
It is used for data presentation | It is used to store data
It has a limited set of tags     | It has unlimited tags. The programmer creates their own tags.
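The case sensitivity and tag balancing rules of XML can be checked with a parser. This is a small Python sketch using the standard library ElementTree module; the helper name parses is only illustrative.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<book><Title>XML</Title></book>")

# Title and title are two different element names in XML.
lower_case_match = book.find("title")
exact_match = book.find("Title")
print(lower_case_match, exact_match.text)


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<book></Book>"))        # end tag differs in case
print(parses("<book><title></book>")) # title is never closed
```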




Example : Simple XML file
contact.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact SYSTEM "contact.dtd">
<contact>
            <name>Abc</name>
            <email>Abc@yahoo.com</email>
            <phone>47834358</phone>
            <mobile>434534345</mobile>
</contact>


Example : xml file with internal DTD

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact
[
            <!ELEMENT contact (name,email,phone+,mobile)>
            <!ELEMENT name (#PCDATA)>
            <!ELEMENT email (#PCDATA)>
            <!ELEMENT phone (#PCDATA)>
            <!ELEMENT mobile (#PCDATA)>
]>
<contact>
            <name>Abc</name>
            <email>Abc@yahoo.com</email>
            <phone>7847834358</phone>
            <mobile>434534345</mobile>
</contact>


Example: xml file with external DTD
contact.dtd

<!ELEMENT contact (name,email,phone+,mobile)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT mobile (#PCDATA)>

contact.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact SYSTEM "contact.dtd">
<contact>
            <name>Abc</name>
            <email>Abc@yahoo.com</email>
            <phone>47834358</phone>
            <mobile>434534345</mobile>
</contact>
