


  • XML stands for Extensible Markup Language. It is a self-describing language in which both the data and the structure of the data are placed within an XML document.
  • An XML document is a Unicode text file that contains data together with markup that defines the structure of that data.
  • XML is a metalanguage: in it we write data and data about data.
  • An XML document can be interpreted easily because it carries the information needed to interpret it.
  XML Document
An XML document normally consists of a prolog and a document body.
  1. Prolog: provides the information needed to interpret the contents of the document body. It has two components, which must appear in this order:
    1. XML declaration (optional in XML 1.0 but recommended; required in XML 1.1). It consists of:
      i. version (required): the XML version that applies to the document
      ii. encoding (optional): the character encoding used in the document, for example UTF-8
      iii. standalone (optional): yes or no
    2. Document type declaration (optional): specifies the markup declarations for the elements used in the document body.
  2. Document body: contains the data. It comprises one or more elements, where each element is delimited by a start tag and an end tag. There is always a single root element that contains all the other elements.
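
The three parts of the XML declaration can be read back programmatically. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module; the helper name read_xml_declaration and the one-element catalog document are made up for illustration.

```python
import xml.parsers.expat


def read_xml_declaration(data: bytes) -> dict:
    """Return the version, encoding and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no) or -1 (omitted).
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return found


# Hypothetical document: a full prolog followed by an empty root element.
declaration = read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><catalog/>'
)
```

Here declaration holds one entry per part of the declaration, and a part that is omitted from the document comes back as None for standalone.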

   Parts of XML

    Elements and Attribute

Now see the following example.

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
     <book>
          <title>XML </title>
          <author>M. K. Johan</author>
     </book>
</catalog>

Here catalog is the root element. Every XML document must have exactly one root element; the other elements here are book, title, and author.
Attributes: Attributes associate name-value pairs with elements and are declared on start tags.

<book quantity="2">
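
To see how a program reaches elements and attributes, here is a small sketch that assumes Python's standard xml.etree.ElementTree module. The sample string is modelled on the catalog described above, with the quantity attribute placed on book; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the catalog example; quantity="2" is the
# attribute shown in the text.
CATALOG = """<catalog>
  <book quantity="2">
    <title>XML</title>
    <author>M. K. Johan</author>
  </book>
</catalog>"""

root = ET.fromstring(CATALOG)
book = root.find("book")

root_name = root.tag                          # name of the root element
child_names = [child.tag for child in book]   # element names nested in book
quantity = book.get("quantity")               # attribute values are always strings
```

Note that the attribute value comes back as the string "2"; converting it to a number is left to the application.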

      Rules for declaring element name

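Names can contain letters, digits, and other characters such as hyphens, underscores, and periods
Names must not start with a digit, a hyphen, or a period (a leading letter or underscore is allowed)
Names should not start with the letters xml (in any case, such as XML or Xml), because those names are reserved
Names must not contain spaces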
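
The naming rules can be approximated with a regular expression. This is a simplified sketch in Python, not the full grammar from the XML specification: it assumes ASCII names only and leaves out the colon, which is used for namespaces. The function name is_valid_element_name is made up for this example.

```python
import re

# Simplified to ASCII: the first character is a letter or underscore, the
# rest may also be digits, hyphens or periods.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name: str) -> bool:
    """Check a candidate element name against the simplified rules above."""
    if _NAME.fullmatch(name) is None:
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")
```

For instance, is_valid_element_name("first-name") is accepted, while "2book", "book title", and "XmlData" are rejected.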

Empty tags:

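An empty element contains no data. It can be written as a start tag immediately followed by an end tag, or as a single self-closing tag that ends with />.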
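
The two ways of writing an empty element are equivalent to a parser. A minimal sketch, assuming Python's xml.etree.ElementTree; the element name note and the helper is_empty are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_empty(element: ET.Element) -> bool:
    """True when the element has no text and no child elements."""
    return len(element) == 0 and not (element.text or "").strip()


# "note" is a made-up element name used only for illustration.
short_form = ET.fromstring("<note/>")
long_form = ET.fromstring("<note></note>")
with_data = ET.fromstring("<note>call back</note>")
```

Both short_form and long_form count as empty here, while with_data does not.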


Processing instruction (PI)

Processing instructions carry information for the application that uses the XML processor. An XML document may contain processing instructions in the prolog, inside the document body, and after the root element. A processing instruction has the following syntax.
<?target instruction?>
For example,
<?xml-stylesheet href="mystyle.css" type="text/css"?>
An XML document usually starts with the XML declaration, which has the same syntax but is technically not a processing instruction:
<?xml version="1.0"?>
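
As a rough illustration of how an application receives processing instructions, the sketch below uses Python's standard xml.dom.minidom module. The stylesheet instruction is the one from the text; the empty catalog root and the helper name processing_instructions are assumptions added so that the example is complete.

```python
from xml.dom import minidom, Node

# The stylesheet instruction comes from the text; <catalog/> is a placeholder
# root element so that the document is complete.
DOCUMENT = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="mystyle.css" type="text/css"?>'
    '<catalog/>'
)


def processing_instructions(text):
    """Return (target, data) pairs for instructions outside the root element."""
    doc = minidom.parseString(text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]
```

The parser splits each instruction into a target and a data string and hands the data over without interpreting it. The XML declaration does not appear in the result.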


Comments are delimited by <!-- and -->, for example,

<!-- This is a comment. -->

Comments must not contain the string --. Comments should only carry information for human readers and should never contain hidden commands; use processing instructions for commands.

    Well formed document

When an XML document is said to be well-formed, it means that it conforms to the rules for writing XML as defined by the XML specification. Essentially, an XML document is well-formed if its prolog and body are consistent with those rules. In particular, a well-formed document has exactly one root element, every start tag has a matching end tag (or is written as an empty-element tag), and all elements are properly nested.
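
A quick way to test well-formedness is to hand the text to a parser and see whether it raises an error. The following sketch assumes Python's xml.etree.ElementTree; the function name is_well_formed is made up for this example, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


properly_nested = "<catalog><book><title>XML</title></book></catalog>"
badly_nested = "<catalog><book><title>XML</book></title></catalog>"
two_roots = "<catalog/><catalog/>"
```

Only properly_nested passes; badly_nested closes book before title, and two_roots has more than one root element.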

  XML Processor (Parser)

An XML processor is a software module that is used by an application to read an XML document and gain access to the data and its structure. An XML processor also determines whether an XML document is well-formed or not. Processing instructions are passed through to an application without any checking or analysis by the XML processor. The XML specification describes how an XML processor should behave when reading XML documents, including what information should be made available to an application for various types of document content.


An entity is a data object that represents some content in an XML document. Entities act as named storage units and are the essential building blocks of the physical structure of XML. The data referred to by an entity is physically located somewhere, such as in the document itself, in a file on a disk drive, or in a field of a database. Each entity consists of a name and a value. The name works as an identifier, and the value, sometimes called the content of the entity, is the data for that identifier. The value is either the data of the entity itself or a pointer to the data. Each entity's name is mapped to its corresponding value.

Entities may be classified based on how their content is processed:
  • Parsed entities: the content is text that the XML processor parses as part of the document wherever the entity is referenced.
  • Unparsed entities: the content is not parsed by the XML processor. It may be text or non-text data (for example, an image in a binary format) and is passed on to the application as it is.

From the point of view of where the content is stored, entities are further classified into two categories.
  • Internal entity: the content is given directly in the entity declaration, within the same document.
  • External entity: the content is located outside the declaration, for example in an external file.
Internal entities are always parsed entities. External entities may be either parsed or unparsed, and unparsed entities are always external.

Entity declaration:
An entity is declared with the ENTITY keyword as follows:

<!ENTITY course "Master of Computer Application">

Parsed entities used within the document content are called general entities. You reference a general entity by writing its name between an ampersand (&) and a semicolon (;), for example &course;.

This is called an entity reference. When you use an entity reference in the document, it is replaced by the entity value (content or data). An entity value may itself contain entity references.

<!ENTITY qualification "sixth semester, &course;">
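
To watch the replacement happen, the two declarations can be placed in an internal DTD and the document parsed. This is a sketch assuming Python's xml.etree.ElementTree, whose underlying parser expands internal general entities; the root element name student is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two entity declarations come from the text; the <student> root element
# is invented so that there is somewhere to use the reference.
DOCUMENT = """<!DOCTYPE student [
  <!ENTITY course "Master of Computer Application">
  <!ENTITY qualification "sixth semester, &course;">
]>
<student>&qualification;</student>"""

# The parser replaces &qualification; and the nested &course; reference.
expanded = ET.fromstring(DOCUMENT).text
```

After parsing, expanded contains the fully substituted text, so the application never sees the entity references themselves.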

     Document Type Definition (DTD)

A Document Type Definition (DTD) defines how valid elements are constructed for a particular type of document. It defines the structure of the document content: the elements, attributes, entities, and notations that can be used in the document. A DTD is used to validate an XML document: it consists of a set of rules against which the document is checked. Validating parsers use the DTD to ensure that the contents of the XML document are valid, which goes beyond being well-formed.

      Declaring DTD

You use a document type declaration (a DOCTYPE declaration) in the prolog of an XML document to specify the DTD for the document. To add a DTD to the catalog XML document, use the following syntax:

<!DOCTYPE catalog SYSTEM "http://docs/dtds/CatalogDoc.dtd">

  • DOCTYPE is the keyword that introduces the declaration of the DTD used for this document.
  • catalog is the name of the root element of the XML document to which this DTD applies. The name following the DOCTYPE keyword must always match the root element name in the document.
  • SYSTEM is a keyword indicating that the DTD is an external file identified by a system identifier (a URI). Alternatively, the PUBLIC keyword identifies an external DTD by a public identifier, followed by a system identifier.
  • Between the quotation marks we specify the location (URL) of the DTD file.
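
The parts of the DOCTYPE declaration described above can be inspected after parsing. A minimal sketch, assuming Python's xml.dom.minidom, which records the declaration without downloading the DTD; the empty catalog root element is a placeholder added for the example.

```python
from xml.dom import minidom

# The DOCTYPE line is the one from the text; <catalog/> is a placeholder root.
DOCUMENT = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE catalog SYSTEM "http://docs/dtds/CatalogDoc.dtd">'
    '<catalog/>'
)

doc = minidom.parseString(DOCUMENT)

declared_root = doc.doctype.name            # name after the DOCTYPE keyword
system_id = doc.doctype.systemId            # location given after SYSTEM
actual_root = doc.documentElement.tagName   # root element actually present
```

Comparing declared_root with actual_root is a simple way to check the rule that the name after DOCTYPE matches the root element.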

Defining DTD

Suppose we want to define a DTD for the following XML file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact SYSTEM "contact.dtd">
First we declare the root element, which is contact.
            <!ELEMENT contact (name,email,phone+,mobile)>
Here we are defining the contact element. It has nested elements (name, email, phone, and mobile).
Next we define the name element:
            <!ELEMENT name (#PCDATA)>
Here #PCDATA (parsed character data) indicates that the name element contains text that the parser will parse.
Similarly, we define all the elements, and the resulting DTD looks like this:
<!ELEMENT contact (name,email,phone+,mobile)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT mobile (#PCDATA)>

We can place the above DTD definitions within the XML file itself, which is called an internal DTD, or in a separate file, which is called an external DTD.
The cardinality of an element specifies how many times the element may occur in an XML document. For example, we have used the + symbol with the phone element. The XML specification defines the following symbols:
     |    choice: exactly one from a set of elements, e.g. (a|b) means a or b
     +    1 or more
     *    0 or more
     ?    0 or 1 (the element may be omitted or appear only once)
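
The cardinality symbols behave much like the quantifiers of a regular expression. The sketch below translates the contact content model into a Python regular expression by hand and checks the order of child elements against it. It is an illustration only, not a DTD validator; the function name children_match_model and the sample values are invented.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from <!ELEMENT contact (name,email,phone+,mobile)>.
# Each child name is followed by a space so that names cannot run together.
CONTACT_MODEL = re.compile(r"name email (?:phone )+mobile ")


def children_match_model(xml_text: str) -> bool:
    """Check that the children of contact appear in the declared order and number."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return root.tag == "contact" and CONTACT_MODEL.fullmatch(sequence) is not None


# Hypothetical sample data; the values are invented for illustration.
TWO_PHONES = ("<contact><name>A</name><email>a@example.com</email>"
              "<phone>1</phone><phone>2</phone><mobile>3</mobile></contact>")
NO_PHONE = ("<contact><name>A</name><email>a@example.com</email>"
            "<mobile>3</mobile></contact>")
```

TWO_PHONES satisfies phone+ because phone occurs at least once, whereas NO_PHONE fails because phone is missing.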

    XML advantages over HTML

Don't make the mistake of thinking that XML is merely "HTML on steroids." Although we approach it from HTML to make it easy to explain, XML does much more than HTML does. XML offers the following advantages:
  • It is an archival representation of data. Because the format is plain text and is carried around with the data, the format cannot be lost. That contrasts with binary file formats, which all too easily become outdated. If this were all it did, it would be enough to justify its existence.
  • It provides a way to web-publish files that can be directly processed by a computer, rather than merely human-readable text and pictures.
  • It is plain text, so it can be read by people without special tools.
  • It can easily be transformed into HTML, or PDF, or data structures internal to a program, or any other format yet to be dreamed up, so it is "future-proof."
  • It's portable, open, and a standard, which makes it a great fit with Java.
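
As one example of transforming XML into data structures internal to a program, the following sketch turns each book in a catalog into a Python dictionary. It assumes Python's xml.etree.ElementTree and a sample document modelled on the earlier catalog example; the function name books_as_dicts is made up.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the catalog example.
CATALOG = """<catalog>
  <book>
    <title>XML</title>
    <author>M. K. Johan</author>
  </book>
</catalog>"""


def books_as_dicts(xml_text: str) -> list:
    """Turn each book element into a dict mapping child name to text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in book}
        for book in root.findall("book")
    ]


books = books_as_dicts(CATALOG)
```

The same tree could just as well be walked to emit HTML or any other format, which is the point of the "future-proof" claim.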

      Differences between SGML and XML

  • SGML is designed to structure the contents of electronic documents; XML is designed to represent data only.
  • In SGML a DTD is mandatory; in XML a DTD is optional.
  • SGML needs to be modified to support internationalization; XML supports internationalization by default.
  • SGML is not supported by all browsers; XML is supported by almost all popular browsers.

      Differences between HTML and XML

  • HTML is not case sensitive; XML is case sensitive.
  • HTML does not require every tag to be balanced; in XML each tag must be balanced.
  • HTML is used for data presentation; XML is used to store data.
  • HTML has a limited set of tags; XML has unlimited tags, because the programmer creates their own.
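
The first two differences can be observed by feeding the same markup to an HTML parser and to an XML parser. This is a sketch assuming Python's standard html.parser and xml.etree.ElementTree modules; the class TagCollector and the helper names are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Collect start tag names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_accepts(markup: str) -> bool:
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Mixed-case tags and an unbalanced <BR>: tolerated as HTML, rejected as XML.
MIXED_CASE = "<P>Hello<BR>world</p>"
BALANCED = "<p>Hello<br/>world</p>"
```

The HTML parser reports the tags of MIXED_CASE in lower case and does not complain about the unclosed BR, while the XML parser rejects the same text and accepts only BALANCED.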

Example : Simple XML file

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact SYSTEM "contact.dtd">

Example : XML file with internal DTD

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact [
            <!ELEMENT contact (name,email,phone+,mobile)>
            <!ELEMENT name (#PCDATA)>
            <!ELEMENT email (#PCDATA)>
            <!ELEMENT phone (#PCDATA)>
            <!ELEMENT mobile (#PCDATA)>
]>

Example: XML file with external DTD

<!ELEMENT contact (name,email,phone+,mobile)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT mobile (#PCDATA)>


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contact SYSTEM "contact.dtd">

