07-02-2013, 11:09 AM
Extensible Markup Language Parsing Techniques
Abstract
XML is widely used in the development of web applications. It is a set of rules for designing structured data in a text format, as opposed to a binary format, which makes it readable by both humans and machines. A parser performs lexical and syntactic analysis. An XML parser extracts information from an XML document, a step needed in virtually all web applications. The Simple Object Access Protocol (SOAP) lets a program send XML over HTTP to invoke methods on remote objects. An XML parser can serve as an engine for implementing this or a comparable protocol, and can also be used when exchanging data messages formatted as XML over HTTP. By adding XML and HTTP capabilities to an application, software developers can begin to offer alternatives to traditional browsers that have significant value to their customers. This paper presents an XML parser that implements a subset of the XML specification. It is useful to developers and users for checking the well-formedness and validity of XML documents.
Introduction
The World Wide Web Consortium (W3C) created an SGML working group to build a set of specifications that are easy and straightforward to use. The resulting subset, called XML, retains the advantages of SGML, namely extensibility, structure, and validation, in a language that is much easier to learn, use, and implement than full SGML. XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encodings. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format.
Most applications need to save some configuration data, and often need to transmit data to or receive data from other applications. This is especially true for software that interacts with the Internet. If you need a format for interchanging such data, one solution is to design your own binary format. A binary format can store complex structures such as lists and arrays, but it has drawbacks: it is not easy for a person to read, and modifying it tends to cause compatibility problems. As an alternative, a simple text-based format is easy to use but not very powerful. XML provides a more general solution. It is a text-based, hierarchical format that combines the advantages of the binary and text-based worlds: it is easy to use but also powerful. Although it was primarily designed for the Web, it can be used by any application that needs to store data or communicate with other applications.
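The configuration use case above can be illustrated with a minimal sketch in Python, using the standard library module xml.etree.ElementTree. The element names config and setting, and the helper names save_config and load_config, are assumptions chosen for illustration; they do not come from the paper.

```python
import xml.etree.ElementTree as ET


def save_config(settings: dict) -> str:
    """Serialize a flat dict of settings into an XML string."""
    root = ET.Element("config")  # hypothetical root element name
    for name, value in settings.items():
        item = ET.SubElement(root, "setting", name=name)
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


def load_config(xml_text: str) -> dict:
    """Parse the XML produced by save_config back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {item.get("name"): item.text for item in root.findall("setting")}


if __name__ == "__main__":
    text = save_config({"host": "example.org", "port": 8080})
    print(text)
    print(load_config(text))
```

Note that every value comes back as a string (the port is read as "8080"), since plain XML text carries no type information; any conversion is left to the application.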
Extensible Markup Language
The eXtensible Markup Language came out of the world of the Standard Generalized Markup Language (SGML). Initially XML was developed to overcome the shortcomings of HTML, a markup language containing stylistic information. The aim of XML's developers was to create a language that was easy to use over the Internet, supported by a wide variety of applications, compatible with SGML, and legible to humans. Like its ancestor SGML, XML separates content from style.
A typical XML document is hierarchical. It is made up of elements defined by tags. A document type definition (DTD), or an XML schema, is used to define the structure of a document. An XML document is said to be well formed if it conforms to the syntax rules of the XML standard, and valid if it also complies with a DTD or schema. At the core of an XML application is an XML parser. All XML parsers check that the documents they receive are well formed, and validating parsers also check whether these documents are valid.
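A well-formedness check of the kind described above can be sketched in Python with the standard library parser, which is non-validating: it reports syntax errors but does not check a document against a DTD. The function name is_well_formed is an assumption for illustration and is not the parser presented in the paper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, multiple root elements, and similar errors.
        return False


if __name__ == "__main__":
    print(is_well_formed("<note><to>Reader</to></note>"))  # properly nested
    print(is_well_formed("<note><to>Reader</note>"))       # <to> is never closed
    print(is_well_formed("<a/><b/>"))                      # two root elements
```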
Valid
A valid XML document satisfies the following criteria:
(i) It meets the validity constraints
(ii) Validity constraints are abbreviated as VC (plural VCs)
(iii) The parser checks and determines whether the validity constraints are fulfilled
(iv) The parser reports no errors
(v) The document has to contain a DTD or reference one
(vi) XML documents without a DTD must still be well formed
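The relationship between well-formedness and validity can be sketched as follows. Python's standard library does not validate against a DTD, so this example stands in for one with a hypothetical dictionary, CONTENT_MODEL, that maps each declared element to the child elements it may contain. This is a deliberate simplification for illustration: a real DTD also constrains the order and number of children, attributes, and entities, none of which are checked here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD element declarations:
# element name -> set of child element names it may contain.
CONTENT_MODEL = {
    "note": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": set(),
}


def validity_errors(xml_text: str, model: dict, root_name: str) -> list:
    """Return a list of violations; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A document that is not well formed cannot be valid.
        return [f"not well formed: {exc}"]
    errors = []
    if root.tag != root_name:
        errors.append(f"root element is <{root.tag}>, expected <{root_name}>")
    for parent in root.iter():
        if parent.tag not in model:
            errors.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in model[parent.tag]:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


if __name__ == "__main__":
    good = "<note><to>A</to><from>B</from><body>Hi</body></note>"
    bad = "<note><cc>C</cc></note>"
    print(validity_errors(good, CONTENT_MODEL, "note"))
    print(validity_errors(bad, CONTENT_MODEL, "note"))
```

The second document is well formed but fails the check, because the element cc is neither declared nor permitted inside note.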
Conclusion
With the development of Web technology, markup languages are also being developed at a fast pace to meet the specific requirements of individual products. This makes it difficult to maintain a common standard among the markup languages being developed. Fortunately, the Extensible Markup Language addresses this problem by providing the facilities to develop individual markup languages while keeping to the same standard. This paper has presented techniques for developing an XML parser, which is needed for checking the well-formedness and validity of an XML document. These techniques are useful for anyone who wants to develop their own Web documents for Internet applications.