09-02-2013, 04:34 PM
XML (Extensible Markup Language)
XML Definition
XML stands for Extensible Markup Language. XML was developed around 1996 and is a subset of SGML, so its documents conform to SGML. XML was made less complicated than SGML to enable its use on the web. XML uses the ISO 10646 (Unicode) standard for encoding characters. It is a meta-markup language: it defines no fixed set of tags itself but lets authors define their own.
HTML
• The most popular markup language
• Defines a set of tags
• Designed for presentation of data
• HTML documents are processed by an HTML processing application (browser)
Strengths of HTML:
• Easy to implement and author
• Small number of tags
• Simple relationship between tags
• Syntax-checking is very forgiving
• Limited number of formats possible
• Viewers can be small and simple
Relationship between HTML and XML
XML is not as new an invention as it may appear. Before HTML there already existed a language called SGML, which was mostly used by the publishing industry to mark up documents in its production process. More by accident than by plan, SGML served as the language for defining the HTML standard (think of SGML as a data model and HTML as a schema). Only after HTML had become so successful did people remember its roots in SGML, when they recognized the need for a data model that could express application-specific markup and schemas. Since SGML was a somewhat complex language, it was simplified in many respects, and the result was XML.
Character set:
XML documents may contain the following characters: carriage returns, line feeds and Unicode characters. Unicode is a standard of the Unicode Consortium. Its goal is to enable computers to process the characters of most of the world’s major languages (www.uni-code.org).
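The character check described above can be sketched in Python. The ranges below follow the Char production of the XML 1.0 specification (tab, line feed, carriage return, plus most of Unicode); the function names is_legal_xml_char and first_illegal_char are made up for this illustration and are not part of any library.

```python
def is_legal_xml_char(ch: str) -> bool:
    """Return True if ch falls in the XML 1.0 Char ranges."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD      # skips the surrogate block
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def first_illegal_char(text: str) -> int:
    """Return the index of the first illegal character, or -1 if none."""
    for index, ch in enumerate(text):
        if not is_legal_xml_char(ch):
            return index
    return -1
```

For example, first_illegal_char("ab\x01c") reports index 2, because the control character U+0001 is not allowed in an XML 1.0 document.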
Characters vs. markup:
Once a parser determines that all characters in a document are legal, it must differentiate between markup text and character data. Markup text is enclosed in angle brackets (< and >). Character data is the text between a start tag and an end tag.
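To see this distinction in practice, here is a small sketch using the expat parser from Python's standard library. The helper name classify and the sample document are assumptions chosen for illustration only.

```python
import xml.parsers.expat


def classify(xml_text):
    """Return a list of (kind, value) pairs in document order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one piece
    parser.StartElementHandler = lambda name, attrs: events.append(("start tag", name))
    parser.EndElementHandler = lambda name: events.append(("end tag", name))
    parser.CharacterDataHandler = lambda data: events.append(("character data", data))
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


for kind, value in classify("<note><to>World</to></note>"):
    print(kind, repr(value))
```

The parser reports the tags note and to as markup and the string World as character data, without the application having to look at angle brackets itself.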
White space, entity references & built-in entities:
Spaces, tabs, line feeds & carriage returns are characters commonly called whitespace characters. An XML parser is required to pass all characters in a document, including white space characters, to the application using the XML document.
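A quick way to check this behaviour is to parse a small document and look at what the application receives. The sketch below uses Python's xml.etree.ElementTree; the helper name element_text and the sample element msg are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_text(xml_text):
    """Parse xml_text and return the root element's text exactly as delivered."""
    return ET.fromstring(xml_text).text


raw = element_text("<msg>  Hello \n\t World  </msg>")
print(repr(raw))  # spaces, line feed and tab are all still present
```

The leading, trailing and inner whitespace all reach the application unchanged; deciding what to do with them is left to the application.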
An application may consider whitespace characters either significant or insignificant. Insignificant whitespace characters may be collapsed into a single whitespace character or even removed entirely. This process is called normalization.
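As a rough sketch of one possible normalization policy, the function below collapses every run of the four XML whitespace characters into a single space and removes leading and trailing whitespace. The name normalize_whitespace is made up for this example, and a real application may choose a different policy.

```python
import re

# The four XML whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs to one space and trim both ends."""
    return XML_WHITESPACE_RUN.sub(" ", text).strip(" ")


print(repr(normalize_whitespace("  Hello \n\t World  ")))
```

An application that treats whitespace as significant, for example one processing source code listings or poetry, would simply skip this step.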