Extensible Markup Language
XML
XML and NoSQL Databases
Types of Databases
• Data is facts and figures
• A database is a related set of data
Kinds of databases
• Unstructured
– Meaning of data interpreted by user
• Semi-Structured
– Structure of data wrapped around data
• Structured
– Fixed structure of data
– Data added to the fixed structure
XML
• XML is a text-based markup language that is fast becoming a
standard for data interchange
– An open standard from W3C
– A direct descendant of SGML
Example: Product Inventory Data
<Product>
<Name>Refrigerator</Name>
<ModelNumber>R3456d2h</ModelNumber>
<Manufacturer>General Electric</Manufacturer>
<Price>1290.00</Price>
<Quantity>1200</Quantity>
</Product>
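A minimal sketch of how an application might read this record, using Python's standard xml.etree.ElementTree module. The function name inventory_value is an assumption made for illustration; the element name is written as ModelNumber because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Product record from the slide; element names contain no whitespace.
PRODUCT_XML = """
<Product>
  <Name>Refrigerator</Name>
  <ModelNumber>R3456d2h</ModelNumber>
  <Manufacturer>General Electric</Manufacturer>
  <Price>1290.00</Price>
  <Quantity>1200</Quantity>
</Product>
"""

def inventory_value(xml_text):
    """Return price * quantity for a single <Product> record."""
    product = ET.fromstring(xml_text)
    # Element content is always text, so numbers must be converted explicitly.
    price = float(product.findtext("Price"))
    quantity = int(product.findtext("Quantity"))
    return price * quantity

print(inventory_value(PRODUCT_XML))
```

The tags carry the meaning of each value, so the reader needs no knowledge of column positions or of the sender's storage format.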
Data Interchange
• XML's key role is data interchange
• Two business partners want to exchange customer
data
– Agree on a set of tags
– Exchange data without having to change internal
databases
• Other business partners can join in the exchange by
using the tagset
– New tags can be added to extend the functionality
Key to successful data interchange is building
consensus and standardizing tag sets
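The exchange described above can be sketched as follows. This is an illustration only: the tag names (Customer, CustomerId, Name) and both partners' internal field names are hypothetical, chosen to show that each side keeps its own schema and maps it to the agreed tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal field names for each partner, mapped to the agreed tags.
PARTNER_A_FIELDS = {"cust_no": "CustomerId", "full_name": "Name"}
PARTNER_B_FIELDS = {"CustomerId": "id", "Name": "display_name"}

def export_customer(row):
    """Partner A: serialize an internal row using the agreed tag set."""
    root = ET.Element("Customer")
    for field, tag in PARTNER_A_FIELDS.items():
        ET.SubElement(root, tag).text = str(row[field])
    return ET.tostring(root, encoding="unicode")

def import_customer(xml_text):
    """Partner B: read the agreed tags into its own field names."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(tag) for tag, field in PARTNER_B_FIELDS.items()}

message = export_customer({"cust_no": 17, "full_name": "Sanjay Goel"})
print(message)
print(import_customer(message))
```

Because import_customer looks up only the tags it knows, a message that carries an additional tag is still readable, which is how the tag set can be extended without breaking existing partners.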
XML = Universal Data
• TCP/IP Universal Networking
• HTML Universal Rendering
• Java Universal Code
• XML Universal Data
• Numerous standards bodies have been set up to
standardize tags in different domains
– ebXML
– XBRL
– MML
– CML
HTML vs. XML
• Both are markup languages
– HTML has a fixed set of tags
– XML allows users to define tags based on their requirements
• Usage
– HTML tags specify how to display data
– XML tags specify semantics of the data
• Tag Interpretation
– HTML specifies what each tag and attribute means
– XML tags delimit data & leave interpretation to the parsing application
• Well-formedness
– HTML is very tolerant of rule violations (nesting, matching tags)
– XML strictly enforces the rules of well-formedness
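The difference in tolerance can be observed with Python's standard parsers. This is a sketch, not a statement about browsers: html.parser.HTMLParser simply reports tags as it meets them, while xml.etree.ElementTree refuses input that is not well formed. The class name TagLogger is an assumption.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

BROKEN = "<b><i>text</b></i>"  # interleaved (improperly nested) tags

class TagLogger(HTMLParser):
    """Record start and end tags in the order the HTML parser reports them."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("start " + tag)

    def handle_endtag(self, tag):
        self.events.append("end " + tag)

# The HTML parser accepts the broken markup without complaint.
html_parser = TagLogger()
html_parser.feed(BROKEN)
html_parser.close()
print(html_parser.events)

# The XML parser raises ParseError on the same input.
try:
    ET.fromstring(BROKEN)
    xml_ok = True
except ET.ParseError as err:
    xml_ok = False
    print("XML parser rejected it:", err)
```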
Structure of XML
• Prolog
– Instructs the parser as to what it is parsing
– Contains processing instructions for the processor
• Body
– Tags - Entities
– Attributes - Properties of Entities
– Comments - Statements for clarification in the document
Example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> <!-- Prolog -->
<contact>
<name>
<first_name>Sanjay</first_name>
<last_name>Goel</last_name>
</name>
<address> <!-- Body -->
<street>56 Della Street</street>
<city>Phoenix</city>
<state>AZ</state>
<zip>15784</zip>
</address>
</contact>
Prolog
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
• Contains the declaration that identifies a document as XML
• Version
– Version of XML markup language used in the data
– Not optional
• Encoding
– Identifies the character set used to encode the data
– Default: UTF-8, a compact encoding of Unicode
• Standalone
– Tells whether or not this document references external
entities
• May contain entity definitions and tag specifications
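A small sketch of why the encoding declaration matters, using Python's xml.etree.ElementTree. The city name is hypothetical sample data chosen because it contains a non-ASCII character; it is not taken from the contact example.

```python
import xml.etree.ElementTree as ET

# Same text, two encodings; each prolog names the encoding actually used.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><city>Montr\xe9al</city>'
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><city>Montréal</city>'.encode("utf-8")

latin1_city = ET.fromstring(latin1_doc).text
utf8_city = ET.fromstring(utf8_doc).text
print(latin1_city == utf8_city)

# Without a declaration the parser assumes UTF-8, so Latin-1 bytes are rejected.
try:
    ET.fromstring(b"<city>Montr\xe9al</city>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

The parser decodes both documents to the same string because each prolog states how its bytes are to be read.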
XML Syntax: Elements & Attributes
• Uses less-than and greater-than characters (<…>) as
delimiters
• Every opening tag must have an accompanying closing tag
– <FirstName>Sanjay</FirstName>
– Empty tags do not require an accompanying closing tag.
– Empty tags have a forward slash before the greater-than sign, e.g.
<Name/>
• Tags can have attributes, whose values must be enclosed in quotes
(single or double)
– <name first="Sanjay" last="Goel"/>
• Elements must be properly nested
– The nesting cannot be interleaved
– Each document must have one single root element
• Element and attribute names are case sensitive
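The syntax rules above can be checked against a real parser. The sketch below feeds a few one-line documents to Python's xml.etree.ElementTree; the helper name parses and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "empty tag": "<Name/>",
    "single-quoted attribute": "<name first='Sanjay' last='Goel'/>",
    "unquoted attribute": "<name first=Sanjay/>",
    "mismatched case": "<Name>Sanjay</name>",
    "two root elements": "<a/><b/>",
}

results = {label: parses(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```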
Tree Structure - Elements
• XML documents have a tree structure containing multiple levels of
nested tags.
– Root element is a single XML element which encloses all of the other XML
elements and data in the document
– All other elements are descendants of the root element
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact> <!-- Root Element -->
<name>
<first_name>Sanjay</first_name>
<last_name>Goel</last_name>
</name>
<address>
<street>56 Della Street</street> <!-- Child Elements -->
<city>Phoenix</city>
<state>AZ</state>
<zip>15784</zip>
</address>
</contact>
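A short sketch of walking this tree with Python's xml.etree.ElementTree. The function name outline is an assumption, and the element names first_name and last_name are written with underscores because element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

CONTACT_XML = """
<contact>
  <name>
    <first_name>Sanjay</first_name>
    <last_name>Goel</last_name>
  </name>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
"""

def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(CONTACT_XML)
for depth, tag in outline(root):
    print("  " * depth + tag)
```

The root element sits at depth 0, its children (name, address) at depth 1, and their children at depth 2.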
Attributes
• Attributes are properties associated with an element
• Each attribute is a name-value pair
– No element may contain two attributes with same name
– Name and value are strings
Example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact>
<name first="Sanjay" last="Goel"></name> <!-- Attributes -->
<address>
<street>56 Della Street</street> <!-- Nested Elements -->
<city>Phoenix</city>
<state>AZ</state>
<zip>15784</zip>
</address>
</contact>
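A minimal sketch of reading attributes with Python's xml.etree.ElementTree, which exposes them as a dictionary of strings. It also shows the parser rejecting a repeated attribute name; the second sample string is made up for that purpose.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring('<name first="Sanjay" last="Goel"></name>')
print(name.attrib)        # all name-value pairs of the element
print(name.get("first"))  # a single attribute value

# Attribute values are always strings, even when they look numeric.
zip_element = ET.fromstring('<zip code="15784"/>')
code_is_text = isinstance(zip_element.get("code"), str)

# An element may not carry the same attribute name twice.
try:
    ET.fromstring('<name first="Sanjay" first="S."/>')
    duplicate_allowed = True
except ET.ParseError:
    duplicate_allowed = False
print(duplicate_allowed)
```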
Elements vs. Attributes
• Data should be stored in Elements
• Information about data (meta-data) should be stored
in attributes
When in doubt, use elements
• Rules of thumb
– Elements should hold information that someone may
want to read.
– Attributes are appropriate for information about the document
that has nothing to do with the content of the document,
e.g. URLs, units, references, ids belong in attributes
– What is your meta-data may be someone else's data
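One way to apply these rules of thumb, sketched with Python's xml.etree.ElementTree. The id value and the currency attribute are hypothetical; they stand in for the "ids" and "units" mentioned above.

```python
import xml.etree.ElementTree as ET

# id is information about the record, so it goes in an attribute.
product = ET.Element("Product", id="p-001")

# The currency is a unit (metadata); the amount is the data a person reads.
price = ET.SubElement(product, "Price", currency="USD")
price.text = "1290.00"

xml_text = ET.tostring(product, encoding="unicode")
print(xml_text)
```

If a consumer later needs to treat the currency as data in its own right, it can be promoted to a child element, which is why elements are the safer default.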
Comments
• XML comments begin with “<!--” and end with “-->”
– All data between these delimiters is ignored by the parser
– <!-- This is a list of names of people -->
• Comments should not come before the XML declaration
• Comments cannot be placed inside a tag
• Comments may be used to hide and surround tags
<Name>
<first>Sanjay</first>
<!-- <last>Goel</last> --> (last tag is ignored)
</Name>
• The “--” string may not occur inside a comment except as part of
its opening and closing delimiters
– <!-- the Red door -- that is the second --> (illegal)
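The comment rules can be tried out with Python's xml.etree.ElementTree, whose default parser drops comments from the resulting tree. The helper name parses and the wrapper element a are assumptions used only for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """
<Name>
  <first>Sanjay</first>
  <!-- <last>Goel</last> -->
</Name>
"""

# The commented-out <last> element never reaches the tree.
root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
print(child_tags)

def parses(xml_text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

double_hyphen = parses("<a><!-- the Red door -- that is the second --></a>")
inside_tag = parses("<a <!-- inside a tag --> />")
before_declaration = parses('<!-- too early --><?xml version="1.0"?><a/>')
print(double_hyphen, inside_tag, before_declaration)
```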
XML DTD
• A DTD is a set of rules that allows us to specify our own set
of elements and attributes.
• A DTD is a grammar that indicates which tags are legal in XML
documents
• An XML document is valid if it has an attached DTD and the
document is structured according to the rules defined in the DTD
DTD Example
XML document and corresponding DTD
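As an illustration, the sketch below attaches a DTD to the contact document as an internal subset. The DTD is an assumption written for this example, not the one from the original slide. Note that Python's standard xml.etree.ElementTree reads the DOCTYPE but does not validate against it; checking validity would need a validating parser, which is outside the standard library.

```python
import xml.etree.ElementTree as ET

DOC_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (name, address)>
  <!ELEMENT name    EMPTY>
  <!ATTLIST name    first CDATA #REQUIRED
                    last  CDATA #REQUIRED>
  <!ELEMENT address (street, city, state, zip)>
  <!ELEMENT street  (#PCDATA)>
  <!ELEMENT city    (#PCDATA)>
  <!ELEMENT state   (#PCDATA)>
  <!ELEMENT zip     (#PCDATA)>
]>
<contact>
  <name first="Sanjay" last="Goel"/>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
"""

# The parser accepts the document; the DTD itself is not enforced here.
root = ET.fromstring(DOC_WITH_DTD)
print([child.tag for child in root])
print([child.tag for child in root.find("address")])
```

Each ELEMENT rule lists the children an element may contain and their order; the ATTLIST rule declares the two required attributes of name.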
XML Schema
• Serves the same purpose as a database schema
• Schemas are written in XML
• Set of pre-defined simple types (such as string, integer)
• Allows creation of user-defined complex types
XML Schema
• RDBMS Schema (s_id integer, s_name string, s_status string)
• Corresponding XML Schema and XML document
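A sketch of how the relational schema above might be expressed in XML Schema. The wrapper element name supplier is an assumption; the three columns come from the RDBMS schema. Because a schema is itself an XML document, an ordinary XML parser can read it, as shown here with Python's xml.etree.ElementTree (which parses the schema but does not validate documents against it).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # XML Schema namespace

SUPPLIER_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="supplier">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="s_id" type="xs:integer"/>
        <xs:element name="s_name" type="xs:string"/>
        <xs:element name="s_status" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SUPPLIER_XSD)

# Collect (name, type) for every element declaration that has a simple type.
columns = [
    (el.get("name"), el.get("type"))
    for el in schema.iter("{%s}element" % XS)
    if el.get("type") is not None
]
print(columns)
```

The complexType plays the role of the table definition, and the typed child elements play the role of its columns.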
XML Query Languages
Purpose:
Same functionality as database query languages (such as SQL) to
process Web data
Advantages:
• Query selected portions of the document (no need to transport the entire
document)
• Smaller data size means lower communication cost
XQuery
• XQuery is to XML what SQL is to an RDBMS
• Many databases support XQuery
• XQuery is built on XPath expressions
(XPath is a language that defines path expressions to locate document
data)
XPath Example
<Student id="s1">
<Name>John</Name>
<Age>22</Age>
<Email>jhn@[Link]</Email>
</Student>
XPath: /Student[Name="John"]/Email
OUTPUT: <Email> element with value “jhn@[Link]”
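The query can be approximated with the XPath subset built into Python's xml.etree.ElementTree. Two assumptions are made: the record is wrapped in a hypothetical Students root with a second, made-up student so that the predicate has something to filter, and the path is written relative to that root because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

STUDENTS_XML = """
<Students>
  <Student id="s1">
    <Name>John</Name>
    <Age>22</Age>
    <Email>jhn@[Link]</Email>
  </Student>
  <Student id="s2">
    <Name>Mary</Name>
    <Age>21</Age>
    <Email>mary@example.org</Email>
  </Student>
</Students>
"""

root = ET.fromstring(STUDENTS_XML)

# Select the Email child of every Student whose Name child has the text "John".
matches = root.findall("./Student[Name='John']/Email")
emails = [element.text for element in matches]
print(emails)
```

Only the matching Email element is returned, which is the point made earlier about querying a selected portion of a document instead of transporting all of it.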