Understanding XML and Its Structure

Uploaded by

aryadacademie
Copyright
© All Rights Reserved

Extensible Markup Language

XML
XML and NoSQL Databases
Types of Databases
• Data is facts and figures
• Database is a related set of data
Kinds of databases
• Unstructured
– Meaning of data interpreted by user
• Semi-Structured
– Structure of data wrapped around data
• Structured
– Fixed structure of data
– Data added to the fixed structure
XML
• XML is a text-based markup language that is fast becoming a
standard for data interchange
– An open standard from the W3C
– A direct descendant of SGML

Example: Product Inventory Data


<Product>
  <Name>Refrigerator</Name>
  <ModelNumber>R3456d2h</ModelNumber>
  <Manufacturer>General Electric</Manufacturer>
  <Price>1290.00</Price>
  <Quantity>1200</Quantity>
</Product>
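A receiving application could read this record as sketched below. This is a minimal illustration using Python's standard xml.etree.ElementTree module; the function name load_product and the dictionary keys are assumptions made for the example, and the model number tag is written as ModelNumber because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Product record from the slide. ModelNumber has no space because
# XML element names cannot contain whitespace.
PRODUCT_XML = """<Product>
  <Name>Refrigerator</Name>
  <ModelNumber>R3456d2h</ModelNumber>
  <Manufacturer>General Electric</Manufacturer>
  <Price>1290.00</Price>
  <Quantity>1200</Quantity>
</Product>"""


def load_product(xml_text):
    """Turn a <Product> element into a dict with typed values (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("Name"),
        "model_number": root.findtext("ModelNumber"),
        "manufacturer": root.findtext("Manufacturer"),
        "price": float(root.findtext("Price")),      # element text is always a string
        "quantity": int(root.findtext("Quantity")),
    }


product = load_product(PRODUCT_XML)
print(product)
```

The tags carry the meaning of each value, so the reader needs no knowledge of the sender's internal database layout.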
Data Interchange
• XML's key role is data interchange
• Two business partners want to exchange customer
data
– Agree on a set of tags
– Exchange data without having to change internal
databases
• Other business partners can join in the exchange by
using the tagset
– New tags can be added to extend the functionality

The key to successful data interchange is building consensus and standardizing tag sets
XML = Universal Data
• TCP/IP → Universal Networking
• HTML → Universal Rendering
• Java → Universal Code
• XML → Universal Data

• Numerous standards bodies have been set up to standardize tags in different domains
– ebXML
– XBRL
– MML
– CML
HTML vs. XML
• Both are markup languages
– HTML has fixed set of tags
– XML allows user to specify the tags based on requirements
• Usage
– HTML tags specify how to display data
– XML tags specify semantics of the data
• Tag Interpretation
– HTML specifies what each tag and attribute means
– XML tags delimit data & leave interpretation to the parsing application
• Well-formedness
– HTML is very tolerant of rule violations (nesting, matching tags)
– XML strictly enforces the rules of well-formedness
Structure of XML
• Prolog
– Instructs the parser as to what it is parsing
– Contains processing instructions for processor
• Body
– Tags - Entities
– Attributes - Properties of Entities
– Comments - Statements for clarification in the document
Example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> <!-- Prolog -->
<contact> <!-- Body: everything from here to </contact> -->
  <name>
    <firstname>Sanjay</firstname>
    <lastname>Goel</lastname>
  </name>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
Prolog
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
• Contains the declaration that identifies a document as XML
• Version
– Version of the XML markup language used in the data
– Not optional when the declaration is present
• Encoding
– Identifies the character encoding used for the data
– Defaults to UTF-8, a compact encoding of Unicode
• Standalone
– Tells whether or not this document depends on external
markup declarations (such as an external DTD)
• May contain entity definitions and tag specifications
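The three fields of the declaration can be inspected after parsing. The sketch below uses Python's standard xml.dom.minidom module, which exposes them on the parsed document object; the one-element document is an assumption made to keep the example short.

```python
from xml.dom import minidom

# The XML declaration must be the very first thing in the document,
# so the string starts with "<?xml" and has no leading whitespace.
DOC = '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><contact/>'

doc = minidom.parseString(DOC)

# minidom records the values found in the XML declaration.
print("version:", doc.version)
print("encoding:", doc.encoding)
print("standalone:", doc.standalone)
```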
XML Syntax: Elements & Attributes
• Uses less-than and greater-than characters (<…>) as
delimiters
• Every opening tag must have an accompanying closing tag
– <FirstName>Sanjay</FirstName>
– Empty tags do not require an accompanying closing tag.
– Empty tags have a forward slash before the greater-than sign, e.g.
<Name/>
• Tags can have attributes, whose values must be enclosed in
quotes (single or double)
– <name first="Sanjay" last="Goel"/>
• Elements should be properly nested
– The nesting cannot be interleaved
– Each document must have one single root element
• Element and attribute names are case sensitive
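These rules can be checked mechanically, because a conforming parser rejects any document that breaks them. The sketch below is one way to try this with Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample snippets are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each sample exercises one of the syntax rules listed above.
samples = {
    "matched tags": "<FirstName>Sanjay</FirstName>",
    "empty tag": "<Name/>",
    "quoted attributes": "<name first=\"Sanjay\" last='Goel'/>",
    "missing closing tag": "<FirstName>Sanjay",
    "interleaved nesting": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Name>Sanjay</name>",
    "unquoted attribute": "<name first=Sanjay/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The first three samples are accepted; the remaining five are rejected.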
Tree Structure - Elements
• XML documents have a tree structure containing multiple levels of
nested tags.
– The root element is a single XML element which encloses all of the other XML
elements and data in the document
– All other elements are descendants of the root element

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact> <!-- Root element -->
  <name>
    <firstname>Sanjay</firstname>
    <lastname>Goel</lastname>
  </name>
  <address>
    <street>56 Della Street</street> <!-- Child elements -->
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
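The tree can be walked recursively from the root. The sketch below uses Python's standard xml.etree.ElementTree module; the function name outline is an assumption, and the name tags are written as firstname and lastname because element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

CONTACT_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact>
  <name>
    <firstname>Sanjay</firstname>
    <lastname>Goel</lastname>
  </name>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(CONTACT_XML)
rows = outline(root)
for depth, tag in rows:
    print("  " * depth + tag)
```

The root element contact sits at depth 0, name and address at depth 1, and the leaf elements at depth 2.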
Attributes
• Attributes are properties associated with an element
• Each attribute is a name-value pair
– No element may contain two attributes with same name
– Name and value are strings

Example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<contact>
  <name first="Sanjay" last="Goel"></name> <!-- Attributes -->
  <address>
    <street>56 Della Street</street> <!-- Nested elements -->
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
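Attributes are read as string name-value pairs, and a repeated attribute name makes the document ill-formed. The sketch below shows both with Python's standard xml.etree.ElementTree module; the helper name parses and the shortened document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<contact><name first="Sanjay" last="Goel"></name></contact>')
name = root.find("name")

# Attributes are exposed as a mapping from attribute name to string value.
print(name.attrib)
full_name = f'{name.get("first")} {name.get("last")}'
print(full_name)


def parses(xml_text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two attributes with the same name on one element are rejected by the parser.
duplicate_ok = parses('<name first="Sanjay" first="S."/>')
print("duplicate attribute accepted:", duplicate_ok)
```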
Elements vs. Attributes
• Data should be stored in Elements
• Information about data (meta-data) should be stored
in attributes
• When in doubt, use elements
• Rules of thumb
– Elements should hold information which someone may
want to read.
– Attributes are appropriate for information about the document
that has nothing to do with the content of the document,
e.g. URLs, units, references, and ids belong in attributes
– What is meta-data to you may be data to someone else
Comments
• XML comments begin with "<!--" and end with "-->"
– All data between these delimiters is discarded
– <!-- This is a list of names of people -->
• Comments must not come before the XML declaration
• Comments cannot be placed inside a tag
• Comments may be used to surround and hide tags
<Name>
  <first>Sanjay</first>
  <!-- <last>Goel</last> --> (the <last> element is ignored)
</Name>
• The string "--" may not occur inside a comment except as part of
its opening and closing delimiters
– <!-- the Red door -- that is the second --> (illegal)
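The comment rules can be observed with a parser. The sketch below uses Python's standard xml.etree.ElementTree module, whose default parser drops comments while building the tree; the helper name parses and the wrapper elements around the illegal comments are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NAME_XML = """<Name>
  <first>Sanjay</first>
  <!-- <last>Goel</last> -->
</Name>"""

# The commented-out <last> element never reaches the tree.
root = ET.fromstring(NAME_XML)
child_tags = [child.tag for child in root]
print(child_tags)


def parses(xml_text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# "--" inside a comment is not allowed.
double_hyphen_ok = parses("<Name><!-- the Red door -- that is the second --></Name>")
# A comment may not precede the XML declaration.
comment_first_ok = parses('<!-- note --><?xml version="1.0"?><Name/>')
# A comment may not appear inside a tag.
comment_in_tag_ok = parses("<Name <!-- note --> />")
print(double_hyphen_ok, comment_first_ok, comment_in_tag_ok)
```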
XML DTD

• A DTD is a set of rules that allows us to specify our own set of elements and attributes.

• A DTD is a grammar that indicates which tags are legal in XML documents

• An XML document is valid if it has an attached DTD and the
document is structured according to the rules defined in the DTD
DTD Example

[Figure: XML document and corresponding DTD]
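As a stand-in for the kind of pairing this slide describes, the sketch below attaches a DTD to the contact document as an internal subset. The DTD text is an assumption written for this example, not the one from the original slide. Python's standard xml.etree.ElementTree parser is non-validating: it reads the DTD but does not enforce it, so the final check is a hand-written comparison against one content model, not real DTD validation. A validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for the contact example, embedded in the DOCTYPE.
CONTACT_WITH_DTD = """<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE contact [
  <!ELEMENT contact (name, address)>
  <!ELEMENT name EMPTY>
  <!ATTLIST name first CDATA #REQUIRED
                 last  CDATA #REQUIRED>
  <!ELEMENT address (street, city, state, zip)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
]>
<contact>
  <name first="Sanjay" last="Goel"/>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>"""

# The non-validating parser accepts the DOCTYPE and builds the tree as usual.
root = ET.fromstring(CONTACT_WITH_DTD)

# Manual illustration only: compare <address> children with the sequence
# declared in <!ELEMENT address (street, city, state, zip)>.
ADDRESS_MODEL = ["street", "city", "state", "zip"]
address_children = [child.tag for child in root.find("address")]
print("address matches its content model:", address_children == ADDRESS_MODEL)
```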
XML Schema

• Serves the same purpose as a database schema

– Schemas are written in XML

– Provides a set of pre-defined simple types (such as string, integer)

– Allows creation of user-defined complex types
XML Schema
• RDBMS Schema (s_id integer, s_name string, s_status string)
→ XML Schema

[Figure: XML document and schema]
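Because a schema is itself an XML document, it can be read with any XML parser. The sketch below writes one possible XML Schema for the relational columns above and lists its declared elements using Python's standard xml.etree.ElementTree module. The wrapper element name supplier and the exact schema layout are assumptions, not taken from the original slide, and the standard library does not validate documents against a schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema vocabulary, in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema mirroring (s_id integer, s_name string, s_status string).
SUPPLIER_XSD = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="supplier">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="s_id" type="xs:integer"/>
        <xs:element name="s_name" type="xs:string"/>
        <xs:element name="s_status" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SUPPLIER_XSD)

# Collect every element declaration that names a pre-defined simple type.
columns = [
    (decl.get("name"), decl.get("type"))
    for decl in schema.iter(XS + "element")
    if decl.get("type") is not None
]
for column_name, column_type in columns:
    print(column_name, column_type)
```

The supplier element is the user-defined complex type; its three children use the pre-defined simple types.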


XML Query Languages

Purpose :
Same functionality as database query languages (such as SQL) to
process Web data

Advantages :
• Query selective portions of the document (no need to transport entire
document)
• Smaller data size means lower communication cost
XQuery

• XQuery is to XML what SQL is to an RDBMS

• Many databases support XQuery

• XQuery is built on XPath expressions
(XPath is a language that defines path expressions to locate document
data)
XPath Example
<Student id="s1">
  <Name>John</Name>
  <Age>22</Age>
  <Email>jhn@[Link]</Email>
</Student>

XPath: /Student[Name="John"]/Email
OUTPUT: <Email> element with value "jhn@[Link]"
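A similar query can be tried with Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and evaluates paths relative to an element. For that reason the sketch wraps the records in a hypothetical Students root and drops the leading slash. The second student and both example.com addresses are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document; the e-mail addresses are placeholders.
STUDENTS_XML = """<Students>
  <Student id="s1">
    <Name>John</Name>
    <Age>22</Age>
    <Email>jhn@example.com</Email>
  </Student>
  <Student id="s2">
    <Name>Mary</Name>
    <Age>21</Age>
    <Email>mary@example.com</Email>
  </Student>
</Students>"""

root = ET.fromstring(STUDENTS_XML)

# Child-text predicate: Student elements whose <Name> child equals "John".
emails = [email.text for email in root.findall("Student[Name='John']/Email")]
print(emails)

# Attribute predicate: the Student whose id attribute is "s2".
second_name = root.find("Student[@id='s2']/Name").text
print(second_name)
```

Only the matching Email element is returned, which is the selective access that the query-language slides describe.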
