
Understanding XML Structure and Schema


Uploaded by

Veenavijayanath Veena


XML I

- Structure of XML Data
- XML Document Schema
- XPath

Introduction

- XML: Extensible Markup Language
- Defined by the World Wide Web Consortium (W3C)
- Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
- Documents have tags giving extra information about sections of the document
  - E.g. <title> XML </title> <slide> Introduction </slide>
- Extensible, unlike HTML
  - Users can add new tags, and separately specify how the tag should be handled for display

XML Introduction (Cont.)

The ability to specify new tags and to create nested tag structures makes XML a great way to exchange data, not just documents.

Much of the use of XML has been in data exchange applications, not as a replacement for HTML. E.g.:
    <bank>
      <account>
        <account_number> A-101 </account_number>
        <branch_name> Downtown </branch_name>
        <balance> 500 </balance>
      </account>
      <depositor>
        <account_number> A-101 </account_number>
        <customer_name> Johnson </customer_name>
      </depositor>
    </bank>
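To make the data-exchange point concrete, here is a minimal sketch of a receiving application reading the bank document above, assuming it uses Python's standard-library xml.etree.ElementTree parser. The helper names read_accounts and read_depositors are hypothetical. Each field is located by its tag rather than by its position, which is what makes the data (relatively) self-documenting:

```python
import xml.etree.ElementTree as ET

# The bank document from the slide, as a receiving application might get it.
BANK_XML = """<bank>
  <account>
    <account_number> A-101 </account_number>
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account_number> A-101 </account_number>
    <customer_name> Johnson </customer_name>
  </depositor>
</bank>"""


def read_accounts(xml_text):
    """Return {account_number: balance}, locating each field by its tag."""
    root = ET.fromstring(xml_text)
    accounts = {}
    for acc in root.findall("account"):
        # The slide pads values with spaces, so strip them before use.
        number = acc.findtext("account_number").strip()
        # XML carries every value as text; the receiver decides it is a number.
        accounts[number] = int(acc.findtext("balance").strip())
    return accounts


def read_depositors(xml_text):
    """Return (customer_name, account_number) pairs from depositor elements."""
    root = ET.fromstring(xml_text)
    return [
        (d.findtext("customer_name").strip(), d.findtext("account_number").strip())
        for d in root.findall("depositor")
    ]


print(read_accounts(BANK_XML))    # {'A-101': 500}
print(read_depositors(BANK_XML))  # [('Johnson', 'A-101')]
```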

Tags make data (relatively) self-documenting

XML: Motivation

Data interchange is critical in today's networked world. Examples:
- Banking: funds transfer
- Order processing (especially inter-company orders)
- Scientific data
  - Chemistry: ChemML
  - Genetics: BSML (Bio-Sequence Markup Language)

- Paper flow of information between organizations is being replaced by electronic flow of information
- Each application area has its own set of standards for representing information
- XML has become the basis for all new-generation data interchange formats

XML Motivation (Cont.)

- Earlier-generation formats were based on plain text with line headers indicating the meaning of fields
  - Similar in concept to email headers
  - Do not allow for nested structures; no standard type language
  - Tied too closely to low-level document structure (lines, spaces, etc.)
- Each XML-based standard defines what the valid elements are, using XML type specification languages to specify the syntax
  - DTD (Document Type Definition)
  - XML Schema
  - Plus textual descriptions of the semantics
- XML allows new tags to be defined as required
  - However, this may be constrained by DTDs
- A wide variety of tools is available for parsing, browsing and querying XML documents/data

Comparison with Relational Data

- Inefficient: tags, which in effect represent schema information, are repeated
- Better than relational tuples as a data-exchange format
  - Unlike relational tuples, XML data is self-documenting due to the presence of tags
  - Non-rigid format: tags can be added
  - Allows nested structures
  - Wide acceptance, not only in database systems, but also in browsers, tools, and applications

Structure of XML Data

- Tag: label for a section of data
- Element: section of data beginning with <tagname> and ending with a matching </tagname>
- Elements must be properly nested
  - Proper nesting: <account> ... <balance> ... </balance> ... </account>
  - Improper nesting: <account> ... <balance> ... </account> ... </balance>
  - Formally: every start tag must have a unique matching end tag that is in the context of the same parent element
- Every document must have a single top-level element
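A conforming XML parser enforces both rules, so a quick way to see them in action is to hand it well-formed and ill-formed text. This is a minimal sketch assuming Python's standard-library ElementTree parser; is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


proper = "<account><balance>400</balance></account>"
improper = "<account><balance>400</account></balance>"  # end tags cross
two_roots = "<account/><account/>"                       # no single top-level element

print(is_well_formed(proper))     # True
print(is_well_formed(improper))   # False
print(is_well_formed(two_roots))  # False
```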

Example of Nested Elements

    <bank-1>
      <customer>
        <customer_name> Hayes </customer_name>
        <customer_street> Main </customer_street>
        <customer_city> Harrison </customer_city>
        <account>
          <account_number> A-102 </account_number>
          <branch_name> Perryridge </branch_name>
          <balance> 400 </balance>
        </account>
        <account> ... </account>
      </customer>
      ...
    </bank-1>

Motivation for Nesting

- Nesting of data is useful in data transfer
  - Example: elements representing customer_id, customer_name, and address nested within an order element
- Nesting is not supported, or is discouraged, in relational databases
  - With multiple orders, customer name and address are stored redundantly
  - Normalization replaces the nested structures in each order by a foreign key into a table storing customer name and address information
  - Nesting is supported in object-relational databases
- But nesting is appropriate when transferring data
  - An external application does not have direct access to data referenced by a foreign key
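The normalization step described above can be sketched in a few lines. This assumes Python's standard-library ElementTree, and the order document, the id attribute, the item element and the normalize helper are all hypothetical. The customer details appear once per order in the nested XML, but only once in the flattened customer relation, with customer_id acting as the foreign key:

```python
import xml.etree.ElementTree as ET

# Hypothetical order data: the customer details are repeated inside every order.
ORDERS_XML = """<orders>
  <order id="O1">
    <customer_id>C1</customer_id><customer_name>Hayes</customer_name><address>Main St</address>
    <item>pen</item>
  </order>
  <order id="O2">
    <customer_id>C1</customer_id><customer_name>Hayes</customer_name><address>Main St</address>
    <item>ink</item>
  </order>
</orders>"""


def normalize(xml_text):
    """Flatten nested orders into two relations linked by customer_id."""
    root = ET.fromstring(xml_text)
    customers = {}  # customer_id -> (customer_name, address), stored once
    orders = []     # (order_id, customer_id, item); customer_id is the foreign key
    for order in root.findall("order"):
        cid = order.findtext("customer_id")
        customers[cid] = (order.findtext("customer_name"), order.findtext("address"))
        orders.append((order.get("id"), cid, order.findtext("item")))
    return customers, orders


customers, orders = normalize(ORDERS_XML)
print(customers)  # {'C1': ('Hayes', 'Main St')}
print(orders)     # [('O1', 'C1', 'pen'), ('O2', 'C1', 'ink')]
```

Going the other way (re-nesting the customer data into each order before export) is what lets an external application read an order without access to the customer table.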

Structure of XML Data (Cont.)

- A mixture of text with subelements is legal in XML. Example:
    <account>
      This account is seldom used any more.
      <account_number> A-102</account_number>
      <branch_name> Perryridge</branch_name>
      <balance>400 </balance>
    </account>
- Useful for document markup, but discouraged for data representation
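One reason mixed content is awkward for data is that the free text has no tag of its own. In this sketch, assuming Python's standard-library ElementTree, the note ends up in the parent element's .text, and itertext() returns it interleaved with the field values:

```python
import xml.etree.ElementTree as ET

MIXED = (
    "<account>This account is seldom used any more."
    "<account_number>A-102</account_number>"
    "<branch_name>Perryridge</branch_name>"
    "<balance>400</balance></account>"
)

root = ET.fromstring(MIXED)
# Text before the first subelement is stored on the parent, not in a tagged child.
print(root.text)                  # This account is seldom used any more.
# Tagged fields are still reachable by name.
print(root.findtext("balance"))   # 400
# itertext() yields every text fragment in document order, note included.
fragments = list(root.itertext())
print(fragments)
```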

Attributes

- Elements can have attributes
    <account acct-type="checking">
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
- Attributes are specified by name="value" pairs inside the starting tag of an element; the value must be quoted
- An element may have several attributes, but each attribute name can only occur once
    <account acct-type="checking" monthly-fee="5">
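A short sketch of how a parser exposes attributes, assuming Python's standard-library ElementTree. Attribute values come back as strings, and a repeated attribute name in one start tag is rejected as a well-formedness error:

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking" monthly-fee="5">'
    "<account_number>A-102</account_number></account>"
)
# Attributes are exposed as a dict mapping names to string values.
print(account.attrib)             # {'acct-type': 'checking', 'monthly-fee': '5'}
fee = account.get("monthly-fee")  # the string '5'; XML itself does not type it
print(int(fee) + 1)               # 6

# Repeating an attribute name inside one start tag is a well-formedness error.
try:
    ET.fromstring('<account acct-type="checking" acct-type="savings"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)         # True
```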

Attributes vs. Subelements

- Distinction between subelement and attribute
  - In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  - In the context of data representation, the difference is unclear and may be confusing
    - The same information can be represented in two ways:
        <account account_number="A-101"> ... </account>
        <account> <account_number>A-101</account_number> ... </account>

Suggestion: use attributes for identifiers of elements, and use subelements for contents
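Without an agreed convention, a consumer has to cope with both representations. This sketch, assuming Python's standard-library ElementTree and a hypothetical helper account_number, reads the identifier from either form, which shows the extra work that fixing one convention avoids:

```python
import xml.etree.ElementTree as ET


def account_number(elem):
    """Read the account number whether it is stored as an attribute or a subelement."""
    value = elem.get("account_number")           # attribute form
    if value is None:
        value = elem.findtext("account_number")  # subelement form
    return value


as_attribute = ET.fromstring('<account account_number="A-101"/>')
as_subelement = ET.fromstring("<account><account_number>A-101</account_number></account>")
print(account_number(as_attribute), account_number(as_subelement))  # A-101 A-101
```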

Namespaces

- XML data has to be exchanged between organizations
- The same tag name may have different meanings in different organizations, causing confusion in exchanged documents
- Specifying a unique string as an element name avoids confusion
- Better solution: use unique-name:element-name
- Avoid using long unique names all over the document by using XML Namespaces
    <bank xmlns:FB="[Link]">
      <FB:branch>
        <FB:branchname>Downtown</FB:branchname>
        <FB:branchcity> Brooklyn </FB:branchcity>
      </FB:branch>
    </bank>
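The prefix is only a local abbreviation; what identifies the namespace is the URI bound to it. This sketch, assuming Python's standard-library ElementTree and a hypothetical URI (http://www.example.com/FirstBank, not the one used on the slide), shows the parser expanding FB:branch to the full URI and a query using a different prefix of its own choosing:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only.
FB = "http://www.example.com/FirstBank"

doc = f"""<bank xmlns:FB="{FB}">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity> Brooklyn </FB:branchcity>
  </FB:branch>
</bank>"""

root = ET.fromstring(doc)
branch = root[0]
# ElementTree replaces the prefix with the full URI, in {uri}localname form.
print(branch.tag)  # {http://www.example.com/FirstBank}branch
# Queries map their own short prefix to the same URI.
ns = {"fb": FB}
print(root.findtext("fb:branch/fb:branchname", namespaces=ns))  # Downtown
```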

More on XML Syntax

- Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag
    <account number="A-101" branch="Perryridge" balance="200" />
- To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
    <![CDATA[<account> ... </account>]]>
  - Here, <account> and </account> are treated as just strings
  - CDATA stands for character data
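A minimal sketch, assuming Python's standard-library ElementTree, of what a parser makes of both constructs: the abbreviated element has no children and no text, and the CDATA section becomes plain text rather than subelements. Note that ElementTree does not preserve the CDATA section itself; when it serializes the text, it escapes the angle brackets instead:

```python
import xml.etree.ElementTree as ET

empty = ET.fromstring('<account number="A-101" branch="Perryridge" balance="200" />')
print(len(empty), empty.text)  # 0 None

note = ET.fromstring("<note><![CDATA[<account> ... </account>]]></note>")
print(len(note))               # 0: the CDATA content created no subelements
print(note.text)               # <account> ... </account>
# On output, ElementTree escapes the markup characters rather than writing CDATA.
serialized = ET.tostring(note, encoding="unicode")
print(serialized)              # <note>&lt;account&gt; ... &lt;/account&gt;</note>
```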

XML Document Schema

- Database schemas constrain what information can be stored, and the data types of stored values
- XML documents are not required to have an associated schema
- However, schemas are very important for XML data exchange
  - Otherwise, a site cannot automatically interpret data received from another site
- Two mechanisms for specifying XML schema
  - Document Type Definition (DTD)
    - Widely used
  - XML Schema
    - Newer, increasing use

Document Type Definition (DTD)

- The type of an XML document can be specified using a DTD
- A DTD constrains the structure of XML data
  - What elements can occur
  - What attributes an element can/must have
  - What subelements can/must occur inside each element, and how many times
- A DTD does not constrain data types
  - All values are represented as strings in XML
- DTD syntax
    <!ELEMENT element (subelements-specification) >
    <!ATTLIST element (attributes) >

Element Specification in DTD

- Subelements can be specified as
  - names of elements, or
  - #PCDATA (parsed character data), i.e., character strings
  - EMPTY (no subelements) or ANY (anything can be a subelement)
- Example
    <!ELEMENT depositor (customer_name, account_number)>
    <!ELEMENT customer_name (#PCDATA)>
    <!ELEMENT account_number (#PCDATA)>
- Subelement specification may have regular expressions
    <!ELEMENT bank ( ( account | customer | depositor)+)>
  - Notation:
    - , : sequence (in the order given)
    - | : alternatives
    - + : 1 or more occurrences
    - * : 0 or more occurrences
    - ? : 0 or 1 occurrence
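Because a content model is a regular expression over the sequence of child element names, checking one can be sketched by translating the model into a Python regex. content_model_regex and children_match are hypothetical helpers, not part of any DTD validator; they handle only element names, ",", "|", parentheses and the quantifiers (no #PCDATA, EMPTY or ANY). Real validation would use a validating parser:

```python
import re
import xml.etree.ElementTree as ET


def content_model_regex(model):
    """Translate a content model such as '(account | customer | depositor)+'
    into a regex over child tag names, each followed by one space."""
    out = []
    for token in re.findall(r"[\w.-]+|[(),|*+?]", model):
        if token == "(":
            out.append("(?:")
        elif token == ",":
            continue                     # a sequence is plain concatenation
        elif token in ")|*+?":
            out.append(token)            # grouping, alternation, quantifiers
        else:
            out.append("(?:" + re.escape(token) + " )")  # one element name
    return re.compile("".join(out))


def children_match(elem, model):
    """True if the element's child tags, in document order, satisfy the model."""
    sequence = "".join(child.tag + " " for child in elem)
    return content_model_regex(model).fullmatch(sequence) is not None


bank = ET.fromstring("<bank><account/><depositor/><customer/><account/></bank>")
print(children_match(bank, "( ( account | customer | depositor)+)"))  # True

ok = ET.fromstring("<depositor><customer_name/><account_number/></depositor>")
swapped = ET.fromstring("<depositor><account_number/><customer_name/></depositor>")
print(children_match(ok, "(customer_name, account_number)"))       # True
print(children_match(swapped, "(customer_name, account_number)"))  # False: wrong order
```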

Bank DTD
    <!DOCTYPE bank [
      <!ELEMENT bank ( ( account | customer | depositor)+)>
      <!ELEMENT account (account_number, branch_name, balance)>
      <!ELEMENT customer (customer_name, customer_street, customer_city)>
      <!ELEMENT depositor (customer_name, account_number)>
      <!ELEMENT account_number (#PCDATA)>
      <!ELEMENT branch_name (#PCDATA)>
      <!ELEMENT balance (#PCDATA)>
      <!ELEMENT customer_name (#PCDATA)>
      <!ELEMENT customer_street (#PCDATA)>
      <!ELEMENT customer_city (#PCDATA)>
    ]>

Attribute Specification in DTD

- Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      - more on this later
  - Whether mandatory (#REQUIRED), has a default value (value), or neither (#IMPLIED)
- Examples
    <!ATTLIST account acct-type CDATA "checking">
    <!ATTLIST customer
        customer_id ID #REQUIRED
        accounts IDREFS #REQUIRED >

IDs and IDREFs

- An element can have at most one attribute of type ID
- The ID attribute value of each element in an XML document must be distinct
  - Thus the ID attribute value is an object identifier
- An attribute of type IDREF must contain the ID value of an element in the same document
- An attribute of type IDREFS contains a set of one or more ID values separated by spaces; each of them must be the ID value of an element in the same document

Bank DTD with Attributes

Bank DTD with ID and IDREF attribute types.
    <!DOCTYPE bank-2 [
      <!ELEMENT account (branch_name, balance)>
      <!ATTLIST account
          account_number ID #REQUIRED
          owners IDREFS #REQUIRED>
      <!ELEMENT customer (customer_name, customer_street, customer_city)>
      <!ATTLIST customer
          customer_id ID #REQUIRED
          accounts IDREFS #REQUIRED>
      ... declarations for branch_name, balance, customer_name, customer_street and customer_city ...
    ]>

XML data with ID and IDREF attributes

    <bank-2>
      <account account_number="A-401" owners="C100 C102">
        <branch_name> Downtown </branch_name>
        <balance> 500 </balance>
      </account>
      <customer customer_id="C100" accounts="A-401">
        <customer_name> Joe </customer_name>
        <customer_street> Monroe </customer_street>
        <customer_city> Madison </customer_city>
      </customer>
      <customer customer_id="C102" accounts="A-401 A-402">
        <customer_name> Mary </customer_name>
        <customer_street> Erin </customer_street>
        <customer_city> Newark </customer_city>
      </customer>
    </bank-2>
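The two ID/IDREF rules (distinct IDs, references that resolve within the document) can be checked with a short sketch. It assumes Python's standard-library ElementTree, which does not read attribute types from a DTD, so the hypothetical helper check_ids is told which attribute names are IDs and which are IDREF(S). Run on the bank-2 data above, it reports A-402 as dangling, because that account is not part of the excerpt:

```python
import xml.etree.ElementTree as ET

BANK2 = """<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <customer customer_id="C100" accounts="A-401">
    <customer_name> Joe </customer_name>
  </customer>
  <customer customer_id="C102" accounts="A-401 A-402">
    <customer_name> Mary </customer_name>
  </customer>
</bank-2>"""


def check_ids(root, id_attrs, idref_attrs):
    """Return (duplicate_ids, dangling_refs) for the given attribute names."""
    seen, duplicates = set(), []
    for elem in root.iter():
        for name in id_attrs:
            value = elem.get(name)
            if value is None:
                continue
            if value in seen:        # IDs must be unique across the whole document
                duplicates.append(value)
            seen.add(value)
    dangling = []
    for elem in root.iter():
        for name in idref_attrs:
            # IDREFS is a space-separated list; split() also covers a single IDREF.
            for ref in elem.get(name, "").split():
                if ref not in seen:
                    dangling.append(ref)
    return duplicates, dangling


result = check_ids(ET.fromstring(BANK2), ["account_number", "customer_id"], ["owners", "accounts"])
print(result)  # ([], ['A-402'])
```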

Limitations of DTDs

- No typing of text elements and attributes
  - All values are strings; no integers, reals, etc.
- Difficult to specify unordered sets of subelements
  - Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  - (A | B)* allows specification of an unordered set, but
  - Cannot ensure that each of A and B occurs only once
- IDs and IDREFs are untyped
  - The owners attribute of an account may contain a reference to another account, which is meaningless
  - The owners attribute should ideally be constrained to refer to customer elements
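The unordered-set limitation can be seen by treating content models as ordinary regular expressions over child tag names (written here as one-letter strings for brevity, an illustrative simplification). (A | B)* accepts A A B, while requiring each of A and B exactly once in any order means listing every ordering, as in ((A, B) | (B, A)); the number of orderings grows as n! for n subelements:

```python
import re

star = re.compile(r"(?:A|B)*")     # DTD: (A | B)*
exact_once = re.compile(r"AB|BA")  # DTD: ((A, B) | (B, A))

results = {
    seq: (bool(star.fullmatch(seq)), bool(exact_once.fullmatch(seq)))
    for seq in ["AB", "BA", "AAB", ""]
}
for seq, (loose, strict) in results.items():
    print(repr(seq), loose, strict)
# 'AB' True True
# 'BA' True True
# 'AAB' True False
# '' True False
```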

Tree Model of XML Data

- Query and transformation languages are based on a tree model of XML data
- An XML document is modeled as a tree, with nodes corresponding to elements and attributes
  - Element nodes have child nodes, which can be attributes or subelements
  - Text in an element is modeled as a text node child of the element
  - Children of a node are ordered according to their order in the XML document
  - Element and attribute nodes (except for the root node) have a single parent, which is an element node
  - The root node has a single child, which is the root element of the document
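A minimal sketch that prints a document in the shape of this tree model, assuming Python's standard-library ElementTree. ElementTree has no separate attribute or text node objects, so the hypothetical helper tree_lines emits them as pseudo-nodes under their element; text that follows a child element (stored by ElementTree in .tail) is omitted for brevity:

```python
import xml.etree.ElementTree as ET


def tree_lines(elem, depth=0):
    """Return one line per element, attribute and text node, indented by depth."""
    pad = "  " * depth
    lines = [f"{pad}element {elem.tag}"]
    for name, value in elem.attrib.items():
        lines.append(f"{pad}  attribute {name}={value}")
    if elem.text and elem.text.strip():
        lines.append(f"{pad}  text {elem.text.strip()}")
    for child in elem:  # children in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines


doc = ET.fromstring(
    '<account acct-type="checking">'
    "<account_number>A-102</account_number><balance>400</balance></account>"
)
print("\n".join(tree_lines(doc)))
# element account
#   attribute acct-type=checking
#   element account_number
#     text A-102
#   element balance
#     text 400
```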

XPath

- XPath is used to address (select) parts of documents using path expressions
- A path expression is a sequence of steps separated by /
  - Think of file names in a directory hierarchy
- Result of a path expression: the set of values that, along with their containing elements/attributes, match the specified path
- E.g. /bank/customer/customer_name evaluated on the bank data we saw earlier returns
    <customer_name>Hayes</customer_name>
    <customer_name>Johnson</customer_name>
- E.g. /bank/customer/customer_name/text() returns the same names, but without the enclosing tags
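Python's standard-library ElementTree implements a limited subset of XPath, enough to try simple paths. In this sketch the two-customer document is hypothetical (only the names come from the slide). ElementTree paths are evaluated relative to an element, so /bank/customer/customer_name becomes customer/customer_name evaluated from the bank element, and taking each element's .text plays the role of text():

```python
import xml.etree.ElementTree as ET

# Hypothetical bank data with two customers.
BANK = """<bank>
  <customer><customer_name>Hayes</customer_name></customer>
  <customer><customer_name>Johnson</customer_name></customer>
</bank>"""

root = ET.fromstring(BANK)  # root is the <bank> element
names = root.findall("customer/customer_name")
print([ET.tostring(n, encoding="unicode") for n in names])
# ['<customer_name>Hayes</customer_name>', '<customer_name>Johnson</customer_name>']
print([n.text for n in names])  # ['Hayes', 'Johnson']
```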

XPath (Cont.)

- The initial / denotes the root of the document (above the top-level tag)
- Path expressions are evaluated left to right
  - Each step operates on the set of instances produced by the previous step
- Selection predicates may follow any step in a path, in [ ]
  - E.g. /bank/customer/account[balance > 400]
    - returns account elements with a balance value greater than 400
    - /bank/customer/account[balance] returns account elements containing a balance subelement
- Attributes are accessed using @
  - E.g. /bank/customer/account[balance > 400]/@account_number
    - returns the account numbers of accounts with balance > 400
    - Here we assume account_number is an attribute; otherwise use /bank/customer/account[balance > 400]/account_number
  - IDREF attributes are not dereferenced automatically (more on this later)
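ElementTree's XPath subset supports the [balance] predicate (has a child named balance) but not numeric comparisons such as [balance > 400], and it cannot return attributes as a final @ step. This sketch, on a hypothetical document, applies the comparison and the @account_number step in Python instead:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: account_number is an attribute, as the slide assumes.
BANK = """<bank><customer>
  <account account_number="A-101"><balance>500</balance></account>
  <account account_number="A-102"><balance>400</balance></account>
  <account account_number="A-103"/>
</customer></bank>"""

root = ET.fromstring(BANK)
# account[balance]: supported directly by ElementTree.
with_balance = root.findall("customer/account[balance]")
# [balance > 400]: done in Python, converting the string value explicitly.
rich = [a for a in with_balance if float(a.findtext("balance")) > 400]
# Final step @account_number: read the attribute from each selected element.
print([a.get("account_number") for a in with_balance])  # ['A-101', 'A-102']
print([a.get("account_number") for a in rich])          # ['A-101']
```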

Functions in XPath

- XPath provides several functions
  - The function count(), applied to a path, counts the number of elements in the set generated by the path
    - E.g. /bank/customer[count(./account) > 1]
      - Returns customers with more than one account
  - There is also a function, position(), for testing the position (1, 2, ...) of a node with respect to its siblings
- The boolean connectives and and or, and the function not(), can be used in predicates
- IDREFs can be referenced using the function id()
  - id() can also be applied to sets of references such as IDREFS, and even to strings containing multiple references separated by blanks
  - E.g. /bank/customer/account/id(@owners)
    - returns all customers referred to from the owners attribute of account elements
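ElementTree supports neither count() nor id(), so this sketch expresses both in Python on hypothetical documents (Joe, Mary, C100 and C102 are borrowed from the bank-2 example; Jones is invented). count(./account) becomes a length, and id() becomes a dictionary from ID value to element, built here only for customer_id because no DTD is read:

```python
import xml.etree.ElementTree as ET

NESTED = """<bank>
  <customer><customer_name>Hayes</customer_name><account/><account/></customer>
  <customer><customer_name>Jones</customer_name><account/></customer>
</bank>"""
nested_root = ET.fromstring(NESTED)
# /bank/customer[count(./account) > 1]
multi = [c.findtext("customer_name") for c in nested_root.findall("customer")
         if len(c.findall("account")) > 1]
print(multi)  # ['Hayes']

REFS = """<bank>
  <account account_number="A-401" owners="C100 C102"/>
  <customer customer_id="C100"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C102"><customer_name>Mary</customer_name></customer>
</bank>"""
refs_root = ET.fromstring(REFS)
# Index elements by ID value (attribute name assumed, since no DTD is read).
by_id = {e.get("customer_id"): e for e in refs_root.iter() if e.get("customer_id")}
# .../account/id(@owners): split the IDREFS value and look up each reference.
owners = [by_id[ref] for acc in refs_root.iter("account")
          for ref in acc.get("owners", "").split()]
print([o.findtext("customer_name") for o in owners])  # ['Joe', 'Mary']
```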

More XPath Examples

- Element AA with two ancestors
    /*/*/AA
- First BB child of the root element AA
    /AA/BB[1]
- All the CC elements of the BB elements that have a subelement A with value 3
    //BB[A=3]/CC
- Any elements AA, or elements CC of elements BB
    //AA | //BB/CC
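These expressions can be approximated with ElementTree's XPath subset, on hypothetical documents. Differences worth noting: paths start from the root element rather than the document root; .// does not include the starting element itself; [A='3'] compares the child's text as a string, so the value must be quoted; and there is no | operator, so the two results are concatenated (which does not guarantee document order):

```python
import xml.etree.ElementTree as ET

doc1 = ET.fromstring('<AA><BB id="b1"><AA/></BB><BB id="b2"/></AA>')
# /*/*/AA: the root element is the first *, so look for AA grandchildren of it.
deep_aa = doc1.findall("*/AA")
# /AA/BB[1]: first BB child of the root AA (positions are 1-based).
first_bb = doc1.findall("BB[1]")

doc2 = ET.fromstring(
    "<root><BB><A>3</A><CC n='1'/><CC n='2'/></BB>"
    "<BB><A>4</A><CC n='3'/></BB><AA/></root>"
)
# //BB[A=3]/CC
cc_of_bb3 = doc2.findall(".//BB[A='3']/CC")
# //AA | //BB/CC: no union operator, so concatenate the two result lists.
union = doc2.findall(".//AA") + doc2.findall(".//BB/CC")

print(len(deep_aa), [b.get("id") for b in first_bb])  # 1 ['b1']
print([c.get("n") for c in cc_of_bb3])                # ['1', '2']
print([e.tag for e in union])                         # ['AA', 'CC', 'CC', 'CC']
```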

Even More XPath Examples

- Select all subelements of the AA elements that are children of the root element BB
    /BB/AA/*
  - Useful when you do not know the names of the subelements
  - Different from /BB/AA
- Select all attributes named aa
    //@aa
- Select all CITIES elements with an attribute named aa
    //CITIES[@aa]
- Select all CITIES elements with an attribute named aa with value 123
    //CITIES[@aa = 123]
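The same selections, sketched with ElementTree on a hypothetical document whose root element is BB. Attribute predicates such as [@aa] and [@aa='123'] are part of ElementTree's XPath subset, but a path cannot return attributes themselves, so //@aa is emulated by iterating over every element:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<BB aa="0"><AA><CITIES aa="123"/><CITIES aa="45"/><CITIES/></AA><CC/></BB>'
)
# /BB/AA/*: every child of the AA children of the root BB, whatever its name.
under_aa = root.findall("AA/*")
# //@aa: collect the attribute values by walking all elements, root included.
all_aa = [e.get("aa") for e in root.iter() if "aa" in e.attrib]
# //CITIES[@aa] and //CITIES[@aa = 123]
with_aa = root.findall(".//CITIES[@aa]")
aa_123 = root.findall(".//CITIES[@aa='123']")

print(len(under_aa), all_aa)      # 3 ['0', '123', '45']
print(len(with_aa), len(aa_123))  # 2 1
```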
