Schema
 Schemas
specify the structure of an XML document
constraints on its content
 This is also the purpose of a DTD
schema does it better
syntax is XML
can use existing XML tools
Schema
 What you can't do with DTDs
constrain the #PCDATA e.g.
 a telephone number
 a price
 a single word
precisely constrain repetition
 up to three children on a family ticket
a precise selection of elements
 in any combination or permutation
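As a rough illustration of these limits, consider the sketch below, which uses a small internal DTD. The element names (booking, phone, price, child) are hypothetical and chosen only for this example. The document is valid against its DTD even though the phone number and price are nonsense and four children appear, because #PCDATA accepts any text and a DTD has no direct way to say "at most three".

```xml
<?xml version="1.0"?>
<!-- Hypothetical booking vocabulary, for illustration only -->
<!DOCTYPE booking [
  <!-- child* allows any number of children; there is no occurrence counter -->
  <!ELEMENT booking (phone, price, child*)>
  <!-- #PCDATA accepts any character data at all -->
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT child (#PCDATA)>
]>
<booking>
  <phone>not a phone number</phone>
  <price>cheap</price>
  <child>Ann</child>
  <child>Bob</child>
  <child>Cat</child>
  <child>Dan</child>
</booking>
```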
Schema
 XML is a meta-language for defining tag
languages
 A schema is a formal specification (in
XML) of the grammar for one language
useful for validating content & interchange
 XML Schema is a language for writing the
specifications
Schemas
• Common Vocabularies
• Shared Applications
• Network effect
• Formal Sets of Rules
• Machine-based XML processing
• Not human-based document processing
• Building Contracts
• Core rules for a series of transactions
Schemas
• DTDs
• good at describing documents
• can't manage complex data structures
• syntax is not extensible
• available tools won't work
Schemas
 Schemas build on primitive types
integers, floating point, strings, dates
 Types can be based on other types
aggregations
specifications
restrictions
equivalences
 Distinction between types and elements
Schemas
 Schema building is very like
OO data design
E-R diagrams
 Schemas may be complex compared to
the documents
because humans 'intuitively understand' tag
names
Schema Standards
• XML Schema (current W3C standard)
• large, full-featured, unimplemented
• XML-Data
• early contender, supported by Microsoft
• reduced set of XML-Data is part of IE5.
• DCD
• joint creation of Microsoft and IBM
• simpler version of XML-Data
Schema Standards
• SOX
• XML structures via OO-inheritance
• Schematron
• uses XSLT for schemas
• DSD
• like Schematron with simpler XML syntax
• RELAX
• based on hedge automata theory
• much simpler than XML Schema
Schema
History
Schema Problems
 Legal implications of schemas as
contracts
Eskimo Snow and Scottish Rain:
Legal Considerations of Schema Design
 http://www.w3.org/TR/md-policy-design
syntactic operability with semantic fault
occurs because DTDs and schemas mix
 syntax
 semantics
Schema Problems
• W3C standard "XML Schemas"
• Too big, too complex
• XML 1.0 spec = 30 pages, Schemas >200
• Too much, too soon
• it isn't clear that many developers are sure
what to do with this enormous toolkit today.
• Competitors
Defining a Schema (IE5)
 Take an example XML document instance
<?xml version="1.0"?>
<pizzaOrder>
<when>18:04:30</when>
<cost>8.75</cost>
<pizza>Hot n Spicy</pizza>
</pizzaOrder>
Defining a Schema (IE5)
 First declare that it uses a schema
definition via the default namespace
<?xml version="1.0"?>
<pizzaOrder
xmlns="x-schema:pizzaOrderSchema.xml">
<when>18:04:30</when>
<cost>8.75</cost>
<pizza>Hot n Spicy</pizza>
</pizzaOrder>
Defining a Schema (IE5)
 Now create an outline schema
<Schema
xmlns="urn:schemas-microsoft-com:xml-data">
...
</Schema>
 i.e. an XML document in the schema
language namespace
Defining a Schema (IE5)
 First we declare the kinds of elements we
have
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
<ElementType name="when"/>
<ElementType name="cost"/>
<ElementType name="pizza"/>
<ElementType name="pizzaOrder"/>
</Schema>
Defining a Schema (IE5)
 and specify allowable content
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
<ElementType name="when" content="textOnly"/>
<ElementType name="cost" content="textOnly"/>
<ElementType name="pizza" content="textOnly"/>
<ElementType name="pizzaOrder" content="eltOnly"/>
</Schema>
textOnly, eltOnly, mixed, empty
Defining a Schema (IE5)
 and then content model
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
<ElementType name="when" content="textOnly" />
<ElementType name="cost" content="textOnly"/>
<ElementType name="pizza" content="textOnly"/>
<ElementType name="pizzaOrder" content="eltOnly">
<element type="when"/>
<element type="cost"/>
<element type="pizza"/>
</ElementType>
</Schema>
Defining a Schema (IE5)
 and even the content model for text
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
<ElementType name="when" content="textOnly"
type="time"/>
<ElementType name="cost" content="textOnly"
type="float"/>
<ElementType name="pizza" content="textOnly"/>
<ElementType name="pizzaOrder" content="eltOnly">
<element type="when"/>
<element type="cost"/>
<element type="pizza"/>
...
Defining a Schema (IE5)
 But that requires another namespace
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ElementType name="when" content="textOnly"
dt:type="time"/>
<ElementType name="cost" content="textOnly"
dt:type="float"/>
<ElementType name="pizza" content="textOnly"/>
<ElementType name="pizzaOrder" content="eltOnly">
<element type="when"/>
<element type="cost"/>
...
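Putting the steps from these slides together, the finished schema might look like the sketch below. It assumes the truncated listings continue exactly as in the earlier slides (the remaining element reference and the closing tags), and that the file is saved as pizzaOrderSchema.xml so that the x-schema: reference in the instance document resolves.

```xml
<?xml version="1.0"?>
<!-- pizzaOrderSchema.xml: assembled from the steps on the preceding slides -->
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Leaf elements: text content, two of them with a data type -->
  <ElementType name="when" content="textOnly" dt:type="time"/>
  <ElementType name="cost" content="textOnly" dt:type="float"/>
  <ElementType name="pizza" content="textOnly"/>
  <!-- Container element: child elements only, no text of its own -->
  <ElementType name="pizzaOrder" content="eltOnly">
    <element type="when"/>
    <element type="cost"/>
    <element type="pizza"/>
  </ElementType>
</Schema>
```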
Schema Components
 Schemas build on the declarations and
usage of elements and attributes
 ElementType elements declare a kind of element
 AttributeType elements declare a kind of attribute
 element elements show the use of an element within
the context of another element
 attribute elements show the use of an attribute on an
element
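A minimal sketch of how the four constructs fit together. The size attribute is hypothetical (it does not appear in the pizza example) and is added only to show AttributeType and attribute alongside ElementType and element.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- AttributeType declares a kind of attribute (hypothetical "size") -->
  <AttributeType name="size"/>
  <!-- ElementType declares a kind of element -->
  <ElementType name="pizza" content="textOnly">
    <!-- attribute: use of the declared attribute on this element -->
    <attribute type="size"/>
  </ElementType>
  <ElementType name="pizzaOrder" content="eltOnly">
    <!-- element: use of the declared element inside pizzaOrder -->
    <element type="pizza"/>
  </ElementType>
</Schema>
```

Under this sketch, an instance could contain a pizza element carrying a size attribute, nested inside pizzaOrder.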
Schema Components
 Various attributes specify the allowable
properties of each element or attribute
 model specifies whether the element may contain
'foreign' elements, not specified in the schema
 minOccurs and maxOccurs put lower and upper
bounds on the repetition of an element
 order specifies whether subelements must appear in
the order specified, or whether only a single
subelement can be chosen
 required states that an attribute must be present
 default gives a default value for a missing attribute
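The properties above might be combined as in the following sketch. The attribute names size and crust are hypothetical, and the attribute values used (model="closed", order="seq", required="yes", maxOccurs="*") are given as recalled from the XML-Data Reduced vocabulary; treat them as assumptions to check against the MSXML documentation.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- size must be present; crust falls back to "thin" when omitted -->
  <AttributeType name="size" required="yes"/>
  <AttributeType name="crust" default="thin"/>
  <ElementType name="pizza" content="textOnly">
    <attribute type="size"/>
    <attribute type="crust"/>
  </ElementType>
  <ElementType name="when" content="textOnly"/>
  <!-- model="closed": no undeclared ('foreign') elements;
       order="seq": children must appear in the order listed -->
  <ElementType name="pizzaOrder" content="eltOnly"
               model="closed" order="seq">
    <!-- exactly one when -->
    <element type="when" minOccurs="1" maxOccurs="1"/>
    <!-- at least one pizza, no upper limit -->
    <element type="pizza" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```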
Schema Data Types
 Microsoft's Schema provides 23 built-in
data types to which textual content can
conform
various numeric types (float, ints)
date, time, uri, uuid, char, hex, boolean and
blob
 No derived / extended types are allowed
 A separate namespace labels the data type vocabulary
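As a sketch of how the data type namespace is used, the fragment below attaches a few of the built-in types to element and attribute declarations. The names quantity, orderDate, delivered and ref are hypothetical and not part of the pizza example.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- dt:type comes from the separate datatypes namespace -->
  <ElementType name="quantity" content="textOnly" dt:type="int"/>
  <ElementType name="orderDate" content="textOnly" dt:type="date"/>
  <ElementType name="delivered" content="textOnly" dt:type="boolean"/>
  <!-- attributes can be typed in the same way -->
  <AttributeType name="ref" dt:type="uuid"/>
</Schema>
```

With these declarations a validating parser would be expected to reject, for example, a quantity whose text is not an integer.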
Using Data Types
 A node's validated data type is directly
accessible within the IE DOM
DOMElement.nodeTypedValue
instead of .nodeValue or .text
 A node's schema definition is available via the
DOMElement.definition property
see weather.html