Web Unit 3

This document provides a comprehensive overview of XML, detailing its structure, syntax, and best practices for usage. It covers the differences between well-formed and valid XML, the role of Document Type Definitions (DTDs) and XML Schemas (XSDs), and includes practical guidance on choosing between them. Additionally, it discusses various XML constructs such as processing instructions, CDATA sections, and comments, while referencing authoritative sources and providing examples throughout.

Uploaded by

learnwithprodrive

Executive Summary

XML (Extensible Markup Language) is a textual data format that represents
information as a hierarchy of tagged elements, similar to a tree. Each XML
document has exactly one root element, and all other elements nest within
it[1]. Elements have names (tags) and may carry attributes and text. XML
was designed to be both human-readable and machine-readable, and it
relies on strict rules for well-formedness (correct syntax) and validity
(meeting a schema or DTD). This report thoroughly covers XML syntax
(elements, attributes, namespaces, processing instructions, CDATA,
comments, entities), best practices (e.g. meaningful tag names[2]), and
contrasts well-formed vs valid XML. It then delves into DTDs (Document
Type Definitions) – the original XML schema mechanism – explaining
internal vs external subsets, declarations for elements, attributes, entities,
notations, and content models (ANY, EMPTY, mixed, sequences/choices with
occurrence indicators). We show parameter entities and sample DTDs, and
outline validating XML against a DTD. Next, XML Schema (XSD) is
examined: its XML-based schema language, built-in/simple/complex types,
elements and attributes, namespaces, <import>, <include>, <redefine>, type
derivation (extension/restriction), substitution groups, and identity
constraints (<xsd:key>, <xsd:unique>, <xsd:keyref>). We provide example
schemas and validation steps. The report also offers practical guidance on
choosing DTD vs XSD (and migrating), common tools (e.g. xmllint, Xerces,
IDE plugins) for validation, and numerous examples. A comparison table
highlights differences in expressiveness, data typing, namespace support,
extensibility, and tooling. Throughout, authoritative sources (W3C specs) are
cited for definitions and rules, and diagrams (e.g. a timeline of XML/XSD and
a node-tree illustration) clarify structure. The content is organized with clear
headings and concise paragraphs, suitable for learning or teaching XML
concepts deeply and rigorously.
timeline
title Timeline of XML and Schema Standards
1998 : **XML 1.0** (W3C Rec.)
1999 : **Namespaces in XML 1.0** (W3C Rec.)
2001 : **XML Schema 1.0** (W3C Rec., Parts 1 & 2)
2004 : **XML Schema 1.0, Second Edition** (W3C Rec., Parts 1 & 2)
2012 : **XML Schema 1.1** (W3C Rec., Part 1: Structures and Part 2: Datatypes)

XML Elements, Attributes, and Document Structure

An XML document is structured as nested elements. Each element has a
start-tag <name> and an end-tag </name> (or can be empty with <name/>). For
example:
<note> ... </note>

Here <note> is the root element. Elements may contain text, other child
elements, or be empty. By definition, an XML document must have exactly
one root (document) element, and all other elements must nest properly
within it[1]. This is why XML data naturally forms a tree structure (see
diagram below).
Figure: XML document as a node-tree (root element with children and text
nodes). XML’s hierarchical structure enforces a single root and proper
nesting[1].
Example: In the simple XML below, <bookstore> is the root, containing
multiple <book> children, each with their own subelements and attributes:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
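The tree structure can be explored programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on a trimmed copy of the bookstore document; the helper name summarize is an illustrative assumption, not part of any standard.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
</bookstore>"""

# Parse from bytes so the encoding declaration is honoured by the parser.
root = ET.fromstring(BOOKSTORE_XML.encode("utf-8"))


def summarize(root):
    """Return (category, title, price) for every <book> child of the root."""
    rows = []
    for book in root.findall("book"):
        rows.append((book.get("category"),          # attribute value
                     book.findtext("title"),        # text of a child element
                     float(book.findtext("price"))))
    return rows


books = summarize(root)
print(root.tag, books)
```

Attributes are read with get(), while child text is read with findtext(); both operate relative to the element they are called on, which mirrors the parent/child nesting described above.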

Each element name (tag) must follow XML naming rules: the first character
must be a letter or underscore (_), and it cannot begin with the letters “XML”
(in any case)[1][2]. After the first character, letters, digits, hyphens,
underscores, and dots are allowed. Authors should choose meaningful names
(words or combinations) and avoid spaces or purely symbolic names[2]. For
example, use <invoiceDate> instead of <d>. Certain symbols (like <, &) cannot
appear in names because they serve as markup delimiters.
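The naming rules above can be approximated with a regular expression. This is only a sketch: it is ASCII-only, whereas the real XML Name grammar also admits many non-ASCII letters and the colon, and the function name is_reasonable_name is hypothetical.

```python
import re

# ASCII-only approximation (assumption): first char is a letter or underscore,
# later chars may also be digits, dots, and hyphens.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_reasonable_name(name: str) -> bool:
    """True if name fits the simplified rules and avoids the reserved 'xml' prefix."""
    if not NAME_RE.fullmatch(name):
        return False
    return not name.lower().startswith("xml")


for candidate in ["invoiceDate", "2ndItem", "xmlData", "first name", "_id"]:
    print(candidate, is_reasonable_name(candidate))
```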
Attributes are name–value pairs inside a start-tag, used to add metadata to
an element:

<book id="bk101" available="true">

Here id and available are attributes. Attribute values must be quoted
(single or double quotes), and an attribute name may appear only once in a
start-tag. The order of attributes is not significant. A non-validating
parser reports every attribute it reads to the application, declared or not;
only validation against a DTD or schema restricts which attributes may
appear (see below). Attribute types available in a DTD include CDATA (any
text), ID, IDREF, ENTITY, etc. We cover attribute declarations under DTD and
XSD below.
Character Data & Entities: Text between tags is called character data.
Certain characters have special meanings in XML ( <, >, &, quotes). To include
them as literal data, use entity references or CDATA sections (below).
Standard predefined entities are &lt; for <, &gt; for >, &amp; for &, &apos; for
', and &quot; for "[3]. For example, to write 5 < 10 in XML, one could write:

<note>5 &lt; 10</note>

or use a CDATA section:

<note><![CDATA[5 < 10]]></note>

CDATA sections (marked <![CDATA[ ... ]]>) tell the parser to treat enclosed
text literally (so < and & need not be escaped)[3]. They cannot nest, so the
sequence ]]> must not appear inside a CDATA block[4][3]. CDATA is useful
for embedding chunks of text that include characters which would otherwise
be seen as markup.
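Both escaping routes produce the same character data once parsed. The sketch below uses xml.sax.saxutils.escape and xml.etree.ElementTree from Python's standard library; the sample string is an arbitrary illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "5 < 10 & 3 > 2"

# Route 1: replace &, < and > with entity references before embedding the text.
escaped = escape(raw)
as_entities = ET.fromstring(f"<note>{escaped}</note>")

# Route 2: wrap the raw text in a CDATA section (safe only if it lacks "]]>").
as_cdata = ET.fromstring(f"<note><![CDATA[{raw}]]></note>")

print(escaped)
print(as_entities.text == as_cdata.text == raw)
```

After parsing, an application cannot tell which route the author used: the parser hands over the same text either way.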
Comments and Processing Instructions: XML supports comments
(<!-- ... -->) anywhere outside other markup[5]. Comments cannot
contain the sequence -- (double hyphen). They are notes for human readers;
a parser is not required to pass them on to the application. Processing
instructions (PIs) allow embedding information for applications. They start
with <?target and end with ?>. For example,
<?xml-stylesheet type="text/xsl" href="[Link]"?> can link an XSLT
stylesheet. Per the XML specification, a PI begins with a target (a name
identifying the application) and continues until ?>; PIs are not part of the
document’s character data but must be passed through to the application[6].
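A DOM parser exposes comments and processing instructions as their own node types. The following sketch uses Python's standard xml.dom.minidom; the stylesheet name style.xsl and the comment text are made-up placeholders.

```python
from xml.dom import minidom

# "style.xsl" is a hypothetical file name used only for illustration.
DOC = """<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<!-- reviewed by the editor -->
<note>Hello</note>"""

doc = minidom.parseString(DOC)

kinds = []
for node in doc.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        kinds.append(("pi", node.target, node.data))   # target + raw instruction text
    elif node.nodeType == node.COMMENT_NODE:
        kinds.append(("comment", node.data))           # text between <!-- and -->
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

print(kinds)
```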
Namespaces: To avoid naming collisions when mixing vocabularies, XML
uses namespaces. A namespace is identified by a URI; element/attribute
names can be placed in a namespace via prefixes or default declarations[7].
For example:

<root xmlns:h="[Link]"
      xmlns:f="[Link]">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>

Here h: and f: are prefixes bound to URIs. The W3C Namespaces spec
defines: “An XML namespace is identified by a URI reference; element and
attribute names may be placed in an XML namespace using the mechanisms
described…”[7]. In practice, you declare xmlns:prefix="URI" on an element;
the prefix is then in scope for that element and its descendants, and only
names written with the prefix belong to the namespace. A default namespace
is set with xmlns="URI" and applies to unprefixed element names, never to
unprefixed attributes. DTDs are not namespace-aware: they treat h:table as
an ordinary name that happens to contain a colon (another reason to prefer
XSD, as we will see).
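Namespace-aware parsers identify an element by its namespace URI plus local name, not by the prefix. A minimal sketch with Python's standard xml.etree.ElementTree follows; both URIs are hypothetical stand-ins, since the example above elides the real ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs chosen for illustration only.
NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

DOC = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>"""

root = ET.fromstring(DOC)

# The prefix map is only a lookup aid; ElementTree stores names as {uri}local.
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

print(html_table.tag)
print(furniture_table.tag)
print(root.findtext("f:table/f:name", namespaces=NS))
```

The two table elements share a local name but carry different expanded names, which is exactly how the collision is avoided.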
Best Practices – Tag Naming and Structure: XML tags should be
meaningful words (e.g. <customerAddress>, not <cAddr>), typically in
lowercase, camelCase, or PascalCase, applied consistently. Avoid spaces or
punctuation in names[2]. Structure your
document logically (e.g. group related elements under a container element).
By convention, use one element per concept and nest semantically. Also,
minimize redundant levels. E.g., for a list of items, use
<items><item>…</item></items>. The tree structure should be clear, with a
single root. And always quote attribute values and close empty tags with />.
For example: <br/> instead of <br> in XML.

Well-Formed vs Valid XML

A well-formed XML document obeys XML syntax rules: one root element,
properly nested tags, all tags closed, attributes quoted, no illegal characters,
etc[1][8]. It need not conform to any schema or DTD, but it must satisfy
XML’s grammar. For example, this is well-formed XML (but has no schema,
so it’s not valid):

<?xml version="1.0"?>
<greeting>Hello, world!</greeting>

A valid XML document is well-formed and complies with the constraints
defined in its DTD or XML Schema. The W3C XML spec states: “An XML
document is valid if it has an associated document type declaration and if
the document complies with the constraints expressed in it.”[8]. That means
the document’s elements and attributes appear in allowed contexts, with
correct content and datatypes as dictated by the DTD/XSD.
Example – Well-Formed but Not Valid:
Given the above <greeting> example, it has no <!DOCTYPE> or schema
declaration, so it is well-formed but not valid. Alternatively, if a DTD required
two child elements <to> and <from>, but the XML omitted one, it would still
be well-formed but fail validation.
Example – Not Well-Formed:

<note><to>Tove</to><from>Jani</to></note>

This is not well-formed (the <from> tag is closed with </to>). The parser will
report an error. Likewise, an unescaped & as in <text>AT&T</text> is not
well-formed; it must be written <text>AT&amp;T</text>.
Key Distinctions:
- Well-formedness is strictly syntactic and can be checked by any XML
parser.
- Validity requires a schema/DTD and ensures the content model and data
types match expectations.
In summary, any valid XML is necessarily well-formed, but a well-formed
document isn’t automatically valid (it must explicitly reference a DTD or
Schema and meet its rules).
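Because well-formedness is purely syntactic, any XML parser can test it. The sketch below wraps Python's standard xml.etree.ElementTree parser; the function name is_well_formed is an illustrative choice, and the check says nothing about validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<greeting>Hello, world!</greeting>",           # fine
    "<note><to>Tove</to><from>Jani</to></note>",    # mismatched end-tag
    "<text>AT&T</text>",                            # unescaped ampersand
    "<text>AT&amp;T</text>",                        # escaped ampersand
    "<a/><b/>",                                     # two root elements
]
for sample in samples:
    print(is_well_formed(sample), sample)
```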

XML: Processing Instructions, CDATA, and Comments

Beyond elements and attributes, XML supports several other constructs:
- Processing Instructions (PI): As noted, PIs start with <? and end
with ?>. They are intended for applications. E.g.,

<?xml-stylesheet type="text/xsl" href="[Link]"?>

Here xml-stylesheet is the target. The XML spec defines PIs as
“instructions for applications”; they are passed to the application but
not part of the element content[6]. Target names beginning with xml (in
any case) are reserved. The <?xml ...?> line at the top of a document is
the XML declaration, which uses PI-like syntax but is not itself a
processing instruction.

- CDATA Sections: Introduced above. Useful for embedding chunks of
text (like code snippets) that contain < or &. The content of a CDATA
section is not parsed as markup – only the closing ]]> is recognized. In
the earlier <note><![CDATA[5 < 10]]></note> snippet, for example, the <
inside the CDATA section does not need escaping. The spec notes that
within CDATA “only ]]> is recognized as markup”[3].

- Comments: Written <!-- ... -->, comments can appear almost
anywhere outside other markup[5]. They are meant for human readers, and
parsers are not required to pass them to the application. Per the spec,
“Comments may appear anywhere in a document outside other markup; in
addition, they may appear within the DTD.”[5]. Comments must not contain
-- (double hyphen). Example:

<!-- This is a comment -->
<note> … </note>

- Entity References: Apart from the five predefined entities, you can
define general entities (text substitutions) in a DTD (e.g. <!ENTITY
writer "Donald Duck">) and use them like &writer;. These allow
reusing text or marking reserved characters. We’ll see entity
declarations under DTD.

These constructs do not affect the element hierarchy but are important for
embedding instructions, text blocks, or annotations.

DTD (Document Type Definition)

A DTD defines the legal structure and vocabulary (elements, attributes,
entities) of a class of XML documents. It can be internal (inside <!DOCTYPE>
in the same XML file) or external (in a separate .dtd file referenced by a
SYSTEM or PUBLIC identifier).

DOCTYPE Declaration
The DTD is declared with a DOCTYPE. Example of internal DTD in an XML file:
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to><from>Jani</from><heading>Reminder</heading><body>…</body>
</note>

This declares note as the root element containing to, from, heading, body,
each text-only (#PCDATA)[9]. You can also use an external DTD:
<!DOCTYPE note SYSTEM "[Link]">

with [Link] containing those <!ELEMENT> lines.

Element Declarations
An element declaration has the form <!ELEMENT name contentspec>, where
contentspec describes what children or text are allowed. Content models
include:
- EMPTY (element must be empty, no content), e.g. <!ELEMENT br EMPTY>[10].
- ANY (any content allowed, not usually recommended), e.g. <!ELEMENT
container ANY>[11].
- A sequence or choice of sub-elements: (a,b,c) means element “a” then “b”
then “c” in order; (a|b|c) means a choice of one of those. E.g. <!ELEMENT
para (text|emph)*> means a paragraph can have any number of text or
<emph> children in any order[10].
- Mixed content: (#PCDATA | child1 | child2)* allows text interspersed with
specified child elements. E.g. <!ELEMENT p (#PCDATA|a|i|b)*> allows text and
<a>, <i>, <b> in any order[12].

You can use occurrence indicators: ? (0 or 1), * (0 or more), + (1 or more)
after an element or a group. For example:

<!ELEMENT person (firstname, lastname, phone*)>

means <person> must contain exactly one <firstname>, one <lastname>, and
zero or more <phone> children. The spec details that an element model is
essentially a regular expression over child element types[13].
Examples:

<!ELEMENT list (item+)> <!-- list has one or more item elements -->
<!ELEMENT section (title, (p|list)*, note?)>
<!ELEMENT emptyExample EMPTY>
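Since a content model is essentially a regular expression over child element names, it can be translated into one. The sketch below is a simplification under stated assumptions: it handles element-only models (no #PCDATA, EMPTY, or ANY), does not check determinism, and the helper names model_to_regex and children_match are hypothetical.

```python
import re


def model_to_regex(model: str):
    """Translate an element-only DTD content model into a compiled regex.

    Each child name becomes the token "name " so that "p" cannot match "para".
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|,?*+]", model):
        if token == ",":
            continue                      # a sequence is plain concatenation
        elif token == "(":
            parts.append("(?:")           # group without capturing
        elif token in ")|?*+":
            parts.append(token)           # same meaning in both notations
        else:
            parts.append(f"(?:{re.escape(token)} )")
    return re.compile("".join(parts))


def children_match(model: str, child_names) -> bool:
    """Check a list of child element names against the content model."""
    sequence = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(firstname, lastname, phone*)", ["firstname", "lastname"]))
print(children_match("(title, (p|list)*, note?)", ["title", "p", "list", "note"]))
print(children_match("(item+)", []))
```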

Attribute-List Declarations
After element declarations, you can declare attributes for an element with
<!ATTLIST>. Syntax: <!ATTLIST element-name attrName attrType defaultDecl>.
For example:

<!ATTLIST person
id ID #REQUIRED
lang CDATA #IMPLIED
status (single|married|divorced) "single">

This says <person> has an id attribute of type ID (a unique XML ID) and it is
required, a lang attribute of type CDATA (optional), and a status attribute
whose value must be one of the enumerated choices; if omitted, the default
is "single". Key points from the spec: ID values must be unique document-
wide[14], and an element may have at most one ID attribute[15]. Other
types include IDREF(S), ENTITY(IES), NMTOKEN(S), NOTATION, as defined in XML
1.0[16][17].
Default declarations control optional/required status:
- #IMPLIED means optional.
- #REQUIRED means it must appear.
- #FIXED "value" means if present, it must equal that value; if absent, it is as
if that value is used anyway.
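The effect of these default declarations can be sketched as a small function that fills in defaults and rejects violations. This is an illustration only: the dictionary layout, the "DEFAULT" label for a plain default value, and the name apply_attlist are assumptions of this sketch, and ID uniqueness is not checked.

```python
# name -> (type, rule, default). A tuple as the type means an enumeration.
# "DEFAULT" is this sketch's own label for a plain quoted default value.
PERSON_ATTLIST = {
    "id": ("ID", "#REQUIRED", None),
    "lang": ("CDATA", "#IMPLIED", None),
    "status": (("single", "married", "divorced"), "DEFAULT", "single"),
}


def apply_attlist(attrs, attlist):
    """Return attrs with defaults filled in, or raise ValueError on a violation."""
    result = dict(attrs)
    for name, (attr_type, rule, default) in attlist.items():
        if name not in result:
            if rule == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if rule in ("DEFAULT", "#FIXED"):
                result[name] = default        # behaves as if the value were written
        elif rule == "#FIXED" and result[name] != default:
            raise ValueError(f"attribute {name!r} must equal {default!r}")
        if isinstance(attr_type, tuple) and name in result \
                and result[name] not in attr_type:
            raise ValueError(f"{result[name]!r} is not allowed for {name!r}")
    return result


print(apply_attlist({"id": "p1"}, PERSON_ATTLIST))
```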
Entity Declarations
Entities are reusable pieces of text. Two kinds exist: general entities (used in
content) and parameter entities (used in DTD content). A general entity is
declared like:

<!ENTITY writer "Donald Duck">

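Then &writer; can be used in the XML content. General entities are often
used for special characters or common phrases[18]. A named entity such as
&copy; works only if the DTD declares it, because XML predefines just the
five entities listed earlier. Numeric character references (like &#169; for
©) are also allowed and need no declaration.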
Parameter entities (used in DTD only) start with %. For example:

<!ENTITY % htmlstruct "(head, body)">

<!ELEMENT html %htmlstruct;>

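This lets you factor out DTD definitions. Note that a parameter-entity
reference inside a markup declaration, as in the <!ELEMENT html ...> line
above, is permitted only in the external subset; in the internal subset,
parameter-entity references may appear only between declarations. Parameter
entities and external subsets allow modular DTDs. Detailed rules about
parameter entity expansion and nesting are in the XML spec[19][20].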
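General entities declared in an internal subset are expanded by the parser before the application sees the text. The sketch below shows this with Python's standard xml.etree.ElementTree; the document content is illustrative, and untrusted input with entity declarations should be treated with care because expansion can be abused.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ENTITY writer "Donald Duck">
]>
<note>Written by &writer; &#169; 2005</note>"""

# &writer; is replaced by its declared text; &#169; is a character reference.
note = ET.fromstring(DOC)
print(note.text)

# A named entity that was never declared is an error.
try:
    ET.fromstring("<note>&copy; 2005</note>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```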

Notation Declarations
Notations declare formats for non-XML (binary) data referenced in attributes.
E.g.: <!NOTATION imgpng SYSTEM "image/png">. Notation attributes tie an
element to an external data type. These are rare in typical XML usage but
part of the DTD spec[21][17].

Content Models in DTDs

Key content model keywords: EMPTY, ANY, and mixed models with
(#PCDATA|...)*. Sequences and choices use commas and pipes respectively,
within parentheses. Example content model forms:
- Sequence: (title, author, year)
- Choice (one of): (yes|no|maybe)
- Mixed (text+elements): (#PCDATA|b|i|u)*
- Occurrence: (item)+, child?, etc.
DTD content models must be deterministic (unambiguous): the parser should
always know, reading left to right, which element type to expect. Non-
deterministic models (like (a|b|a)) are not allowed[22].

Example DTD and Validation

Consider the earlier <note> example. The DTD is:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

To validate an XML file ([Link]) against this DTD ([Link]), a tool like
xmllint can be used:

xmllint --noout --dtdvalid [Link] [Link]

If [Link] is well-formed and follows the DTD, xmllint will produce no
output (no errors). If invalid, it will report which element/attribute is wrong.
For example, if <note> had <to> and <heading> only, it might say:

Element note content does not follow the DTD, expecting (to,from,heading,body).

Step-by-step, a validating parser reads the DTD, constructs the grammar,
then parses the XML, checking each element’s children and attributes
against the declarations.

Well-Formedness Constraints (DTD Section)

Even in a DTD context, the XML document must be well-formed: properly
nested tags, matching start/end tags, unique attribute names in one
element, etc. The DTD adds extra constraints but does not override basic
XML syntax rules.

XML Schema (XSD)

XML Schema (often called XSD) is a W3C standard (Parts 1 and 2) for
defining XML document structure and data types in XML syntax. It is far more
powerful and expressive than DTDs[23]. Schemas are themselves XML
documents, and use XML Namespaces (usually the namespace [Link]).

Basics of XSD
A schema document begins with <xs:schema> (or <xsd:schema>) with
namespace declarations, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="[Link]"
    targetNamespace="[Link]"
    xmlns="[Link]"
    elementFormDefault="qualified">
  ...
</xs:schema>
- targetNamespace declares the namespace of elements defined in this
schema.
- elementFormDefault="qualified" means local element names must be
qualified with this namespace.
- The schema uses <xs:element>, <xs:complexType>, <xs:simpleType>, etc., to
declare the valid structure.
XML schemas support namespaces, enabling mixing multiple schemas. You
can <xs:import> another namespace’s schema, <xs:include> another
schema in the same namespace, or <xs:redefine> to override declarations.

Elements and Types

In XSD, you declare elements either globally or locally. A global element is
top-level and can be referenced by name. For example:
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

This defines a global element <note> whose content is a complex type: a
sequence of four string elements to, from, heading, body. Attributes can be
declared inside the complexType with <xs:attribute>.
Simple Types: If an element has no sub-elements and no attributes, only
text, its type is a simple type. You can use built-in types like xs:string,
xs:int, xs:date, xs:boolean, etc. Or derive a custom simple type by
restricting a base type (simple types can also be built as lists or unions
of other simple types; extension applies to complex types).
For example:
<xs:simpleType name="ZipCodeType">
<xs:restriction base="xs:string">
<xs:pattern value="\d{5}(-\d{4})?"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="zip" type="ZipCodeType"/>

This defines ZipCodeType as a string matching US ZIP code patterns (five
digits, optionally followed by a hyphen and four more digits). The element
<zip> uses that type.
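The pattern facet can be tried out with Python's re module. This is a rough analogue rather than a schema validator: XSD patterns are implicitly anchored to the whole value, so fullmatch is the closest equivalent, and the function name is_zip is hypothetical.

```python
import re

# Same expression as the <xs:pattern> above; fullmatch() mimics XSD's
# implicit anchoring at both ends of the value.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_zip(value: str) -> bool:
    """True if value has the lexical form accepted by the ZipCodeType pattern."""
    return ZIP_PATTERN.fullmatch(value) is not None


for value in ["12345", "12345-6789", "1234", "123456789"]:
    print(value, is_zip(value))
```

Note that nine digits without the hyphen are rejected, because the pattern requires the hyphen before the four-digit extension.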
Schemas distinguish simpleType (no child elements or attributes, just a
value) vs complexType (can contain elements and attributes)[23].
Example: A complex type with attributes:
<xs:complexType name="PersonType">
<xs:sequence>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="lang" type="xs:string" default="en"/>
</xs:complexType>
<xs:element name="Person" type="PersonType"/>

This says <Person> has subelements <FirstName> and <LastName> and
attributes id (must be unique ID) and optional lang (default "en").

Derivation (Extension and Restriction)

XSD lets you derive new types from existing ones. For complex types, you
can extend by adding elements/attributes, or restrict by removing or
narrowing them. For simple types, you typically restrict (by pattern,
maxLength, enumerations, etc.). Example of extension:
<xs:complexType name="EmployeeType">
<xs:complexContent>
<xs:extension base="PersonType">
<xs:sequence>
<xs:element name="Position" type="xs:string"/>
</xs:sequence>
<xs:attribute name="salary" type="xs:decimal"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>

This EmployeeType extends PersonType by adding a <Position> element and a
salary attribute.

Substitution Groups
XSD supports substitution groups, where a global head element can be
replaced by other global elements. For instance, if <credit> and <debit> are
declared in the substitution group headed by <payment>, then wherever the
schema references <payment>, the instance may contain <credit> or <debit>
instead. This is an advanced feature enabling polymorphism. Membership is
declared with substitutionGroup="headElement" on the substituting element,
whose type must be the head's type or a type derived from it.

Identity Constraints (key, unique, keyref)

XML Schema allows you to enforce relational constraints:
- <xs:unique name="uniqueID"> ensures a field or combination of fields is
unique among all selected elements.
- <xs:key name="keyName"> is similar to unique but also requires the field to
be present in every selected element (like a primary key).
- <xs:keyref name="keyRef"> says a field’s values must match some key’s
values (like a foreign key).
These use XPath-like selectors and fields. For example:

<xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
<xs:complexType>
<xs:attribute name="isbn" type="xs:ID" use="required"/>
<xs:attribute name="title" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:unique name="uniqueISBN">
<xs:selector xpath="book"/>
<xs:field xpath="@isbn"/>
</xs:unique>
</xs:element>

This enforces each <book> in <catalog> must have a unique isbn attribute. A
<keyref> could enforce a reference from one element’s field to another’s key
field (e.g. an order referencing a customer key).
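The intent of unique and keyref can be emulated outside a schema processor. The sketch below uses Python's standard library on a made-up catalog; the <review book="..."> element and both function names are hypothetical, and a real validator would evaluate the selector and field XPath expressions from the schema instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_values(root, selector, attribute):
    """Emulate xs:unique: attribute values that occur more than once."""
    values = [el.get(attribute) for el in root.findall(selector)
              if el.get(attribute) is not None]
    return sorted(v for v, count in Counter(values).items() if count > 1)


def dangling_refs(root, key_selector, key_attr, ref_selector, ref_attr):
    """Emulate xs:keyref: referring values that match no key value."""
    keys = {el.get(key_attr) for el in root.findall(key_selector)}
    return [el.get(ref_attr) for el in root.findall(ref_selector)
            if el.get(ref_attr) not in keys]


catalog = ET.fromstring(
    '<catalog>'
    '<book isbn="a1"/><book isbn="b2"/><book isbn="a1"/>'
    '<review book="b2"/><review book="zz"/>'
    '</catalog>'
)
print(duplicate_values(catalog, "book", "isbn"))
print(dangling_refs(catalog, "book", "isbn", "review", "book"))
```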

Namespaces in Schema and Documents

Schemas usually define xmlns:xs="[Link]". An
XML document referencing the schema uses xsi:schemaLocation or
xsi:noNamespaceSchemaLocation to indicate which schema to use. For
example, without a target namespace:

<note xmlns:xsi="[Link]"
      xsi:noNamespaceSchemaLocation="[Link]">
  ...
</note>

or if using a targetNamespace:

<ns:note xmlns:ns="[Link]"
         xmlns:xsi="[Link]"
         xsi:schemaLocation="[Link] [Link]">
  ...
</ns:note>
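The xsi:schemaLocation value is a whitespace-separated list of namespace/location pairs. The following sketch reads these hints with Python's standard library; the instance namespace URI is the standard XML Schema instance namespace, while the document namespace http://example.com/note, the file name note.xsd, and the function name schema_hints are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(root):
    """Map namespace URI -> schema location (None = no-namespace schema)."""
    pairs = root.get(f"{{{XSI}}}schemaLocation", "").split()
    hints = dict(zip(pairs[0::2], pairs[1::2]))   # (namespace, location) pairs
    no_ns = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns is not None:
        hints[None] = no_ns
    return hints


# Hypothetical namespace and file name, for illustration only.
with_ns = ET.fromstring(
    '<ns:note xmlns:ns="http://example.com/note" '
    f'xmlns:xsi="{XSI}" '
    'xsi:schemaLocation="http://example.com/note note.xsd"/>'
)
without_ns = ET.fromstring(
    f'<note xmlns:xsi="{XSI}" xsi:noNamespaceSchemaLocation="note.xsd"/>'
)
print(schema_hints(with_ns))
print(schema_hints(without_ns))
```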
Example Schema and Validation
Consider the earlier <note> example. An equivalent XSD might be:

<xs:schema xmlns:xs="[Link]">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

With an XML file ([Link]):

<?xml version="1.0"?>
<note xmlns:xsi="[Link]"
      xsi:noNamespaceSchemaLocation="[Link]">
<to>Tove</to><from>Jani</from><heading>Reminder</heading><body>...</body>
</note>

To validate using xmllint (for example):

xmllint --noout --schema [Link] [Link]

If the content matches the schema, no error is reported. If an element is
missing or an unexpected element appears, you get a validation error (e.g.
"Element 'note': Missing child element(s). Expected: 'body'."). Type
mismatches are reported as well: <to>123</to> is fine for type xs:string,
but <year>abc</year> with type xs:int would fail with a message such as
“‘abc’ is not a valid value for 'xs:int'”.
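What a datatype check means for xs:int can be sketched in a few lines. This is an approximation written for illustration, not a schema validator: it accepts an optional sign and ASCII digits, trims surrounding XML whitespace, and enforces the 32-bit signed range; the name is_xs_int is hypothetical.

```python
import re

INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_xs_int(text: str) -> bool:
    """Approximate xs:int: optional sign, digits, 32-bit signed range."""
    value = text.strip(" \t\r\n")          # leading/trailing whitespace is collapsed
    if not INT_LEXICAL.fullmatch(value):
        return False
    return -2**31 <= int(value) <= 2**31 - 1


for sample in ["123", "abc", "2147483648", " 42 ", "4.0"]:
    print(repr(sample), is_xs_int(sample))
```

Any of these strings would be accepted by xs:string, which is why <to>123</to> passes while a non-numeric year fails.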

Built-in and Custom Data Types

XSD Part 2 provides a rich hierarchy of built-in types (numeric, string,
date/time, binary, etc.)[23]. There are primitive types like string,
decimal, boolean, date, and built-in derived types like integer,
positiveInteger, normalizedString, token, etc. You can constrain these with
facets: minInclusive, pattern, length, etc. For example, to restrict a year
value:

<xs:simpleType name="YearType">
<xs:restriction base="xs:gYear">
<xs:minInclusive value="2000"/>
<xs:maxInclusive value="2025"/>
</xs:restriction>
</xs:simpleType>

Unions and lists allow combining types: a union might allow an element to be
one of several types, and a list allows space-separated lists of values. See
the XML Schema spec for full details[24][25].
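The range facets and the list construct can be mimicked as follows. This sketch makes simplifying assumptions: it accepts only plain four-digit years (xs:gYear also allows a sign and a timezone suffix), and the function names is_year_type and parse_year_list are hypothetical.

```python
import re


def is_year_type(text: str) -> bool:
    """Approximate YearType: a four-digit year between 2000 and 2025 inclusive."""
    if not re.fullmatch(r"[0-9]{4}", text):    # assumption: no sign, no timezone
        return False
    return 2000 <= int(text) <= 2025           # minInclusive / maxInclusive


def parse_year_list(text: str):
    """Approximate an xs:list of YearType: whitespace-separated valid items."""
    items = text.split()
    if not all(is_year_type(item) for item in items):
        raise ValueError(f"invalid item in {text!r}")
    return [int(item) for item in items]


print(is_year_type("2005"), is_year_type("1999"))
print(parse_year_list("2001 2010 2025"))
```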

Schema Extensibility and Versioning

XML Schema is extensible. You can <xs:include> to split a schema into files,
<xs:import> for multiple namespaces, and <xs:redefine> to alter included
components. The <xs:schema> element carries a version attribute, and the
final and block attributes (with schema-wide finalDefault and blockDefault)
prevent further derivation or substitution if needed. XSD borrows
object-oriented ideas: types can be derived from one another and reused.

Example – Derived Types

As an example of type derivation, the W3C XSD tutorial shows creating a
Dutch_ZIP_Code type by restricting xs:string with a pattern[26] (here’s a
paraphrase):

<xs:simpleType name="Dutch_ZIP_Code">
<xs:restriction base="xs:string">
<xs:pattern value="\d{4} {0,1}[A-Z]{2}"/>
</xs:restriction>
</xs:simpleType>

Then an element can use type="Dutch_ZIP_Code". The schema is checked
with XML tools (like xmllint --schema) ensuring each <zip> matches that
regex. This illustrates the power of XSD over DTD for typed data.

DTD vs XML Schema (XSD) Comparison

| Feature | DTD | XML Schema (XSD) | Notes |
| --- | --- | --- | --- |
| Syntax | SGML-based (non-XML syntax) | XML-based (uses namespaces) | XSD is itself an XML document |
| Data Typing | None (all content is text) | Rich built-in types (string, integer, date, etc.)[23]; custom types via restriction/extension | e.g. validate number format, dates |
| Namespaces | Not supported | Full support; targetNamespace, import/include allowed | DTD cannot distinguish XMLNS |
| Element Models | ANY, EMPTY, mixed, sequence, choice[27][12] | All of the above plus <xs:sequence>, <xs:choice>, <xs:all> in complex types | XSD can express more complex models |
| Attributes | Basic (CDATA, ID, IDREF, ENTITY, NMTOKEN, NOTATION)[16][17] | Any simpleType (incl. lists/unions); default values, fixed values, required/optional via use | E.g. enforce integer vs arbitrary text |
| Extensibility | Limited; no inheritance of element types | High: complexType inheritance (extension/restriction), substitutionGroups | Supports reuse and libraries |
| Identity Constraints | No (only ID/IDREF attributes) | Yes: <xs:key>, <xs:unique>, <xs:keyref> support relational integrity | Unique IDs, foreign-key-like constraints |
| Validation Complexity | Can only check structure/order; no datatype checks | Validates both structure and data (patterns, ranges)[23] | e.g. pattern facets, numeric ranges |
| Mixed Content | Allowed via a mixed model that starts with #PCDATA[12] | Explicitly supported via mixed="true" in complexType | XSD is more flexible |
| Entities | Supports general and parameter entities; notations | No entity declarations (instances can still use general entities declared in a DTD); no parameter entities | Parameter entities allow DTD macros |
| Tooling & Adoption | Older; most XML parsers support it natively; simpler to learn | Modern; wide tool support (xmllint, Xerces, .NET, etc.)[23]; steeper learning curve | Newer and more powerful |
| Expressiveness | Simpler (no data typing, no namespaces) | More expressive (types, namespaces, modularity)[23] | Close to a “superset” of DTD capabilities |

As one expert notes, “an XML Schema provides [the structure defined by a
DTD] plus a detailed way to define what the data can and cannot contain. It
provides far more control for the developer over what is legal, and it
provides an Object Oriented approach”[28]. The table above and [22]
underscore that XSD covers nearly everything a DTD can express and adds
datatypes, namespaces, and inheritance[23]; entity declarations are the main
DTD feature it lacks.

When to Use DTD vs XSD

There remain scenarios for DTDs: legacy systems, simplicity, or
performance. XSDs (especially complex or large ones) are more verbose and
can be slower to load and process. SitePoint observes that DTDs are “mature and complex”
with existing libraries available, whereas XML Schema validation can impose
startup overhead (loading namespaces, DTDs for the schema itself, etc.)[29].
In environments where every millisecond counts or where only simple
validation is needed, a DTD might suffice. However, for most modern
applications (especially involving data exchange, web services, or where
data types matter), XSD is preferred for its precision.
The consensus:
- Use DTD if you need a quick, simple declaration of element structure, or if
working with old tools/standards that only support DTD.
- Use XML Schema if you need strong typing, namespace support, or
complex constraints (and have tools that support it).

Migrating from DTD to XSD

To convert a DTD to XML Schema, you can often use automated tools (e.g.
trang, OxygenXML converter) or manually rewrite:
- Declare a schema with the same root element.
- For each <!ELEMENT>, create an <xs:element> and an <xs:complexType> (with
<xs:sequence>/<xs:choice>) matching the DTD model.
- DTD attribute types (CDATA, ID, enums) map to XSD types (xs:string, xs:ID,
xs:NMTOKEN/enum restrictions).
- Entity declarations have no direct schema equivalent; hardcode entity
values or use CDATA as needed. Notation declarations can be carried over
with <xs:notation>.
As practical guidance, start by generating a basic schema from the DTD
(some XML editors can import DTD), then refine types and add namespaces.
Validate iteratively.
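The attribute-type mapping in the list above can be sketched as a small generator. This is an illustration under stated assumptions, not a conversion tool: it covers a handful of DTD types, ignores #FIXED, and the names DTD_TO_XSD and attribute_to_xsd are hypothetical.

```python
from xml.sax.saxutils import quoteattr

# Partial mapping of DTD attribute types to XSD built-in types.
DTD_TO_XSD = {
    "CDATA": "xs:string", "ID": "xs:ID", "IDREF": "xs:IDREF",
    "IDREFS": "xs:IDREFS", "NMTOKEN": "xs:NMTOKEN", "NMTOKENS": "xs:NMTOKENS",
}


def attribute_to_xsd(name, dtd_type, default_decl):
    """Render one DTD attribute definition as an <xs:attribute> string.

    dtd_type is a DTD type name or a tuple of enumerated values;
    default_decl is "#REQUIRED", "#IMPLIED", or a literal default value.
    """
    attrs = f"name={quoteattr(name)}"
    if default_decl == "#REQUIRED":
        attrs += ' use="required"'
    elif default_decl != "#IMPLIED":
        attrs += f" default={quoteattr(default_decl)}"
    if isinstance(dtd_type, str):
        return f'<xs:attribute {attrs} type="{DTD_TO_XSD[dtd_type]}"/>'
    enums = "".join(f"<xs:enumeration value={quoteattr(v)}/>" for v in dtd_type)
    return (f"<xs:attribute {attrs}><xs:simpleType>"
            f'<xs:restriction base="xs:NMTOKEN">{enums}</xs:restriction>'
            f"</xs:simpleType></xs:attribute>")


print(attribute_to_xsd("id", "ID", "#REQUIRED"))
print(attribute_to_xsd("status", ("single", "married", "divorced"), "single"))
```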

Validation Tools and Commands

A number of tools can check XML well-formedness and validity against DTDs
or XSDs:
- xmllint (libxml2): Common on UNIX/Linux.
  - Check well-formed: xmllint --noout [Link] (no output = well-formed).
  - Validate DTD: xmllint --noout --dtdvalid [Link] [Link].
  - Validate XSD: xmllint --noout --schema [Link] [Link].
  Example: xmllint --noout --schema [Link] [Link] (no errors if valid).

- Xerces (Apache): A validating parser available for Java and C++. It can
be used through code or through the sample command-line programs that ship
with it. Validates both DTD and XSD.

- XML IDEs/Editors: e.g. Oxygen XML Editor, XMLSpy, or even Visual
Studio/VS Code with XML plugins. They highlight validation errors with
DTD/XSD.

- Online Validators: Many websites allow pasting XML and XSD/DTD to
check validity.

- Browsers: Browsers generally check only well-formedness. Some older ones
could validate against a DTD named in the DOCTYPE, but this is old-fashioned
and not reliable for complex schemas.
When validating, common error messages include: missing required
elements/attributes, element not allowed here, datatype mismatch, or entity
undeclared. Always start by checking well-formedness before validating.
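The xmllint invocations listed above can be scripted. The sketch below builds the argument list from the flags already shown (--noout, --dtdvalid, --schema) and runs it with Python's subprocess module; it assumes xmllint is installed and on PATH, and the function names and file names are placeholders.

```python
import subprocess


def xmllint_command(xml_path, dtd=None, xsd=None):
    """Build the xmllint argument list for a well-formedness or validity check."""
    cmd = ["xmllint", "--noout"]
    if dtd is not None:
        cmd += ["--dtdvalid", dtd]
    if xsd is not None:
        cmd += ["--schema", xsd]
    return cmd + [xml_path]


def run_xmllint(xml_path, dtd=None, xsd=None):
    """Run xmllint and return (ok, messages). Assumes xmllint is on PATH."""
    done = subprocess.run(xmllint_command(xml_path, dtd, xsd),
                          capture_output=True, text=True)
    return done.returncode == 0, done.stderr


# Placeholder file names; nothing is executed here.
print(xmllint_command("note.xml"))
print(xmllint_command("note.xml", xsd="note.xsd"))
```

Checking well-formedness first (no dtd or xsd argument) keeps syntax errors separate from validity errors, as recommended above.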

Examples
Sample XML + DTD
XML ([Link]):

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE note SYSTEM "[Link]">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

DTD ([Link]):

<!ELEMENT note (to, from, heading, body)>

<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Validation:
xmllint --noout --dtdvalid [Link] [Link]

- If all elements appear in order, no output (valid).
- Error example: If <body> is missing in [Link], xmllint reports a validity
error along the lines of: “Element note content does not follow the DTD,
expecting (to , from , heading , body), got (to from heading )”.

Sample XML + XSD

XML ([Link]):

<?xml version="1.0"?>
<note xmlns:xsi="[Link]"
      xsi:noNamespaceSchemaLocation="[Link]">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XSD ([Link]):

<xs:schema xmlns:xs="[Link]">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

Validation:
xmllint --noout --schema [Link] [Link]

- If <body> were removed from [Link], xmllint would report an error along
the lines of: “Element 'note': Missing child element(s). Expected is ( body ).”
- If <to> contained the number 123 while the schema declares xs:string, the
document would still pass, because any sequence of digits is also a valid
string. But if <to> were declared with type="xs:int" and contained abc, the
error would be along the lines of: “Element 'to': 'abc' is not a valid value
of the atomic type 'xs:int'.”
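
The difference between xs:string and xs:int can be sketched in Python. The functions below approximate the two checks a schema validator performs on element text; they are an illustration of the datatype rules (optional sign plus decimal digits, 32-bit signed range), not a substitute for an XSD processor, and the names is_valid_xs_int and is_valid_xs_string are assumptions made for this example.

```python
import re

# Lexical form of xs:int: an optional sign followed by decimal digits.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")
# xs:int is a 32-bit signed integer.
_INT_MIN, _INT_MAX = -(2 ** 31), 2 ** 31 - 1


def is_valid_xs_int(text):
    """Approximate the xs:int check: lexical form plus value range."""
    value = text.strip(" \t\r\n")  # surrounding whitespace is ignored
    if not _INT_LEXICAL.fullmatch(value):
        return False
    return _INT_MIN <= int(value) <= _INT_MAX


def is_valid_xs_string(text):
    """Any character data is acceptable as xs:string."""
    return isinstance(text, str)


for sample in ["123", "abc", "2147483648"]:
    print(sample, is_valid_xs_string(sample), is_valid_xs_int(sample))
```

The value 123 passes both checks, abc passes only as a string, and 2147483648 is lexically an integer but falls outside the xs:int range.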

Comparison Summary
Both DTDs and XML Schema define the structure of XML documents, but they
differ in expressive power:

Aspect          DTD                                 XML Schema (XSD)

Syntax Format   SGML-like (not XML)                 XML-based, uses namespaces
Data Types      No (all text)                       Many built-in (string, date, integer, etc.)[23]
Namespaces      None                                Full support (targetNamespace, import/include)
Extensibility   Limited (no inheritance)            Extensive (extension/restriction, substitutionGroups)
Validation      Structure-only (order/occurrence)   Structure + data (patterns, value ranges)
Tool Support    Widespread, simple tools            Widespread (modern parsers, IDEs)
Use Cases       Simple/legacy, free-form text       Complex data with types (e.g. Web services)

As the XML Schema specification puts it, the schema language “substantially
reconstructs and considerably extends the capabilities found in XML 1.0
document type definitions (DTDs)”[30]. In practice, XML Schema is preferred
for new designs because of its rigor, and DTDs remain mainly for backwards
compatibility.

Further Reading and References

 W3C XML 1.0 (Fifth Edition) – for core syntax rules[31][2].
 W3C Namespaces in XML – for namespace definitions[7].
 W3C XML Schema Part 1: Structures – for schema component
definitions[30].
 W3C XML Schema Part 2: Datatypes – for built-in and derived
types[23].
 SitePoint, XML DTDs vs XML Schema – practical comparison[28][32].

This report draws on those authoritative sources (cited above) and various
XML tutorials/examples to ensure correctness and clarity. It is meant as a
comprehensive study guide on XML structure, DTDs, and XML Schema for
teaching or self-study.

[1] [2] [3] [4] [5] [6] [8] [10] [11] [12] [13] [14] [15] [16] [17] [19] [20] [21]
[22] [27] [31] Extensible Markup Language (XML) 1.0 (Fifth Edition)
[Link]
[7] Namespaces in XML 1.0 (Third Edition)
[Link]
[9] [18] XML DTD
[Link]
[23] [24] [25] XML Schema Part 2: Datatypes Second Edition
[Link]
[26] Introduction to XML Schemas - W3C
[Link]
[28] [29] [32] XML DTDs Vs XML Schema — SitePoint
[Link]
[30] XML Schema Part 1: Structures Second Edition
[Link]

