Because XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 [HTML] must be changed.
4.1 Documents must be well-formed
Well-formedness is a new concept introduced by [XML]. Essentially, this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest properly.
Although overlapping is illegal in SGML, it was widely tolerated in existing browsers.
CORRECT: nested elements
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
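As a rough illustration, the two examples above can be fed to any XML parser to see the difference. The sketch below uses Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` is an assumption made for this example and is not part of any specification.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

nested = "<p>here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

An XML processor rejects the overlapping form outright instead of recovering from it, which is the practical difference from the tolerant behavior of HTML browsers.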
4.2 Element and attribute names must be in lower case
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive; for example, <li> and <LI> are different tags.
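The case sensitivity can be observed with a generic XML parser. This is a small sketch using Python's standard `xml.etree.ElementTree`. The sample markup and the helper `parses` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# To an XML parser, <li> and <LI> are two unrelated element types.
root = ET.fromstring("<ul><li>one</li><LI>two</LI></ul>")
tags = [child.tag for child in root]
print(tags)

def parses(markup: str) -> bool:
    """Return True if the markup is accepted by the XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# A start tag and end tag that differ only in case do not match each other.
print(parses("<ul><li>one</LI></ul>"))
```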
4.3 For non-empty elements, end tags are required
In SGML-based HTML 4, certain elements were permitted to
omit the end tag, with the elements that followed implying closure. This
omission is not permitted in XML-based XHTML. All elements other than those
declared in the DTD as EMPTY must have an end tag.
CORRECT: terminated elements
<p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph.<p>here is another paragraph.
4.4 Attribute values must always be quoted
All attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
4.5 Attribute Minimization
XML does not support attribute minimization.
Attribute-value pairs must be written in full. Attribute names such as compact
and checked cannot occur in elements without their value being
specified.
CORRECT: unminimized attributes
<dl compact="compact">
INCORRECT: minimized attributes
<dl compact>
4.6 Empty Elements
Empty elements must either have an end tag or the start tag
must end with />. For instance, <br/> or <hr></hr>.
See HTML Compatibility
Guidelines for information on ways to ensure this is backward compatible
with HTML 4 user agents.
CORRECT: terminated empty tags
<br/><hr/>
INCORRECT: unterminated empty tags
<br><hr>
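The rules on lower-case names, quoted attribute values, unminimized attributes, and terminated empty elements are mechanical enough to apply with a small rewriter. The following is only a sketch built on Python's standard `html.parser`. The class name `XhtmlRewriter`, the function `to_xhtml`, and the sample input are assumptions made for this example. It does not add omitted end tags such as a missing `</p>`, and it does not treat script or style content specially.

```python
from html import escape
from html.parser import HTMLParser

# Elements declared EMPTY in the HTML 4 DTDs.
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}

class XhtmlRewriter(HTMLParser):
    """Re-serialize HTML 4 tags using XHTML syntax rules (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        self.out.append(self._start(tag, attrs, closed=tag in EMPTY_ELEMENTS))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._start(tag, attrs, closed=True))

    def handle_endtag(self, tag):
        # Empty elements were already closed in their start tag.
        if tag not in EMPTY_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def _start(self, tag, attrs, closed):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Minimized attribute: write the pair in full.
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')
        # The space before "/>" is a compatibility convention for HTML user agents.
        return "<" + " ".join(parts) + (" />" if closed else ">")

def to_xhtml(markup: str) -> str:
    rewriter = XhtmlRewriter()
    rewriter.feed(markup)
    rewriter.close()  # flush any buffered trailing text
    return "".join(rewriter.out)

print(to_xhtml("<TD ROWSPAN=3>cell</TD><DL COMPACT></DL><BR><HR>"))
```

The output of such a rewriter is not guaranteed to be well-formed or valid XHTML. It only shows how each syntactic rule maps to a concrete change in the serialized markup.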
4.7 Whitespace handling in attribute values
User agents will strip leading and trailing whitespace from attribute values and map sequences of one or more whitespace characters (including line breaks) to a single inter-word space (an ASCII space character for western scripts). See Section 3.3.3 of [XML].
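The normalization described here can be sketched in a few lines. The function name `normalize_attribute_value` is an assumption for this example, and the sketch only covers the ASCII whitespace characters that XML defines (space, tab, carriage return, line feed).

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")

def normalize_attribute_value(value: str) -> str:
    """Collapse whitespace runs to one space, then strip both ends."""
    return _XML_WHITESPACE.sub(" ", value).strip(" ")

print(repr(normalize_attribute_value("  main \n\t nav  ")))
```

Authors who rely on exact whitespace inside an attribute value should expect it to be altered in this way.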
4.8 Script and Style elements
In XHTML, the script and style elements are declared as
having #PCDATA content. As a result, < and &
will be treated as the start of markup, and entities such as &lt;
and &amp; will be recognized by the XML processor as entity
references to < and & respectively. Wrapping the
content of the script or style element within a CDATA marked
section avoids the expansion of these entities.
<script>
<![CDATA[
... unescaped script content ...
]]>
</script>
CDATA sections are recognized by the XML
processor and appear as nodes in the Document Object Model; see Section
1.3 of the DOM Level 1 Recommendation [DOM].
An alternative is to use external script and style documents.
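The effect of the CDATA marked section can be checked with a generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The script content and the helper `script_text` are invented for illustration.

```python
import xml.etree.ElementTree as ET

bare = "<script>if (a < b && c) { run(); }</script>"
wrapped = "<script><![CDATA[if (a < b && c) { run(); }]]></script>"
escaped = "<script>if (a &lt; b &amp;&amp; c) { run(); }</script>"

def script_text(markup: str):
    """Return the element's text content, or None if the markup is rejected."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None

print(script_text(bare))     # rejected: "<" and "&" are read as markup
print(script_text(wrapped))  # content passes through unchanged
print(script_text(escaped))  # entity references are expanded by the parser
```

Both the wrapped and the escaped forms give the XML processor the same script text. The bare form is not well-formed at all.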
4.9 SGML exclusions
SGML gives the writer of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called "exclusions") are not possible in XML.
For example, the HTML 4 Strict DTD forbids the nesting of
an 'a' element within another 'a' element to any
descendant depth. It is not possible to spell out such prohibitions in XML. Even
though these prohibitions cannot be defined in the DTD, certain elements should
not be nested. A summary of such elements and the elements that should not be
nested in them is found in the normative Appendix B.
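Because the DTD cannot express such prohibitions, a check has to be performed separately from validation. The following sketch covers only the 'a'-within-'a' case mentioned above. The function name `nested_anchors` is an assumption, and the element names are assumed to carry no namespace prefix in the parsed tree.

```python
import xml.etree.ElementTree as ET

def nested_anchors(element, inside_anchor=False):
    """Return every 'a' element that has an 'a' ancestor, at any depth."""
    offenders = []
    is_anchor = element.tag == "a"  # assumes tags without a namespace
    if is_anchor and inside_anchor:
        offenders.append(element)
    for child in element:
        offenders.extend(nested_anchors(child, inside_anchor or is_anchor))
    return offenders

bad = ET.fromstring('<p><a href="x">outer <em><a href="y">inner</a></em></a></p>')
good = ET.fromstring('<p><a href="x">one</a> <a href="y">two</a></p>')

print([a.get("href") for a in nested_anchors(bad)])
print(nested_anchors(good))
```

The same traversal could be extended to the other element pairs listed in Appendix B by passing a table of prohibited ancestors instead of a single flag.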
4.10 The elements with 'id' and 'name' attributes
HTML 4 defined the name attribute for the
elements a, applet, form, frame,
iframe, img, and map. HTML 4 also
introduced the id attribute. Both of these attributes are designed
to be used as fragment identifiers.
In XML, fragment identifiers are of type ID,
and there can only be a single attribute of type ID per element.
Therefore, in XHTML 1.0 the id attribute is defined to be of type ID.
In order to ensure that XHTML 1.0 documents are well-structured XML documents,
XHTML 1.0 documents MUST use the id attribute when defining
fragment identifiers, even on elements that historically have also had a name
attribute. See the HTML
Compatibility Guidelines for information on ensuring such anchors are
backwards compatible when serving XHTML documents as media type text/html.
Note that in XHTML 1.0, the name attribute of
these elements is formally deprecated, and will be removed in a subsequent
version of XHTML.
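One way to audit a document against this requirement is to list the elements that still rely on name alone. This is a minimal sketch. The function name `missing_ids` and the sample markup are assumptions, and the element names are assumed to carry no namespace prefix in the parsed tree.

```python
import xml.etree.ElementTree as ET

# Elements for which HTML 4 defined the name attribute, as listed above.
NAMED_ELEMENTS = {"a", "applet", "form", "frame", "iframe", "img", "map"}

def missing_ids(root):
    """Return elements that have a name attribute but no id attribute."""
    return [
        el for el in root.iter()
        if el.tag in NAMED_ELEMENTS and "name" in el.attrib and "id" not in el.attrib
    ]

doc = ET.fromstring(
    '<div>'
    '<a name="top">Top</a>'
    '<a id="intro" name="intro">Intro</a>'
    '<img src="logo.png" alt="" name="logo" />'
    '</div>'
)

print([(el.tag, el.get("name")) for el in missing_ids(doc)])
```

Adding the missing id is left to the author, since an id value must be a valid XML name and unique within the document, which an existing name value may not be.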
C.13 Cascading Style Sheets (CSS) and XHTML
The Cascading Style Sheets level 2 Recommendation [CSS2] defines style properties which are applied to the parse tree of the HTML or XML document. Differences in parsing will produce different visual or aural results, depending on the selectors used. The following hints will reduce this effect for documents which are served without modification as both media types:
CSS style sheets for XHTML should use lower case element and attribute names.
In tables, the tbody element will be inferred by the parser of an HTML user agent, but not by the parser of an XML user agent. Therefore you should always explicitly add a tbody element if it is referred to in a CSS selector.
Within the XHTML namespace, user agents are expected to recognize the "id" attribute as an attribute of type ID. Therefore, style sheets should be able to continue using the shorthand "#" selector syntax even if the user agent does not read the DTD.
Within the XHTML namespace, user agents are expected to recognize the "class" attribute. Therefore, style sheets should be able to continue using the shorthand "." selector syntax.
CSS defines different conformance rules for HTML and XML documents; be aware that the HTML rules apply to XHTML documents delivered as HTML and the XML rules apply to XHTML documents delivered as XML.
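The tbody hint can be illustrated from the XML side with a generic parser. The sketch below uses Python's standard `xml.etree.ElementTree`, and the table markup is invented for the example. It shows only that an XML parser builds exactly the tree that was written. The inference performed by HTML user agents is not reproduced here.

```python
import xml.etree.ElementTree as ET

implicit = ET.fromstring("<table><tr><td>cell</td></tr></table>")
explicit = ET.fromstring("<table><tbody><tr><td>cell</td></tr></tbody></table>")

# No tbody element is added by the XML parser, so tr is a direct child of table.
print([child.tag for child in implicit])

# With an explicit tbody, the tree has the same shape under either parser.
print([child.tag for child in explicit])
```

A selector that depends on tbody, such as `tbody > tr`, would therefore have nothing to match in the first tree when the document is processed as XML.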