Introduction to SGML
for the W3C Markup Validator

What is SGML?

SGML stands for Standard Generalized Markup Language. The name is a slight misnomer, since SGML is really a meta-language: a language for writing markup languages. HTML is a markup language written in SGML, or an "SGML application", to use the terminology.

You don't actually have to know much about SGML to use The Validator successfully. If you're interested, though, I recommend TEI's "A Gentle Introduction to SGML" as a good starting point. For in-depth treatment of SGML and HTML I recommend Martin Bryan's "Web SGML and HTML 4.0 Explained".

What is a DTD?

For our purposes, a DTD, or Document Type Definition, is simply a file that defines the syntax of an SGML-based language. The DTDs for HTML 2.0 and HTML 3.2 were written by the HTML Working Group of the IETF, in collaboration with the W3C. From HTML 4.0 on (this includes XHTML), the standards (both prose and DTDs) have been written by the W3C.

What is this DOCTYPE thing The Validator keeps pestering me for?

A DOCTYPE is an SGML document type declaration. Its purpose is to tell an SGML parser what DTD it should use to parse the document. It appears as the first line of the document, and has the form:

<!DOCTYPE html PUBLIC "quoted string">

The quoted string is called a public identifier; it refers to the desired DTD by a "well-known" name, usually defined by an associated standard.
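
To make the parts of the declaration concrete, here is a minimal Python sketch that pulls the public identifier out of a DOCTYPE written in the form shown above. It is an illustration only, not how The Validator or any SGML parser actually works: the function name parse_doctype is hypothetical, the regular expression handles only double-quoted strings, and the optional second quoted string (a system identifier) is an assumption about declarations that go beyond the basic form described here.

```python
import re

# Simplified pattern for: <!DOCTYPE name PUBLIC "public identifier">
# An optional second quoted string (a system identifier) is tolerated.
# Assumption: only double-quoted strings are handled; real SGML is more lenient.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[A-Za-z][\w.-]*)\s+PUBLIC\s+"(?P<public_id>[^"]*)"'
    r'(?:\s+"(?P<system_id>[^"]*)")?\s*>',
    re.IGNORECASE,
)


def parse_doctype(declaration):
    """Return (name, public_id, system_id), or None if the text does not match."""
    match = DOCTYPE_RE.match(declaration.strip())
    if match is None:
        return None
    return match.group("name"), match.group("public_id"), match.group("system_id")


# The public identifier of the HTML 4.01 Strict DTD, used here as sample input.
example = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'
print(parse_doctype(example))
```

The second element of the returned tuple is the "well-known" name that an SGML parser would use to look up the DTD.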

Most Web browsers don't actually use an SGML parser (in fact, none that I'm aware of do), so they don't need a DOCTYPE declaration and will ignore it if present. The Validator, however, does use an SGML parser, and therefore needs a DOCTYPE declaration. The Validator is more insistent on this point than WebTechs was: WebTechs would insert a DOCTYPE on the fly for you, whereas The Validator requires that your DOCTYPE already be in the document.

So now you're preparing to add a DOCTYPE to your document. Be sure that the syntax is as described above, and that you use the correct public identifier; otherwise, The Validator will use the wrong DTD, or will be unable to find a DTD at all, and will produce a huge list of absolutely meaningless errors.
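
If you want to catch the two failure cases above before submitting a page, a rough pre-flight check might look like the following Python sketch. Everything in it is an assumption made for illustration: check_doctype is a hypothetical helper, not part of The Validator, and KNOWN_PUBLIC_IDS is a deliberately small sample of public identifiers rather than a complete list.

```python
import re

# Illustrative subset only (an assumption of this sketch); a maintained list
# of doctypes is the authoritative source for what is actually recognized.
KNOWN_PUBLIC_IDS = {
    "-//IETF//DTD HTML 2.0//EN",
    "-//W3C//DTD HTML 3.2 Final//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
}

# Matches a DOCTYPE at the very start of the document (leading whitespace allowed)
# and captures the double-quoted public identifier.
PUBLIC_ID_RE = re.compile(r'^\s*<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def check_doctype(document):
    """Hypothetical helper: return 'missing', 'unknown', or 'ok' for a document."""
    match = PUBLIC_ID_RE.match(document)
    if match is None:
        return "missing"  # no DOCTYPE in the expected form at the top
    if match.group(1) not in KNOWN_PUBLIC_IDS:
        return "unknown"  # DOCTYPE present, but the identifier is not in our sample
    return "ok"


page = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<html><head><title>t</title></head></html>'
print(check_doctype(page))
```

A result of "unknown" does not prove the identifier is wrong, only that it is absent from this small sample; a misspelled identifier such as "-//W3C//DTD HTML 4.1//EN" would be reported the same way.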

The W3C QA Activity maintains a List of Valid Doctypes that you can choose from, and the WDG maintains a document on "Choosing a DOCTYPE".

WARNING: Some HTML editors will insert a DOCTYPE declaration for you. Unfortunately, this pre-inserted DOCTYPE will sometimes confuse The Validator. This usually occurs when the inserted DOCTYPE does not correspond to the generated HTML. If your editor adds a DOCTYPE to your page, you may need to correct it as described above before running your page through The Validator.