User's guide for the W3C Markup Validator

Quick Start

Type (or copy and paste) the URL of the page you want to validate into the text field on the form and press the "Validate this page" button.

If you have a local file you want to validate, choose the "File Upload" link from the navigation menu. Select the button labeled "Browse..." (or similar, depending on your browser) and choose the file you want to upload in the usual manner for your operating system.

Introduction

The W3C Markup Validation Service is a web gateway to a well-known SGML parser called SP. SP will take your HTML and compare it to a set of objective syntax rules called a "DTD", a Document Type Definition. This way you can be sure your HTML is really valid and not just that it conforms to some random programmer's idea of "nice" HTML. Note that valid HTML does not guarantee that your pages will work OK in all browsers. Most of them are severely broken and you may need to find alternative ways of achieving your goal.

When you send a URL to the W3C Markup Validation Service, it will fetch that URL and feed it to the SGML parser. If you upload a file, it is fed directly into the SGML parser. We then take the output from the SGML parser, format it nicely as HTML and send it back to your web browser. The W3C Markup Validation Service isn't generating any of the error messages; they are all generated by the underlying SGML parser, which is checking your HTML against the actual standard for the version of HTML you are using.

The Options

In addition to the text field where you enter a URL -- or the file selection field if you are uploading files -- there are a few checkboxes that alter the behaviour of the validator.

These options are:

Encoding

This allows you to override the character encoding information about your document. You may use this option for test purposes, but you will eventually have to serve your document with the correct character encoding, or the validator will complain about it and your document will not be valid.

Use Fallback instead of Override (Encoding) (fbc)

Uses the character encoding override mechanism described above, but only as a fallback if the actual document is not served with character encoding information. Think of this as a gentler override mechanism.

Type

This allows you to override the DOCTYPE declaration for your document. You may use this option for test purposes, but you will eventually have to include the correct DOCTYPE declaration in your document, or the validator will complain about it and your document will not be valid.

Use Fallback instead of Override (Type) (fbd)

Uses the DOCTYPE override mechanism described above, but only as a fallback if the actual document does not have a DOCTYPE declaration. Think of this as a gentler override mechanism.
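
The difference between override and fallback is the same for Encoding and for Type, and it can be sketched as a small decision function. This is only an illustration of the behaviour described above, not the validator's actual code; the function and parameter names are hypothetical.

```python
from typing import Optional


def effective_setting(declared: Optional[str],
                      chosen: Optional[str],
                      fallback_only: bool) -> Optional[str]:
    """Return the encoding or DOCTYPE that would be used for validation.

    declared      -- what the document itself states (None if it states nothing)
    chosen        -- what the user selected in the form (None if nothing selected)
    fallback_only -- True when "Use Fallback instead of Override" is ticked
    """
    if chosen is None:
        # No option selected: the document's own information is used.
        return declared
    if fallback_only and declared is not None:
        # Fallback mode: the user's choice only fills a gap.
        return declared
    # Plain override, or fallback with nothing declared.
    return chosen
```

With a document served as iso-8859-1 and utf-8 selected in the form, plain override yields utf-8, while fallback mode keeps iso-8859-1.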

Show source input (ss)

Displays the HTML source of the document you validated and links error messages directly to lines in this output. This makes it easy to see what's wrong.

Show an outline of this document (outline)

Generates an outline of your document from the H1 - H6 elements. For a properly formed document, this will be a nicely nested tree structure. The visualization of your document's structure makes it easier to see where you've skipped a heading.
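
To make the idea of an outline concrete, here is a minimal sketch that collects H1 - H6 headings with Python's standard html.parser module and flags places where a level is skipped. It is an independent illustration, not the validator's implementation; the class and function names and the wording of the "skipped" note are assumptions.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1..h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return indented outline lines, marking headings that skip a level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    lines = []
    previous = 0
    for level, text in collector.headings:
        note = "  <-- skipped a level" if level > previous + 1 else ""
        lines.append("  " * (level - 1) + f"h{level}: {text}{note}")
        previous = level
    return lines
```

Calling outline("<h1>Guide</h1><h3>Options</h3>") returns two lines, the second one indented and marked as skipping a level because no H2 precedes it.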

Show parse tree (sp)

Shows you exactly how the SGML Parser read your document. Probably best used only by advanced users as it deals with low-level SGML constructs.

Exclude attributes from the parse tree (noatt)

Suppress attributes from the parse tree to make it more readable.

Validate error pages

The Markup Validator will usually tell you if the page you tried to validate could not be retrieved (for example, if the server gave a "404 not found" message). In some circumstances you may want to validate the error page sent by the server instead. Use this option in that case.

Verbose Output

This option triggers verbose output. By default, the validator outputs only the validity result if the document is valid, but you may want more information; use this option in that case. The verbose output also provides helpful explanations in addition to the error messages, which makes it a generally useful option.

Calling/Linking to the Validator

You can link directly to the Validator home page, or you can call the Validator CGI program. The home page is <http://validator.w3.org/> at the moment (and for the foreseeable future) and the CGI program can be reached at <http://validator.w3.org/check>.

If you call the CGI program with extra path info matching "/referer" (i.e. <http://validator.w3.org/check/referer>) it will fetch the referring document and validate that. This means that if you embed a link to that URL in your pages, following that link will show you the validation results for that page.

You can also link to the validation results for a specific page. You do this by giving "check" a "uri" parameter pointing at the page you want to validate. For example, <http://validator.w3.org/check?uri=http://www.example.com/> will validate the www.example.com home page.

The various options are listed above in the section "The Options", in parentheses after the long name. To add options to your links directly, append them to the URL, separated by semicolons. For example, <http://validator.w3.org/check?uri=http://www.example.com/;ss=1;outline=1;sp=1> will validate the example.com home page with "Show Source", "Outline" and "Show Parse Tree" on, but "Exclude Attributes" off.

You may also see these separated by ampersands, but this usage is deprecated and support may be removed at some time in the future.
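
If you generate such links from a script, the URL can be assembled as in the following sketch. It assumes the CGI address and the semicolon-separated form given in this section; the helper name build_check_url is hypothetical, and the choice to percent-encode everything in the target URL except ":" and "/" is an assumption made so that the result matches the example above.

```python
from urllib.parse import quote

# Address of the CGI program, as given in this guide.
CHECK_URL = "http://validator.w3.org/check"


def build_check_url(page_url, **options):
    """Build a validation link for page_url with semicolon-separated options.

    Option names are the short names from "The Options" (ss, outline, sp, ...).
    """
    # Keep ":" and "/" readable; characters such as ";", "?" and "&" inside
    # the target URL are percent-encoded so they cannot be mistaken for
    # option separators.
    parts = ["uri=" + quote(page_url, safe=":/")]
    for name, value in options.items():
        parts.append(f"{name}={quote(str(value), safe='')}")
    return CHECK_URL + "?" + ";".join(parts)
```

For instance, build_check_url("http://www.example.com/", ss=1, outline=1, sp=1) reproduces the example link shown above.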

Interpreting the results

In spite of our efforts, interpreting the Markup Validator's error messages isn't quite what you'd call easy. The error messages are generated in the context of a full SGML environment which demands a somewhat higher level of technical detail than your average HTML document. We have set up a page listing errors and their explanations, which should help you find out what meaning lies behind the cryptic messages, and fix your markup.

We're working on ways to make the error messages more friendly, but for now, if the errors explanation page doesn't work for you, feel free to email the (publicly archived) www-validator@w3.org mailing list if you need help interpreting the results. This will have the added benefit of letting us know which error messages are causing the most trouble so we can fix those first. Please be as specific as possible and include the exact error message and, preferably, a URL we can validate to see for ourselves.

Output Options

In addition to the HTML output intended for human consumption in a browser, the Validator has some experimental features to generate machine-parseable output in a few different forms. To enable these output options, append ";output=<option>" to the URL of the validation results (an interface for these options will be provided when they exit the beta stage).

These options are experimental! The API and output format are subject to change without notice and may well be removed or disabled at any time. They are provided now to garner public feedback to determine how best to support this functionality in the future. One particularly likely option being considered is removing these features altogether in favor of a full-blown SOAP interface. You have been warned!
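
As a small illustration of appending the output option, the sketch below adds ";output=<option>" to an existing validation results URL. The three accepted names (earl, n3, xml) are the ones listed in this section; the function name with_output is hypothetical, and since the feature is experimental the accepted values may change.

```python
# Short names of the experimental output formats listed in this section.
OUTPUT_FORMATS = ("earl", "n3", "xml")


def with_output(results_url, fmt):
    """Append the experimental ";output=<option>" suffix to a results URL."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {fmt!r}")
    return f"{results_url};output={fmt}"
```

For example, with_output("http://validator.w3.org/check?uri=http://www.example.com/", "xml") gives the same link with ";output=xml" at the end.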

EARL/RDF (earl)
Produces output in the EARL RDF syntax.
Notation3 (n3)
Produces output in the Notation3 RDF syntax.
XML (xml)

Produces output in a homegrown XML format (yes, we know...).

The DTD for this format is as follows:

	      
<!DOCTYPE result [
  <!ELEMENT result (meta, warnings?, messages?)>
  <!ATTLIST result
    version CDATA #FIXED '0.9'
  >

  <!ELEMENT meta (uri, modified, server, size, encoding, doctype)>
  <!ELEMENT uri      (#PCDATA)>
  <!ELEMENT modified (#PCDATA)>
  <!ELEMENT server   (#PCDATA)>
  <!ELEMENT size     (#PCDATA)>
  <!ELEMENT encoding (#PCDATA)>
  <!ELEMENT doctype  (#PCDATA)>

  <!ELEMENT warnings (warning)+>
  <!ELEMENT warning  (#PCDATA)>

  <!ELEMENT messages (msg)*>
  <!ELEMENT msg      (#PCDATA)>
  <!ATTLIST msg
    line   CDATA #IMPLIED
    col    CDATA #IMPLIED
    offset CDATA #IMPLIED
  >
]>
              
            

Each element except the containers (result, meta, warnings, messages) and the free-form text fields (warning, msg) will take a single value of a specific type.

The base document element is result. The only elements allowed to be directly contained at the first level are meta, warnings, and messages. The warnings and messages elements may be omitted if empty, and no first-level element may appear more than once.

The meta element

The meta element contains various metadata about the validated document. It contains further elements describing each value.

uri
The URL of the document validated.
modified
The Last-Modified header field of the document as free-form text.
server
The Server header field of the document as free-form text.
size
The size in bytes of the document.
encoding
The Character Encoding used for Validation.
doctype
A text string describing the DOCTYPE used for Validation.

Currently, the type of these fields is free-form text, but it is intended that a future revision will switch to less opaque data types so these values can be reliably machine-parsed.

The warnings element

The warnings element can contain only one kind of sub-element: the warning element. Multiple warning elements may appear and each one contains free-form text corresponding to a warning of the type found in the "Warnings" section of the HTML output (e.g. "DOCTYPE override in effect!").

The messages element

The messages element can contain only one kind of sub-element: the msg element. Multiple msg elements may appear and each contains free-form text representing one detected error. The msg element has three attributes: line, col and offset. These contain numbers giving the line and column on which the error was detected, and the offset in characters from the beginning of the document (as opposed to col, which can be said to be the offset from the beginning of the line).
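
The structure described above can be read with Python's standard xml.etree.ElementTree module, as in the following sketch. The sample document is invented to match the DTD and is not real validator output; the function names are hypothetical, and because the format is experimental, real output may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample that follows the DTD above; NOT real validator output.
SAMPLE = """<result version="0.9">
  <meta>
    <uri>http://www.example.com/</uri>
    <modified>Mon, 01 Jan 2001 00:00:00 GMT</modified>
    <server>ExampleServer/1.0</server>
    <size>1234</size>
    <encoding>utf-8</encoding>
    <doctype>HTML 4.01 Strict</doctype>
  </meta>
  <warnings>
    <warning>DOCTYPE override in effect!</warning>
  </warnings>
  <messages>
    <msg line="12" col="5" offset="340">Hypothetical error text</msg>
  </messages>
</result>"""


def _to_int(value):
    """Convert an optional attribute to int; None if absent or non-numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_result(xml_text):
    """Return (meta, warnings, messages) from a result document."""
    root = ET.fromstring(xml_text)
    # meta is required by the DTD; its children are free-form text fields.
    meta = {child.tag: (child.text or "").strip() for child in root.find("meta")}
    # warnings and messages may be omitted, so iterfind simply yields nothing.
    warnings = [(w.text or "").strip() for w in root.iterfind("warnings/warning")]
    messages = [
        {
            "line": _to_int(m.get("line")),
            "col": _to_int(m.get("col")),
            "offset": _to_int(m.get("offset")),
            "text": (m.text or "").strip(),
        }
        for m in root.iterfind("messages/msg")
    ]
    return meta, warnings, messages
```

Since line, col and offset are declared #IMPLIED in the DTD, the sketch treats them as optional and returns None when one is missing.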

Comma Tools / Site Tools

This site uses "comma tools", as does W3C and other sites. This means you can append a string (starting with a comma, hence the name) to the URL (address) of any page on the site and trigger a few administrative or technical tools for this page.

These tools are still under test, and reportedly do not work yet when appended to a validation result page.

What it does                                   Tool used             Shortcut
A plain text version of the page.              HTML2Text             ,text
Validate the markup.                           W3C Markup Validator  ,validate
Check links (anchors).                         W3C Link Checker      ,checklink or ,checklinks
Check links (recursively).                     W3C Link Checker      ,rchecklink or ,rchecklinks
A version of the page with linearized tables.  Tablin                ,tablin
CVS history for the page or resource.          CVSWeb                ,cvs or ,cvslog
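
Building a comma-tool address is plain string concatenation, as the following sketch shows. The set of shortcuts is taken from the table above; the function name is hypothetical, and the page address used in the usage note is a placeholder, since the tools only work on sites that support them.

```python
# Shortcuts from the table above, without the leading comma.
COMMA_TOOLS = {
    "text", "validate",
    "checklink", "checklinks",
    "rchecklink", "rchecklinks",
    "tablin", "cvs", "cvslog",
}


def comma_tool_url(page_url, shortcut):
    """Append a comma tool shortcut to the address of a page."""
    name = shortcut.lstrip(",")  # accept both "validate" and ",validate"
    if name not in COMMA_TOOLS:
        raise ValueError(f"unknown comma tool: {shortcut!r}")
    return f"{page_url},{name}"
```

For example, comma_tool_url("http://www.example.com/page.html", "validate") returns the same address followed by ",validate".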

Installing a local Validator

You can download the Validator to run on your own system. For Web design departments or agencies this can be a very good idea: it saves time and means that documents still in progress or confidential pages never have to be sent over the wire. It is, however, a complex operation and is not recommended for average users, for whom the free online service at W3C should suffice.

We have created a simple Installation manual, which, along with the Developer's information, should help you install a local instance of the Markup Validator in your own network easily.