The Markup Validator is a free tool and service that checks the syntax of (X)HTML documents.
The Validator is sort of like lint for C. It compares your HTML document to the defined syntax of HTML and reports any discrepancies.
Learn more about the Markup Validator.
One of the important maxims of computer programming is: Be conservative in what you produce; be liberal in what you accept.
Browsers follow the second half of this maxim by accepting Web pages and trying to display them even if they're not legal HTML. Usually this means that the browser will try to make educated guesses about what you probably meant. The problem is that different browsers (or even different versions of the same browser) will make different guesses about the same illegal construct; worse, if your HTML is really pathological, the browser could get hopelessly confused and produce a mangled mess, or even crash.
That's why you want to follow the first half of the maxim by making sure your pages are legal HTML. The best way to do that is by running your documents through one or more HTML validators.
The Markup Validator is maintained at W3C by W3C staff and benevolent collaborators, who receive a lot of help from contributors. (Read the full credits)
We're doing our best to provide clear and reliable results as well as a good interface with the Markup Validator, but for some reason you may want to check other validators. Here are a few choices:
Looking for validators at W3C, but not the Markup Validator? Check out the list of validators at W3C, including the well-known CSS validator, the link checker, and others.
The Validator is based on James Clark's nsgmls SGML parser. The Validator itself is a CGI script that fetches your URL, passes it through nsgmls, and post-processes the resulting error list for easier reading.
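The post-processing step described above can be sketched in Python. This is only an illustration, not the Validator's actual code: it assumes each diagnostic arrives as one line shaped like `nsgmls:<source>:<line>:<column>:<type>: <message>`, and the names `Diagnostic`, `parse_diagnostics`, and `format_report` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Diagnostic(NamedTuple):
    line: int
    column: int
    severity: str
    message: str


# Assumed line shape: nsgmls:<source>:<line>:<column>:<type>: <message>
_DIAGNOSTIC = re.compile(
    r"^[^:]+:(?P<source>.*?):(?P<line>\d+):(?P<column>\d+):"
    r"(?P<severity>[A-Z]):\s*(?P<message>.*)$"
)


def parse_diagnostics(raw_output: str) -> List[Diagnostic]:
    """Turn raw parser output into diagnostics sorted by position."""
    found = []
    for text in raw_output.splitlines():
        match = _DIAGNOSTIC.match(text.strip())
        if match is None:
            continue  # skip lines that do not look like diagnostics
        found.append(
            Diagnostic(
                int(match["line"]),
                int(match["column"]),
                match["severity"],
                match["message"],
            )
        )
    return sorted(found, key=lambda d: (d.line, d.column))


def format_report(diagnostics: List[Diagnostic]) -> List[str]:
    """Render each diagnostic as a short, human-readable line."""
    return [f"Line {d.line}, column {d.column}: {d.message}" for d in diagnostics]
```

Sorting by line and column keeps the report in document order, which makes it easier to start with the earliest errors.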
Read the instructions on our Feedback page.
Most probably, you will want to use the online Markup Validation service. Read the user's manual for further help with this service.
If, for some reason, you prefer running your own instance of the Markup Validator, check out our developer's documentation.
The output of the Markup Validator may be hard to decipher for newcomers and experts alike, so we maintain a list of error messages and their interpretation, which should help.
Don't panic. Did The Validator complain about your DOCTYPE declaration (or lack thereof)? Make sure your document has a syntactically correct DOCTYPE declaration, as described in the section on DOCTYPE, and make sure it correctly identifies the type of HTML you're using. Then run it through The Validator again; if you're lucky, you should get a lot fewer errors.
If this doesn't help, then you may be experiencing a cascade failure — one error that gets The Validator so confused that it can't make sense of the rest of your page. Try correcting the first few errors and running your page through The Validator again.
Be patient. With a little time and experience, you will learn to use the Markup Validator to clean up your HTML documents in no time.
The Markup Validator cannot do this for you. You may want to have a look at tools such as HTML Tidy.
A DOCTYPE Declaration is mandatory for most current markup languages, and without one it is impossible to reliably validate a document.
One should place a DOCTYPE declaration as the very first thing in an HTML document. For example, for a typical XHTML 1.0 document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Title</title>
  </head>
  <body>
    <!-- ... body of document ... -->
  </body>
</html>
For XML documents, you may also wish to include an "XML Declaration" even before the DOCTYPE Declaration, but this is not well supported in older browsers. More information about this can be found in the XHTML 1.0 Recommendation.
The W3C QA Activity maintains a List of Valid Doctypes that you can choose from, and the WDG maintains a document on "Choosing a DOCTYPE".
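As a quick pre-check before validating, the placement rule above can be sketched in Python. This is a simplified illustration under stated assumptions: the function name `starts_with_doctype` is hypothetical, and the check only allows an optional byte order mark, whitespace, and an XML Declaration before the DOCTYPE. It does not verify that the DOCTYPE itself is one of the valid ones.

```python
import re

# Optional BOM, whitespace, and XML Declaration, then the DOCTYPE keyword.
_LEADING_DOCTYPE = re.compile(
    r"^\ufeff?\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+",
    re.IGNORECASE,
)


def starts_with_doctype(document_text: str) -> bool:
    """Return True if a DOCTYPE declaration is the first thing in the text."""
    return _LEADING_DOCTYPE.match(document_text) is not None
```

A document that fails this check is likely to produce the DOCTYPE complaints described earlier, so it is worth fixing before looking at any other errors.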
not able to extract a character encoding
An HTML document should be served along with its character encoding.
WDG has some good documentation on using character encodings that will help you fix your document or the way it is served by adding the proper character encoding information.
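To see what encoding information a document actually carries, a small Python sketch can help. It is an illustration only, not how the Validator itself works: the function name `detect_charset` is hypothetical, it looks first at the value of the HTTP Content-Type header and then falls back to a meta element in the markup, and the meta lookup is a rough regular expression rather than a real parser.

```python
import re
from email.message import Message
from typing import Optional

# Rough pattern for a charset given inside a <meta ...> element.
_META_CHARSET = re.compile(
    r"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)",
    re.IGNORECASE,
)


def detect_charset(content_type: Optional[str], markup: str) -> Optional[str]:
    """Return the declared encoding in lower case, or None if none is found."""
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        declared = header.get_content_charset()
        if declared:
            return declared
    match = _META_CHARSET.search(markup)
    return match.group(1).lower() if match else None
```

If this returns None for both the header and the markup, the document gives a validator nothing to work with, and the encoding needs to be declared by the server or in the document.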
Most probably, you should read the ampersand section of WDG's excellent "common validation problems" document.
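One common cause of ampersand errors is a raw "&" inside a URL in an attribute value, where it should be written as "&amp;". As a minimal sketch (the helper name `href_attribute` is hypothetical), Python's standard `html.escape` can do the escaping when markup is generated by a script:

```python
from html import escape


def href_attribute(url: str) -> str:
    """Build an href attribute with &, <, > and quotes escaped."""
    return 'href="' + escape(url, quote=True) + '"'
```

Pages written by hand need the same change made manually: replace each "&" in the URL with "&amp;".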
Most probably, you should read the script section of WDG's excellent "common validation problems" document.
HTML is based on SGML and uses an SGML feature called SHORTTAG. (This is not the case with XHTML.)
With this feature enabled, the "/" in <link ... /> or <meta ... /> already closes the link (or meta) tag, and the ">" becomes regular text, which is not allowed in the <head> element. Since </head><body> is optional in HTML (again, not in XHTML), it is silently inserted at that point. As a result, later head-only elements such as meta and style, as well as the explicit "</head>" and "<body>" tags, which may appear only once, are reported as errors.
(explanation courtesy of Christoph Päper)
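The effect described above can be mimicked with a toy Python function. This is a deliberately simplified sketch, not an SGML parser: the name `shorttag_split` is hypothetical, and it only models the single rule that an unquoted "/" ends the start tag, so whatever follows is left over as text.

```python
from typing import Tuple


def shorttag_split(markup: str) -> Tuple[str, str]:
    """Split markup into (start tag, leftover text) the way SHORTTAG would.

    The start tag ends at the first "/" or ">" found outside a quoted
    attribute value; everything after that character is leftover text.
    """
    quote = None
    for index, char in enumerate(markup):
        if quote is not None:
            if char == quote:
                quote = None  # closing quote of an attribute value
        elif char in "\"'":
            quote = char  # opening quote of an attribute value
        elif char in "/>":
            return markup[:index], markup[index + 1:]
    return markup, ""
```

Calling it on `<link rel="stylesheet" href="style.css" />` leaves ">" as leftover text, which is the stray character that is not allowed in the head element. Calling it on the same tag without the "/" leaves nothing over.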
This again (as in the previous case) comes from the SHORTTAG feature in HTML (not in XHTML). The typo is actually a "shorthand markup" and is a valid construct in HTML, even though its use is not recommended.