W3C Link Checker documentation

About this service
What it does
Use it online
Install it locally
Comments, suggestions and bugs

About this service

In order to check the validity of the technical reports that W3C publishes, the Systems Team has developed a link checker.

A first version was developed in August 1998 by Renaud Bruyeron. Because it lacked some functionality, Hugo Haas rewrote it more or less from scratch in November 1999.

The source code is publicly available from CVS under the W3C IPR software notice.

What it does

The link checker reads an HTML or XHTML document and extracts a list of anchors and links.

It checks that no anchor is defined twice.
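The extraction and duplicate check described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the checker's actual Perl code. The class name `AnchorLinkCollector`, the helper `duplicate_anchors`, and the choice to treat `id` and `<a name>` as anchors and `href` and `src` as links are assumptions made for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorLinkCollector(HTMLParser):
    """Collects anchor names and link targets (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # anchor names, in document order
        self.links = []    # link targets, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: an anchor is an id attribute, or the name of an <a>.
        names = {attrs.get("id")}
        if tag == "a":
            names.add(attrs.get("name"))
        # <a id="x" name="x"> defines one anchor, so the set counts it once.
        self.anchors.extend(sorted(n for n in names if n))
        # Assumption: a link is any href or src attribute value.
        for key in ("href", "src"):
            if attrs.get(key):
                self.links.append(attrs[key])


def duplicate_anchors(anchors):
    """Return the anchor names that are defined more than once."""
    counts = Counter(anchors)
    return sorted(name for name, count in counts.items() if count > 1)


sample = """
<h1 id="top">Title</h1>
<a name="intro"></a>
<p id="intro">See <a href="other.html#sec">the other page</a>.</p>
<img src="logo.png">
"""

collector = AnchorLinkCollector()
collector.feed(sample)
print(collector.links)                       # link targets found
print(duplicate_anchors(collector.anchors))  # anchors defined twice
```

In the sample, `intro` is defined once through `<a name>` and once through `id`, so it is reported as a duplicate.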

It then checks that all the links are dereferenceable, including the fragments. It warns about HTTP redirects, including directory redirects.
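As a rough illustration of these checks, the sketch below separates a link into its document URL and fragment, then turns an HTTP status code and the target's anchors into a verdict. It performs no network access. The function names, the verdict strings, and the reading of "directory redirect" as a redirect that only appends a trailing slash are assumptions for this example, not the checker's documented behaviour.

```python
from urllib.parse import urldefrag, urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def resolve(base_url, href):
    """Return (absolute document URL, fragment) for a link found in base_url."""
    absolute = urljoin(base_url, href)
    document_url, fragment = urldefrag(absolute)
    return document_url, fragment


def is_directory_redirect(requested_url, location):
    """Assumed meaning: the redirect only adds a trailing slash."""
    return location == requested_url + "/"


def classify(status, fragment, anchors):
    """Map a response status and the target's anchor set to a verdict."""
    if status in REDIRECT_STATUSES:
        return "warning: redirect"
    if not 200 <= status < 300:
        return "error: not dereferenceable"
    # The document was fetched; the fragment must name one of its anchors.
    if fragment and fragment not in anchors:
        return "error: fragment not found"
    return "ok"


doc, frag = resolve("http://example.org/doc/index.html", "other.html#sec")
print(doc, frag)
print(classify(200, frag, {"sec", "top"}))
print(classify(200, "missing", {"sec", "top"}))
print(is_directory_redirect("http://example.org/doc", "http://example.org/doc/"))
```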

It can also recursively check part of a Web site.
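One way to picture recursion limited to part of a site is a breadth-first traversal that only follows links under a chosen URL prefix. The sketch below is an assumption about how such scoping could work, not a description of the checker's implementation. `SITE` is a hypothetical in-memory stand-in for fetched documents, and `documents_to_check` is an illustrative name.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found in that document.
SITE = {
    "http://example.org/docs/": [
        "http://example.org/docs/a.html",
        "http://example.org/",
    ],
    "http://example.org/docs/a.html": [
        "http://example.org/docs/b.html",
        "http://example.org/docs/",
    ],
    "http://example.org/docs/b.html": ["http://other.example/x.html"],
}


def documents_to_check(start, scope_prefix, get_links):
    """List documents reachable from start that stay under scope_prefix."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in get_links(url):
            # Links outside the chosen part of the site are not followed.
            if target.startswith(scope_prefix) and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


pages = documents_to_check(
    "http://example.org/docs/",
    "http://example.org/docs/",
    lambda url: SITE.get(url, []),
)
print(pages)
```

The `seen` set keeps the traversal from looping when documents link back to each other.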

There is a command line version and a CGI version. Both support HTTP basic authentication. The CGI version achieves this by passing the authorization information from the user's browser through to the site being tested.
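For readers unfamiliar with HTTP basic authentication, the sketch below shows how the `Authorization` header value is formed, and what passing it through from an incoming request to an outgoing one might look like. The function names and the use of plain dictionaries for headers are assumptions for illustration. The checker itself is written in Perl and may handle this differently.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def forward_authorization(incoming_headers, outgoing_headers):
    """Copy the browser's Authorization header, if present, to a new dict."""
    forwarded = dict(outgoing_headers)
    value = incoming_headers.get("Authorization")
    if value:
        forwarded["Authorization"] = value
    return forwarded


# Command line case: credentials are known locally (example values).
header = basic_auth_header("Aladdin", "open sesame")
print(header)

# CGI case: the header received from the browser is passed through unchanged.
outgoing = forward_authorization({"Authorization": header}, {"Accept": "text/html"})
print(outgoing)
```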

Use it online

There is an online version of the link checker.

The number of documents that can be checked recursively is limited, and there is a delay between successive documents to prevent abuse.
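A limit and a delay of this kind can be sketched as a small generator. The actual limit and delay used by the online service are not stated here, so the values below are placeholders, and `throttled` is a hypothetical name. The `sleep` parameter is injectable so the example runs without really waiting.

```python
import time


def throttled(urls, max_documents, delay_seconds, sleep=time.sleep):
    """Yield at most max_documents URLs, pausing between consecutive ones."""
    for index, url in enumerate(urls):
        if index >= max_documents:
            break
        if index > 0:
            # Wait before every document except the first.
            sleep(delay_seconds)
        yield url


pauses = []  # records the requested pauses instead of sleeping
checked = list(throttled(["a", "b", "c", "d"], 3, 1.5, sleep=pauses.append))
print(checked)
print(pauses)
```

Omitting the `sleep` argument makes the generator pause for real using `time.sleep`.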

Install it locally

The link checker is written in Perl. It is a single file, but it requires some CPAN modules.

To install it:

1. Install Perl.
2. Install the following CPAN distributions, along with any distributions they depend on. Depending on your Perl version, some of these may already be installed. For an introduction to installing Perl modules, see The CPAN FAQ.
   - libwww-perl (version 5.60 or newer if you want HTTP/1.1 with Keep-Alive)
   - HTML-Parser (version 3.00 or newer)
   - CGI.pm
   - URI
   - Time-HiRes
   - TermReadKey (optional but recommended on all platforms; required for password input in command line mode on systems that lack the stty command, e.g. Windows)
3. Download the link checker from CVS.

Calling checklink.pl without any arguments runs the CGI version, and running checklink.pl --help shows how to use the command line version.

If you want to enable the authentication capabilities with Apache, have a look at Steven Drake's hack.

Comments, suggestions and bugs

The current version has proven to be stable. It could, however, be improved; see the list of open enhancement ideas and bugs for details.

Please send comments, suggestions and bugs about the link checker to the www-validator mailing list (archives), with 'checklink' in the subject.