<!--#set var="revision" value="\$Id: devel.html,v 1.12 2002-12-08 01:46:44 link Exp $"
--><!--#set var="date" value="\$Date: 2002-12-08 01:46:44 $"
--><!--#set var="title" value="Developer Documentation for The W3C MarkUp Validation Service"
--><!--#set var="relroot" value="../"
--><!--#include virtual="../header.html" -->

    <p id="skip">
      The W3C MarkUp Validation Service consists of an SGML parser, an SGML
      catalog, a CGI program and its configuration files. In addition, it
      relies on a moderately large set of Perl modules for its operation.
    </p>
    <p>
      This document tries to draw a road map of the prerequisites and what the
      different parts of the system do. It is intended for system
      administrators and people interested in helping to develop the validator.
      This is not end user documentation. See the
      <a href="users.html">User Manual</a> for usage instructions.
    </p>

    <div id="prereq" class="stb">
      <h2>Prerequisites</h2>
      <p>
        Apart from a properly configured web server, the Validator needs an
        SGML parser, which does all the hard work, and several Perl
        modules used by the "check" CGI script.
      </p>
      <p>
        The SGML parser we're currently using is <code>OpenSP 1.5</code>,
        which can be found on the <a href="http://openjade.sf.net/">OpenJade
        home page</a>.
      </p>
      <p>
        The canonical list of Perl modules we use can be found in the source
        for the "check" CGI script. It contains a number of lines of the form
        "use Foo::Bar", where each "Foo::Bar" names a module. Most modules
        can be found on <a href="http://www.cpan.org/">CPAN</a> (minimum
        versions are given in parentheses after the name). The following
        list was complete when CVS spat out:
        <code>$Date: 2002-12-08 01:46:44 $</code>. <tt>:-)</tt>
      </p>
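      <p>
        As a rough illustration of reading that list out of the script, the
        sketch below collects the module names from "use Foo::Bar" lines. It
        is written in Python purely for illustration (the validator itself is
        Perl). The function name <code>list_used_modules</code> and the sample
        text are assumptions, and the simple pattern will miss modules that
        are loaded in other ways, for example with "require".
      </p>
<pre>
```python
import re

# Matches lines such as "use CGI 2.81;" and captures "CGI".
# Lowercase pragmas like "use strict;" are skipped by requiring a capital.
USE_LINE = re.compile(r"^\s*use\s+([A-Z]\w*(?:::\w+)*)")


def list_used_modules(perl_source):
    """Return the module names from 'use' lines, in order, without repeats."""
    modules = []
    for line in perl_source.splitlines():
        match = USE_LINE.match(line)
        if match and match.group(1) not in modules:
            modules.append(match.group(1))
    return modules


# Hypothetical excerpt; the real "check" script will differ.
SAMPLE = """\
use strict;
use CGI 2.81;
use CGI::Carp;
use LWP::UserAgent 1.90;
use CGI::Carp;
"""

print(list_used_modules(SAMPLE))
```
</pre>
      <p>
        Run against the real script, the result can be compared with the
        list below to see whether this page has fallen behind the code.
      </p>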
      <dl>
        <dt><code>CGI (2.81)</code></dt>
        <dd>
          The all-singing, all-dancing,
          everything-<em>and</em>-the-kitchen-sink, Perl CGI library. This
          takes care of all those niggly little bits of CGI for us and makes
          option parsing and file upload a breeze.
        </dd>
        <dt><code>CGI::Carp</code></dt>
        <dd>CGI-aware warn()/die()</dd>
        <dt><code>File::Spec</code></dt><dd>Portable filespecs.</dd>
        <dt><code>HTML::Parser (3.25)</code></dt>
        <dd>Minimal HTML Parser used for preparse and finding metadata.</dd>
        <dt><code>LWP::UserAgent (1.90)</code></dt>
        <dd>
          Gisle Aas' most excellent WWW library for Perl. This is where our
          support for downloading pages off the net comes from.
        </dd>
        <dt><code>Set::IntSpan</code></dt><dd>Efficient set operations.</dd>
        <dt><code>Text::Iconv</code></dt>
        <dd>
          Perl-native interface to the (g)libc iconv(3) library. Handles
          charset conversion issues.
        </dd>
        <dt><code>Text::Wrap</code></dt>
        <dd>Wrap text to a sane width. Needed for source output in results.</dd>
        <dt><code>URI::Escape</code></dt>
        <dd>Module to handle escaping special characters in URIs.</dd>
      </dl>
    </div>
    <div id="config" class="stb">
      <h2>Configuration Files</h2>
      <p>
        The validator uses a number of configuration files -- most of which
        are really mapping tables of some form -- to avoid having to check in
        a new version of the code every time a new version of HTML comes out.
        All configuration files can be found in
        <code>$CVSROOT/validator/htdocs/config/</code>.
      </p>
      <p>
        To really understand what each does you should read the source, but
        here is a short description to get you started.
      </p>
      <dl>
        <dt>validator.conf</dt>
        <dd>
          Main configuration file. Gives various parameters (such as the
          address of the maintainer and the URL for the "Home Page") and
          the locations of the other configuration files and mapping tables.
        </dd>
        <dt>types.conf</dt>
        <dd>
          <p>
            The main document type database for the Validator. This file
            contains information on all the document types we know of. It
            lets us map from a Public Identifier to a plain text version
            string, look up a URL for more information on a DOCTYPE, and
            check which Content-Types and Namespaces are legal for this
            particular DOCTYPE.
          </p>
          <p>An entry in this file looks like this:</p>
<pre>
&lt;XHTML_1_1&gt;
  Name       = html
  Display    = XHTML 1.1
  Info_URL   = http://www.w3.org/TR/xhtml11/
  PubID      = -//W3C//DTD XHTML 1.1//EN
  SysID      = http://www.w3.org/TR/2001/REC-xhtml11-20010531/DTD/xhtml11-flat.dtd
  Parse_Mode = XML
  &lt;Content_Types&gt;
    Allowed   = application/xhtml+xml
    Forbidden = text/html
    Preferred = application/xhtml+xml
  &lt;/Content_Types&gt;
  &lt;Namespaces&gt;
    Allowed   = http://www.w3.org/1999/xhtml
    Required  = 1
  &lt;/Namespaces&gt;
  &lt;Badge&gt;
    URI    = http://www.w3.org/Icons/valid-xhtml11
    Height = 31
    Width  = 88
  &lt;/Badge&gt;
&lt;/XHTML_1_1&gt;
</pre>
          <p>
            The name used for each section (e.g. "XHTML_1_1") is arbitrary.
            The file will be turned inside out and will end up indexed by
            the "PubID". This means that you cannot have two entries with
            the same PubID. The rest of the parameters are:
          </p>
          <table class="config">
            <tr><th>Name</th><td>The "Document Type Name" for this document type.</td></tr>
	    <tr><th>Display</th><td>The pretty text version for the PubID.</td></tr>
	    <tr><th>Info_URL</th><td>URL for more information on the PubID.</td></tr>
            <tr><th>PubID</th><td>The Formal Public Identifier for this document type.</td></tr>
	    <tr><th>SysID</th><td>A System Identifier for the DTD.</td></tr>
            <tr><th>Parse_Mode</th><td>Selects whether to treat this document type as XML or SGML.</td></tr>
	    <tr>
	      <th>Content_Types</th>
              <td class="subtable">
	        <table>
	          <tr><th>Allowed</th><td>Allowed Content-Types</td></tr>
	          <tr><th>Forbidden</th><td>Forbidden Content-Types</td></tr>
	          <tr><th>Preferred</th><td>Preferred Content-Types</td></tr>
	        </table>
	      </td>
	    </tr>
	    <tr>
	      <th>Namespaces</th>
	      <td class="subtable">
	        <table>
	          <tr><th>Allowed</th><td>Allowed Namespaces</td></tr>
	          <tr><th>Required</th><td>Boolean describing whether a Namespace is required in this document type.</td></tr>
	        </table>
	      </td>
	    </tr>
	    <tr>
	      <th>Badge</th>
	      <td class="subtable">
	        <table>
	          <tr><th>URI</th><td>URI for a "Valid Foo" badge.</td></tr>
	          <tr><th>Height</th><td>Height of this image.</td></tr>
	          <tr><th>Width</th><td>Width of this image.</td></tr>
	        </table>
	      </td>
	    </tr>
          </table>
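          <p>
            The "turned inside out" step can be pictured with the following
            sketch. It is in Python for illustration only (the validator is
            written in Perl) and is not the validator's own code. The
            function name <code>index_by_pubid</code>, the already-parsed
            dictionary layout and the second sample entry are assumptions,
            as is the choice to report a duplicate PubID as an error.
          </p>
<pre>
```python
# Sections as they might look once types.conf has been parsed, keyed by the
# arbitrary section name. Only a few parameters are shown.
SECTIONS = {
    "XHTML_1_1": {
        "Name": "html",
        "Display": "XHTML 1.1",
        "PubID": "-//W3C//DTD XHTML 1.1//EN",
        "Parse_Mode": "XML",
    },
    # Hypothetical second entry, included only to make the example non-trivial.
    "EXAMPLE_DTD": {
        "Name": "example",
        "Display": "Example 1.0",
        "PubID": "-//Example//DTD Example 1.0//EN",
        "Parse_Mode": "SGML",
    },
}


def index_by_pubid(sections):
    """Re-key the entries by PubID, refusing duplicate PubIDs."""
    by_pubid = {}
    for section_name, entry in sections.items():
        pubid = entry["PubID"]
        if pubid in by_pubid:
            # Assumption: a duplicate is reported rather than silently
            # overwriting the earlier entry.
            raise ValueError(f"duplicate PubID {pubid!r} in {section_name}")
        by_pubid[pubid] = entry
    return by_pubid


DOCTYPES = index_by_pubid(SECTIONS)
print(DOCTYPES["-//W3C//DTD XHTML 1.1//EN"]["Display"])  # XHTML 1.1
```
</pre>
          <p>
            Once the table is keyed this way, the section name no longer
            matters for lookups, which is why it can be chosen freely.
          </p>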
        </dd>
        <dt>eref.cfg</dt>
        <dd>
          Contains the mappings from element names to a URI fragment
          (relative to a configurable URI) for their definitions. Used
          in output when the "Show Source Input" option is enabled.
        </dd>
        <dt>frag.cfg</dt>
        <dd>
          Maps error messages to a URI fragment identifier where an
          explanation of that error can be found.
        </dd>
       </dl>
    </div>
    <div id="todo" class="stb">
      <h2>TODO</h2>
      <p>
        The TODO list for the Validator is online at
        &lt;<a href="../todo.html">http://validator.w3.org/todo.html</a>&gt;.
        This is probably the best place to start.
      </p>
      <p>
        However, this list is by no means comprehensive. Feel free to suggest
        other features that should be on this list or send patches for your
        favourite feature.
      </p>
      <p>
        Keep in mind that features should be of general utility and that the
        point of the validator is that it performs an <em>objective</em>
        validation instead of just what some random developer happens to
        think is a Good Idea&reg;. While extra features are nice, they
        shouldn't dilute the value of the validator as an objective check.
      </p>
    </div>

<!--#include virtual="../footer.html" -->
  </body>
</html>