missinglink
HTML Link Verification Software
By Ron Menelli, Radical Solutions, Inc.
Version 3.1, January 18, 1997
Copyright (c) 1995-1997 Radical Solutions, Inc.
All Rights Reserved
COPYRIGHT NOTICE
missinglink Copyright (c) 1995-1997 Radical Solutions, Inc. All
rights reserved.
missinglink is Shareware. You may try missinglink for 30 days. After
that period, you are required to pay a $20 fee to become a registered
user.
missinglink may be distributed free of charge as long as all of the
following conditions are met:
- This documentation file and the missinglink source file must be
distributed together in their original unmodified form with copyright
notices intact. Modified versions of missinglink may not be
distributed without the permission of the author.
- missinglink may not be distributed commercially without permission
of the author.
DISCLAIMER OF WARRANTY
THIS SOFTWARE AND THE ACCOMPANYING FILES ARE PROVIDED "AS IS" AND
WITHOUT WARRANTIES, EITHER EXPRESSED OR IMPLIED. NO WARRANTY OF
FITNESS FOR A PARTICULAR PURPOSE IS OFFERED. ANYONE USING MISSINGLINK
AGREES TO ASSUME THE ENTIRE RISK OF USING THE PROGRAM. IN NO WAY CAN
THE AUTHOR BE MADE RESPONSIBLE FOR ANY DAMAGE DIRECTLY OR INDIRECTLY
CAUSED BY THE USE OR MISUSE OF THIS PROGRAM.
Now that *that's* over with...
1). What is it?
missinglink (ml for short) is a program, written in Perl, that is
designed to help a Webmaster debug his or her Web site. ml searches
through HTML files and checks each hyperlink to make sure the file it
references really does exist. It can check links to files on the
server, links that point to files on other HTTP servers and links
within imagemaps. ml will report any problems it finds to the user.
The main features of ml are:
- Will read NCSA httpd, Apache or similar configuration files and
extract configuration information from them.
- Will check <A HREF> and <IMG SRC> links.
- Correctly handles anchors: <A NAME="..."> and <A HREF="#...">.
- Detects and ignores query strings and extra path information passed
to CGI scripts.
- Handles NCSA-style imagemaps: validates all links within an imagemap.
- Handles the <AREA> tag used in client-side imagemaps.
- Handles the <FRAME> tag used in frames.
- Handles .
- Is able to validate links to files on other http servers.
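To give a feel for what this kind of link extraction involves, here is a minimal sketch in Python. It is not ml's actual Perl code. The names LinkCollector, collect_links and local_target are made up for illustration, and the tag table covers only a few common link-bearing tags, which is an assumption. Extra path information passed to CGI scripts is not handled here.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Tag/attribute pairs that carry a link. This short table is an
# assumption for illustration, not ml's real list.
LINK_ATTRS = {"a": "href", "img": "src", "area": "href", "frame": "src"}


class LinkCollector(HTMLParser):
    """Collect (label, value) pairs such as ("IMG SRC", "x.gif")."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                label = f"{tag.upper()} {name.upper()}"
                self.links.append((label, value))


def collect_links(html_text):
    """Return every link found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def local_target(link):
    """Return the file part of a link, dropping the query string and
    the #anchor. Returns None for links that name a scheme or host."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc:
        return None
    return parts.path
```

A checker along these lines would call collect_links on each HTML file, then test whether the path returned by local_target exists on disk.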
ml does *not* check your HTML files for valid syntax. If you'd like a
program to check syntax for you, I suggest the excellent 'weblint',
written by Neil Bowers. Weblint is available at:
http://www.khoral.com/staff/neilb/weblint.html
2). Installation
To install missinglink on a UNIX machine, copy the 'ml' file into a
directory in your path and, if you use csh or tcsh, type 'rehash'. If
the system is set up as missinglink expects it (well, it *might*
happen!), no further work is required before running ml.
missinglink expects the following:
* Perl is present, and can be found as /usr/bin/perl
* NCSA httpd (or any other httpd that uses the same configuration file
format, such as Apache httpd) is present, located in
/usr/local/etc/httpd
* Your system has NCSA imagemap or a compatible program that uses the
same imagemap.conf and imagemap files.
* The 'hostname' command returns the hostname that your web server
uses.
If these conditions are met, you should not need to configure ml.
Note: ml may work on non-UNIX machines provided they have Perl.
However, this has not been tested. If you get ml working on a
non-UNIX machine, please let us know so we may tell others.
3). Configuration
ml has many options you can set. All options can be changed by
modifying the variables in a configuration file, but the most common
options can also be changed by providing switches on the ml command
line. The command-line options are:
  -h       Display a help message.
  -cpath   Specifies the location of the ml config file.
  -e       Check external links. By default, ml simply reports the
           existence of links to files on other servers. If this
           option is specified, ml will actually check to make sure
           the file exists on the other server.
  -f       Follow hyperlinks on this server. If this option is
           specified, when an HTML file is checked by ml, all files
           that are referred to by the original file are checked as
           well. This causes a recursive check of the entire site. ml
           ensures that each file is only checked once.
  -s       Show external links. If ml is not instructed to actually
           connect to other HTTP servers to verify external links, this
           option can be used to force ml to report all external links
           it finds without verifying their correctness.
  -Bpath   This sets the path where ml looks for cgi-bin files. This
           is not necessary if ml was able to read your server
           configuration files.
  -Cpath   This sets the path where ml looks for httpd server
           configuration files. This path defaults to
           /usr/local/etc/httpd/conf. The configuration files are
           assumed to be NCSA httpd .conf files.
  -Dpath   This sets the document root directory. This is the
           directory where the http server looks for HTML documents.
           This is not necessary if ml is able to read your server
           configuration files.
  -Hname   This sets the server's hostname. The default is the name
           returned by the 'hostname' command.
  -Iname   This sets the default index file name. The default is
           'index.html', or whatever is specified in the server
           configuration files.
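The "each file is only checked once" behavior of the -f option can be pictured with the sketch below. It is written in Python for illustration and is not how ml is implemented internally. The function follow_links and its links_of callback are hypothetical names.

```python
from collections import deque


def follow_links(start, links_of):
    """Visit `start` and every page reachable from it exactly once.

    `links_of` is a caller-supplied function that maps a page name to
    the list of local pages it links to (a stand-in for the real link
    extraction, which is not shown here).
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links_of(page):
            if target not in seen:  # skip pages already queued or checked
                seen.add(target)
                queue.append(target)
    return order
```

Keeping a set of pages already seen is what stops the check from looping forever when two pages link to each other.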
ml accepts options in a configuration file. It will first look at a
filename specified using the -c option. If that is not specified, it
will look for .mlrc in the current directory. If that isn't found, it
will look for .mlrc in your home directory. The file 'mlrc' that comes
with ml is an example configuration file. It is suggested that you
copy mlrc to your home directory, rename it to .mlrc and edit it to
change ml's configuration to suit your needs.
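The search order just described can be sketched as follows. This is a Python illustration, not ml's own code, and find_config is a hypothetical name. It assumes that a path given with -c is used as-is, without checking that the file exists; the text above does not say what ml does in that case.

```python
import os


def find_config(cli_path=None, cwd=".", home=None):
    """Return the configuration file to use, or None if there is none.

    Order: the -c path, then .mlrc in the current directory, then
    .mlrc in the home directory.
    """
    if cli_path is not None:
        return cli_path  # assumption: -c wins without an existence check
    if home is None:
        home = os.path.expanduser("~")
    for directory in (cwd, home):
        candidate = os.path.join(directory, ".mlrc")
        if os.path.isfile(candidate):
            return candidate
    return None
```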
4). Running ml
Running ml is pretty darn easy. Simply type:
ml [options] filename filename filename ...
Where [options] are the command line options as described in the
previous section, and the filename(s) are either files or directories
you wish to check.
If a filename is specified, ml simply checks the contents of the file.
If a directory name is specified, ml will check all HTML files in the
directory as well as all subdirectories in that directory.
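The file-versus-directory rule can be sketched like this in Python (again an illustration, not ml's code). The function name html_files is hypothetical, and recognizing HTML files by a .html or .htm extension is an assumption; this document does not say how ml decides which files are HTML.

```python
import os


def html_files(path, extensions=(".html", ".htm")):
    """Yield the files to check for one command-line argument."""
    if os.path.isfile(path):
        yield path  # a plain file is checked as given
        return
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)
```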
ml will report any problems it finds by printing one-line messages.
Messages beginning with '**' are errors; messages beginning with '--'
are informational. The following are some examples:
** HREF Info/index.html not found in index.html
This line states that, in the file index.html (located in the current
directory), there is a link of the form <A HREF="Info/index.html">,
and the file Info/index.html does not exist.
** IMG SRC Graphics/thing.gif not found in Info/info.html
This line states that, in the file Info/info.html, there is a link of
the form <IMG SRC="Graphics/thing.gif">, and Graphics/thing.gif does
not exist (relative to the Info directory).
-- HREF http://www.netscape.com in index.html
This is an informational message reporting an external link. By
default, ml does not attempt to verify links pointing to other
servers. If ml was invoked with the -e option, it would contact the
server and try to access the named file.
** Imagemap file help.imp not found for imagemap: help
This indicates that, in the imagemap.conf file, an imagemap called
'help' is listed, but the corresponding imagemap file, called
'help.imp', does not exist.
** Error looking up host: www.spam.com
This indicates that ml was unable to look up the name www.spam.com.
** Error connecting socket for http://www.blahblah.com
This indicates that the hostname www.blahblah.com is valid, but the
server is not allowing us to connect.
** Timeout occurred while accessing http://www.netscape.com
This indicates ml has connected to www.netscape.com, but the server
has not returned any data in the allotted amount of time (default is
60 seconds).
** Server returned 404 Not Found for http://www.rsol.com/something
This indicates that ml contacted the server, and the server responded
that the requested file does not exist.
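Because every message is a single line with a '**' or '--' prefix, the report is easy to post-process. The sketch below (Python, with hypothetical function names classify and count_errors) sorts report lines into errors and informational messages. It assumes you have already captured ml's output as a list of lines.

```python
def classify(line):
    """Split one report line into (kind, text).

    kind is "error" for lines starting with '**', "info" for lines
    starting with '--', and "other" for anything else.
    """
    stripped = line.strip()
    if stripped.startswith("**"):
        return "error", stripped[2:].strip()
    if stripped.startswith("--"):
        return "info", stripped[2:].strip()
    return "other", stripped


def count_errors(lines):
    """Count the lines that report an error."""
    return sum(1 for line in lines if classify(line)[0] == "error")
```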
5). Bugs and unimplemented features
Bugs:
- When using followlinks (-f option), sometimes missing links get reported
twice.
ml does not do the following:
- Understand Netscape server configuration (and probably a lot of
other server configuration files).
- Process ftp, gopher or telnet links
- Process Java applets
And probably a lot of other things I haven't thought of.
6). Registration and contacting the author
As we've mentioned earlier:
missinglink is Shareware. You may try missinglink for 30 days. After
that period, you are required to pay a $20 (US) fee to become a
registered user.
Send the following registration form with a check or money order drawn
on a U.S. bank for $20 (US), payable to Radical Solutions, Inc.
----------------------------------------------------------------------
missinglink Registration form - ml version 3.1
Name: ________________________________________________
Address : ____________________________________________
____________________________________________
ZIP/Postal Code: _____________________________________
Country: ____________________________________________
E-Mail: ______________________________________________
----------------------------------------------------------------------
Send registrations, questions, comments and bug reports to:
Radical Solutions, Inc.
6819 Caminito Sueno
Carlsbad, CA 92009
USA
E-Mail: menelli@cts.com