missinglink
HTML Link Verification Software

By Ron Menelli, Radical Solutions, Inc.
Version 3.1, January 18, 1997

Copyright (c) 1995-1997 Radical Solutions, Inc.
All Rights Reserved


COPYRIGHT NOTICE

missinglink Copyright (c) 1995-1997 Radical Solutions, Inc.
All rights reserved.

missinglink is Shareware. You may try missinglink for 30 days. After
that period, you are required to pay a $20 fee to become a registered
user.

missinglink may be distributed free of charge as long as all of the
following conditions are met:

  - This documentation file and the missinglink source file must be
    distributed together in their original unmodified form with
    copyright notices intact. Modified versions of missinglink may not
    be distributed without the permission of the author.

  - missinglink may not be distributed commercially without permission
    of the author.


DISCLAIMER OF WARRANTY

THIS SOFTWARE AND THE ACCOMPANYING FILES ARE PROVIDED "AS IS" AND
WITHOUT WARRANTIES, EITHER EXPRESSED OR IMPLIED. NO WARRANTY OF FITNESS
FOR A PARTICULAR PURPOSE IS OFFERED. ANYONE USING MISSINGLINK AGREES TO
ASSUME THE ENTIRE RISK OF USING THE PROGRAM. IN NO WAY CAN THE AUTHOR BE
MADE RESPONSIBLE FOR ANY DAMAGE DIRECTLY OR INDIRECTLY CAUSED BY THE USE
OR MISUSE OF THIS PROGRAM.

Now that *that's* over with...


1). What is it?

missinglink (ml for short) is a program, written in Perl, that is
designed to help a Webmaster debug his or her Web site. ml searches
through HTML files and checks each hyperlink to make sure the file it
references really does exist. It can check links to files on the
server, links that point to files on other HTTP servers and links
within imagemaps. ml will report any problems it finds to the user.

The main features of ml are:

  - Will read NCSA httpd, Apache or similar configuration files and
    extract configuration information from them.
  - Will check <A HREF> and <IMG SRC> links.
  - Correctly handles anchors: <A NAME="..."> and <A HREF="#...">.
  - Detects and ignores query strings and extra path information
    passed to CGI scripts.
  - Handles NCSA-style imagemaps: validates all links within an
    imagemap.
  - Handles the <AREA> tag used in client-side imagemaps.
  - Handles the <FRAME> tag used in frames.
  - Handles .
  - Is able to validate links to files on other http servers.

ml does *not* check your HTML files for valid syntax. If you'd like a
program to check syntax for you, I suggest the excellent 'weblint',
written by Neil Bowers. Weblint is available at:

  http://www.khoral.com/staff/neilb/weblint.html


2). Installation

To install missinglink on a UNIX machine, copy the 'ml' file into a
directory in your path and type 'rehash' (if your shell requires it).
If the system is set up as missinglink expects it (well, it *might*
happen!), no further work is required before running ml.

missinglink expects the following:

  * Perl is present, and can be found as /usr/bin/perl
  * NCSA httpd (or any other httpd that uses the same configuration
    file format, such as Apache httpd) is present, located in
    /usr/local/etc/httpd
  * Your system has NCSA imagemap or a compatible program that uses
    the same imagemap.conf and imagemap files.
  * The 'hostname' command returns the hostname that your web server
    uses

If these conditions are met, you should not need to configure ml.

Note: ml may work on non-UNIX machines provided they have Perl.
However, this has not been tested. If you get ml working on a non-UNIX
machine, please let us know so we may tell others.


3). Configuration

ml has many options you can set. All options are changeable by
modifying the variables in a configuration file, but the most common
options can be changed by providing switches on the ml command line.
The options are:

  -h      Display a help message.

  -cpath  Specifies the location of the ml config file.

  -e      Check external links. By default, ml simply reports the
          existence of links to files on other servers. If this option
          is specified, ml will actually check to make sure the file
          exists on the other server.

  -f      Follow hyperlinks on this server. If this option is
          specified, when an HTML file is checked by ml, all files
          that are referred to by the original file are checked as
          well. This causes a recursive check of the entire site. ml
          ensures that each file is only checked once.

  -s      Show external links. If ml is not instructed to actually
          connect to other HTTP servers to verify external links, this
          option can be used to force ml to report all external links
          it finds without verifying their correctness.

  -Bpath  This sets the path where ml looks for cgi-bin files. This is
          not necessary if ml was able to read your server
          configuration files.

  -Cpath  This sets the path where ml looks for httpd server
          configuration files. This path defaults to
          /usr/local/etc/httpd/conf. The configuration files are
          assumed to be NCSA httpd .conf files.

  -Dpath  This sets the document root directory. This is the directory
          where the http server looks for HTML documents. This is not
          necessary if ml is able to read your server configuration
          files.

  -Hname  This sets the server's hostname. The default is the name
          returned by the 'hostname' command.

  -Iname  This sets the default index file name. The default is
          'index.html', or whatever is specified in the server
          configuration files.

ml accepts options in a configuration file. It will first look at a
filename specified using the -c option. If that is not specified, it
will look for .mlrc in the current directory. If that isn't found, it
will look for .mlrc in your home directory.

The file 'mlrc' that comes with ml is an example configuration file.
It is suggested that you copy mlrc to your home directory, rename it
to .mlrc and edit it to change ml's configuration to suit your needs.


4). Running ml

Running ml is pretty darn easy. Simply type:

  ml [options] filename filename filename ...

Where [options] are the command line options as described in the
previous section, and the filename(s) are either files or directories
you wish to check.
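
As an illustration of how such a command line fits together, here is a
minimal Python sketch that assembles an ml invocation from the switches
listed in section 3. It is not part of ml (which is written in Perl).
The helper name build_ml_command, its parameter names and the example
paths are hypothetical, and it assumes that switch values are attached
directly to the switch, as the option list writes them (-cpath,
-Dpath, -Hname). It only builds the argument list; it does not run ml.

```python
def build_ml_command(targets, check_external=False, follow=False,
                     show_external=False, config=None, docroot=None,
                     hostname=None, program="ml"):
    """Return an argument list for ml; nothing is executed here.

    Hypothetical helper for illustration only. Values are glued to
    their switch (assumed from the '-cpath' style in the option list).
    """
    argv = [program]
    if config is not None:
        argv.append("-c" + config)      # location of the ml config file
    if check_external:
        argv.append("-e")               # verify links on other servers
    if follow:
        argv.append("-f")               # follow local links recursively
    if show_external:
        argv.append("-s")               # list external links, unverified
    if docroot is not None:
        argv.append("-D" + docroot)     # document root directory
    if hostname is not None:
        argv.append("-H" + hostname)    # server hostname
    argv.extend(targets)                # files and/or directories to check
    return argv


# Example with made-up paths: recursive check that also verifies
# external links.
cmd = build_ml_command(["index.html", "Info"], check_external=True,
                       follow=True, docroot="/home/www/htdocs")
print(" ".join(cmd))
```

The resulting list could be handed to a process launcher if you want
to run ml from a script; whether ml accepts a space between a switch
and its value is not stated here, so the sketch avoids relying on it.
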
If a filename is specified, ml simply checks the contents of the file.
If a directory name is specified, ml will check all HTML files in the
directory as well as all subdirectories in that directory.

ml will report any errors it finds by printing one-line error
messages. The following are some examples:

Note: Messages beginning with '**' are errors, messages beginning with
'--' are informational messages.

  ** HREF Info/index.html not found in index.html

This line states that, in the file index.html (located in the current
directory), there is a link of the form <A HREF="Info/index.html">,
and the file Info/index.html does not exist.

  ** IMG SRC Graphics/thing.gif not found in Info/info.html

This line states that, in the file Info/info.html, there is a link of
the form <IMG SRC="Graphics/thing.gif">, and Graphics/thing.gif does
not exist (relative to the Info directory).

  -- HREF http://www.netscape.com in index.html

This is an informational message reporting an external link. By
default, ml does not attempt to verify links pointing to other
servers. If ml was invoked with the -e option, it would contact the
server and try to access the named file.

  ** Imagemap file help.imp not found for imagemap: help

This indicates that, in the imagemap.conf file, an imagemap called
'help' is listed, but the corresponding imagemap file, called
'help.imp', does not exist.

  ** Error looking up host: www.spam.com

This indicates that ml was unable to look up the name www.spam.com.

  ** Error connecting socket for http://www.blahblah.com

This indicates that the hostname www.blahblah.com is valid, but the
server is not allowing us to connect.

  ** Timeout occurred while accessing http://www.netscape.com

This indicates ml has connected to www.netscape.com, but the server
has not returned any data in the allotted amount of time (default is
60 seconds).

  ** Server returned 404 Not Found for http://www.rsol.com/something

This indicates that ml has tried to contact a server, but the server
has responded, saying that the requested file does not exist.


5). Bugs and unimplemented features

Bugs:

  - When using followlinks (-f option), sometimes missing links get
    reported twice.

ml does not do the following:

  - Understand Netscape server configuration (and probably a lot of
    other server configuration files).
  - Process ftp, gopher or telnet links
  - Process Java applets

And probably a lot of other things I haven't thought of.


6). Registration and contacting the author

As we've mentioned earlier: missinglink is Shareware. You may try
missinglink for 30 days. After that period, you are required to pay a
$20 (US) fee to become a registered user.

Send the following registration form with a check or money order drawn
on a U.S. bank for $20 (US), payable to Radical Solutions, Inc.

----------------------------------------------------------------------
missinglink Registration form - ml version 3.1

Name: ________________________________________________

Address: _____________________________________________

         _____________________________________________

ZIP/Postal Code: _____________________________________

Country: _____________________________________________

E-Mail: ______________________________________________
----------------------------------------------------------------------

Send registrations, questions, comments and bug reports to:

  Radical Solutions, Inc.
  6819 Caminito Sueno
  Carlsbad, CA 92009
  USA

  E-Mail: menelli@cts.com