Usage: webxref -help/-h -noxref -xref/-x -onexref -fluff -htmlonly
-nohttp -delay seconds
-silent/-s -verbose/-v -errors/-e -noint
-long/-l -html
-islocal
-avoid/-a
-one/-1 -depth
-root/-r -fullpath
-date -time -before -after
-find -findexpr
-replace -replaceexpr -by
[-files/-f] file1 file2
file.html
=========================================
Which parameters to use for what purpose:
=========================================
By default webxref checks the given file and follows the links in
that file. While working it lets you know it's alive by printing
a '+' for each file checked ok, and a '-' for each file with a
problem.
A webxref run can take some time. You can, however, interrupt
webxref with ctrl-c (Unix). Webxref will report on the files
it has inspected up to that moment and exit. (*New!*)
(Note: this is not reliable! webxref is not interruptible at
any time, because the C libraries are not re-entrant. This
probably does not interest you at all, but it's not the
author's fault.) Specify -noint if you don't want webxref to
try to generate output after an interrupt.
When the whole site has been searched and all links have been
inspected webxref prints a report. By default only problems are
reported. Specify -long to obtain a long report. Specify -html
to get a report in HTML form.
If you want more information while webxref is working specify
-verbose to get messages on every file or -errors to see only
files with problems. With -silent webxref prints nothing at
all while working.
Webxref keeps track of which html-documents are being linked
to from other documents. This is called cross-referencing,
hence webxref's name. If you are not interested in this,
specify -noxref; you then won't be told where things have
failed and will probably have to run webxref again. If you're
just interested in one location where a file is referenced
specify -onexref. This saves memory too.
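The bookkeeping described above can be pictured as a map from each
link target to the documents that refer to it. The following is a
minimal Python sketch of that idea, not webxref's own code; the
function name build_xref and the one_xref flag are hypothetical and
only mirror what -onexref is described as doing.

```python
from collections import defaultdict


def build_xref(links, one_xref=False):
    """Map each link target to the list of files that reference it.

    links: iterable of (referring_file, target) pairs.
    one_xref: keep only the first referrer per target (cf. -onexref).
    """
    xref = defaultdict(list)
    for source, target in links:
        if one_xref and xref[target]:
            continue  # one location is enough; storing less saves memory
        if source not in xref[target]:
            xref[target].append(source)
    return dict(xref)


# Hypothetical link data for illustration.
links = [
    ("index.html", "a.html"),
    ("b.html", "a.html"),
    ("index.html", "missing.html"),
]
full = build_xref(links)
short = build_xref(links, one_xref=True)
```

With the full map a broken target such as missing.html can be traced
back to every page that links to it; with one_xref only a single
referring page is remembered.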
If you need to know if there are files and/or directories in
your site that are not referenced at all by any pages in your
site specify -fluff.
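Conceptually, "fluff" is the difference between what is on disk and
what was reached by following links. A rough Python sketch under that
assumption follows; find_fluff is a hypothetical name, and unlike
-fluff this sketch reports files only, not directories.

```python
import os


def find_fluff(site_dir, referenced):
    """Return files under site_dir that no inspected page refers to.

    referenced: set of paths relative to site_dir, using '/' separators.
    """
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(site_dir):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), site_dir)
            on_disk.add(rel.replace(os.sep, "/"))
    # Anything on disk that was never referenced is fluff.
    return sorted(on_disk - set(referenced))
```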
If you want to only inspect files that really have the .html
or .htm extension specify -htmlonly.
References starting with a '/' refer to the server "root"
directory. Specify where this directory is with -root.
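How a root-relative reference might be turned into a file name can be
sketched as follows. This is an illustration in Python, not webxref's
implementation; resolve_link and all paths used with it are
hypothetical.

```python
import posixpath


def resolve_link(link, root_dir, current_dir):
    """Turn a local link into a file system path.

    Links starting with '/' are looked up under root_dir (cf. -root);
    all other links are taken relative to the referring file's directory.
    """
    if link.startswith("/"):
        return posixpath.normpath(posixpath.join(root_dir, link.lstrip("/")))
    return posixpath.normpath(posixpath.join(current_dir, link))
```

For example, with a root of /usr/local/www a link to /icons/back.gif
would be looked for in /usr/local/www/icons/back.gif.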
File names are abbreviated, that is /u/people/rick/www/a.html
is printed as "a.html" if webxref is called from ~/rick/www.
If you specify -fullpath you'll get the full paths.
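The abbreviation amounts to printing paths relative to the directory
webxref was started in. A small Python sketch of that behaviour,
assuming nothing about how webxref does it internally (display_name
is a hypothetical helper):

```python
import posixpath


def display_name(path, start_dir, fullpath=False):
    """Return the name to print for path.

    By default the path is shown relative to start_dir; with
    fullpath=True (cf. -fullpath) it is shown unchanged.
    """
    if fullpath:
        return path
    return posixpath.relpath(path, start_dir)


short = display_name("/u/people/rick/www/a.html", "/u/people/rick/www")
```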
If you use full URLs in your site referring to your own site,
say "www.sara.nl" is your www-address and your links name
that address in full, then tell webxref that
"www.sara.nl" can actually be found on the local machine with:
-islocal 'www.sara.nl'
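The effect of -islocal can be thought of as rewriting a full URL on a
known host into a root-relative reference, which is then checked on
disk instead of over the network. The Python sketch below illustrates
that idea only; localize is a hypothetical name and the example URLs
are made up.

```python
from urllib.parse import urlsplit


def localize(url, local_hosts):
    """Return a root-relative path if url points at a local host.

    local_hosts: set of host names declared local (cf. -islocal).
    Returns None for URLs on any other host.
    """
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and parts.hostname in local_hosts:
        return parts.path or "/"
    return None
```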
If you want to avoid certain files use the -avoid parameter
to specify which files to avoid.
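The -avoid argument is a regular expression matched against file
names. A Python sketch of such a filter is shown here; whether webxref
anchors the match or matches anywhere in the name is not stated above,
so the unanchored search below is an assumption, and should_avoid is a
hypothetical name.

```python
import re


def should_avoid(filename, avoid_regexp):
    """Return True if filename matches the avoid expression.

    Assumption: the expression may match anywhere in the name.
    """
    if not avoid_regexp:
        return False
    return re.search(avoid_regexp, filename) is not None
```

Patterns written as '.*Archive.*' behave the same whether or not the
match is anchored, which makes them a safe way to spell the option.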
If you want to limit the number of files webxref inspects
you may want to limit the scan to 1 or 2 directories deep
in the file system. If you specify -depth 0 only files in
the current directory are inspected.
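The depth limit can be read as a count of directory levels below the
starting directory. The Python sketch below shows one way to compute
it; nesting_level and within_depth are hypothetical names, and files
outside the starting directory are not handled.

```python
import posixpath


def nesting_level(path, start_dir):
    """Number of directory levels between start_dir and the file at path."""
    rel = posixpath.relpath(posixpath.dirname(path), start_dir)
    if rel == ".":
        return 0  # file lives in the starting directory itself
    return len(rel.split("/"))


def within_depth(path, start_dir, max_depth):
    """True if path is no deeper than max_depth (cf. -depth)."""
    return nesting_level(path, start_dir) <= max_depth
```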
If you just want to check if links in a file are valid
specify -one (or -1). Only the links present in the file are
tested, and they are not followed any further. Combine this
with -files to check just a given collection of files.
When all local files are inspected webxref goes out into
the net to check if the http:// links work. This may be
time-consuming. Specify -nohttp if you don't want that.
To avoid overloading a webserver there is a delay of 1 second
between checks. If you want longer or shorter delays specify
the number of seconds with -delay. (Longer delays may be
necessary if a lot of links refer to the same webserver.)
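The pause between network checks is a simple form of rate limiting.
A Python sketch of the loop follows, with the actual check left as a
caller-supplied function; check_urls, check and sleep are hypothetical
names, not part of webxref.

```python
import time


def check_urls(urls, check, delay=1.0, sleep=time.sleep):
    """Check each URL in turn, pausing delay seconds between checks.

    check: function taking a URL and returning True if it works.
    sleep: injectable for testing; defaults to time.sleep.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # be polite to the web server (cf. -delay)
        results[url] = check(url)
    return results
```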
To see if you have files or directories that were last
modified before or after a certain date/time use:
-before/-after -date yymmdd -time hhmmss. With -before,
files last modified before the given date are reported;
with -after, files last modified after the given date
are reported.
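The date selection can be sketched in Python as below, assuming the
yymmdd and hhmmss formats shown above. Note that Python's %y maps
two-digit years 69-99 to 1969-1999 and 00-68 to 2000-2068; how webxref
treats the century is not stated here. parse_stamp and select_by_mtime
are hypothetical names.

```python
from datetime import datetime


def parse_stamp(date, clock="000000"):
    """Parse a yymmdd date and an hhmmss time into a datetime."""
    return datetime.strptime(date + clock, "%y%m%d%H%M%S")


def select_by_mtime(mtimes, stamp, before=True):
    """Pick files modified before (or after) stamp.

    mtimes: dict mapping file name to its modification datetime.
    """
    if before:
        return sorted(p for p, t in mtimes.items() if t < stamp)
    return sorted(p for p, t in mtimes.items() if t > stamp)


# Hypothetical modification times for illustration.
mtimes = {"old.html": datetime(1996, 1, 1), "new.html": datetime(1998, 6, 1)}
stamp = parse_stamp("970315", "120000")
older = select_by_mtime(mtimes, stamp, before=True)
newer = select_by_mtime(mtimes, stamp, before=False)
```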
To tell webxref which files to inspect simply list the file
or files at the end of the command, or use -files or -f.
Webxref can search and even search-replace text, see later.
=======================
What the parameters do:
=======================
While checking webxref prints output according to:
  -silent/-s      Only list files with problems at the end of the run.
  -verbose/-v     Print information while checking files.
  -errors/-e      Print errors when they occur, even when -silent.
  -noint          Do not generate output on interrupt.
Webxref generates a report according to:
  -long/-l        List all files found, not just problems.
  -xref/-x        List which files reference files (cross-references).
  -noxref         Do not list which files reference files (default).
  -html           Print report in html form, links made active and all.
Webxref inspects files/directories according to:
  -fluff          List which files/directories are never used.
  -htmlonly       Only inspect files with the .html/.htm extension.
  -root rootdir   The server root where cgi-bin, icons etc. reside;
                  default: the directory where webxref is called.
                  Links starting with '/' are looked for
                  in the rootdir directory.
  -fullpath       Print full-length filenames, e.g. /u/people/rick/www.html
  -islocal url    url (e.g. 'www.mymachine.nl') is actually a local file
                  reference.
  -avoid regexp   Avoid files with names matching regexp for inspection.
  -depth number   The maximum directory nesting level.
                  0 means: current directory only,
                  1 means: directories from the current directory.
                  100 probably means there is no restriction in
                  how deep webxref is allowed to find files.
  -one/-1         Specify -one if you just want to check the links
                  from the given file(s) and no further link following.
  -nohttp         Do not check external URLs via the network.
  -delay seconds  Wait the specified number of seconds between HTTP checks.
  -date -time     Date [yymm], time [hhmm].
  -before -after  List files that are modified before or after
                  the date/time given with -date and -time.
  -files/-f files If you want webxref to test a series of files
                  use the -files parameter, else simply list the
                  file to test last.
=================
Find/replacement: ** EXPERT ONLY **
=================
Webxref can scan your site for files containing certain text.
To find fixed text use -find. To find text using e.g. wildcards
use -findexpr. The Perl expression is matched with the text of
the file under test. Take care to not have the shell interpret
'*' and '/' by using appropriate quoting. Search is always case-
insensitive. Webxref does search/replace beyond end-of-line.
I.e. newlines are matched, and can even be inserted (use \n).
To replace text with something else use -replace and -replaceexpr
and -by. The string or expression you specify with -replace or
-replaceexpr is replaced by the string you specify with -by.
In case of editing, a backup file with a random numeric extension
is placed next to the resulting file. E.g. when index.html is
edited there'll be a file "index.html.1234" or something similar.
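The edit-with-backup behaviour described above can be illustrated
with the following Python sketch. It is not webxref's code:
replace_in_file is a hypothetical name, the replacement is treated as
plain text, and the sketch does not guard against the random backup
name already being taken.

```python
import os
import random
import re
import shutil


def replace_in_file(path, pattern, replacement, literal=True):
    """Case-insensitive search/replace over the whole file, with backup.

    literal=True treats pattern as fixed text (cf. -replace);
    literal=False treats it as a regular expression (cf. -replaceexpr).
    Returns the backup file name, or None if nothing matched.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()  # whole file at once, so matches may span lines
    regexp = re.escape(pattern) if literal else pattern
    # A function replacement inserts the text as-is, without escapes.
    new_text, count = re.subn(regexp, lambda m: replacement, text,
                              flags=re.IGNORECASE)
    if count == 0:
        return None
    # Keep the original next to the result, e.g. index.html.1234
    backup = "%s.%04d" % (path, random.randint(0, 9999))
    shutil.copy2(path, backup)
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return backup
```

Reading the file as a single string is what lets a pattern containing
a newline match across line ends.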
(DISCLAIMER: the author cannot be held responsible for any damage
resulting from using the edit- or any other functions of webxref
or indeed any software, hardware, chemical substance, imagined
or real (or imagined to be real) effects or by-effects of anything,
at all, whatsoever.)
  -find string         Report files containing the given string.
  -findexpr regexp     Report files containing the given expression.
  -replace string      *REPLACE* string by the string given with -by.
  -replaceexpr regexp  *REPLACE* regexp by the string given with -by.
  -by string           Replacement string (or regexp).
  -nobackup            Not implemented on purpose.
========
Examples
========
webxref file.html
    Checks file.html and files/URLs referenced from file.html.
    Only lists problems at the end of the run, + and - for each
    file checked.
webxref index.html another.html
    Checks index.html and another.html.
webxref -one index.html
    Just checks the links in index.html, doesn't follow the links.
webxref -one *.html
    Checks only the links in the html-files in the current dir.
webxref -depth 0 index.html
    Checks index.html, but doesn't check files in directories
    that are deeper in the file system.
webxref -nohttp file.html
    Checks file.html, but not external URLs.
webxref -htmlonly file.html
    Checks file.html, but only files with the .html/.htm extension.
webxref -avoid '.*Archive.*' file.html
    Checks file.html but avoids files with names containing
    'Archive'.
webxref -avoid '.*Archive.*|.*Distribution.*' file.html
    Same as above, but also avoids files with names containing
    'Distribution'.
webxref -islocal www.sara.nl
    Treat things like '