Last Revised: Wed Sep 13 13:56:20 MET DST 1995

Htgrep Query Formats

This page describes the query formats that are supported by the htgrep search engine package. The support for boolean queries was provided by Paul Sutton.

Htgrep supports querying either by keywords (described here) or by Perl regular expressions, and can be configured to use either format. By default, it assumes that queries are boolean keyword searches, unless the query contains a backslash ("\"), in which case a Perl regular expression is assumed.


How to enter search queries

The search query can be either one or more words separated by spaces (a simple search), or a boolean expression. By default, all searches are case insensitive. For example, entering "HOUSE" is identical to entering "house", "House" or even "HoUSe". (If necessary, htgrep can be configured to treat queries as case sensitive.)


Simple Searches

Enter a single word to find any search record that contains the exact whole word entered. For example, the search entry "world" would find records containing the word "world", but not "worldwide". If you enter more than one word, it will find entries containing all of the words you entered. For example, "world economy" will find entries containing both the word "world" and the word "economy" (but not necessarily next to each other or in that order).
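
The whole-word, all-words behaviour described above can be illustrated with a short Python sketch. This is not htgrep's own code (htgrep is written in Perl); the function name matches_all_words is hypothetical, and the use of regular-expression word boundaries to decide what counts as a "whole word" is an assumption.

```python
import re

def matches_all_words(query: str, record: str) -> bool:
    """Return True if every word of the query occurs in record as a whole word."""
    for word in query.split():
        # \b anchors the match at word boundaries, so "world" will not
        # match inside "worldwide". re.escape keeps the word literal.
        pattern = r"\b" + re.escape(word) + r"\b"
        if not re.search(pattern, record, re.IGNORECASE):
            return False
    return True

print(matches_all_words("world economy", "The economy of the world"))  # True
print(matches_all_words("world", "a worldwide network"))               # False
```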

To find parts of words, use an asterisk (*) to represent missing parts of the word. For example, if you enter "world*" it will match "worldwide", "worlds", etc. Similarly, "*world" would find "underworld", etc.
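
One plausible way to implement the asterisk is to translate each search word into a regular expression. The Python sketch below does this under stated assumptions: the name word_to_regex is hypothetical, "*" is taken to mean "zero or more word characters" (so "world*" would also match plain "world"), and htgrep's actual translation may differ.

```python
import re

def word_to_regex(word: str) -> re.Pattern:
    """Translate a search word with optional "*" wildcards into a regex."""
    # Escape the literal pieces, then join them with \w* so that each "*"
    # stands for any run of word characters (assumed to include none).
    pieces = [re.escape(piece) for piece in word.split("*")]
    return re.compile(r"\b" + r"\w*".join(pieces) + r"\b", re.IGNORECASE)

print(bool(word_to_regex("world*").search("a worldwide network")))  # True
print(bool(word_to_regex("*world").search("the underworld")))       # True
print(bool(word_to_regex("world").search("a worldwide network")))   # False
```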


Boolean Searches

For more control over the search query, you can use a boolean expression. If you enter the word "or" between two search words (with a space between each word and the "or"), it will find any record that contains either the first word, or the second word, or both. For example, "apple or orange" would find records containing the word "apple" or the word "orange", or both.

If you enter "and" instead of "or", the query matches only records that contain both the word "apple" and the word "orange". This is the same as a simple search for "apple orange": when the boolean commands are omitted, an "and" is assumed between each pair of search words.

To find records which do not contain a particular word, place the word "not" before it. For example, "not blue" would find all the records which do not contain the word "blue". You can combine the "and", "or" and "not" commands. For example, "apple and not red" would find records containing the word "apple" but not the word "red".

For advanced use, you can use brackets to group the expression. For example, "apple and (red or green)" would find all records containing the word "apple" and either "red" or "green" (or both). If the brackets are omitted, the "and" command has higher precedence than "or", so "apple and red or green" would find all records containing both "apple" and "red", and also all records containing "green".
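
The precedence rules above can be made concrete with a small recursive-descent evaluator, sketched here in Python. It is an illustration only, not htgrep's implementation: the names tokenize, word_matches and matches are hypothetical, and treating the keywords "and", "or" and "not" as case insensitive is an assumption.

```python
import re

def tokenize(query):
    # Brackets are tokens of their own; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)

def word_matches(word, record):
    # Whole-word, case-insensitive match; "*" stands for any word characters.
    body = r"\w*".join(re.escape(piece) for piece in word.split("*"))
    return re.search(r"\b" + body + r"\b", record, re.IGNORECASE) is not None

def matches(query, record):
    """Evaluate a boolean query against one record."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "or":
            advance()
            rhs = parse_and()  # always parse the right side to consume it
            value = value or rhs
        return value

    def parse_and():  # binds tighter than "or"; the keyword is optional
        value = parse_not()
        while peek() not in (None, "or", ")"):
            if peek() == "and":
                advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # "not", a bracketed group, or a single search word
        if peek() == "not":
            advance()
            return not parse_not()
        if peek() == "(":
            advance()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing bracket")
            advance()
            return value
        if peek() in (None, "and", "or", ")"):
            raise ValueError("expected a search word")
        return word_matches(advance(), record)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

print(matches("apple and red or green", "a green pear"))    # True
print(matches("apple and (red or green)", "a green pear"))  # False
```

The two calls differ only in the brackets: without them, "green" alone is enough to match, because "apple and red" is grouped first.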


Perl Regular Expressions

If you want to use a Perl regular expression rather than a simple or boolean search, make sure the query contains a \char construct (e.g. \w or \s). Any search query which contains a backslash will be treated as a Perl regular expression.

You can also force htgrep to always prefer Perl regular expressions by setting the tag "boolean=no". See the htgrep FAQ list for more details.
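
The selection between the two query formats can be summarised in a few lines of Python. This is a sketch of the rule as described on this page, not htgrep's code: the function name choose_query_mode and its boolean parameter (standing in for the "boolean" tag) are hypothetical.

```python
def choose_query_mode(query: str, boolean: bool = True) -> str:
    """Return "regex" or "boolean" for the given query."""
    # boolean=False models the tag "boolean=no", which forces regex mode.
    # Otherwise a backslash anywhere in the query selects regex mode.
    if not boolean or "\\" in query:
        return "regex"
    return "boolean"

print(choose_query_mode("apple or orange"))       # boolean
print(choose_query_mode(r"\bworld\w*"))           # regex
print(choose_query_mode("apple", boolean=False))  # regex
```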


Paul Sutton, 26 August 1994
Oscar Nierstrasz, 13 September, 1995