hsearchtext(1)

NAME

hsearchtext - hash search an unordered text file of character strings

SYNOPSIS

hsearchtext [-A] [-h number] [-I] [-N] [-P] [-r m|n] [-S]
[-v] filename [string(s)]

DESCRIPTION

Hsearchtext is for hash searching an unordered text file
of character strings.
The program requires mmap(2) to map the database file into
the Unix VM system. The database file name is a required command
line argument.
The database is a standard Unix text file, one string per
line.
The database mechanism is conservative with machine
resources, requiring about 17.5 microseconds of machine time to
look up a word in the Unix system dictionary (2.5 MB, a quarter
of a million words, on a single, lightly loaded 466 MHz Pentium
running Linux 2.2; measured by using the time(1) command to look
up every word in the dictionary and dividing by the number of
words).
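The mapping and lookup described above can be sketched roughly as follows. This is a minimal illustration, not the program's own code: the names bucket_of, load_table, and lookup are hypothetical, the base-31 string hash is an assumed stand-in for whatever hash the program uses, and 99871 (the value listed for -h under OPTIONS) is assumed to be the default table size.

```python
import mmap
import os

# Value listed for -h under OPTIONS; assumed here to be the default table size.
TABLE_SIZE = 99871


def bucket_of(record: bytes, size: int = TABLE_SIZE) -> int:
    """Illustrative base-31 polynomial hash (an assumption, not the program's hash)."""
    h = 0
    for byte in record:
        h = (h * 31 + byte) % size
    return h


def load_table(path: str, size: int = TABLE_SIZE) -> dict:
    """Map the database read-only and chain each line into its bucket."""
    table: dict = {}
    # A missing or zero-length file cannot be mapped; treat it as an empty database.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return table
    with open(path, "rb") as f:
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The file object is closed at this point; the mapping itself remains usable.
    try:
        for line in iter(view.readline, b""):
            record = line.rstrip(b"\n")
            table.setdefault(bucket_of(record, size), []).append(record)
    finally:
        view.close()
    return table


def lookup(table: dict, word: bytes, size: int = TABLE_SIZE) -> bool:
    """Return True if word is stored in the bucket its hash selects."""
    return word in table.get(bucket_of(word, size), [])
```

Only the bucket selected by the hash is scanned for each query, which is why lookups stay fast even though the file itself is unordered.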
The program optionally eliminates duplicate records (i.e.,
records that are lexically equal), removes null records (i.e.,
"^$"), converts all characters to lowercase, and parses records
containing whitespace, leaving only the last token as the
record.
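The record clean-up just described might look like the sketch below. The function name normalize and its parameters are hypothetical; fold_case and keep_null only loosely mirror the -I and -N options, and the order in which the steps are applied is an assumption.

```python
def normalize(lines, fold_case=True, keep_null=False):
    """Return cleaned records in first-seen order (illustrative only)."""
    seen = set()
    records = []
    for line in lines:
        tokens = line.split()
        # Keep only the last whitespace-separated token of the record.
        record = tokens[-1] if tokens else ""
        if fold_case:
            record = record.lower()
        # Drop null records unless asked to keep them.
        if record == "" and not keep_null:
            continue
        # Drop records that are lexically equal to one already kept.
        if record in seen:
            continue
        seen.add(record)
        records.append(record)
    return records
```

For example, the lines "Foo", "foo", an empty line, and "spam user@Example.COM" reduce to the two records "foo" and "user@example.com" under the assumed defaults.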
The program can be used to hash queries. The strings to be
searched for may be supplied as additional optional command line
arguments, or redirected to the program via stdin for
compatibility with procmail(1) and other e-mail scripting agents.
A suitable procmail(1) recipe example might be:

:0 wfh
* ? something | hsearchtext reject.db
| formail -A "X-Notice: Word in reject.db database"
which could, if necessary, be overridden on a case-by-case
basis with the example recipe:

:0 wfh
* ^X-Notice: +Word +in +reject.db +database
* ? something | hsearchtext accept.db
| formail -I "X-Notice: Word in reject.db database"
or similar construct, where the databases contain e-mail
addresses or domain names, etc.
Since the database file is memory mapped read-only, using
mmap(2), and the database file is closed immediately after the
mmap call, the unstructured/unordered database file can be
appended to from the output of the hsearchtext(1) program. For
example, constructs like:

hsearchtext -P example.db "this" "and" "that" >> example.db
are permitted (which, for example, would add the words
"this", "and", and "that" to the unstructured/unordered database
file, example.db, but only if the words were not already in the
file).
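The effect of that construct can be approximated in a few lines. This is a sketch under assumptions, not the program's implementation: append_missing is a hypothetical name, the file is read in full before anything is written (standing in for the mapped view), case folding is ignored, and skipping repeated arguments is an assumed behavior.

```python
import os


def append_missing(path, words):
    """Append words not already in the file; return the words that were added."""
    existing = set()
    # Snapshot the current records first; a missing file counts as empty.
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            existing = {line.rstrip("\n") for line in f}
    missing = []
    for word in words:
        # Assumption: a word repeated in the arguments is only added once.
        if word not in existing and word not in missing:
            missing.append(word)
    # Append, never truncate, so the existing records are preserved.
    with open(path, "a", encoding="utf-8") as f:
        for word in missing:
            f.write(word + "\n")
    return missing
```

Calling append_missing("example.db", ["this", "and", "that"]) twice adds the words on the first call and nothing on the second, which is the behavior the text describes.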
Additionally, the database file is not required to exist or
to be consistent with the requirements of mmap(2). Specifically,
the file may be missing or may have a size of zero.
The program contains fewer than 700 lines of declarations
and statements, all of which are documented with in-line
comments.
The program has been compiled and tested on SunOS,
Solaris, and Linux, and may work on other brands of Unix.
When used to query an unordered text file of character
strings, the program returns 0 if no error occurred and any of
the specified strings were found in the database file, and 1 if
no error occurred and no strings were found; otherwise it returns
a unique error code greater than 1 representing the error
encountered, and also prints an error diagnostic to stderr.
The -r option is useful for controlling the return value
under error conditions. For example, if the database file cannot
be opened (or read), the program return can be preempted with a
return value of match, or no match, depending on environmental
requirements.
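The return-value rules above, together with the -A and -r options listed under OPTIONS, can be summarized in a small decision function. The function exit_status and its parameter names are hypothetical, and the specific error codes greater than 1 are not documented here, so the caller supplies one.

```python
from typing import Optional, Sequence

MATCH = 0     # at least one string found (or all of them, with -A)
NO_MATCH = 1  # no error, but the match condition was not met


def exit_status(found: Sequence[bool],
                require_all: bool = False,
                file_error: Optional[int] = None,
                on_file_error: Optional[str] = None) -> int:
    """Compute the exit status from per-string results (illustrative only)."""
    if file_error is not None:
        # Mirrors -r m|n: preempt the error code with match or no match.
        if on_file_error == "m":
            return MATCH
        if on_file_error == "n":
            return NO_MATCH
        return file_error  # a code greater than 1; its value is undocumented here
    if require_all:
        # Mirrors -A: every string must be found; an empty query never matches.
        matched = bool(found) and all(found)
    else:
        matched = any(found)
    return MATCH if matched else NO_MATCH
```

In a mail filter, choosing "m" or "n" for the file-error case decides whether an unreadable database causes messages to be treated as listed or as unlisted.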

OPTIONS

filename
Database file name.
string(s)
Character string(s) to be searched for (defaults
to stdin).
-A Return = match if all strings found (default: match if any
string found).
-h number
Hash table size = prime number (99871).
-I Case sensitive alphabet.
-N Include null records.
-P Print the string(s) not in the database.
-r m|n On file error, exit return = match for m, no match
for n.
-S Disable whitespace in file warning.
-v Print the program's version information.

WARNINGS

Under buffer overflow conditions, the program makes no
attempt to handle the situation; it just detects it, prints an
error message, and exits.
The program is capable of rejecting entire Class A, Class
B, or Class C IP address ranges. Discretion is advised.

SEE ALSO

receivedIP(1), receivedIPdb(1), receivedIPdbdedup(1),
receivedIPdbrm(1), receivedIPdbusort(1), bsearchtext(1),
receivedAddress(1), receivedTodb(1), receivedMSGIDdb(1),
receivedUnknowndb(1), tolower(1), toupper(1), bsorttext(1),
receivedIPforgedb(1), hsearchtext(1), bsearchbody(1)

DIAGNOSTICS

Error messages for incompatible arguments, failure to
allocate memory, inaccessible files, opening and closing files.

AUTHORS

---------------------------------------------------------------------

A license is hereby granted to reproduce this software source code and
to create executable versions from this source code for personal,
non-commercial use. The copyright notice included with the software
must be maintained in all copies produced.
THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE
AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE
INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.
Copyright (c) 2001-2007, John Conover, All Rights Reserved.
Comments and/or bug reports should be addressed to:

john@email.johncon.com (John Conover)
---------------------------------------------------------------------

January 16, 2007