hsearchtext(1)
NAME
- hsearchtext - hash search an unordered text file of character
- strings
SYNOPSIS
hsearchtext [-A] [-h number] [-I] [-N] [-P] [-r m|n] [-S] [-v] filename [string(s)]
DESCRIPTION
- Hsearchtext is for hash searching an unordered text file
- of character strings.
- The program requires mmap(2) to map the database file into
- the Unix VM system. The database file name is a required command
- line argument.
- The database is a standard Unix text file, one string per
- line.
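- As a rough illustration of this arrangement (not the program's
- own source), the following Python sketch maps a one string per
- line text file read-only and loads its lines into a hash based
- set for lookup. The name load_db and the use of a Python set as
- the hash table are assumptions made for the example.
```python
import mmap
import os
import tempfile


def load_db(path):
    """Map a non-empty text file read-only; return its lines as a set."""
    with open(path, "rb") as f:
        # The mapping remains usable after the file object is closed.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # A Python set is hash based; empty lines are dropped.
        return {line for line in m[:].split(b"\n") if line}
    finally:
        m.close()


# Usage with a throwaway database file.
fd, name = tempfile.mkstemp(suffix=".db")
os.write(fd, b"alpha\nbeta\ngamma\n")
os.close(fd)
db = load_db(name)
os.unlink(name)
print(b"beta" in db, b"delta" in db)  # True False
```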
- The database mechanism is conservative with machine
- resources, requiring about 17.5 microseconds of machine time to
- look up a word in the Unix system dictionary (2.5 MB, a quarter
- of a million words, on a single, lightly loaded 466 MHz Pentium
- running Linux 2.2; measured with the time(1) command by looking
- up every word in the dictionary and dividing by the number of
- words).
- The program optionally eliminates duplicate records (i.e.,
- records that are lexically equal), removes null records (i.e.,
- "^$"), converts all characters to lowercase, and parses records
- containing whitespace, leaving only the last token as the
- record.
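- The record clean-up steps listed above might be sketched in
- Python as follows. The function normalize and its parameters
- case_sensitive and keep_null are hypothetical names that loosely
- mirror the -I and -N options; the order in which the real program
- applies these steps is an assumption.
```python
def normalize(lines, case_sensitive=False, keep_null=False):
    """Return de-duplicated records, keeping only the last token of each line."""
    seen = set()
    records = []
    for line in lines:
        tokens = line.split()
        # Keep only the last whitespace-separated token, if there is one.
        record = tokens[-1] if tokens else ""
        if not record and not keep_null:
            continue  # drop null records
        if not case_sensitive:
            record = record.lower()
        if record not in seen:  # drop lexically equal duplicates
            seen.add(record)
            records.append(record)
    return records


print(normalize(["Foo", "x  FOO", "", "bar baz", "foo"]))  # ['foo', 'baz']
```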
- The program can be used to hash queries. The strings to be
- searched for may be supplied as additional optional command line
- arguments, or redirected to the program via stdin for
- compatibility with procmail(1) and other e-mail scripting agents.
- A suitable procmail(1) recipe example might be:
:0 wfh
* ? something | hsearchtext reject.db
| formail -A "X-Notice: Word in reject.db database"
- which could be, if necessary, overridden on a case-by-case
- basis with the example recipe:
:0 wfh
* ^X-Notice: +Word +in +reject.db +database
* ? something | hsearchtext accept.db
| formail -I "X-Notice: Word in reject.db database"
- or a similar construct, where the databases contain e-mail
- addresses, domain names, etc.
- Since the database file is memory mapped read-only, using
- mmap(2), and the file is closed immediately after the mmap
- call, the unstructured/unordered database file can be appended
- from the output of the hsearchtext(1) program; for example,
- constructs like:
hsearchtext -P example.db "this" "and" "that" >> example.db
- are permitted (which, for example, would add the words
- "this", "and", "that" to the unstructured/unordered database
- file, example.db, but only if the words were not already in the
- file).
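- The effect of this append-if-missing idiom can be emulated
- in a few lines of Python, shown here only to make the behavior
- concrete. The function add_missing is a hypothetical stand-in,
- not part of hsearchtext, and it assumes an existing database
- file ends with a newline.
```python
import os
import tempfile


def add_missing(path, words):
    """Append to path each word not already present; return the words added."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            known = {line.rstrip("\n") for line in f}
    except FileNotFoundError:
        known = set()  # no database yet: every word is new
    added = []
    for word in words:
        if word not in known:
            known.add(word)
            added.append(word)
    if added:
        with open(path, "a", encoding="utf-8") as f:
            f.writelines(word + "\n" for word in added)
    return added


# Usage: the second call adds only the word not yet present.
path = os.path.join(tempfile.mkdtemp(), "example.db")
print(add_missing(path, ["this", "and", "that"]))  # ['this', 'and', 'that']
print(add_missing(path, ["this", "other"]))        # ['other']
```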
- Additionally, the database file is not required to satisfy
- the requirements of mmap(2). Specifically, the file does not
- have to exist, and it can have a size of zero.
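- One way a program could tolerate these two cases is to check
- for them before attempting the mapping, since a zero length file
- cannot be mapped. The Python sketch below is an assumption about
- how such a guard might look, with the hypothetical helper
- map_or_empty treating a missing or empty file as an empty
- database.
```python
import mmap
import os
import tempfile


def map_or_empty(path):
    """Return the file contents via a read-only mapping, or b"" if unmappable."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return b""  # missing file: treat as an empty database
    if size == 0:
        return b""  # a zero length file cannot be mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]


# Usage: an empty file and a nonexistent file both yield no data.
empty = tempfile.NamedTemporaryFile(delete=False)
empty.close()
print(map_or_empty(empty.name))               # b''
print(map_or_empty(empty.name + ".missing"))  # b''
```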
- The program contains fewer than 700 lines of declarations
- and statements, all of which are documented with inline
- comments.
- The program has been compiled and tested on SunOS,
- Solaris, and Linux, and may work on other brands of Unix.
- If used for querying an unordered text file of character
- strings, the program returns 0 if there was no error and any of
- the specified strings were found in the database file, and 1 if
- there was no error and no strings were found; otherwise it
- returns a unique error code greater than 1 representing the
- error encountered, and also prints an error diagnostic to stderr.
- The -r option is useful for controlling the return value
- under error conditions. For example, if the database file cannot
- be opened (or read), the program return can be preempted with a
- return value of match or no match, depending on environmental
- requirements.
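- A calling script might interpret these return values as in
- the following Python sketch. The helper names classify and query
- are hypothetical; only the command line form "hsearchtext
- filename [string(s)]" and the return value convention are taken
- from this page.
```python
import subprocess


def classify(returncode):
    """Map an hsearchtext return value to a label."""
    if returncode == 0:
        return "match"
    if returncode == 1:
        return "no match"
    # Anything else, including codes greater than 1, is an error;
    # the program prints its diagnostic to stderr.
    return "error"


def query(database, strings, run=subprocess.run):
    """Run hsearchtext on database with the given strings; classify the result."""
    result = run(["hsearchtext", database, *strings])
    return classify(result.returncode)
```
- With -r m or -r n added to the command line, a file error
- would be reported as a match or no match instead, so the caller
- would not see the "error" label for that case.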
OPTIONS
- filename
- File name.
- string(s)
- Character string(s) to be searched for, (defaults
- to stdin).
- -A Return = match if all strings found, (default: match if
- any string found).
- -h number
- Hash table size = prime number (99871).
- -I Case sensitive alphabet.
- -N Include null records.
- -P Print the string(s) not in the database.
- -r m|n On file error, exit return = match for m, no match
- for n.
- -S Disable whitespace in file warning.
- -v Print the program's version information.
WARNINGS
- Under buffer overflow conditions, the program makes no
- attempt at handling the situation; it just detects it, prints an
- error message, and exits.
- The program is capable of rejecting entire Class A, Class
- B, or Class C, IP address ranges. Discretion is advised.
SEE ALSO
- receivedIP(1), receivedIPdb(1), receivedIPdbdedup(1),
- receivedIPdbrm(1), receivedIPdbusort(1), bsearchtext(1),
- receivedAddress(1), receivedTodb(1), receivedMSGIDdb(1),
- receivedUnknowndb(1), tolower(1), toupper(1), bsorttext(1),
- receivedIPforgedb(1), hsearchtext(1), bsearchbody(1)
DIAGNOSTICS
- Error messages for incompatible arguments, failure to
- allocate memory, inaccessible files, opening and closing files.
AUTHORS
---------------------------------------------------------------------
- A license is hereby granted to reproduce this software source code
- and to create executable versions from this source code for
- personal, non-commercial use. The copyright notice included with
- the software must be maintained in all copies produced.
- THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
- WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
- MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE
- AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE
- THE INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.
- Copyright (c) 2001-2007, John Conover, All Rights Reserved.
- Comments and/or bug reports should be addressed to:
john@email.johncon.com (John Conover)
---------------------------------------------------------------------
January 16, 2007