clara(1)
NAME
clara - a cooperative OCR
SYNOPSIS
clara [options]
DESCRIPTION
Welcome. Clara OCR is a free OCR program, written for systems supporting the C
library and the X Window System. Clara OCR is intended for the cooperative OCR of books. There are some screenshots available at
http://www.claraocr.org/.
This documentation is extracted automatically from the comments of the
Clara OCR source code. It is known as "The Clara OCR Developer's
Guide". It's currently unfinished. First-time users are invited to read
"The Clara OCR Tutorial". There is also an advanced manual known as
"The Clara OCR Advanced User's Manual".
CONTENTS
Clara OCR relies on the memory allocator both for the allocation and resizing of some large blocks used by the main data structures, and for
the allocation of a large number of small arrays. Currently Clara OCR does
not include or use a special memory allocator, but implements an
interface to realloc. The alloca function is also used in some places throughout
the code, generally to allocate buffers for sorting arrays.
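As an illustration of such an interface, here is a minimal sketch of a checked realloc wrapper; the name x_realloc and the error handling policy are assumptions made for this example, not the actual Clara OCR code.

    #include <stdio.h>
    #include <stdlib.h>

    /* hypothetical checked wrapper: grow or create a block, aborting
       on failure (realloc(NULL, n) behaves like malloc(n), so one
       wrapper covers both initial allocation and later resizes) */
    void *x_realloc(void *p, size_t nbytes)
    {
        void *q;

        q = realloc(p, nbytes);
        if (q == NULL && nbytes > 0) {
            fprintf(stderr, "memory exhausted (%lu bytes)\n",
                    (unsigned long) nbytes);
            exit(1);
        }
        return q;
    }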
Concerning security, the following criteria are used:
1. String operations are generally performed using services that accept
a size parameter, like snprintf or strncpy, except when the code itself
is simple and guarantees that an overflow won't occur.
2. The CGI clara.pl invokes write privileges through sclara, a program
specially written to perform only a small set of simple operations
required for the operation of the Clara OCR web interface.
The following should be done:
Naive support for runtime index checking is provided through the
macro checkidx. This checking is performed only if the code is compiled
with the macro MEMCHECK defined and the command-line switch
Clara OCR decides at runtime whether the GUI will be used. So even
when using Clara OCR in batch mode (-b command-line switch), linking
with the X libraries is required.
Clara OCR uses a lot of global variables. Large data structures, flags,
paths, etc. are stored in global variables. In some cases a naming strategy is used to make the code more readable. The important cases are:
In order to allow the GUI to refresh the application window while an
OCR run is in progress, Clara does not use multiple threads. The main
function alternates calls to xevents() to receive input and to continue_ocr() to perform OCR. As the OCR operations may take a long time to complete, a very simple model was implemented to allow the OCR services to
execute only partially.
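A minimal, self-contained sketch of this cooperative main loop follows. Only the names xevents() and continue_ocr() come from the text above; the stubs, the "ocring" flag and the loop shape are assumptions made for illustration.

    static int ocring = 1;            /* nonzero while an OCR run is active */

    static void xevents(void)         /* stub: would poll and handle X events */
    {
    }

    static void continue_ocr(void)    /* stub: would perform one small OCR slice */
    {
        static int steps = 0;

        if (++steps >= 3)
            ocring = 0;               /* pretend the OCR run finished */
    }

    int main(void)
    {
        /* the GUI stays responsive because each iteration returns quickly */
        while (ocring) {
            xevents();
            continue_ocr();
        }
        return 0;
    }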
Even for non-developers, some knowledge of the internal data structures
used by Clara OCR is required for fine tuning and for making simple
diagnostics.
As one character of the document may be composed of two or more closures (for instance when it's broken), it's convenient to work not with
closures, but with sets of closures. So we define the concept of "symbol" as a set of one or more closures. Initially, the OCR generates one unitary symbol for each closure. Subsequent steps may define
new symbols composed of two or more closures.
Each symbol is stored in an sdesc structure. Those structures form the
mc array. Once created, a symbol is never deleted, so its index in the
mc array identifies it (this is important for the web-based revision
procedure). Note that closures and symbols are numbered on a document-related basis. The set of closures that define one symbol never
changes, so the symbol's bounding box and total number of black pixels won't change either.
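As a rough illustration of the idea (the field names below are hypothetical and do not reflect the real sdesc layout), a symbol descriptor along these lines could look like:

    /* illustrative symbol descriptor; field names are hypothetical */
    typedef struct {
        int *cl;        /* indexes of the closures that form the symbol */
        int ncl;        /* number of closures (fixed once the symbol is created) */
        int l, r, t, b; /* bounding box: left, right, top, bottom */
        int nbp;        /* total number of black pixels */
    } symbol_sketch;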
The same closure may belong to more than one symbol. This is important
in order to allow various heuristic trials. For instance, the left closure of the "u" in the preceding section could be identified as the
body of an "i". In this case, however, we would not find its dot. So the
heuristic could then try another solution, for instance joining it
with the nearest closure (in that case, the right closure of the "u")
and trying to match it with some pattern of the font.
The font size is important for classifying all book symbols into pattern
"types". For instance, books generally use smaller letters for
footnotes. This classification is performed automatically by Clara OCR
and presented in the "PATTERN (types)" window.
The vertical alignment of symbols is important for various heuristics.
For instance, the vertical line from a broken "p" matches an "l", but
using alignment tests we're able to refuse this match.
Clara OCR applies the concept of "symbol" to atomic symbols like letters, digits or punctuation signs. Words (such as "house" or "peace") are
handled by Clara OCR as sequences of symbols.
It's very important to compute the words of the page. They provide
context both to the OCR and to the reviewer. For instance, if the known
symbols of some word were identified as bold, then Clara will automatically turn the bold button on when someone tries to review the unknown
symbols of that word. The same applies to preferring the recognition of a
symbol as the digit "1" instead of the letter "l" when the known symbols of the "word" are digits. Words are also the basis for revision
based on spelling. Each word is stored in a wdesc structure in the
"word" array.
ArrayThe "acts" or "revision acts" are the human interventions for training
a symbol, merging a fragment to one symbol, etc.
As the human interventions are the more precious source of information,
Clara logs all revision acts, in order to be able to reuse them.
The transliterations are obtained from the revision acts, so each
transliteration refers one (or more) revision acts, and also inherits
some properties from that act (or those acts).
The acts are on the book scope, and not on the page scope. The acts are
stored on the file "acts" on the work directory.
Clara OCR maintains a list of zero or more proposed or deduced transliterations for each symbol. Throughout the OCR process, each transliteration
receives "votes" from reviewers (REVISION votes) or from machine deduction heuristics, based on shape similarity (SHAPE votes) or on spelling
(SPELLING votes).
So the choice of the "best" transliteration is performed through an election. Votes are stored in structures of type vdesc, and transliterations are stored in structures of type trdesc. Each symbol stores a pointer to a (possibly empty) list of transliterations, and each transliteration stores a pointer to a (possibly empty) list of votes.
The election process used to choose the "best" transliteration for one
symbol (from those obtained through human revision or heuristics based
on shape similarity or spelling) consists of computing the "preference"
of each transliteration and choosing the one with maximum preference.
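The sketch below illustrates such an election. The names vdesc and trdesc come from the text, but the simplified structures and the preference rule (a plain vote count) are assumptions for this example; the real preference computation is more elaborate.

    #include <stddef.h>

    /* illustrative election: pick the transliteration with most votes */
    struct vote_sk   { int type; struct vote_sk *next; };    /* REVISION, SHAPE or SPELLING */
    struct transl_sk { char *t; struct vote_sk *votes; struct transl_sk *next; };

    struct transl_sk *elect_best(struct transl_sk *list)
    {
        struct transl_sk *tr, *best = NULL;
        int bestpref = -1;

        for (tr = list; tr != NULL; tr = tr->next) {
            int pref = 0;
            struct vote_sk *v;

            for (v = tr->votes; v != NULL; v = v->next)
                ++pref;                  /* could instead weight by v->type */
            if (pref > bestpref) {
                bestpref = pref;
                best = tr;
            }
        }
        return best;                     /* NULL when the list is empty */
    }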
Once we have computed the "best" transliteration, we can compute its
transliteration class, which is important for various heuristics. From the
transliteration class it's possible to test things like "do we know the
transliteration of this symbol?", "is it an alphanumeric character?",
"concerning dimension and vertical alignment, could it be an alphanumeric character?", and others.
There are two moments at which the transliteration class is computed. The
first is when a transliteration is added to the symbol, and the second
is when the CHAR class is propagated.
The first uses the following criteria to compute the transliteration
class:
1. If the symbol has no transliteration at all, its class is UNDEF.
2. In all other cases, the transliteration with the largest preference will
be classified as DOT, COMMA, NOISE, ACCENT and others. This search is
implemented by the classify_tr function in a straightforward way.
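A simplified sketch of these criteria follows; the function name and the mapping below are illustrative, not the actual classify_tr code (the real function tests many more cases).

    #include <string.h>

    /* hypothetical class codes, ordered as in the text */
    enum { UNDEF, DOT, COMMA, NOISE, ACCENT, OTHER };

    /* classify the transliteration with the largest preference (criterion 2),
       or return UNDEF when there is none (criterion 1) */
    int classify_sketch(const char *best_tr)
    {
        if (best_tr == NULL)
            return UNDEF;
        if (strcmp(best_tr, ".") == 0)
            return DOT;
        if (strcmp(best_tr, ",") == 0)
            return COMMA;
        /* ... remaining cases tested one by one, as in classify_tr ... */
        return OTHER;
    }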
3.1 Skeleton pixels
The first method implemented by Clara OCR for symbol classification was
skeleton fitting. Two symbols are considered similar when each one contains the skeleton of the other.
Clara OCR implements six heuristics to compute skeletons, numbered 0 to 5
and described below. The heuristic to be used is selected through the
command-line option -k as the SA parameter.
Heuristics 0, 1 and 2 consider a pixel as being a skeleton pixel if it
is the center of a circle inscribed within the closure and tangent to
the pattern boundary in more than one point.
The discrete implementation of this idea is as follows: for each pixel
p of the closure, compute the minimum distance d from p to some boundary pixel. Now try to find two pixels on the closure boundary such that
the distance from each of them to p does not differ too much from d
(the difference must be less than or equal to RR). These pixels are called "BPs".
To make the algorithm faster, the maximum distance from p to the boundary pixels considered is RX. In fact, if there exists a square of size
2*BT+1 centered at p and entirely contained in the closure, then p is considered a skeleton pixel.
As this criterion alone produces fat skeletons and isolated skeleton
pixels along the closure boundary, two other conditions are imposed:
the angular distance between the radii from p to each of those two
pixels must be "sufficiently large" (larger than MA), and a small path
joining these two boundary pixels (built only with boundary pixels)
must not exist (the "joined" function heuristically computes the smallest boundary path between the two pixels, and that distance is then
compared to MP).
Heuristics 1 and 2 are variants of heuristic 0:
1. (SA = 1) The minimum linear distance between the two BPs is specified as a factor (ML) of the square of the radius. This avoids the
conversion from rectangular to polar coordinates and may save some CPU
time, but the results will be slightly different.
2. (SA = 2) No minimum distance checks are performed, but a minimum of
MB BPs is required to exist in order to consider the pixel p a skeleton
pixel.
Heuristic 3 is very simple. It computes the skeleton by removing the
boundary BT times.
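A sketch of one erosion step in the spirit of heuristic 3 follows; the one-byte-per-pixel bitmap layout and the 4-neighbour rule are assumptions made for clarity, not the actual implementation.

    #include <string.h>

    /* a pixel survives one erosion step only if it and its four
       neighbours are black; repeating this BT times peels BT boundary
       layers and leaves an approximate skeleton */
    void erode_once(const unsigned char *bmp, unsigned char *out, int w, int h)
    {
        int x, y;

        memset(out, 0, (size_t) w * h);
        for (y = 1; y < h - 1; ++y)
            for (x = 1; x < w - 1; ++x)
                if (bmp[y*w + x] &&
                    bmp[y*w + x - 1] && bmp[y*w + x + 1] &&
                    bmp[(y-1)*w + x] && bmp[(y+1)*w + x])
                    out[y*w + x] = 1;
    }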
Heuristic 4 uses "growing lines". For angles varying in steps of approximately 22 degrees, a line of length RX pixels is drawn from each pixel. The heuristic checks whether the line can be entirely drawn using black pixels. Depending on the results, it decides whether the pixel is a skeleton pixel. For instance, if all lines could be drawn, then the pixel is the center of an inscribed circle, so it's considered a skeleton pixel. All the cases considered can be found in the source code.
Heuristic 5 computes the distance from each pixel to the border,
for some definition of distance. When that distance is at least RX, the pixel
is considered a skeleton pixel. Otherwise, it is considered a
skeleton pixel if its distance to the border is close to the maximum
distance around it (see the code for details).
Pairing applies to letters and digits. We say that the symbols a and b
(in this order) are paired if the symbol b follows the symbol a within
the same word. For instance, "h" and "a" are paired in the word "that",
"3" and "4" are paired in "12345", but "o" and "b" are not paired in
"to be" (because they're not in the same word).
The function s_pair tests symbol pairing and returns the following
diagnostics:
0 .. the symbols are paired
1 .. insufficient vertical intersection
2 .. one or both symbols above ascent
3 .. one or both symbols below descent
4 .. maximum horizontal distance exceeded
5 .. incomplete data
6 .. different zones
If p is nonzero, the inferred alignment for each symbol (a and b) is
stored in the va field of these symbols, except when these symbols
already have the va field defined.
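For illustration, a caller could map these diagnostics to messages as below; the prototype s_pair(a, b, p) is assumed from the description and may not match the actual declaration.

    #include <stdio.h>

    extern int s_pair(int a, int b, int p);   /* assumed prototype */

    static const char *pair_msg[] = {
        "paired",
        "insufficient vertical intersection",
        "one or both symbols above ascent",
        "one or both symbols below descent",
        "maximum horizontal distance exceeded",
        "incomplete data",
        "different zones",
    };

    void report_pairing(int a, int b)
    {
        int d = s_pair(a, b, 0);              /* p == 0: do not store alignment */

        printf("symbols %d and %d: %s\n", a, b, pair_msg[d]);
    }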
ArrayThe "build" OCR step, implemented by the "build" function, distributes
the symbols on words (analysing the distance, heights and relative
position for each pair of symbols), and the words on lines (analysing
the distance, heights and relative position for each pair of words).
Various important heuristics take effect here.
0. Preparation
The first step of build prepares the distribution of the symbols into
words. This is achieved by:
a. Undefining the next-symbol ("E" field) and previous-symbol ("W"
field) links for each symbol, the surrounding word ("sw" field) of each
symbol, and the next signal ("sl" field) for each symbol.
Remark: The next-symbol and previous-symbol links are used to build the
list of symbols of each word. For instance, in the word "goal", "o" is
the next of "g" and the previous of "a", "g" has no previous and "l"
has no next.
b. Undefining the transliteration class of SCHARs and the uncertain
alignment information.
2. Distributing symbols on words
The second step is, for each CHAR not yet in any word, to define a unitary
word around it and extend it to the right and to the left by applying the
symbol pairing test.
3. Computing the alignment using the words
Some symbols do not have a well-defined alignment by themselves. For
instance, a dot may be baseline-aligned (a final dot) or 0-aligned (the
"i" dot). So when computing their alignments, we need to analyse their
neighborhoods. This is performed in this step.
4. Validating the recognition
Shape-based recognitions must be validated by special heuristics. For
instance, the left column of a broken "u" may be recognized as the body
of an "i" letter. A validation heuristic may refuse this recognition,
for instance because the dot was not found. These heuristics are per-alphabet.
5. Creating fake words for punctuation signs
To produce clean output, symbols that do not belong to any word are
not included in the OCR output. So we need to create fake words for
punctuation signs like commas or final dots.
6. Aligning words
Words need to be aligned in order to detect the page text lines. This
is performed as follows:
a. Undefine the next-word and previous-word links for each word. These
are links to the previous and next word within lines. For instance, in
the line "our goal is", "goal" is the next of "our" and the previous
for "is", "our" has no previous and "is" has no next.
b. Distribution of the words on lines. This is just a matter of computing, for each word, its "next" word. So for each pair of words, we
test if they're "paired" in the sense of the function w_pair. If so, we make the left word point to the right word as its
"next" and the right word point to the left one as its "previous".
The function w_pair does not test for the existence of intermediary words.
So on the line "our goal is" that function will report pairing between
"our" and "is". So after detecting a pairing, our loop also checks whether
the detected pairing is "better" than those already detected (a sketch of
this loop is given after the list of word properties below).
c. Sort the lines. The lines are sorted based on the comparison performed by the function "cmpln".
7. Computing word properties
Finally, word properties can be computed once we have detected the
words. Some of these properties are applied to untransliterated symbols. The properties are:
1. The baseline left and right ordinates.
2. The italic and bold flags.
3. The alphabet.
4. The word bounding box.
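Returning to step 6b, here is a sketch of the loop that keeps only the best "next" candidate for each word. The name w_pair comes from the text above; w_dist, set_next and the use of horizontal distance as the "better" criterion are assumptions for this example.

    /* assumed helpers, for illustration only */
    extern int  w_pair(int a, int b);     /* nonzero when a and b are paired */
    extern int  w_dist(int a, int b);     /* horizontal distance between words */
    extern void set_next(int a, int b);   /* also sets the previous link of b */

    /* for each word keep only the best "next" candidate, so that on the
       line "our goal is" the next of "our" becomes "goal", not "is" */
    void link_words_sketch(int nwords)
    {
        int a, b, best;

        for (a = 0; a < nwords; ++a) {
            best = -1;
            for (b = 0; b < nwords; ++b) {
                if (a == b || !w_pair(a, b))
                    continue;
                if (best < 0 || w_dist(a, b) < w_dist(a, best))
                    best = b;
            }
            if (best >= 0)
                set_next(a, best);
        }
    }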
3.5 Synchronization
3.6 The function list_cl
The function list_cl lists all closures that intersect the rectangle of
height h and width w with top left corner (x,y). The result will be available
in the global array list_cl_r, at indexes 0..list_cl_sz-1. This service
is used to clip the closures or symbols (see list_s) currently visible
on the PAGE window. It's also used by OCR operations that require
locating the neighbours of one closure or symbol (see list_s).
The parameter reset must be zero on all calls, except on the very first
call of this function after loading a page.
Every time a new page is loaded, this service must be called with a
nonzero value for the reset parameter. In this case, the other parameters (x, y, w and h) are ignored, and the effect is to prepare the
page-specific static data structures used to speed up the operation.
Closures are located by list_cl from the static lists of closures clx
and cly, ordered by leftmost and topmost coordinates. Small and large
closures are handled separately. The number of closures with width
larger than FS is counted in nlx. The number of closures with height
larger than FS is counted in nly.
The clx array is divided into two parts. The left one contains
(topcl+1)-nlx indexes for the closures with width not larger than FS,
sorted by the leftmost coordinate. The right one contains the other
indexes, in descending order.
The cly array is divided into two parts. The left one contains
(topcl+1)-nly indexes for the closures with height not larger than FS,
sorted by the topmost coordinate. The right one contains the other
indexes, in descending order.
4.1 Main characteristics
1. The Clara OCR GUI uses only 5 colors: white, gray, darkgray, verydarkgray and black. The RGB value for each one is customizable at startup
(-c command-line option). On truecolor displays, graymaps are displayed
using more graylevels than the 5 listed above.
2. The X I/O is not buffered. Buffered X I/O is implemented but it's
not being used.
3. Only one X font is used for all needs (button labels, menu entries,
HTML rendering, and messages).
4. Asynchronous refresh. The OCR operations just set the redraw flags
(redraw_button, redraw_wnd, redraw_int, etc.) and let the redraw() function do its work.
The current window is identified by the CDW global variable (set by
the setview function). The variable CDW is an index into the dw array of
dwdesc structs. Some macros are used to refer to the fields of the structure dw[CDW]. A list of all of them can be found in the headers under
the title "Parameters of the current window".
Bitmaps in HTML windows and in the PAGE window are displayed in
"reduced" fashion (an RFxRF square of pixels from the bitmap is mapped
to one display pixel). If RF=1, then each bitmap pixel maps to one
display pixel.
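A sketch of that reduction follows; rendering the display pixel as black when any bitmap pixel in the RFxRF square is black is an assumption (on truecolor displays Clara uses graylevels instead), and the one-byte-per-pixel bitmap layout is chosen for clarity.

    /* value of the display pixel (dx,dy) for reduction factor RF */
    int reduced_pixel(const unsigned char *bmp, int w, int h,
                      int RF, int dx, int dy)
    {
        int x, y;

        for (y = dy * RF; y < (dy + 1) * RF && y < h; ++y)
            for (x = dx * RF; x < (dx + 1) * RF && x < w; ++x)
                if (bmp[y * w + x])
                    return 1;    /* at least one black bitmap pixel */
        return 0;
    }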
Clara is able to read a piece of HTML code, render it, display the rendered code, and handle events like the selection of an anchor, the filling of a
text field, or the submission of a form. Note that anchor selection and form
submission activate internal procedures, and won't call external customizable CGI programs.
Most windows displayed by Clara are produced using this HTML support.
When the "show HTML source" option on the "View" menu is selected,
Clara will display unrendered HTML, and it will become easier to identify the HTML windows. Note that all HTML is produced by Clara itself.
Clara won't read HTML from files or through HTTP.
Perhaps you are asking why Clara implements these things. Clara does
not intend to be a web browser. Clara supports HTML because we needed
a forms interface, and HTML forms were readily available and extensively
proven in practice as an easy and effective solution. Note that we're not
trying to achieve completeness. Clara's HTML support is partial. There is
only one font available, tables cannot be nested, most options are
unavailable, PBM is the only graphic format supported, etc. However, it
meets our needs, and the code is surprisingly small.
Let's understand how the HTML windows work. First of all, note that
there is an html flag on the structure that defines a window (structure
dwdesc). For instance, this flag is on for the window OUTPUT (initialization code in the function setview).
When the function redraw is called and the window OUTPUT is visible on
the plate, the service draw_dw will be called, with OUTPUT identified through
the global variable CDW (Current Window). However, before doing that,
redraw will test the flag RG to check whether the HTML contents of the OUTPUT window must be generated again, by calling a function specific to that
window. For instance, when a symbol is trained, this flag must be set
in order to notify asynchronously the need to recompute the window contents and render it again.
The rendering of each element on the HTML page creates one graphic element ("GE" for short).
Free text is rendered to one GE of type GE_TEXT per word. This is a
"feature". The rendering procedures are currently unable to put more
than one text word per GE.
IMG tags are rendered to one GE of type GE_IMG. Note that the value of
the SRC attribute cannot be the name of a file containing an image, but
must be "internal" or "pattern/n". These are keywords for the web clip
and for the bitmap of the pattern "n". The value of the SRC attribute is
stored in the "txt" field of the GE.
INPUT tags with TYPE=TEXT are rendered to one GE of type GE_INPUT. The
predefined value of the field (attribute VALUE) is stored in the field
"txt" of the GE. The name of the field (attribute NAME) is stored in
the field "arg" of the GE.
The Clara OCR HTML support adds INPUT tags with TYPE=NUMBER. They're
rendered like TYPE=TEXT, but two steppers are added for faster selection. So such tags will create three GEs (left stepper, input field,
and right stepper).
INPUT tags with TYPE=CHECKBOX are rendered to one GE of type GE_CBOX.
The variable name (attribute NAME) is stored in the "arg" field. The
argument to VALUE is stored in the field "txt". The status of the
checkbox is stored in the "iarg" field (2 means "checked", 0 means "not
checked").
INPUT tags with TYPE=RADIO are rendered just like CHECKBOX. The only
difference is the type GE_RADIO instead of GE_CBOX.
SELECT tags (starting a SELECT element) are rendered to one GE of type
SELECT. In fact, the entire SELECT element is stored in only one GE.
Each SELECT element originates one standard context menu, as implemented by the Clara GUI. The "iarg" field stores the menu index. The
free text in each OPTION element is stored as an item label on the
context menu. The implementation of the SELECT element is currently
buggy: (a) for each rendering, one entry in the array of context
menus is allocated and never freed, and (b) the NAME attribute
of the SELECT isn't stored anywhere.
INPUT tags with TYPE=SUBMIT are rendered to one GE of type GE_SUBMIT.
The value of the attribute VALUE is stored in the "txt" field. The
value of the ACTION attribute is stored in the field "arg". The field
"a" will store HTA_SUBMIT.
TD tags are rendered to one GE of type GE_RECT. The value of the
BGCOLOR attribute is stored in the "bg" field as a code (only the colors known by the Clara GUI are supported: WHITE, BLACK, GRAY, DARKGRAY
and VDGRAY). The coordinates of the cell within the table are stored
in the fields "tr" and "tc".
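Putting the pieces together, a graphic element could be sketched as below. The field names txt, arg, iarg, tr, tc and bg come from the text, but the exact types, the GE type constants and their grouping in one struct are assumptions made for illustration.

    /* illustrative graphic element; not the real Clara OCR declaration */
    typedef struct {
        int   type;     /* GE_TEXT, GE_IMG, GE_INPUT, GE_CBOX, GE_RADIO,
                           GE_SUBMIT, GE_RECT, ... */
        char *txt;      /* word, SRC value, VALUE attribute or item text */
        char *arg;      /* NAME attribute or ACTION attribute */
        int   iarg;     /* checkbox status or context menu index */
        int   tr, tc;   /* table row and column (GE_RECT) */
        int   bg;       /* background color code (GE_RECT) */
    } ge_sketch;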
The Clara OCR GUI tries to apply immediately all actions taken by the
user. So the HTML forms (e.g. the PATTERN window) do not contain SUBMIT
buttons, because they're not required (some forms contain a SUBMIT button disguised as a CONSIST facility, but this is just for the user's
convenience).
The editable input fields make auto-submission mechanisms a bit harder,
because we cannot apply consistency tests and process the form before
the user finishes filling the field, so auto-submission must be triggered on selected events. The triggers must be a bit smart, because
some events must be handled before submission (for instance toggling a
CHECKBOX), while others must be handled after submission (for instance
changing the current tab). So auto-submission must be carefully studied. The current strategy follows:
a. When the window PAGE (symbol) or the window PATTERN is visible,
auto-submit just after attending the buttons that change the current
symbol/pattern data (buttons BOLD, ITALIC, ALPHABET or PTYPE).
b. When the window PAGE (symbol) or the window PATTERN is visible,
auto-submit just before attending the left or right arrows.
c. When the user presses ENTER and an active input field exists, auto-submit.
d. Auto-submit as the first action taken by the setview service, in
order to flush the current form before changing the current tab or tab
mode.
e. Auto-submit just after opening any menu, in order to flush data
before some critical action like quitting the program or starting some
OCR step.
f. Auto-submit just after attending CHECKBOX or RADIO buttons.
5.5 The function show_hint
Messages are put on the status line (at the bottom of the application's X
window) using the show_hint service. The show_hint service receives two
parameters: an integer f and a string (the message).
If f is 0, then the message is "discardable". It won't be displayed if
a permanent message is currently being displayed.
If f is 1, then the message is "fixed". It won't be erased by a subsequent show_hint call passing the empty string as the message (in practical terms, pointer motion won't clear the message).
If f is 2, then the message is "permanent" (the message will be cleared
only by another fixed or permanent message).
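A usage sketch follows; the prototype below is assumed from the description (an integer f and a message string) and may differ from the actual declaration (which might, for instance, be printf-like).

    extern void show_hint(int f, char *msg);   /* assumed prototype */

    void hint_examples(void)
    {
        show_hint(0, "discardable: hidden while a permanent message is shown");
        show_hint(1, "fixed: survives the empty-string calls from pointer motion");
        show_hint(2, "permanent: cleared only by another fixed or permanent message");
    }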
The start_ocr service starts a complete OCR run or an individual OCR step
on one given page, or on all pages. For instance, start_ocr is called by the GUI
when the user presses the "OCR" button or when the user requests the loading of one specific page.
In fact, almost every user-requested operation is performed as an "OCR
step" in order to take advantage of the execution model implemented by
the function continue_ocr. So start_ocr is the starting point for
attending almost all user requests.
If p is -1, all pages are processed; if p < -1, only the current page
(cpage) is processed; otherwise only page p is processed. If s >= 0, only
step s is run; otherwise all steps are run.
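For illustration, assuming a prototype of the form start_ocr(p, s) (the real function may take additional arguments), the conventions above translate to calls like:

    extern void start_ocr(int p, int s);   /* assumed prototype */

    void start_examples(void)
    {
        start_ocr(-1, -1);    /* all pages, all steps */
        start_ocr(-2, -1);    /* current page (cpage) only, all steps */
        start_ocr(3, 5);      /* page 3, step 5 only */
    }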
6.1 How to add a bitmap comparison method
It's not hard to add a bitmap comparison method to Clara OCR. This may
become very important when the available heuristics are unable to classify the symbols of some book, so a new heuristic must be created. In
order to exemplify that, we'll add a naive bitmap comparison method.
It'll just compare the number of black pixels in each bitmap, and consider that the bitmaps are similar when these numbers do not differ too
much.
Please remember that the code added or linked to Clara OCR must be GPL.
In order to add the new bitmap comparison method, we need to write a
function that compares two bitmaps and returns how similar they are, add
this function as an alternative to the Options menu, and call it when
classifying the page symbols. We'll perform all these steps adding a
naive comparison method, step by step. The most difficult one is to
write the bitmap comparison method. This step is covered in the subsection "How to write a bitmap comparison function".
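A sketch of the naive comparison function follows. The one-byte-per-pixel bitmap layout and the 10% similarity threshold are assumptions made for this example; the real code would adapt them to Clara OCR's internal bitmap representation.

    /* return nonzero when the two w x h bitmaps have similar numbers of
       black pixels (counts differing by at most 10% of the larger one) */
    int naive_cmp(const unsigned char *a, const unsigned char *b, int w, int h)
    {
        int i, n = w * h, na = 0, nb = 0, d, m;

        for (i = 0; i < n; ++i) {
            na += (a[i] != 0);
            nb += (b[i] != 0);
        }
        d = (na > nb) ? na - nb : nb - na;
        m = (na > nb) ? na : nb;
        return (m == 0) ? 1 : (d * 10 <= m);
    }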
Let's present the other two steps. To add the new method to the Options
menu, we need:
These are the steps to add a new button:
(Some) Major tasks
1. Vertical segmentation (partially done).
2. Heuristics to merge fragments.
3. Spelling-generated transliterations.
4. Geometric detection of lines and words.
5. Finish the documentation.
6. Simplify the revision acts subsystem.
Minor tasks
1. Change sprintf to snprintf.
2. Fix the asymmetric behaviour of the function "joined".
3. Optimize bitmap copies to copy words, not bits, where possible (partially done).
4. Support Multiple OCR zones (partially done).
5. Make sure that the access to the data structures is blocked during
OCR (all functions that change the data structures must check the value
of the flag "ocring").
6. Use 64-bit integers for bitmap comparisons and support big-endian
CPUs (partially done).
7. Clear memory buffers before freeing.
8. Allow the transliterations to refer to multiple acts (partially done).
9. Rewrite composition of patterns for classification of linked symbols.
10. The flea stops but does not disappear when the window loses and regains
focus.
11. Replace various magic numbers with per-density and per-minimum-font-size values.
ArrayClara OCR was written by Ricardo Ueda Karpischek. Giulio Lunati wrote
the internal preprocessor. Clara OCR includes bugfixes produced by
other developers. The Changelog (http://www.claraocr.org/CHANGELOG)
acknowledges all of them (see below). Imre Simon contributed high-volume
tests, discussions with experts, selection of bibliographic resources,
propaganda and many ideas on how to make the software more useful.
Ricardo authored various free materials, some included (at least) in
Conectiva, Debian, FreeBSD and SuSE (the verb conjugator "conjugue",
the ispell dictionary br.ispell and the proxy axw3). He recently ported
the EiC interpreter to the Psion 5 handheld and patched the Xt-based
vncviewer to scale framebuffers and compute image diffs. Ricardo works
as an independent developer and instructor. He received no financial
aid to develop Clara OCR. He's not an employee of any company or organization.
Imre Simon promotes the usage and development of free technologies and
information from his research, teaching and administrative labour at
the University.
Roberto Hirata Junior and Marcelo Marcilio Silva contributed ideas on
character isolation and recognition. Richard Stallman suggested
improvements on how to generate HTML output. Marius Vollmer is helping
to add Guile support. Jacques Le Marois helped with the announcement process.
We acknowledge Mike O'Donnell and Junior Barrera for their good criticism. We acknowledge Peter Lyman for his remarks about the Berkeley
Digital Library, and Wanderley Antonio Cavassin, Janos Simon and
Roberto Marcondes Cesar Junior for some web and bibliographic pointers.
Bruno Barbieri Gnecco provided hints and explanations about GOCR (main
author: Jorg Schulenburg). Luis Jose Cearra Zabala (author of OCRE) is
kindly supporting our attempts to use portions of his code. Adriano
Nagelschmidt Rodrigues and Carlos Juiti Watanabe carefully tried the
tutorial before the first announcement. Eduardo Marcel Macan packaged Clara
OCR for Debian and suggested some improvements. Mandrakesoft is hosting
claraocr.org. We acknowledge Conectiva and SuSE for providing copies of
their outstanding distributions. Finally, we acknowledge the late Jose
Hugo de Oliveira Bussab for his interest in our work.
Adriano Nagelschmidt Rodrigues donated a 15" monitor.
The fonts used by the "view alphabet map" feature came from Roman Czyborra's "The ISO 8859 Alphabet Soup" page at http://czyborra.com/charsets/iso8859.html.
The names cited by the CHANGELOG (and not cited before) follow (small
patches, bug reports, specfiles, suggestions, explanations, etc).
- Brian G. (win32), Bruce Momjian, Charles Davant (server admin), Daniel Merigoux, De Clarke, Emile Snider (preprocessor, to be released), Erich Mueller, Franz Bakan (OS/2), groggy, Harold van Oostrom, Ho Chak Hung, Jeroen Ruigrok, Laurent-jan, Nathalie Vielmas, Romeu Mantovani Jr (packager), Ron Young, R P Herrold, Sergei Andrievskii, Stuart Yeates, Terran Melconian, Thomas Klausner (NetBSD), Tim McNerney, Tyler Akins.