Xapian(3pm)
NAME
Search::Xapian - Perl XS frontend to the Xapian C++ search library.
SYNOPSIS
use Search::Xapian;
my $db = Search::Xapian::Database->new( '[DATABASE DIR]' );
my $enq = $db->enquire( '[QUERY TERM]' );
printf "Running query '%s'\n", $enq->get_query()->get_description();
my @matches = $enq->matches(0, 10);
print scalar(@matches) . " results found\n";
foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n", $match->get_docid(), $match->get_percent(), $doc->get_data();
}
DESCRIPTION
This module wraps most methods of most Xapian classes. The missing
classes and methods should be added in the future. It also provides a
simplified, more 'perlish' interface to some common operations, as
demonstrated above.
There are some gaps in the POD documentation for wrapped classes, but
you can read the Xapian C++ API documentation at
<http://xapian.org/docs/apidoc/html/annotated.html> for details of
these. Alternatively, take a look at the code in the examples and
tests.
If you want to use Search::Xapian and the threads module together, make
sure you're using Search::Xapian >= 1.0.4.0 and Perl >= 5.8.7. As of
1.0.4.0, Search::Xapian uses CLONE_SKIP to make sure that the Perl
wrapper objects aren't copied to new threads; without this, the
underlying C++ objects can get destroyed more than once.
If you encounter problems, or have any comments, suggestions, patches,
etc., please email the Xapian-discuss mailing list (details of which can
be found at <http://xapian.org/lists>).
EXPORT
None by default.
- :db
- DB_OPEN
- Open a database, fail if database doesn't exist.
- DB_CREATE
- Create a new database, fail if database exists.
- DB_CREATE_OR_OPEN
- Open an existing database, without destroying data, or create a new database if one doesn't already exist.
- DB_CREATE_OR_OVERWRITE
- Overwrite database if it exists.
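As a rough sketch of how the :db constants are used, the following creates a database with DB_CREATE_OR_OPEN and adds one document to it. The temporary directory, the "example-db" path and the sample text are made up for illustration; only the constant itself comes from the list above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Search::Xapian qw(:db);

# Hypothetical location for the example database.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/example-db";

# Open the database if it exists, otherwise create it.
my $db = Search::Xapian::WritableDatabase->new( $path, DB_CREATE_OR_OPEN );

# Add a single document with some data and one index term.
my $doc = Search::Xapian::Document->new();
$doc->set_data('An example document');
$doc->add_term('example');
$db->add_document($doc);

my $count = $db->get_doccount();
print "Documents in database: $count\n";

# Release the database before the temporary directory is removed.
undef $db;
```

Swapping in DB_CREATE here would fail on a second run against the same path, while DB_CREATE_OR_OVERWRITE would discard whatever was there.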
- :ops
- OP_AND
- Match if both subqueries are satisfied.
- OP_OR
- Match if either subquery is satisfied.
- OP_AND_NOT
- Match if left but not right subquery is satisfied.
- OP_XOR
- Match if the left or the right subquery is satisfied, but not both.
- OP_AND_MAYBE
- Match if left is satisfied, but use weights from both.
- OP_FILTER
- Like OP_AND, but only weight using the left query.
- OP_NEAR
- Match if the words are near each other. The window should be
specified as a parameter to "Search::Xapian::Query::Query", but it
defaults to the number of terms in the list.
- OP_PHRASE
- Match as a phrase (all words in order).
- OP_ELITE_SET
- Select an elite set from the subqueries, and perform a query with
these combined as an OR query.
- OP_VALUE_RANGE
- Filter by a range test on a document value.
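The operators above are passed as the first argument when building a Search::Xapian::Query from subqueries. The sketch below combines a few of them; the terms are placeholders, and the exact text returned by get_description() varies between Xapian versions, so it is printed rather than relied upon.

```perl
use strict;
use warnings;
use Search::Xapian qw(:ops);

# Both terms must match.
my $both = Search::Xapian::Query->new( OP_AND, 'search', 'library' );

# Either term may match.
my $either = Search::Xapian::Query->new( OP_OR, 'perl', 'xs' );

# Documents matching the left subquery but not the right one.
# Subqueries are all passed as Query objects here rather than mixing
# objects and plain strings.
my $excluded = Search::Xapian::Query->new('python');
my $query    = Search::Xapian::Query->new( OP_AND_NOT, $both, $excluded );

my $desc = $query->get_description();
print "$desc\n";
print $either->get_description(), "\n";
```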
- :qpflags
- FLAG_DEFAULT
- This gives the QueryParser default flag settings, allowing you to
easily add flags to the default ones.
- FLAG_BOOLEAN
- Support AND, OR, etc. and bracketed subexpressions.
- FLAG_LOVEHATE
- Support + and -.
- FLAG_PHRASE
- Support quoted phrases.
- FLAG_BOOLEAN_ANY_CASE
- Support AND, OR, etc even if they aren't in ALLCAPS.
- FLAG_WILDCARD
- Support right truncation (e.g. Xap*).
- FLAG_PURE_NOT
- Allow queries such as 'NOT apples'.
- These require the use of a list of all documents in the database,
which is potentially expensive, so this feature isn't enabled by
default.
- FLAG_PARTIAL
- Enable partial matching.
- Partial matching causes the parser to treat the query as a
"partially entered" search. This will automatically treat the
final word as a wildcarded match, unless it is followed by
whitespace, to produce more stable results from interactive
searches.
- FLAG_SPELLING_CORRECTION
- FLAG_SYNONYM
- FLAG_AUTO_SYNONYMS
- FLAG_AUTO_MULTIWORD_SYNONYMS
- :qpstem
- STEM_ALL
- Stem all terms.
- STEM_NONE
- Don't stem any terms.
- STEM_SOME
- Stem some terms, in a manner compatible with Omega (capitalised
words and those in phrases aren't stemmed).
- :enq_order
- ENQ_ASCENDING
- docids sort in ascending order (default)
- ENQ_DESCENDING
- docids sort in descending order
- ENQ_DONT_CARE
- docids sort in whatever order is most efficient for the backend
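To see the docid ordering in isolation, one option is to give every match the same weight so that only the docid order decides the ranking. This sketch assumes that calling Search::Xapian::WritableDatabase->new() with no arguments yields an in-memory database, and uses BoolWeight to flatten the weights; the term "common" and the document texts are invented for the example.

```perl
use strict;
use warnings;
use Search::Xapian qw(:enq_order);

# Assumption: no arguments gives an in-memory database.
my $db = Search::Xapian::WritableDatabase->new();

# Index three documents that all contain the same term.
for my $text ( 'first', 'second', 'third' ) {
    my $doc = Search::Xapian::Document->new();
    $doc->set_data($text);
    $doc->add_term('common');
    $db->add_document($doc);
}

my $enq = $db->enquire('common');

# With boolean weighting all matches score the same, so the
# docid order setting determines the order of the results.
$enq->set_weighting_scheme( Search::Xapian::BoolWeight->new() );
$enq->set_docid_order(ENQ_DESCENDING);

my @ids = map { $_->get_docid() } $enq->matches( 0, 10 );
print "Docids: @ids\n";
```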
- :standard
- Standard is db + ops + qpflags + qpstem
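Importing :standard brings the operator, flag and stemming constants into scope together, which is convenient when setting up a QueryParser. The following is a minimal sketch under that assumption; the query string is a placeholder and the printed description depends on the Xapian version in use.

```perl
use strict;
use warnings;
use Search::Xapian qw(:standard);

my $qp = Search::Xapian::QueryParser->new();

# Stem in the Omega-compatible way described under STEM_SOME.
$qp->set_stemmer( Search::Xapian::Stem->new('english') );
$qp->set_stemming_strategy(STEM_SOME);

# Require all terms unless the query says otherwise.
$qp->set_default_op(OP_AND);

# Start from the default flags and add one more. The explicit
# parentheses keep the constant calls unambiguous to the Perl parser.
my $flags = FLAG_DEFAULT() | FLAG_BOOLEAN_ANY_CASE();

my $query       = $qp->parse_query( 'searching and "perl bindings" -python', $flags );
my $description = $query->get_description();
print "$description\n";
```

FLAG_WILDCARD and FLAG_PARTIAL are left out here because this parser has no database attached.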
Version functions
- major_version
- Returns the major version of the Xapian C++ library being used.
E.g. for Xapian 1.0.9 this would return 1.
- minor_version
- Returns the minor version of the Xapian C++ library being used.
E.g. for Xapian 1.0.9 this would return 0.
- revision
- Returns the revision of the Xapian C++ library being used. E.g.
for Xapian 1.0.9 this would return 9. In a stable release series,
Xapian libraries with the same minor and major versions are usually
ABI compatible, so this often won't match the third component of
$Search::Xapian::VERSION (which is the version of the
Search::Xapian XS wrappers).
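A short sketch of reporting both versions side by side, calling the functions by their fully qualified names since nothing is exported by default:

```perl
use strict;
use warnings;
use Search::Xapian;

# Version of the Xapian C++ library the wrappers are running against.
my $library = join '.',
    Search::Xapian::major_version(),
    Search::Xapian::minor_version(),
    Search::Xapian::revision();

# Version of the Perl XS wrappers themselves.
my $wrappers = $Search::Xapian::VERSION;

print "Xapian library: $library\n";
print "Search::Xapian: $wrappers\n";
```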
Numeric encoding functions
- sortable_serialise NUMBER
- Convert a floating point number to a string, preserving sort order.
- This method converts a floating point number to a string, suitable
for using as a value for numeric range restriction, or for use as a
sort key.
- The conversion is platform independent.
- The conversion attempts to ensure that, for any pair of values
supplied to the conversion algorithm, the result of comparing the
original values (with a numeric comparison operator) will be the
same as the result of comparing the resulting values (with a string
comparison operator). On platforms which represent doubles with
the precisions specified by IEEE 754, this will be the case: if the
representation of doubles is more precise, it is possible that two
very close doubles will be mapped to the same string, so will
compare equal.
- Note also that both zero and -zero will be converted to the same
representation: since these compare equal, this satisfies the
comparison constraint, but it's worth knowing this if you wish to
use the encoding in some situation where this distinction matters.
- Handling of NaN isn't (currently) guaranteed to be sensible.
- sortable_unserialise SERIALISED_NUMBER
- Convert a string encoded using sortable_serialise back to a
floating point number.
- This expects the input to be a string produced by
sortable_serialise(). If the input is not such a string, the value
returned is undefined (but no error will be thrown).
- The result of the conversion will be exactly the value which was
supplied to sortable_serialise() when making the string on platforms
which represent doubles with the precisions specified by IEEE 754,
but may be a different (nearby) value on other platforms.
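The sort-order property can be checked with a small sketch: serialise a handful of numbers, sort them by comparing the encoded strings, and compare the result with an ordinary numeric sort. The sample values are arbitrary, and the sketch assumes a platform with IEEE 754 doubles.

```perl
use strict;
use warnings;
use Search::Xapian;

my @numbers = ( 10, -3.5, 2, 0.25, 100 );

# Map each number to its serialised form.
my %encoded = map { $_ => Search::Xapian::sortable_serialise($_) } @numbers;

# Sort once by string comparison of the encodings, once numerically.
my @by_string = sort { $encoded{$a} cmp $encoded{$b} } @numbers;
my @by_number = sort { $a <=> $b } @numbers;

print "By encoded string: @by_string\n";
print "By numeric value:  @by_number\n";

# Decode one value again.
my $round_trip = Search::Xapian::sortable_unserialise( $encoded{10} );
print "Round trip of 10: $round_trip\n";
```

The encoded strings are binary data, so they are suited to storage in document values rather than to display.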
TODO
- Error Handling
- Error handling for all methods liable to generate errors.
- Documentation
- Add POD documentation for all classes, where possible just adapted from Xapian docs.
- Unwrapped classes
- The following Xapian classes are not yet wrapped: Error (and
subclasses), ErrorHandler, standard ExpandDecider subclasses
(user-defined ones work), user-defined weight classes.
- We don't yet wrap Xapian::Query::MatchAll,
Xapian::Query::MatchNothing, or Xapian::BAD_VALUENO.
- Unwrapped methods
- The following methods are not yet wrapped: Enquire::get_eset(...)
with more than two arguments, Query ctor optional "parameter"
parameter, Remote::open(...), static
Stem::get_available_languages().
- We wrap MSet::swap() and MSet::operator[](), but not ESet::swap(),
ESet::operator[](). Is swap actually useful? Should we instead
tie MSet and ESet to allow them to just be used as lists?
CREDITS
Thanks to Tye McQueen <tye@metronet.com> for explaining the finer
points of how best to write XS frontends to C++ libraries, James Aylett
<james@tartarus.org> for clarifying the less obvious aspects of the
Xapian API, Tim Brody for patches wrapping ::QueryParser and ::Stopper
and especially Olly Betts <olly@survex.com> for contributing advice,
bugfixes, and wrapper code for the more obscure classes.
AUTHOR
Alex Bowley <kilinrax@cpan.org>
Please report any bugs/suggestions to <xapian-discuss@lists.xapian.org>
or use the Xapian bug tracker <http://xapian.org/bugs>. Please do NOT
use the CPAN bug tracker or mail any of the authors individually.
SEE ALSO
Search::Xapian::BM25Weight, Search::Xapian::BoolWeight,
Search::Xapian::Database, Search::Xapian::Document,
Search::Xapian::Enquire, Search::Xapian::MultiValueSorter,
Search::Xapian::PositionIterator, Search::Xapian::PostingIterator,
Search::Xapian::QueryParser, Search::Xapian::Stem,
Search::Xapian::TermGenerator, Search::Xapian::TermIterator,
Search::Xapian::TradWeight, Search::Xapian::ValueIterator,
Search::Xapian::Weight, Search::Xapian::WritableDatabase, and
<http://xapian.org/>.