sindex(1)
NAME
sindex - index a sequence database for sfetch
SYNOPSIS
sindex [options] seqfile1 [seqfile2...]
DESCRIPTION
sindex indexes one or more seqfiles for future sequence retrievals by
sfetch. An SSI ("squid sequence index") file is created in the same
directory as the sequence files. By default, this file is called
<seqfile>.ssi.
If there is more than one sequence file on the command line, the SSI
filename will be constructed from the last sequence file name. This may
not be what you want; see the -o option to specify your own name for
the SSI file.
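The naming rule above fits in a few lines. The sketch below is a hypothetical Python helper (`ssi_filename` is not part of sindex or squid). It only restates the documented behavior: an explicit -o name wins; otherwise ".ssi" is appended to the last sequence file given, which keeps the index in that file's directory.

```python
def ssi_filename(seqfiles, outfile=None):
    """Hypothetical restatement of sindex's documented SSI naming rule."""
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    # An explicit output name (as with -o) always takes precedence.
    if outfile is not None:
        return outfile
    # Otherwise the name is built from the LAST sequence file on the
    # command line; appending to the full path keeps the same directory.
    return seqfiles[-1] + ".ssi"


print(ssi_filename(["data/pdb.fa"]))                  # data/pdb.fa.ssi
print(ssi_filename(["a.fa", "b.fa"]))                 # b.fa.ssi
print(ssi_filename(["a.fa", "b.fa"], "all.ssi"))      # all.ssi
```

With several input files, the second call shows why -o is usually worth giving: the default name only mentions the last file.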
sindex is capable of indexing large files (>2 GB) if optional large file support (LFS) has been enabled at compile time. See the INSTALL instructions that
came with @PACKAGE@.
OPTIONS
- -h  Print brief help; includes version number and summary of all
  options, including expert options.
- -o <ssi outfile>
  Direct the SSI index to a file named <ssi outfile>. By default, the SSI file goes to <seqfile>.ssi.
EXPERT OPTIONS
- --64  Force the SSI file into 64-bit (large seqfile) mode, even if the
  seqfile is small. You don't want to do this unless you're debugging.
- --external
  Force sindex to sort its records by external (on-disk) sorting. Like --64, this is only useful for debugging.
- --informat <s>
  Specify that the sequence file is definitely in format <s>; this
  disables sequence file format autodetection. This is useful in automated pipelines because it improves robustness (autodetection can occasionally go wrong on a perversely malformed file). Common format names include genbank, embl, gcg, pir, stockholm, clustal, msf, and phylip; see the printed documentation for a complete list of accepted format names.
- --pfamseq
  A hack for Pfam: indexes a FASTA file that is known to have identifier lines in the format ">[name] [accession] [optional description]". Normally only the sequence name is indexed (as a primary key) in a FASTA SSI file; this option indexes both the name (as a primary key) and the accession (as a secondary key).
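The following minimal Python sketch shows which keys a FASTA SSI index would receive with and without --pfamseq. The function `fasta_index_keys` is hypothetical and is not sindex's actual code. It assumes whitespace-separated fields with the accession in the second position, as the identifier-line format above describes.

```python
def fasta_index_keys(header_line, pfamseq=False):
    """Return (primary_key, secondary_keys) for one FASTA identifier line.

    Hypothetical sketch: assumes ">name accession [description]" with
    whitespace-separated fields when pfamseq is True.
    """
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA identifier line")
    # At most three fields: name, accession, and the rest (description).
    fields = header_line[1:].split(maxsplit=2)
    if not fields:
        raise ValueError("identifier line has no name")
    name = fields[0]
    if not pfamseq:
        # Default behavior: only the sequence name is indexed.
        return name, []
    if len(fields) < 2:
        raise ValueError("--pfamseq expects '>name accession [description]'")
    # --pfamseq: name stays primary, accession becomes a secondary key.
    return name, [fields[1]]


line = ">seqA ACC00001 example description"
print(fasta_index_keys(line))                # ('seqA', [])
print(fasta_index_keys(line, pfamseq=True))  # ('seqA', ['ACC00001'])
```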
SEE ALSO
afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1), sfetch(1), shuffle(1), sreformat(1), stranslate(1), weight(1).
AUTHOR
Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine. Freely distributed under the GNU
General Public License (GPL). See COPYING in the source code distribution for more details, or contact me.
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
Fax: 1-314-362-2157
Email: eddy@genetics.wustl.edu