Mask off segments of the query sequence that have
low compositional complexity, as determined by the
SEG program of Wootton and Federhen (1993) or, for BLASTN, by the DUST
program of Tatusov and Lipman (in preparation).
Filtering can eliminate statistically significant but
biologically uninteresting reports from the BLAST
output (e.g., hits against common acidic-, basic- or
proline-rich regions), leaving the more biologically
interesting regions of the query sequence available
for specific matching against database sequences.
Filtering is only applied to the query sequence (or
its translation products), not to database sequences.
Default filtering is DUST for BLASTN, SEG for other
programs.
SEG often masks nothing at all when applied to
sequences in SWISS-PROT, so filtering should not
be expected to have an effect on every query.
Furthermore, in some cases,
sequences are masked in their entirety, indicating that
the statistical significance of any matches reported
against the unfiltered query sequence should be suspect.
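
As a rough illustration of what masking does, the
sketch below replaces low-complexity windows of a
protein query with X. It is NOT the SEG or DUST
algorithm: it uses a plain sliding-window Shannon
entropy test, and the function names, the window
length of 12 and the cutoff of 2.2 bits are
assumptions chosen only for this example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residues in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Mask every window whose entropy is at or below the threshold.

    Hypothetical helper for illustration; a simplified stand-in,
    not the SEG or DUST procedure.
    """
    seq = seq.upper()
    masked = list(seq)
    # Slide a fixed-length window over the query; sequences shorter
    # than the window are returned unchanged.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked: str, mask_char: str = "X") -> float:
    """Fraction of positions that carry the mask character."""
    return masked.count(mask_char) / len(masked) if masked else 0.0


if __name__ == "__main__":
    diverse = "ACDEFGHIKLMNPQRSTVWY"      # 20 distinct residues
    query = diverse + "P" * 20 + diverse  # proline-rich stretch inside
    result = mask_low_complexity(query)
    print(result)
    print(f"masked fraction: {masked_fraction(result):.2f}")
```

A query with no low-complexity windows comes back
unchanged, and a query that is low-complexity
throughout comes back fully masked (masked fraction
1.0), which mirrors the two cases described above.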