pub struct Pair { /* private fields */ }
A pair of byte offsets into a needle to use as a predicate.
This pair is used as a predicate to quickly filter out positions in a haystack in which a needle cannot match. In some cases, this pair can even be used in vector algorithms such that the vector algorithm only switches over to scalar code once this pair has been found.
A pair of offsets can be used in both substring search implementations and in prefilters. The former will report matches of a needle in a haystack, whereas the latter will only report possible matches of a needle.
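To make the distinction between a prefilter and a full search concrete, here is a minimal scalar sketch. It is not this crate's implementation: `OffsetPair`, `find_candidate` and `find` are hypothetical names introduced only for illustration, and the sketch assumes both offsets are in bounds for the needle.

```rust
/// Hypothetical stand-in for a pair of needle offsets (not the crate's type).
#[derive(Clone, Copy, Debug)]
struct OffsetPair {
    index1: u8,
    index2: u8,
}

/// Prefilter: returns the first haystack position at which both predicate
/// bytes line up with the needle. This is only a *possible* match; the
/// caller still has to confirm the rest of the needle.
fn find_candidate(haystack: &[u8], needle: &[u8], pair: OffsetPair) -> Option<usize> {
    let (i1, i2) = (usize::from(pair.index1), usize::from(pair.index2));
    // Assumption: both offsets are valid positions in the needle.
    let (b1, b2) = (needle[i1], needle[i2]);
    if haystack.len() < needle.len() {
        return None;
    }
    (0..=haystack.len() - needle.len())
        .find(|&at| haystack[at + i1] == b1 && haystack[at + i2] == b2)
}

/// Substring search built on top of the predicate: every candidate is
/// verified against the whole needle before it is reported.
fn find(haystack: &[u8], needle: &[u8], pair: OffsetPair) -> Option<usize> {
    let mut start = 0;
    while let Some(off) = find_candidate(&haystack[start..], needle, pair) {
        let at = start + off;
        if &haystack[at..at + needle.len()] == needle {
            return Some(at);
        }
        // False positive: resume just past the rejected candidate.
        start = at + 1;
    }
    None
}
```

With the needle `foobar` and the offsets 3 and 0, the prefilter may report a position where only the `b` and the `f` line up, while `find` skips such false positives and reports only confirmed matches.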
Each offset is limited to a maximum of 255 to keep memory usage low. Moreover, it’s rarely advantageous to create a predicate using offsets greater than 255 anyway.
The only guarantee enforced on the pair of offsets is that they are not equal. It is not necessarily the case that index1 < index2, for example. By convention, index1 corresponds to the byte in the needle that is believed to be the most predictive. Note also that because of the requirement that the indices be both valid for the needle used to build the pair and not equal, it follows that a pair can only be constructed for needles with length at least 2.
Implementations
impl Pair
pub fn new(needle: &[u8]) -> Option<Pair>
Create a new pair of offsets from the given needle.
If a pair could not be created (for example, if the needle is too short), then None is returned.
This chooses the pair in the needle that is believed to be as predictive of an overall match of the needle as possible.
pub fn with_ranker<R: HeuristicFrequencyRank>(needle: &[u8], ranker: R) -> Option<Pair>
Create a new pair of offsets from the given needle and ranker.
This permits the caller to choose a background frequency distribution with which bytes are selected. The idea is to select a pair of bytes that is believed to strongly predict a match in the haystack. This usually means selecting bytes that occur rarely in a haystack.
If a pair could not be created (for example, if the needle is too short), then None is returned.
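The idea of ranking bytes by assumed background frequency can be sketched as follows. This is a simplified illustration and not the heuristic this crate uses: the `ByteRank` trait, `ToyRanker` and `choose_offsets` are hypothetical names, and the sketch assumes that a lower rank means a rarer byte.

```rust
/// Hypothetical ranking trait for this sketch. Assumption: a lower rank
/// means the byte is believed to be rarer in the haystack.
trait ByteRank {
    fn rank(&self, byte: u8) -> u8;
}

/// A toy ranker that treats ASCII lowercase letters and spaces as common
/// and every other byte as rare.
struct ToyRanker;

impl ByteRank for ToyRanker {
    fn rank(&self, byte: u8) -> u8 {
        if byte.is_ascii_lowercase() || byte == b' ' {
            200
        } else {
            10
        }
    }
}

/// Picks two distinct offsets whose bytes have the lowest ranks, rarest
/// first. Returns `None` when the needle has fewer than two bytes.
fn choose_offsets<R: ByteRank>(needle: &[u8], ranker: R) -> Option<(u8, u8)> {
    if needle.len() < 2 {
        return None;
    }
    // Only offsets that fit in a u8 are considered.
    let limit = needle.len().min(256);
    let mut offsets: Vec<u8> = (0..limit).map(|i| i as u8).collect();
    // The sort is stable, so ties keep the earlier offset first.
    offsets.sort_by_key(|&i| ranker.rank(needle[usize::from(i)]));
    Some((offsets[0], offsets[1]))
}
```

For the needle `foo-bar_`, the toy ranker considers `-` and `_` rare, so the sketch selects offsets 3 and 7. A different ranker, tuned to a different kind of haystack, could select a different pair for the same needle.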
pub fn with_indices(needle: &[u8], index1: u8, index2: u8) -> Option<Pair>
Create a new pair for the given needle using the given offsets.
This bypasses any sort of heuristic process for choosing the offsets and permits the caller to choose the offsets themselves.
Indices are limited to valid u8 values so that a Pair uses less memory. It is not possible to create a Pair with offsets bigger than u8::MAX. It’s likely that such a thing is not needed, but if it is, it’s suggested to build your own bespoke algorithm because you’re likely working on a very niche case. (File an issue if this suggestion does not make sense to you.)
If a pair could not be created (for example, if the needle is too short), then None is returned.
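The documented constraints on caller-chosen offsets (both must be valid for the needle, and they must differ) can be sketched as a small validation routine. This mirrors only the rules stated on this page. `CheckedPair` and `pair_with_indices` are hypothetical names, and the real constructor may apply checks beyond what is shown here.

```rust
/// Hypothetical stand-in for a validated pair of offsets (not the crate's
/// type).
#[derive(Debug, PartialEq, Eq)]
struct CheckedPair {
    index1: u8,
    index2: u8,
}

/// Builds a pair from caller-chosen offsets, or returns `None` when the
/// documented constraints are not met.
fn pair_with_indices(needle: &[u8], index1: u8, index2: u8) -> Option<CheckedPair> {
    // The two offsets must differ.
    if index1 == index2 {
        return None;
    }
    // Both offsets must be valid positions in the needle. Together with the
    // check above, this rules out every needle shorter than two bytes.
    if usize::from(index1) >= needle.len() || usize::from(index2) >= needle.len() {
        return None;
    }
    Some(CheckedPair { index1, index2 })
}
```

Note that `index1` may be larger than `index2`: the sketch accepts offsets 3 and 0 for the needle `foobar`, while it rejects equal offsets, out-of-bounds offsets, and any needle of length 1.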