Struct aho_corasick::Input
pub struct Input<'h> { /* private fields */ }
The configuration and the haystack to use for an Aho-Corasick search.
When executing a search, there are a few parameters one might want to configure:
- The haystack to search, provided to the Input::new constructor. This is the only required parameter.
- The span within the haystack to limit a search to. (The default is the entire haystack.) This is configured via Input::span or Input::range.
- Whether to run an unanchored (matches can occur anywhere after the start of the search) or anchored (matches can only occur beginning at the start of the search) search. Unanchored search is the default. This is configured via Input::anchored.
- Whether to quit the search as soon as a match has been found, regardless of the MatchKind that the searcher was built with. This is configured via Input::earliest.
In most cases, the defaults for all optional parameters are appropriate. The utility of this type is that it keeps the common case simple while still letting niche use cases tweak parameters through the same search APIs.
Valid bounds and search termination
An Input permits setting the bounds of a search via either Input::span or Input::range. The bounds set must be valid, or else a panic will occur. Bounds are valid if and only if either:
- the bounds represent a valid range into the input’s haystack, or
- the end bound is a valid ending bound for the haystack and the start bound is exactly one greater than the end bound.
In the latter case, Input::is_done will return true, which indicates that any search receiving such an input should immediately return with no match.
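The validity rule above can be restated as a small predicate. The following is a std-only sketch for illustration: `bounds_are_valid` and `is_done` are hypothetical free functions over raw offsets, not the crate's internal code, which may perform the check differently.

```rust
/// Hypothetical re-statement of the documented bounds rule (not the crate's
/// code). Returns true if `start..end` is acceptable for a haystack of `len` bytes.
fn bounds_are_valid(len: usize, start: usize, end: usize) -> bool {
    // Case 1: an ordinary range that lies inside the haystack.
    let in_range = start <= end && end <= len;
    // Case 2: the "search is complete" state, where start == end + 1.
    let done_state = end <= len && end.checked_add(1) == Some(start);
    in_range || done_state
}

/// Mirrors the documented meaning of `Input::is_done`: the search is
/// exhausted exactly when the start bound has moved past the end bound.
fn is_done(start: usize, end: usize) -> bool {
    start > end
}
```

For a six-byte haystack, 7..6 is accepted (it is the “search is complete” state), while 8..6 and 0..7 are rejected.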
Other than representing “search is complete,” the Input::span and Input::range APIs are never necessary. Callers can instead slice the haystack, e.g., with &haystack[start..end]. That said, they can be more convenient than slicing because the match positions reported when using Input::span or Input::range are in terms of the original haystack. If you instead use &haystack[start..end], then you’ll need to add start to any match position returned in order for it to be a correct index into haystack.
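The offset adjustment described above looks like this when done by hand. In this sketch, `str::find` stands in for an Aho-Corasick searcher and `find_in_window` is a hypothetical helper; with Input::span the shift by `start` would not be needed.

```rust
use std::ops::Range;

/// Hypothetical helper: searches `haystack[start..end]` by slicing, then
/// translates the match back into offsets of the original haystack.
/// `str::find` is a stand-in for a real multi-pattern searcher.
fn find_in_window(haystack: &str, needle: &str, start: usize, end: usize) -> Option<Range<usize>> {
    let window = &haystack[start..end];
    // Offsets returned by `find` are relative to `window`...
    let rel = window.find(needle)?;
    // ...so shift them by `start` to get positions in `haystack`.
    let abs_start = start + rel;
    Some(abs_start..abs_start + needle.len())
}
```

For example, searching "foobarbaz" for "baz" within 3..9 finds it at offset 3 of the slice, which is 6..9 in the original haystack.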
Example: &str and &[u8] automatically convert to an Input
There is a From<&T> for Input implementation for all T: AsRef<[u8]>.
Additionally, the AhoCorasick search APIs accept an Into<Input>. Together, these mean you can pass things like &str and &[u8] to search APIs when the defaults are suitable, and an Input when they’re not. For example:
use aho_corasick::{AhoCorasick, Anchored, Input, Match, StartKind};
// Build a searcher that supports both unanchored and anchored modes.
let ac = AhoCorasick::builder()
.start_kind(StartKind::Both)
.build(&["abcd", "b"])
.unwrap();
let haystack = "abcd";
// A search using default parameters is unanchored. With standard
// semantics, this finds `b` first.
assert_eq!(
Some(Match::must(1, 1..2)),
ac.find(haystack),
);
// Using the same 'find' routine, we can provide an 'Input' explicitly
// that is configured to do an anchored search. Since 'b' doesn't start
// at the beginning of the search, it is not reported as a match.
assert_eq!(
Some(Match::must(0, 0..4)),
ac.find(Input::new(haystack).anchored(Anchored::Yes)),
);
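The conversion mechanism can be reproduced in miniature. This sketch assumes a hypothetical `MiniInput` type and `contains` function; it mirrors the documented From<&T> implementation for all T: AsRef<[u8]> and the Into<Input> parameter, but it is not the crate's code, and a naive substring check replaces the real search.

```rust
/// Hypothetical, stripped-down stand-in for `Input` (not the crate's type).
#[derive(Clone, Copy, Debug)]
struct MiniInput<'h> {
    haystack: &'h [u8],
    anchored: bool,
}

impl<'h> MiniInput<'h> {
    fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> MiniInput<'h> {
        MiniInput { haystack: haystack.as_ref(), anchored: false }
    }

    /// Consuming builder method, in the style of `Input::anchored`.
    fn anchored(self, yes: bool) -> MiniInput<'h> {
        MiniInput { anchored: yes, ..self }
    }
}

// Blanket conversion: any borrowed `AsRef<[u8]>` value becomes a default
// configuration (unanchored, whole haystack).
impl<'h, H: ?Sized + AsRef<[u8]>> From<&'h H> for MiniInput<'h> {
    fn from(haystack: &'h H) -> MiniInput<'h> {
        MiniInput::new(haystack)
    }
}

/// Hypothetical search entry point accepting anything `Into<MiniInput>`.
/// A naive substring check stands in for Aho-Corasick.
fn contains<'h, I: Into<MiniInput<'h>>>(input: I, needle: &[u8]) -> bool {
    let input = input.into();
    if input.anchored {
        input.haystack.starts_with(needle)
    } else if needle.is_empty() {
        true
    } else {
        input.haystack.windows(needle.len()).any(|w| w == needle)
    }
}
```

Callers can then write `contains("abcd", b"bc")` for the default case and `contains(MiniInput::new("abcd").anchored(true), b"ab")` when they need to change a parameter, using the same function.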
Implementations
impl<'h> Input<'h>
pub fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> Input<'h>
Create a new search configuration for the given haystack.
pub fn span<S: Into<Span>>(self, span: S) -> Input<'h>
Set the span for this search.
This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>. To provide anything supported by range syntax, use the Input::range method.
The default span is the entire haystack.
Note that Input::range and this method set the same span, so whichever is called last takes effect.
Panics
This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.
Example
This example shows how the span of the search can impact whether a match is reported or not.
use aho_corasick::{AhoCorasick, Input, MatchKind};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let patterns = &["b", "abcd", "abc"];
    let haystack = "abcd";
    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    let input = Input::new(haystack).span(0..3);
    let mat = ac.try_find(input)?.expect("should have a match");
    // Without the span stopping the search early, 'abcd' would be reported
    // because it is the correct leftmost-first match.
    assert_eq!("abc", &haystack[mat.span()]);
    Ok(())
}
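To see why the span changes the answer, here is a brute-force reference for leftmost-first semantics restricted to a span. It is a sketch only: `leftmost_first_naive` is a hypothetical helper that tries each start position left to right and, at each position, prefers patterns listed earlier; it is not how the Aho-Corasick automaton computes matches.

```rust
use std::ops::Range;

/// Hypothetical brute-force reference for leftmost-first matching within
/// `span` (NOT the Aho-Corasick algorithm). Returns (pattern index, range).
fn leftmost_first_naive(
    patterns: &[&str],
    haystack: &str,
    span: Range<usize>,
) -> Option<(usize, Range<usize>)> {
    // Only bytes before `span.end` are visible, so longer matches that
    // would cross the end of the span cannot be reported.
    let hay = &haystack.as_bytes()[..span.end];
    // Leftmost: try start positions from left to right...
    for start in span.start..=span.end {
        // ...first: at each position, earlier patterns take priority.
        for (i, p) in patterns.iter().enumerate() {
            if hay[start..].starts_with(p.as_bytes()) {
                return Some((i, start..start + p.len()));
            }
        }
    }
    None
}
```

With the patterns b, abcd and abc, the span 0..3 hides the last byte, so abcd cannot match and abc (pattern 2) wins; over 0..4, abcd (pattern 1) is reported instead.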
pub fn range<R: RangeBounds<usize>>(self, range: R) -> Input<'h>
Like Input::span, but accepts any range instead.
The default range is the entire haystack.
Note that Input::span and this method set the same span, so whichever is called last takes effect.
Panics
This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.
This routine also panics if the given range does not correspond to valid bounds in the haystack or the termination of a search.
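The conversion from an arbitrary range to a half-open range can be sketched with std::ops::Bound. `to_half_open` is a hypothetical helper; it returns None in the overflow case where the documented routine panics, and unlike the real routine it does not check the result against the haystack.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Hypothetical sketch: converts any `RangeBounds<usize>` into a half-open
/// `Range<usize>` for a haystack of `len` bytes. Returns `None` when an
/// inclusive or excluded bound cannot be shifted by one without overflow.
/// It does NOT validate the result against the haystack.
fn to_half_open<R: RangeBounds<usize>>(range: R, len: usize) -> Option<Range<usize>> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        // An excluded start means the range begins one past `s`.
        Bound::Excluded(&s) => s.checked_add(1)?,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        // `..=e` covers `e`, so the half-open end is `e + 1`.
        Bound::Included(&e) => e.checked_add(1)?,
        Bound::Excluded(&e) => e,
        // An open end means "up to the end of the haystack".
        Bound::Unbounded => len,
    };
    Some(start..end)
}
```

This is why 2..=4 becomes 2..5, and why 0..=usize::MAX has no half-open equivalent: its end would be usize::MAX + 1.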
Example
use aho_corasick::Input;
let input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
let input = Input::new("foobar").range(2..=4);
assert_eq!(2..5, input.get_range());
pub fn anchored(self, mode: Anchored) -> Input<'h>
Sets the anchor mode of a search.
When a search is anchored (via Anchored::Yes), a match must begin at the start of a search. When a search is not anchored (that’s Anchored::No), searchers will look for a match anywhere at or after the start of the search.
By default, the anchored mode is Anchored::No.
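The difference between the two modes can be sketched with a brute-force search over a single pattern. `find_naive` is a hypothetical helper, not the crate's algorithm: an unanchored search tries every start position in the span, while an anchored search tries only the first one.

```rust
use std::ops::Range;

/// Hypothetical brute-force illustration of anchored vs. unanchored
/// semantics for one pattern (NOT the Aho-Corasick algorithm).
fn find_naive(pattern: &str, haystack: &str, span: Range<usize>, anchored: bool) -> Option<Range<usize>> {
    // A start past the end is the "search is complete" state: no match.
    if span.start > span.end {
        return None;
    }
    let hay = &haystack.as_bytes()[..span.end];
    let pat = pattern.as_bytes();
    // Unanchored: every position in the span is a candidate start.
    // Anchored: only the span's start position is a candidate.
    let last = if anchored { span.start } else { span.end };
    (span.start..=last)
        .find(|&s| hay[s..].starts_with(pat))
        .map(|s| s..s + pat.len())
}
```

Searching "abcd" for "bcd" finds 1..4 unanchored, nothing when anchored at 0, and 1..4 again when anchored at 1.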
Support for anchored searches
Anchored or unanchored searches might not always be available, depending on the type of searcher used and its configuration:
- noncontiguous::NFA always supports both unanchored and anchored searches.
- contiguous::NFA always supports both unanchored and anchored searches.
- dfa::DFA supports only unanchored searches by default. dfa::Builder::start_kind can be used to change the default to supporting both kinds of searches or even just anchored searches.
- AhoCorasick inherits the same setup as a DFA. Namely, it only supports unanchored searches by default, but AhoCorasickBuilder::start_kind can change this.
If you try to execute a search using a try_ (“fallible”) method with an unsupported anchor mode, then an error will be returned. For calls to infallible search methods, a panic will result.
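The fallible/infallible convention above is a common Rust pattern: the `try_` method returns a Result and the plain method unwraps it. The types below (`Mode`, `UnsupportedMode`, `Searcher`) are hypothetical and only model the convention; they are not the crate's API, and the searches themselves are naive substring checks.

```rust
/// Hypothetical anchor mode (stands in for the crate's `Anchored`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Unanchored,
    Anchored,
}

/// Hypothetical error for a mode the searcher was not built to support.
#[derive(Debug, PartialEq)]
struct UnsupportedMode(Mode);

/// Hypothetical searcher recording which modes it was "built" with.
struct Searcher {
    supports_anchored: bool,
    supports_unanchored: bool,
}

impl Searcher {
    /// Fallible style: an unsupported mode is reported as an `Err`.
    fn try_find(&self, hay: &str, needle: &str, mode: Mode) -> Result<Option<usize>, UnsupportedMode> {
        let supported = match mode {
            Mode::Anchored => self.supports_anchored,
            Mode::Unanchored => self.supports_unanchored,
        };
        if !supported {
            return Err(UnsupportedMode(mode));
        }
        Ok(match mode {
            Mode::Anchored => if hay.starts_with(needle) { Some(0) } else { None },
            Mode::Unanchored => hay.find(needle),
        })
    }

    /// Infallible style: the same error becomes a panic.
    fn find(&self, hay: &str, needle: &str, mode: Mode) -> Option<usize> {
        self.try_find(hay, needle, mode).expect("unsupported anchor mode")
    }
}
```

A searcher built without anchored support returns `Err` from `try_find` for an anchored request, while `find` with the same arguments would panic.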
Example
This demonstrates the differences between an anchored search and an unanchored search. Notice that we build our AhoCorasick searcher with StartKind::Both so that it supports both unanchored and anchored searches simultaneously.
use aho_corasick::{AhoCorasick, Anchored, Input, StartKind};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let patterns = &["bcd"];
    let haystack = "abcd";
    let ac = AhoCorasick::builder()
        .start_kind(StartKind::Both)
        .build(patterns)
        .unwrap();
    // Note that 'Anchored::No' is the default, so it doesn't need to
    // be explicitly specified here.
    let input = Input::new(haystack);
    let mat = ac.try_find(input)?.expect("should have a match");
    assert_eq!("bcd", &haystack[mat.span()]);
    // While 'bcd' occurs in the haystack, it does not begin where our
    // search begins, so no match is found.
    let input = Input::new(haystack).anchored(Anchored::Yes);
    assert_eq!(None, ac.try_find(input)?);
    // However, if we start our search where 'bcd' starts, then we will
    // find a match.
    let input = Input::new(haystack).range(1..).anchored(Anchored::Yes);
    let mat = ac.try_find(input)?.expect("should have a match");
    assert_eq!("bcd", &haystack[mat.span()]);
    Ok(())
}
pub fn earliest(self, yes: bool) -> Input<'h>
Whether to execute an “earliest” search or not.
When running a non-overlapping search, an “earliest” search will return the match location as early as possible. For example, given the patterns abc and b, and a haystack of abc, a normal leftmost-first search will return abc as a match. But an “earliest” search will return as soon as it is known that a match occurs, which happens once b is seen.
Note that when using MatchKind::Standard, the “earliest” option has no effect since standard semantics are already “earliest.” Note also that this has no effect in overlapping searches, since overlapping searches also use standard semantics and report all possible matches.
This is disabled by default.
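“Earliest” semantics can be approximated by scanning end positions left to right and stopping at the first one where any pattern's occurrence ends. `earliest_naive` is a hypothetical brute-force sketch, not the crate's algorithm; when several patterns end at the same position it simply prefers the earlier pattern, which may not match the crate's exact tie-breaking.

```rust
use std::ops::Range;

/// Hypothetical brute-force sketch of "earliest" semantics: stop at the
/// first end position where some pattern's occurrence ends, rather than
/// waiting to see whether a longer leftmost-first match would win.
/// Returns (pattern index, match range).
fn earliest_naive(patterns: &[&str], haystack: &str) -> Option<(usize, Range<usize>)> {
    let hay = haystack.as_bytes();
    // Walk end positions from left to right.
    for end in 0..=hay.len() {
        for (i, p) in patterns.iter().enumerate() {
            // Does pattern `i` end exactly at `end`?
            if hay[..end].ends_with(p.as_bytes()) {
                return Some((i, (end - p.len())..end));
            }
        }
    }
    None
}
```

For the patterns abc and b over the haystack abc, the scan reaches the end of b (offset 2) before the end of abc (offset 3), so b is reported, matching the behavior described above.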
Example
This example shows the difference between “earliest” searching and normal leftmost searching.
use aho_corasick::{AhoCorasick, Input, MatchKind};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let patterns = &["abc", "b"];
    let haystack = "abc";
    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    // The normal leftmost-first match.
    let input = Input::new(haystack);
    let mat = ac.try_find(input)?.expect("should have a match");
    assert_eq!("abc", &haystack[mat.span()]);
    // The "earliest" possible match, even if it isn't leftmost-first.
    let input = Input::new(haystack).earliest(true);
    let mat = ac.try_find(input)?.expect("should have a match");
    assert_eq!("b", &haystack[mat.span()]);
    Ok(())
}
pub fn set_span<S: Into<Span>>(&mut self, span: S)
Set the span for this search configuration.
This is like the Input::span method, except this mutates the span in place.
This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>.
Panics
This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.
Example
use aho_corasick::Input;
let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_span(2..4);
assert_eq!(2..4, input.get_range());
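Input offers two styles for each option: consuming builder methods such as Input::span for chaining at construction time, and set_* methods that mutate through &mut self, which are handy when adjusting an existing value, for example inside a loop. The hypothetical `Config` type below sketches that pattern with std only; it is not the crate's code.

```rust
/// Hypothetical configuration type showing the two mutation styles.
#[derive(Clone, Debug, PartialEq)]
struct Config {
    start: usize,
    end: usize,
}

impl Config {
    /// Default: the whole haystack of `len` bytes.
    fn new(len: usize) -> Config {
        Config { start: 0, end: len }
    }

    /// Consuming style: takes `self` and returns the updated value,
    /// so calls can be chained.
    fn span(mut self, start: usize, end: usize) -> Config {
        self.set_span(start, end);
        self
    }

    /// In-place style: mutates through `&mut self` and returns nothing.
    fn set_span(&mut self, start: usize, end: usize) {
        self.start = start;
        self.end = end;
    }
}
```

Implementing the consuming method in terms of the setter keeps both styles in agreement by construction.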
pub fn set_range<R: RangeBounds<usize>>(&mut self, range: R)
Set the span for this search configuration given any range.
This is like the Input::range method, except this mutates the span in place.
Panics
This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.
This routine also panics if the given range does not correspond to valid bounds in the haystack or the termination of a search.
Example
use aho_corasick::Input;
let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_range(2..=4);
assert_eq!(2..5, input.get_range());
pub fn set_start(&mut self, start: usize)
Set the starting offset for the span for this search configuration.
This is a convenience routine for only mutating the start of a span without having to set the entire span.
Panics
This panics if the resulting span does not correspond to valid bounds in the haystack or the termination of a search.
Example
use aho_corasick::Input;
let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_start(5);
assert_eq!(5..6, input.get_range());
pub fn set_end(&mut self, end: usize)
Set the ending offset for the span for this search configuration.
This is a convenience routine for only mutating the end of a span without having to set the entire span.
Panics
This panics if the resulting span does not correspond to valid bounds in the haystack or the termination of a search.
Example
use aho_corasick::Input;
let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_end(5);
assert_eq!(0..5, input.get_range());
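A typical reason to move only the start of a span is iterating over non-overlapping matches: search, then resume just past the previous match. The sketch below does this with `str::find` on a shrinking window in place of an Aho-Corasick searcher; `all_matches` is a hypothetical helper and assumes a non-empty needle.

```rust
use std::ops::Range;

/// Hypothetical helper: collects non-overlapping matches of `needle` by
/// repeatedly searching `haystack[start..end]` and advancing `start`,
/// much like calling `set_start` on an `Input` between searches.
/// `str::find` stands in for a real searcher.
fn all_matches(haystack: &str, needle: &str) -> Vec<Range<usize>> {
    assert!(!needle.is_empty(), "this sketch assumes a non-empty needle");
    let end = haystack.len();
    let mut start = 0;
    let mut out = Vec::new();
    while let Some(rel) = haystack[start..end].find(needle) {
        // Translate the window-relative offset back to the full haystack.
        let m = (start + rel)..(start + rel + needle.len());
        // Resume right after this match so matches do not overlap.
        start = m.end;
        out.push(m);
    }
    out
}
```

Because reported positions are translated back into the full haystack, each range can be used directly to slice the original string.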
pub fn set_anchored(&mut self, mode: Anchored)
Set the anchor mode of a search.
This is like Input::anchored, except it mutates the search configuration in place.
Example
use aho_corasick::{Anchored, Input};
let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());
input.set_anchored(Anchored::Yes);
assert_eq!(Anchored::Yes, input.get_anchored());
pub fn set_earliest(&mut self, yes: bool)
Set whether the search should execute in “earliest” mode or not.
This is like Input::earliest, except it mutates the search configuration in place.
Example
use aho_corasick::Input;
let mut input = Input::new("foobar");
assert!(!input.get_earliest());
input.set_earliest(true);
assert!(input.get_earliest());
pub fn haystack(&self) -> &[u8]
Return a borrow of the underlying haystack as a slice of bytes.
Example
use aho_corasick::Input;
let input = Input::new("foobar");
assert_eq!(b"foobar", input.haystack());
pub fn start(&self) -> usize
Return the start position of this search.
This is a convenience routine for input.get_span().start.
Example
use aho_corasick::Input;
let input = Input::new("foobar");
assert_eq!(0, input.start());
let input = Input::new("foobar").span(2..4);
assert_eq!(2, input.start());
pub fn end(&self) -> usize
Return the end position of this search.
This is a convenience routine for input.get_span().end.
Example
use aho_corasick::Input;
let input = Input::new("foobar");
assert_eq!(6, input.end());
let input = Input::new("foobar").span(2..4);
assert_eq!(4, input.end());
pub fn get_span(&self) -> Span
Return the span for this search configuration.
If one was not explicitly set, then the span corresponds to the entire range of the haystack.
Example
use aho_corasick::{Input, Span};
let input = Input::new("foobar");
assert_eq!(Span { start: 0, end: 6 }, input.get_span());
pub fn get_range(&self) -> Range<usize>
Return the span as a range for this search configuration.
If one was not explicitly set, then the span corresponds to the entire range of the haystack.
Example
use aho_corasick::Input;
let input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
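Span and Range<usize> describe the same half-open interval. The hypothetical `MySpan` type below sketches a round trip between the two, assuming public start/end fields as in the Span literal shown above; it is not the crate's Span, whose full API is not reproduced here.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a span with public `start`/`end` fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MySpan {
    start: usize,
    end: usize,
}

// Building a span from a half-open range copies both bounds.
impl From<Range<usize>> for MySpan {
    fn from(r: Range<usize>) -> MySpan {
        MySpan { start: r.start, end: r.end }
    }
}

impl MySpan {
    /// Converts back to a `Range<usize>`, e.g. for slicing a haystack.
    fn range(&self) -> Range<usize> {
        self.start..self.end
    }
}
```

The round trip is lossless, so either form can be used to slice the haystack, e.g. &"foobar"[MySpan::from(2..4).range()] yields "ob".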
pub fn get_anchored(&self) -> Anchored
Return the anchored mode for this search configuration.
If no anchored mode was set, then it defaults to Anchored::No.
Example
use aho_corasick::{Anchored, Input};
let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());
input.set_anchored(Anchored::Yes);
assert_eq!(Anchored::Yes, input.get_anchored());
pub fn get_earliest(&self) -> bool
Return whether this search should execute in “earliest” mode.
Example
use aho_corasick::Input;
let input = Input::new("foobar");
assert!(!input.get_earliest());
pub fn is_done(&self) -> bool
Return true if this input has been exhausted, which in turn means all subsequent searches will return no matches.
This occurs precisely when the start position of this search is greater than the end position of the search.
Example
use aho_corasick::Input;
let mut input = Input::new("foobar");
assert!(!input.is_done());
input.set_start(6);
assert!(!input.is_done());
input.set_start(7);
assert!(input.is_done());