Struct aho_corasick::Input

pub struct Input<'h> { /* private fields */ }

The configuration and the haystack to use for an Aho-Corasick search.

When executing a search, there are a few parameters one might want to configure:

  • The haystack to search, provided to the Input::new constructor. This is the only required parameter.
  • The span within the haystack to limit a search to. (The default is the entire haystack.) This is configured via Input::span or Input::range.
  • Whether to run an unanchored (matches can occur anywhere after the start of the search) or anchored (matches can only occur beginning at the start of the search) search. Unanchored search is the default. This is configured via Input::anchored.
  • Whether to quit the search as soon as a match has been found, regardless of the MatchKind that the searcher was built with. This is configured via Input::earliest.

For most cases, the defaults for all optional parameters are appropriate. This type keeps the common case simple while still letting niche use cases tweak parameters through the same search APIs.

Valid bounds and search termination

An Input permits setting the bounds of a search via either Input::span or Input::range. The bounds set must be valid, or else a panic will occur. Bounds are valid if and only if:

  • The bounds represent a valid range into the input’s haystack.
  • or the end bound is a valid ending bound for the haystack and the start bound is exactly one greater than the end bound.

In the latter case, Input::is_done will return true, which indicates that any search receiving such an input should immediately return with no match.

Other than representing “search is complete,” the Input::span and Input::range APIs are never strictly necessary. Callers can slice the haystack instead, e.g., with &haystack[start..end]. That said, they can be more convenient than slicing because the match positions reported when using Input::span or Input::range are in terms of the original haystack. If you instead use &haystack[start..end], then you’ll need to add start to any match position returned in order for it to be a correct index into haystack.

Example: &str and &[u8] automatically convert to an Input

There is a From<&T> for Input implementation for all T: AsRef<[u8]>. Additionally, the AhoCorasick search APIs accept an Into<Input>. Together, these mean you can provide things like &str and &[u8] to search APIs when the defaults are suitable, but also an Input when they’re not. For example:

use aho_corasick::{AhoCorasick, Anchored, Input, Match, StartKind};

// Build a searcher that supports both unanchored and anchored modes.
let ac = AhoCorasick::builder()
    .start_kind(StartKind::Both)
    .build(&["abcd", "b"])
    .unwrap();
let haystack = "abcd";

// A search using default parameters is unanchored. With standard
// semantics, this finds `b` first.
assert_eq!(
    Some(Match::must(1, 1..2)),
    ac.find(haystack),
);
// Using the same 'find' routine, we can provide an 'Input' explicitly
// that is configured to do an anchored search. Since 'b' doesn't start
// at the beginning of the search, it is not reported as a match.
assert_eq!(
    Some(Match::must(0, 0..4)),
    ac.find(Input::new(haystack).anchored(Anchored::Yes)),
);

Implementations

impl<'h> Input<'h>

pub fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> Input<'h>

Create a new search configuration for the given haystack.

pub fn span<S: Into<Span>>(self, span: S) -> Input<'h>

Set the span for this search.

This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>. To provide anything supported by range syntax, use the Input::range method.

The default span is the entire haystack.

Note that Input::range overrides this method and vice versa.

Panics

This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

Example

This example shows how the span of the search can impact whether a match is reported or not.

use aho_corasick::{AhoCorasick, Input, MatchKind};

let patterns = &["b", "abcd", "abc"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let input = Input::new(haystack).span(0..3);
let mat = ac.try_find(input)?.expect("should have a match");
// Without the span stopping the search early, 'abcd' would be reported
// because it is the correct leftmost-first match.
assert_eq!("abc", &haystack[mat.span()]);

pub fn range<R: RangeBounds<usize>>(self, range: R) -> Input<'h>

Like Input::span, but accepts any range instead.

The default range is the entire haystack.

Note that Input::span overrides this method and vice versa.

Panics

This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.

This routine also panics if the given range does not correspond to valid bounds in the haystack or the termination of a search.

Example
use aho_corasick::Input;

let input = Input::new("foobar");
assert_eq!(0..6, input.get_range());

let input = Input::new("foobar").range(2..=4);
assert_eq!(2..5, input.get_range());

pub fn anchored(self, mode: Anchored) -> Input<'h>

Sets the anchor mode of a search.

When a search is anchored (via Anchored::Yes), a match must begin at the start of a search. When a search is not anchored (that’s Anchored::No), searchers will look for a match anywhere in the haystack.

By default, the anchored mode is Anchored::No.

Support for anchored searches

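Anchored or unanchored searches might not always be available, depending on the type of searcher used and its configuration. For example, an AhoCorasick searcher supports only unanchored searches by default; building it with StartKind::Anchored or StartKind::Both enables anchored searches.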

If you try to execute a search using a try_ (“fallible”) method with an unsupported anchor mode, then an error will be returned. For calls to infallible search methods, a panic will result.

Example

This demonstrates the differences between an anchored search and an unanchored search. Notice that we build our AhoCorasick searcher with StartKind::Both so that it supports both unanchored and anchored searches simultaneously.

use aho_corasick::{AhoCorasick, Anchored, Input, StartKind};

let patterns = &["bcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .start_kind(StartKind::Both)
    .build(patterns)
    .unwrap();

// Note that 'Anchored::No' is the default, so it doesn't need to
// be explicitly specified here.
let input = Input::new(haystack);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("bcd", &haystack[mat.span()]);

// While 'bcd' occurs in the haystack, it does not begin where our
// search begins, so no match is found.
let input = Input::new(haystack).anchored(Anchored::Yes);
assert_eq!(None, ac.try_find(input)?);

// However, if we start our search where 'bcd' starts, then we will
// find a match.
let input = Input::new(haystack).range(1..).anchored(Anchored::Yes);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("bcd", &haystack[mat.span()]);

pub fn earliest(self, yes: bool) -> Input<'h>

Whether to execute an “earliest” search or not.

When running a non-overlapping search, an “earliest” search will return the match location as early as possible. For example, given the patterns abc and b, and a haystack of abc, a normal leftmost-first search will return abc as a match. But an “earliest” search will return as soon as it is known that a match occurs, which happens once b is seen.

Note that when using MatchKind::Standard, the “earliest” option has no effect since standard semantics are already “earliest.” Note also that this has no effect in overlapping searches, since overlapping searches also use standard semantics and report all possible matches.

This is disabled by default.

Example

This example shows the difference between “earliest” searching and normal leftmost searching.

use aho_corasick::{AhoCorasick, Input, MatchKind};

let patterns = &["abc", "b"];
let haystack = "abc";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();

// The normal leftmost-first match.
let input = Input::new(haystack);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("abc", &haystack[mat.span()]);

// The "earliest" possible match, even if it isn't leftmost-first.
let input = Input::new(haystack).earliest(true);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("b", &haystack[mat.span()]);

pub fn set_span<S: Into<Span>>(&mut self, span: S)

Set the span for this search configuration.

This is like the Input::span method, except this mutates the span in place.

This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>.

Panics

This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

Example
use aho_corasick::Input;

let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_span(2..4);
assert_eq!(2..4, input.get_range());

pub fn set_range<R: RangeBounds<usize>>(&mut self, range: R)

Set the span for this search configuration given any range.

This is like the Input::range method, except this mutates the span in place.

Panics

This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.

This routine also panics if the given range does not correspond to valid bounds in the haystack or the termination of a search.

Example
use aho_corasick::Input;

let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_range(2..=4);
assert_eq!(2..5, input.get_range());

pub fn set_start(&mut self, start: usize)

Set the starting offset for the span for this search configuration.

This is a convenience routine for only mutating the start of a span without having to set the entire span.

Panics

This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

Example
use aho_corasick::Input;

let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_start(5);
assert_eq!(5..6, input.get_range());

pub fn set_end(&mut self, end: usize)

Set the ending offset for the span for this search configuration.

This is a convenience routine for only mutating the end of a span without having to set the entire span.

Panics

This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

Example
use aho_corasick::Input;

let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_end(5);
assert_eq!(0..5, input.get_range());

pub fn set_anchored(&mut self, mode: Anchored)

Set the anchor mode of a search.

This is like Input::anchored, except it mutates the search configuration in place.

Example
use aho_corasick::{Anchored, Input};

let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());

input.set_anchored(Anchored::Yes);
assert_eq!(Anchored::Yes, input.get_anchored());

pub fn set_earliest(&mut self, yes: bool)

Set whether the search should execute in “earliest” mode or not.

This is like Input::earliest, except it mutates the search configuration in place.

Example
use aho_corasick::Input;

let mut input = Input::new("foobar");
assert!(!input.get_earliest());
input.set_earliest(true);
assert!(input.get_earliest());

pub fn haystack(&self) -> &[u8]

Return a borrow of the underlying haystack as a slice of bytes.

Example
use aho_corasick::Input;

let input = Input::new("foobar");
assert_eq!(b"foobar", input.haystack());

pub fn start(&self) -> usize

Return the start position of this search.

This is a convenience routine for reading the start field of input.get_span().

Example
use aho_corasick::Input;

let input = Input::new("foobar");
assert_eq!(0, input.start());

let input = Input::new("foobar").span(2..4);
assert_eq!(2, input.start());

pub fn end(&self) -> usize

Return the end position of this search.

This is a convenience routine for reading the end field of input.get_span().

Example
use aho_corasick::Input;

let input = Input::new("foobar");
assert_eq!(6, input.end());

let input = Input::new("foobar").span(2..4);
assert_eq!(4, input.end());

pub fn get_span(&self) -> Span

Return the span for this search configuration.

If one was not explicitly set, then the span corresponds to the entire range of the haystack.

Example
use aho_corasick::{Input, Span};

let input = Input::new("foobar");
assert_eq!(Span { start: 0, end: 6 }, input.get_span());

pub fn get_range(&self) -> Range<usize>

Return the span as a range for this search configuration.

If one was not explicitly set, then the span corresponds to the entire range of the haystack.

Example
use aho_corasick::Input;

let input = Input::new("foobar");
assert_eq!(0..6, input.get_range());

pub fn get_anchored(&self) -> Anchored

Return the anchored mode for this search configuration.

If no anchored mode was set, then it defaults to Anchored::No.

Example
use aho_corasick::{Anchored, Input};

let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());

input.set_anchored(Anchored::Yes);
assert_eq!(Anchored::Yes, input.get_anchored());

pub fn get_earliest(&self) -> bool

Return whether this search should execute in “earliest” mode.

Example
use aho_corasick::Input;

let input = Input::new("foobar");
assert!(!input.get_earliest());

pub fn is_done(&self) -> bool

Return true if this input has been exhausted, which in turn means all subsequent searches will return no matches.

This occurs precisely when the start position of this search is greater than the end position of the search.

Example
use aho_corasick::Input;

let mut input = Input::new("foobar");
assert!(!input.is_done());
input.set_start(6);
assert!(!input.is_done());
input.set_start(7);
assert!(input.is_done());

Trait Implementations

impl<'h> Clone for Input<'h>

fn clone(&self) -> Input<'h>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<'h> Debug for Input<'h>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'h, H: ?Sized + AsRef<[u8]>> From<&'h H> for Input<'h>

fn from(haystack: &'h H) -> Input<'h>

Converts to this type from the input type.

Auto Trait Implementations

impl<'h> RefUnwindSafe for Input<'h>

impl<'h> Send for Input<'h>

impl<'h> Sync for Input<'h>

impl<'h> Unpin for Input<'h>

impl<'h> UnwindSafe for Input<'h>

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.