Struct regex_automata::util::captures::Captures
pub struct Captures { /* private fields */ }
The span offsets of capturing groups after a match has been found.
This type represents the output of regex engines that can report the offsets at which capturing group matches or “submatches” occur. The PikeVM is one such engine. When a match occurs, it will at minimum contain the PatternID of the pattern that matched. Depending upon how it was constructed, it may also contain the start/end offsets of the entire match of the pattern and the start/end offsets of each capturing group that participated in the match.
Values of this type are always created for a specific GroupInfo. It is unspecified behavior to use a Captures value in a search with any regex engine that has a different GroupInfo than the one the Captures value was created with.
§Constructors
There are three constructors for this type that control what kind of information is available upon a match:
Captures::all: Will store overall pattern match offsets in addition to the offsets of capturing groups that participated in the match.
Captures::matches: Will store only the overall pattern match offsets. The offsets of capturing groups (even ones that participated in the match) are not available.
Captures::empty: Will only store the pattern ID that matched. No match offsets are available at all.
If you aren’t sure which to choose, then pick the first one. The first one is what convenience routines, like PikeVM::create_captures, will use automatically.
The main difference between these choices is performance. Namely, if you ask for less information, then the execution of regex search may be able to run more quickly.
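The following minimal, std-only sketch makes the trade-off concrete by modeling how much slot storage each constructor might need. It is a model, not the crate's code, and it rests on three assumptions. First, every capturing group uses two slots. Second, Captures::matches keeps only the two slots of the implicit group 0 of each pattern. Third, Captures::empty keeps none. The names `Mode` and `slot_len` are hypothetical.

```rust
/// Hypothetical model of the three constructor choices.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    All,
    Matches,
    Empty,
}

/// Hypothetical: the number of offset slots a Captures-like value would
/// allocate, given the group count of each pattern (including group 0).
/// Assumes two slots per group, and that `Matches` keeps only group 0 per pattern.
fn slot_len(mode: Mode, groups_per_pattern: &[usize]) -> usize {
    match mode {
        // Every group of every pattern gets a start slot and an end slot.
        Mode::All => groups_per_pattern.iter().map(|&g| 2 * g).sum(),
        // Only the overall match (group 0) of each pattern is tracked.
        Mode::Matches => 2 * groups_per_pattern.len(),
        // Only the pattern ID is tracked, so no offsets are stored.
        Mode::Empty => 0,
    }
}
```

For the date regex in the first example below, which has the implicit group plus three explicit groups, this model gives 8, 2 and 0 slots respectively. Fewer slots means less bookkeeping for the engine during a search.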
§Notes
It is worth pointing out that this type is not coupled to any one specific regex engine. Instead, its coupling is with GroupInfo, which is responsible for mapping capturing groups to “slot” offsets. Slot offsets are indices into a single sequence of memory at which regex engines write the matching haystack offsets for the corresponding group.
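Below is a sketch of the slot idea for the single-pattern case. It assumes the layout `[g0.start, g0.end, g1.start, g1.end, ...]`, which agrees with the slot values shown in the Captures::clear example for "Bruce Springsteen". The helpers `slot_pair` and `group_span` are hypothetical and are not part of the crate.

```rust
/// Hypothetical helper: for a single-pattern regex, return the (start, end)
/// slot indices of capturing group `group` (group 0 is the overall match).
/// Assumes slots are laid out as [g0.start, g0.end, g1.start, g1.end, ...].
fn slot_pair(group: usize) -> (usize, usize) {
    (2 * group, 2 * group + 1)
}

/// Hypothetical: read the span of `group` out of a flat slot array.
/// Returns None if either offset is unset or the group is out of range.
fn group_span(slots: &[Option<usize>], group: usize) -> Option<(usize, usize)> {
    let (s, e) = slot_pair(group);
    // `get` yields None for out-of-range indices instead of panicking.
    match (slots.get(s).copied().flatten(), slots.get(e).copied().flatten()) {
        (Some(start), Some(end)) => Some((start, end)),
        _ => None,
    }
}
```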
§Example
This example shows how to parse a simple date and extract the components of the date via capturing groups:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));
§Example: named capturing groups
This example is like the one above, but leverages the ability to name capturing groups in order to make the code a bit clearer:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(r"^(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group_by_name("y"));
assert_eq!(Some(Span::from(5..7)), caps.get_group_by_name("m"));
assert_eq!(Some(Span::from(8..10)), caps.get_group_by_name("d"));
Implementations
impl Captures
pub fn all(group_info: GroupInfo) -> Captures
Create new storage for the offsets of all matching capturing groups.
This routine provides the most information for matches—namely, the spans of matching capturing groups—but also requires the regex search routines to do the most work.
It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.
§Example
This example shows that all capturing groups—but only ones that participated in a match—are available to query after a match has been found:
use regex_automata::{
nfa::thompson::pikevm::PikeVM,
util::captures::Captures,
Span, Match,
};
let re = PikeVM::new(
r"^(?:(?P<lower>[a-z]+)|(?P<upper>[A-Z]+))(?P<digits>[0-9]+)$",
)?;
let mut cache = re.create_cache();
let mut caps = Captures::all(re.get_nfa().group_info().clone());
re.captures(&mut cache, "ABC123", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
// The 'lower' group didn't match, so it won't have any offsets.
assert_eq!(None, caps.get_group_by_name("lower"));
assert_eq!(Some(Span::from(0..3)), caps.get_group_by_name("upper"));
assert_eq!(Some(Span::from(3..6)), caps.get_group_by_name("digits"));
pub fn matches(group_info: GroupInfo) -> Captures
Create new storage for only the full match spans of a pattern. This does not include any capturing group offsets.
It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.
§Example
This example shows that only overall match offsets are reported when this constructor is used. Accessing any capturing groups other than the 0th will always return None.
use regex_automata::{
nfa::thompson::pikevm::PikeVM,
util::captures::Captures,
Match,
};
let re = PikeVM::new(
r"^(?:(?P<lower>[a-z]+)|(?P<upper>[A-Z]+))(?P<digits>[0-9]+)$",
)?;
let mut cache = re.create_cache();
let mut caps = Captures::matches(re.get_nfa().group_info().clone());
re.captures(&mut cache, "ABC123", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
// We didn't ask for capturing group offsets, so they aren't available.
assert_eq!(None, caps.get_group_by_name("lower"));
assert_eq!(None, caps.get_group_by_name("upper"));
assert_eq!(None, caps.get_group_by_name("digits"));
pub fn empty(group_info: GroupInfo) -> Captures
Create new storage for only tracking which pattern matched. No offsets are stored at all.
It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.
§Example
This example shows that only the pattern that matched can be accessed from a Captures value created via this constructor.
use regex_automata::{
nfa::thompson::pikevm::PikeVM,
util::captures::Captures,
PatternID,
};
let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());
re.captures(&mut cache, "aABCz", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(PatternID::must(0)), caps.pattern());
// We didn't ask for any offsets, so they aren't available.
assert_eq!(None, caps.get_match());
re.captures(&mut cache, &"aABCz"[1..], &mut caps);
assert!(caps.is_match());
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// We didn't ask for any offsets, so they aren't available.
assert_eq!(None, caps.get_match());
pub fn is_match(&self) -> bool
Returns true if and only if this Captures value represents a match.
This is a convenience routine for caps.pattern().is_some().
§Example
When using the PikeVM (for example), the lightest weight way of detecting whether a match exists is to create a Captures value that only tracks the ID of the pattern that matched (if any):
use regex_automata::{
nfa::thompson::pikevm::PikeVM,
util::captures::Captures,
};
let re = PikeVM::new(r"[a-z]+")?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());
re.captures(&mut cache, "aABCz", &mut caps);
assert!(caps.is_match());
pub fn pattern(&self) -> Option<PatternID>
Returns the identifier of the pattern that matched when this Captures value represents a match. If no match was found, then this always returns None.
This returns a pattern ID in precisely the cases in which is_match returns true. Similarly, the pattern ID returned is always the same pattern ID found in the Match returned by get_match.
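The contract between is_match and pattern can be captured in a tiny model: a match exists exactly when a pattern ID is recorded. In this sketch `MatchState` and `PatternId` are hypothetical stand-ins, not the crate's Captures or PatternID types.

```rust
/// Hypothetical stand-in for a pattern identifier.
type PatternId = u32;

/// Hypothetical model of the pattern-tracking part of a Captures value.
struct MatchState {
    pattern: Option<PatternId>,
}

impl MatchState {
    /// Mirrors the documented contract: a match exists iff a pattern ID is set.
    fn is_match(&self) -> bool {
        self.pattern.is_some()
    }

    /// Returns the recorded pattern ID, or None when there is no match.
    fn pattern(&self) -> Option<PatternId> {
        self.pattern
    }
}
```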
§Example
When using the PikeVM (for example), the lightest weight way of detecting which pattern matched is to create a Captures value that only tracks the ID of the pattern that matched (if any):
use regex_automata::{
nfa::thompson::pikevm::PikeVM,
util::captures::Captures,
PatternID,
};
let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());
re.captures(&mut cache, "ABC", &mut caps);
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// Recall that offsets are only available when using a non-empty
// Captures value. So even though a match occurred, this returns None!
assert_eq!(None, caps.get_match());
pub fn get_match(&self) -> Option<Match>
Returns the pattern ID and the span of the match, if one occurred.
This always returns None when Captures was created with Captures::empty, even if a match was found.
If this routine returns a non-None value, then is_match is guaranteed to return true and pattern is also guaranteed to return a non-None value.
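The sketch below shows why get_match can be None even after a match. It assumes the whole-match slots of every pattern come first, two per pattern. That assumption is inferred from the slots example further down, where pattern 1's match occupies slots 2 and 3. An empty slot vector stands in for Captures::empty. The type `Caps` is hypothetical.

```rust
/// Hypothetical model: `slots` holds the implicit (whole-match) slots of
/// every pattern first, two per pattern. An empty `slots` models a value
/// created with `Captures::empty`.
struct Caps {
    pattern: Option<usize>,
    slots: Vec<Option<usize>>,
}

impl Caps {
    /// Returns (pattern, start, end) for the overall match, if available.
    fn get_match(&self) -> Option<(usize, usize, usize)> {
        // No pattern recorded means no match, regardless of slot contents.
        let pid = self.pattern?;
        // Missing slots (empty mode) or unset offsets both yield None.
        let start = (*self.slots.get(2 * pid)?)?;
        let end = (*self.slots.get(2 * pid + 1)?)?;
        Some((pid, start, end))
    }
}
```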
§Example
This example shows how to get the full match from a search:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Match};
let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "ABC", &mut caps);
assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());
pub fn get_group(&self, index: usize) -> Option<Span>
Returns the span of a capturing group match corresponding to the group index given, only if both the overall pattern matched and the capturing group participated in that match.
This returns None if index is invalid. index is valid if and only if it’s less than Captures::group_len for the matching pattern.
This always returns None when Captures was created with Captures::empty, even if a match was found. This also always returns None for any index > 0 when Captures was created with Captures::matches.
If this routine returns a non-None value, then is_match is guaranteed to return true, pattern is guaranteed to return a non-None value and get_match is guaranteed to return a non-None value.
By convention, the 0th capture group will always return the same span as the span returned by get_match. This is because the 0th capture group always corresponds to the entirety of the pattern’s match. (It is similarly always unnamed because it is implicit.) This isn’t necessarily true of all regex engines. For example, one can hand-compile a thompson::NFA via a thompson::Builder, which isn’t technically forced to make the 0th capturing group always correspond to the entire match.
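The rules above can be expressed as one hypothetical single-pattern function. Validating the index against the group count happens first. Then any slot that was never allocated reads as absent, which is how Captures::matches (2 slots) and Captures::empty (0 slots) behave in this model. Checking the index first also keeps `2 * index` from overflowing for very large indices. This is a sketch, not the crate's implementation.

```rust
/// Hypothetical single-pattern model of `get_group`.
/// `group_len` counts all groups of the pattern, including group 0.
/// `slots` may be shorter than `2 * group_len`, modeling the `matches`
/// (2 slots) and `empty` (0 slots) constructors.
fn get_group(
    matched: bool,
    group_len: usize,
    slots: &[Option<usize>],
    index: usize,
) -> Option<(usize, usize)> {
    // No match, or an index the pattern doesn't define: nothing to report.
    // Doing this first also prevents `2 * index` from overflowing.
    if !matched || index >= group_len {
        return None;
    }
    // Slots that were never allocated simply read as absent.
    let start = slots.get(2 * index).copied().flatten()?;
    let end = slots.get(2 * index + 1).copied().flatten()?;
    Some((start, end))
}
```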
§Example
This example shows how to get the capturing groups, by index, from a match:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span, Match};
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group(1));
assert_eq!(Some(Span::from(6..17)), caps.get_group(2));
// Looking for a non-existent capturing group will return None:
assert_eq!(None, caps.get_group(3));
assert_eq!(None, caps.get_group(9944060567225171988));
pub fn get_group_by_name(&self, name: &str) -> Option<Span>
Returns the span of a capturing group match corresponding to the group name given, only if both the overall pattern matched and the capturing group participated in that match.
This returns None if name does not correspond to a valid capturing group for the pattern that matched.
This always returns None when Captures was created with Captures::empty, even if a match was found. This also always returns None for any group with index > 0 when Captures was created with Captures::matches. Since only group 0 has offsets in that case, and group 0 is always unnamed, every lookup by name returns None.
If this routine returns a non-None value, then is_match is guaranteed to return true, pattern is guaranteed to return a non-None value and get_match is guaranteed to return a non-None value.
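Conceptually, a lookup by name is a name-to-index resolution followed by a lookup by index. In the sketch below a plain HashMap stands in for the name table that GroupInfo maintains. The function and its parameters are hypothetical. Unknown names resolve to None, just like invalid indices.

```rust
use std::collections::HashMap;

/// Hypothetical model: resolve a group name to its index, then look up the
/// span by index. Unknown names and non-participating groups both give None.
fn get_group_by_name(
    names: &HashMap<&str, usize>,
    spans: &[Option<(usize, usize)>],
    name: &str,
) -> Option<(usize, usize)> {
    // Step 1: name -> group index (fails for names the pattern lacks).
    let index = *names.get(name)?;
    // Step 2: group index -> span (fails if the group didn't participate).
    spans.get(index).copied().flatten()
}
```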
§Example
This example shows how to get the capturing groups, by name, from a match:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span, Match};
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group_by_name("first"));
assert_eq!(Some(Span::from(6..17)), caps.get_group_by_name("last"));
// Looking for a non-existent capturing group will return None:
assert_eq!(None, caps.get_group_by_name("middle"));
pub fn iter(&self) -> CapturesPatternIter<'_>
Returns an iterator of possible spans for every capturing group in the matching pattern.
If this Captures value does not correspond to a match, then the iterator returned yields no elements.
Note that the iterator returned yields elements of type Option<Span>. A span is present if and only if it corresponds to a capturing group that participated in a match.
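The following sketch models this iteration for a single pattern. It walks the slots two at a time and yields one optional span per group, so non-participating groups still appear, as None. The slot layout is the same single-pattern assumption used elsewhere on this page, and `iter_groups` is a hypothetical name.

```rust
/// Hypothetical model of `Captures::iter` for a single pattern: walk the
/// slots two at a time and yield one Option<(start, end)> per group.
/// Yields nothing when there is no match.
fn iter_groups(
    matched: bool,
    slots: &[Option<usize>],
) -> impl Iterator<Item = Option<(usize, usize)>> + '_ {
    // With no match, take(0) makes the iterator empty.
    let n = if matched { slots.len() / 2 } else { 0 };
    slots.chunks(2).take(n).map(|pair| match (pair[0], pair[1]) {
        // Both offsets set: the group participated.
        (Some(s), Some(e)) => Some((s, e)),
        // Otherwise the group exists but didn't participate.
        _ => None,
    })
}
```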
§Example
This example shows how to collect all capturing groups:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(
// Matches first/last names, with an optional middle name.
r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Harry James Potter", &mut caps);
assert!(caps.is_match());
let groups: Vec<Option<Span>> = caps.iter().collect();
assert_eq!(groups, vec![
Some(Span::from(0..18)),
Some(Span::from(0..5)),
Some(Span::from(6..11)),
Some(Span::from(12..18)),
]);
This example uses the same regex as the previous example, but with a haystack that omits the middle name. This results in a capturing group that is present in the elements yielded by the iterator but without a match:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(
// Matches first/last names, with an optional middle name.
r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Harry Potter", &mut caps);
assert!(caps.is_match());
let groups: Vec<Option<Span>> = caps.iter().collect();
assert_eq!(groups, vec![
Some(Span::from(0..12)),
Some(Span::from(0..5)),
None,
Some(Span::from(6..12)),
]);
pub fn group_len(&self) -> usize
Return the total number of capturing groups for the matching pattern.
If this Captures value does not correspond to a match, then this always returns 0.
This always returns the same value as the number of elements yielded by Captures::iter. That is, the number includes capturing groups even if they don’t participate in the match.
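This tiny sketch shows that the count depends only on which pattern matched, not on which of its groups participated. `group_len` here is a hypothetical free function over a per-pattern group count, not the crate's method.

```rust
/// Hypothetical model: the group count of the matching pattern, counting
/// participating and non-participating groups alike; 0 if nothing matched.
fn group_len(matched_pattern: Option<usize>, groups_per_pattern: &[usize]) -> usize {
    match matched_pattern {
        Some(pid) => groups_per_pattern[pid],
        None => 0,
    }
}
```

For the first/middle/last-name regex, the model reports 4 whether or not the optional middle name participated, because participation is never consulted.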
§Example
This example shows how to count the total number of capturing groups associated with a pattern. Notice that it includes groups that did not participate in a match (just like Captures::iter does).
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(
// Matches first/last names, with an optional middle name.
r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Harry Potter", &mut caps);
assert_eq!(4, caps.group_len());
pub fn group_info(&self) -> &GroupInfo
Returns a reference to the underlying group info on which these captures are based.
The difference between GroupInfo and Captures is that the former defines the structure of capturing groups, whereas the latter stores the actual match information. So whereas Captures only gives you access to the current match, GroupInfo lets you query any information about all capturing groups, even ones for patterns that weren’t involved in a match.
Note that a GroupInfo uses reference counting internally, so it may be cloned cheaply.
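Cheap cloning via reference counting can be illustrated with std::sync::Arc. This is an assumption for illustration only, since the text above says only that GroupInfo uses reference counting. Cloning the hypothetical `GroupInfoModel` bumps a count and shares the same group-name data instead of copying it.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the data a GroupInfo describes: the group
/// names of each pattern (None for unnamed groups).
#[derive(Debug)]
struct GroupInfoInner {
    names: Vec<Vec<Option<String>>>,
}

/// Cloning only bumps a reference count; the names are not copied.
#[derive(Clone, Debug)]
struct GroupInfoModel(Arc<GroupInfoInner>);

impl GroupInfoModel {
    fn new(names: Vec<Vec<Option<String>>>) -> Self {
        GroupInfoModel(Arc::new(GroupInfoInner { names }))
    }

    /// Number of patterns described by this group info.
    fn pattern_len(&self) -> usize {
        self.0.names.len()
    }
}
```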
§Example
This example shows how to get all capturing group names from the underlying GroupInfo. Notice that we don’t even need to run a search.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
r"(?P<foo>a)",
r"(a)(b)",
r"ab",
r"(?P<bar>a)(?P<quux>a)",
r"(?P<foo>z)",
])?;
let caps = re.create_captures();
let expected = vec![
(PatternID::must(0), 0, None),
(PatternID::must(0), 1, Some("foo")),
(PatternID::must(1), 0, None),
(PatternID::must(1), 1, None),
(PatternID::must(1), 2, None),
(PatternID::must(2), 0, None),
(PatternID::must(3), 0, None),
(PatternID::must(3), 1, Some("bar")),
(PatternID::must(3), 2, Some("quux")),
(PatternID::must(4), 0, None),
(PatternID::must(4), 1, Some("foo")),
];
// We could also just use 're.get_nfa().group_info()'.
let got: Vec<(PatternID, usize, Option<&str>)> =
caps.group_info().all_names().collect();
assert_eq!(expected, got);
pub fn interpolate_string(&self, haystack: &str, replacement: &str) -> String
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated string is returned.
See the interpolate module for documentation on the format of the replacement string.
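To show the general idea of interpolation, here is a deliberately simplified sketch. It handles only `$name` and `$1` style references plus the `$$` escape. It does not implement the interpolate module's actual grammar, which documents additional forms (such as braced references) that are omitted here. The function `interpolate` and its `lookup` callback are hypothetical. A reference to a missing or non-participating group expands to empty text.

```rust
/// Hypothetical, simplified interpolation: `$name` or `$1` is replaced with
/// whatever `lookup` returns for that reference (empty if None), `$$` becomes
/// a literal `$`, and a `$` not followed by a reference is kept as-is.
/// Braced references are intentionally not supported in this sketch.
fn interpolate<'a>(replacement: &str, lookup: impl Fn(&str) -> Option<&'a str>) -> String {
    let mut out = String::new();
    let mut rest = replacement;
    while let Some(i) = rest.find('$') {
        // Copy the literal text before the `$`.
        out.push_str(&rest[..i]);
        rest = &rest[i + 1..];
        if let Some(after) = rest.strip_prefix('$') {
            // `$$` is an escaped dollar sign.
            out.push('$');
            rest = after;
            continue;
        }
        // The reference is the longest run of ASCII alphanumerics or '_'.
        let end = rest
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(rest.len());
        if end == 0 {
            out.push('$');
        } else {
            out.push_str(lookup(&rest[..end]).unwrap_or(""));
        }
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}
```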
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = "year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = "On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_string(hay, replacement);
assert_eq!("year=2010, month=03, day=14", result);
// And this matches the second pattern.
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_string(hay, replacement);
assert_eq!("year=2010, month=03, day=14", result);
pub fn interpolate_string_into(&self, haystack: &str, replacement: &str, dst: &mut String)
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated string is written to dst.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = "year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = "On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = String::new();
caps.interpolate_string_into(hay, replacement, &mut dst);
assert_eq!("year=2010, month=03, day=14", dst);
// And this matches the second pattern.
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = String::new();
caps.interpolate_string_into(hay, replacement, &mut dst);
assert_eq!("year=2010, month=03, day=14", dst);
pub fn interpolate_bytes(&self, haystack: &[u8], replacement: &[u8]) -> Vec<u8>
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated byte string is returned.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = b"year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_bytes(hay, replacement);
assert_eq!(&b"year=2010, month=03, day=14"[..], result);
// And this matches the second pattern.
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_bytes(hay, replacement);
assert_eq!(&b"year=2010, month=03, day=14"[..], result);
pub fn interpolate_bytes_into(&self, haystack: &[u8], replacement: &[u8], dst: &mut Vec<u8>)
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated byte string is written to dst.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = b"year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = vec![];
caps.interpolate_bytes_into(hay, replacement, &mut dst);
assert_eq!(&b"year=2010, month=03, day=14"[..], dst);
// And this matches the second pattern.
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = vec![];
caps.interpolate_bytes_into(hay, replacement, &mut dst);
assert_eq!(&b"year=2010, month=03, day=14"[..], dst);
pub fn extract<'h, const N: usize>(&self, haystack: &'h str) -> (&'h str, [&'h str; N])
This is a convenience routine for extracting the substrings corresponding to matching capture groups in the given haystack. The haystack should be the same one used to find the match spans in this Captures value.
This is identical to Captures::extract_bytes, except it works with &str instead of &[u8].
§Panics
This panics if the number of explicit matching groups in this Captures value is less than N. This also panics if this Captures value does not correspond to a match.
Note that this does not panic if the number of explicit matching groups is bigger than N. In that case, only the first N matching groups are extracted.
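The following sketch models these panic rules with a const generic array. It assumes "matching groups" means explicit groups that participated in the match, so non-participating groups are skipped rather than counted. That assumption should be checked against the crate. `extract` here is a hypothetical free function over precomputed spans, where `spans[0]` is the overall match.

```rust
/// Hypothetical model of `extract`: `spans[0]` is the overall match and
/// `spans[1..]` are the explicit groups. Non-participating groups are
/// skipped (an assumption). Panics if there is no match or if fewer than
/// N explicit groups matched; extra matching groups are ignored.
fn extract<'h, const N: usize>(
    haystack: &'h str,
    spans: &[Option<(usize, usize)>],
) -> (&'h str, [&'h str; N]) {
    let (s, e) = spans.first().copied().flatten().expect("no match");
    let full = &haystack[s..e];
    // Substrings of explicit groups that participated, in group order.
    let groups: Vec<&'h str> = spans[1..]
        .iter()
        .flatten()
        .map(|&(s, e)| &haystack[s..e])
        .collect();
    assert!(groups.len() >= N, "expected at least {} matching groups", N);
    let mut out = [""; N];
    out.copy_from_slice(&groups[..N]);
    (full, out)
}
```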
§Example
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
assert!(caps.is_match());
let (full, [year, month, day]) = caps.extract(hay);
assert_eq!("2010-03-14", full);
assert_eq!("2010", year);
assert_eq!("03", month);
assert_eq!("14", day);
// We can also ask for fewer than all capture groups.
let (full, [year]) = caps.extract(hay);
assert_eq!("2010-03-14", full);
assert_eq!("2010", year);
pub fn extract_bytes<'h, const N: usize>(&self, haystack: &'h [u8]) -> (&'h [u8], [&'h [u8]; N])
This is a convenience routine for extracting the substrings corresponding to matching capture groups in the given haystack. The haystack should be the same one used to find the match spans in this Captures value.
This is identical to Captures::extract, except it works with &[u8] instead of &str.
§Panics
This panics if the number of explicit matching groups in this Captures value is less than N. This also panics if this Captures value does not correspond to a match.
Note that this does not panic if the number of explicit matching groups is bigger than N. In that case, only the first N matching groups are extracted.
§Example
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
assert!(caps.is_match());
let (full, [year, month, day]) = caps.extract_bytes(hay);
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
assert_eq!(b"03", month);
assert_eq!(b"14", day);
// We can also ask for fewer than all capture groups.
let (full, [year]) = caps.extract_bytes(hay);
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
impl Captures
Lower level “slot” oriented APIs. One does not typically need to use these when executing a search. They are instead mostly intended for folks who are writing their own regex engine while reusing this Captures type.
pub fn clear(&mut self)
Clear this Captures value.
After clearing, all slots inside this Captures value will be set to None. Similarly, any pattern ID that it was previously associated with (for a match) is erased.
It is not usually necessary to call this routine. Namely, a Captures value only provides high level access to the capturing groups of the pattern that matched, and only low level access to individual slots. Thus, even if slots corresponding to groups that aren’t associated with the matching pattern are set, it won’t impact the higher level APIs. For example, higher level APIs like Captures::get_group will return None if no pattern ID is present, even if there are spans set in the underlying slots.
Thus, to “clear” a Captures value of a match, it is usually only necessary to call Captures::set_pattern with None.
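The difference in cost between clear and set_pattern(None) can be sketched with a hypothetical model type. clear touches every slot. set_pattern only overwrites one field, and stale slots remain invisible because high-level access checks the pattern first. The single-pattern slot layout is an assumption, and `CapsModel` is not the crate's type.

```rust
/// Hypothetical model of the clear / set_pattern trade-off.
struct CapsModel {
    pattern: Option<u32>,
    slots: Vec<Option<usize>>,
}

impl CapsModel {
    /// Resets the pattern and every slot: linear in the number of slots.
    fn clear(&mut self) {
        self.pattern = None;
        for slot in self.slots.iter_mut() {
            *slot = None;
        }
    }

    /// Only overwrites the pattern: constant time, slots left in place.
    fn set_pattern(&mut self, pid: Option<u32>) {
        self.pattern = pid;
    }

    /// High-level access checks the pattern first, so stale slots are
    /// invisible once it is None (single-pattern slot layout assumed).
    fn get_group(&self, index: usize) -> Option<(usize, usize)> {
        if self.pattern.is_none() {
            return None;
        }
        let start = self.slots.get(2 * index).copied().flatten()?;
        let end = self.slots.get(2 * index + 1).copied().flatten()?;
        Some((start, end))
    }
}
```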
§Example
This example shows what happens when a Captures value is cleared.
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert!(caps.is_match());
let slots: Vec<Option<usize>> =
caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
Some(0),
Some(17),
Some(0),
Some(5),
Some(6),
Some(17),
]);
// Now clear the slots. Everything is gone and it is no longer a match.
caps.clear();
assert!(!caps.is_match());
let slots: Vec<Option<usize>> =
caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
assert_eq!(slots, vec![
None,
None,
None,
None,
None,
None,
]);
pub fn set_pattern(&mut self, pid: Option<PatternID>)
Set the pattern on this Captures value.
When the pattern ID is None, then this Captures value does not correspond to a match (is_match will return false). Otherwise, it corresponds to a match.
This is useful in search implementations where you might want to initially call set_pattern(None) in order to avoid the cost of calling clear() if it turns out to not be necessary.
§Example
This example shows that set_pattern merely overwrites the pattern ID. It does not actually change the underlying slot values.
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert!(caps.is_match());
assert!(caps.pattern().is_some());
let slots: Vec<Option<usize>> =
caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
Some(0),
Some(17),
Some(0),
Some(5),
Some(6),
Some(17),
]);
// Now set the pattern to None. Note that the slot values remain.
caps.set_pattern(None);
assert!(!caps.is_match());
assert!(!caps.pattern().is_some());
let slots: Vec<Option<usize>> =
caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
Some(0),
Some(17),
Some(0),
Some(5),
Some(6),
Some(17),
]);
pub fn slots(&self) -> &[Option<NonMaxUsize>]
Returns the underlying slots, where each slot stores a single offset.
Every matching capturing group generally corresponds to two slots: one slot for the starting position and another for the ending position. Typically, either both are present or neither is. (The weasel word “typically” is used here because it really depends on the regex engine implementation. Every sensible regex engine likely adheres to this invariant, and every regex engine in this crate is sensible.)
Generally speaking, callers should prefer to use higher level routines like Captures::get_match or Captures::get_group.
An important note here is that a regex engine may not reset all of the slots to None values when no match occurs, or even when a match of a different pattern occurs. But this depends on how the regex engine implementation deals with slots.
§Example
This example shows how to get the underlying slots from a regex match.
use regex_automata::{
nfa::thompson::pikevm::PikeVM,
util::primitives::{PatternID, NonMaxUsize},
};
let re = PikeVM::new_many(&[
r"[a-z]+",
r"[0-9]+",
])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "123", &mut caps);
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// Note that the only guarantee we have here is that slots 2 and 3
// are set to correct values. The contents of the first two slots are
// unspecified since the 0th pattern did not match.
let expected = &[
None,
None,
NonMaxUsize::new(0),
NonMaxUsize::new(3),
];
assert_eq!(expected, caps.slots());
pub fn slots_mut(&mut self) -> &mut [Option<NonMaxUsize>]
Returns the underlying slots as a mutable slice, where each slot stores a single offset.
This tends to be most useful for regex engine implementations for writing offsets for matching capturing groups to slots.
See Captures::slots for more information about slots.
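The sketch below shows the write protocol a custom engine might follow with mutable slots. It uses a hypothetical toy "engine" for the single pattern `[0-9]+`, working on a plain slice and an Option standing in for the pattern ID rather than the crate's types. First it resets the pattern cheaply. Then it writes the start and end offsets into slots 0 and 1, if those slots were allocated. Finally it records which pattern matched.

```rust
/// Hypothetical toy engine for the single pattern `[0-9]+`: finds the first
/// run of ASCII digits and reports it through a pattern ID plus slots.
fn search_digits(haystack: &str, pattern: &mut Option<u32>, slots: &mut [Option<usize>]) {
    // Cheap reset (like set_pattern(None)): stale slots are ignored
    // by high-level readers while no pattern is set.
    *pattern = None;
    let bytes = haystack.as_bytes();
    let start = match bytes.iter().position(|b| b.is_ascii_digit()) {
        Some(i) => i,
        None => return,
    };
    let len = bytes[start..].iter().take_while(|b| b.is_ascii_digit()).count();
    // Only write slots that exist (none would exist in an "empty" value).
    if slots.len() >= 2 {
        slots[0] = Some(start);
        slots[1] = Some(start + len);
    }
    // Recording the pattern last marks the slots as describing a match.
    *pattern = Some(0);
}
```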