Struct regex_automata::util::captures::GroupInfo
pub struct GroupInfo(/* private fields */);
Represents information about capturing groups in a compiled regex.
The information encapsulated by this type consists of the following. For each pattern:
- A map from every capture group name to its corresponding capture group index.
- A map from every capture group index to its corresponding capture group name.
- A map from capture group index to its corresponding slot index. A slot refers to one half of a capturing group. That is, a capture slot is either the start or end of a capturing group. A slot is usually the mechanism by which a regex engine records offsets for each capturing group during a search.
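The first two maps can be pictured with a small stand-alone model. This is only a sketch: `PatternGroups` and its fields are hypothetical names invented for illustration, and nothing here is claimed to match how `GroupInfo` stores its data internally.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-pattern maps described above. It only
/// mirrors the documented lookups, not the crate's actual representation.
struct PatternGroups {
    /// Group name -> group index.
    name_to_index: HashMap<String, usize>,
    /// Group index -> optional group name.
    index_to_name: Vec<Option<String>>,
}

impl PatternGroups {
    /// Builds both maps from the group names of one pattern, in index order.
    fn new(names: &[Option<&str>]) -> PatternGroups {
        let mut name_to_index = HashMap::new();
        let mut index_to_name = Vec::with_capacity(names.len());
        for (index, &name) in names.iter().enumerate() {
            if let Some(n) = name {
                name_to_index.insert(n.to_string(), index);
            }
            index_to_name.push(name.map(String::from));
        }
        PatternGroups { name_to_index, index_to_name }
    }

    /// Looks up a group index by name.
    fn to_index(&self, name: &str) -> Option<usize> {
        self.name_to_index.get(name).copied()
    }

    /// Looks up a group name by index. Unnamed or out-of-range groups
    /// both produce `None`.
    fn to_name(&self, index: usize) -> Option<&str> {
        self.index_to_name.get(index)?.as_deref()
    }
}

fn main() {
    // One pattern with an implicit group, an unnamed group and a named group.
    let pattern = PatternGroups::new(&[None, None, Some("foo")]);
    assert_eq!(Some(2), pattern.to_index("foo"));
    assert_eq!(Some("foo"), pattern.to_name(2));
    assert_eq!(None, pattern.to_name(0));
    assert_eq!(None, pattern.to_name(3));
}
```

Keeping one such pair of maps per pattern is what makes group indices and names relative to a pattern rather than global.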
A GroupInfo uses reference counting internally and is thus cheap to clone.
Mapping from capture groups to slots
One of the main responsibilities of a GroupInfo is to build a mapping from (PatternID, u32) (where the u32 is a capture index) to something called a “slot.” As mentioned above, a slot refers to one half of a capturing group. Together, the two slots of a group provide the start and end offsets of a capturing group that participated in a match.
The mapping between group indices and slots is an API guarantee. That is, the mapping won’t change within a semver compatible release.
Slots exist primarily because they are a convenient mechanism by which regex engines report group offsets at search time. For example, the nfa::thompson::State::Capture NFA state includes the slot index. When a regex engine transitions through this state, it will likely use the slot index to write the current haystack offset to some region of memory. When a match is found, those slots are then reported to the caller, typically via a convenient abstraction like a Captures value.
Because this crate provides first class support for multi-pattern regexes, and for some performance-related reasons, the mapping between capturing groups and slots is a little complex. However, in the case of a single pattern, the mapping can be described very simply: for every capture group index i, the corresponding slots are at i * 2 and i * 2 + 1.
Notice that the pattern ID isn’t involved at all here. Because this formula applies only to a single-pattern regex, the pattern ID is always 0.
In the multi-pattern case, the mapping is a bit more complicated. To talk about it, we must define what we mean by “implicit” vs “explicit” capturing groups:
- An implicit capturing group refers to the capturing group that is present for every pattern automatically, and corresponds to the overall match of a pattern. Every pattern has precisely one implicit capturing group. It is always unnamed and it always corresponds to the capture group index 0.
- An explicit capturing group refers to any capturing group that appears in the concrete syntax of the pattern. (Or, if an NFA was hand built without any concrete syntax, it refers to any capturing group with an index greater than 0.)
Some examples:
- \w+ has one implicit capturing group and zero explicit capturing groups.
- (\w+) has one implicit group and one explicit group.
- foo(\d+)(?:\pL+)(\d+) has one implicit group and two explicit groups.
Turning back to the slot mapping, we can now state it as follows:
- Given a pattern ID pid, the slots for its implicit group are always at pid * 2 and pid * 2 + 1.
- Given a pattern ID 0, the slots for its explicit groups start at group_info.pattern_len() * 2.
- Given a pattern ID pid > 0, the slots for its explicit groups start immediately following where the slots for the explicit groups of pid - 1 end.
In particular, while there is a concrete formula one can use to determine where the slots for the implicit group of any pattern are, there is no general formula for determining where the slots for explicit capturing groups are. This is because each pattern can contain a different number of groups.
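The three rules above can be replayed without the crate, given only the number of groups in each pattern. The following is a minimal sketch under that assumption. `slot_table` is a hypothetical helper written for illustration and is not part of regex-automata; real code should call GroupInfo::slot or GroupInfo::slots instead.

```rust
/// Computes the (start, end) slots for every group of every pattern, given
/// the number of groups (implicit group included) in each pattern.
/// `slot_table` is a hypothetical helper, not part of regex-automata.
fn slot_table(group_lens: &[usize]) -> Vec<Vec<(usize, usize)>> {
    // Explicit slots begin right after the implicit slots of all patterns.
    let mut next_explicit = group_lens.len() * 2;
    let mut table = Vec::with_capacity(group_lens.len());
    for (pid, &group_len) in group_lens.iter().enumerate() {
        let mut slots = Vec::with_capacity(group_len);
        for group_index in 0..group_len {
            if group_index == 0 {
                // Implicit group: position depends only on the pattern ID.
                slots.push((pid * 2, pid * 2 + 1));
            } else {
                // Explicit group: continues where the previous explicit
                // group (possibly from an earlier pattern) left off.
                slots.push((next_explicit, next_explicit + 1));
                next_explicit += 2;
            }
        }
        table.push(slots);
    }
    table
}

fn main() {
    // Four patterns with 2, 1, 5 and 3 groups respectively.
    let table = slot_table(&[2, 1, 5, 3]);
    assert_eq!((6, 7), table[3][0]);
    assert_eq!((8, 9), table[0][1]);
    assert_eq!((10, 11), table[2][1]);
    assert_eq!((20, 21), table[3][2]);
}
```

The running `next_explicit` counter is why no closed-form formula exists for explicit groups: the starting slot for one pattern depends on how many explicit groups every earlier pattern has.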
The intended way of getting the slots for a particular capturing group (whether implicit or explicit) is via the GroupInfo::slot or GroupInfo::slots method.
See below for a concrete example of how capturing groups get mapped to slots.
Example
This example shows how to build a new GroupInfo and query it for information.
use regex_automata::util::{captures::GroupInfo, primitives::PatternID};
let info = GroupInfo::new(vec![
vec![None, Some("foo")],
vec![None],
vec![None, None, None, Some("bar"), None],
vec![None, None, Some("foo")],
])?;
// The number of patterns being tracked.
assert_eq!(4, info.pattern_len());
// We can query the number of groups for any pattern.
assert_eq!(2, info.group_len(PatternID::must(0)));
assert_eq!(1, info.group_len(PatternID::must(1)));
assert_eq!(5, info.group_len(PatternID::must(2)));
assert_eq!(3, info.group_len(PatternID::must(3)));
// An invalid pattern always has zero groups.
assert_eq!(0, info.group_len(PatternID::must(999)));
// 2 slots per group
assert_eq!(22, info.slot_len());
// We can map a group index for a particular pattern to its name, if
// one exists.
assert_eq!(Some("foo"), info.to_name(PatternID::must(3), 2));
assert_eq!(None, info.to_name(PatternID::must(2), 4));
// Or map a name to its group index.
assert_eq!(Some(1), info.to_index(PatternID::must(0), "foo"));
assert_eq!(Some(2), info.to_index(PatternID::must(3), "foo"));
Example: mapping from capture groups to slots
This example shows the specific mapping from capture group indices for each pattern to their corresponding slots. The slot values shown in this example are considered an API guarantee.
use regex_automata::util::{captures::GroupInfo, primitives::PatternID};
let info = GroupInfo::new(vec![
vec![None, Some("foo")],
vec![None],
vec![None, None, None, Some("bar"), None],
vec![None, None, Some("foo")],
])?;
// We first show the slots for each pattern's implicit group.
assert_eq!(Some((0, 1)), info.slots(PatternID::must(0), 0));
assert_eq!(Some((2, 3)), info.slots(PatternID::must(1), 0));
assert_eq!(Some((4, 5)), info.slots(PatternID::must(2), 0));
assert_eq!(Some((6, 7)), info.slots(PatternID::must(3), 0));
// And now we show the slots for each pattern's explicit group.
assert_eq!(Some((8, 9)), info.slots(PatternID::must(0), 1));
assert_eq!(Some((10, 11)), info.slots(PatternID::must(2), 1));
assert_eq!(Some((12, 13)), info.slots(PatternID::must(2), 2));
assert_eq!(Some((14, 15)), info.slots(PatternID::must(2), 3));
assert_eq!(Some((16, 17)), info.slots(PatternID::must(2), 4));
assert_eq!(Some((18, 19)), info.slots(PatternID::must(3), 1));
assert_eq!(Some((20, 21)), info.slots(PatternID::must(3), 2));
// Asking for the slots for an invalid pattern ID or even for an invalid
// group index for a specific pattern will return None. So for example,
// you're guaranteed to not get the slots for a different pattern than the
// one requested.
assert_eq!(None, info.slots(PatternID::must(5), 0));
assert_eq!(None, info.slots(PatternID::must(1), 1));
Implementations
impl GroupInfo
pub fn new<P, G, N>(pattern_groups: P) -> Result<GroupInfo, GroupInfoError>
Creates a new group info from a sequence of patterns, where each pattern yields a sequence of possible group names. The index of each pattern in the sequence corresponds to its PatternID, and the index of each group in each pattern’s sequence corresponds to its group index.
While this constructor is very generic and therefore perhaps hard to chew on, an example of a valid concrete type that can be passed to this constructor is Vec<Vec<Option<String>>>. The outer Vec corresponds to the patterns, i.e., one Vec<Option<String>> per pattern. The inner Vec corresponds to the capturing groups for each pattern. The Option<String> corresponds to the name of the capturing group, if present.
It is legal to pass an empty iterator to this constructor. It will return an empty group info with zero slots. An empty group info is useful for cases where you have no patterns or for cases where slots aren’t being used at all (e.g., for most DFAs in this crate).
Errors
This constructor returns an error if the given capturing groups are invalid in some way. Those reasons include, but are not necessarily limited to:
- Too many patterns (i.e., PatternID would overflow).
- Too many capturing groups (e.g., u32 would overflow).
- A pattern is given that has no capturing groups. (All patterns must have at least an implicit capturing group at index 0.)
- The capturing group at index 0 has a name. It must be unnamed.
- There are duplicate capturing group names within the same pattern. (Multiple capturing groups with the same name may exist, but they must be in different patterns.)
An example below shows how to trigger some of the above error conditions.
Example
This example shows how to build a new GroupInfo and query it for information.
use regex_automata::util::captures::GroupInfo;
let info = GroupInfo::new(vec![
vec![None, Some("foo")],
vec![None],
vec![None, None, None, Some("bar"), None],
vec![None, None, Some("foo")],
])?;
// The number of patterns being tracked.
assert_eq!(4, info.pattern_len());
// 2 slots per group
assert_eq!(22, info.slot_len());
Example: empty GroupInfo
This example shows how to build an empty GroupInfo and confirm that it tracks no patterns and no slots.
use regex_automata::util::captures::GroupInfo;
let info = GroupInfo::empty();
// Everything is zero.
assert_eq!(0, info.pattern_len());
assert_eq!(0, info.slot_len());
Example: error conditions
This example shows how to provoke some of the ways in which building a GroupInfo can fail.
use regex_automata::util::captures::GroupInfo;
// Either the group info is empty, or all patterns must have at least
// one capturing group.
assert!(GroupInfo::new(vec![
vec![None, Some("a")], // ok
vec![None], // ok
vec![], // not ok
]).is_err());
// Note that building an empty group info is OK.
assert!(GroupInfo::new(Vec::<Vec<Option<String>>>::new()).is_ok());
// The first group in each pattern must correspond to an implicit
// anonymous group. i.e., One that is not named. By convention, this
// group corresponds to the overall match of a regex. Every other group
// in a pattern is explicit and optional.
assert!(GroupInfo::new(vec![vec![Some("foo")]]).is_err());
// There must not be duplicate group names within the same pattern.
assert!(GroupInfo::new(vec![
vec![None, Some("foo"), Some("foo")],
]).is_err());
// But duplicate names across distinct patterns is OK.
assert!(GroupInfo::new(vec![
vec![None, Some("foo")],
vec![None, Some("foo")],
]).is_ok());
There are other ways for building a GroupInfo to fail, but they are difficult to show. One example is when the number of patterns given would overflow PatternID.
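The structural rules exercised above (every pattern has a group at index 0, that group is unnamed, and names are unique within a pattern) can also be expressed as a small stand-alone checker. This is a sketch only: `check_groups` and its String error type are hypothetical, whereas the real constructor reports failures as GroupInfoError and additionally guards against overflow, which is ignored here.

```rust
use std::collections::HashSet;

/// Hypothetical checker for three of the documented rules. It is a model
/// for illustration and does not reproduce the crate's error type.
fn check_groups(patterns: &[Vec<Option<&str>>]) -> Result<(), String> {
    for (pid, groups) in patterns.iter().enumerate() {
        // Every pattern needs at least its implicit group at index 0.
        let first = match groups.first() {
            Some(first) => first,
            None => return Err(format!("pattern {} has no groups", pid)),
        };
        // The implicit group must be unnamed.
        if first.is_some() {
            return Err(format!("pattern {}: group 0 is named", pid));
        }
        // Names must be unique within a single pattern. The set is rebuilt
        // for each pattern, so reuse across patterns is allowed.
        let mut seen = HashSet::new();
        for name in groups.iter().flatten() {
            if !seen.insert(*name) {
                return Err(format!("pattern {}: duplicate name {}", pid, name));
            }
        }
    }
    Ok(())
}

fn main() {
    // A pattern with no groups at all is rejected.
    assert!(check_groups(&[vec![None, Some("a")], vec![None], vec![]]).is_err());
    // No patterns at all is fine.
    assert!(check_groups(&[]).is_ok());
    // A named group at index 0 is rejected.
    assert!(check_groups(&[vec![Some("foo")]]).is_err());
    // Duplicate names are rejected within a pattern but allowed across patterns.
    assert!(check_groups(&[vec![None, Some("foo"), Some("foo")]]).is_err());
    assert!(check_groups(&[vec![None, Some("foo")], vec![None, Some("foo")]]).is_ok());
}
```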
pub fn empty() -> GroupInfo
This creates an empty GroupInfo.
This is a convenience routine for calling GroupInfo::new with an iterator that yields no elements.
Example
This example shows how to build a new empty GroupInfo and query it for information.
use regex_automata::util::captures::GroupInfo;
let info = GroupInfo::empty();
// Everything is zero.
assert_eq!(0, info.pattern_len());
assert_eq!(0, info.all_group_len());
assert_eq!(0, info.slot_len());
pub fn to_index(&self, pid: PatternID, name: &str) -> Option<usize>
Return the capture group index corresponding to the given name in the given pattern. If no such capture group name exists in the given pattern, then this returns None.
If the given pattern ID is invalid, then this returns None.
This also returns None for all inputs if these captures are empty (e.g., built from an empty GroupInfo). To check whether captures are present for a specific pattern, use GroupInfo::group_len.
Example
This example shows how to find the capture index for the given pattern and group name.
Remember that capture indices are relative to the pattern, such that the same capture index value may refer to different capturing groups for distinct patterns.
use regex_automata::{nfa::thompson::NFA, PatternID};
let (pid0, pid1) = (PatternID::must(0), PatternID::must(1));
let nfa = NFA::new_many(&[
r"a(?P<quux>\w+)z(?P<foo>\s+)",
r"a(?P<foo>\d+)z",
])?;
let groups = nfa.group_info();
assert_eq!(Some(2), groups.to_index(pid0, "foo"));
// Recall that capture index 0 is always unnamed and refers to the
// entire pattern. So the first capturing group present in the pattern
// itself always starts at index 1.
assert_eq!(Some(1), groups.to_index(pid1, "foo"));
// And if a name does not exist for a particular pattern, None is
// returned.
assert!(groups.to_index(pid0, "quux").is_some());
assert!(groups.to_index(pid1, "quux").is_none());
pub fn to_name(&self, pid: PatternID, group_index: usize) -> Option<&str>
Return the capture name for the given index and given pattern. If the corresponding group does not have a name, then this returns None.
If the pattern ID is invalid, then this returns None.
If the group index is invalid for the given pattern, then this returns None. A group index is valid for a pattern pid if and only if index < group_info.group_len(pid).
This also returns None for all inputs if these captures are empty (e.g., built from an empty GroupInfo). To check whether captures are present for a specific pattern, use GroupInfo::group_len.
Example
This example shows how to find the capture group name for the given pattern and group index.
use regex_automata::{nfa::thompson::NFA, PatternID};
let (pid0, pid1) = (PatternID::must(0), PatternID::must(1));
let nfa = NFA::new_many(&[
r"a(?P<foo>\w+)z(\s+)x(\d+)",
r"a(\d+)z(?P<foo>\s+)",
])?;
let groups = nfa.group_info();
assert_eq!(None, groups.to_name(pid0, 0));
assert_eq!(Some("foo"), groups.to_name(pid0, 1));
assert_eq!(None, groups.to_name(pid0, 2));
assert_eq!(None, groups.to_name(pid0, 3));
assert_eq!(None, groups.to_name(pid1, 0));
assert_eq!(None, groups.to_name(pid1, 1));
assert_eq!(Some("foo"), groups.to_name(pid1, 2));
// '3' is not a valid capture index for the second pattern.
assert_eq!(None, groups.to_name(pid1, 3));
pub fn pattern_names(&self, pid: PatternID) -> GroupInfoPatternNames<'_>
Return an iterator of all capture groups and their names (if present) for a particular pattern.
If the given pattern ID is invalid or if this GroupInfo is empty, then the iterator yields no elements.
The number of elements yielded by this iterator is always equal to the result of calling GroupInfo::group_len with the same PatternID.
Example
This example shows how to get a list of all capture group names for a particular pattern.
use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new(r"(a)(?P<foo>b)(c)(d)(?P<bar>e)")?;
// The first is the implicit group that is always unnamed. The next
// 5 groups are the explicit groups found in the concrete syntax above.
let expected = vec![None, None, Some("foo"), None, None, Some("bar")];
let got: Vec<Option<&str>> =
nfa.group_info().pattern_names(PatternID::ZERO).collect();
assert_eq!(expected, got);
// Using an invalid pattern ID will result in nothing yielded.
let got = nfa.group_info().pattern_names(PatternID::must(999)).count();
assert_eq!(0, got);
pub fn all_names(&self) -> GroupInfoAllNames<'_>
Return an iterator of all capture groups for all patterns supported by this GroupInfo. Each item yielded is a triple of the group’s pattern ID, its index in the pattern and the group’s name, if present.
Example
This example shows how to get a list of all capture groups found in one NFA, potentially spanning multiple patterns.
use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new_many(&[
r"(?P<foo>a)",
r"a",
r"(a)",
])?;
let expected = vec![
(PatternID::must(0), 0, None),
(PatternID::must(0), 1, Some("foo")),
(PatternID::must(1), 0, None),
(PatternID::must(2), 0, None),
(PatternID::must(2), 1, None),
];
let got: Vec<(PatternID, usize, Option<&str>)> =
nfa.group_info().all_names().collect();
assert_eq!(expected, got);
Unlike other capturing group related routines, this routine doesn’t panic even if captures aren’t enabled on this NFA:
use regex_automata::nfa::thompson::{NFA, WhichCaptures};
let nfa = NFA::compiler()
.configure(NFA::config().which_captures(WhichCaptures::None))
.build_many(&[
r"(?P<foo>a)",
r"a",
r"(a)",
])?;
// When captures aren't enabled, there's nothing to return.
assert_eq!(0, nfa.group_info().all_names().count());
pub fn slots( &self, pid: PatternID, group_index: usize ) -> Option<(usize, usize)>
Returns the starting and ending slot corresponding to the given capturing group for the given pattern. The ending slot is always one more than the starting slot returned.
Note that this is like GroupInfo::slot, except that it also returns the ending slot value for convenience.
If either the pattern ID or the capture index is invalid, then this returns None.
Example
This example shows that the starting slots for the first capturing group of each pattern are distinct.
use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new_many(&["a", "b"])?;
assert_ne!(
nfa.group_info().slots(PatternID::must(0), 0),
nfa.group_info().slots(PatternID::must(1), 0),
);
// Also, the start and end slot values are never equivalent.
let (start, end) = nfa.group_info().slots(PatternID::ZERO, 0).unwrap();
assert_ne!(start, end);
pub fn slot(&self, pid: PatternID, group_index: usize) -> Option<usize>
Returns the starting slot corresponding to the given capturing group for the given pattern. The ending slot is always one more than the value returned.
If either the pattern ID or the capture index is invalid, then this returns None.
Example
This example shows that the starting slots for the first capturing group of each pattern are distinct.
use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new_many(&["a", "b"])?;
assert_ne!(
nfa.group_info().slot(PatternID::must(0), 0),
nfa.group_info().slot(PatternID::must(1), 0),
);
pub fn pattern_len(&self) -> usize
Returns the total number of patterns in this GroupInfo.
This may return zero if the GroupInfo was constructed with no patterns.
This is guaranteed to be no bigger than PatternID::LIMIT because GroupInfo construction will fail if too many patterns are added.
Example
use regex_automata::nfa::thompson::NFA;
let nfa = NFA::new_many(&["[0-9]+", "[a-z]+", "[A-Z]+"])?;
assert_eq!(3, nfa.group_info().pattern_len());
let nfa = NFA::never_match();
assert_eq!(0, nfa.group_info().pattern_len());
let nfa = NFA::always_match();
assert_eq!(1, nfa.group_info().pattern_len());
pub fn group_len(&self, pid: PatternID) -> usize
Return the number of capture groups in a pattern.
If the pattern ID is invalid, then this returns 0.
Example
This example shows how the values returned by this routine may vary for different patterns and NFA configurations.
use regex_automata::{nfa::thompson::{NFA, WhichCaptures}, PatternID};
let nfa = NFA::new(r"(a)(b)(c)")?;
// There are 3 explicit groups in the pattern's concrete syntax and
// 1 unnamed and implicit group spanning the entire pattern.
assert_eq!(4, nfa.group_info().group_len(PatternID::ZERO));
let nfa = NFA::new(r"abc")?;
// There is just the unnamed implicit group.
assert_eq!(1, nfa.group_info().group_len(PatternID::ZERO));
let nfa = NFA::compiler()
.configure(NFA::config().which_captures(WhichCaptures::None))
.build(r"abc")?;
// We disabled capturing groups, so there are none.
assert_eq!(0, nfa.group_info().group_len(PatternID::ZERO));
let nfa = NFA::compiler()
.configure(NFA::config().which_captures(WhichCaptures::None))
.build(r"(a)(b)(c)")?;
// We disabled capturing groups, so there are none, even if there are
// explicit groups in the concrete syntax.
assert_eq!(0, nfa.group_info().group_len(PatternID::ZERO));
pub fn all_group_len(&self) -> usize
Return the total number of capture groups across all patterns.
This includes implicit groups that represent the entire match of a pattern.
Example
This example shows how the values returned by this routine may vary for different patterns and NFA configurations.
use regex_automata::{nfa::thompson::{NFA, WhichCaptures}, PatternID};
let nfa = NFA::new(r"(a)(b)(c)")?;
// There are 3 explicit groups in the pattern's concrete syntax and
// 1 unnamed and implicit group spanning the entire pattern.
assert_eq!(4, nfa.group_info().all_group_len());
let nfa = NFA::new(r"abc")?;
// There is just the unnamed implicit group.
assert_eq!(1, nfa.group_info().all_group_len());
let nfa = NFA::new_many(&["(a)", "b", "(c)"])?;
// Each pattern has one implicit group, and two
// patterns have one explicit group each.
assert_eq!(5, nfa.group_info().all_group_len());
let nfa = NFA::compiler()
.configure(NFA::config().which_captures(WhichCaptures::None))
.build(r"abc")?;
// We disabled capturing groups, so there are none.
assert_eq!(0, nfa.group_info().all_group_len());
let nfa = NFA::compiler()
.configure(NFA::config().which_captures(WhichCaptures::None))
.build(r"(a)(b)(c)")?;
// We disabled capturing groups, so there are none, even if there are
// explicit groups in the concrete syntax.
assert_eq!(0, nfa.group_info().all_group_len());
pub fn slot_len(&self) -> usize
Returns the total number of slots in this GroupInfo across all patterns.
The total number of slots is always twice the total number of capturing groups, including both implicit and explicit groups.
Example
This example shows the relationship between the number of capturing groups and slots.
use regex_automata::util::captures::GroupInfo;
// There are 11 total groups here.
let info = GroupInfo::new(vec![
vec![None, Some("foo")],
vec![None],
vec![None, None, None, Some("bar"), None],
vec![None, None, Some("foo")],
])?;
// 2 slots per group gives us 11*2=22 slots.
assert_eq!(22, info.slot_len());
pub fn implicit_slot_len(&self) -> usize
Returns the total number of slots for implicit capturing groups.
This is like GroupInfo::slot_len, except it doesn’t include the explicit slots for each pattern. Since there are always exactly 2 implicit slots for each pattern, the number of implicit slots is always equal to twice the number of patterns.
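The arithmetic relating the three slot counts can be sketched from the per-pattern group counts alone. `slot_counts` below is a hypothetical helper for illustration, not a crate API, and it assumes every pattern has at least its implicit group, as GroupInfo::new requires.

```rust
/// Returns (total, implicit, explicit) slot counts for patterns with the
/// given numbers of groups. `slot_counts` is a hypothetical helper and
/// assumes each entry is at least 1 (the implicit group).
fn slot_counts(group_lens: &[usize]) -> (usize, usize, usize) {
    let total_groups: usize = group_lens.iter().sum();
    // Two slots (start and end) per group.
    let total = total_groups * 2;
    // Exactly one implicit group, and hence two implicit slots, per pattern.
    let implicit = group_lens.len() * 2;
    (total, implicit, total - implicit)
}

fn main() {
    // One pattern with an implicit group and two explicit groups.
    assert_eq!((6, 2, 4), slot_counts(&[3]));
    // Four patterns with 2, 1, 5 and 3 groups: 11 groups in total.
    assert_eq!((22, 8, 14), slot_counts(&[2, 1, 5, 3]));
}
```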
Example
This example shows the relationship between the number of capturing groups, implicit slots and explicit slots.
use regex_automata::util::captures::GroupInfo;
// There are 3 total groups here.
let info = GroupInfo::new(vec![vec![None, Some("foo"), Some("bar")]])?;
// 2 slots per group gives us 3*2=6 slots.
assert_eq!(6, info.slot_len());
// 2 implicit slots per pattern gives us 2 implicit slots since there
// is 1 pattern.
assert_eq!(2, info.implicit_slot_len());
// 2 explicit capturing groups gives us 2*2=4 explicit slots.
assert_eq!(4, info.explicit_slot_len());
pub fn explicit_slot_len(&self) -> usize
Returns the total number of slots for explicit capturing groups.
This is like GroupInfo::slot_len, except it doesn’t include the implicit slots for each pattern. (There are always 2 implicit slots for each pattern.)
For a non-empty GroupInfo, it is always the case that slot_len is strictly greater than explicit_slot_len. For an empty GroupInfo, both the total number of slots and the number of explicit slots are 0.
Example
This example shows the relationship between the number of capturing groups, implicit slots and explicit slots.
use regex_automata::util::captures::GroupInfo;
// There are 3 total groups here.
let info = GroupInfo::new(vec![vec![None, Some("foo"), Some("bar")]])?;
// 2 slots per group gives us 3*2=6 slots.
assert_eq!(6, info.slot_len());
// 2 implicit slots per pattern gives us 2 implicit slots since there
// is 1 pattern.
assert_eq!(2, info.implicit_slot_len());
// 2 explicit capturing groups gives us 2*2=4 explicit slots.
assert_eq!(4, info.explicit_slot_len());
pub fn memory_usage(&self) -> usize
Returns the memory usage, in bytes, of this GroupInfo.
This does not include the stack size used up by this GroupInfo. To compute that, use std::mem::size_of::<GroupInfo>().