pub struct Match<'h> { /* private fields */ }
Represents a single match of a regex in a haystack.
A Match contains both the start and end byte offsets of the match and the actual substring corresponding to the range of those byte offsets. It is guaranteed that start <= end. When start == end, the match is empty.
Since this Match can only be produced by the top-level Regex APIs that only support searching UTF-8 encoded strings, the byte offsets for a Match are guaranteed to fall on valid UTF-8 codepoint boundaries. That is, slicing a &str with Match::range is guaranteed to never panic.
Values with this type are created by Regex::find or Regex::find_iter. Other APIs can create Match values too, for example Captures::get.
The lifetime parameter 'h refers to the lifetime of the haystack that this match was produced from.
Numbering
The byte offsets in a Match form a half-open interval. That is, the start of the range is inclusive and the end of the range is exclusive. For example, given a haystack abcFOOxyz and a match of FOO, its byte offset range starts at 3 and ends at 6. 3 corresponds to F and 6 corresponds to x, which is one past the end of the match. This corresponds to the same kind of slicing that Rust uses.
For more on why this was chosen over other schemes (aside from being consistent with how Rust the language works), see this discussion and Dijkstra’s note on a related topic.
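One practical consequence of the half-open scheme can be sketched with plain string slicing and no regex involved: the text before, inside, and after a range tiles the haystack with no overlap and no off-by-one adjustment. The function name split_around is hypothetical.

```rust
/// Splits `hay` into the text before, inside, and after the half-open
/// byte range `start..end`. (`split_around` is a hypothetical helper.)
///
/// Panics if an offset is out of bounds or not on a UTF-8 boundary,
/// as any `&str` slice would.
fn split_around(hay: &str, start: usize, end: usize) -> (&str, &str, &str) {
    // `end` is exclusive, so it is also the inclusive start of the remainder.
    (&hay[..start], &hay[start..end], &hay[end..])
}
```

With the haystack abcFOOxyz and the range 3..6, this yields abc, FOO, and xyz, and the length of the middle piece is simply end - start.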
Example
This example shows the value of each of the methods on Match for a particular search.
use regex::Regex;
let re = Regex::new(r"\p{Greek}+").unwrap();
let hay = "Greek: αβγδ";
let m = re.find(hay).unwrap();
assert_eq!(7, m.start());
assert_eq!(15, m.end());
assert!(!m.is_empty());
assert_eq!(8, m.len());
assert_eq!(7..15, m.range());
assert_eq!("αβγδ", m.as_str());
Implementations
impl<'h> Match<'h>
pub fn start(&self) -> usize
Returns the byte offset of the start of the match in the haystack. The start of the match corresponds to the position where the match begins and includes the first byte in the match.
It is guaranteed that Match::start() <= Match::end().
This is guaranteed to fall on a valid UTF-8 codepoint boundary. That is, it will never be an offset that appears between the UTF-8 code units of a UTF-8 encoded Unicode scalar value. Consequently, it is always safe to slice the corresponding haystack using this offset.
pub fn end(&self) -> usize
Returns the byte offset of the end of the match in the haystack. The end of the match corresponds to the byte immediately following the last byte in the match. This means that &slice[start..end] works as one would expect.
It is guaranteed that Match::start() <= Match::end().
This is guaranteed to fall on a valid UTF-8 codepoint boundary. That is, it will never be an offset that appears between the UTF-8 code units of a UTF-8 encoded Unicode scalar value. Consequently, it is always safe to slice the corresponding haystack using this offset.
pub fn is_empty(&self) -> bool
Returns true if and only if this match has a length of zero.
Note that an empty match can only occur when the regex itself can match the empty string. Here are some examples of regexes that can all match the empty string: ^, ^$, \b, a?, a*, a{0}, (foo|\d+|quux)?.
pub fn range(&self) -> Range<usize>
Returns the range over the starting and ending byte offsets of the match in the haystack.
It is always correct to slice the original haystack searched with this range. That is, because the offsets are guaranteed to fall on valid UTF-8 boundaries, the range returned is always valid.