Struct regex_automata::meta::Builder
pub struct Builder { /* private fields */ }
A builder for configuring and constructing a Regex.
The builder permits configuring two different aspects of a Regex:

- Builder::configure will set high-level configuration options as described by a Config.
- Builder::syntax will set the syntax-level configuration options as described by a util::syntax::Config. This only applies when building a Regex from pattern strings.
Once configured, the builder can then be used to construct a Regex from one of 4 different inputs:

- Builder::build creates a regex from a single pattern string.
- Builder::build_many creates a regex from many pattern strings.
- Builder::build_from_hir creates a regex from a regex-syntax::Hir expression.
- Builder::build_many_from_hir creates a regex from many regex-syntax::Hir expressions.
The latter two methods in particular provide a way to construct a fully featured regular expression matcher directly from an Hir expression without having to first convert it to a string. (This is in contrast to the top-level regex crate, which intentionally provides no such API in order to avoid making regex-syntax a public dependency.)
As a convenience, this builder may be created via Regex::builder, which may help avoid an extra import.
§Example: change the line terminator
This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:
```rust
use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().multi_line(true))
    .configure(Regex::config().line_terminator(b'\x00'))
    .build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())
```
§Example: disable UTF-8 requirement
By default, regex patterns are required to match UTF-8. This includes regex patterns that can produce matches of length zero. In the case of an empty match, by default, matches will not appear between the code units of a UTF-8 encoded codepoint.
However, it can be useful to disable this requirement, particularly if you’re searching things like &[u8] that are not known to be valid UTF-8.
```rust
use regex_automata::{meta::Regex, util::syntax, Match};

let mut builder = Regex::builder();
// Disables the requirement that non-empty matches match UTF-8.
builder.syntax(syntax::Config::new().utf8(false));
// Disables the requirement that empty matches match UTF-8 boundaries.
builder.configure(Regex::config().utf8_empty(false));

// We can match raw bytes via \xZZ syntax, but we need to disable
// Unicode mode to do that. We could disable it everywhere, or just
// selectively, as shown here.
let re = builder.build(r"(?-u:\xFF)foo(?-u:\xFF)")?;
let hay = b"\xFFfoo\xFF";
assert_eq!(Some(Match::must(0, 0..5)), re.find(hay));

// We can also match between code units.
let re = builder.build(r"")?;
let hay = "☃";
assert_eq!(re.find_iter(hay).collect::<Vec<Match>>(), vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())
```
Implementations§
impl Builder
pub fn build(&self, pattern: &str) -> Result<Regex, BuildError>
Builds a Regex from a single pattern string.
If there was a problem parsing the pattern or a problem turning it into a regex matcher, then an error is returned.
§Example
This example shows how to configure syntax options.
```rust
use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().crlf(true).multi_line(true))
    .build(r"^foo$")?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())
```
pub fn build_many<P: AsRef<str>>(
    &self,
    patterns: &[P],
) -> Result<Regex, BuildError>
Builds a Regex from many pattern strings.
If there was a problem parsing any of the patterns or a problem turning them into a regex matcher, then an error is returned.
§Example: finding the pattern that caused an error
When a syntax error occurs, it is possible to ask which pattern caused the syntax error.
```rust
use regex_automata::{meta::Regex, PatternID};

let err = Regex::builder()
    .build_many(&["a", "b", r"\p{Foo}", "c"])
    .unwrap_err();
assert_eq!(Some(PatternID::must(2)), err.pattern());
```
§Example: zero patterns is valid
Building a regex with zero patterns results in a regex that never matches anything. Because this routine is generic, passing an empty slice usually requires a turbo-fish (or something else to help type inference).
```rust
use regex_automata::meta::Regex;

let re = Regex::builder()
    .build_many::<&str>(&[])?;
assert_eq!(None, re.find(""));

Ok::<(), Box<dyn std::error::Error>>(())
```
pub fn build_from_hir(&self, hir: &Hir) -> Result<Regex, BuildError>
Builds a Regex directly from an Hir expression.
This is useful if you need to parse a pattern string into an Hir for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expression instead of first converting the Hir back to a pattern string.
When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.
If there was a problem building the underlying regex matcher for the given Hir, then an error is returned.
§Example
This example shows how one can hand-construct an Hir expression and build a regex from it without doing any parsing at all.
```rust
use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_from_hir(&hir)?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())
```
pub fn build_many_from_hir<H: Borrow<Hir>>(
    &self,
    hirs: &[H],
) -> Result<Regex, BuildError>
Builds a Regex directly from many Hir expressions.
This is useful if you need to parse pattern strings into Hir expressions for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expressions instead of first converting the Hir expressions back to pattern strings.
When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.
If there was a problem building the underlying regex matcher for the given Hir expressions, then an error is returned.
Note that unlike Builder::build_many, this can only fail as a result of building the underlying matcher. In that case, there is no single Hir expression that can be isolated as a reason for the failure. So if this routine fails, it’s not possible to determine which Hir expression caused the failure.
§Example
This example shows how one can hand-construct multiple Hir expressions and build a single regex from them without doing any parsing at all.
```rust
use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir1 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
// (?Rm)^bar$
let hir2 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("bar".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_many_from_hir(&[&hir1, &hir2])?;
let hay = "\r\nfoo\r\nbar";
let got: Vec<Match> = re.find_iter(hay).collect();
let expected = vec![
    Match::must(0, 2..5),
    Match::must(1, 7..10),
];
assert_eq!(expected, got);

Ok::<(), Box<dyn std::error::Error>>(())
```
pub fn configure(&mut self, config: Config) -> &mut Builder
Configure the behavior of a Regex.
This configuration controls non-syntax options related to the behavior of a Regex. This includes things like whether empty matches can split a codepoint, prefilters, line terminators and a long list of options for configuring which regex engines the meta regex engine will be able to use internally.
§Example
This example shows how to disable UTF-8 empty mode. This permits empty matches to occur between the code units of a codepoint's UTF-8 encoding.
```rust
use regex_automata::{meta::Regex, Match};

let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches only occur at the beginning and end of the snowman.
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 3..3),
]);

let re = Regex::builder()
    .configure(Regex::config().utf8_empty(false))
    .build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches now occur at every position!
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())
```
pub fn syntax(&mut self, config: Config) -> &mut Builder
Configure the syntax options when parsing a pattern string while building a Regex.
These options only apply when Builder::build or Builder::build_many are used. The other build methods accept Hir values, which have already been parsed.
§Example
This example shows how to enable case insensitive mode.
```rust
use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().case_insensitive(true))
    .build(r"δ")?;
assert_eq!(Some(Match::must(0, 0..2)), re.find(r"Δ"));

Ok::<(), Box<dyn std::error::Error>>(())
```