<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="This crate exposes a variety of regex engines used by the `regex` crate. It provides a vast, sprawling and “expert” level API to each regex engine. The regex engines provided by this crate focus heavily on finite automata implementations and specifically guarantee worst case `O(m * n)` time complexity for all searches. (Where `m ~ len(regex)` and `n ~ len(haystack)`.)"><title>regex_automata - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../static.files/SourceSerif4-Regular-46f98efaafac5295.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../static.files/FiraSans-Regular-018c141bf0843ffd.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../static.files/FiraSans-Medium-8f9a781e4970d388.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../static.files/SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../static.files/SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2"><link rel="stylesheet" href="../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../static.files/rustdoc-ac92e1bbe349e143.css"><meta name="rustdoc-vars" data-root-path="../" data-static-root-path="../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.76.0 (07dca489a 2024-02-04)" data-channel="1.76.0" data-search-js="search-2b6ce74ff89ae146.js" data-settings-js="settings-4313503d2e1961c2.js"><script src="../static.files/storage-f2adc0d6ca4d09fb.js"></script><script defer src="../crates.js"></script><script defer src="../static.files/main-305769736d49e732.js"></script><noscript><link rel="stylesheet" href="../static.files/noscript-feafe1bb7466e4bd.css"></noscript><link rel="alternate icon" type="image/png" href="../static.files/favicon-16x16-8b506e7a72182f1c.png"><link rel="alternate icon" type="image/png" href="../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../static.files/favicon-2c020d218678b618.svg"></head>
<body class="rustdoc mod crate"><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../regex_automata/index.html">regex_automata</a><span class="version">0.4.6</span></h2></div><div class="sidebar-elems"><ul class="block"></ul></div></nav>
<main><div class="width-limiter"><section id="main-content" class="content"><div class="main-heading"><h1>Crate <a class="mod" href="#">regex_automata</a></h1></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>This crate exposes a variety of regex engines used by the <code>regex</code> crate.
It provides a vast, sprawling and “expert” level API to each regex engine.
The regex engines provided by this crate focus heavily on finite automata
implementations and specifically guarantee worst case <code>O(m * n)</code> time
complexity for all searches. (Where <code>m ~ len(regex)</code> and <code>n ~ len(haystack)</code>.)</p>
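<p>To see why finite automata can offer this guarantee, consider simulating an NFA by tracking the <em>set</em> of states it could be in after each byte. The sketch below is a toy written only for illustration; it is not this crate's implementation, and the <code>State</code> type and the <code>add_state</code>, <code>is_match</code> and <code>ab_star_c</code> functions are hypothetical names. Because each haystack byte visits each of the <code>m</code> NFA states at most once, an unanchored search does <code>O(m * n)</code> work in the worst case.</p>

```rust
/// A toy Thompson NFA state. Hypothetical: not regex_automata's own type.
#[derive(Clone, Copy, Debug)]
enum State {
    /// Consume `byte` and move to state `next`.
    Byte { byte: u8, next: usize },
    /// Move to both `alt1` and `alt2` without consuming input.
    Split { alt1: usize, alt2: usize },
    /// A match has been found.
    Match,
}

/// Adds `id` and every state reachable from it through `Split` states to
/// `set`. `seen` ensures each state is added at most once per position.
fn add_state(nfa: &[State], set: &mut Vec<usize>, seen: &mut [bool], id: usize) {
    let mut stack = vec![id];
    while let Some(id) = stack.pop() {
        if seen[id] {
            continue;
        }
        seen[id] = true;
        match nfa[id] {
            State::Split { alt1, alt2 } => {
                stack.push(alt2);
                stack.push(alt1);
            }
            _ => set.push(id),
        }
    }
}

/// Reports whether `nfa` matches anywhere in `haystack`.
/// Each byte does O(m) work, so the whole search is O(m * n).
fn is_match(nfa: &[State], start: usize, haystack: &[u8]) -> bool {
    let is_final = |set: &Vec<usize>| set.iter().any(|&id| matches!(nfa[id], State::Match));
    let mut current = Vec::new();
    add_state(nfa, &mut current, &mut vec![false; nfa.len()], start);
    for &b in haystack {
        if is_final(&current) {
            return true;
        }
        let mut next = Vec::new();
        let mut seen = vec![false; nfa.len()];
        for &id in &current {
            if let State::Byte { byte, next: to } = nfa[id] {
                if byte == b {
                    add_state(nfa, &mut next, &mut seen, to);
                }
            }
        }
        // Unanchored search: a new match attempt may begin at every offset.
        add_state(nfa, &mut next, &mut seen, start);
        current = next;
    }
    is_final(&current)
}

/// Hand-built NFA for the regex `ab*c`, starting at state 0.
fn ab_star_c() -> Vec<State> {
    vec![
        State::Byte { byte: b'a', next: 1 },
        State::Split { alt1: 2, alt2: 3 },
        State::Byte { byte: b'b', next: 1 },
        State::Byte { byte: b'c', next: 4 },
        State::Match,
    ]
}

fn main() {
    let nfa = ab_star_c();
    println!("{}", is_match(&nfa, 0, b"xxabbbcyy")); // true
    println!("{}", is_match(&nfa, 0, b"abx")); // false
}
```

<p>The crate's own PikeVM is built on the same set-of-states idea, but also tracks capture group offsets and match priority.</p>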
<p>The primary goal of this crate is to serve as an implementation detail for the
<code>regex</code> crate. A secondary goal is to make its internals available for use by
others.</p>
<h2 id="table-of-contents"><a href="#table-of-contents">Table of contents</a></h2>
<ul>
<li><a href="#should-i-be-using-this-crate">Should I be using this crate?</a> gives some
reasons for and against using this crate.</li>
<li><a href="#examples">Examples</a> provides a small selection of things you can do with
this crate.</li>
<li><a href="#available-regex-engines">Available regex engines</a> provides a hyperlinked
list of all regex engines in this crate.</li>
<li><a href="#api-themes">API themes</a> discusses common elements used throughout this
crate.</li>
<li><a href="#crate-features">Crate features</a> documents the extensive list of Cargo
features available.</li>
</ul>
<h2 id="should-i-be-using-this-crate"><a href="#should-i-be-using-this-crate">Should I be using this crate?</a></h2>
<p>If you find yourself here because you just want to use regexes, then you should
first check out whether the <a href="https://docs.rs/regex"><code>regex</code> crate</a> meets
your needs. It provides a streamlined and difficult-to-misuse API for regex
searching.</p>
<p>If you’re here because there is something specific you want to do that can’t
be easily done with the <code>regex</code> crate, then you are perhaps in the right place.
Most likely, your first stop should be the
<a href="meta/index.html" title="mod regex_automata::meta"><code>meta</code> regex APIs</a>. Namely, the <code>regex</code> crate is just a light wrapper
over a <a href="meta/struct.Regex.html" title="struct regex_automata::meta::Regex"><code>meta::Regex</code></a>, so its API will probably be the easiest to transition
to. In contrast to the <code>regex</code> crate, the <code>meta::Regex</code> API supports more
search parameters and multi-pattern searches. However, it isn’t quite as
ergonomic.</p>
<p>Otherwise, the following is a non-exhaustive list of reasons to use this crate:</p>
<ul>
<li>You want to analyze or use a <a href="nfa/thompson/struct.NFA.html" title="struct regex_automata::nfa::thompson::NFA">Thompson <code>NFA</code></a> directly.</li>
<li>You want more powerful multi-pattern search than what is provided by
<code>RegexSet</code> in the <code>regex</code> crate. All regex engines in this crate support
multi-pattern searches.</li>
<li>You want to use one of the <code>regex</code> crate’s internal engines directly because
of some interesting configuration that isn’t possible via the <code>regex</code> crate.</li>
<li>You want to manually drive a search. For example, both the <a href="hybrid/index.html" title="mod regex_automata::hybrid">lazy
DFA</a> and fully compiled DFAs (in the <code>dfa</code> module) support searching by exploring
the automaton one state at a time. This might be useful, for example, for
stream searches or searches of strings stored non-contiguously in memory.</li>
<li>You want to build a fully compiled DFA and then use zero-copy
deserialization (see <code>dfa::dense::DFA::from_bytes</code>) to load it into memory and use
it for searching. This use case is supported in core-only no-std/no-alloc
environments.</li>
<li>You want to run <a href="struct.Input.html#method.anchored" title="method regex_automata::Input::anchored">anchored searches</a> without using the <code>^</code>
anchor in your regex pattern.</li>
<li>You need to work around contention issues with
sharing a regex across multiple threads. The
<a href="meta/struct.Regex.html#method.search_with" title="method regex_automata::meta::Regex::search_with"><code>meta::Regex::search_with</code></a> API permits bypassing
any kind of synchronization at all by requiring the caller to provide the
mutable scratch space needed during a search.</li>
<li>You want to build your own regex engine on top of the <code>regex</code> crate’s
infrastructure.</li>
</ul>
<h2 id="examples"><a href="#examples">Examples</a></h2>
<p>This section tries to identify a few interesting things you can do with this
crate and demonstrates them.</p>
<h4 id="multi-pattern-searches-with-capture-groups"><a href="#multi-pattern-searches-with-capture-groups">Multi-pattern searches with capture groups</a></h4>
<p>One of the more frustrating limitations of <code>RegexSet</code> in the <code>regex</code> crate
(at the time of writing) is that it doesn’t report match positions. With this
crate, multi-pattern support was intentionally designed in from the beginning,
which means it works in all regex engines, including for capture groups.</p>
<p>This example shows how to search for matches of multiple regexes, where each
regex uses the same capture group names to parse different key-value formats.</p>
<h2 id="available-regex-engines"><a href="#available-regex-engines">Available regex engines</a></h2>
<ul>
<li><a href="hybrid/regex/struct.Regex.html" title="struct regex_automata::hybrid::regex::Regex"><code>hybrid::regex::Regex</code></a> is a regex engine that works on top of a lazily
built DFA. Its performance profile is very similar to that of fully compiled
DFAs, but can be slower in some pathological cases. Fully compiled DFAs are
also amenable to more optimizations, such as state acceleration, that aren’t
available in a lazy DFA. You might use this lazy DFA if you can’t abide the
worst case exponential compile time of a full DFA, but still want the DFA
search performance in the vast majority of cases. A lazy DFA based regex can
only report the start and end of each match.</li>
<li><code>dfa::onepass::DFA</code> is a regex engine that is implemented as a DFA, but
can report the matches of each capture group in addition to the start and end
of each match. The catch is that it only works on a somewhat small subset of
regexes known as “one-pass.” You’ll want to use this for cases when you need
capture group matches and the regex is one-pass since it is likely to be faster
than any alternative. Aside from the one-pass restriction, it can handle all types of regexes, but it does
have some reasonable limits on the number of capture groups it can handle.</li>
<li><code>nfa::thompson::backtrack::BoundedBacktracker</code> is a regex engine that uses
backtracking, but keeps track of the work it has done to avoid catastrophic
backtracking. Like the one-pass DFA, it provides the matches of each capture
group. It retains the <code>O(m * n)</code> worst case time bound. This tends to be slower
than the one-pass DFA regex engine, but faster than the PikeVM. It can handle
all types of regexes, but usually only works well with small haystacks and
small regexes due to the memory required to avoid redoing work.</li>
<li><a href="nfa/thompson/pikevm/struct.PikeVM.html" title="struct regex_automata::nfa::thompson::pikevm::PikeVM"><code>nfa::thompson::pikevm::PikeVM</code></a> is a regex engine that can handle all
regexes of all sizes and provides capture group matches. It tends to be a tool
of last resort because it is also usually the slowest regex engine.</li>
<li><a href="meta/struct.Regex.html" title="struct regex_automata::meta::Regex"><code>meta::Regex</code></a> is the meta regex engine that combines <em>all</em> of the above
engines into one. The reason for this is that each of the engines above has
its own caveats such as, “only handles a subset of regexes” or “is generally
slow.” The meta regex engine accounts for all of these caveats and composes
the engines in a way that attempts to mitigate each engine’s weaknesses while
emphasizing its strengths. For example, it will attempt to run a lazy DFA even
if it might fail. If it does fail, it will restart the search with a likely
slower but more capable regex engine. The meta regex engine is what you should
default to. Use one of the above engines directly only if you have a specific
reason to.</li>
</ul>
<h2 id="api-themes"><a href="#api-themes">API themes</a></h2>
<p>Most search routines in this crate accept anything that implements
<code>Into&lt;Input&gt;</code>. Both <code>&amp;str</code> and <code>&amp;[u8]</code> haystacks satisfy this constraint, which
means that things like <code>engine.search("foo")</code> will work as you would expect.</p>
<p>By virtue of accepting an <code>Into&lt;Input&gt;</code> though, callers can provide more than
just a haystack. The <a href="struct.Input.html" title="struct regex_automata::Input"><code>Input</code></a> type documentation has more details, but briefly,
callers can use it to configure various aspects of the search:</p>
<ul>
<li>The span of the haystack to search via <a href="struct.Input.html#method.span" title="method regex_automata::Input::span"><code>Input::span</code></a> or <a href="struct.Input.html#method.range" title="method regex_automata::Input::range"><code>Input::range</code></a>,
which might be a substring of the haystack.</li>
<li>Whether to run an anchored search or not via <a href="struct.Input.html#method.anchored" title="method regex_automata::Input::anchored"><code>Input::anchored</code></a>. This
permits one to require matches to start at the same offset that the search
started.</li>
<li>Whether to ask the regex engine to stop as soon as a match is seen via
<a href="struct.Input.html#method.earliest" title="method regex_automata::Input::earliest"><code>Input::earliest</code></a>. This can be used to find the offset of a match as soon
as it is known without waiting for the full leftmost-first match to be found.
This can also be used to avoid the worst case <code>O(m * n^2)</code> time complexity
of iteration.</li>
</ul>
<p>Some lower level search routines accept an <code>&amp;Input</code> for performance reasons.
In that case, <code>&amp;Input::new("haystack")</code> can be used for a simple search.</p>
<p>Most, but not all, regex engines in this crate can fail to execute a search.
When a search fails, callers cannot determine whether or not a match exists.
That is, the result is indeterminate.</p>
<p>Search failure, in all cases in this crate, is represented by a <a href="struct.MatchError.html" title="struct regex_automata::MatchError"><code>MatchError</code></a>.
Routines that can fail start with the <code>try_</code> prefix in their name. For example,
<a href="hybrid/regex/struct.Regex.html#method.try_search" title="method regex_automata::hybrid::regex::Regex::try_search"><code>hybrid::regex::Regex::try_search</code></a> can fail for a number of reasons.
Routines that either cannot fail or panic on failure lack the
<code>try_</code> prefix. For example, <a href="hybrid/regex/struct.Regex.html#method.find" title="method regex_automata::hybrid::regex::Regex::find"><code>hybrid::regex::Regex::find</code></a> will panic in
cases where <a href="hybrid/regex/struct.Regex.html#method.try_search" title="method regex_automata::hybrid::regex::Regex::try_search"><code>hybrid::regex::Regex::try_search</code></a> would return an error, and
<a href="meta/struct.Regex.html#method.find" title="method regex_automata::meta::Regex::find"><code>meta::Regex::find</code></a> will never panic. Therefore, callers need to pay close
attention to the panicking conditions in the documentation.</p>
<p>In most cases, the reasons that a search fails are either predictable or
configurable. For example, a lazy DFA for a regex containing a Unicode word boundary can only be built after
<a href="hybrid/dfa/struct.Config.html#method.unicode_word_boundary" title="method regex_automata::hybrid::dfa::Config::unicode_word_boundary">configuring heuristic support for Unicode word boundaries</a>,
and searches with such a DFA fail as soon as they see a non-ASCII byte.</p>
<p>Building a regex engine often involves several different <code>Config</code> types in this
crate. Let’s look more closely at an example: <a href="hybrid/regex/struct.Builder.html" title="struct regex_automata::hybrid::regex::Builder"><code>hybrid::regex::Builder</code></a>. It
accepts three different <code>Config</code> types for configuring construction of a lazy DFA regex:</p>
<ul>
<li><a href="hybrid/regex/struct.Builder.html#method.syntax" title="method regex_automata::hybrid::regex::Builder::syntax"><code>hybrid::regex::Builder::syntax</code></a> accepts a
<a href="util/syntax/struct.Config.html" title="struct regex_automata::util::syntax::Config"><code>util::syntax::Config</code></a> for configuring the options found in the
<a href="../regex_syntax/index.html" title="mod regex_syntax"><code>regex-syntax</code></a> crate. For example, whether to match
case insensitively.</li>
<li><a href="hybrid/regex/struct.Builder.html#method.thompson" title="method regex_automata::hybrid::regex::Builder::thompson"><code>hybrid::regex::Builder::thompson</code></a> accepts a <a href="nfa/thompson/struct.Config.html" title="struct regex_automata::nfa::thompson::Config"><code>nfa::thompson::Config</code></a> for
configuring construction of a Thompson NFA.</li>
<li><a href="hybrid/regex/struct.Builder.html#method.dfa" title="method regex_automata::hybrid::regex::Builder::dfa"><code>hybrid::regex::Builder::dfa</code></a> accepts a <a href="hybrid/dfa/struct.Config.html" title="struct regex_automata::hybrid::dfa::Config"><code>hybrid::dfa::Config</code></a> for
configuring construction of the underlying lazy DFAs, such as the capacity of the cache used to store their transition tables.</li>
</ul>
<p>The lazy DFA regex engine uses all three of these configuration objects for
methods like <a href="hybrid/regex/struct.Builder.html#method.build" title="method regex_automata::hybrid::regex::Builder::build"><code>hybrid::regex::Builder::build</code></a>, which accepts a pattern
string containing the concrete syntax of your regex.</p>
<p>The lazy DFAs, in turn, have their own builder that permits <a href="hybrid/dfa/struct.Builder.html#method.build_from_nfa" title="method regex_automata::hybrid::dfa::Builder::build_from_nfa">construction directly from
a Thompson NFA</a>. Continuing down the
rabbit hole, a Thompson NFA has its own compiler that permits <a href="nfa/thompson/struct.Compiler.html#method.build_from_hir" title="method regex_automata::nfa::thompson::Compiler::build_from_hir">construction
directly from an HIR</a>. The lazy DFA
regex engine builder lets you follow this rabbit hole all the way down, but
also provides convenience routines that do it for you when you don’t need
precise control over every component.</p>
<p>The <a href="meta/index.html" title="mod regex_automata::meta">meta regex engine</a> is a good example of something that utilizes the
full flexibility of these builders. It often needs not only precise control
over each component, but also shares them across multiple regex engines.
(Most sharing is done by internal reference counting. For example, an
<a href="nfa/thompson/struct.NFA.html" title="struct regex_automata::nfa::thompson::NFA"><code>NFA</code></a> is reference counted internally, which makes cloning
cheap.)</p>
<h2 id="crate-features"><a href="#crate-features">Crate features</a></h2>
<ul>
<li><strong>syntax</strong> - Enables a dependency on <code>regex-syntax</code>. This makes APIs
for building regex engines from pattern strings available. Without the
<code>regex-syntax</code> dependency, the only way to build a regex engine is generally
to deserialize a previously built DFA or to hand-assemble an NFA using its
<a href="nfa/thompson/struct.Builder.html" title="struct regex_automata::nfa::thompson::Builder">builder API</a>. Once you have an NFA, you can build any
of the regex engines in this crate. The <code>syntax</code> feature also enables <code>alloc</code>.</li>
<li><strong>meta</strong> - Enables the meta regex engine. This also enables the <code>syntax</code> and
<code>nfa-pikevm</code> features, as both are the minimal requirements needed. The meta
regex engine benefits from enabling any of the other regex engines and will
use them automatically when appropriate.</li>
<li><strong>nfa</strong> - Enables all NFA related features below.
<ul>
<li><strong>nfa-thompson</strong> - Enables the Thompson NFA APIs. This enables <code>alloc</code>.</li>
<li><strong>nfa-pikevm</strong> - Enables the PikeVM regex engine. This enables
<code>nfa-thompson</code>.</li>
<li><strong>nfa-backtrack</strong> - Enables the bounded backtracker regex engine. This
enables <code>nfa-thompson</code>.</li>
</ul>
</li>
<li><strong>dfa</strong> - Enables all DFA related features below.
<ul>
<li><strong>dfa-build</strong> - Enables APIs for determinizing DFAs from NFAs. This
enables <code>nfa-thompson</code> and <code>dfa-search</code>.</li>
<li><strong>dfa-search</strong> - Enables APIs for searching with DFAs.</li>
<li><strong>dfa-onepass</strong> - Enables the one-pass DFA API. This enables
<code>nfa-thompson</code>.</li>
</ul>
</li>
<li><strong>hybrid</strong> - Enables the hybrid NFA/DFA or “lazy DFA” regex engine. This
enables <code>alloc</code> and <code>nfa-thompson</code>.</li>
</ul>
</div></details><h2 id="modules" class="section-header"><a href="#modules">Modules</a></h2><ul class="item-table"><li><div class="item-name"><a class="mod" href="hybrid/index.html" title="mod regex_automata::hybrid">hybrid</a></div><div class="desc docblock-short">A module for building and searching with lazy deterministic finite automata
(DFAs).</div></li><li><div class="item-name"><a class="mod" href="meta/index.html" title="mod regex_automata::meta">meta</a></div><div class="desc docblock-short">Provides a regex matcher that composes several other regex matchers
automatically.</div></li><li><div class="item-name"><a class="mod" href="nfa/index.html" title="mod regex_automata::nfa">nfa</a></div><div class="desc docblock-short">Provides non-deterministic finite automata (NFA) and regex engines that use
them.</div></li><li><div class="item-name"><a class="mod" href="util/index.html" title="mod regex_automata::util">util</a></div><div class="desc docblock-short">A collection of modules that provide APIs that are useful across many regex
engines.</div></li></ul><h2 id="structs" class="section-header"><a href="#structs">Structs</a></h2><ul class="item-table"><li><div class="item-name"><a class="struct" href="struct.HalfMatch.html" title="struct regex_automata::HalfMatch">HalfMatch</a></div><div class="desc docblock-short">A representation of “half” of a match reported by a DFA.</div></li><li><div class="item-name"><a class="struct" href="struct.Input.html" title="struct regex_automata::Input">Input</a></div><div class="desc docblock-short">The parameters for a regex search including the haystack to search.</div></li><li><div class="item-name"><a class="struct" href="struct.Match.html" title="struct regex_automata::Match">Match</a></div><div class="desc docblock-short">A representation of a match reported by a regex engine.</div></li><li><div class="item-name"><a class="struct" href="struct.MatchError.html" title="struct regex_automata::MatchError">MatchError</a></div><div class="desc docblock-short">An error indicating that a search stopped before reporting whether a
match exists or not.</div></li><li><div class="item-name"><a class="struct" href="struct.PatternID.html" title="struct regex_automata::PatternID">PatternID</a></div><div class="desc docblock-short">The identifier of a regex pattern, represented by a <a href="util/primitives/struct.SmallIndex.html" title="struct regex_automata::util::primitives::SmallIndex"><code>SmallIndex</code></a>.</div></li><li><div class="item-name"><a class="struct" href="struct.PatternSet.html" title="struct regex_automata::PatternSet">PatternSet</a></div><div class="desc docblock-short">A set of <code>PatternID</code>s.</div></li><li><div class="item-name"><a class="struct" href="struct.PatternSetInsertError.html" title="struct regex_automata::PatternSetInsertError">PatternSetInsertError</a></div><div class="desc docblock-short">An error that occurs when a <code>PatternID</code> failed to insert into a
<code>PatternSet</code>.</div></li><li><div class="item-name"><a class="struct" href="struct.PatternSetIter.html" title="struct regex_automata::PatternSetIter">PatternSetIter</a></div><div class="desc docblock-short">An iterator over all pattern identifiers in a <a href="struct.PatternSet.html" title="struct regex_automata::PatternSet"><code>PatternSet</code></a>.</div></li><li><div class="item-name"><a class="struct" href="struct.Span.html" title="struct regex_automata::Span">Span</a></div><div class="desc docblock-short">A representation of a span reported by a regex engine.</div></li></ul><h2 id="enums" class="section-header"><a href="#enums">Enums</a></h2><ul class="item-table"><li><div class="item-name"><a class="enum" href="enum.Anchored.html" title="enum regex_automata::Anchored">Anchored</a></div><div class="desc docblock-short">The type of anchored search to perform.</div></li><li><div class="item-name"><a class="enum" href="enum.MatchErrorKind.html" title="enum regex_automata::MatchErrorKind">MatchErrorKind</a></div><div class="desc docblock-short">The underlying kind of a <a href="struct.MatchError.html" title="struct regex_automata::MatchError"><code>MatchError</code></a>.</div></li><li><div class="item-name"><a class="enum" href="enum.MatchKind.html" title="enum regex_automata::MatchKind">MatchKind</a></div><div class="desc docblock-short">The kind of match semantics to use for a regex pattern.</div></li></ul></section></div></main></body></html>