mirror of
https://github.com/edg-l/edlang.git
synced 2024-11-09 17:48:24 +00:00
48 lines
7 KiB
HTML
48 lines
7 KiB
HTML
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="Provides non-deterministic finite automata (NFA) and regex engines that use them."><title>regex_automata::nfa - Rust</title><script>if(window.location.protocol!=="file:")document.head.insertAdjacentHTML("beforeend","SourceSerif4-Regular-46f98efaafac5295.ttf.woff2,FiraSans-Regular-018c141bf0843ffd.woff2,FiraSans-Medium-8f9a781e4970d388.woff2,SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2,SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2".split(",").map(f=>`<link rel="preload" as="font" type="font/woff2" crossorigin href="../../static.files/${f}">`).join(""))</script><link rel="stylesheet" href="../../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../../static.files/rustdoc-dd39b87e5fcfba68.css"><meta name="rustdoc-vars" data-root-path="../../" data-static-root-path="../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.80.0 (051478957 2024-07-21)" data-channel="1.80.0" data-search-js="search-d52510db62a78183.js" data-settings-js="settings-4313503d2e1961c2.js" ><script src="../../static.files/storage-118b08c4c78b968e.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../static.files/main-20a3ad099b048cf2.js"></script><noscript><link rel="stylesheet" href="../../static.files/noscript-df360f571f6edeae.css"></noscript><link rel="alternate icon" type="image/png" href="../../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../../static.files/favicon-2c020d218678b618.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle" title="show sidebar"></button></nav><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../regex_automata/index.html">regex_automata</a><span class="version">0.4.7</span></h2></div><h2 class="location"><a href="#">Module nfa</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#modules">Modules</a></li></ul></section><h2><a href="../index.html">In crate regex_automata</a></h2></div></nav><div class="sidebar-resizer"></div><main><div class="width-limiter"><rustdoc-search></rustdoc-search><section id="main-content" class="content"><div class="main-heading"><h1>Module <a href="../index.html">regex_automata</a>::<wbr><a class="mod" href="#">nfa</a><button id="copy-path" title="Copy item path to clipboard">Copy item path</button></h1><span class="out-of-band"><a class="src" href="../../src/regex_automata/nfa/mod.rs.html#1-55">source</a> · <button id="toggle-all-docs" title="collapse all docs">[<span>−</span>]</button></span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>Provides non-deterministic finite automata (NFA) and regex engines that use
|
||
them.</p>
|
||
<p>While NFAs and DFAs (deterministic finite automata) have equivalent <em>theoretical</em>
|
||
power, their usage in practice tends to result in different engineering trade
|
||
offs. While this isn’t meant to be a comprehensive treatment of the topic, here
|
||
are a few key trade offs that are, at minimum, true for this crate:</p>
|
||
<ul>
|
||
<li>NFAs tend to be represented sparsely where as DFAs are represented densely.
|
||
Sparse representations use less memory, but are slower to traverse. Conversely,
|
||
dense representations use more memory, but are faster to traverse. (Sometimes
|
||
these lines are blurred. For example, an <code>NFA</code> might choose to represent a
|
||
particular state in a dense fashion, and a DFA can be built using a sparse
|
||
representation via <a href="crate::dfa::sparse::DFA"><code>sparse::DFA</code></a>.</li>
|
||
<li>NFAs have espilon transitions and DFAs don’t. In practice, this means that
|
||
handling a single byte in a haystack with an NFA at search time may require
|
||
visiting multiple NFA states. In a DFA, each byte only requires visiting
|
||
a single state. Stated differently, NFAs require a variable number of CPU
|
||
instructions to process one byte in a haystack where as a DFA uses a constant
|
||
number of CPU instructions to process one byte.</li>
|
||
<li>NFAs are generally easier to amend with secondary storage. For example, the
|
||
<a href="thompson/pikevm/struct.PikeVM.html" title="struct regex_automata::nfa::thompson::pikevm::PikeVM"><code>thompson::pikevm::PikeVM</code></a> uses an NFA to match, but also uses additional
|
||
memory beyond the model of a finite state machine to track offsets for matching
|
||
capturing groups. Conversely, the most a DFA can do is report the offset (and
|
||
pattern ID) at which a match occurred. This is generally why we also compile
|
||
DFAs in reverse, so that we can run them after finding the end of a match to
|
||
also find the start of a match.</li>
|
||
<li>NFAs take worst case linear time to build, but DFAs take worst case
|
||
exponential time to build. The <a href="../hybrid/index.html" title="mod regex_automata::hybrid">hybrid NFA/DFA</a> mitigates this
|
||
challenge for DFAs in many practical cases.</li>
|
||
</ul>
|
||
<p>There are likely other differences, but the bottom line is that NFAs tend to be
|
||
more memory efficient and give easier opportunities for increasing expressive
|
||
power, where as DFAs are faster to search with.</p>
|
||
<h2 id="why-only-a-thompson-nfa"><a class="doc-anchor" href="#why-only-a-thompson-nfa">§</a>Why only a Thompson NFA?</h2>
|
||
<p>Currently, the only kind of NFA we support in this crate is a <a href="https://en.wikipedia.org/wiki/Thompson%27s_construction">Thompson
|
||
NFA</a>. This refers
|
||
to a specific construction algorithm that takes the syntax of a regex
|
||
pattern and converts it to an NFA. Specifically, it makes gratuitous use of
|
||
epsilon transitions in order to keep its structure simple. In exchange, its
|
||
construction time is linear in the size of the regex. A Thompson NFA also makes
|
||
the guarantee that given any state and a character in a haystack, there is at
|
||
most one transition defined for it. (Although there may be many epsilon
|
||
transitions.)</p>
|
||
<p>It possible that other types of NFAs will be added in the future, such as a
|
||
<a href="https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm">Glushkov NFA</a>.
|
||
But currently, this crate only provides a Thompson NFA.</p>
|
||
</div></details><h2 id="modules" class="section-header">Modules<a href="#modules" class="anchor">§</a></h2><ul class="item-table"><li><div class="item-name"><a class="mod" href="thompson/index.html" title="mod regex_automata::nfa::thompson">thompson</a></div><div class="desc docblock-short">Defines a Thompson NFA and provides the <a href="thompson/pikevm/struct.PikeVM.html" title="struct regex_automata::nfa::thompson::pikevm::PikeVM"><code>PikeVM</code></a> and
|
||
<a href="backtrack::BoundedBacktracker"><code>BoundedBacktracker</code></a> regex engines.</div></li></ul></section></div></main></body></html> |