<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="This module provides APIs for dealing with the alphabets of finite state machines."><title>regex_automata::util::alphabet - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceSerif4-Regular-46f98efaafac5295.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/FiraSans-Regular-018c141bf0843ffd.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/FiraSans-Medium-8f9a781e4970d388.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2"><link rel="stylesheet" href="../../../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../../../static.files/rustdoc-5bc39a1768837dd0.css"><meta name="rustdoc-vars" data-root-path="../../../" data-static-root-path="../../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.77.2 (25ef9e3d8 2024-04-09)" data-channel="1.77.2" data-search-js="search-dd67cee4cfa65049.js" data-settings-js="settings-4313503d2e1961c2.js"><script src="../../../static.files/storage-4c98445ec4002617.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../../static.files/main-48f368f3872407c8.js"></script><noscript><link rel="stylesheet" href="../../../static.files/noscript-04d5337699b92874.css"></noscript><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-16x16-8b506e7a72182f1c.png"><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../../../static.files/favicon-2c020d218678b618.svg"></head>
<body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle" title="show sidebar"></button></nav><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../../regex_automata/index.html">regex_automata</a><span class="version">0.4.6</span></h2></div><h2 class="location"><a href="#">Module alphabet</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#structs">Structs</a></li></ul></section><h2><a href="../index.html">In regex_automata::util</a></h2></div></nav><div class="sidebar-resizer"></div>
<main><div class="width-limiter"><nav class="sub"><form class="search-form"><span></span><div id="sidebar-button" tabindex="-1"><a href="../../../regex_automata/all.html" title="show sidebar"></a></div><input class="search-input" name="search" aria-label="Run search in the documentation" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"><div id="help-button" tabindex="-1"><a href="../../../help.html" title="help">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../../static.files/wheel-7b819b6101059cd0.svg"></a></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1>Module <a href="../../index.html">regex_automata</a>::<wbr><a href="../index.html">util</a>::<wbr><a class="mod" href="#">alphabet</a><button id="copy-path" title="Copy item path to clipboard"><img src="../../../static.files/clipboard-7571035ce49a181d.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="src" href="../../../src/regex_automata/util/alphabet.rs.html#1-1139">source</a> · <button id="toggle-all-docs" title="collapse all docs">[<span>−</span>]</button></span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>This module provides APIs for dealing with the alphabets of finite state
machines.</p>
<p>There are two principal types in this module, <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> and <a href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit"><code>Unit</code></a>.
The former defines the alphabet of a finite state machine while the latter
represents an element of that alphabet.</p>
<p>To a first approximation, the alphabet of all automata in this crate is just
a <code>u8</code>. Namely, every distinct byte value. All 256 of them. In practice, this
can be quite wasteful when building a transition table for a DFA, since it
requires storing a state identifier for each element in the alphabet. Instead,
we collapse the alphabet of an automaton down into equivalence classes, where
no two bytes in the same equivalence class ever discriminate between a match
and a non-match. For example, the regex
<code>[a-z]+</code> can be considered to have an alphabet consisting of two
equivalence classes: <code>a-z</code> and everything else. The transitions of
an automaton don’t actually require representing every distinct byte.
Just the equivalence classes.</p>
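<p>As a rough illustration of the idea, the sketch below builds a byte-to-class map for <code>[a-z]+</code> and compares transition table sizes with and without classes. The helpers <code>classes_for_range</code> and <code>alphabet_len</code> are hypothetical names invented for this example and are not part of this crate; the three-state DFA size is likewise an assumption.</p>

```rust
/// Hypothetical helper (not part of regex_automata): maps every byte to an
/// equivalence class for a single inclusive byte range. Bytes inside the
/// range get class 1, all other bytes get class 0.
fn classes_for_range(start: u8, end: u8) -> [u8; 256] {
    let mut classes = [0u8; 256];
    for b in start..=end {
        classes[b as usize] = 1;
    }
    classes
}

/// Number of distinct classes, assuming class ids are dense and start at 0.
fn alphabet_len(classes: &[u8; 256]) -> usize {
    classes.iter().copied().max().map_or(0, |m| m as usize + 1)
}

fn main() {
    let classes = classes_for_range(b'a', b'z');
    let len = alphabet_len(&classes);
    // Assumed state count for a small DFA; chosen only for illustration.
    let states = 3;
    println!("alphabet length: {}", len);
    println!("table entries without classes: {}", states * 256);
    println!("table entries with classes: {}", states * len);
}
```

<p>With two classes, each state needs two transitions instead of 256.</p>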
<p>The downside of equivalence classes is that, of course, searching a haystack
deals with individual byte values. Those byte values need to be mapped to
their corresponding equivalence class. This is what <code>ByteClasses</code> does. In
practice, doing this for every state transition has negligible impact on modern
CPUs. Moreover, it helps make more efficient use of the CPU cache by (possibly
considerably) shrinking the size of the transition table.</p>
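<p>The per-transition mapping described above might look roughly like the following. This is a hand-built DFA for <code>[a-z]+</code> that must match the whole haystack; the state ids, table layout and function names are assumptions made for this sketch and do not reflect how the DFAs in this crate are laid out.</p>

```rust
/// State ids for a hand-built DFA (illustrative only).
const DEAD: usize = 0;
const START: usize = 1;
const MATCH: usize = 2;

/// Two classes: 0 = any other byte, 1 = `a-z`.
const STRIDE: usize = 2;

/// Row-major transition table: TABLE[state * STRIDE + class] = next state.
const TABLE: [usize; 3 * STRIDE] = [
    DEAD, DEAD,  // from DEAD
    DEAD, MATCH, // from START
    DEAD, MATCH, // from MATCH
];

/// Builds the byte-to-class map for the two classes above.
fn byte_classes() -> [u8; 256] {
    let mut classes = [0u8; 256];
    for b in b'a'..=b'z' {
        classes[b as usize] = 1;
    }
    classes
}

/// Runs the DFA over the haystack. Each byte is first mapped to its
/// equivalence class, and the class (not the byte) indexes the table.
fn is_full_match(classes: &[u8; 256], haystack: &[u8]) -> bool {
    let mut state = START;
    for &byte in haystack {
        let class = classes[byte as usize] as usize;
        state = TABLE[state * STRIDE + class];
    }
    state == MATCH
}

fn main() {
    let classes = byte_classes();
    println!("{}", is_full_match(&classes, b"hello"));
    println!("{}", is_full_match(&classes, b"hello world"));
    println!("{}", is_full_match(&classes, b""));
}
```

<p>The extra lookup is one index into a 256-byte array per transition, while the table itself holds 6 entries rather than 768.</p>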
<p>One last hiccup concerns <code>Unit</code>. Namely, because of look-around and how the
DFAs in this crate work, we need to add a sentinel value to our alphabet
of equivalence classes that represents the “end” of a search. We call that
sentinel <a href="struct.Unit.html#method.eoi" title="associated function regex_automata::util::alphabet::Unit::eoi"><code>Unit::eoi</code></a> or “end of input.” Thus, a <code>Unit</code> is either an
equivalence class corresponding to a set of bytes, or it is a special “end of
input” sentinel.</p>
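<p>One way to picture this is an alphabet element that is either a class id or the sentinel. The <code>Symbol</code> enum and the helpers below are hypothetical and are not the definition of <code>Unit</code>; in particular, placing the sentinel in the column right after the last class is an assumption of this sketch.</p>

```rust
/// Hypothetical stand-in for the idea behind `Unit`; this is not the
/// crate's actual definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Symbol {
    /// An equivalence class id for some set of bytes.
    Class(u8),
    /// The "end of input" sentinel.
    Eoi,
}

/// Column of a symbol within a transition table row. Assumption: the
/// sentinel takes the column right after the last class.
fn column(symbol: Symbol, num_classes: usize) -> usize {
    match symbol {
        Symbol::Class(id) => id as usize,
        Symbol::Eoi => num_classes,
    }
}

/// Row width: one column per class plus one for the sentinel.
fn stride(num_classes: usize) -> usize {
    num_classes + 1
}

/// Symbols fed to an automaton for one search: the class of each byte,
/// followed by the sentinel.
fn symbols(classes: &[u8; 256], haystack: &[u8]) -> Vec<Symbol> {
    let mut out: Vec<Symbol> = haystack
        .iter()
        .map(|&b| Symbol::Class(classes[b as usize]))
        .collect();
    out.push(Symbol::Eoi);
    out
}

fn main() {
    // Two classes for `[a-z]+`: 1 = `a-z`, 0 = everything else.
    let mut classes = [0u8; 256];
    for b in b'a'..=b'z' {
        classes[b as usize] = 1;
    }
    println!("{:?}", symbols(&classes, b"a!"));
    println!("stride: {}", stride(2));
}
```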
<p>In general, you should not expect to need either of these types unless you’re
doing lower-level shenanigans with DFAs, or even building your own DFAs.
(Although, you don’t have to use these types to build your own DFAs of course.)
For example, if you’re walking a DFA’s state graph, it’s probably useful to
make use of <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> to visit each element in the DFA’s alphabet instead
of just visiting every distinct <code>u8</code> value. The latter isn’t necessarily wrong,
but it could be potentially very wasteful.</p>
</div></details><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><ul class="item-table"><li><div class="item-name"><a class="struct" href="struct.ByteClassElements.html" title="struct regex_automata::util::alphabet::ByteClassElements">ByteClassElements</a></div><div class="desc docblock-short">An iterator over all elements in an equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClassIter.html" title="struct regex_automata::util::alphabet::ByteClassIter">ByteClassIter</a></div><div class="desc docblock-short">An iterator over each equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClassRepresentatives.html" title="struct regex_automata::util::alphabet::ByteClassRepresentatives">ByteClassRepresentatives</a></div><div class="desc docblock-short">An iterator over representative bytes from each equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses">ByteClasses</a></div><div class="desc docblock-short">A representation of byte oriented equivalence classes.</div></li><li><div class="item-name"><a class="struct" href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit">Unit</a></div><div class="desc docblock-short">Unit represents a single unit of haystack for DFA based regex engines.</div></li></ul></section></div></main></body></html>