edlang/regex_automata/util/alphabet/index.html
2024-04-13 08:42:00 +00:00

37 lines
8.6 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="This module provides APIs for dealing with the alphabets of finite state machines."><title>regex_automata::util::alphabet - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceSerif4-Regular-46f98efaafac5295.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/FiraSans-Regular-018c141bf0843ffd.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/FiraSans-Medium-8f9a781e4970d388.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2"><link rel="stylesheet" href="../../../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../../../static.files/rustdoc-5bc39a1768837dd0.css"><meta name="rustdoc-vars" data-root-path="../../../" data-static-root-path="../../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.77.2 (25ef9e3d8 2024-04-09)" data-channel="1.77.2" data-search-js="search-dd67cee4cfa65049.js" data-settings-js="settings-4313503d2e1961c2.js" ><script src="../../../static.files/storage-4c98445ec4002617.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../../static.files/main-48f368f3872407c8.js"></script><noscript><link rel="stylesheet" href="../../../static.files/noscript-04d5337699b92874.css"></noscript><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-16x16-8b506e7a72182f1c.png"><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../../../static.files/favicon-2c020d218678b618.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle" title="show sidebar"></button></nav><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../../regex_automata/index.html">regex_automata</a><span class="version">0.4.6</span></h2></div><h2 class="location"><a href="#">Module alphabet</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#structs">Structs</a></li></ul></section><h2><a href="../index.html">In regex_automata::util</a></h2></div></nav><div class="sidebar-resizer"></div>
<main><div class="width-limiter"><nav class="sub"><form class="search-form"><span></span><div id="sidebar-button" tabindex="-1"><a href="../../../regex_automata/all.html" title="show sidebar"></a></div><input class="search-input" name="search" aria-label="Run search in the documentation" autocomplete="off" spellcheck="false" placeholder="Click or press S to search, ? for more options…" type="search"><div id="help-button" tabindex="-1"><a href="../../../help.html" title="help">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../../static.files/wheel-7b819b6101059cd0.svg"></a></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1>Module <a href="../../index.html">regex_automata</a>::<wbr><a href="../index.html">util</a>::<wbr><a class="mod" href="#">alphabet</a><button id="copy-path" title="Copy item path to clipboard"><img src="../../../static.files/clipboard-7571035ce49a181d.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="src" href="../../../src/regex_automata/util/alphabet.rs.html#1-1139">source</a> · <button id="toggle-all-docs" title="collapse all docs">[<span>&#x2212;</span>]</button></span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>This module provides APIs for dealing with the alphabets of finite state
machines.</p>
<p>There are two principal types in this module, <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> and <a href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit"><code>Unit</code></a>.
The former defines the alphabet of a finite state machine while the latter
represents an element of that alphabet.</p>
<p>To a first approximation, the alphabet of all automata in this crate is just
a <code>u8</code>. Namely, every distinct byte value. All 256 of them. In practice, this
can be quite wasteful when building a transition table for a DFA, since it
requires storing a state identifier for each element in the alphabet. Instead,
we collapse the alphabet of an automaton down into equivalence classes, where
every byte in the same equivalence class never discriminates between a match or
a non-match from any other byte in the same class. For example, in the regex
<code>[a-z]+</code>, then you could consider it having an alphabet consisting of two
equivalence classes: <code>a-z</code> and everything else. In terms of the transitions on
an automaton, it doesnt actually require representing every distinct byte.
Just the equivalence classes.</p>
<p>The downside of equivalence classes is that, of course, searching a haystack
deals with individual byte values. Those byte values need to be mapped to
their corresponding equivalence class. This is what <code>ByteClasses</code> does. In
practice, doing this for every state transition has negligible impact on modern
CPUs. Moreover, it helps make more efficient use of the CPU cache by (possibly
considerably) shrinking the size of the transition table.</p>
<p>One last hiccup concerns <code>Unit</code>. Namely, because of look-around and how the
DFAs in this crate work, we need to add a sentinel value to our alphabet
of equivalence classes that represents the “end” of a search. We call that
sentinel <a href="struct.Unit.html#method.eoi" title="associated function regex_automata::util::alphabet::Unit::eoi"><code>Unit::eoi</code></a> or “end of input.” Thus, a <code>Unit</code> is either an
equivalence class corresponding to a set of bytes, or it is a special “end of
input” sentinel.</p>
<p>In general, you should not expect to need either of these types unless youre
doing lower level shenanigans with DFAs, or even building your own DFAs.
(Although, you dont have to use these types to build your own DFAs of course.)
For example, if youre walking a DFAs state graph, its probably useful to
make use of <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> to visit each element in the DFAs alphabet instead
of just visiting every distinct <code>u8</code> value. The latter isnt necessarily wrong,
but it could be potentially very wasteful.</p>
</div></details><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><ul class="item-table"><li><div class="item-name"><a class="struct" href="struct.ByteClassElements.html" title="struct regex_automata::util::alphabet::ByteClassElements">ByteClassElements</a></div><div class="desc docblock-short">An iterator over all elements in an equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClassIter.html" title="struct regex_automata::util::alphabet::ByteClassIter">ByteClassIter</a></div><div class="desc docblock-short">An iterator over each equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClassRepresentatives.html" title="struct regex_automata::util::alphabet::ByteClassRepresentatives">ByteClassRepresentatives</a></div><div class="desc docblock-short">An iterator over representative bytes from each equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses">ByteClasses</a></div><div class="desc docblock-short">A representation of byte oriented equivalence classes.</div></li><li><div class="item-name"><a class="struct" href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit">Unit</a></div><div class="desc docblock-short">Unit represents a single unit of haystack for DFA based regex engines.</div></li></ul></section></div></main></body></html>