mirror of
https://github.com/edg-l/edlang.git
synced 2024-11-15 12:38:23 +00:00
37 lines
8.5 KiB
HTML
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="This module provides APIs for dealing with the alphabets of finite state machines."><title>regex_automata::util::alphabet - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceSerif4-Regular-46f98efaafac5295.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/FiraSans-Regular-018c141bf0843ffd.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/FiraSans-Medium-8f9a781e4970d388.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../static.files/SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2"><link rel="stylesheet" href="../../../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../../../static.files/rustdoc-ac92e1bbe349e143.css"><meta name="rustdoc-vars" data-root-path="../../../" data-static-root-path="../../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.76.0 (07dca489a 2024-02-04)" data-channel="1.76.0" data-search-js="search-2b6ce74ff89ae146.js" data-settings-js="settings-4313503d2e1961c2.js" ><script src="../../../static.files/storage-f2adc0d6ca4d09fb.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../../static.files/main-305769736d49e732.js"></script><noscript><link rel="stylesheet" href="../../../static.files/noscript-feafe1bb7466e4bd.css"></noscript><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-16x16-8b506e7a72182f1c.png"><link rel="alternate icon" type="image/png" 
href="../../../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../../../static.files/favicon-2c020d218678b618.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle">☰</button></nav><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../../regex_automata/index.html">regex_automata</a><span class="version">0.4.5</span></h2></div><h2 class="location"><a href="#">Module alphabet</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#structs">Structs</a></li></ul></section><h2><a href="../index.html">In regex_automata::util</a></h2></div></nav><div class="sidebar-resizer"></div>
<main><div class="width-limiter"><nav class="sub"><form class="search-form"><span></span><div id="sidebar-button" tabindex="-1"><a href="../../../regex_automata/all.html" title="show sidebar"></a></div><input class="search-input" name="search" aria-label="Run search in the documentation" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"><div id="help-button" tabindex="-1"><a href="../../../help.html" title="help">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../../static.files/wheel-7b819b6101059cd0.svg"></a></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1>Module <a href="../../index.html">regex_automata</a>::<wbr><a href="../index.html">util</a>::<wbr><a class="mod" href="#">alphabet</a><button id="copy-path" title="Copy item path to clipboard"><img src="../../../static.files/clipboard-7571035ce49a181d.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="src" href="../../../src/regex_automata/util/alphabet.rs.html#1-1139">source</a> · <button id="toggle-all-docs" title="collapse all docs">[<span>−</span>]</button></span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>This module provides APIs for dealing with the alphabets of finite state
machines.</p>
<p>There are two principal types in this module, <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> and <a href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit"><code>Unit</code></a>.
The former defines the alphabet of a finite state machine while the latter
represents an element of that alphabet.</p>
<p>To a first approximation, the alphabet of all automata in this crate is just
a <code>u8</code>, namely every distinct byte value, all 256 of them. In practice, this
can be quite wasteful when building a transition table for a DFA, since it
requires storing a state identifier for each element in the alphabet. Instead,
we collapse the alphabet of an automaton down into equivalence classes, where
no two bytes in the same equivalence class ever discriminate between a match
and a non-match. For example, the regex
<code>[a-z]+</code> can be considered to have an alphabet consisting of two
equivalence classes: <code>a-z</code> and everything else. The transitions of
an automaton don’t actually need to represent every distinct byte,
just the equivalence classes.</p>
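<p>To make the <code>[a-z]+</code> example concrete, here is a minimal sketch of such a byte-to-class map. It is written against the standard library only, and the names <code>lowercase_classes</code> and <code>alphabet_len</code> are hypothetical. It is not how this crate computes or stores its classes.</p>

```rust
// Hypothetical illustration; this is not the regex-automata API.
// Builds a byte -> class map for the regex `[a-z]+`.
fn lowercase_classes() -> [u8; 256] {
    let mut map = [0u8; 256]; // class 0: every byte outside a-z
    for b in b'a'..=b'z' {
        map[b as usize] = 1; // class 1: the bytes a-z
    }
    map
}

// Number of classes, assuming class identifiers are dense and start at 0.
fn alphabet_len(map: [u8; 256]) -> usize {
    let mut max = 0u8;
    for class in map {
        if class > max {
            max = class;
        }
    }
    max as usize + 1
}

fn main() {
    let map = lowercase_classes();
    println!("alphabet length: {}", alphabet_len(map));
    println!("class of b'q': {}", map[b'q' as usize]);
    println!("class of b'Q': {}", map[b'Q' as usize]);
}
```

<p>Under these assumptions, all 256 byte values collapse into an alphabet of just two elements.</p>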
<p>The downside of equivalence classes is that, of course, searching a haystack
deals with individual byte values. Those byte values need to be mapped to
their corresponding equivalence class. This is what <code>ByteClasses</code> does. In
practice, doing this for every state transition has negligible impact on modern
CPUs. Moreover, it helps make more efficient use of the CPU cache by (possibly
considerably) shrinking the size of the transition table.</p>
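<p>The trade-off described above can be sketched with a tiny hand-written DFA for <code>[a-z]+</code>. Everything here (the state numbering, <code>STRIDE</code>, <code>TABLE</code>, <code>class_of</code>, <code>next_state</code>) is an assumption made for illustration and does not reflect the crate’s internal layout. With three states, a table indexed by raw bytes would need 3 × 256 = 768 entries, while a table indexed by class needs only 3 × 2 = 6, at the cost of one extra lookup per byte.</p>

```rust
// Hypothetical sketch; names and layout are illustrative, not the crate's.
const STRIDE: usize = 2; // two equivalence classes for `[a-z]+`
const DEAD: usize = 0;
const START: usize = 1;
const MATCH: usize = 2;

// Dense transition table: one row per state, one column per class.
// Column 0 is "everything else", column 1 is `a-z`.
const TABLE: [usize; 3 * STRIDE] = [
    DEAD, DEAD, // DEAD never leaves the dead state
    DEAD, MATCH, // START
    DEAD, MATCH, // MATCH
];

// The extra step paid on every transition: map the byte to its class.
fn class_of(byte: u8) -> usize {
    if byte.is_ascii_lowercase() {
        1
    } else {
        0
    }
}

fn next_state(state: usize, byte: u8) -> usize {
    TABLE[state * STRIDE + class_of(byte)]
}

// Returns true if the entire haystack matches `[a-z]+`.
fn is_full_match(haystack: &[u8]) -> bool {
    let mut state = START;
    for &byte in haystack {
        state = next_state(state, byte);
    }
    state == MATCH
}

fn main() {
    println!("table entries: {}", TABLE.len());
    println!("abc matches: {}", is_full_match(b"abc"));
    println!("aBc matches: {}", is_full_match(b"aBc"));
}
```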
<p>One last hiccup concerns <code>Unit</code>. Namely, because of look-around and how the
DFAs in this crate work, we need to add a sentinel value to our alphabet
of equivalence classes that represents the “end” of a search. We call that
sentinel <a href="struct.Unit.html#method.eoi" title="associated function regex_automata::util::alphabet::Unit::eoi"><code>Unit::eoi</code></a> or “end of input.” Thus, a <code>Unit</code> is either an
equivalence class corresponding to a set of bytes, or it is a special “end of
input” sentinel.</p>
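<p>One way to picture this “class or sentinel” shape is a two-variant enum. The type <code>SketchUnit</code> and its <code>column</code> method below are hypothetical stand-ins, and placing the sentinel in one extra column after all byte classes is an assumption of this sketch, not a statement about how <code>Unit</code> is defined in this crate.</p>

```rust
// Hypothetical stand-in for the idea behind `Unit`; not the crate's definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchUnit {
    // An equivalence class identifier standing for a set of bytes.
    Class(u8),
    // The special "end of input" sentinel.
    Eoi,
}

impl SketchUnit {
    // Column index in a transition row, assuming the sentinel is stored in
    // one extra column placed after all `num_classes` byte classes.
    fn column(self, num_classes: usize) -> usize {
        match self {
            SketchUnit::Class(id) => id as usize,
            SketchUnit::Eoi => num_classes,
        }
    }
}

fn main() {
    let num_classes = 2; // e.g. `a-z` and everything else
    println!("class 1 column: {}", SketchUnit::Class(1).column(num_classes));
    println!("eoi column: {}", SketchUnit::Eoi.column(num_classes));
}
```

<p>In this sketch, a transition row has <code>num_classes + 1</code> columns, with the last one reserved for the sentinel.</p>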
<p>In general, you should not expect to need either of these types unless you’re
doing lower-level shenanigans with DFAs, or even building your own DFAs.
(Although, you don’t have to use these types to build your own DFAs of course.)
For example, if you’re walking a DFA’s state graph, it’s probably useful to
make use of <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> to visit each element in the DFA’s alphabet instead
of just visiting every distinct <code>u8</code> value. The latter isn’t necessarily wrong,
but it could be very wasteful.</p>
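<p>The saving from visiting one byte per class can be sketched as follows. This uses only the standard library, and <code>class_of</code> and <code>representatives</code> are hypothetical helpers for the two-class <code>[a-z]+</code> alphabet. In real code you would reach for the iterators this module provides (such as <code>ByteClassRepresentatives</code>) rather than rolling your own.</p>

```rust
// Hypothetical sketch; the crate offers its own iterators for this purpose.
fn class_of(byte: u8) -> usize {
    if byte.is_ascii_lowercase() {
        1
    } else {
        0
    }
}

// Returns the first byte seen for each class, assuming exactly two classes.
fn representatives() -> [u8; 2] {
    let mut reps = [0u8; 2];
    let mut seen = [false; 2];
    for byte in 0..=u8::MAX {
        let class = class_of(byte);
        if !seen[class] {
            seen[class] = true;
            reps[class] = byte;
        }
    }
    reps
}

fn main() {
    // Walking a state's outgoing transitions now takes 2 visits, not 256.
    for rep in representatives() {
        println!("class {} is represented by byte {:#04x}", class_of(rep), rep);
    }
}
```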
</div></details><h2 id="structs" class="section-header"><a href="#structs">Structs</a></h2><ul class="item-table"><li><div class="item-name"><a class="struct" href="struct.ByteClassElements.html" title="struct regex_automata::util::alphabet::ByteClassElements">ByteClassElements</a></div><div class="desc docblock-short">An iterator over all elements in an equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClassIter.html" title="struct regex_automata::util::alphabet::ByteClassIter">ByteClassIter</a></div><div class="desc docblock-short">An iterator over each equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClassRepresentatives.html" title="struct regex_automata::util::alphabet::ByteClassRepresentatives">ByteClassRepresentatives</a></div><div class="desc docblock-short">An iterator over representative bytes from each equivalence class.</div></li><li><div class="item-name"><a class="struct" href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses">ByteClasses</a></div><div class="desc docblock-short">A representation of byte oriented equivalence classes.</div></li><li><div class="item-name"><a class="struct" href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit">Unit</a></div><div class="desc docblock-short">Unit represents a single unit of haystack for DFA based regex engines.</div></li></ul></section></div></main></body></html>