<main><div class="width-limiter"><section id="main-content" class="content"><div class="main-heading"><h1>Module <a href="../../index.html">regex_syntax</a>::<wbr><a href="../index.html">hir</a>::<wbr><a class="mod" href="#">literal</a></h1><span class="out-of-band"><a class="src" href="../../../src/regex_syntax/hir/literal.rs.html#1-3214">source</a></span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>Provides literal extraction from <code>Hir</code> expressions.</p>
<p>An <a href="struct.Extractor.html" title="struct regex_syntax::hir::literal::Extractor"><code>Extractor</code></a> pulls literals out of <a href="../struct.Hir.html" title="struct regex_syntax::hir::Hir"><code>Hir</code></a> expressions and returns a
<a href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq"><code>Seq</code></a> of <a href="struct.Literal.html" title="struct regex_syntax::hir::literal::Literal"><code>Literal</code></a>s.</p>
<p>The purpose of literal extraction is generally to provide avenues for
optimizing regex searches. The main idea is that substring searches can be an
order of magnitude faster than a regex search. Therefore, if one can execute
a substring search to find candidate match locations and only run the regex
search at those locations, then large performance improvements are
possible.</p>
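<p>As a rough illustration of this candidate-then-verify loop, here is a minimal std-only sketch. It does not use this crate's API: <code>find_with_prefilter</code> and <code>verify_foo_digits</code> are hypothetical names, the naive window scan stands in for a fast substring searcher, and the verifier is a hand-written stand-in for a regex engine matching <code>foo[0-9]+</code>.</p>

```rust
/// Finds the leftmost match by using a substring search as a prefilter.
///
/// `prefix` is a literal that every match must start with. `verify` is a
/// hypothetical stand-in for a regex engine: given the haystack and a
/// candidate start offset, it returns the end offset of a match anchored
/// at that start, if there is one.
fn find_with_prefilter<F>(
    haystack: &[u8],
    prefix: &[u8],
    verify: F,
) -> Option<(usize, usize)>
where
    F: Fn(&[u8], usize) -> Option<usize>,
{
    if prefix.is_empty() {
        // No literal to search for, so every position is a candidate.
        return (0..=haystack.len())
            .find_map(|start| verify(haystack, start).map(|end| (start, end)));
    }
    let mut at = 0;
    while at + prefix.len() <= haystack.len() {
        // Cheap step: substring search for the next candidate position.
        let offset = haystack[at..]
            .windows(prefix.len())
            .position(|window| window == prefix)?;
        let start = at + offset;
        // Expensive step: run the "regex" only at the candidate.
        if let Some(end) = verify(haystack, start) {
            return Some((start, end));
        }
        // False positive: resume the substring search one byte later.
        at = start + 1;
    }
    None
}

/// Toy verifier for the pattern `foo[0-9]+`, anchored at `start`.
fn verify_foo_digits(haystack: &[u8], start: usize) -> Option<usize> {
    let rest = haystack.get(start..)?;
    if !rest.starts_with(b"foo") {
        return None;
    }
    let digits = rest[3..].iter().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        None
    } else {
        Some(start + 3 + digits)
    }
}
```

<p>Each false positive costs one wasted call to the verifier, which is why the quality of the literal matters as much as the speed of the substring search.</p>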
<p>With that said, literal optimizations are something of a black art: even
though substring search is generally faster, a high number of candidates
can create a lot of overhead by ping-ponging between
the substring search and the regex search.</p>
<p>Here are some heuristics that might be used to help increase the chances of
effective literal optimizations:</p>
<ul>
<li>Stick to small <a href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq"><code>Seq</code></a>s. If you search for too many literals, it’s likely
to lead to substring search that is only a little faster than a regex search,
and thus the overhead of using literal optimizations in the first place might
make things slower overall.</li>
<li>The literals in your <a href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq"><code>Seq</code></a> shouldn’t be too short. In general, longer is
better. A sequence corresponding to single bytes that occur frequently in the
haystack, for example, is probably a bad literal optimization because it’s
likely to produce many false positive candidates. Longer literals are less
likely to match, and thus probably produce fewer false positives.</li>
<li>If it’s possible to estimate the approximate frequency of each byte according
to some pre-computed background distribution, it is possible to compute a score
of how “good” a <code>Seq</code> is. If a <code>Seq</code> isn’t good enough, you might consider
skipping the literal optimization and just using the regex engine.</li>
</ul>
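<p>The three heuristics above could be combined into a single check along the following lines. This is only a sketch under stated assumptions: <code>toy_rank</code> is a made-up frequency table (it is not this module's <code>rank</code> function), the literals are plain string slices rather than a <code>Seq</code>, and the thresholds are parameters you would need to tune by measurement.</p>

```rust
/// Hypothetical background frequency of a byte, from 0 (rare) to 255
/// (very common). A real table would be measured from a representative
/// corpus; these values are invented for illustration.
fn toy_rank(byte: u8) -> u8 {
    match byte {
        b' ' | b'e' | b't' | b'a' | b'o' | b'n' => 255,
        b'a'..=b'z' => 180,
        b'A'..=b'Z' | b'0'..=b'9' => 120,
        _ => 40,
    }
}

/// Returns true if the literals look like a worthwhile prefilter:
/// there are few of them, none is too short, and each one contains at
/// least one byte that is rare enough to search for.
fn is_good_seq(
    literals: &[&str],
    max_literals: usize,
    min_len: usize,
    max_rank: u8,
) -> bool {
    if literals.is_empty() || literals.len() > max_literals {
        return false;
    }
    literals.iter().all(|lit| {
        // The rarest byte of a literal bounds how often it can match.
        let rarest = lit.bytes().map(toy_rank).min();
        lit.len() >= min_len && rarest.map_or(false, |rank| rank <= max_rank)
    })
}
```

<p>When a check like this fails, the caller would skip the literal optimization and run the regex engine directly.</p>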
<p>(Note that there are always pathological cases that can make
any kind of literal optimization a net loss. This is why it
might be a good idea to be conservative, or even to provide a means for
literal optimizations to be dynamically disabled if they are determined to be
ineffective according to some measure.)</p>
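<p>One possible measure of effectiveness is the fraction of prefilter candidates that turn into real matches. The sketch below tracks that fraction and switches the prefilter off once it drops too low. The type <code>PrefilterStats</code> and both thresholds are assumptions made for illustration; nothing here is provided by this crate.</p>

```rust
/// Tracks how often prefilter candidates are confirmed by the regex
/// engine, and disables the prefilter when it proves ineffective.
struct PrefilterStats {
    candidates: u64,
    confirmed: u64,
    enabled: bool,
}

impl PrefilterStats {
    /// Do not judge the prefilter before this many candidates were seen.
    const MIN_SAMPLES: u64 = 50;
    /// Disable the prefilter if fewer than this percentage of
    /// candidates are confirmed as matches.
    const MIN_HIT_RATE_PERCENT: u64 = 10;

    fn new() -> Self {
        PrefilterStats { candidates: 0, confirmed: 0, enabled: true }
    }

    /// Records the outcome of verifying one candidate.
    fn record(&mut self, confirmed: bool) {
        self.candidates += 1;
        if confirmed {
            self.confirmed += 1;
        }
        // Integer form of: confirmed / candidates < MIN_HIT_RATE_PERCENT%.
        if self.candidates >= Self::MIN_SAMPLES
            && self.confirmed * 100 < self.candidates * Self::MIN_HIT_RATE_PERCENT
        {
            self.enabled = false;
        }
    }

    /// Whether the caller should keep using the prefilter.
    fn is_enabled(&self) -> bool {
        self.enabled
    }
}
```

<p>In this sketch the decision is one-way: once disabled, the prefilter stays off for the rest of the search, which keeps the bookkeeping cheap.</p>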
<p>You’re encouraged to explore the methods on <a href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq"><code>Seq</code></a>, which permit shrinking
the size of sequences in a preference-order preserving fashion.</p>
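<p>To make "preference-order preserving" concrete, here is a std-only sketch of one way to shrink a list of literals: truncate each literal to a fixed number of leading bytes, then drop a literal only when it equals the one immediately before it. The function <code>truncate_and_dedup</code> is hypothetical and operates on plain byte vectors, not on a <code>Seq</code>; consult the <code>Seq</code> methods for what the crate actually offers.</p>

```rust
/// Keeps at most `len` leading bytes of each literal and removes
/// adjacent duplicates. Literals are never reordered, so an earlier
/// (more preferred) literal always stays ahead of a later one.
fn truncate_and_dedup(literals: &[&str], len: usize) -> Vec<Vec<u8>> {
    let mut out: Vec<Vec<u8>> = Vec::new();
    for lit in literals {
        let bytes = lit.as_bytes();
        let kept = bytes[..bytes.len().min(len)].to_vec();
        // Only an adjacent duplicate is redundant; a non-adjacent
        // duplicate is kept so that relative order is unchanged.
        if out.last() != Some(&kept) {
            out.push(kept);
        }
    }
    out
}
```

<p>Note that truncation trades precision for size: a shortened literal matches in more places, so the result is a looser prefilter over fewer, shorter needles.</p>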
<p>Finally, note that it isn’t strictly necessary to use an <a href="struct.Extractor.html" title="struct regex_syntax::hir::literal::Extractor"><code>Extractor</code></a>. Namely,
an <code>Extractor</code> only uses public APIs of the <a href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq"><code>Seq</code></a> and <a href="struct.Literal.html" title="struct regex_syntax::hir::literal::Literal"><code>Literal</code></a> types,
so it is possible to implement your own extractor, for example one that extracts n-grams
or “inner” literals (i.e., not prefix or suffix literals). The <code>Extractor</code>
is mostly responsible for the case analysis over <code>Hir</code> expressions. Most of
the “trickier” parts concern how to combine literal sequences, and that is all
implemented on <a href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq"><code>Seq</code></a>.</p>
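<p>As a small taste of what a custom extractor might compute, the sketch below produces the distinct n-grams of a single literal that is already known. A real extractor would also need the case analysis over <code>Hir</code> and would build a <code>Seq</code>; both are omitted here. The function <code>ngrams</code> is a hypothetical helper and uses only the standard library.</p>

```rust
/// Returns the distinct byte n-grams of `literal`, in order of first
/// appearance. Returns an empty list when `n` is zero or when the
/// literal is shorter than `n` bytes.
fn ngrams(literal: &str, n: usize) -> Vec<Vec<u8>> {
    let mut out: Vec<Vec<u8>> = Vec::new();
    if n == 0 {
        // `windows(0)` would panic, so handle this case up front.
        return out;
    }
    for window in literal.as_bytes().windows(n) {
        // Skip n-grams that were already recorded.
        if !out.iter().any(|gram| gram.as_slice() == window) {
            out.push(window.to_vec());
        }
    }
    out
}
```

<p>For instance, the trigrams of <code>banana</code> are <code>ban</code>, <code>ana</code> and <code>nan</code>; the second occurrence of <code>ana</code> is dropped.</p>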
</div></details><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><ul class="item-table"><li><div class="item-name"><a class="struct" href="struct.Extractor.html" title="struct regex_syntax::hir::literal::Extractor">Extractor</a></div><div class="desc docblock-short">Extracts prefix or suffix literal sequences from <a href="../struct.Hir.html" title="struct regex_syntax::hir::Hir"><code>Hir</code></a> expressions.</div></li><li><div class="item-name"><a class="struct" href="struct.Literal.html" title="struct regex_syntax::hir::literal::Literal">Literal</a></div><div class="desc docblock-short">A single literal extracted from an <a href="../struct.Hir.html" title="struct regex_syntax::hir::Hir"><code>Hir</code></a> expression.</div></li><li><div class="item-name"><a class="struct" href="struct.Seq.html" title="struct regex_syntax::hir::literal::Seq">Seq</a></div><div class="desc docblock-short">A sequence of literals.</div></li></ul><h2 id="enums" class="section-header">Enums<a href="#enums" class="anchor">§</a></h2><ul class="item-table"><li><div class="item-name"><a class="enum" href="enum.ExtractKind.html" title="enum regex_syntax::hir::literal::ExtractKind">ExtractKind</a></div><div class="desc docblock-short">The kind of literals to extract from an <a href="../struct.Hir.html" title="struct regex_syntax::hir::Hir"><code>Hir</code></a> expression.</div></li></ul><h2 id="functions" class="section-header">Functions<a href="#functions" class="anchor">§</a></h2><ul class="item-table"><li><div class="item-name"><a class="fn" href="fn.rank.html" title="fn regex_syntax::hir::literal::rank">rank</a></div><div class="desc docblock-short">Returns the “rank” of the given byte.</div></li></ul></section></div></main></body></html>