The practical efficiency of regular expression membership algorithms

Show simple item record

dc.contributor.advisor Konstantinidis, Stavros
dc.creator Gray, Justin P.
dc.date.accessioned 2022-05-27T14:17:05Z
dc.date.available 2022-05-27T14:17:05Z
dc.date.issued 2022-04-28
dc.identifier.uri http://library2.smu.ca/xmlui/handle/01/30937
dc.description 1 online resource (71 pages) : graphs, charts
dc.description Includes abstract.
dc.description Includes bibliographical references (69-71).
dc.description.abstract Regular expressions encode text patterns and define languages of symbolic words. The membership problem decides if a given word is an element of the language described by a given regular expression. This problem has various well-studied algorithms, but current research only shows asymptotic complexity and performance with respect to samples of randomly generated regular expressions. Our research aims to answer how the algorithms perform when using practical regular expressions used in the real-world on a representative test set of words. A set of compatible regular expressions have been collected from public GitHub repositories. Each compatible expression (i.e., no backreferences or improper formatting) is then converted into an equivalent unambiguous mathematical representation. For each distinct expression, we have tested Thompson, Glushkov, position, follow, and partial derivative NFA constructions, as well as partial derivatives and exponential backtracking directly on the regular expression tree. These algorithms have been implemented into a modified version of the Python’s FAdo library and include UNIX-inspired extensions such as character classes, the wild dot, and UTF-8 support. We find that efficiently constructing a small NFA is the best approach to this problem; using follow and PDDAG algorithms are experimentally shown as the best. en_CA
dc.description.provenance Submitted by Greg Hilliard (greg.hilliard@smu.ca) on 2022-05-27T14:17:05Z No. of bitstreams: 1 Gray_Justin_Honours_2022.pdf: 1877823 bytes, checksum: 3707c6e1ffc29a5c4b139b0bfa400fc7 (MD5) en
dc.description.provenance Made available in DSpace on 2022-05-27T14:17:05Z (GMT). No. of bitstreams: 1 Gray_Justin_Honours_2022.pdf: 1877823 bytes, checksum: 3707c6e1ffc29a5c4b139b0bfa400fc7 (MD5) Previous issue date: 2022-04-28 en
dc.language.iso en en_CA
dc.publisher Halifax, N.S. : Saint Mary's University
dc.title The practical efficiency of regular expression membership algorithms en_CA
dc.type Text en_CA
thesis.degree.name Bachelor of Science (Honours Computing Science)
thesis.degree.level Undergraduate
thesis.degree.discipline Mathematics and Computing Science
thesis.degree.grantor Saint Mary's University (Halifax, N.S.)
 Find Full text

Files in this item

 
 

This item appears in the following Collection(s)

Show simple item record