
Regular expressions for non-strings

Tags:

regex

I was wondering if there is such a thing as regular expressions for sequential data that isn't a string.

I know that regular expressions essentially boil down to DFAs, but I'm more interested in higher-level languages for specifying these DFAs.

Ryan Fox asked Dec 09 '25 13:12


2 Answers

There is absolutely nothing in the theory of regular expressions that prevents them from being applied to something other than strings of characters. A regular expression is defined over an arbitrary alphabet of symbols; it's just that most regular expression engine implementations only accept character strings.

However, if you have a regular expression engine that allows you to treat a string as un-encoded 8-Bit data (sometimes called BINARY, 8BIT or ASCII-8BIT), then you can use that engine to parse byte-oriented binary data.
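As a minimal sketch of that approach: Python's built-in `re` module accepts `bytes` patterns and then matches byte by byte. The frame layout below (a 0x02 start marker, one length byte, the payload, a 0x03 end marker) and the helper name `find_frames` are made up for illustration and are not taken from any real protocol.

```python
import re
from typing import List

# Hypothetical frame layout (an assumption for this example):
#   0x02 | one length byte | payload | 0x03
# re.DOTALL lets "." match every byte value, including 0x0A (newline).
FRAME = re.compile(rb"\x02(?P<length>.)(?P<payload>.*?)\x03", re.DOTALL)

def find_frames(data: bytes) -> List[bytes]:
    """Return the payloads of frames whose length byte equals the payload size."""
    payloads = []
    for m in FRAME.finditer(data):
        declared = m.group("length")[0]  # indexing bytes gives an int
        if declared == len(m.group("payload")):
            payloads.append(m.group("payload"))
    return payloads

frames = find_frames(b"\x00\xff\x02\x04abcd\x03\x02\x01z\x03")
```

The length check happens in ordinary code after the match, because a plain regular expression cannot count a payload against a length field.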

Ragel is a state machine compiler that is specifically designed for parsing binary protocols. You write your state machine in a high-level (regexp-like) DSL and Ragel then compiles that into your target language – Ragel currently supports C, C++, Objective-C, D, Java and Ruby.

Most functional programming languages have powerful pattern matching facilities baked right into the language itself. In some of them, those facilities can also be used to pattern match binary data. One example of this is Erlang's support for building and pattern matching binary data structures.

OMeta is a pattern matching and pattern transformation language that is basically a superset of regular expressions, on steroids. It supports matching of not only strings of characters but also arrays and lists of integers and arbitrary objects.

Jörg W Mittag answered Dec 11 '25 02:12


You can argue that a grammar is a form of regular expression for things that are more complex than just strings. In principle, you can devise regular expressions over tokens other than characters. As one example, you could argue that a regex for Unicode is such a creature: it certainly isn't matching simple bytes as the classic regex does.
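To make "regular expressions over tokens other than characters" concrete, here is a rough sketch in Python of the usual regex operators (single item, concatenation, alternation, Kleene star) applied to any sequence of objects. All names here (`item`, `seq_of`, `alt`, `star`, `fullmatch`) are invented for this example, and it uses simple backtracking rather than compiling to a DFA, so it can be slow on adversarial patterns.

```python
from typing import Any, Callable, Iterator, Sequence

# A matcher takes (sequence, start index) and yields every index
# at which a match starting there could end.
Matcher = Callable[[Sequence[Any], int], Iterator[int]]

def item(pred: Callable[[Any], bool]) -> Matcher:
    """Match one element satisfying pred (the analogue of a character class)."""
    def run(seq, i):
        if i < len(seq) and pred(seq[i]):
            yield i + 1
    return run

def seq_of(*parts: Matcher) -> Matcher:
    """Concatenation: match each part in order."""
    def run(seq, i):
        if not parts:
            yield i
            return
        rest = seq_of(*parts[1:])
        for j in parts[0](seq, i):
            yield from rest(seq, j)
    return run

def alt(*options: Matcher) -> Matcher:
    """Alternation: try each option in turn."""
    def run(seq, i):
        for option in options:
            yield from option(seq, i)
    return run

def star(inner: Matcher) -> Matcher:
    """Kleene star: zero or more repetitions, longer matches first."""
    def run(seq, i):
        for j in inner(seq, i):
            if j > i:  # ignore empty matches so the recursion terminates
                yield from run(seq, j)
        yield i
    return run

def fullmatch(matcher: Matcher, seq: Sequence[Any]) -> bool:
    """True if the matcher can consume the whole sequence."""
    return any(end == len(seq) for end in matcher(seq, 0))

# Roughly the analogue of [0-9]*[a-z], but over Python objects.
an_int = item(lambda x: isinstance(x, int))
a_str = item(lambda x: isinstance(x, str))
ints_then_str = seq_of(star(an_int), a_str)
```

With these definitions, `fullmatch(ints_then_str, [1, 2, 3, "end"])` succeeds, while `fullmatch(ints_then_str, [1, "a", 2])` fails because of the trailing integer. The predicates play the role of the alphabet, so the same operators work for events, tokens from a lexer, or any other objects.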

Jonathan Leffler answered Dec 11 '25 01:12


