I was wondering if there is such a thing as regular expressions for sequential data that isn't a string.
I know that regular expressions essentially boil down to DFAs, but I'm more interested in higher-level languages for specifying these DFAs.
There is absolutely nothing in the theory of regular expressions that prevents them from being applied to something other than strings of characters: a regular expression is defined over an arbitrary alphabet of symbols. It's just that most regular expression engine implementations don't allow that.
However, if you have a regular expression engine that allows you to treat a string as un-encoded 8-Bit data (sometimes called BINARY, 8BIT or ASCII-8BIT), then you can use that engine to parse byte-oriented binary data.
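As a minimal sketch of that approach, Python's standard `re` module accepts `bytes` patterns and matches them against raw `bytes` with no text decoding. The frame layout below (an 0x02 start byte, a payload, an 0x03 end byte, then one checksum byte) is a made-up format for illustration, not a real protocol.

```python
import re

# Hypothetical frame: 0x02, payload (any bytes except 0x03), 0x03, one checksum byte.
# re.DOTALL is needed so that "." also matches the byte 0x0A (newline).
FRAME = re.compile(rb"\x02([^\x03]*)\x03(.)", re.DOTALL)

def extract_frames(data: bytes) -> list[tuple[bytes, int]]:
    """Return (payload, checksum) for every frame found in the raw byte stream."""
    return [(m.group(1), m.group(2)[0]) for m in FRAME.finditer(data)]

# Two frames separated by junk bytes; the second payload is not valid text at all.
stream = b"\x00\x02hello\x03\x0a\xff\x02\x80\x81\x03\x7f"
frames = extract_frames(stream)
```

Because both the pattern and the subject are `bytes`, every byte value from 0x00 to 0xFF is treated as an ordinary symbol. Note that a regular expression cannot express a length-prefixed field, which is one reason tools such as Ragel exist.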
Ragel is a state machine compiler that is well suited to parsing binary protocols. You write your state machine in a high-level (regexp-like) DSL and Ragel then compiles that into your target language; the supported targets include C, C++, Objective-C, D, Java and Ruby.
Most functional programming languages have powerful pattern matching facilities baked right into the language itself. Those facilities can be used to pattern match binary data. One example of this is Erlang's support for building and pattern matching binary data structures.
OMeta is a pattern matching and pattern transformation language that is basically a superset of regular expressions, on steroids. It supports matching of not only strings of characters but also arrays and lists of integers and arbitrary objects.
You can argue that a grammar is a form of regular expression for things that are more complex than just strings. In principle, you can devise regular expressions over tokens other than characters. As one example, you could argue that a regex for Unicode is such a creature: it certainly isn't matching simple bytes as the classic regex does.
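The idea of regular expressions over tokens other than characters can be sketched with a few combinators. This is a small backtracking matcher written for illustration, not a DFA compiler and not any existing library's API; all of the names (`item`, `concat`, `alt`, `star`, `fullmatch`) are hypothetical. Each "symbol" is a predicate over an arbitrary Python object.

```python
from typing import Any, Callable, Iterator, Sequence

# A matcher takes (sequence, start index) and yields every index where a match may end.
Matcher = Callable[[Sequence[Any], int], Iterator[int]]

def item(pred: Callable[[Any], bool]) -> Matcher:
    """Match exactly one element that satisfies pred."""
    def m(seq, i):
        if i < len(seq) and pred(seq[i]):
            yield i + 1
    return m

def concat(*parts: Matcher) -> Matcher:
    """Match each part in order."""
    def m(seq, i):
        if not parts:
            yield i
            return
        for j in parts[0](seq, i):
            yield from concat(*parts[1:])(seq, j)
    return m

def alt(*options: Matcher) -> Matcher:
    """Match any one of the options."""
    def m(seq, i):
        for option in options:
            yield from option(seq, i)
    return m

def star(inner: Matcher) -> Matcher:
    """Match zero or more repetitions of inner."""
    def m(seq, i):
        yield i  # zero repetitions
        for j in inner(seq, i):
            if j > i:  # skip empty matches so the recursion always makes progress
                yield from m(seq, j)
    return m

def fullmatch(pattern: Matcher, seq: Sequence[Any]) -> bool:
    """True if the pattern can consume the entire sequence."""
    return any(end == len(seq) for end in pattern(seq, 0))

def kind(name: str) -> Matcher:
    """Match one (kind, payload) event tuple with the given kind."""
    return item(lambda event: event[0] == name)

# Roughly the regex  login (read|write)* logout, but over event tuples.
session = concat(kind("login"), star(alt(kind("read"), kind("write"))), kind("logout"))
events = [("login", 1), ("read", 2), ("write", 3), ("logout", 4)]
ok = fullmatch(session, events)
```

The same combinators work unchanged on lists of integers or any other objects, since only the predicates inspect the elements. A production engine would compile the pattern to an automaton instead of backtracking.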