Journal of Functional Programming


Regular-expression derivatives re-examined

SCOTT OWENS (a1), JOHN REPPY (a2) and AARON TURON (a3)

(a1) University of Cambridge (e-mail: Scott.Owens@cl.cam.ac.uk)

(a2) University of Chicago (e-mail: jhr@cs.uchicago.edu)

(a3) University of Chicago, Northeastern University (e-mail: turon@ccs.neu.edu)

Abstract

Regular-expression derivatives are an old but elegant technique for compiling regular expressions to deterministic finite-state machines. The technique easily supports extending the regular-expression operators with boolean operations, such as intersection and complement. Unfortunately, it has been lost in the sands of time, and few computer scientists are aware of it. In this paper, we re-examine regular-expression derivatives and report on our experiences in the context of two different functional-language implementations. The basic implementation is simple, and we show how to extend it to handle large character sets (e.g., Unicode). We also show that the derivatives approach leads to smaller state machines than the traditional algorithm given by McNaughton and Yamada.
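To make the core idea concrete, here is a minimal Python sketch of Brzozowski-style derivatives, including intersection and complement. It is an illustration under stated assumptions, not the authors' implementations: the tuple encoding and the names `nullable`, `deriv`, `matches` and `build_dfa` are hypothetical. Complement is taken relative to all strings over the alphabet. `build_dfa` treats each distinct derivative as a DFA state, and its termination relies on the smart constructors normalizing alternation and intersection up to associativity, commutativity and idempotence (via `frozenset`). This is a rough approximation of the kind of equivalence checking that keeps the set of derivatives finite.

```python
# Regular expressions as hashable tuples (a hypothetical encoding for this sketch).
EMPTY = ("empty",)  # matches no string
EPS = ("eps",)      # matches only the empty string

def chr_(c):
    return ("chr", c)

def cat(r, s):
    # Simplify concatenation with the empty language or the empty string.
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def alt(*rs):
    # Alternation as a set: flattening and frozenset give associativity,
    # commutativity and idempotence; EMPTY is the identity.
    items = set()
    for r in rs:
        if r == EMPTY:
            continue
        items |= r[1] if r[0] == "or" else {r}
    if not items:
        return EMPTY
    return next(iter(items)) if len(items) == 1 else ("or", frozenset(items))

def and_(*rs):
    # Intersection, normalized the same way; EMPTY is absorbing.
    # Assumes at least one argument.
    items = set()
    for r in rs:
        if r == EMPTY:
            return EMPTY
        items |= r[1] if r[0] == "and" else {r}
    return next(iter(items)) if len(items) == 1 else ("and", frozenset(items))

def star(r):
    if r[0] == "star":
        return r
    return EPS if r in (EMPTY, EPS) else ("star", r)

def not_(r):
    # Complement with double-negation elimination.
    return r[1] if r[0] == "not" else ("not", r)

def nullable(r):
    """True iff r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "or":
        return any(nullable(x) for x in r[1])
    if tag == "and":
        return all(nullable(x) for x in r[1])
    return not nullable(r[1])  # "not"

def deriv(r, a):
    """Derivative of r with respect to character a: { w | a + w in L(r) }."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == a else EMPTY
    if tag == "cat":
        d = cat(deriv(r[1], a), r[2])
        return alt(d, deriv(r[2], a)) if nullable(r[1]) else d
    if tag == "star":
        return cat(deriv(r[1], a), r)
    if tag == "or":
        return alt(*(deriv(x, a) for x in r[1]))
    if tag == "and":
        return and_(*(deriv(x, a) for x in r[1]))
    return not_(deriv(r[1], a))  # "not"

def matches(r, s):
    # Take one derivative per input character, then test for the empty string.
    for c in s:
        r = deriv(r, c)
    return nullable(r)

def build_dfa(r, alphabet):
    """Explore derivatives; each distinct (normalized) regex is one state."""
    states, trans, work = {r: 0}, {}, [r]
    while work:
        q = work.pop()
        for a in alphabet:
            d = deriv(q, a)
            if d not in states:
                states[d] = len(states)
                work.append(d)
            trans[(states[q], a)] = states[d]
    accepting = {i for q, i in states.items() if nullable(q)}
    return states, trans, accepting

# Example expressions over the alphabet {a, b}.
sigma_star = star(alt(chr_("a"), chr_("b")))
ends_ab = cat(sigma_star, cat(chr_("a"), chr_("b")))   # (a|b)*ab
has_a = cat(sigma_star, cat(chr_("a"), sigma_star))
no_bb = not_(cat(sigma_star, cat(chr_("b"), cat(chr_("b"), sigma_star))))
a_without_bb = and_(has_a, no_bb)                        # uses & and complement
```

For example, `build_dfa(ends_ab, "ab")` yields three states, one of them accepting, which matches the minimal DFA for `(a|b)*ab`. The intersection-and-complement expression `a_without_bb` accepts `"bab"` but rejects `"abb"`. No separate construction is needed for the boolean operators; each one only needs its own `nullable` and `deriv` case.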