Parsers are security-critical components of many software systems, and verified parsing therefore has a key role to play in secure software design. However, existing verified parsers for context-free grammars are limited in their expressiveness, termination properties, or performance characteristics. They are only compatible with a restricted class of grammars, they are not guaranteed to terminate on all inputs, or they are not designed to be performant on grammars for real-world programming languages and data formats.
In this work, we present CoStar, a verified parser that addresses these limitations. The parser is implemented with the Coq Proof Assistant and is based on the ALL(*) parsing algorithm. CoStar is sound and complete for all non-left-recursive grammars; it produces a correct parse tree for its input whenever such a tree exists, and it correctly detects ambiguous inputs. CoStar also provides strong termination guarantees; it terminates without error on all inputs when applied to a non-left-recursive grammar. Finally, CoStar achieves linear-time performance on a range of unambiguous grammars for commonly used languages and data formats.
Andres Erbsen Massachusetts Institute of Technology, Samuel Gruetter Massachusetts Institute of Technology, Joonwon Choi Massachusetts Institute of Technology, Clark Wood Massachusetts Institute of Technology, Adam Chlipala Massachusetts Institute of Technology
Andres Erbsen Massachusetts Institute of Technology, Samuel Gruetter Massachusetts Institute of Technology, Joonwon Choi Massachusetts Institute of Technology, Clark Wood Massachusetts Institute of Technology, Adam Chlipala Massachusetts Institute of Technology