Thursday, September 19, 2019

Converting PCRE recursive regex pattern to .NET balancing groups definition

PCRE supports recursive patterns, which can be used to match nested constructs. For example, consider the following "grammar":

Q -> \w | '[' A ';' Q* ','? Q* ']' | '<' A '>'
A -> (Q | ',')*
// to match ^A$.

It can be done in PCRE with the pattern


Example test cases:

Should match:

abcdefg abc,def,ghi abc,,,def ,,,,,, [abc;] [a,bc;] sss[abc;d] as[abc;d,e] [abc;d,e][fgh;j,k]
[b;,] <,,,> <> <><> <>,<> a<<<<>>>> <<<<<>>>><><<<>>>>
[a;b] [[;];] [,;,] [;[;]] [<[;]>;<[;][;,<[;,]>]>]

Should not match:

bc> [a;d,e] [a] <<<<<>>>><><<<>>>>> <<<<<>>>><><<<>>> [abc;def;] [[;],] [;,,] [abc;d,e,f]
[<[;]>;<[;][;,<[;,]>]]> ]
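
As a cross-check on the grammar itself, here is a minimal recursive-descent recognizer in Python. It is only a sketch of the language that Q and A describe, not the PCRE pattern and not a .NET translation. The name `matches_grammar` is hypothetical, and `\w` is assumed to mean a single ASCII word character.

```python
class _NoMatch(Exception):
    """Raised when the input cannot be derived from the grammar."""


def _is_word(ch: str) -> bool:
    # Assumption: \w means one ASCII letter, digit, or underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def matches_grammar(text: str) -> bool:
    """Return True if the whole of `text` can be derived from A (i.e. ^A$)."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise _NoMatch
        pos += 1

    def starts_q() -> bool:
        ch = peek()
        return ch in ("[", "<") or _is_word(ch)

    def parse_q() -> None:
        # Q -> \w | '[' A ';' Q* ','? Q* ']' | '<' A '>'
        nonlocal pos
        ch = peek()
        pos += 1  # the caller has already checked starts_q()
        if ch == "[":
            parse_a()
            expect(";")
            parse_q_star()
            if peek() == ",":
                pos += 1
            parse_q_star()
            expect("]")
        elif ch == "<":
            parse_a()
            expect(">")
        # otherwise ch was a single word character, already consumed

    def parse_q_star() -> None:
        while starts_q():
            parse_q()

    def parse_a() -> None:
        # A -> (Q | ',')*
        nonlocal pos
        while True:
            if peek() == ",":
                pos += 1
            elif starts_q():
                parse_q()
            else:
                return

    try:
        parse_a()
    except _NoMatch:
        return False
    return pos == len(text)
```

Each whitespace-separated test case can be passed to `matches_grammar` on its own. Because the characters that may follow A or Q* never start a Q, one character of lookahead is enough and no backtracking is needed.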

.NET has no recursive patterns. Instead, it provides balancing groups, which manipulate a stack of captures and can match simple nested patterns.
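
To illustrate the stack-based idea, the Python sketch below emulates what a balancing group does for the simplest nested case, balanced `<` and `>`. It is an analogy, not .NET code. The function name `balanced_angles` is hypothetical, and the comments map each step to the .NET constructs it roughly corresponds to, assuming a group named `open`.

```python
def balanced_angles(text: str) -> bool:
    """Return True if `text` consists only of properly nested '<' and '>'."""
    open_stack = []  # plays the role of the capture stack of a group named "open"
    for index, ch in enumerate(text):
        if ch == "<":
            # Roughly (?<open><): match '<' and push a capture.
            open_stack.append(index)
        elif ch == ">":
            # Roughly (?<-open>>): match '>' and pop a capture;
            # the match fails if there is nothing to pop.
            if not open_stack:
                return False
            open_stack.pop()
        else:
            # This sketch accepts no other characters.
            return False
    # Roughly (?(open)(?!)): fail if any capture is still on the stack.
    return not open_stack
```

A single counter like this covers one kind of bracket. The grammar above also needs the position-dependent `;` and the optional `,` inside `[...]`, which is what makes the conversion non-trivial.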

Is it possible to convert the above PCRE pattern into .NET Regex style?

(Yes, I know it's better not to use regex for this. It's just a theoretical question.)

