After reading an article about verbose regular expressions in Python, and wondering how one could do something like this in ECMAScript, I realised that there is one really simple way this could be done:
Simply concatenate multiple chunks, separated by whitespace, back into one. So /expr1/ /expr2/ is read and executed as /expr1expr2/.
This way it becomes much more readable and we can also easily add comments:
new RegExp(
/expr1/ // comments on expr1
/expr2/ // comments on expr2
, 'i' )
Doing this with syntax (two regex literals directly next to each other) is actually very difficult, due to the similarity with the infix division operator. In fact, just parsing a single regex is difficult because of the division operator.
I hoped it would be simple. If there is some code that recognizes a regex, then it may not be too hard to repeat it until no more regexes are found.
Free-spacing mode looks like the equivalent of Python's verbose regular expressions.
It's a pity, but we can still write strings that are broken up into pieces and then converted to a regular expression.
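For what it's worth, something close to the proposed syntax can already be approximated in userland. Here is a minimal sketch; joinRegExp is a hypothetical helper name of my own (not an existing API), and it assumes each chunk is either a regex literal, whose .source is used, or a plain string that is taken as pattern text unchanged:

```javascript
// Hypothetical helper: concatenates the sources of several chunks
// (regex literals or plain strings) into a single RegExp.
function joinRegExp(parts, flags) {
  const source = parts
    .map((part) => (part instanceof RegExp ? part.source : String(part)))
    .join('');
  return new RegExp(source, flags);
}

// Each chunk sits on its own line, so it can carry its own comment.
const isoDate = joinRegExp([
  /(\d{4})/, // year
  /-/,       // separator
  /(\d{2})/, // month
  /-/,       // separator
  /(\d{2})/, // day
], 'i');

const match = isoDate.exec('released on 2020-01-02');
```

Flags on the individual chunks are ignored here; only the flags argument applies to the combined expression. Also note that each chunk has to be a valid regex on its own, so a group cannot be opened in one chunk and closed in the next unless those chunks are passed as strings.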
Just out of curiosity: how does parsing of a regex work?
I assumed (but this may be naive thinking) that when a forward slash was found, it would just look for the next forward slash that wasn't escaped. And what's in between would then be regarded as a regex.
Except when the forward slash is part of a comment or a division operator. See the details in ECMA-262:
There are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted.
Notice, for example, that the expression /a/ /b/g is already permitted by the current grammar, but it is parsed as the regex literal /a/ divided by the variable b and then divided by the variable g.
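This is easy to check in any current engine. In the small sketch below, the variables b and g and their values are my own, chosen only for illustration:

```javascript
// b and g are ordinary variables here, not a pattern and a flag.
const b = 2;
const g = 4;

// Parsed as (/a/ / b) / g: a regex literal divided by b, then by g.
const result = /a/ /b/g;

// The regex object is coerced to a number for the division, which
// gives NaN, so the whole expression evaluates to NaN.
console.log(Number.isNaN(result));
```

So giving adjacent regex literals a new meaning would change the behaviour of code that is already valid today, however unlikely such code is in practice.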
"it would just look for the next forward slash that wasn't escaped"
You'd also have to exclude forward slashes within character classes: /[/]/ is a legal regex. So it's a bit more complicated. But it's not that much more complicated; see the definition of RegularExpressionLiteral. (Note that there is a second, significantly more complicated grammar used to parse the literals once this simpler grammar has identified them.)
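To make the first, simpler step concrete, here is a rough sketch of a scanner that finds the closing slash while skipping escaped characters and character classes. The name findRegExpBodyEnd is hypothetical, and the sketch assumes the caller has already decided from context that the slash at start begins a regex rather than a division; it does not scan flags or validate the pattern itself:

```javascript
// Line terminators are not allowed anywhere inside a regex literal.
function isLineTerminator(ch) {
  return ch === '\n' || ch === '\r' || ch === '\u2028' || ch === '\u2029';
}

// Hypothetical helper: returns the index of the closing '/' of the regex
// literal that starts at text[start], or -1 if there is none.
function findRegExpBodyEnd(text, start) {
  if (text[start] !== '/') return -1;
  // '//' and '/*' start comments, so the body cannot be empty or begin with '*'.
  const first = text[start + 1];
  if (first === '/' || first === '*') return -1;

  let inClass = false;
  for (let i = start + 1; i < text.length; i++) {
    const ch = text[i];
    if (isLineTerminator(ch)) return -1;
    if (ch === '\\') {
      // A backslash escapes the next character, which must exist and
      // must not be a line terminator.
      i++;
      if (i >= text.length || isLineTerminator(text[i])) return -1;
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false; // end of the character class
    } else if (ch === '[') {
      inClass = true; // a '/' inside [...] does not end the literal
    } else if (ch === '/') {
      return i; // unescaped '/' outside a class closes the body
    }
  }
  return -1; // unterminated literal
}

const endOfClassExample = findRegExpBodyEnd('/[/]/', 0);
const endOfEscapeExample = findRegExpBodyEnd('/a\\/b/ rest', 0);
```

Whether the body found this way is actually a valid pattern is a separate question; that is what the second grammar decides.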
In theory, you could merge these two steps, but in practice, it's rather complicated to do (you'd have to do a fair bit of math to tame it to something actually efficiently implementable) and, unless you're writing an engine with a built-in regexp runtime (none of the major engines do, BTW) or a regexp transpiler like regexpu, it's almost never worth the effort.