Make it possible to divide regular expressions into chunks

spdev · June 6, 2020, 1:34pm

Regular expressions are often hard to read.

After reading an article about verbose regular expressions in Python, and how one could do something like this in ECMAScript, I realised that there is one really simple way this could be done:

Simply concatenate multiple chunks, which are separated by whitespace, back into one. So /expr1/ /expr2/ is read and executed as /expr1expr2/.

This way it becomes much more readable and we can also easily add comments:

new RegExp(
	/expr1/ // comments on expr1
	/expr2/ // comments on expr2
	, 'i' )

AFAIK this shouldn't be too hard to do.

jridgewell · June 8, 2020, 6:00pm

Doing this with syntax (two regex literals directly next to each other) is actually very difficult, due to the similarities with the infix division operator. In fact, just parsing a single regex literal is difficult because of the division operator.

There's a related regex feature called free spacing that would allow just this. It hasn't been discussed in a while, though. The last mention I can find is "Allow spaces in curly brackets in RegularExpressions".
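For illustration, free spacing can be roughly emulated in today's ECMAScript with a tagged template. This is only a naive sketch: `verbose` is a hypothetical helper, not an existing or proposed API, and it assumes that whitespace and `#` never need to appear literally in the pattern (it does not look inside character classes or at escaped spaces).

```javascript
// Hypothetical tag factory: emulates a naive "free spacing" mode.
// Assumption: whitespace and "#" comments are never meant literally.
function verbose(flags) {
  return (strings, ...values) => {
    // String.raw keeps backslashes as written, so \d stays \d.
    const raw = String.raw(strings, ...values);
    const source = raw
      .replace(/#.*$/gm, "") // drop "#" comments up to the end of each line
      .replace(/\s+/g, ""); // drop all remaining whitespace
    return new RegExp(source, flags);
  };
}

const time = verbose("i")`
  (\d{1,2})   # hours
  :
  (\d{2})     # minutes
  \s?         # optional space
  (am|pm)?    # optional meridiem
`;

console.log(time.source); // (\d{1,2}):(\d{2})\s?(am|pm)?
console.log(time.exec("9:30 PM")[3]); // PM
```

A real free-spacing mode would have to treat whitespace inside character classes and escaped whitespace as significant, which this sketch does not attempt.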

spdev · June 16, 2020, 11:32am

Thanks for explaining.

I hoped it would be simple. If there is some code that recognizes a regular expression, then it may not be too hard to repeat it until no more expressions are found.

Free spacing looks like what Verbose Regular Expressions are in Python.

It's a pity, but we can still use strings that are broken up into chunks and then converted to a regular expression.
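A minimal sketch of that workaround, assuming a hypothetical helper named `joinRegExp` (not a built-in) that accepts either strings or regex literals as chunks. Flags on the individual chunks are ignored; only the flags passed to the helper apply.

```javascript
// Hypothetical helper: joins pattern chunks (strings or RegExp objects)
// into a single RegExp. Chunk flags are ignored on purpose.
function joinRegExp(chunks, flags = "") {
  const source = chunks
    .map((chunk) => (chunk instanceof RegExp ? chunk.source : String(chunk)))
    .join("");
  return new RegExp(source, flags);
}

const isoDate = joinRegExp(
  [
    /(\d{4})/, // year
    /-/,
    /(\d{2})/, // month
    /-/,
    /(\d{2})/, // day
  ],
  "i"
);

console.log(isoDate.source); // (\d{4})-(\d{2})-(\d{2})
console.log(isoDate.test("2020-06-16")); // true
```

Each chunk has to be a valid pattern on its own when written as a literal, so a group cannot be opened in one chunk and closed in another unless the chunks are plain strings.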

claudiameadows · June 18, 2020, 1:52am

@jridgewell What about something like @/multiline regexp/flags or some variant thereof (using a symbol that's not a valid binary operator)?

spdev · July 24, 2020, 12:02pm

Just out of curiosity: how does parsing of a regex work?

I assumed (but this may be naive thinking) that when a forward slash was found, it would just look for the next forward slash that wasn't escaped. And what's in between would then be regarded as a regex.

bergus · July 24, 2020, 12:52pm

Except when the forward slash is part of a comment or a division operator. See the details in ECMA-262:

There are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted.

Notice for example that the expression /a/ /b/g is already permitted by the current grammar, but has a /a/ regex literal that is divided by the variable b and then divided by the variable g.
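To see this in a current engine, here is a small sketch. The values given to `b` and `g` are arbitrary, chosen only for illustration.

```javascript
// b and g are ordinary variables here, not a pattern and a flag.
const b = 2;
const g = 4;

// Parsed as ((/a/) / b) / g: a regex literal divided by b, then by g.
const result = /a/ /b/g;

// A RegExp object converts to NaN when used as a number,
// so the whole expression evaluates to NaN.
console.log(Number.isNaN(result)); // true
```

So giving `/a/ /b/g` the meaning `/ab/g` would change the behaviour of code that is already valid today.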

bakkot · July 24, 2020, 5:00pm

it would just look for the next forward slash that wasn't escaped

You'd also have to exclude forward slashes within character classes: /[/]/ is a legal regex. So it's a bit more complicated. But it's not that much more complicated; see the definition of RegularExpressionLiteral. (Note that there is a second, significantly more complicated grammar used to parse the literals once this simpler grammar has identified them.)
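A rough sketch of that simpler scan, assuming the caller already knows that a regex literal (and not a division or a comment) starts at the given index. `findRegexBodyEnd` is a hypothetical name, and the code is a simplification of the RegularExpressionLiteral grammar, not a full implementation.

```javascript
// Line terminators may not appear anywhere inside a regex literal.
function isLineTerminator(ch) {
  return ch === "\n" || ch === "\r" || ch === "\u2028" || ch === "\u2029";
}

// Hypothetical helper: given source text and the index of an opening "/",
// returns the index of the "/" that closes the regex body, or -1 if the
// literal is unterminated.
function findRegexBodyEnd(text, start) {
  let inClass = false; // true while inside [...]
  for (let i = start + 1; i < text.length; i++) {
    const ch = text[i];
    if (isLineTerminator(ch)) return -1;
    if (ch === "\\") {
      // A backslash escapes the next character, which must exist
      // and must not be a line terminator.
      i++;
      if (i >= text.length || isLineTerminator(text[i])) return -1;
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "]") {
      inClass = false;
    } else if (ch === "/" && !inClass) {
      return i; // first unescaped "/" outside a character class
    }
  }
  return -1;
}

console.log(findRegexBodyEnd("/[/]/", 0)); // 4
console.log(findRegexBodyEnd("/a\\/b/g", 0)); // 5
console.log(findRegexBodyEnd("/abc", 0)); // -1
```

The hard part discussed earlier in the thread, deciding whether a given `/` starts a regex at all, is deliberately left to the caller here.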

claudiameadows · July 28, 2020, 9:02am

It's a two-step process:

1. Consume a regexp token and save the inner source with its flags: https://www.ecma-international.org/ecma-262/#sec-literals-regular-expression-literals.
2. Parse the inner regexp as per https://www.ecma-international.org/ecma-262/#sec-patterns.

In theory, you could merge these two steps, but in practice, it's rather complicated to do (you'd have to do a fair bit of math to tame it to something actually efficiently implementable) and, unless you're writing an engine with a built-in regexp runtime (none of the major engines do, BTW) or a regexp transpiler like regexpu, it's almost never worth the effort.
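The two steps can be sketched roughly as follows. Both function names are hypothetical, and step 2 is simply delegated to the `RegExp` constructor, which parses the pattern and throws a `SyntaxError` if it is invalid. The input is assumed to be the text of a single, already tokenized literal.

```javascript
// Step 1 output (hypothetical helper): split a tokenized literal into its
// body and flags. Flags cannot contain "/", so the last "/" closes the body.
function splitRegexToken(token) {
  const close = token.lastIndexOf("/");
  return { body: token.slice(1, close), flags: token.slice(close + 1) };
}

// Step 2 (hypothetical helper): let the engine parse the pattern.
// Returns the RegExp on success, or the thrown error on failure.
function parsePattern({ body, flags }) {
  try {
    return new RegExp(body, flags);
  } catch (err) {
    return err;
  }
}

const ok = parsePattern(splitRegexToken("/[/]+/gi"));
const bad = parsePattern(splitRegexToken("/(/"));

console.log(ok instanceof RegExp, ok.flags); // true gi
console.log(bad instanceof SyntaxError); // true
```

The second literal, `/(/`, is a well-formed token as far as step 1 is concerned; the unterminated group is only detected in step 2.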
