Besides actual implementation bugs (1), I think that backtracking in RegExp is a footgun, and that having a way to match text possessively would make it easier to write predictable parsers.
It would be nice to have either possessive quantifiers (/a*/ => /a*+/, /a?/ => /a?+/, etc...) or a new p flag that would trigger the possessive mode for all operators.
Both options are backwards compatible AFAICT, since they cause errors in current JS.
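For what it's worth, a quick sanity check (assuming current V8/SpiderMonkey behavior) confirms that the possessive syntax is rejected today, so claiming it is backwards compatible:

```javascript
// A quantifier immediately following another quantifier is a SyntaxError
// in current JS engines ("Nothing to repeat" in V8), so /a*+/ is free
// to be given possessive semantics.
let threw = false;
try {
  new RegExp('a*+');
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw); // true
```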
Not sure how the polyfill story would play out...
Currently /^(?:ab|[^c])*$/ in irregexp (Chromium/Firefox) goes exponential when fed 'ab'.repeat(n) + 'c', even though it has enough info to know that it won't succeed once it reaches the 'c'.
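To make the ambiguity concrete, here's a small sketch (kept to a tiny n so it returns quickly):

```javascript
// Each 'ab' can be consumed either by the "ab" branch or by two passes
// through [^c], so a backtracking engine may explore on the order of 2^n
// ways to partition the input before concluding the trailing 'c' dooms
// them all.
const re = /^(?:ab|[^c])*$/;
const doomed = n => 'ab'.repeat(n) + 'c';
console.log(re.test('ababab'));  // true, matches immediately
console.log(re.test(doomed(5))); // false, but only after heavy backtracking
```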
But I would never expect a common JS developer to be able to come up with a solution like that, and I expect many would be confused reading it. So I think having easier control over the regex engine's backtracking would be a major improvement!
Good to know you can emulate it, but not without polluting the capture array, and the back-reference means you're limited to the first nine captures.
RegExps are confusing to read anyway; they weren't designed for their current use case. Kleene devised the notation to describe grammars mathematically, where sub-patterns were abstracted into one-letter variables, and Thompson adopted it for write-only commands typed into physical, screen-less terminals, where it was optimal for quickly unloading ideas into the keyboard. They graduated to their current status through worse-is-better...
The trick remains obscure though, even with a cleaner syntax (the one I favor to write complex regexps).
const possessive = op => (...args) =>
  sequence(
    lookAhead(capture(suffix(op, ...args))),
    ref(1)
  )
// once defined it is quite usable though...
const zeroPlusP = possessive('*')
const r = sequence(/^/, zeroPlusP(/ab|[^c]/), /$/)
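For reference, here's what the trick compiles down to as a raw regex. It works because lookaheads are atomic in JS: once the lookahead has succeeded and \1 has consumed the captured text, the engine cannot backtrack into the group.

```javascript
// Atomic-group emulation: the lookahead greedily captures the repetition,
// then \1 consumes exactly that text, with no backtracking possible.
const r = /^(?=((?:ab|[^c])*))\1$/;
console.log(r.test('ababab'));              // true
console.log(r.test('ab'.repeat(20) + 'c')); // false, and fails fast
```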
You can use named backreferences; but also, I suspect (without verifying right now) that you can have up to 99 backreferences, just like in replacements?
Yeah, that's true, I'm still thinking in ES5 on that front (AFAIK transpilers don't handle regexps created with the RegExp constructor, which my lib uses extensively under the hood, and it doesn't ship a runtime like XRegExp does).
TIL for the backreferences between 10 and 99. That's good to know (and a bug in my lib).
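A quick check bears out the two-digit backreferences:

```javascript
// With ten capture groups defined, \10 is parsed as a backreference to
// group 10; with fewer groups it would instead fall back to an octal
// escape in non-unicode mode (Annex B).
const re = new RegExp('(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\\10');
console.log(re.test('abcdefghijj')); // true — \10 re-matches the 'j'
console.log(re.test('abcdefghija')); // false
```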
Even if they are polyfilled with named back-references, the captured strings are generated, and also end up in the array at the spot they would have occupied as an anonymous capture.
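This is observable with native named groups already: a named capture still occupies its positional slot in the match array, so a polyfill built on them can't hide its helper captures.

```javascript
// The named group is simultaneously capture #1 in the array.
const m = 'ab'.match(/(?<first>a)(b)/);
console.log(m.groups.first); // 'a'
console.log(m[1]);           // 'a' — same capture, positional slot
console.log(m[2]);           // 'b'
```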
So the polyfill situation doesn't look promising... It is in the ballpark of what XRegExp does though (see how it polyfills sticky matches by adding a disjunction with an empty capture at the end of the source, then popping the last capture).
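For the curious, here's a hedged sketch of that sticky-emulation technique (my own reconstruction; `stickyExec` is a made-up name, not XRegExp's actual API): an empty-capturing alternative is appended so the global exec can never advance past the target position, and a match through that alternative is treated as failure.

```javascript
// Sketch only: emulate /.../y-style matching at `pos` without the y flag.
function stickyExec(re, str, pos) {
  // '|()': an empty capturing alternative matches anywhere, so exec is
  // guaranteed to match exactly at lastIndex rather than scanning ahead.
  const aug = new RegExp(re.source + '|()', 'g' + re.flags.replace('g', ''));
  aug.lastIndex = pos;
  const m = aug.exec(str);
  // If the trailing empty group participated, the real pattern failed at pos.
  if (!m || m[m.length - 1] !== undefined) return null;
  return m;
}

console.log(stickyExec(/ab/, 'xab', 1)[0]); // 'ab'
console.log(stickyExec(/ab/, 'xab', 0));    // null
```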
Babel is able to wrap regexes (only the literals, not the calls to RegExp()) for named capture groups, and it could do the same here.
I already started! (Sorry, I forgot to message here).
After speaking with the Chrome and Safari regex implementors (Firefox and Edge use Chrome's implementation), they want to pursue implementation-only changes to eliminate backtracking. Everyone feels that possessive quantifiers (and atomic groups, which possessives devolve into) are an "extreme power user" feature, and it won't help the average dev. If we can make implementation only changes, then most every regex already in use will just be automatically better.
It's great that they want to improve the backtracking behavior :-)
As a power user, I still wish this was available though.
The average user may not benefit directly, but they may benefit from libs that take advantage of it.
If the aforementioned optimizations failed in some corner case, authors could still handle the problem themselves rather than waiting for upstream fixes.
Worse, imagine a corner case regression in the RegExp engine (not caught by the test suite) that causes some client code that used to work fine to undergo exponential backtracking, effectively DOSing the machine.
RegExps are not safe to use IMO without this, and it's a shame because the underlying engines are pretty neat.
Atomic groups and possessive matches are also similar in semantics to parser combinators. They would make it trivial to upgrade something built on top of compose-regexp into actual parser combinators, without changing the semantics, and that would be awesome.