It would be interesting. And which libraries? It can't be done with native regex.
A library you'd write to experiment with the idea.
I'm working on a library, BABLR
I can experiment with PCRE if it comes to that.
I don't know why we are debating about the feasibility of this. Many other regex engines, that ES uses as reference when proposing and designing features, already have this feature. It has been battle tested in the wild in many languages. No regex engine that supports backreferencing is an FSM and modeling regexes as FSM has never been a goal. The opinions we need to hear are implementors on whether they consider it acceptable performance-wise (since they've, in the past, rejected/redesigned features that can't be implemented performantly).
How to find out from browser developers?
You can implement backreferences with a lightly modified FSM. All you need to do is take the stored slice you already need for group matching and test it exactly like you would a literal sequence like in the regexp /abc/
.
Groups can be stored in a fixed-length array - there's only a finite number of them in a given regexp. You don't need recursion to implement it.
What's more surprising is neither re2 nor Rust's regex crate supports it.
An FSM can only have finite memory. A "fixed-length array" is still unbounded because each array member is a string, which is unbounded.
- I said a "modified FSM", not a pure one. But in either case, the whole "machine" state with output, current FSM state, and all, is still ultimately a fixed-size allocation linear in proportion to the regexp itself unless you do a complete non-backtracking DFA lowering.
- FSMs are finite in interpreter state. Groups are output, and to model that output as interpreter state, you have to upgrade the abstract machine to a nested stack automaton (which also gives you the power to match backreferences).
- The reason you can get away with this with so little modification is because you're reusing a stored output buffer slot. There really isn't much added overhead here, unlike other types of recusion.
That's not surprising at all. RE2: One of its primary guarantees is that the match time is linear in the length of the input string.
Try /a([a-z]*)\1z/
against "agazzilionletterz"
I found it in a proposal: GitHub - rbuckton/proposal-regexp-features: Proposal to investigate additional language features for ECMAScript Regular Expressions
And a lot of new things there.
Sad that still stage 0.
What is with this proposal? It looks like it is frozen (not under development). The next step is "Prose outlining the problem or need and the general shape of a solution" (unmarked).
And as for me, too many things in one proposal. Some things may block adding others to the standard (because either everything should be added or nothing).
This repo, AFAIK, is used as a backlog and each feature would be extracted out of it as an individual proposal. In fact rbuckton has already proposed many things on this list: GitHub - tc39/proposal-regexp-modifiers: Regular Expression Pattern Modifiers for ECMAScript, GitHub - tc39/proposal-regexp-atomic-operators, GitHub - tc39/proposal-regexp-buffer-boundaries: Regular Expression Buffer Boundaries for ECMAScript to name a few. But since resource is limited there has to be priorities.
But no individual for recursion, which is the most useful from them, as for me :(
Sad that he doesn't think so. From other side, maybe he does them from simple to difficult.
I only see one more from him: "Regular Expression \R
Escape". No more?
Do you mean free time? 3 years is quite enough for all of them.