Adding recursion to regex

oleedd · August 9, 2024, 3:01pm

It would be interesting. And which libraries? It can't be done with native regex.

claudiameadows · August 9, 2024, 10:29pm

A library you'd write to experiment with the idea.

conartist6 · August 10, 2024, 1:35am

I'm working on a library, BABLR

oleedd · August 10, 2024, 8:51am

I can experiment with PCRE if it comes to that.

Josh-Cena · August 11, 2024, 2:09am

I don't know why we are debating about the feasibility of this. Many other regex engines, that ES uses as reference when proposing and designing features, already have this feature. It has been battle tested in the wild in many languages. No regex engine that supports backreferencing is an FSM and modeling regexes as FSM has never been a goal. The opinions we need to hear are implementors on whether they consider it acceptable performance-wise (since they've, in the past, rejected/redesigned features that can't be implemented performantly).

oleedd · August 16, 2024, 12:34pm

How to find out from browser developers?

claudiameadows · August 31, 2024, 5:58am

You can implement backreferences with a lightly modified FSM. All you need to do is take the stored slice you already need for group matching and test it exactly like you would a literal sequence like in the regexp /abc/.

Groups can be stored in a fixed-length array - there's only a finite number of them in a given regexp. You don't need recursion to implement it.

What's more surprising is neither re2 nor Rust's regex crate supports it.

Josh-Cena · August 31, 2024, 1:57pm

An FSM can only have finite memory. A "fixed-length array" is still unbounded because each array member is a string, which is unbounded.

claudiameadows · August 31, 2024, 8:29pm

I said a "modified FSM", not a pure one. But in either case, the whole "machine" state with output, current FSM state, and all, is still ultimately a fixed-size allocation linear in proportion to the regexp itself unless you do a complete non-backtracking DFA lowering.
FSMs are finite in interpreter state. Groups are output, and to model that output as interpreter state, you have to upgrade the abstract machine to a nested stack automaton (which also gives you the power to match backreferences).
The reason you can get away with this with so little modification is because you're reusing a stored output buffer slot. There really isn't much added overhead here, unlike other types of recusion.

lightmare · September 2, 2024, 7:45am

That's not surprising at all. RE2: One of its primary guarantees is that the match time is linear in the length of the input string.

Try /a([a-z]*)\1z/ against "agazzilionletterz"

oleedd · September 2, 2024, 10:48am

I found it in a proposal: GitHub - rbuckton/proposal-regexp-features: Proposal to investigate additional language features for ECMAScript Regular Expressions
And a lot of new things there.
Sad that still stage 0.

oleedd · September 3, 2024, 10:38am

What is with this proposal? It looks like it is frozen (not under development). The next step is "Prose outlining the problem or need and the general shape of a solution" (unmarked).
And as for me, too many things in one proposal. Some things may block adding others to the standard (because either everything should be added or nothing).

Josh-Cena · September 3, 2024, 1:54pm

This repo, AFAIK, is used as a backlog and each feature would be extracted out of it as an individual proposal. In fact rbuckton has already proposed many things on this list: GitHub - tc39/proposal-regexp-modifiers: Regular Expression Pattern Modifiers for ECMAScript, GitHub - tc39/proposal-regexp-atomic-operators, GitHub - tc39/proposal-regexp-buffer-boundaries: Regular Expression Buffer Boundaries for ECMAScript to name a few. But since resource is limited there has to be priorities.

oleedd · September 3, 2024, 3:19pm

But no individual for recursion, which is the most useful from them, as for me :(
Sad that he doesn't think so. From other side, maybe he does them from simple to difficult.

I only see one more from him: "Regular Expression \R Escape". No more?

Do you mean free time? 3 years is quite enough for all of them.

Topic		Replies	Views
Support of "if" and "(?R)" in Regex I have questions	4	56	July 14, 2024
RE2 - Consider having it as alternative engine choice 💡 Ideas proposal	14	999	July 10, 2022
Possessive RegExp matching 💡 Ideas	8	1110	February 12, 2020
Safe Regex engine to prevent ReDOS Attack 💡 Ideas	10	1056	April 22, 2022
Сancelable/async regexp 💡 Ideas	4	408	February 28, 2020

Adding recursion to regex

Related topics