Safe Regex engine to prevent ReDOS Attack

SeyyedKhandon · August 31, 2020, 1:51pm

Currently, JavaScript regex engine suffers from lacking atomic groups and other features for preventing ReDos Attack. this makes it very complicated to handle these kinds of attacks e.g coming up with a safe regex for a url link such as the one below, can be very daunting:
/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\*\+,;=.]+$/

therefore It will be very good to handle this in js like java or other languages to prevent ReDos attacks, e.g java8 had this problem but in java9 this problem has been handled.

......
#You can find some patterns and their problems here

jridgewell · August 31, 2020, 4:05pm

Hi Seyyed,

See prior discussion at Possessive RegExp matching.

I actually started a proposal for this, but when I spoke with implementers they said they would rather make implementation only changes (and changes are being pursued!). They are not interested in more advanced features like atomic groups or possessive quantifiers. They would possibly be interested in a "regular mode" flag that disables all backtracking for a particular regex.

claudiameadows · September 7, 2020, 7:10am

Also, there have been some efforts into identifying regexps vulnerable to catastrophic backtracking, like vuln-regex-detector. However, checking for this in the general case is very expensive (they only use heuristics, and in the general case it's an exponential problem).

claudiameadows · September 7, 2020, 7:10am

I assume this more or less tells them to always construct a DFA where possible, even when it's expensive to construct?

SeyyedKhandon · October 9, 2020, 10:42am

Is it really the right way it should be? if it's preventing the attack, it would be good, but I think its not enough for a language, because all developers are not a security man, so it wont prevent the bugs from being spread in this topic.

pygy · April 4, 2022, 11:58am

I'll revive this, having a flag that disables backtracking would make it trivial to avoid ReDOS, and would in general provide RegExps that are easier to reason about.

Is there interest among delegates to push this?

claudiameadows · April 4, 2022, 11:38pm

V8's implemented an experimental engine to provide guaranteed constant-time lookups, so it's more an implementation choice. (They do offer an additional optional /l flag, but the main goal is to just use this as a fallback engine until they can get performance on par with Irregexp for simple stuff.)

bakkot · April 5, 2022, 4:04am

There is a proposal for possessive quantifiers, yes: GitHub - rbuckton/proposal-regexp-atomic-operators

conartist6 · April 16, 2022, 6:06pm

There is also now @iter-tools/regex which uses a non-backtracking algorithm and should (in theory?) be safe from ReDOS.

pygy · April 16, 2022, 7:31pm

I'm actually working on a compose-regexp update that makes the /(?=(...))\1/ "polyfill" usable (and composable). I'll keep you posted when I publish a new version.

pygy · April 22, 2022, 10:56pm

I've just released compose-regexp@0.6.1, which now comes with an atomic() helper that uses the look ahead / back reference trick to emulate atomic groups (re. backref / lookBehind when matching backwards).

The first example in the README is about ReDOS protection.

import {atomic, sequence} from 'compose-regexp'

// classic ReDOS-vulnerable RegExp:
const ReDOS = /^(([a-z])+.)+[A-Z]([a-z])+$/

// fixed with compose-regexp, this does not backtrack
const fixed = sequence(/^/, atomic(/(([a-z])+.)+/), /[A-Z]([a-z])+$/)

The second deals with character classes operations (difference, intersection, etc...) and arbitrary bounds:

import {bound, charSet, flags, suffix} from 'compose-regexp'

const LcGrekLetter = charSet.intersection(/\p{Lowercase}/u, /\p{Script=Greek}/u)
LcGrekLetter.test("Γ") // false
LcGrekLetter.test("γ") // true
LcGrekLetter.test("x") // false

// like /\b/ but for Greek
const b = bound(/\p{Script=Greek}/u)

const LcGrekWords = flags.add('g', [b, suffix("+", LcGrekLetter), b])
for (
  lc of `Θεωρείται ως ο σημαντικότερος θεμελιωτής ...`.matchAll(LcGrekWords)
) {
  console.log(lc) //'ως', 'ο', 'σημαντικότερος', 'θεμελιωτής'
}

Topic		Replies	Views
Сancelable/async regexp 💡 Ideas	4	408	February 28, 2020
Adding recursion to regex 💡 Ideas proposal	53	362	September 3, 2024
RE2 - Consider having it as alternative engine choice 💡 Ideas proposal	14	999	July 10, 2022
Possessive RegExp matching 💡 Ideas	8	1110	February 12, 2020
Idea: compare a string against a set of regular expressions 💡 Ideas	0	209	February 21, 2022

Safe Regex engine to prevent ReDOS Attack

Related topics