Safe Regex engine to prevent ReDOS Attack

Currently, JavaScript regex engine suffers from lacking atomic groups and other features for preventing ReDos Attack. this makes it very complicated to handle these kinds of attacks e.g coming up with a safe regex for a url link such as the one below, can be very daunting:
/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/

therefore It will be very good to handle this in js like java or other languages to prevent ReDos attacks, e.g java8 had this problem but in java9 this problem has been handled.

......
#You can find some patterns and their problems here

Hi Seyyed,

See prior discussion at Possessive RegExp matching.

I actually started a proposal for this, but when I spoke with implementers they said they would rather make implementation only changes (and changes are being pursued!). They are not interested in more advanced features like atomic groups or possessive quantifiers. They would possibly be interested in a "regular mode" flag that disables all backtracking for a particular regex.

2 Likes

Also, there have been some efforts into identifying regexps vulnerable to catastrophic backtracking, like vuln-regex-detector. However, checking for this in the general case is very expensive (they only use heuristics, and in the general case it's an exponential problem).

2 Likes

I assume this more or less tells them to always construct a DFA where possible, even when it's expensive to construct?

1 Like

Is it really the right way it should be? if it's preventing the attack, it would be good, but I think its not enough for a language, because all developers are not a security man, so it wont prevent the bugs from being spread in this topic.

I'll revive this, having a flag that disables backtracking would make it trivial to avoid ReDOS, and would in general provide RegExps that are easier to reason about.

Is there interest among delegates to push this?

V8's implemented an experimental engine to provide guaranteed constant-time lookups, so it's more an implementation choice. (They do offer an additional optional /l flag, but the main goal is to just use this as a fallback engine until they can get performance on par with Irregexp for simple stuff.)

1 Like

There is a proposal for possessive quantifiers, yes: GitHub - rbuckton/proposal-regexp-atomic-operators

2 Likes

There is also now @iter-tools/regex which uses a non-backtracking algorithm and should (in theory?) be safe from ReDOS.

I'm actually working on a compose-regexp update that makes the /(?=(...))\1/ "polyfill" usable (and composable). I'll keep you posted when I publish a new version.

I've just released compose-regexp@0.6.1, which now comes with an atomic() helper that uses the look ahead / back reference trick to emulate atomic groups (re. backref / lookBehind when matching backwards).

The first example in the README is about ReDOS protection.

import {atomic, sequence} from 'compose-regexp'

// classic ReDOS-vulnerable RegExp:
const ReDOS = /^(([a-z])+.)+[A-Z]([a-z])+$/

// fixed with compose-regexp, this does not backtrack
const fixed = sequence(/^/, atomic(/(([a-z])+.)+/), /[A-Z]([a-z])+$/)

The second deals with character classes operations (difference, intersection, etc...) and arbitrary bounds:

import {bound, charSet, flags, suffix} from 'compose-regexp'

const LcGrekLetter = charSet.intersection(/\p{Lowercase}/u, /\p{Script=Greek}/u)
LcGrekLetter.test("Γ") // false
LcGrekLetter.test("γ") // true
LcGrekLetter.test("x") // false

// like /\b/ but for Greek
const b = bound(/\p{Script=Greek}/u)

const LcGrekWords = flags.add('g', [b, suffix("+", LcGrekLetter), b])
for (
  lc of `Θεωρείται ως ο σημαντικότερος θεμελιωτής ...`.matchAll(LcGrekWords)
) {
  console.log(lc) //'ως', 'ο', 'σημαντικότερος', 'θεμελιωτής'
}