String.prototype.reverse()

I am proposing String.prototype.reverse() to resolve the performance overhead and Unicode-handling errors associated with the common split().reverse().join() pattern. While the current workaround is widely used, it is non-performant for large strings and fails to correctly handle multi-unit UTF-16 characters (emojis), leading to data corruption.

Currently, JavaScript lacks a native method to reverse strings. Developers are forced to rely on a clunky "type-conversion" pattern: str.split('').reverse().join(''). This workaround presents two significant technical issues:

  1. Memory & Performance Inefficiency: The current pattern requires creating an intermediate Array object. For large strings, this results in unnecessary heap allocation and memory pressure.

  2. Unicode Data Corruption: The standard split('') method is not Unicode-aware. It breaks "surrogate pairs" (such as emojis or complex mathematical symbols), resulting in corrupted strings and "garbage" characters upon reversal.
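To make the second issue concrete, here is a minimal sketch comparing the two patterns. The sample string is arbitrary and only there for illustration; U+1F600 is one code point stored as two UTF-16 code units.

```javascript
// Arbitrary sample: 'a', U+1F600 (a surrogate pair in UTF-16), 'b'.
const sample = "a\u{1F600}b";

// split('') cuts between the two halves of the surrogate pair.
const byCodeUnit = sample.split("").reverse().join("");
// Spreading uses the string iterator, which walks by code point.
const byCodePoint = [...sample].reverse().join("");

console.log(sample.length);                   // 4 (code units)
console.log([...sample].length);              // 3 (code points)
console.log(byCodePoint === "b\u{1F600}a");   // true
console.log(byCodeUnit === "b\uDE00\uD83Da"); // true: the surrogates are swapped and no longer form a pair
```

The code-unit version ends up with a low surrogate followed by a high surrogate, which is what shows up as "garbage" characters.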

So, I propose introducing a native String.prototype.reverse() method. This method would be implemented at the engine level (C++), allowing for:

  1. In-place Reversal: Optimized pointer manipulation without the overhead of creating temporary arrays.

  2. Unicode Awareness: Built-in logic to detect surrogate pairs and keep them intact, ensuring data integrity for modern web applications.
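As a rough userland sketch of the surrogate-pair logic in item 2, assuming no intermediate array is wanted: walk the code units from the end and emit a high/low pair together whenever one is found. The name reverseCodePoints is hypothetical, and this says nothing about how an engine would actually implement it.

```javascript
// Hypothetical helper: reverses by code point without building an array.
function reverseCodePoints(str) {
  let out = "";
  for (let i = str.length - 1; i >= 0; i--) {
    const unit = str.charCodeAt(i);
    const isLowSurrogate = unit >= 0xdc00 && unit <= 0xdfff;
    if (isLowSurrogate && i > 0) {
      const prev = str.charCodeAt(i - 1);
      const isHighSurrogate = prev >= 0xd800 && prev <= 0xdbff;
      if (isHighSurrogate) {
        // Keep the pair in its original high-then-low order.
        out += str[i - 1] + str[i];
        i--; // skip the high surrogate we just consumed
        continue;
      }
    }
    // BMP character or lone surrogate: copy it through unchanged.
    out += str[i];
  }
  return out;
}

console.log(reverseCodePoints("abc") === "cba");                       // true
console.log(reverseCodePoints("a\u{1F600}b") === "b\u{1F600}a");       // true
```

Surrogate pairs can be detected from either direction because a pair is always exactly one high unit followed by one low unit.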

/**
 * Proposed implementation for String.prototype.reverse
 * Uses the spread operator to ensure Unicode-safe reversal.
 */
if (!String.prototype.reverse) {
  String.prototype.reverse = function () {
    // [...this] handles emojis correctly unlike .split('')
    const reversed = [...this].reverse().join('');

    console.log("Input String:", this.toString());
    console.log("Reversed Result:", reversed);

    return reversed;
  };
}

// Example usage:
const myText = "JavaScript ";
console.log(myText.reverse()); // " tpircSavaJ"


Is this a duplicate of String#reverse method?

Wow, I hadn’t seen that one before. Both propose the same idea, but mine is more advanced and not copied.

The same problems remain as in the earlier thread.

The code-unit vs codepoint distinction is indeed important, and your suggested impl is the right way to reverse by codepoint. But it still reverses strings incorrectly, such as:

function strrev(s) { return [...s].reverse().join(''); }

let accentA = "a" + String.fromCodePoint(0x301) + "e"; // 'áe'
let accentE = strrev(accentA); // 'éa'

let chinaFlag = "🇨🇳"; // actually 🇨 followed by 🇳
let newCaledoniaFlag = strrev(chinaFlag); // '🇳🇨', 🇳 followed by 🇨

To reverse correctly you have to recognize grapheme clusters, which are multi-codepoint. The issue is that the definition of grapheme clusters changes with the Unicode version, as we add more, so strings that reversed in a particular way at one point might change their value if we update; but if we don’t update, strings that people expect would reverse wouldn’t do so correctly. A bit of a pickle for a language like JS.

Also, grapheme clusters aren’t detectable in reverse; you have to parse them from the start of the string. The flag emojis, for example, are composed of any valid pair of flag-emoji letters; if you have more than two in a row, the first two are consumed for one flag, then the next two, etc. For example, taking 🇨 + 🇳 + 🇬 forward yields '🇨🇳🇬' - China’s flag followed by an unpaired 🇬. If you naively process it backwards with this strrev() you instead get '🇬🇳🇨' - Guinea’s flag followed by an unpaired 🇨. But if you process it backwards and recognize country pairs so they stay in the original order, you’ll get '🇳🇬🇨' - Nigeria’s flag followed by an unpaired 🇨. It is actually impossible to correctly produce a string containing an unpaired 🇬 followed by 🇨🇳, which would be the ideal reversal, without inserting additional characters (a zero-width non-joiner, U+200C, in this case). '🇬‌🇨🇳' (🇬, U+200C, 🇨, 🇳) is actually four codepoints long, and cannot be written any shorter.

So, to reverse grapheme clusters correctly you still have to do a forward scan to identify cluster boundaries (or more complicated backwards scanning to look for misgroupings), which ends up negating a lot of the perf optimization you could get, and in some cases a correct grapheme clustering is impossible to reproduce anyway without changing the data of the string.
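A small sketch of the flag case, assuming a runtime that ships Intl.Segmenter. The graphemes() helper is a made-up name for illustration, and the regional indicators are written as escapes so the pairing is explicit.

```javascript
// Sketch only: graphemes() is a hypothetical helper, not a proposed API.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = (s) => Array.from(segmenter.segment(s), ({ segment }) => segment);

// Regional indicator symbols and the zero-width non-joiner.
const C = "\u{1F1E8}";
const N = "\u{1F1F3}";
const G = "\u{1F1EC}";
const ZWNJ = "\u200C";

const forward = graphemes(C + N + G);              // [C+N, G]
const reversed = [...forward].reverse().join("");  // code points G, C, N
const reparsed = graphemes(reversed);              // [G+C, N]: the C+N pair is gone
const separated = graphemes(G + ZWNJ + C + N);     // [G+ZWNJ, C+N]

console.log(forward.length, reparsed.length, separated.length); // 2 2 2
console.log(reparsed[0] === G + C);   // true
console.log(separated[1] === C + N);  // true
```

Reversing the clusters and joining them gives a string that re-segments differently from what was intended; only the variant with the extra U+200C keeps 🇨 and 🇳 paired after an unpaired 🇬.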

(It’s widely recognized that the design of the flag emojis was a complete disaster for these reasons, fwiw, but they still exist.)


Also, as in the original thread, no one has actually suggested a reason you’d ever need this outside of toy problems.


Could it possibly be this?

function strrev(s) { return [...new Intl.Segmenter().segment(s)].reverse().map(({segment}) => segment).join("") }

let accentA = "a" + String.fromCodePoint(0x301) + "e"; // 'áe'
let accentE = strrev(accentA); // 'eá'

let chinaFlag = "🇨🇳"; // actually 🇨 followed by 🇳
let newCaledoniaFlag = strrev(chinaFlag); // '🇨🇳', 🇨 followed by 🇳

// 🐈‍⬛🐻‍❄️
let blackAndPolarBear = "🐈" + String.fromCodePoint(0x200D) + "⬛"
  + "🐻" + String.fromCodePoint(0x200D) + "❄" + String.fromCodePoint(0xFE0F);
let newBlackAndPolarBear = strrev(blackAndPolarBear); // 🐻‍❄️🐈‍⬛
