String.hash

mmkal · May 29, 2025, 6:25pm

edited to suggest String.hash instead of Math.hashCode

Developers often need a fast, non-cryptographic hash for small strings, but currently need to copy paste from stack overflow or install a dependency.

I searched for this but didn't find it (exactly). I think it'd be useful if the spec provided a String.hash('some string') function.

I found this discussion which asks for a method to be added Object - which I believe is much more controversial.

To make this alternative more likely to succeed, I'd suggest:

Put it on the String global as a static method
Only accept a string input
Make no particular guarantees about collision-avoidance - just use a long-established algorithm like Java's .hashCode() default implementation which has perfectly valid use-cases.

Here's an implementation the likes of which I've found myself copy-pasting from stack overflow many times:

function hashString(str){
	let hash = 0;
	for (let i = 0; i < str.length; i++) {
		hash += Math.pow(str.charCodeAt(i) * 31, str.length - i);
		hash = hash & hash; // Convert to 32bit integer
	}
	return hash;
}

Forcing people to copy-paste code that they don't fully understand, even when it's pretty innocuous like this, seems bad. On projects I work on that use eslint or similar tools (many configs have rules like no-bitwise which make it even more cumbersome to copy this implementation into a project)

This, I think, would be a very easy addition to the spec, easy to implement, and easy to document (compared to other proposals at least).

Note:

namespacing could of course be bikeshedded - I thought Math was appropriate since this is just a simple mathematical formula, but String was suggested as a better alternative
naming too - maybe it could be String.insecureHashCode(...) instead to make sure nobody thinks it's cryptographic and only uses it for cases where real cryptography isn't needed
equivalents to the dead-simple above snippet include crypto.subtle, but require async/await and creations of buffers/TextEncoders etc.
this would be very fast for use cases that don't involve security and value speed and simplicity instead

bakkot · May 30, 2025, 2:57am

It's an interesting thought. Certainly I've found myself needing to hash strings pretty often, and like you I just copy a snippet around, which is both annoying and slow. I'd call it String.hash personally; I think the case for having a built-in hash for strings is strong and the design more straightforward (especially given that we can't feasibly do arbitrary objects).

mmkal · May 30, 2025, 6:07pm

I like that, I've edited the original to push for that instead (with a note to avoid confusion)

aclaymore · June 1, 2025, 9:31am

Joining the club, I've also had to add a userland sync hashing function for the Composites proposal polyfill: proposal-composites/polyfill/internal/murmur.ts at ae5ea98e7c966581f46af37e80f954335ad78948 · tc39/proposal-composites · GitHub

I think the main design question would be if the hash function (and starting state) would be exactly defined (portable) or if the values can change across runs (reduce collision attacks). Or if the proposal offers both forms.

If the hash is different between runs but is guaranteed to be consistent between connected agents (e.g. web workers) this would be a great uplift over current userland implementations where it may be challenging to sync the seed all workers use.

The next biggest win, IMO, if this could be added to the language would be if the function could access the existing cached hash string value (that the most popular JS engines have) if it's already been computed for that particular string instance. As strings are not valid WeakMap keys there is no safe way for userland to cache hashes in the same way engines can. While the specification proposal couldn't mandate caching, as it wouldn't be observable, it should be discussed with implementors to validate what would happen in reality.

WebReflection · June 3, 2025, 10:49am

wouldn't a "random prefix" already cover the non portable case? with fixed algo you get both caching and portability out of the box so that if anyone wants to make that less portable it can just add some random prefix to the string that's being hashed keeping the cache guards and reducing collisions?

bakkot · June 3, 2025, 3:15pm

To my mind the main advantage of non-portability is that it frees engines to experiment with implementations - the use of the internal hash that @aclaymore mentions is unlikely to be possible with a fully specified hash function, for example. But I think there's likely to be a lot of pushback on that; many people (myself included) really don't like adding more non-explicit nondeterminism to the language.

aclaymore · June 3, 2025, 5:05pm

Yes, though my primary point was could all the agents agree on that random prefix so that two workers that hash the same string both get the same result without first having to message each other to agree on a prefix. Because this is something that would be a large value add that can't (easily) be done today.

WebReflection · June 3, 2025, 8:09pm

plus consider client/server/db stored hash that are suddenly unusable anymore in the future if anything changes in there

via broadcast channel or message channel or even post message that's inevitable already for a variety of things, reason I don't think it's super hard ... the usefulness of a random hash remains to me though, we went from a clear background/example (the Java one) to full "let's make it more complex" without presenting use cases ... anything obvious I could think about?

bakkot · June 3, 2025, 8:36pm

If these are non-portable, they would be explicitly required to include some randomization at agent init so that they would never be portable across sessions, so that you would immediately run into that rather than it changing at some point in the future. That's how Go handles map iteration order, for example. But yes, doing that does mean you can't meaningfully persist hashes, which would be annoying for some applications.

There's two main things, as mentioned:

built-in defensiveness against a class of denial of service attacks which are enabled by attackers crafting strings which will have the same hash, which is only practical if hashes are fixed, and
allowing engines to choose their own implementations, which is going to be faster - often much faster - than any particular algorithm we might mandate.

Whether these outweigh the downsides of non-portability depends on your values.

Topic		Replies	Views
Proposal:String-toSpliced 🦋 Proposals	2	57	September 14, 2024
Proposal: Object.equals and hashCode 🦋 Proposals proposal	3	333	July 20, 2023
Array#pack & String#unpack? 💡 Ideas	4	419	March 5, 2022
Custom hash and equal functions for Map and Set collections 💡 Ideas proposal	20	2755	April 30, 2021
ECMAScript proposal: Hash Comments (replacement for Type Annotations) 💡 Ideas proposal	20	192	April 28, 2025

String.hash

Related topics