Sometimes object keys need a specific hash function to speed up lookups for especially large objects, or the objects have circular references. Sometimes it's not necessary to stringify the whole object. In other situations you need to keep the original key object (not a string or some unique id). For these reasons, almost all languages which expose collections in their standard libraries have the ability to override the default hash and equality functions.
Similar to the collection-normalization proposal, I propose adding `hash` and `equals` methods during construction:
```js
const set = new Set(undefined, {
  hash(key) {
    return murmur3(key.name) ^ key.type
  },
  equals(lhs, rhs) {
    return (
      lhs.name == rhs.name &&
      lhs.type == rhs.type
    )
  }
})
set.add({ name: 'a', type: 0 })
set.add({ name: 'b', type: 1 })
```
The same would apply to `Map`, and probably to the equivalent weak collections.
This approach also lets us reuse existing static methods in the following way:
```js
class Vec2 {
  constructor(x, y) { this.x = x; this.y = y }
  static equals(a, b) { return a.x == b.x && a.y == b.y }
  static hash(v) { return fnv1a(v.x, fnv1a(v.y)) }
}
const pos2cell = new Map(undefined, Vec2)
```
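The `fnv1a` helper above is assumed to exist; as an illustrative sketch (not the canonical implementation), a seedable 32-bit FNV-1a over a value's string form could look like:

```js
// Hypothetical helper for the example above: a seedable 32-bit FNV-1a
// hash over the characters of the value's string representation.
const FNV_OFFSET = 0x811c9dc5
const FNV_PRIME = 0x01000193

function fnv1a(value, seed = FNV_OFFSET) {
  let h = seed >>> 0
  const s = String(value)
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)                 // xor in the next byte
    h = Math.imul(h, FNV_PRIME) >>> 0    // multiply in 32-bit space
  }
  return h
}
```

Note that `fnv1a(v.x, fnv1a(v.y))` chains the hash of `y` as the seed for `x`, so `(1, 2)` and `(2, 1)` produce different results.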
It's a pretty common request:
- data structures - JavaScript hashmap equivalent - Stack Overflow
- javascript - How to use ES6 Hash Map on any object without maintaing a reference (I.e. Java hashcode) - Stack Overflow
- javascript - Define a custom hash() method for use with ES6 maps - Stack Overflow
- javascript - Associative array without toString, etc - Stack Overflow
What would be considered a valid return type for the hash function? Only 32 bit integers?
It does appear that, under the hood, at least V8 uses 32-bit integer hash structures for Map and Set. Not sure about other JS engines: v8/ordered-hash-table.cc at dc712da548c7fb433caed56af9a021d964952728 · v8/v8 · GitHub
Yes, indeed. All hash table implementations I know of use 32- or 64-bit unsigned integers for the result of the hash function.
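Assuming the engine expects an unsigned 32-bit result, a userland `hash` function could fold an arbitrary number into that range with the unsigned right-shift trick (the mixing constant here is illustrative, not anything an engine mandates):

```js
// Illustrative only: fold an arbitrary number into the uint32 range.
function toUint32Hash(n) {
  // Math.imul keeps the multiplication in 32-bit integer space;
  // `>>> 0` reinterprets the signed result as an unsigned 32-bit integer.
  return Math.imul(n ^ (n >>> 15), 0x2545f491) >>> 0
}
```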
What's the value of this that the collection normalization proposal doesn't provide?
The normalization proposal provides `coerceKey` and `coerceValue`. `coerceKey` has a similar goal to `hash`, but returns a string as a result. This proposal targets more low-level stuff and suggests returning a customized raw hash number from the `hash` method. It also provides an `equals` method, which is important for the actual key comparison.
A hash lookup consists of two stages. It takes the object key, calculates a hash value (currently the object is stringified first), and if that value exists in the map, the next step is to compare the stringified keys. This proposal suggests providing the ability to customize these stages by supplying `hash` and `equals` in user space.
Examples in other languages.
C++:
```cpp
using h = std::hash<int>;
auto hash = [](const Vec2& n) { return ((17 * 31 + h()(n.x)) * 31 + h()(n.y)) * 31; };
auto equal = [](const Vec2& l, const Vec2& r) { return l.x == r.x && l.y == r.y; };
std::unordered_map<Vec2, int, decltype(hash), decltype(equal)> map(32, hash, equal);
```
Rust:
```rust
extern crate fnv;

use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use fnv::FnvHasher;

#[derive(PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Vec2 {
    // ...
}

let mut map = HashMap::<Vec2, String, BuildHasherDefault<FnvHasher>>::default();
```
I get how it's different, but what is the value added?
I thought I covered the main reasons in enough detail in the original post: provide lower-level control for the user over how objects are turned into a hash (and, by the way, sometimes hashing can be avoided entirely). `coerceKey` does not provide this option, so hash function overloading and `coerceKey` are orthogonal things.
I for one am looking forward to GitHub - tc39/proposal-record-tuple: ECMAScript proposal for the Record and Tuple value types (Stage 2: it will change!) being implemented, and I think it makes this no longer needed. Typically, mutable objects shouldn't be used in a hash set or as keys in a hash map, so that seems more ideal to me.
Hashing is generally necessary even for integers to get proper hash table performance because you need inputs to be sufficiently evenly distributed across the table to minimize the worst-case linear factor (which is effectively zero with the best hash functions). And it's worth noting that most people don't even know how to set a table's load factor correctly, much less write a good hash function for hash tables unassisted.
I'm more in favor of using the proposed tuples and records for composite map keys than adding a dedicated hash function - it solves the same issue even better without subjecting the developer to stuff that generally should be implementation details.
I meant the case where the object already has a unique id that grows linearly, or an id generated by a PRNG with a uniform distribution.
Regarding using tuples/records instead of objects for keys: this has some limitations. What if you already receive objects that need to be used as keys? What if those objects require prototypes, or circular references? None of this is possible with record keys.
Look at Python, which also has tuples. Python still allows you to override hashing and equality for dicts (see Glossary — Python 2.7.18 documentation) by overriding `__hash__()` and `__eq__()`/`__cmp__()` on a specific object.
If you're that concerned about performance, you'd be better off writing your own hash table implementation.
I don't see implementations doing anything different than what Abseil does with their hash tables - rehashing the hash result. And as for equality, I expect that tuples will bring only negligible overhead compared to raw results.
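For illustration, the rehashing step mentioned above could look like murmur3's `fmix32` finalizer; whether any engine would apply exactly this mixer to a user-supplied result is an assumption:

```js
// murmur3's fmix32 finalizer: scrambles a 32-bit hash so that nearby
// inputs land far apart, protecting the table from weak user hashes.
function fmix32(h) {
  h ^= h >>> 16
  h = Math.imul(h, 0x85ebca6b)
  h ^= h >>> 13
  h = Math.imul(h, 0xc2b2ae35)
  h ^= h >>> 16
  return h >>> 0
}
```

With a rehash like this in the engine, even a naive user `hash` (say, returning a sequential id directly) would still distribute well across buckets.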
Personally, I would recommend waiting until V8 implements the record/tuple proposal, then adding support for your proposal and benchmarking it.
This proposal is not only about performance =) Please see the starter message; I list quite a few things there that tuples/records cannot solve by their nature.
And we don't need to wait for tuples to solve part of this problem, because this tuple-based version:
```js
class PointListNode extends Point {
  constructor(x = 0, y = 0) {
    super(x, y)
    this.next = null
    this.prev = null
  }
  // ...
  // add
  // remove
}

let map = new Map
let root = new PointListNode
map.set(#[root.x, root.y], someValue)
```
could already be resolved via:

```js
// A separator avoids collisions like (1, 23) vs (12, 3).
map.set(`${root.x},${root.y}`, someValue)
```
But this does not solve the problem where you need the key to be an object (not a tuple or string), so that you can iterate key-value pairs later and have full access to the keyed object.
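A small sketch of that limitation (names illustrative): with string keys, iteration only recovers the string, while object keys keep the whole node reachable but compare by identity today:

```js
const node = { x: 1, y: 2, next: null, prev: null }

// With string keys, the original object is lost at iteration time.
const byString = new Map()
byString.set(`${node.x},${node.y}`, 'value')
for (const [key] of byString) {
  // key is "1,2" — node.next / node.prev are unreachable from here
}

// With the object itself as key, iteration keeps full access to it,
// but a structurally equal node would be a distinct key today —
// which is exactly what a custom hash/equals pair would fix.
const byObject = new Map()
byObject.set(node, 'value')
for (const [key] of byObject) {
  // key === node, so key.next and key.prev are available
}
```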
In theory, you can use references as perfect hashes if you assign a unique value to each reference, each string, each number, etc., given arbitrarily large memory available. Your proposal is about performance and computational complexity (which is literally the theoretical study of algorithm performance), not pure algorithmic theory.
Many times we need to hash based on the key's contents when the key is a complex object. If we don't hold a reference to the key in the map, we will never be able to access it unless we traverse the map. For example, when using a web worker to post map data, the worker thread has no reference to any of the keys.