Custom hash and equal functions for Map and Set collections

Sometimes object keys need a specific hash functions to speed up the search for especially large objects or they have circular dependencies. Sometimes it's not necessary to stringify the whole object. In other situations you need to keep original key's object (not string or some unique id). Therefore, almost all languages ​​which expose collections in their standard libraries have the ability to override the default hash and equal functions.

Similar to collection-normalization proposal I propose add hash and equals methods during construction:

const set = new Set(undefined, {
  hash(key) {
     return murmur3(key.name) ^ key.type
  },

  equals(lhs, rhs) {
    return (
      lhs.name == rhs.name && 
      lhs.type == rhs.type
    )
  }
})

set.add({ name: 'a', type: 0 })
set.add({ name: 'b', type: 1 })

Similar to Map and probably for equivalent weak collections.

Also this approach allow us reuse usually existing methods in the following way:

class Vec2 {
  constructor(x, y) { this.x = x; this.y = y }

  static equals(a, b) { return a.x == b.x && a.y == b.y }
  static hash(v)      { return fnv1a(v.x, fnv1a(v.y)) }
}

const pos2cell = new Map(undefined, Vec2)

It's pretty common request:

  1. data structures - JavaScript hashmap equivalent - Stack Overflow

  2. javascript - How to use ES6 Hash Map on any object without maintaing a reference (I.e. Java hashcode) - Stack Overflow

  3. javascript - Define a custom hash() method for use with ES6 maps - Stack Overflow

  4. javascript - Associative array without toString, etc - Stack Overflow

1 Like

What would be considered a valid return type for the hash function? Only 32 bit integers?

Does appear that under-the-hood at least v8 uses 32bit int hash structures for Map and Set. Not sure about other JS engines v8/ordered-hash-table.cc at dc712da548c7fb433caed56af9a021d964952728 Β· v8/v8 Β· GitHub

Yes, indeed. All hash table's implementations which I know use 32 or 64-bit unsigned integers for the result of the hash function.

another discussion:
https://esdiscuss.org/topic/maps-with-object-keys

What's the value of this that the collection normalization proposal doesn't provide?

normalization proposal provide coerceKey and coerceValue. coerceKey has similar goal with hash, but returns string as a result. This proposal targets more low-level stuff and suggests returning a customized raw hash number for hash method. Also provide equal method which important for actual key comparison.

Hash lookup consists of two stages. It takes object key, calculate hash (currently it stringify object first) value and if this value exists in map the next step is compare stringified keys. This proposal suggesting provide ability to customize this stages by providing hash and equals in user space.

Examples in other languages.
C++:

using h = std::hash<int>;
auto hash = [](const Vec2& n) { return ((17 * 31 + h()(n.x)) * 31 + h()(n.y)) * 31; };
auto equal = [](const Vec2& l, const Vec2& r) { return l.x == r.x && l.y == r.y; };

std::unordered_map<Vec2, int, decltype(hash), decltype(equal)> map(32, hash, equal);

Rust:

extern crate fnv;

use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use fnv::FnvHasher;

#[derive(PartialEq, Eq, Hash, PartialOrd, Ord)]
pub Vec2 {
  ...
}

let mut map = HashMap<Vec2, String, BuildHasherDefault<FnvHasher>>::default();

I get how it's different, but what is the value added?

I thought I covered the main reasons in enough detail in the original post. Provide lower-level control for the user over how objects are turned into a hash, by the way, sometimes hashing can be completely avoided. coerceKey does not provide this option. So hash function overloading and coerceKey are orthogonal things

I for one am looking forward to GitHub - tc39/proposal-record-tuple: ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change! being implemented and I think it makes this no longer needed. Typically mutable objects shouldn't be used in a hash set or as a key in a hash map so this seems more ideal to me.

Hashing is generally necessary even for integers to get proper hash table performance because you need inputs to be sufficiently evenly distributed across the table to minimize the worst-case linear factor (which is effectively zero with the best hash functions). And it's worth noting that most people don't even know how to set a table's load factor correctly, much less write a good hash function for hash tables unassisted.

I'm more in favor of using the proposed tuples and records for composite map keys than adding a dedicated hash function - it solves the same issue even better without subjecting the developer to stuff that generally should be implementation details.

1 Like

I meant case when object already have some linearly grow unique id or id generated by PRNG with linear distribution

Regarding use tuples / records instead objects for keys. Well this have some limitations. Like what if you already receive objects which need to use as keys? What if this objects required prototypes, what if this objects required circular references? All this not possible with key records.

Look at Python which also have tuples. Python still allow you override hash and equal for dicts: Glossary β€” Python 2.7.18 documentation via overriding __hash__() and __eq__()/__cmp__() for specific object.

If you're that concerned about performance, you'd be better off writing your own hash table implementation.

If you're that concerned about performance, you'd be better off writing your own hash table implementation.
Currently it's only one way. But it's not optimal. Because built-in Map/Set written on native C++ so you never outperform this. But that implementations already have and use hash and equal methods that we could overload / customize in user space. So why not do it?

I don't see implementations doing anything different than what Abseil does with their hash tables - rehashing the hash result. And as for equality, I expect that tuples will bring only negligible overhead compared to raw results.

Personally, I would recommend waiting until V8 implements the record/tuple proposal, then adding support for your proposal and benchmarking it.

This proposal not only about performance =) Please see starter message. I list there quite a few things that tuples / records cannot solve by their nature

And we don't wait tuples to solve part of this problem due to tuple:

class PointListNode extends Point { 
  constructor(x=0, y=0) {
    super(x, y);
    this.next = null;
    this.prev = null;
  }
  ...
  // add
  // remove
}

let map = new Map
let root = new PointListNode
map.set(#[root.x, root.y], someValue)

could be resolved via:

map.set(`${root.x}${root.y}`, someValue)

But this not solve problem when you need have key as Object (not tuple or string) to iterate key-value later and have full access to this keyed object

In theory, you can use references as perfect hashes if you assign a unique value to each reference, each string, each number, etc., given arbitrarily large memory available. Your proposal is about performance and computational complexity (which is literally the theoretical study of algorithm performance), not pure algorithmic theory.

Many times we need to hash based on the key if the key is a complex object. If we don't have a reference to the key in the map, we will never be able to access it unless we traverse it.

Such as when I use web worker to post a map data. The worker thread have not any reference of the keys.