Custom hash and equal functions for Map and Set collections

MaxGraey · February 28, 2021, 9:24pm

Sometimes object keys need a specific hash functions to speed up the search for especially large objects or they have circular dependencies. Sometimes it's not necessary to stringify the whole object. In other situations you need to keep original key's object (not string or some unique id). Therefore, almost all languages which expose collections in their standard libraries have the ability to override the default hash and equal functions.

Similar to collection-normalization proposal I propose add hash and equals methods during construction:

const set = new Set(undefined, {
  hash(key) {
     return murmur3(key.name) ^ key.type
  },

  equals(lhs, rhs) {
    return (
      lhs.name == rhs.name && 
      lhs.type == rhs.type
    )
  }
})

set.add({ name: 'a', type: 0 })
set.add({ name: 'b', type: 1 })

Similar to Map and probably for equivalent weak collections.

Also this approach allow us reuse usually existing methods in the following way:

class Vec2 {
  constructor(x, y) { this.x = x; this.y = y }

  static equals(a, b) { return a.x == b.x && a.y == b.y }
  static hash(v)      { return fnv1a(v.x, fnv1a(v.y)) }
}

const pos2cell = new Map(undefined, Vec2)

It's pretty common request:

aclaymore · February 28, 2021, 10:15pm

What would be considered a valid return type for the hash function? Only 32 bit integers?

Does appear that under-the-hood at least v8 uses 32bit int hash structures for Map and Set. Not sure about other JS engines v8/ordered-hash-table.cc at dc712da548c7fb433caed56af9a021d964952728 · v8/v8 · GitHub

MaxGraey · February 28, 2021, 10:59pm

Yes, indeed. All hash table's implementations which I know use 32 or 64-bit unsigned integers for the result of the hash function.

MaxGraey · March 1, 2021, 2:04pm

another discussion:
https://esdiscuss.org/topic/maps-with-object-keys

claudiameadows · March 2, 2021, 12:48pm

What's the value of this that the collection normalization proposal doesn't provide?

MaxGraey · March 2, 2021, 1:43pm

normalization proposal provide coerceKey and coerceValue. coerceKey has similar goal with hash, but returns string as a result. This proposal targets more low-level stuff and suggests returning a customized raw hash number for hash method. Also provide equal method which important for actual key comparison.

Hash lookup consists of two stages. It takes object key, calculate hash (currently it stringify object first) value and if this value exists in map the next step is compare stringified keys. This proposal suggesting provide ability to customize this stages by providing hash and equals in user space.

MaxGraey · March 2, 2021, 1:56pm

Examples in other languages.
C++:

using h = std::hash<int>;
auto hash = [](const Vec2& n) { return ((17 * 31 + h()(n.x)) * 31 + h()(n.y)) * 31; };
auto equal = [](const Vec2& l, const Vec2& r) { return l.x == r.x && l.y == r.y; };

std::unordered_map<Vec2, int, decltype(hash), decltype(equal)> map(32, hash, equal);

Rust:

extern crate fnv;

use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use fnv::FnvHasher;

#[derive(PartialEq, Eq, Hash, PartialOrd, Ord)]
pub Vec2 {
  ...
}

let mut map = HashMap<Vec2, String, BuildHasherDefault<FnvHasher>>::default();

claudiameadows · March 2, 2021, 2:39pm

I get how it's different, but what is the value added?

MaxGraey · March 2, 2021, 2:51pm

I thought I covered the main reasons in enough detail in the original post. Provide lower-level control for the user over how objects are turned into a hash, by the way, sometimes hashing can be completely avoided. coerceKey does not provide this option. So hash function overloading and coerceKey are orthogonal things

mfulton26 · March 2, 2021, 2:54pm

I for one am looking forward to GitHub - tc39/proposal-record-tuple: ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change! being implemented and I think it makes this no longer needed. Typically mutable objects shouldn't be used in a hash set or as a key in a hash map so this seems more ideal to me.

claudiameadows · March 2, 2021, 11:45pm

Hashing is generally necessary even for integers to get proper hash table performance because you need inputs to be sufficiently evenly distributed across the table to minimize the worst-case linear factor (which is effectively zero with the best hash functions). And it's worth noting that most people don't even know how to set a table's load factor correctly, much less write a good hash function for hash tables unassisted.

I'm more in favor of using the proposed tuples and records for composite map keys than adding a dedicated hash function - it solves the same issue even better without subjecting the developer to stuff that generally should be implementation details.

MaxGraey · March 3, 2021, 12:03am

I meant case when object already have some linearly grow unique id or id generated by PRNG with linear distribution

MaxGraey · March 3, 2021, 12:17am

Regarding use tuples / records instead objects for keys. Well this have some limitations. Like what if you already receive objects which need to use as keys? What if this objects required prototypes, what if this objects required circular references? All this not possible with key records.

Look at Python which also have tuples. Python still allow you override hash and equal for dicts: Glossary — Python 2.7.18 documentation via overriding __hash__() and __eq__()/__cmp__() for specific object.

claudiameadows · March 10, 2021, 7:08am

If you're that concerned about performance, you'd be better off writing your own hash table implementation.

MaxGraey · March 10, 2021, 8:11am

If you're that concerned about performance, you'd be better off writing your own hash table implementation.
Currently it's only one way. But it's not optimal. Because built-in Map/Set written on native C++ so you never outperform this. But that implementations already have and use hash and equal methods that we could overload / customize in user space. So why not do it?

claudiameadows · March 12, 2021, 7:47am

I don't see implementations doing anything different than what Abseil does with their hash tables - rehashing the hash result. And as for equality, I expect that tuples will bring only negligible overhead compared to raw results.

Personally, I would recommend waiting until V8 implements the record/tuple proposal, then adding support for your proposal and benchmarking it.

MaxGraey · March 12, 2021, 11:26am

This proposal not only about performance =) Please see starter message. I list there quite a few things that tuples / records cannot solve by their nature

MaxGraey · March 12, 2021, 11:44am

And we don't wait tuples to solve part of this problem due to tuple:

class PointListNode extends Point { 
  constructor(x=0, y=0) {
    super(x, y);
    this.next = null;
    this.prev = null;
  }
  ...
  // add
  // remove
}

let map = new Map
let root = new PointListNode
map.set(#[root.x, root.y], someValue)

could be resolved via:

map.set(`${root.x}${root.y}`, someValue)

But this not solve problem when you need have key as Object (not tuple or string) to iterate key-value later and have full access to this keyed object

claudiameadows · March 14, 2021, 12:32am

In theory, you can use references as perfect hashes if you assign a unique value to each reference, each string, each number, etc., given arbitrarily large memory available. Your proposal is about performance and computational complexity (which is literally the theoretical study of algorithm performance), not pure algorithmic theory.

yjhmelody · April 30, 2021, 9:49am

Many times we need to hash based on the key if the key is a complex object. If we don't have a reference to the key in the map, we will never be able to access it unless we traverse it.

Such as when I use web worker to post a map data. The worker thread have not any reference of the keys.

Topic		Replies	Views
Extra `key` option for maps and sets 💡 Ideas proposal	2	330	August 28, 2020
Hashable data types 💡 Ideas	6	392	August 9, 2022
Add enumeration methods (and clear) to WeakMap and WeakSet 💡 Ideas	3	343	December 2, 2020
Proposal: Object.equals and hashCode 🦋 Proposals proposal	3	340	July 20, 2023
Array.prototype.toObjKey and Array.prototype.toObjFunc 💡 Ideas proposal	3	351	January 2, 2021

Custom hash and equal functions for Map and Set collections

Related topics