Builtin Ord / Compare method for primitives

Many languages contain special method for ordering which very useful for sorting or data structures with predicates like heap, priority queue, LRU cache and etc.

Problem

Sometimes implement ord or cmp / compare properly in user space is quite hard. Best example is numbers.

Proper ordering (baseline):

Float64Array.of(-1, -0, NaN, 1, -Infinity, +0, Infinity).sort()
// > [-Infinity, -1, -0, 0, 1, Infinity, NaN]

But implement same ordering with ordinal Array's sort is quite difficult:

[-1, -0, NaN, 1, -Infinity, +0, Infinity].sort()
// > [-1, -Infinity, -0, 0, 1, Infinity, NaN]  -  expected wrong ordering

[-1, -0, NaN, 1, -Infinity, +0, Infinity].sort((a, b) => a - b) // most popular
// > [-1, -0, NaN, -Infinity, 0, 1, Infinity]  -  oh, thats still wrong!

// try to be more clever!
[-1, -0, NaN, 1, -Infinity, +0, Infinity].sort((a, b) => Object.is(a, b) ? 0 : (a > b) - (a < b)) 
// > [-1, -0, NaN, -Infinity, 0, 1, Infinity] - hmm, still wrong :-(

[-1, -0, NaN, 1, -Infinity, +0, Infinity].sort((a, b) => isNaN(a) ? 1 : isNaN(b) ? -1 : a - b)
// >  [-Infinity, -1, -0, 0, 1, Infinity, NaN] - at last!

but wait...

[-1, -0, NaN, 1, -Infinity, +0, Infinity, -0].sort((a, b) => isNaN(a) ? 1 : isNaN(b) ? -1 : a - b)
// > [-Infinity, -1, -0, 0, -0, 1, Infinity, NaN] // still wrong!

// and finally
[-1, -0, NaN, 1, -Infinity, +0, Infinity, -0].sort((a, b) => isNaN(a) ? 1 : isNaN(b) ? -1 : Object.is(a - b, -0) ? -1 : a - b)
// > [-Infinity, -1, -0, -0, 0, 1, Infinity, NaN]

Next problem is compare two Date objects! moment and temporal-proposal already have such special methods for comparison like Instant.compare(d1, d2), DateTime.compare(d1, d2) and etc.

Another example is sorting two objects by key string. Simple example:

[{ key: 'a2' }, { key: 'a1' }].sort((a, b) => (a.key > b.key) - (a.key < b.key));
// or
[{ key: 'a2' }, { key: 'a1' }].sort((a, b) => a.key > b.key ? 1 : a.key < b.key ? -1 : 0));

In any case it's suboptimal due to required 2 string comparisons which may be expensive. Also it looks quite inexpressive. Same for BigInt.compare which could be much faster than a - b or (a > b) - (a < b) approaches.

Solution

I suggest add for every javasript's plain primitive (except Boolean and Symbol) ord or compare class method which return -1 when a < b, +1 when a > b and 0 when a == b (with proper handling NaNs for numbers):

Number.compare(a, b)
String.compare(a, b)
Date.compare(a, b)
BigInt.compare(a, b)

That's it. No generic comparison or mixed type comparisons.

Examples

[-1, -0, NaN, 1, -Infinity, +0, Infinity].sort(Number.compare)
// >  [-Infinity, -1, -0, 0, 1, Infinity, NaN]

[{ key: 'a2' }, { key: 'a1' }].sort((a, b) => String.compare(a.key, b.key)); // ascending order
// > [{ key: 'a1' }, { key: 'a2' }]

[{ key: 'a1' }, { key: 'a2' }].sort((a, b) => -String.compare(a.key, b.key)); // descending order
// > [{ key: 'a2' }, { key: 'a1' }]

Related threads:

Thanks! I also want to mention three-way or starship operator (<=>) which already present in Perl, Ruby, Apache Groovy, PHP, Eclipse Ceylon and C++20.

Also discussed in TS: Combined Comparison (Spaceship) Operator · Issue #24451 · microsoft/TypeScript · GitHub

We attempted to propose this, and the committee rejected it: https://github.com/hemanth/proposal-generic-comparison

I see. But you proposal related to array-equal and deep-equal.
This proposal and similar proposal by claudiameadows relate to more concrete non-generic ordering.

Before that, I looked at this topic (compare/ordering plain types) quite often, but it seems to me that the problem itself was not sufficiently reasoned and demonstrated.

1 Like

The proposal I linked was initially motivated by arrays, but was to be extended to a protocol that the spaceship operator invoked, that would apply ordering to potentially every builtin type. It was rejected, among other reasons, because of questions about you order NaN vs Symbol() vs Symbol('') vs Symbol('') (a different symbol with the same description) vs a Map vs a Set etc.

It was rejected, among other reasons, because of questions about you order NaN vs Symbol() vs Symbol('') vs Symbol('') (a different symbol with the same description) vs a Map vs a Set etc.

Exactly. This proposal didn't even try handle this cases. It just propose to add compare for small set of plain primitive types: Number <=> Number, String <=> String, Date <=> Date and BigInt <=> BigInt. All arguments will coerced to some of this types. For example Number.compare could be implemented as (in pseudocode):

Number.compare = function (a, b) {
  if (IsSmi(a) && IsSmi(b)) {
    return ToSmi(a) - ToSmi(b)
  }
  a = ToNumber(a)
  b = ToNumber(b)
  if (NumberIsNaN(a)) return -1
  if (NumberIsNaN(b)) return  1
  if (BitsEqual(a, b)) return 0
  return NumberSignbit(a - b) ? -1 : 1
}

String:

String.compare = function (a, b) {
  a = ToString(a)
  b = ToString(b)
  return StringCompareInternal(a, b) // something more efficient than (a > b) - (a < b)
}

BigInt:

BigInt.compare = function (a, b) {
  a = ToBigInt(a)
  b = ToBigInt(b)
  if (BigIntGetSignbit(a) != BigIntGetSignbit(b)) {
    return BigIntGetSignbit(a) ? -1 : 1
  }
  return BigIntCompareInternal(a, b) // efficient linear BigInt comparison nibble by nibble
}

To be clear, this was a proposal for stage 1, which is a general exploration of the problem (no specific implementation yet, even if we had one in mind). The committee rejected exploring the problem space of generic comparison.

And I totally understand problem with generic comparison. That's why I proposed comparison only for small and concrete four cases. Btw Temporal proposal already have similar compare method for Duration, PlainTime, PlainDateTime and etc

I would personally be against the spaceship operator - it's an operator that's pretty much specific to the sort function. Python found a user-friendly way to do custom sorting without needing to introduce a sort operator - instead of having your custom sorter return -1/0/1, you instead tell python how to map an array value to a sortable key, that can be used to order the parameters. This is how it would look like in javascript:

[{ thing: 'def' }, { thing: 'abc' }].sortBy(obj => obj.thing)

This will handle all but a few obscure use cases of the spaceship operator. Some popular libraries like lodash also provide helper methods such as this.

Also, if I were actually creating a data structure such as a priority queue (or any of the other data structures you mentioned), I would expect it to treat 0 and -0 as the same value (not ordering 0 above -0), and I would expect NaN to be just as invalid of a priority as a string (not giving it the highest priority).

[{ thing: 'def' }, { thing: 'abc' }].sortBy(obj => obj.thing)

This will only solve one part of the problem. You will also need an additional hint for ascending or descending order.
it also doesn't tell compiler how to interpret the fields. What type are they? Date? Number? String? Each case requires special handling. Especially for mixed types like [{ thing: "-10" }, { thing: 1 }]

Reverse would be implemented as a second optional parameter (that's how python does it)

[{ thing: 'def' }, { thing: 'abc' }].sortBy(obj => obj.thing, { reverse: true })

Under the hood, it'll just use < and > to sort the values. If value coercion is needed, that can be done in the callback. (i.e. obj => Number(obj.thing)). numbers and strings already get sorted just fine when using > and < (unless you're in an odd scenario where you're trying to distinguish -0 from 0, or support NaN). Dates can be sorted by converting them to timestamps (i.e. use the callback date => date.getTime())

I'm not proposing that we actually do this, just mentioning that I would much prefer a function like this over the spaceship operator (or, I guess I would prefer this over sort helper functions for primitives too) - which, if javascript had the spaceship operator, you would do the same things to fix these mentioned issues - coerce strings to numbers (or numbers to strings if you actually are trying to sort that way instead), turn dates to timestamps, etc.

I see. Thanks for explanation. I mentioned starship operator as one of possible option, but currently don't think it good idea. So I crossed out the mention of that in my second comment. starship operator required general (polymorphic) ordering operation which not I'm proposing here so don't focus on this anymore.

Alright.

I originally mentioned the sortBy() thing because I remembered that that's how python chose to handle the issue that the starship operator was solving - but I think it can also be a solution to this whole thing you're proposing too. Your new functions, such as String.compare(), are just special case versions of the starship operator that only operate on a specific type (I presume you would want it to auto-coerce to those types when wrong types are given? Or, even better, have it throw an error, requiring explicit coersion?)

You mentioned four comparison functions. All of which can be replicated using a single sortBy() function:

['b', 'a'].sortBy(x => x) // or x => String(x) for proper type coercion
[2, 1].sortBy(x => x) // or x => Number(x) for proper type coercion
[2n, 1n].sortBy(x => x) // or x => BigInt(x) for proper type coercion
[date1, date2].sortBy(x => x.getTime())

If you actually need to distinguish between -0 and 0, and sort using NaN, a multi-tiered sort can be used (this will be very easy to do if tuples become standardized)

// First, is sorts by NaN. Any numbers who aren't NaN are then sorted by their values.
// Finally, if two values are equal, and one if positive 0, that will be sorted higher.
// This (according to my understanding) will just become possible, because this is what happens when you use the `>` and `<` operators on tuples.
// This flexibility will also allow one to easily construct a sort function that supports NaN, but treats +0 and -0 as the same, or vice verca (depending on the user's requirements)
[2, NaN, 0, -0].sortBy(x => #[isNaN(x), x, Object.is(x, +0)])

I take that back about the tuple thing - I did some testing in the current tuple playground and that's not how they behave. That's a shame.

[2, NaN, 0, -0].sortBy(x => #[isNaN(x), x, Object.is(x, +0)])

I worry this really hurt performance and still don't properly sort numbers.
Also sorting double could be really fast and branchless. But it required native reinterpret numbers as raw bits. See AssemblyScript implementation: assemblyscript/sort.ts at 057ce465f0c1b8c5346d4b28df8e47089f9db9c9 · AssemblyScript/assemblyscript · GitHub

What's a concrete use case for wanting to do a comparison with +0 > -0 instead of the usual +0 === -0, and treating NaN as bigger than all other numbers?

NaN may prevent sorting at all. For example:

[4, NaN, 3, NaN, 2, NaN, 1, NaN].sort((a, b) => a - b)
// > [4, NaN, 3, NaN, 2, NaN, 1, NaN]

Float64Array.of(4, NaN, 3, NaN, 2, NaN, 1, NaN).sort()
// > [1, 2, 3, 4, NaN, NaN, NaN, NaN]

Regarding -0 and +0. It may be important for some Math operations. See this comment:

I won't deny that there are times when +0 and -0 behave differently, and it can be useful to distinguish the two values. I've had to do so before. But, I don't necessarily see that as a reason for wanting to sort +0 higher than -0. If we went back to a hypothetical priority queue example, and assume we use this Number.compare() algorithm under the hood, then we would get the following odd outcome:

queue = new PriorityQueue()

queue.add('a', { priority: -0 })
queue.add('b', { priority: +0 })
queue.front() // 'b' is at the front, even though 'a' was added first, because -0 is not the same priority as +0.

Most likely, if this priority queue is receiving -0 and +0 as parameters, this is just due to an artifact of how the priorities got calculated. I wouldn't expect to have to normalize the sign of my zeros before giving this queue my priority to make it treat +0 and -0 the same, nor do I see a practical reason for giving +0 higher priority over -0.

If I happened to give the priorityQueue a priority of "some string" or `{ myKey: 'myValue' } or NaN, I would expect it to throw an error - I just gave it a bad parameter that can't be sorted on. I, likewise, don't see a practical reason for NaN to have higher priority than all other numbers.

If you're trying to sort an array using the sort function (a, b) => a - b, and your array contains strings, objects, or NaN, then a programmer error has already happened before that line has executed, and in an ideal world, the sort function would throw, but because Javascript is lenient, it'll attempt a sort anyways. Instead of making NaN sortable, the real solution would be to check for NaN beforehand and throw if it's found. i.e.

const items = [4, NaN, 3, NaN, 2, NaN, 1, NaN]
if (items.some(x => isNaN(x) || typeof x !== 'number')) throw new Error('Bad parameter!')
return items.sort((a, b) => a - b)

If we did have a Number. compare() function, my preference would be that it threw if it received parameters such as a string, object, or NaN.

There might be a reason for wanting these two properties in a numeric sorting algorithm, but, from what I can tell, most of the time they aren't desired, and I honestly can't think of a concrete use case where they would be desirable properties.