LossyRef, better than WeakRef

In the thread How to implement TwoWayWeakMap? Using WeakRef? there is an example of a TwoWayWeakMap.

However, there is a problem: if someone uses the keyFrom method to look up an object key from a non-object value, and does so often enough before GC ever has a chance to kick in, they can inadvertently prevent the key object from being collected (depending on the runtime, and on the timing of the program).
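For concreteness, here is a minimal sketch of the kind of TwoWayWeakMap under discussion (the class shape and the keyFrom name are assumptions based on the linked thread, not the original code):

```javascript
// Minimal TwoWayWeakMap sketch: keys are held weakly, and values map
// back to their keys through a WeakRef.
class TwoWayWeakMap {
  #forward = new WeakMap(); // key (object) -> value
  #backward = new Map();    // value -> WeakRef(key)

  set(key, value) {
    this.#forward.set(key, value);
    this.#backward.set(value, new WeakRef(key));
  }

  get(key) {
    return this.#forward.get(key);
  }

  // The problematic lookup: every call derefs the WeakRef, handing the
  // caller a strong reference to the key. Called often enough before a
  // GC scan runs, this can keep an otherwise-unreachable key alive.
  keyFrom(value) {
    return this.#backward.get(value)?.deref();
  }
}

const map = new TwoWayWeakMap();
const key = { id: 1 };
map.set(key, "hello");
console.log(map.keyFrom("hello") === key); // true
```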

It seems that a new feature like LossyRef, whose deref() method would return undefined once an object is no longer reachable (other than through the LossyRef itself), would lead to more robust, failure-proof code.

As soon as a variable goes out of scope and the object is no longer reachable by any other code (the ref inside the LossyRef not counting), lossyRef.deref() would return undefined, even though the object may not yet have been collected (but will be).

Bike shed naming ideas: LossyRef, ReachableRef, ScopedRef

That's easier said than done. If you knew the point at which an object becomes unreachable, then you wouldn't need a garbage collector, and consequently LossyRef would be equivalent to WeakRef.


Polling a WeakRef is an anti-pattern. FinalizationRegistry callbacks are usually what should be used instead to avoid observing the object and keeping it alive.
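A hedged sketch of the recommended shape: register interest with a FinalizationRegistry and let the engine notify you, rather than probing a WeakRef (the callback timing is illustrative, not guaranteed):

```javascript
const cleanedUp = [];
const registry = new FinalizationRegistry((heldValue) => {
  // Runs some unspecified time after the registered target is collected.
  cleanedUp.push(heldValue);
});

let resource = { big: new Array(1000).fill(0) };
registry.register(resource, "resource-1");
resource = null; // drop the strong reference; no polling needed
// Eventually, after GC, the callback may run with "resource-1".
```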

As @lightmare mentioned, an object becoming unreachable is exactly what WeakRef and FinalizationRegistry handle, there is no other step.

@mhofman WeakRef returns an object if it hasn't been collected yet, even if there are no references to it. This means that user code can accidentally regain a reference to an object that would otherwise soon be collected.

Use of FinalizationRegistry does not prevent this issue in any way; it only notifies us after an object has been collected. Before that happens, a weakRef.deref() call can still take the program from zero strong references to an object back to more than zero, which can prevent the FinalizationRegistry callback from ever firing.
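That zero-to-nonzero transition can be seen in a few lines (note: within a single synchronous job, the spec keeps a WeakRef's target alive, so the deref() below is guaranteed to succeed):

```javascript
let obj = { data: 1 };
const weak = new WeakRef(obj);
obj = null; // user code now holds zero strong references

// Before GC clears the WeakRef, deref() can hand back a strong
// reference, taking the object from zero references back to one:
const revived = weak.deref();
console.log(revived.data); // 1 -- the object is strongly reachable again
```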

The idea here is that LosableRef (toying with naming ideas) entirely prevents this problem.

@lightmare It seems possible for a JavaScript engine to track when objects go out of scope and are no longer referenced. I don't think this means an object has to be immediately collected as soon as it is no longer referenced.

However, I'm not knowledgeable enough to know whether this would add too much overhead.

What I do know, is that this concept would prevent the above issue.

Is it possible to do it, and do it performantly enough?

(If you can explain why it isn't possible, that would be super helpful.)

An object reference going out of scope and an object no longer having any references in the program are two different things. Engines generally implement generational GC, which doesn't track individual references but instead checks whether any references still exist.

Basically what you're asking is to gain some knowledge about the references that exist to your object that the engine doesn't have. Garbage collection is how an engine finds out an object is unreachable by the program and can be collected. Usually the object will be collected immediately during garbage collection. The weakref target and the weak collection entries will be cleared right away, then at some point later the finalization registry callback will be invoked.

Derefing a weakref during that window will not rematerialize the object. And before garbage collection, the engine has no other state regarding the references to the object; it's as if the program still held a reference somewhere.

FYI, I did implement your TwoWayWeakMap avoiding unnecessary derefs, which should greatly reduce this weakref "reanimation" case and limit it to when the program gets the keys associated with a value in the map.


To add to what @mhofman said: some garbage collectors are also conservative, which means they can mistake other data for a reference to the object. So even when there are no real references left, an object may still not be collected after a GC scan.


There are two time windows:

  1. Between the moment when the user code dropped its last reference to an object and when GC happens, and
  2. what you described: between when GC happens and when a FinalizationRegistry callback happens.

Isn't it possible that, inside the first time window (before GC happens), a user can dereference a WeakRef and regain a reference to the object, thereby preventing GC of the object and hence preventing the FinalizationRegistry callback? And if a user loses and regains a reference in this manner every time before GC has a chance to run, won't the object never be collected?
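The leaky pattern in question can be sketched like this (pollUntilCollected is a hypothetical helper for illustration; note that within one synchronous job the spec keeps a deref'd target alive, so the loop below can never observe collection in that window):

```javascript
function pollUntilCollected(weakRef, maxPolls) {
  for (let i = 0; i < maxPolls; i++) {
    const obj = weakRef.deref(); // momentary strong reference on the stack
    if (obj === undefined) return i; // GC won the race
  }
  return -1; // the target was never observed as collected
}

let victim = { leaked: true };
const victimRef = new WeakRef(victim);
victim = null;
console.log(pollUntilCollected(victimRef, 1000)); // -1
```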

Isn't that the problem that How to implement TwoWayWeakMap? Using WeakRef? has shown?

Let me know if you need me to show both the working and leaky examples here.

@aclaymore What's the solution to the problem? Without considering how an engine works, as an end user I would imagine this (even if it is perhaps too difficult to implement):

  1. Make a WeakRef around an object
  2. As soon as user code drops the last reference, dereferencing should be impossible.

That's the theoretical behavior that would prevent user bugs.

The first window exists exactly because it's too expensive for implementations to collect objects as soon as they are no longer referenced. If there were a cheap way to figure out as soon as an object is no longer referenced, implementations would be able to collect them right away, making the first window non-existent. Basically the first window, while it exists, isn't something an implementation can detect.

The solution to the problem is to not probe WeakRef to test for reachability, and only deref a WeakRef when there is an actual use for the target value.
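In other words, structure code so the deref happens only at the point of use (withTarget below is a hypothetical helper for illustration, not an existing API):

```javascript
// Deref only when the target is actually needed, never to probe liveness.
function withTarget(weakRef, use, fallback) {
  const target = weakRef.deref();
  return target !== undefined ? use(target) : fallback();
}

const config = { retries: 3 };       // strongly held here, so deref succeeds
const configRef = new WeakRef(config);
console.log(withTarget(configRef, (c) => c.retries, () => 0)); // 3
```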

I don't know how it's done with non-reference-counting pointers, but with reference counting it's both cheap and easy. Suppose the engine used reference-counted pointers, and the references were defined similarly to this:

#include <cstdlib> // calloc, free

// EventHandler and fireEvent are assumed to be defined elsewhere.
class Reference: public EventHandler {
   private:
      long count = 0;
      void *mem = NULL;
   public:
      Reference(size_t size) {
         mem = calloc(1, size); // calloc takes an element count and a size
         ++count;
      }

      ~Reference() {
         if (mem != NULL) {
            free(mem);
         }
      }

      void ref() {
         //Designer can opt to throw for the opposite case.
         //Once count==0, this Reference is dead.
         if (count > 0) {
            ++count;
         }
      }

      void deref() {
         if (count > 0) {
            if (!--count) {
               free(mem);
               mem = NULL;
               fireEvent("deallocated");
            }
         }
      }

      bool isValid() {
         return count > 0;
      }

      operator void*() const {
         return mem;
      }
};

With something like this as the core reference, pointers could listen for the deallocated event and throw away their dead references. This means that WeakRef could know that no other uses of the reference exist anywhere and respond accordingly. Since the reference cannot be ref'd again once the count hits 0, there's no risk of reviving a dead reference before garbage collection. In this example code, just presume that free hands the memory block to the garbage collector for later cleanup.

Reference counting cannot handle reference cycles. Some engines may use it internally as an optimization, but it cannot detect when objects in a cycle become unreachable. That's why all engines have a garbage collector.


I've used reference counting to handle objects with cycles before, with no issues and no leaks. The only time reference counting can't handle objects with cycles is when the developer fails to release all acquired reference copies in the destructor. With a properly designed autoptr, it's a no-brainer. You just have to remember one rule: always pass the autoptr object "by value". The moment you try to do it by reference, you're begging for a leak.

The other issue with reference counting is that it adds a cost: all references now take up more space and have to keep updating the counter as they are passed around. GC, on the other hand, can be implemented to mostly run in a different thread with limited use of locks.

No, unless some object participating in the cycle explicitly breaks the cycle, reference counting cannot collect cycles. Thankfully JS is a managed memory language, so programs don't require explicitly dropping references to objects for cleanup to succeed.

@aclaymore That's true with a poorly designed autoptr. With a properly designed autoptr, a new reference object is only created to manage unmanaged or newly allocated memory. The autoptr maintains a pointer to this reference, updating the reference count whenever the autoptr is copied. This is why you don't ever pass a properly designed autoptr by reference (or pointer, if you don't see those as the same thing).

BTW, use of autoptr doesn't preclude use of GC. Just let the GC manage the memory and autoptr manage the reference to the pointer. When the autoptr's reference finally releases the pointer, instead of immediately freeing the memory, the GC does whatever it's going to do (probably just adding the pointer to a pool of free memory to be released later or reused during the next allocation request).

@mhofman You make me worry that memory-management techniques are being lost due to the prevailing use of garbage collection. I'm relaying direct experience, so why are you telling me I'm wrong? Let's make sure we're speaking the same language: with reference counting, a cycle occurs when one of the values reachable along some path through the referenced pointer is the referenced pointer itself, right?

So if each pointer along the path between the instances of the referenced pointer is a unique, properly designed autoptr, and the instances of the referenced pointer held by those unique autoptr instances are the same reference object, then that reference object will contain the correct count. The destruction of each unique autoptr also properly modifies the reference object to maintain the correctness of the count.

Nowhere is there any need to explicitly break a cycle. All that's required is to ensure that the autoptr instances are only ever used "by value". The instant you try to pass one by reference/pointer, or store one in an object as a reference/pointer, you risk breaking the count and causing either a leak or a hole. If you need to see it in code to understand, I can mock something up for you.

From Wikipedia on Reference counting:

A mechanism relying purely on reference counts will never consider cyclic chains of objects for deletion, since their reference count is guaranteed to stay nonzero (cf. picture).

I'm not sure where the confusion is. If a system is capable of creating and modifying objects so that cycles exist, then reference counting alone (i.e., objects holding autoptrs to other objects on the heap) will not work to deallocate the objects participating in the cycle.
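The Wikipedia point can be made concrete with a toy simulation (hand-rolled counts for illustration only; this is not how a JS engine behaves):

```javascript
// Naive reference counting: a cycle's counts never drop to zero.
class Counted {
  count = 0;
  refs = new Set(); // outgoing strong references held by this object
  acquire() { this.count++; return this; }
  release() { this.count--; }
  pointTo(other) { this.refs.add(other.acquire()); }
}

const a = new Counted().acquire(); // the program's own reference
const b = new Counted().acquire();
a.pointTo(b); // a -> b
b.pointTo(a); // b -> a: a cycle

a.release(); // the program drops both of its references
b.release();
console.log(a.count, b.count); // 1 1 -- the cycle keeps both counts nonzero
```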

I do agree in general, but with a minor nit: cycle collectors do exist. Here are a couple of examples in Rust:

The greater proposal is IMHO a bad idea, though: the moment you store such a reference on a heap object and fail to change it later, you end up forced to assign WeakRef semantics anyway. The only time it would apply is if you always clear object references and/or keep all such references on the stack, and by that point it'd be tantamount to an explicit free (and we all know how error-prone that is) with a safety net.

Also worth noting that it'd essentially force every non-optimized block to have a finally to decrement the counters of every LossyRef encountered, and that'd rapidly become a bottleneck in interpreted code. (One could conditionally create the finally upon receiving a LossyRef, but that would essentially just trade finally blocks for an extra condition checked on every function call and property access.) I doubt implementers would get on board with that prospect.

After a good night's sleep, I remembered what the catch was in the design (30 years of dust on that memory). The trick has to do with the rule: only use by value. This guaranteed that the autoptr would appear on the stack in most cases. The autoptr I wrote would check whether it was on the heap; if it was, it would refuse to increment any reference it received.

Thinking about it with the experience I've gained since then, it seems I put a light weight garbage collector in the autoptr I made back then. :thinking: :man_shrugging: