Draft for a proposal for a Pure Function API

Based on the adoption of one of the two following possible future proposal (depending of the community interest for my previous ideas) :

  • proposal_explicit_reference_syntax
  • proposal_explicit_ownership_syntax

This API provides a way to check if a function is pure and thus allow us to fearless compose them or paralellize them in thread pools. This also assure a way to make future high level API that can use secure and context free function that can run beside main script.

This API is based on at least one of the following proposals :

API

interface Function {
    isPure(func: Function): boolean
}

isPure return true if the passed function only use arguments that respect (or use 0 arguments):

  • "@arg" (read only reference)
  • "copy arg" (deep copied argument)
  • "move arg" (scope moved argument)

And all function used in current function body return true for Function.isPure()

Examples

No arguments

function doNothing() {
    return
}

console.assert(Function.isPure(doNothing) === true)

Read-only reference arguments

function compute(@matrixA, @matrixB) {
    const result = Matrix.add(@matrixA, @matrixB)
    console.log(result)
    return result
}

console.assert(Function.isPure(compute) === false)

Explicit onwnership described arguments

async function extractUser(copy username) {
    const datas = await fetch('api/username')
    return await datas.json()?.user
}

console.assert(Function.isPure(extractUser) === false)

const datas = await fetch('api/username')
async function extractUserPure(move datas) {
    return await datas.json()?.user
}

console.assert(Function.isPure(extractUserPure) === true)

Arguments mix

function foo(@arg0, copy arg1, move arg2) {
    return result
}

console.assert(Function.isPure(foo) === true)

function buzz(@arg0, copy arg1, move arg2, arg3) { //arg3 is no referenced of ownerhip described argument
    return result
}

console.assert(Function.isPure(foo) === false)

FAQs

  • Throw an expection can be a criteria of non-pure function in order to assure the constant execution flow.

Proposal: https://github.com/JOTSR/proposal_pure_function_api`

I really like this idea! Had a similar idea that you can define a function with pure keyword before, like pure function somePureFunction(). This would only allow this function to access passed parameters and variables defined in its own scope, and would not allow to modify any passed parameters (they are readonly). All variables outside its scope would be considered undefined and undeclared.

A pure function would also only be able to call other pure functions.

What about globals, like Array, Promise, Map, etc?

2 Likes

What would you consider to be a "read-only parameter"? In particular do you allow objects as parameters? How do you test if the object is pure?

More generally, should there be a way to achieve memoization or similar non-observable caching in pure functions?

What if one of the parameters is console.log() or a function that isn't pure itself, and the pure function calls it?

I currently thinking of a re-design of this idea in odrer to promote and simplify the adoption of this functionnal programming pattern.
It could be like:

fn name(/* args  as describe in the first post */) { /* inner scope */ }
//anonyme
fn () {}

The rules are simple:

  • only primitives, unreferenced object (and owned like or referenced like from my others proposal) are allowed for arguments
  • no access to outer scope except of globalThis properties and methods (same rules as for arguments)
  • call of non pure function raise a synthax error (maybe at parse time)
  • can't "throw" error as it is an irregular behaviour in the execution flow
  • due to previous characteristics it allow simple recursive optimisation for runtimes

The 2 rules for I'm not sure are:

  • maybe it can be good to don't bind this like in arrow function
  • a syntax for explicitely import variables from outer scope (maybe confusing and out of scope but can solve the black box aspect of some function)
fn name() use(/* required external variables */) {}

My idea is to throw a syntax error, at parse time if possible

There is no notion of owned reference in JS and I don't see how one could be introduced.

Similarly, JS is a dynamic language and you can't know before execution what the type of a binding is. In particular you can't know if a binding in scope would be a pure function or not, at least not until trying to invoke it.

And that's where things start to get really sticky. If the implementation can't throw, how would calling non pure functions or accessing non pure objects be prevented? Also stack overflows etc.

I don't see what's simple. I don't consider purity as an optimization vector but as a way for invoking code to make sure what it calls is side effect free.

1 Like

To be clear, I would love to figure out a way to do pure execution in JS, but I believe the mechanism will have to be more along the lines of side-effect checks that engines like V8 implement for their debugger's inspection (e.g. execution of getters), aka start an invocation in no "side-effect" mode. The biggest problem is how to allow some state to be mutated, such as state reachable from arguments, which may be encapsulated.

Maybe I’m not understanding your objections, but if you explicitly declare a function as ‘pure’ at “compile-time”, the function should not be able to modify any parameters in any way, as if a recursive object.freeze had been called on any objects and arrays passed as parameters. Also, the function should only be able to call other explicitly declared pure functions.

It should not be able to access anything from the global scope except perhaps exceptions like ‘Map’, ‘Proxy’ and other standard library objects, but these too should be read-only from the function’s perspective.

Would this be hard to achieve? It should throw a syntax error if any of these rules are violated.

But that's the thing, these objects were not recursively frozen, so how do you prevent the pure function from modifying those objects? Compare that with objects allocated during the execution of the pure function. Nothing differentiates them, besides tracking which objects are allocated during the execution, at runtime.

Besides, a common relaxation of purity checks is being allowed to mutate the state held by some arguments.

On the scoping topic, you have to be consistent. Either you have access to bindings in global scope, or you don't. Having access to a subset begs the question of how you decide what the subset is.

Also how do you access "other pure functions"? I assume they'd not all be from the global scope? So once again, how do you restrict which bindings from your surrounding scopes are available.

This whole proposal is very hand wavy. JavaScript is not a statically compiled language like rust, and cannot perform "compile time" checks based on what a binding contains.

It is only for example as I made this proposal compatible with Explicit Ownership Syntax and Explicit Reference Syntax. I just said if any future proposal provide a such functionnality it will be valid as arguments.

You can do this check at parse time as it is a syntax rule of this proposal (or at least at runtime), eg:

const arrow = () => true

fn pure(arrow = arrow /* ...args */) {
  // code
  console.log(/* code */)
  //code
  arrow()
}

Can result in a pseudo AST of shape

{
  type: PureFunction,
  name: "pure",
  args: [/* */],
  body: {
    //...
    type: Function, //Syntax error (Function are not allowed in PureFunction body)
    name: "console.log",
    //...
    type: ArrowFunction, //also throws
    name: "arrow"
    //...
}

Of course these checks can be done at runtime for those that can be deduced at parse time.

When I say can't throw, this is not to have fn that don't throws but you can't use throw new Error() "manually" or using throwable function. It is a sort of like in rust with Result enum that function can return and panic that is raise by the "runtime" to ensure that the function respect the signature.

fn name() {} declaration if a constant declaration unlike with function keyword. As the input and output are "constant" (known before runtime) you could simply unwrape recursive function like:
(only if the function is returning itself)
Only for example, not a guideline of an implementation:

fn recurse(a, b) {
  //...
  a += 0.1
  b = a + b
  if (a < 100) return recurse(b, a + 2 * b)
  if (a < 200) return b
  return a
}

To a tail call optimised form

fn recurse(a, b) {
  while (true) { // all the body of the function is passed in the loop
    //...
    a += 0.1
    b = a + b
    if (a < 100) {
      [a, b] = [b, a +  2 * b] //recursive call is simply transformed into reassignment
      continue // + continue
    }
    if (a < 200) return b
    return a
  }
}

You only have access to globalThis not the global scope in order to have access to standard objects like Date, Map, ... . To use a outer scope object you have to pass it explicitely as argument. For the same reason you can't pass a reference as argument as it can be accessed from the outiside.
supposing the following calls:

const c = new Date()
const d = 5
pure(1, [/* primitives or literals*/], c, d, { key: 'prop', date: new Date() }) //error
//c raise a syntax error since it is an external reference, you have to clone it
pure2(date = new Date(c)) //ok date is cloned

An Explicit Ownership Syntax and Explicit Reference Syntax like syntax can be helpful but with these rules we ensure the "purity" of the function.

I think the idea is pretty robust. There is no need to have type since you can distinguish primitives and Object, const and let. I also forgot to mention it in my previous posts but with the fn keyword you declare implicitely a const pure function and like arrow function there is no access to internal properties like arguments ... I think the best is also to don't bind this.

Ugh? The AST for that would be invocations on a binding (arrow) and on a ref (console.log), but it wouldn't know what the target of those are at parsing time...

What about language constructs that throw? E.g. null.foo? What is a throwable function? A lot of intrinsics throw.

I do not understand your definition of constant. You seem to be describing function inlining. I also don't see any reason an explicit pure marking is required for an engine to perform the optimization you describe.

Being pedantic here, but globalThis is a binding looked up on the global scope... And accessing Date is similarly looking up that binding on the outer scopes, bottom out in the global scope and the global object.

At runtime only, and it seems most of your ideas centers on ahead of time checks.

I will answer the linear type semantics in the other thread.

As you have a const declaration you can follow the declaration in same and children scopes to raise some syntax error before runtime (unless you use a reference to the function) the others check indeed need to be at runtime. But yes my assumption is probably too optimistic compared to the complexity or time cost of such behaviour. I don't know what optimisation are made by parsers but at least it can be catch with dev tools like eslint or typescript.

fn name() {}
//equivalent to
const name = fn() {}

With "fn" is a freeze object (or equivalent) and don't provide any access to internal object properties

I don't understand your point, you can use try...catch and have some errors like type and syntax error. You just can't explicitely write a throw expression in the pure function body (and children scopes). You can write an synta or a type error without restriction.

By global scope I mean module or script scope. Imports, let and const declaration are not accessible in globalThis. It's just a convenient way to access to std global Objects but this can be discussed and removed from this proposal if it induce problems.

Yout have the distinction between primitives and object in many parser (like in V8) on declaration. But yes, as I said upper in this response I don't know what optimisation are made on the ast by the major engine, and what is possible to do without complexity cost, but even if it mainly performed at runtime external tools like eslint, tyspescript, ... can prevent runtime checks.

I think this proposal will be facing 1,000 impossible hurdles if we're trying to shoot for these things to be parse-time errors. You have to deal with issues like:

  • You should be allowed to call array.map() (unless someone monkeypatched it into something impure, which is a whole issue on its own), but you shouldn't be able to call array.push(). But, how do you even know if an array is being passed in in the first place, since there is no type system in JavaScript, it could just be an object that happens to have two pure, or maybe impure functions on it called .map() and .push(). How can we detect, at parse time, if there's any issues with any of this?
  • The mechanics for having one pure function call another seems impractical. You're saying that these functions can't access anything from their outer scopes, including other functions? Which means the only way to split a function up and create helpers, is to pass those helpers explicitly into the main function? That sounds ugly.

You might have more luck if we just focus on runtime errors - while we're currently executing code inside a pure function, we'll cause the runtime to throw an error if any kind of mutation or side-effect happens. (and, it's ok to have errors - you can have a function both be pure, and throw errors - I see no reason to try and band the throw statement). You could also allow access to the outer scopes now (but, just forbid the updating of data from those outer scopes).

This, however, opens up its own can of worms that would need to be discussed, like:

  • As previously mentioned, how do you handle things like memoization, which are technically impure, but are usually given a free pass, because, for all intents and purposes of outside users, it is pure. (Perhaps we don't allow memoization, that's certainly a valid answer, but it could make the usage of pure functions much more challenging for the functional crowd, which, ironically, would care about these kinds of functions more).
  • How do you interop with libraries, that were written before this proposal came out? They're not exporting functions that are marked as pure, even if they technically are pure. So, does using an older library poison your codebase, and make it so you can't mark any of your functions as pure either, even if they are? It would be a real shame if we put out a proposal, that automatically made all existing JS code out there become "legacy", and put pressure on everyone to update their code and mark things as "pure" if they should be pure.

I believe that any small optimisation before runtime is good to take, but yes it seems to be difficult to catch as behaviour as I though but at least it can be perform "simply" by tools like eslint and typescript.

array.push() is allowed unless the array is scoped to the function body (and the same for all "impure" methods of inner scoped objects).

For this problem arguments can only be owned object (literals or via owned like syntax proposals), you can't pass a reference (at parse time) as parameter (maybe it should be ok for freezed const with runtime checks).

To assume purity you have to write:

fn outer(/* args */) {}

fn calling(/* args */, outer) {
  //...
  outer(/* ... */)
  //...
}

calling(..., outer) //pass outer as argument 

A other idea was to trust the user (maybe it have to be strictely regumated since it can induce "purity" error) with an explicit scope access syntax like:

fn outer(/* args */) {}

fn calling(/* args */) use (outer) { //outer have to be defined in the calling scope of "calling"
  //...
  outer(/* ... */)
  //...
}

calling(...) //don't need to pass outer as argument 

The problem with the throw statement is you can't know (without reading the implementation) if a function as a such behaviour. Moreover the idea of pure function is to develop functionnal principle and throwing is not good for composition or flow management, the idea is to prefere [res, err] return pattern (or equivalent with enum, object, ...). Finally it guarentee that the function don't throw unless very un solvable runtime error (and not only because of a way to return non critic errors). Another option (but can be less used and so diminish the syntax interest) is to allow throw but deleguate warning or error for tooling like eslint or ts (--no-throw-expression-in-pure-function).

Maybe globalThis and (depending on the future choices) global scope object as to be seen as const declaration and frozen.

I don't really know how to minimize the cost of compatibility.

  • If we used the first idea of this proposal and use the Function.isPure() test, all was made at runtime so there is no cost (criteria have to be tweaked compared to the original idea to encapse these cases).
  • If we use the new fn syntax maybe an explicit or implicit with runtime check (like in first idea) to cast them into pure function. Explicit cast example Function.toPure().
  • If we use the fn name() use() {} syntax you can simply add a compatibility layer by fn pure(/* args */) use (oldPure) { return oldPure(/* args */) }.
  • Manually patch for and work with legacy code as done for var keyword.
  • If ensure sufficient runtime checks allow classic pure function as explicit pure function arguments and then solve all new use case.

I don't see many solutions, one is to use the "use" keyword:

fn pure(a: number): number {
	return a + 2
}

fn memo(fn: PureFunction) {
	const cache = new WeakMap()
	return fn(...args: Parameters<typeof fn>): ReturnType<typeof fn> use (cache) {
		if (cache.has(args)) return cache.get(args)
		const result = fn(...args)
		cache.set(args, result)
		return result
	}
}

The other is to assume that knowing there is no side effect engine can perform memoization following their own heuristics.

Maybe, with the "fn" syntax it will be usefull to have a expression like notation (as arrow function) in order to promote the usage of pure functions

const arrowed = fn() => "expression"

the idea is to prefere [res, err] return pattern (or equivalent with enum, object, ...).

This is actually one of the issues I have with the Elm language - the lack of the ability to "throw" an error, instead, they want to force you to return tuples like that. I feel like there's a place for both of these, and leaving one out will cause issues.

Let's say, for example, that you develop a function that, when given a tic-tac-toe board, it returns "X" if X won, "O" when O won, "CatScratch" when it's a tie, or null if the game isn't over yet. Then, let's say a board like this gets passed in:

[
  ["X", "X", "X"],
  [null, null, null],
  ["O", "O", "O"],
]

What should it return? Well, under your argument, we should really be returning a tuple. If we get a broken board like this, we can return an error stating that it's a broken board, otherwise, we can return the normal response. But, what's the caller going to do with that brokenBoard response - well, you only get that response if there's some fatal bug in your code, so there's really nothing the caller can do but send it up to their caller. And their caller can send it up to their caller. Until, in the case of Elm, you return a message to the runtime requesting that you display an error message on the screen saying that a fatal error occurred. So, Elm likes to tout about how it has zero runtime errors, since it doesn't provide a way to throw errors, only return them, but this "zero runtime error" concept isn't really true, what they have, instead, is developers being required to manually pipe down runtime errors to their callers. Or, what most people probably do instead, you just don't bother with any of that nonesense, and don't bother adding assertions to your code to verify that everything is in a good state, and instead let the program continue to run in it's invalid state (in this case, the function would just pick either X or O and move on - not such a bad thing here, but it could be worse in other scenarios). In other words, this design that Elm has chosen is actively encoraging programmers to not bother with assertions and early errors.

I'd rather not adopt that :).