Can for-of be optimized to simple for loop speed?

I noticed that the larger an array is, the slower for-of is exponentially. f.e. on my machine, about 5ms and 10 ms for iterating 1,000,000 items with for and for-of respectively. However when I up to 10,000,000 items, I get 10ms and 90 ms respectively.

Is it possible that an engine can treat a for-of loop like a simple for loop on an array, and de-optimize only once someone has written code that would require iterator features?

For example, if I write code like

let i = 0
for (const n of items) other[i++] = n

I'm not doing anything special that requires any of the guarantees that for-of provides with respect to iterators, so it seems that an engine could possibly optimize that to the equivalent of a simple for(let i=0, l=items.length; i<l; i++) loop.

I think engine's already do that... If you find the code is deoptimized, maybe you should open a bug report on v8 or other engine you use.

2 Likes

In fact a for...of might even be faster as the engine does not have to keep a separate index but can simply add the memory offset to the pointer in memory (but that really depends what the for loop does).

While I do not know V8 internals, from what I can see optimizing away a for ...of works pretty well.
For this code:

// called 10000 to trigger jitting
function hot() {
    const arr = [1, 2, 3, 4];
    let sum = 0;
    for (const entry of arr) {
        sum += entry;
    }
    return sum;
}

V8 jit-compiles this (made some edits for readability):

  ;; PREQUEL - It is checked whether 'next' was overriden for the array.
  ;; If not, the hot path is taken, otherwise it deoptimizes
  movq rcx,0x38d4c4d56789    ;; object: 0x38d4c4d56789 <JSFunction next (sfi = 0x22e69dad7971)>
  cmpq rcx,rdx
  jnz deopt_a

  ;; the array pointer is loaded into register rdx
  movq rcx,0x221e75ebf471    ;; object: 0x221e75ebf471 <FixedArray[4]>
  movl rdi,[rcx+0x13]
  ;; Stack Guard
  cmpq rsp,[r13-0x20] (external value (StackGuard::address_of_jslimit()))
  jna bailout_c
for_loop:
  movl r8,0x1
  movl r9,0xffffffff
  jmp skip4
  nop
close_for_loop:
  movq rdi,r14
skip4:
  cmpl r8,0x4
  jc skip5
  movq r8,r9
  movl r12,0x1
  movq r11,[r13+0x20] (root (undefined_value))
  jmp skip6
skip5:
  movl r11,r8
  movq r11,[rcx+r11*8+0xf]
  addl r8,0x1
  xorl r12,r12
skip6:
  cmpl r12,0x0
  jnz skip7
  testb r11,0x1
  jnz deopt_b
  movq r14,r11
  shrq r14, 32
  addl r14,rdi
  jo deopt_c
  cmpq rsp,[r13-0x20] (external value (StackGuard::address_of_jslimit()))
  ja close_for_loop ;; back edge to beginning of the loop
  jmp bailout_d

As one can see the actual loop (between the label close_for_loop and the jump to it at the end) there is no single call to a function (i.e. iterator.next()). So the engine completely optimized the iterator away and the array access is a simple movq r11,[rcx+r11*8+0xf] (I think that's the actual array access, can be wrong though as reading generated assembly is not the easiest).

2 Likes