We give a performance analysis of the purely functional array language Futhark and its GPU-targeting optimizing compiler on 16 benchmarks, and present several features that enable performance comparable with hand-written code: (i) a simple type system for in-place updates that ensures referential transparency and supports equational reasoning, (ii) several bulk-parallel operators, which encode strength-reduction invariants, along with their fusion rules, and (iii) a flattening transformation aimed at enhancing the degree of parallelism, which builds on loop interchange and distribution but uses higher-order reasoning rather than array-dependence analysis, and preserves the opportunities for further locality-of-reference optimizations.
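As an illustration of the kind of equational fusion rule mentioned in (ii), the sketch below checks two textbook laws on a small input: `map f (map g xs) = map (f . g) xs`, and folding a `map` into a following `reduce`. It is written in Python rather than Futhark and is not the compiler's implementation. All function names (`map_unfused`, `map_fused`, `map_reduce_unfused`, `map_reduce_fused`) are hypothetical. The fused reduction is a sequential left fold. A parallel version would also need `op` to be associative with neutral element `ne`.

```python
import operator
from functools import reduce


def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def map_unfused(f, g, xs):
    # Two passes: the intermediate list produced by map(g, ...) is materialized.
    return list(map(f, list(map(g, xs))))


def map_fused(f, g, xs):
    # map f . map g == map (f . g): one pass, no intermediate list.
    return list(map(compose(f, g), xs))


def map_reduce_unfused(op, ne, f, xs):
    # Build the mapped list first, then reduce it with op starting from ne.
    return reduce(op, list(map(f, xs)), ne)


def map_reduce_fused(op, ne, f, xs):
    # Apply f to each element as it is combined into the accumulator,
    # so the mapped list is never built.
    return reduce(lambda acc, x: op(acc, f(x)), xs, ne)


xs = [1, 2, 3, 4]
double = lambda x: 2 * x
inc = lambda x: x + 1

print(map_fused(inc, double, xs))                               # [3, 5, 7, 9]
print(map_reduce_fused(operator.add, 0, lambda x: x * x, xs))   # 30
```

The fused and unfused versions compute the same result. The fused forms avoid allocating the intermediate array, which is the point of such rules on a GPU.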