A portable program executes on different platforms and yields consistent performance. With the focus on portability, this paper presents an in-depth study of the performance of three NAS benchmarks (EP, MG, FT) compiled with three commercial HPF compilers (APR, PGI, IBM) on the IBM SP2. Each benchmark is evaluated in two versions: using DO loops and using F90 constructs and/or HPF's Forall statement. Base-line comparison is provided by versions of the benchmarks written in Fortran/MPI and ZPL, a data parallel language developed at the University of Washington.While some F90/Forall programs achieve scalable performance with some compilers, the results indicate a considerable portability problem in HPF programs. Two sources for the problem are identified. First, Fortran's semantics require extensive analysis and optimization to arrive at a parallel program; therefore relying on the compiler's capability alone leads to unpredictable performance. Second, the wide differences in the parallelization strategies used by each compiler may require an HPF program to be customized for the particular compiler. While improving compiler optimizations may help to reduce some performance variations, the results suggest that Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.(c) 1997 ACM 0-89791-985-8/97/0011 $3.50 the foremost criteria for portability is a concise performance model that the compiler must adhere to and that the users can rely on.