Does anyone know how best to optimize compiling Rosetta to take advantage of AMD Epyc (Zen 3/Zen 4) and Ryzen architectures?
AMD offers an optimized Fortran compiler called AOCC (https://developer.amd.com/wp-content/resources/57222_AOCC_UG_Rev_3.2.pdf). What might be the best way to make use of this? Can it be invoked directly using the "cxx=" flag in "scons.py", or must one configure the OS directly - and if so, how?
Some newer Epyc processors offer very large L3 caches (768MB and soon over 1GB). Can Rosetta make use of this extra cache? Is there a way to have Rosetta make specific use of this extra cache - either at build time or at runtime?
AMD offers a number of optimized numerical libraries collectively called AOCL (https://developer.amd.com/amd-aocl/). Does Rosetta make use of any of these libraries? Might it in the future?
Any input, suggestions or pointers to more info are welcome.
Rosetta is a C++ program, so a Fortran-specific compiler won't help. If there's a corresponding C & C++ compiler, you can specify that in the site.settings file. The released version should have a site.settings file with spots for specifying the path to the compiler you want to use. (The cxx setting on the scons command line is only for switching between the default gcc/clang/icc versions.) That file can also be used to add any compiler-specific optimization flags to tune the build for the specific processors you're using.
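To give a rough idea of what that might look like, here is a minimal sketch of a site.settings file, which is plain Python that defines a `settings` dictionary. Treat all of it as assumptions to check against the site.settings that ships with your release: the install path in AOCC_BIN is hypothetical, the key names ("prepends", "appends", "overrides", "removes", "cc", "cxx", "flags", "compile") are from memory, and I believe the build system adds the leading dash to each flag itself, so the flag is written without one. The znver3 target is a real gcc/clang -march value, but it needs a recent enough compiler.

```python
# Sketch of a site.settings file. Key names and layout are assumptions;
# compare against the site.settings template in your Rosetta release.
import os

# HYPOTHETICAL install location of the AMD-provided clang/clang++ binaries.
AOCC_BIN = "/opt/AMD/aocc-compiler/bin"

settings = {
    "site": {
        "prepends": {
            # Put the compiler directory ahead of the existing PATH entries.
            "program_path": [AOCC_BIN] + os.environ.get("PATH", "").split(os.pathsep),
        },
        "appends": {
            "flags": {
                # Assumed to be passed as -march=znver3 (Zen 3 tuning).
                # Use "march=znver4" for Zen 4 if your compiler supports it.
                "compile": ["march=znver3"],
            },
        },
        "overrides": {
            # Full paths to the C and C++ compilers to use for the build.
            "cc": os.path.join(AOCC_BIN, "clang"),
            "cxx": os.path.join(AOCC_BIN, "clang++"),
        },
        "removes": {},
    },
}
```

If the build picks the file up, the compile lines printed by scons should show the compiler path and the -march flag, which is the quickest way to confirm that the settings took effect.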
As I understand it, L3 cache should be automatically used by the program when needed. (It will speed up all memory accesses.) You're in a bit of "luck" there, as Rosetta tends to suffer from L3 cache misses more than some other programs, which means that you'll probably see a bigger effect with a larger L3 cache with Rosetta versus some other programs.
Rosetta isn't able to use AOCL, but for the most part it doesn't use the sort of high-powered compute libraries like BLAS that AOCL is apparently substituting for.
I was under the impression that Rosetta still uses some small bits of numerical Fortran code and requires a Fortran compiler, but perhaps I'm wayyyy behind the times - or confusing Rosetta with some other code base like Gromacs. Thanks for setting me straight. And good to know that it can benefit from the added L3 cache.