Speeding up SWAT-CUP on large basins

Has anyone shaved runtime on SWAT-CUP (SUFI-2) for a 10,500 km² basin at a daily timestep? I’m seeing about 12–14 hours per 10,000 simulations on a 12-core machine. What parameter screening or surrogate modeling approaches have you trusted to preserve peak timing and volume for spring freshet operations?

I’ve had the best gains by running a quick Morris screening on freshet metrics (peak day and a volume metric) and then letting SUFI-2 work on only the top 8 or so parameters with tightened ranges. That has kept peak timing and volume intact for me while roughly halving runtime, like trimming the lineup before a time trial. I still keep CH_N2 in as a timing sanity check. Would you try a Morris pass (e.g., via SALib) and see which parameters pop?
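If it helps to see the mechanics before wiring anything to SWAT, here is a minimal sketch of Morris-style elementary-effects screening in plain Python. It is a simplified version (upward steps only, fixed grid), not SALib itself; in practice SALib's `SALib.sample.morris.sample` and `SALib.analyze.morris.analyze` cover this. The parameter ranges and the `toy_peak_day` function are made-up stand-ins for a real SWAT run, so the ranking below says nothing about your basin.

```python
import random

# Hypothetical ranges for illustration only; substitute your project's bounds.
BOUNDS = {
    "SFTMP": (-5.0, 5.0),
    "SMTMP": (-5.0, 5.0),
    "SMFMX": (0.0, 10.0),
    "TIMP": (0.01, 1.0),
    "CH_N2": (0.01, 0.3),
    "ALPHA_BF": (0.0, 1.0),
}


def toy_peak_day(p):
    """Stand-in for one SWAT run; returns a made-up freshet peak day of year."""
    return (120.0 + 4.0 * p["SMTMP"] + 1.0 * p["SFTMP"]
            - 1.5 * p["SMFMX"] + 5.0 * p["TIMP"] + 30.0 * p["CH_N2"])


def morris_mu_star(model, bounds, n_traj=20, levels=4, seed=0):
    """Mean absolute elementary effect per parameter, in unit-scaled space."""
    rng = random.Random(seed)
    names = list(bounds)
    delta = levels / (2.0 * (levels - 1))  # usual Morris step size
    grid = [i / (levels - 1) for i in range(levels)]
    # Start only from grid points where an upward step stays inside [0, 1].
    starts = [g for g in grid if g + delta <= 1.0 + 1e-12]

    def scale(u):
        return {n: bounds[n][0] + u[n] * (bounds[n][1] - bounds[n][0])
                for n in names}

    effects = {n: [] for n in names}
    for _ in range(n_traj):
        u = {n: rng.choice(starts) for n in names}
        y = model(scale(u))
        order = names[:]
        rng.shuffle(order)
        for n in order:  # move one parameter at a time along the trajectory
            u[n] += delta
            y_new = model(scale(u))
            effects[n].append(abs(y_new - y) / delta)
            y = y_new
    return {n: sum(v) / len(v) for n, v in effects.items()}


mu_star = morris_mu_star(toy_peak_day, BOUNDS)
ranked = sorted(mu_star, key=mu_star.get, reverse=True)
for name in ranked:
    print(f"{name:9s} mu* = {mu_star[name]:.2f}")
```

Each trajectory costs one model run per parameter plus one, so 20 trajectories over 6 parameters is 140 runs. You would keep the top of `ranked` for SUFI-2, plus anything you want as a sanity check (CH_N2 in my case).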

Cut the calibration to a Jan–Jul window with a 1-year warm-up and train a small GP surrogate on about 1–2k LHS runs for peak day and spring volume; use it to pre-screen and send only the best 300 or so to SUFI-2. That took a similar 12-core case from about 12–14 h per 10k to effectively about 2–3 h while keeping freshet timing/volume stable. @james_carter47 good call on screening. Also make sure HRU daily outputs are off to kill I/O.
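A rough sketch of the pre-screening idea, assuming NumPy is available. Everything here is illustrative: `toy_peak_day_error` stands in for the real SWAT runs, the GP uses a fixed RBF length scale rather than fitted hyperparameters (a library such as scikit-learn's `GaussianProcessRegressor` would normally fit those), and the training set is 200 points instead of 1–2k so the toy runs quickly.

```python
import numpy as np


def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with exactly one point per stratum in each dimension."""
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return u


def toy_peak_day_error(x):
    """Stand-in for a SWAT run; returns a made-up peak-day error (smaller is better)."""
    return 20.0 * (x[:, 0] - 0.6) ** 2 + 8.0 * (x[:, 1] - 0.3) ** 2 + 2.0 * x[:, 2]


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / length ** 2)


def fit_gp(x, y, noise=1e-6):
    """Return a function giving the GP posterior mean (fixed hyperparameters)."""
    mean = y.mean()
    k = rbf(x, x) + noise * np.eye(len(x))
    alpha = np.linalg.solve(k, y - mean)
    return lambda xq: mean + rbf(xq, x) @ alpha


rng = np.random.default_rng(0)

# 1) "Expensive" training runs on an LHS design (unit-scaled parameters).
x_train = latin_hypercube(200, 3, rng)
y_train = toy_peak_day_error(x_train)
predict = fit_gp(x_train, y_train)

# 2) Cheap surrogate evaluation of many candidates, keep the best 300.
x_cand = latin_hypercube(5000, 3, rng)
y_pred = predict(x_cand)
keep = np.argsort(y_pred)[:300]
shortlist = x_cand[keep]  # these are the sets you would pass on to SUFI-2

print("shortlist size:", len(shortlist))
print("mean surrogate error on candidates:",
      float(np.mean(np.abs(y_pred - toy_peak_day_error(x_cand)))))
```

With real runs you cannot check the surrogate against every candidate as the last line does, so hold out part of the LHS design and check peak day and volume on that before trusting the shortlist.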

I/O was our choke point: move the project plus the calibration temp to a RAM disk/NVMe and set SWAT-CUP to not keep run folders. That dropped 10k runs from about 13 h to about 5 h on 12 cores with no change in freshet stats. @dwilso34’s screening helps, but this alone might be enough. Are you on local NVMe or a network share?
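A quick way to check whether storage is the problem before moving anything: time a burst of small-file writes and reads in each location you are considering. This is a generic sketch using only the Python standard library and says nothing about SWAT-CUP's actual file pattern; the file count, file size, and the `candidates` list are assumptions to adjust.

```python
import os
import tempfile
import time


def time_small_file_io(directory, n_files=200, size_bytes=64 * 1024):
    """Write, fsync, read back and delete n_files small files; return seconds."""
    payload = os.urandom(size_bytes)
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        for i in range(n_files):
            path = os.path.join(scratch, f"run_{i:05d}.txt")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the write to reach the device
            with open(path, "rb") as fh:
                fh.read()
    return time.perf_counter() - start


# Add your project folder, network share and RAM disk paths to this list.
candidates = [tempfile.gettempdir()]
timings = {d: time_small_file_io(d) for d in candidates}
for d, seconds in timings.items():
    print(f"{d}: {seconds:.2f} s")
```

If the network share comes out many times slower than local disk here, the move is probably worth trying before any screening work.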