For example, let’s say we want to run the LU benchmark, which (based on the numbers in Ed’s paper) when run on 32 processors takes ~25 secs on the supercomputer and ~100 secs on EC2. Now let’s add in queue and startup time:
- On EC2, I am told that it may take ~5 minutes to start 32 nodes (depending on image size), so with high probability we will finish the LU benchmark within 100 + 300 = 400 secs.
- On the supercomputer, we can use Rich Wolski’s QBETS queue time estimation service to get a bound on the queue time. When I tried this in June, QBETS told me that if I wanted 32 nodes for 20 seconds, the probability of getting those nodes within 400 secs was only 34%, which is not good odds.
So, based on the QBETS predictions, if I had to put money on which system my application would finish on first, I would have to go for EC2.
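To make the arithmetic behind this comparison explicit, here is a minimal Python sketch using only the numbers quoted above. The constant and function names are my own, chosen for illustration. It also assumes that the probability of getting nodes within a given wait can only grow as the allowed wait grows, so the 34% figure for a 400 sec wait is an upper bound for any shorter wait.

```python
# All figures come from the text above; the names are illustrative.
EC2_RUN_SECS = 100            # LU benchmark on 32 EC2 processors
EC2_STARTUP_SECS = 5 * 60     # ~5 minutes to start 32 nodes
SUPER_RUN_SECS = 25           # LU benchmark on 32 supercomputer processors
QBETS_WINDOW_SECS = 400       # queue-wait window QBETS was asked about
QBETS_PROB_IN_WINDOW = 0.34   # chance of getting the nodes within that window


def ec2_completion_secs(run=EC2_RUN_SECS, startup=EC2_STARTUP_SECS):
    """Total time on EC2: node startup plus run time."""
    return startup + run


def queue_budget_secs(deadline, run=SUPER_RUN_SECS):
    """Longest queue wait that still lets the supercomputer finish by `deadline`."""
    return deadline - run


ec2_total = ec2_completion_secs()
budget = queue_budget_secs(ec2_total)

# Assumption: P(wait <= t) never decreases as t grows, so if the budget is
# no longer than the QBETS window, the quoted probability is an upper bound.
prob_upper_bound = QBETS_PROB_IN_WINDOW if budget <= QBETS_WINDOW_SECS else None

print(f"EC2 finishes within about {ec2_total} secs")
print(f"Supercomputer must leave the queue within {budget} secs to match that")
if prob_upper_bound is not None:
    print(f"Chance of that is at most {prob_upper_bound:.0%}")
```

In other words, the supercomputer only beats EC2 if the job leaves the queue within 375 secs, which is even less likely than the 34% quoted for a 400 sec wait.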