I enjoyed reading Ian’s analysis of how to interpret “faster” when comparing infrastructure-as-a-service with a supercomputer.
Snippet:
For example, let’s say we want to run the LU benchmark, which (based on the numbers in Ed’s paper) when run on 32 processors takes ~25 secs on the supercomputer and ~100 secs on EC2. Now let’s add in queue and startup time:
- On EC2, I am told that it may take ~5 minutes to start 32 nodes (depending on image size), so with high probability we will finish the LU benchmark within 100 + 300 = 400 secs.
- On the supercomputer, we can use Rich Wolski’s QBETS queue time estimation service to get a bound on the queue time. When I tried this in June, QBETS told me that if I wanted 32 nodes for 20 seconds, the probability of me getting those nodes within 400 secs was only 34%, which is not good odds.
So, based on the QBETS predictions, if I had to put money on which system my application would finish first, I would have to go for EC2.
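To make the arithmetic in the snippet concrete, here is a rough sketch of the time-to-solution comparison. The numbers are the ones quoted above; the function and variable names are my own, and the 34% figure is simply taken from the quoted QBETS result rather than computed. Since the supercomputer run itself takes ~25 secs, it would need its nodes within about 375 secs to match EC2's 400 secs, so 34% is, if anything, an optimistic bound on its chances.

```python
# Sketch of the comparison in the quoted snippet. All inputs come from the
# post; the helper names are illustrative, not from any real tool.

def time_to_solution(wait_secs: float, run_secs: float) -> float:
    """Total wall-clock time: startup or queue wait plus the run itself."""
    return wait_secs + run_secs

def max_allowed_wait(deadline_secs: float, run_secs: float) -> float:
    """Longest wait that still lets the run finish by the deadline."""
    return deadline_secs - run_secs

# EC2: ~5 minutes to start 32 nodes, then ~100 secs for the LU benchmark.
ec2_total = time_to_solution(wait_secs=300, run_secs=100)

# Supercomputer: the run takes ~25 secs, so to match EC2 the queue wait
# must be no longer than this.
super_wait_budget = max_allowed_wait(deadline_secs=ec2_total, run_secs=25)

# Quoted QBETS figure: probability of getting the nodes within 400 secs.
# Getting them within the smaller wait budget cannot be more likely, so
# this is treated as an upper bound (an assumption of this sketch).
p_nodes_within_400 = 0.34

print(f"EC2 finishes in about {ec2_total:.0f} secs")
print(f"Supercomputer must start within {super_wait_budget:.0f} secs to match")
print(f"Chance of that is at most {p_nodes_within_400:.0%}")
```

The point is that raw benchmark speed (25 secs versus 100 secs) is only one term in the sum; the wait term dominates in this example.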