I am really enjoying the work that my new team does. We are a Cloud-focused team. I am really looking forward to when we are going to be in a position to talk in public about what we are building. I am learning a lot and the people in the team are fantastic!
The following observation is by no means unique or original. Many people have done similar, and even more complicated, calculations to illustrate the value of Cloud Computing, and many already use such insights in their day-to-day operations. Cloud Computing infrastructures have enabled so many businesses to take off and scale at a fraction of the typical infrastructure costs. So, nothing new here :-)
I am recording it because it led me to the realization that we developers/architects/philosophers*, not just CIOs and CTOs, also have to start embracing the new platforms out there and incorporate economics-related thinking into the way we develop, not just deploy and operate, software and services. Again, the following observations are very simple and obvious, so don’t expect to find anything profound. You’ve been warned :-)
The Cloud is not just a platform for deploying applications and services. It’s a great tool for our day-to-day lives as developers as well.
Recently, we had to transfer 2TB of data as a test for one of the components we are developing. As part of the transfer process, we wanted to calculate the MD5s of thousands of files. We used Azure storage as the destination for our files. We effectively used it as a data repository, as a disk, directly from our locally-running software component. Our intake process was running on a local computer, getting the data from the Internet and placing it on Azure. Then, another computer was reading the data from the “disk”, calculating the MD5s, and storing the results back into the Cloud. That was our mistake. Some simple calculations illustrate why.
An Azure customer (or an Amazon Web Services customer, for that matter) pays for the data transferred into Cloud storage and for the data transferred out. Any transfers inside the Cloud are free.
For Azure (and assuming that the transfers happen in the US), the cost is $0.10/GB in and $0.15/GB out. In other words, we had to pay (I am not adding the per-10k-transactions cost here):
- $0.10 x 2 x 1,024 GB ~= $205 to bring the data into Azure
- $0.15 x 2 x 1,024 GB ~= $307 to get the data out of Azure in order to perform the MD5 calculations
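The two figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the prices are the ones quoted in this post, the per-transaction cost is still ignored, and the names (`transfer_cost`, `DATA_GB`, and so on) are made up for illustration.

```python
# Prices quoted in the post for US transfers; real prices may differ.
PRICE_IN_PER_GB = 0.10   # $ per GB transferred into Cloud storage
PRICE_OUT_PER_GB = 0.15  # $ per GB transferred out of Cloud storage
DATA_GB = 2 * 1024       # 2 TB expressed in GB


def transfer_cost(data_gb, price_per_gb):
    """Cost in dollars of moving data_gb gigabytes at price_per_gb."""
    return data_gb * price_per_gb


ingress = transfer_cost(DATA_GB, PRICE_IN_PER_GB)
egress = transfer_cost(DATA_GB, PRICE_OUT_PER_GB)

print(f"into the Cloud:   ${ingress:.2f}")
print(f"out of the Cloud: ${egress:.2f}")
```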
Now, had we used an Azure compute instance to calculate the MD5s, we would have dramatically reduced the cost of our task. Assuming we can get 100 Mbit/s from Azure storage to an Azure worker role, we can process ~12.5 MB/s. Let’s assume for the moment that we can calculate the MD5 almost instantly and that we are bounded by the bandwidth. At that rate, 2TB (roughly 2.1 million MB) takes ~46.5 hours to read, so that is how long we will need to calculate all the MD5s. We cannot get 100 Mbit/s out of Azure to a local machine, so doing the calculation outside of Azure would have been much more time-consuming.
The cost of using an Azure worker role for 46.5h:
- 46.5h x $0.12/h = $5.58
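Here is the same estimate as a small Python sketch. It assumes, as the post does, that the job is purely bandwidth-bound at 100 Mbit/s and that a worker role costs $0.12 per hour; the function and constant names are hypothetical. Because it does not round the intermediate values, it lands a cent or so away from the hand-rounded figure above.

```python
DATA_MB = 2 * 1024 * 1024  # 2 TB expressed in MB
LINK_MBIT_PER_S = 100      # assumed storage-to-worker bandwidth
PRICE_PER_HOUR = 0.12      # worker role price quoted in the post


def processing_hours(data_mb, link_mbit_per_s):
    """Hours needed to read data_mb megabytes over the given link,
    assuming hashing is free and bandwidth is the only bottleneck."""
    mb_per_s = link_mbit_per_s / 8  # 8 bits per byte
    return data_mb / mb_per_s / 3600


hours = processing_hours(DATA_MB, LINK_MBIT_PER_S)
cost = hours * PRICE_PER_HOUR

print(f"time: {hours:.1f} h, cost: ${cost:.2f}")
```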
Wow. Compare $307 against $5.58! That’s a huge saving, and I still haven’t included the cost of owning, managing, and maintaining the infrastructure needed to perform the calculation locally (hardware, software, networking, power, human resources). All we have to do is deploy our app on an Azure compute node and finish our task for about $6. More importantly, given that our task is highly parallelizable, we could just use 10 or 20 or 40 Azure instances and finish in a fraction of the time for the same cost. Installing a cluster to scale out a one-time task would have dramatically increased the total cost of that operation.
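To make the "same cost, less time" point concrete, here is a rough Python sketch. It assumes the work splits perfectly across instances and that billing is strictly proportional to the hours used; neither is guaranteed in practice (hourly rounding per instance, for example, would add a little). `scale_out` is a made-up helper, and the 46.5 hours and $0.12/h are the figures from this post.

```python
def scale_out(total_hours, instances, price_per_hour=0.12):
    """Return (wall-clock hours, total cost) when total_hours of work
    is split evenly across the given number of instances."""
    wall_clock = total_hours / instances
    total_cost = wall_clock * instances * price_per_hour
    return wall_clock, total_cost


for n in (1, 10, 20, 40):
    wall_clock, total_cost = scale_out(46.5, n)
    print(f"{n:>2} instances: {wall_clock:6.2f} h, ${total_cost:.2f}")
```

The wall-clock time shrinks with every instance added while the bill stays put, which is exactly what a locally installed cluster cannot offer for a one-off job.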
Our job as developers is to come up with great designs and build good quality software that meets customer requirements. We should also be thinking about the cost of delivering the software. Cloud computing infrastructures are here to help.
* :-) “why a philosopher“