Dynasoar and Virtual Machines (part 1 – The concept and the architecture)

Here's something fun I've been working on lately (relates to something I blogged about back in November 2004).

Motivation

Dynasoar is a research effort within NEReSC to define a service-oriented architecture for the dynamic deployment and hosting of services on the Internet (or Grid for those of you who prefer "in" buzzwords 🙂). It clearly defines the roles of a Consumer, Service Provider, and Host Provider, and the relationships and interactions between them. You can read Paul Watson's and Chris Fowler's CS technical report (CS-TR-890) for more details about the thinking and the architecture.

The motivation behind Dynasoar has been the observation that the 'service' abstraction is better suited than the 'job' abstraction for the distribution of application functionality and the dynamic exploitation of distributed computational and other resources. This is in line with current distributed computing thinking, where the focus is on the composition of services and the message-based interactions between them. Effectively, Dynasoar treats the entire Internet as a big service-hosting environment. Issues which are considered by Dynasoar include service-code caching and mobility, moving computation close to data, resource usage balancing and dynamic service provisioning, payment for the use of resources, privacy, security, and many more. I am not going to expand on any of these here. The technical report and some of the other papers/presentations contain more details (there are also a couple that have been submitted to conferences and one being prepared which are not on the page yet).

Jobs vs Services vs Virtual Machines

In high-performance computing the 'job' abstraction has been used extensively as the means to reason about distribution of computation and the exploitation of remote computational resources. When moving to a service-oriented world, current practice dictates the creation of wrappers around jobs or even new infrastructure (e.g. WSRF) in order to enable interaction with them. Also, as Paul Watson often observes, in a typical job-submission scenario, a job is submitted, executed, and then discarded. In a service-oriented architecture a service is deployed, rather than created, and once deployed it can deal with multiple interactions until it is explicitly undeployed. This subtle difference represents a good example of the transition between the job-submission-based and service-oriented models of building distributed, large-scale applications.

The great Dynasoar team (in no particular order: Chris Fowler, Charles Kubicek, Arijit Mukherjee, John Colquhoun, Mark Hewitt, and of course the boss... Paul Watson) already has an initial prototype ready utilising Condor and the previous work on GridSHED. Currently, we are at a re-architecture and re-implementation phase to allow for the integration of planned future work and to enable a clear separation in the code of the actor roles (more on the Consumer, Service Provider, and Host Provider roles below and in the technical report). Then we hope to deploy Dynasoar to our 10,000-node Newcastle Grid infrastructure (when that's ready as well) and perhaps on to the UK National Grid. Well, I say 'we' although unfortunately I am going to be gone when the time comes. I'll keep an eye on this work though 🙂

An inherent problem with both the job-submission and dynamic service deployment scenarios is that of the possible runtime dependencies on libraries and other applications. There has to be an agreement on versions and binaries of classes, other installed applications and runtimes, hosting environments, etc.

Recently, the increasing computational power available to us through the advances in hardware and the emergence of commodity solutions for machine virtualisation (e.g. Xen, VMWare, Virtual PC/Server, etc.) have given rise to a more coarse-grained solution to the distribution of application functionality, that of virtual machines. So, some time ago and before my holidays, Paul asked me to investigate and try to discover the issues with implementing a VM-based solution for Dynasoar. Rather than deploying service-specific code, the idea is to transfer entire virtual machines (well, perhaps only the necessary parts of a virtual machine, like a differencing disk for example). This allows application components to be transferred with their entire environment packaged together without polluting the hosting environments with configuration changes or library and application installations. Both the host and the consumer win from this arrangement. However, the cost of transferring large images is great and, hence, the need for the investigation. We want to measure the relation between a service's computational granularity, the fact that it's deployed (hence, cached), and the network transfer costs.
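
To get a feel for that trade-off, here is a rough back-of-the-envelope sketch in Python. It is only an illustrative model and not the measurement approach used in the Dynasoar investigation: it assumes the image is transferred once, stays cached on the host afterwards, and that every request served remotely saves a fixed amount of time. The function names and the example figures (a 2 GB image, a 100 Mbit/s link) are made up for illustration.

```python
import math


def transfer_seconds(image_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to ship a VM image (or differencing disk) once over the network."""
    return image_bytes / bandwidth_bytes_per_s


def amortised_overhead(image_bytes: float, bandwidth_bytes_per_s: float,
                       requests_served: int) -> float:
    """Transfer cost per request, assuming the deployed image stays cached."""
    if requests_served < 1:
        raise ValueError("requests_served must be at least 1")
    return transfer_seconds(image_bytes, bandwidth_bytes_per_s) / requests_served


def break_even_requests(image_bytes: float, bandwidth_bytes_per_s: float,
                        saving_per_request_s: float) -> int:
    """Smallest number of requests after which the one-off transfer has paid off.

    saving_per_request_s is the (assumed) time saved per request by running
    on the remote host, e.g. because it is faster or closer to the data.
    """
    if saving_per_request_s <= 0:
        raise ValueError("saving_per_request_s must be positive")
    cost = transfer_seconds(image_bytes, bandwidth_bytes_per_s)
    return math.ceil(cost / saving_per_request_s)


if __name__ == "__main__":
    image = 2e9        # hypothetical 2 GB image
    link = 12.5e6      # hypothetical 100 Mbit/s link, in bytes per second
    print(transfer_seconds(image, link))           # one-off transfer time
    print(amortised_overhead(image, link, 1000))   # per-request share
    print(break_even_requests(image, link, 0.5))   # requests needed to pay off
```

The point of the sketch is simply that a coarse-grained, long-lived service spreads the transfer cost over many interactions, whereas a fine-grained service used only a few times may never recover it.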

Please note that although I am talking about large-scale deployments, I am not necessarily suggesting such solutions are loosely-coupled. Although loose-coupling is welcomed, when binary or runtime-specific technologies are involved, assumptions about specific technologies and their versions have to be made (e.g. Java vs .NET and Java 1.4 vs Java 1.5, etc.). As a result, we decrease the degree of loose-coupling between the components of our distributed application. This is also true when virtual machine technologies are employed given the required agreement on a particular virtualisation technology (e.g. VMWare vs Virtual PC/Server), unless of course all possible technologies are supported. Even so, VMs make life easier since fewer aspects of a deployment have to be agreed in advance.

Architecture

Dynasoar's architecture introduces three actors: the Consumer, the Service Provider, and the Host Provider.

A MEST disclaimer: Please note that these actors are application-domain specific. As we know from MEST, the only actor in service-oriented architectures is the 'service'. Application domains, like that of dynamic service deployment, are allowed to reason in terms of higher-level actors like those introduced by Dynasoar.

A Consumer exchanges messages with a Service Provider. As far as the Consumer (C) is concerned, the required service is offered by the Service Provider (SP). However, the Service Provider chooses one of the available Host Providers (HPs) out on the Internet (or an organisation's internal infrastructure) to deploy the service's code (or a virtual machine). The choice of HP can be made based on service-level agreements, resource load requirements, security-related policies, Consumer-defined criteria, etc. Once the service has been deployed, messages are forwarded for processing. If the deployed service code can be replicated (e.g. the implementation is stateless or the data store requirements are also replicated or shared), then even multiple HPs could be used to distribute the message processing load.
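
The interaction just described can be sketched in a few lines of Python. This is a toy, in-process illustration and not Dynasoar's actual implementation: the class and method names are hypothetical, a "service" is reduced to a plain function from a request message to a response, and the Host Provider is chosen by lowest load only, standing in for the richer criteria (SLAs, security policies, Consumer-defined criteria) mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption for this sketch: a service maps a request message to a response.
Service = Callable[[str], str]


@dataclass
class HostProvider:
    """Offers resources; runs whatever service code has been deployed on it."""
    name: str
    load: int = 0
    deployed: Dict[str, Service] = field(default_factory=dict)

    def deploy(self, service_name: str, code: Service) -> None:
        # Once deployed, the code stays here (cached) until undeployed.
        self.deployed[service_name] = code

    def process(self, service_name: str, message: str) -> str:
        self.load += 1
        return self.deployed[service_name](message)


@dataclass
class ServiceProvider:
    """The only party the Consumer talks to; owns the service code."""
    code_store: Dict[str, Service]
    hosts: List[HostProvider]

    def choose_host(self) -> HostProvider:
        # Simplest possible policy: pick the least-loaded Host Provider.
        return min(self.hosts, key=lambda h: h.load)

    def handle(self, service_name: str, message: str) -> str:
        host = self.choose_host()
        if service_name not in host.deployed:
            # Dynamic deployment on first use at this host.
            host.deploy(service_name, self.code_store[service_name])
        return host.process(service_name, message)


if __name__ == "__main__":
    hp_a, hp_b = HostProvider("hp-a"), HostProvider("hp-b")
    sp = ServiceProvider({"echo": lambda m: m.upper()}, [hp_a, hp_b])
    # The Consumer only ever sends messages to the Service Provider.
    for i in range(4):
        print(sp.handle("echo", f"msg{i}"))
    print(hp_a.load, hp_b.load)
```

Because the example service is stateless, it ends up replicated on both hosts and the messages are spread across them, which is the load-distribution case mentioned at the end of the paragraph above.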

There are multiple variations of the architecture diagram presented above. For example, a Consumer may define the list of available Host Providers a Service Provider should consider because, for example, the utilisation of computational resources there is free for that particular Consumer. Also, a Consumer may play the role of a Service Provider, effectively pushing the service's code to a Host Provider. As I said, there are many variations which are discussed in Paul's and Chris' technical report. The service-code caching for offline access of stateless services I built some time ago using WSE is also a special case of the Dynasoar architecture.

The VM-based implementation for Dynasoar doesn't differ from what was discussed above. The only difference is that a virtual machine is packaged and sent to a Host Provider rather than just the code for an Axis or Indigo (erm... Windows Communication Foundation) service as in previous implementations as shown below.

That's it for now. Part 2 will discuss the implementation of the above using Windows Communication Foundation and Virtual Server on Windows.

5 responses to “Dynasoar and Virtual Machines (part 1 – The concept and the architecture)”

  1. I’ve always wondered where the fine line between the traditional “Grid Computing” paradigm that is often categorized by the “Job Submit” model and the more service oriented paradigm goes.

    There are some frameworks for .NET based “Job Submit” such as Alchemi (http://www.alchemi.net/) and .NET has fairly good support for the Service Oriented approach, but there is a lack of everything in between.

    I, for example, need something that is not as vague as the traditional “Job submit” model (or something similar) that upon submitting the job someone makes sure that the necessary staging will occur, the job will run and the result will get back.

    What should I do if I want a semi contractable interface that receives real things not only job objects that contains everything a generic job should do?

    How can I still be as open as possible in a generic job fashion yet still have some of the power of a strongly defined contract that perform some job for me?

  2. Hey Eran, I think what you are really looking for is a community agreement on a set of message exchanges. Just in case you are not aware of it, there is a group in GGF that is trying to do just that. There is a vocabulary called JSDL you may want to have a look at and there is also an implementation by the OMII.

  3. Hi Tim, my apologies if I’ve given the wrong impression. I was not trying to suggest that we are the first to look into virtual machines. In fact, we’ve been following the work done by others in this space and we’ve had discussions with them.

    I believe that the work of the Dynasoar team is interesting and novel because of its service-oriented approach and the investigation into architectural issues based on the separation of roles for the 3 actors. The investigation I talk about in this post is mostly an experiment on how to bring VMs (services hosted in VMs really) into Dynasoar.

    I hope this makes sense. Again, my apologies if I was seen to promote the idea of VMs as something novel.

  4. Savas, thanks for the answer.

    I am looking now at JSDL.

    Is it just me, or are there too many WS-* specs out there? 😉

    I wonder how JSDL and others integrate with the rest of the WS-* specs.