Archive: July 2008
F# for Scientists
31 Jul 2008, Categories: Microsoft, Technology

I saw Tony Hey earlier this week holding the “F# for Scientists” book so I had to check it out. Very very cool! I love it that F# is being productized and will soon be in the hands of developers. I’ve been following its evolution and rise to a product for two years now.

While at the Faculty Summit, I talked with Ian Foster again who pointed me to the 3rd Annual Chicago Colloquium on Digital Humanities and Computing Science. It looks very interesting and relevant to some of the work we are doing in MSR’s External Research.

In the first of a series of videos related to our recent announcements in the Scholarly Communication lifecycle space, Lee Dirks talks on Channel 8. Fun!!! :-)

Yes, I know the name is boring. We have a better name but we weren’t able to secure it internationally in time for Faculty Summit. Anyway... “Research Output Repository Platform” will have to do for now.

I talked about our efforts around the platform a few months ago.

We are building a graph store to enable an ecosystem of tools and services. Originally, we had an ontology specific to the scholarly communication domain, and that is still the case. When our platform and the tools/services built on top of it are released, they are going to be focused on “research output” repositories. However, we are in the process of building support for RDFS. This means that one will be able to express data models and our graph store will accommodate them, trying its best to maintain the balance between a relational and a triple store (refer to my previous discussion on this). Either way, we are always going to be storing relationships as data (i.e. using <subject, predicate, object, attributes> tuples).
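
To make the “relationships as data” idea a bit more concrete, here is a minimal sketch in Python. It is not the platform's API (which isn't shown in this post); the `Relationship` and `GraphStore` names, the identifiers, and the `position` attribute are all made up for illustration. It only shows the shape of a <subject, predicate, object, attributes> tuple and a wildcard lookup over such tuples.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """One <subject, predicate, object, attributes> tuple (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str
    attributes: dict = field(default_factory=dict)


class GraphStore:
    """Toy in-memory store that keeps every relationship as plain data."""

    def __init__(self):
        self._relationships = []

    def add(self, subject, predicate, obj, **attributes):
        # The relationship itself is a record, so it can carry its own attributes.
        rel = Relationship(subject, predicate, obj, dict(attributes))
        self._relationships.append(rel)
        return rel

    def match(self, subject=None, predicate=None, obj=None):
        """Return relationships matching the given fields; None acts as a wildcard."""
        return [
            r for r in self._relationships
            if (subject is None or r.subject == subject)
            and (predicate is None or r.predicate == predicate)
            and (obj is None or r.obj == obj)
        ]


# Illustrative data only; these identifiers are invented for the example.
store = GraphStore()
store.add("paper:42", "authoredBy", "person:alice", position=1)
store.add("paper:42", "authoredBy", "person:bob", position=2)
store.add("paper:42", "citedBy", "paper:77")

authors = [r.obj for r in store.match(subject="paper:42", predicate="authoredBy")]
```

Because the predicate is just a value in the tuple, adding a new kind of relationship needs no schema change, which is the triple-store side of the balance mentioned above.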

We want to enable a semantic computing ecosystem of technologies on top of our platform. We are going to be looking into what we need to do in order to support SPARQL and other related technologies.

I am going to be blogging more about the platform from now on and provide code samples as well. Just to give you an idea of how the “scholarly communication lifecycle” is influencing everything that we do, here’s an example of a plugin for Word allowing you to submit the document directly into a repository, making use of a Web service.

image

When I joined Technical Computing, now part of External Research, we wanted to create an ecosystem of tools and services to support researchers worldwide. Today we announced the results of some of our efforts; there is still more going on.

A tool that was discussed was the Creative Commons addin for Microsoft Office XP/2003. We got feedback from researchers that they really liked the functionality but were very surprised that Microsoft didn’t release an updated version for Microsoft Office 2007. Well, we contacted the team responsible for it and found out that they had no plans to update it, so we requested and got ownership of its future.

I started prototyping some new ideas around a ribbon-based interface, allowing you to create Creative Commons licenses that can be shared between Word, PowerPoint, and Excel. The plugin uses the Creative Commons web service when generating new licenses. Finally, we wanted to make the license machine readable so we are including the RDF representation of the license in the OOXML package.*

Download the Creative Commons plugin for Microsoft Office 2007. The updated version for XP/2003 (fixing some reported bugs) will be released very soon.

* Unfortunately, due to timing constraints we didn’t manage to work around an Office feature that URL-encodes document properties. This is mentioned in the documentation that comes with the plugin, so you can build crawlers/indexers accordingly.
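
For anyone writing such a crawler, the decoding step might look roughly like the Python sketch below. Everything specific in it is an assumption: I am assuming plain percent-encoding, the RDF snippet is a simplified stand-in rather than the plugin's actual output, and `license_url_from_property` is a hypothetical helper. Check the plugin's documentation for the real property name and RDF shape.

```python
from urllib.parse import quote, unquote
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_NS = "http://creativecommons.org/ns#"

# Simplified stand-in for the license RDF (assumed shape, not the plugin's output).
license_rdf = (
    f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cc="{CC_NS}">'
    '<cc:License rdf:about="http://creativecommons.org/licenses/by/3.0/"/>'
    "</rdf:RDF>"
)

# Simulate the URL-encoded value a crawler would read from the document property.
stored_value = quote(license_rdf)


def license_url_from_property(value):
    """Undo the URL-encoding, parse the RDF, and return the license URL (or None)."""
    root = ET.fromstring(unquote(value))
    license_el = root.find(f"{{{CC_NS}}}License")
    if license_el is None:
        return None
    return license_el.get(f"{{{RDF_NS}}}about")


license_url = license_url_from_property(stored_value)
```

The only step that is specific to the Office behaviour is the `unquote` call; after that it is ordinary RDF/XML handling.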

Lee Dirks has been leading our efforts in the Scholarly Communication lifecycle. It’s been an absolute pleasure working with Lee since he joined Technical Computing, which has now been integrated with External Research in Microsoft Research. He’s been a great leader and I am so happy to see his vision on how to better support scholarly communication actually being discussed in public.

Scholarly Communication Lifecycle

I am at the Faculty Summit listening to Tony Hey’s keynote where he announces what we’ve been doing for months now. More blog posts to follow about what we are releasing.

.NetMap starts a series of announcements by our group today. Marc Smith and his team have done a great job at delivering a network visualization plugin for Excel; and it’s open source.

NetMap is a pair of applications for viewing network graphs, along with a set of .NET Framework 2.0 class libraries that can be used to add network graphs to custom applications.
A network graph is a series of vertices (sometimes called nodes) connected by edges. See this Wikipedia article for an overview of network graphs.
.NetMap was created by Marc Smith's team at Microsoft Research.
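
If “vertices connected by edges” sounds abstract, here is a tiny Python sketch of the data structure. It has nothing to do with the .NetMap class libraries (those are .NET); the names and the sample edge list are invented for illustration.

```python
from collections import defaultdict

# An undirected network graph as a list of edges between named vertices.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]

# Adjacency sets: for every vertex, the vertices it is directly connected to.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree (number of connections) is a common first measure when visualizing a network.
degree = {vertex: len(neighbours) for vertex, neighbours in adjacency.items()}
most_connected = max(degree, key=degree.get)
```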

Open Web Foundation
27 Jul 2008, Categories: Web

Google, BBC, Facebook, and other big names sponsor the “Open Web Foundation”. They are ignoring W3C and OASIS! Ok… you have to pay to participate in those organizations. But IETF???? Dare Obasanjo has some more detailed commentary on the announcement.

Nevertheless, this is going to be a great organization to monitor. With such Web heavyweights behind it, I expect specifications with wide adoption (in terms of implementations) to emerge.

Werner Vogels just finished his “Ahead in the Cloud” talk here at the DISC 08 workshop. As always, very entertaining. Amazon is light years ahead of the rest of the industry in thinking about and delivering utility computing to the world.

The talk was not technical given the audience here but here are some highlights/notes (subjective of course):

  • Werner introduced himself as “a systems administrator for a small bookshop in Seattle” :-)
  • Animoto story (always great hearing about this success story)
  • It appears that Amazon has the capacity to deal with MANY Animoto scenarios at the same time, if a situation arises
  • Apparently startups are not the only significant load for Amazon’s services
  • 20 billion objects in S3 and increasing exponentially
  • Utility computing in the cloud: moves CAPEX to OPEX
  • A very interesting point about utilization... if we make it cheap/easy to acquire resources (e.g. provision a server), engineers will not think twice about releasing them when they don’t need them anymore, hence improving utilization of the infrastructure.
  • “Software as a service”: Hmmm... I think I am starting to dislike this term. Werner used it. They are not delivering software but “functionality”. Salesforce, for example, hosts the functionality on our behalf. It’s “functionality as a service” or just “service”. Anyway... just terminology I guess.

My friend and colleague Kris Tolle is responsible this year for all things related to the 2008 Microsoft eScience Workshop. This meeting has been getting better and better with each year. The community amazes me every time with the wonderful things that they have to show. This year the workshop is co-located with the 4th IEEE International Conference on eScience, which means that you have the opportunity to attend two great events in one trip!

I will be actively participating in the program too... Kris keeps twisting my arm about a tutorial on Cloud Computing. By that time Microsoft will have released lots of new information about what it is doing in the space (oh, I can’t wait for PDC 08). There will probably be lots of discussion related to our “research-output” platform (stay tuned for a lot more information next week, including its official name :-)

Of course, the workshop is always about YOU and YOUR work! So, start writing those papers and register.

I am spending today, tomorrow, and Friday at the University of Washington at an NSF-funded workshop on how to use Hadoop to write map-reduce computations.

I already know map-reduce of course but it’s interesting to get some hands-on experience on Hadoop. Fun!
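
For those who haven't seen it, the shape of a map-reduce computation can be sketched in a few lines of plain Python. This is not Hadoop code and uses none of its APIs; the function names and the sample documents are mine, and everything runs in one process. It is only meant to show the map, shuffle (group by key), and reduce steps using the classic word-count example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all the values for one key into a single result."""
    return key, sum(values)


documents = ["the cloud is the platform", "the web is not the cloud"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real framework the map and reduce calls are spread over many machines and the shuffle happens over the network; the programmer still only writes the two functions.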

20 Petabytes/day! Now, how much is that in rice units? :-)...

I never knew Google was THIS massive!

I remember the conversation going something like this a couple of years ago (paraphrasing for dramatization purposes :-)…

- (My manager at the time:) Savas, we’d like to ask you to drive the latest technical efforts on WS-Transfer.

- Eh? You do know that from all the specifications in the WS-* ecosystem WS-Transfer is perhaps the one to which I seriously object, right? (the other being WS-RF of course :-)

- No, I didn’t. All the more reason for you to be involved then.

I ended up as one of the co-authors of the WS-Transfer submission to W3C.

Then, along came WS-ResourceTransfer and the recent proposal to standardize it. To be honest, I’d love it if this takes off, only to see Mark Baker at it again (or wake up from hibernation, as Mark Nottingham says :-)

The ideas behind Web Services were really promising but those involved (including me) seriously messed up on the way. We totally missed the Web and forgot that “simplicity” is a quality rather than a curse. And let’s not mention the tooling and middleware! :-(

I still believe in message-orientation as a good way to build distributed systems. All of the behaviors that we see on the Web today could have been implemented (and perhaps better) over basic SOAP. If only we had just focused on behaviors like caching. But then again… why try to replicate the functionality of something that is already there, even if it’s not perfect?

The Web is not the solution for everything and neither are Web Services. Different nails need different hammers :-)

As I catch up with my blog feeds after having wrongly configured Outlook, which left me without updates for more than two weeks, I came across this great post by Robert of Digipede fame. Great little summary!

“The diagram to the right shows this continuum from infrastructure to platform to software. Brief definitions of these parts are:

  • Infrastructure includes provisioning of hardware or virtual computers on which one generally has control over the OS; therefore allowing the execution of arbitrary software.
  • Platform indicates a higher-level environment for which developers write custom applications.  Generally the developer is accepting some restrictions on the type of software they can write in exchange for built-in application scalability. 
  • Software (as a Service) indicates special-purpose software made available through the Internet.”

Full article.

New toy on order
14 Jul 2008, Updated: 14 Jul 2008

As those close to me know, I love photography. Unfortunately, my Canon EOS 300D Digital SLR got so wet while I was queuing at Glastonbury that it stopped working. So, I had to buy a new camera.

After doing some research I got a really good deal online on a Canon EOS 40D with an EF-S 17-85mm f/4-5.6 IS lens. Not only did I get a tax-free deal but I also used Live Search Cashback to get a further $59 cash back. Not bad.

Looking forward to receiving it in around 8 days!

REST anti-patterns
12 Jul 2008, Updated: 12 Jul 2008

Jim pointed me to Stefan’s article on “REST anti-patterns”, which I somehow missed*. Very very nice indeed!

I particularly liked the following note:

The usual standard disclaimer applies: REST, the Web, and HTTP are not the same thing; REST could be implemented with many different technologies, and HTTP is just one concrete architecture that happens to follow the REST architectural style. So I should actually be careful to distinguish “REST” from “RESTful HTTP”.

Remember my “WS-Web (The Web using SOAP)” post from a few years ago? :-)

REST (an architectural style) != Web (an application) != HTTP (a technology).

 

* Update: I now know why I missed it. Outlook has not been updating my feeds for more than two weeks now (my fault!)

Google’s “Protocol Buffers”
8 Jul 2008, Updated: 8 Jul 2008

Yes, exactly what we need... another IDL data exchange format... “protocol buffers”.

Glastonbury was absolutely fantastic. I really really enjoyed it. It rained on Thursday and Friday but Saturday and Sunday were gorgeous. The sun came out and allowed everyone to walk around, sit on the grass, and really enjoy the atmosphere and vibe of the festival.

I mostly enjoyed the arts, comedy, and various shows this year. I still ended up right in front of the stage for many bands but not as often as last year. I loved Manu Chao, KT Tunstall, The Feeling, The Gossip, Leonard Cohen, The Verve (they really rocked), Panic at the Disco, Massive Attack. Amy Winehouse’s performance was horrible; she was all over the place, couldn’t even stand, and couldn’t remember the lyrics.

I saw lots and lots of comedy shows and walked A LOT. I was doing 15-16 hour days around and about. It was so much fun.

I love Glastonbury. No police (at least not in uniform), 130,000 people, and still no fights, no arguments. Everyone is smiling (no, it’s not all due to drugs :-) and you can feel the positive energy in the air.

I met with Carole and Dave; we jumped around together for a few shows. It was soooo great seeing them and spending time with them.

Unfortunately, there are no photographs from this year. While queuing to collect my ticket (they don’t send them abroad), my camera got so wet, it stopped working... there goes my good digital SLR (Canon EOS 300D). It was my fault for not protecting my backpack. I am still optimistic that the battery might have just discharged due to the water; we’ll see. If not, it’s time for a new one. Carole and Dave might have a couple of photos. When I get them, I’ll post them.

 

Now I am in London, spending a few days with Jim and his partner. It’s great to see them both; I’ve missed them. Jim and I are focusing on writing chapters for our upcoming book. Yesterday we finished an article for InfoQ we’ve been preparing for some time together with Ian Robinson. I am pleased with the result.