Over the last few weeks, as a spare-time side project, I’ve been experimenting with a concept that has been circulating in my head for a few years now. I’ve been trying to form an opinion, by writing code, around questions such as…
- What does a graph store with both “pull” querying and “push” reactive computing capabilities look like?
- What are the supporting abstractions/interfaces?
- Can it scale?
Over a short series of posts I will explore the outcome of that investigation. My hope is to get feedback on the approach since I would like to capture the results in a paper. But first, some history to put my thinking in context…
It was a few years ago (2010-2011) when Erik Meijer and I worked on a NoSQL platform for data processing at Microsoft. Our goal was to support both on-demand (pull) and continuous (push) queries on top of a document store (and eventually a graph store). We had map-reduce, expressed through LINQ, running on top of a distributed document store that also supported Rx for continuous queries. It was beautiful. Unfortunately, the priorities of our parent organization were such that “project Detroit” never became a product.
Erik and I went our separate ways but we kept in touch.
My continued fascination with graph stores started way before “project Detroit”, back when I joined Microsoft and Don Box’s group. I was reading a lot on knowledge representation and reasoning at the time. Then I moved to Microsoft Research where I designed and built the “Zentity” graph store to support the needs of the scholarly communication community. Zentity was heavily inspired by the ideas of WinFS and was built on top of the Entity Framework.
Bing – Information Platform
When I joined Bing, we had another go at a distributed graph store (amongst the other things that I was doing). We even had a natural language to LINQ-over-graph layer working. We supported media-related queries for the Xbox One voice search feature. I even presented the team’s work wearing the organization’s logo at the time :-) Unfortunately, while great technology, the project didn’t make it due to the risk associated with changing the existing, simple-but-not-as-capable platform. It was the right decision at the time.
Nevertheless, many of the supporting ideas and code (e.g. the RDF-like data model, a JSON-based data serialization similar to the later-introduced JSON-LD standard, serialization of query expressions using the same data model) became the foundation of Reactor.
Reactor is Microsoft’s large-scale, distributed stream/event processing platform used by Cortana to handle all types of near-realtime information. It’s based on Rx 3.0, which Bart, I, and others designed and built. Reactor can deal with billions of events, offering “at least once” semantics. Reactor is programmed using Rx queries. Think of Reactor as the execution engine behind an IQbservable interface (actually, there is much more to that but let’s keep the discussion simple for now).
Graph nodes and edges as information streams
In my pre-Cortana presentations about personal digital assistants, I always made the case for them to be built in the cloud, close to a supporting information & knowledge platform that incorporates stream processing capabilities (amongst many other things). In those talks, I started describing the long-term idea of combining graphs and streams into a unified model for representing and reasoning over information. In such a graph data model, nodes and edges are effectively streams. When the value of a node or an edge is updated, a new addressable event is generated. Relationships (i.e. edges) can be drawn between specific events on the streams or between whole streams, in which case the latest observation on the stream is considered.
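To make the idea a bit more concrete, here is a minimal Python sketch of nodes and edges as streams. It is only an illustration under stated assumptions, not the ReactGraph design: the names `Stream`, `Event`, `Edge`, `resolve`, and the `stream_id@seq` address format are all hypothetical. It shows an update producing a new addressable event, a “push” subscription, and the difference between an edge drawn to a whole stream (which resolves to the latest observation) and an edge pinned to one specific event.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Event:
    """One observation on a stream (hypothetical shape)."""
    stream_id: str
    seq: int
    value: Any

    @property
    def address(self) -> str:
        # Assumed addressing scheme: stream id plus sequence number.
        return f"{self.stream_id}@{self.seq}"


class Stream:
    """A node (or edge) value modelled as an append-only stream of events."""

    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self.events: List[Event] = []
        self._subscribers: List[Callable[[Event], None]] = []

    def update(self, value: Any) -> Event:
        """Record a new value as a new addressable event and push it out."""
        event = Event(self.stream_id, len(self.events), value)
        self.events.append(event)
        for callback in list(self._subscribers):
            callback(event)
        return event

    def latest(self) -> Optional[Event]:
        """Pull-style access to the most recent observation, if any."""
        return self.events[-1] if self.events else None

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        """Push-style access: callback runs on every future update."""
        self._subscribers.append(callback)


Endpoint = Union[Stream, Event]


def _pin(end: Endpoint) -> Optional[Event]:
    # A whole stream resolves to its latest event; an event is already pinned.
    return end.latest() if isinstance(end, Stream) else end


class Edge(Stream):
    """An edge is itself a stream, and connects two streams or events."""

    def __init__(self, stream_id: str, source: Endpoint, target: Endpoint) -> None:
        super().__init__(stream_id)
        self.source = source
        self.target = target

    def resolve(self) -> Tuple[Optional[Event], Optional[Event]]:
        return _pin(self.source), _pin(self.target)


alice = Stream("node:alice")
city = Stream("node:city")

seen: List[Event] = []
city.subscribe(seen.append)  # react to changes instead of polling

first = city.update("Seattle")
alice.update({"name": "Alice"})

lives_in = Edge("edge:lives_in", alice, city)   # between whole streams
lived_in = Edge("edge:lived_in", alice, first)  # pinned to a specific event

city.update("London")

print(lives_in.resolve()[1].value)  # London (latest observation)
print(lived_in.resolve()[1].value)  # Seattle (the pinned event)
```

In this sketch an edge to a whole stream follows the data as it changes, while an edge to a specific event stays fixed, which is one simple way to read the distinction described above.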
After I joined Facebook, I met Adam Wolff, who had been thinking about reactivity as a way of building applications on top of Facebook’s huge data platform. Adam is responsible, amongst other things, for React, the framework that enables fast, responsive UI on the Web and mobile devices.
We coded together during a Facebook hackathon session, wayyyy into the early hours of the following day. He’s totally hard core. I had to go home to get some sleep after 24h of non-stop coding but he stayed at the office for another full day of meetings.
Adam raised a challenge for us to consider… How can we build Facebook-style application experiences that don’t require continuous polling for data? How can we build responsive UIs that react to changes to the underlying data? Even though not part of my main responsibilities at Facebook, I saw it as a great technical challenge and an opportunity to revisit the various ideas from the last few years.
The investigation into what I call “ReactGraph” is really an evolution of all of the above, part of a journey that started all those years ago. Whether the outcome is any good or useful, that’s to be determined :-) No matter what, I had a great time coding.