The Hermes Series – Issue 2

Welcome to the second edition of The Hermes Series, a non-regular and often random collection of notes & thoughts about knowledge representation and reasoning, graphs, technology, and all things from around the Web.

In this second blog post in the series …

The Internet of Things and Glanceable Data

Tim Berners-Lee talked about the Giant Global Graph (GGG). He used the term to describe a digital world of interconnected, machine-processable data to complement the World-Wide Web (WWW), the human-oriented world of interconnected documents. GGG and WWW… get it? :-) The GGG was all about RDF, OWL and other Semantic Web technologies. I am a huge believer in the vision. I don’t particularly care about the specific technologies, even though I do have my biases towards representation models that don’t require me to write many triples :-) Facebook’s OpenGraph showed us how a great user experience provides the incentive for information publishers to incorporate structured data into their pages.

readDIYmate is an interesting Kickstarter project. The company wants to make it very easy for someone to build Web-connected objects that react to information on the Web. Check out their videos showing a physical object moving and making sounds when you receive “likes” on Facebook, when a friend tweets something, or when someone checks in at your favorite place. I love the idea. It reminds me of Yahoo Pipes but with physical objects reacting to information events on the Web.

I think there is a tremendous opportunity in building automation that gathers, processes, and personalizes information on the Web. Call it agents, digital assistants, personalized aggregator services… call it Bob if you want :-) The point is that we are going to need help in processing the vast amount of information that is being produced by all the Giant Global Graph-connected things. Computer/TV/phone/tablet screens might not be the only way that such aggregated, processed, and personalized information surfaces to us. Projects like readDIYmate demonstrate the possible bridges between the digital and the physical worlds for information awareness. In an internal version of this newsletter series I had pointed to the “glanceable data” concept, which falls in the same category.

And of course… we have to include devices such as the Nest, Nike Fuelband, Fitbit, and Aria (which my partner ordered and now tells me that I have to use :-( ) in the category of devices connected to the Internet of all things. Everyone seems to be building similar functionality, which leads me to believe that a platform is necessary. I think the people behind the company Presence are on the right track… a platform to connect everything on the Internet. Facebook is successful in connecting people. However, we need a platform that connects everything… people, physical objects, data. I really liked this quote from the founder:

“This is about making your interactions with spaces and objects more similar to your interaction with people and friends” (source TechCrunch interview)

But is the connection of devices to the Internet enough? Companies such as Fitbit and Nike create isolated islands of information; they lock the users’ data behind their respective walls. Shouldn’t all data and all devices be interconnected in a Giant Global Graph? Who is going to enable that capability? The value to users comes from a world full of bridges between islands.

The Economies of Digital Assistants and Personalized Experiences

I don’t think that there is any doubt that natural language interaction is going to be THE way we consume and produce information in the very near future. The concept of a digital assistant with whom we can interact has been a goal of our industry for decades. Such digital assistants take all forms and shapes, from Siri on the iPhone, to Roomba, to this cute little robot in the car.

As I wrote above, we need help with processing all the information on the Web. We need information processing agents that operate in a manner tailored to our needs and interests. My observation is that everything that the big companies do these days focuses on learning as much as possible about their users so that they can offer highly-personalized services to them. Google hasn’t hidden the fact that their recent privacy policy changes are aimed towards that goal (note the “tailored for you” part). Facebook is, of course, already mining their users’ data.

It goes without saying that companies are focusing on how to make a profit. As they get to know more about their users, they can sell more targeted ads and offer more relevant-to-the-user services. Here are examples of news in this space that I noticed over the last couple of weeks:

  • Amazon offers recommendation services for daily deals. They allow the user to express their likes and dislikes. Effectively, Amazon explicitly learns about the user’s interests and, as a result, can offer better deals. But of course, with Amazon collecting so much consumer behavior data, it’s only a matter of time before they can offer targeted deals automatically.
  • Glimpse is a service that attempts to leverage Facebook’s likes in order to learn more about users’ product preferences. They effectively join their product database with a user’s likes.
  • Narrative Science developed a program that can write articles. It can create stories tailored to the users’ interests and even tone preferences… dry vs sensational. The Wired article “Can an Algorithm Write a Better News Story Than a Human Reporter” is a great read.
  • Google introduced Schemer, which is an activity recommendation service. Again, no surprise. You add “schemes”, or in other words things that you like doing, and Schemer will offer recommendations. As you do stuff, Schemer gets better at recommendations.

I would categorize all the above in the “digital assistants” space, as I also discussed in the previous section. They are offered to us users as little helpers that process information on our behalf and notify us about stuff that is of interest to us. Whether they can be considered “intelligent”, that’s a different topic :-) I personally avoid the use of the term because it has so many connotations. I have used the following spectrum in a number of presentations, inside and outside Microsoft…


I used to have “intelligence” instead of “understanding” but since the former term is so misunderstood and overloaded, I stopped using it. But I digress.

As before, here we have another case of information islands. My (inferred) interests, consumer habits, activity timelines are isolated from service to service. There is no interconnection. Most importantly, it’s mostly the companies that benefit from that data. Yes, I do consume a personalized experience or receive relevant offers but, at the end of the day, it’s other companies that party on my data. Perhaps a new economic model is necessary.

Scott Merrill reviews “The Intention Economy” by Doc Searls, who argues that we should really change the above game. Rather than allowing companies to find things about us, we should really express what we want to do. We should trust our data with a “fourth party” and allow companies to come to us based on what we want to accomplish, buy, consume. I sympathize with Searls’s premise. Searls writes: “We need ways of gathering, organizing, and controlling the data that we generate and that others suck in from our digital crumb trails. We also need new understandings about how personal data might be used.” (source TechCrunch)

Whether an intention-based economy with fourth parties acting as gatekeepers of personal data is going to be possible, I don’t know. I think that the work being done around digital assistants by so many companies will surface this issue big time.

And since the discussion was about digital assistants, let’s take a trip back to 1979. As always, Xerox PARC had the vision.

Xerox PARC demonstrates the “office of the future”… effectively a digital assistant.

Artificial Stupidity, Meaning, and Structured Data

As I was writing earlier about my attempt to avoid the use of the term “intelligence”, I was reminded of Stephen Wolfram’s latest post. Regular readers and those in my organization will know that WolframAlpha is one of my favorite services out there. I really admire the work that Stephen and his team are doing.

Stephen talks about how they are “Overcoming Artificial Stupidity” :-) Effectively he explains how they have been improving the natural language understanding capability of WolframAlpha over the years. And since I’ve been hanging out a lot with language understanding folks lately, I get how usage data improves the accuracy of a language understanding system.

In my presentations and discussions around knowledge I always reference WolframAlpha as an example of a knowledge system that can do really great things but fails at some simple ones. I have been asking the following question as an example… “Who are the members of Coldplay?”. The answer I used to get was the definition of “member” from the dictionary. WolframAlpha didn’t know about the music domain so it tried its best to give me something else. Well, WolframAlpha now gives the correct answer. I wonder whether they found my query in their logs from the many times I used it :-) Just joking. They just hadn’t ingested the data.

WolframAlpha can now even answer questions such as “When were Radiohead formed?”. However, it can’t yet answer my next set of test questions: “How many members were there in Deep Purple?” (it knows the members and the years but it doesn’t count them) or “When did Berlin become the capital of Germany?” (it understands Berlin as the capital but it doesn’t answer the specific question).

WolframAlpha is evolving at a very fast pace, it’s improving, and its knowledge base is expanding with more and more domains. As Stephen says, he aspires to make computers do more than humans. I truly wish them the best. Their work is truly inspiring.


Related to language understanding… “Iris” is Readability’s new content normalization service. As per the blog post announcing the feature, Iris will attempt to draw meaning from the Web, and it’s inspired by IBM’s Watson. In my mind, Iris falls under the category of “content understanding”. It’s absolutely the future… trying to understand documents, language, and gestures and connect them with structured data.

Well, talking about structured data, I think that the move by Flickr to push structured data to Pinterest is very interesting. I find it a great example of the proliferation of structured data on the Web. Of course Flickr wants photographs to be attributed, but no matter what the reason, it’s definitely the right strategy.

Application Discoverability and Actions

When we talk about structured data, we cannot ignore Facebook. Every week they seem to announce a new feature around OpenGraph. They execute really fast in this space and they should be congratulated:

  • The social stream can now be used to discover applications on Android, with support for deep linking. This is a brilliant strategy by Facebook. They continue to make their platform useful to application creators while they collect even more structured data about what the users are doing on their platform.
  • Third party developers can now use the Open Graph to register actions in the user’s social streams. As an example, Foursquare checkins can now include a “save this place” action. Not only do the actions represent a convenient way for users to perform a task, they can also drive traffic to a service or application because of the network effect on Facebook’s platform.

Just to further emphasize the last point… Traffic to Pinterest increased by 60% when they integrated with the Open Graph.

Advice for Startups

Even though the following articles talk about startups, I believe that the advice they give is equally applicable to teams within large companies. I think any leader, any project would benefit.

  • In “How Technology Can Solve The Financial Industry’s Deficit of Trust”, Mike Sha discusses the findings of a study in which 75% of those questioned trust technology companies vs 50% for the financial ones. The reasons that he gives can stand as advice to any team starting a project:
    • Put Users First & Trust: Do what’s right for the user.
    • Truth & Trust: The data never lies.
    • Awesomeness & Trust: Focus on design and user experience.
  • In “Data-Driven Decisions for Startups”, Uzi Shmilovici talks about the power of having data in the decision making process. He references some interesting works. I liked the quote from Jim Barksdale: “If you don’t have any facts, we’ll just use my opinion.” :-)
  • Twitter talks about the value of innovating through experimentation. I couldn’t agree more!!!
  • In “Disillusionment of an Entrepreneur”, Prema Gupta talks about his experience building and growing a company to 10M users. Even though he reached his original goal, his appetite for more only grew, giving him the illusion that he needed to do more in order to feel successful. But he didn’t feel happy. This is a reminder to all of us (and mostly me I guess)… there is more to life than the next career goal/success.
  • In “The Billion Dollar Mind Trick”, Nir Eyal and Jason Hreha talk about the three steps for capturing and keeping a user, giving Instagram as an example:
    • Educate and Acquire With External Triggers (e.g. links in Twitter streams and Facebook feeds).
    • Create Desire
    • Affix the Internal Trigger… create habit.

$0 to $1B… Yes, It Was That Easy To Scale

And talking about the billion dollar mind trick, here’s how “easy” it was to scale to 30M users and ultimately to a $1B acquisition… “Scaling Instagram”. They started with 2 engineers and by the time they scaled to millions of users they only had 5 engineers. Very impressive!

It is a tech-oriented talk that every service engineer should read!!! :-) My key takeaways:

  • Simplicity!
  • Instrument everything! (here it is again :-)
  • Loose coupling.
  • Extensive monitoring.
  • If you are tempted to reinvent the wheel… don’t! (I can’t emphasize this enough)
  • Focus on making what you have better!
  • Stay nimble = remind yourself of what’s important.
    • Your users around the world don’t care that you wrote your own DB (or that you wrote another distributed computing platform or another messaging layer or another transactions platform)… it’s the functionality offered that matters.

Working With Data

Random Bits


The Instagr/am/bient project encouraged folks to associate ambient music with photographs from Instagram. What ambient track would you associate with a particular photograph?

It took them 238 takes in order to get it right. Very impressive and very cool concept. Check out their “behind the scenes” video to see how they did it. (via Janet’s Facebook wall :-)

Even though it’s a product, I am including evr1 in the art section of this edition. Perhaps I should have created a “tech-hippie” section :-) I don’t know, you decide. Anyway… How would you feel carrying the human knowledge on your key ring? Would you pay $140 for it (oh… and you won’t be able to read anything… you just carry it)? From The Bible and The Koran to Darwin’s “The Origin of Species by Means of Natural Selection”, from the US Army Survival Guide to Plato’s writings.

If you made it all the way down here, thank you!!! :-)