Imagine a world where your memory is enhanced by a team of intelligent agents, working tirelessly to keep you organized and informed. Welcome to BrainExpanded.
It’s time for an update on BrainExpanded, the system I’m building to explore technologies that enhance our ability to remember, consume, and organize information.
For those who have been following along, you know that I’ve been exploring ideas and technologies related to memory and a multi-agent system that operates over it. The core concept is that such a system can expand a human’s ability to remember, consume information, and stay organized by allowing a set of agents to work on their behalf. I spent a few weeks, on and off, working on the end-to-end user flow and I am back to report the progress so far.
In the “BrainExpanded – Introduction” post, I presented a high-level architecture concept. Back then, I built some agents in Python and a custom memory system in .NET + Entity Framework + MySQL to help me with my initial experiments. Things have evolved since then.
Here’s a high-level view of the working system as of today.
There is an iOS app, of course.
The iOS app connects to a Web API layer built in ASP.NET Core.
The memory database is a GraphQL server: Dgraph running in a Docker container. I had to work around some limitations in Dgraph’s support for nested filters, but otherwise I am very pleased with the choice of GraphQL for both the WebAPI↔︎memory and agents↔︎memory interfaces. Playing with GraphQL data models, queries, mutations, and subscriptions was fun.
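To give a flavor of that interface, here’s a rough sketch of how an agent might read from and write to the memory graph from Python with the gql client. The Memory type, its fields (kind, content, createdAt), and the endpoint are placeholders rather than the actual schema.

```python
# Sketch: querying and mutating the memory graph over Dgraph's GraphQL endpoint.
# The Memory type and its fields are illustrative placeholders.
from gql import Client, gql
from gql.transport.requests import RequestsHTTPTransport

transport = RequestsHTTPTransport(url="http://localhost:8080/graphql")
client = Client(transport=transport, fetch_schema_from_transport=False)

# Fetch the ten most recent Link memories.
RECENT_LINKS = gql("""
query {
  queryMemory(filter: { kind: { eq: "Link" } }, order: { desc: createdAt }, first: 10) {
    id
    content
    createdAt
  }
}
""")

# Add a new memory, e.g. a summary produced by an agent.
ADD_MEMORY = gql("""
mutation AddMemory($content: String!) {
  addMemory(input: [{ kind: "Summary", content: $content }]) {
    memory { id }
  }
}
""")

links = client.execute(RECENT_LINKS)["queryMemory"]
client.execute(ADD_MEMORY, variable_values={"content": "An example summary."})
```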
The agents are built in Python. They use GraphQL subscriptions to get notified about new memories. They then introduce their own memories into the memory graph. They have an internal queue to process one memory at a time. Since there is no real-time interaction between a user and the agents (yet), they can take their time to process the new memories.
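Here’s a minimal sketch of that pattern, against the same placeholder schema as above: a GraphQL subscription feeds an asyncio queue, and a single worker drains it one memory at a time.

```python
# Sketch: an agent listening for Link memories over a GraphQL subscription
# and processing them one at a time via an internal queue.
# Assumes the placeholder Memory schema from the previous snippet.
import asyncio
from gql import Client, gql
from gql.transport.websockets import WebsocketsTransport

NEW_LINKS = gql("""
subscription {
  queryMemory(filter: { kind: { eq: "Link" } }) {
    id
    content
  }
}
""")

async def listen(queue: asyncio.Queue) -> None:
    transport = WebsocketsTransport(url="ws://localhost:8080/graphql")
    async with Client(transport=transport, fetch_schema_from_transport=False) as session:
        async for event in session.subscribe(NEW_LINKS):
            for memory in event["queryMemory"]:
                await queue.put(memory)  # de-duplication of already-seen memories omitted

async def worker(queue: asyncio.Queue) -> None:
    while True:
        memory = await queue.get()          # one memory at a time
        print("processing", memory["id"])   # summarize, build the video, etc.
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(listen(queue), worker(queue))

asyncio.run(main())
```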
I also used Ollama as the interface between the agents and the Llama and Llava models. The Llama model is used for text-related operations such as summarization, topic extraction, and prompt generation for image creation. The Llava model will be used for image understanding.
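As an illustration, here’s roughly how an agent can call the models through Ollama’s Python client; the model tag and the prompts are examples rather than the exact ones I use.

```python
# Sketch: text operations through Ollama's Python client.
# The "llama3" tag and the prompts are illustrative.
import ollama

def summarize(article_text: str) -> str:
    response = ollama.chat(
        model="llama3",
        messages=[{
            "role": "user",
            "content": f"Summarize the following article in one short paragraph:\n\n{article_text}",
        }],
    )
    return response["message"]["content"]

def image_prompts(summary: str, count: int = 3) -> str:
    # Ask the model for image-generation prompts that illustrate the summary.
    response = ollama.chat(
        model="llama3",
        messages=[{
            "role": "user",
            "content": f"Write {count} short image-generation prompts, one per line, "
                       f"that illustrate this summary:\n\n{summary}",
        }],
    )
    return response["message"]["content"]
```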
Stable Diffusion running locally on my MacBook Pro is used for image generation.
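For illustration, here’s one way to drive Stable Diffusion from Python on Apple Silicon, using Hugging Face’s diffusers (not necessarily my exact setup; the model id and settings are just an example).

```python
# Sketch: generating an image for one of the Llama-produced prompts,
# using diffusers with the MPS backend on Apple Silicon.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")  # Apple Silicon GPU; use "cuda" or "cpu" elsewhere

image = pipe("a newsroom at dawn, editorial illustration, soft light",
             num_inference_steps=30).images[0]
image.save("frame_0.png")
```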
I used macOS’s text-to-speech (TTS) engine. The quality of the synthesized speech isn’t great, so this component needs to be replaced with one of the cloud services that produce natural speech.
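A sketch of what that step can look like from Python: shelling out to the built-in `say` command, which renders speech straight to an audio file.

```python
# Sketch: macOS text-to-speech via the built-in `say` command.
import subprocess

def narrate(text: str, out_path: str = "narration.aiff") -> str:
    subprocess.run(["say", "-o", out_path, text], check=True)
    return out_path
```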
Finally, I used the moviepy Python module to stitch everything together into a short video.
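Roughly, the stitching looks like this with the moviepy 1.x API (paths and timing are placeholders).

```python
# Sketch: spreading the generated images over the narration and writing a video.
from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips

def make_video(image_paths, narration_path, out_path="summary.mp4"):
    narration = AudioFileClip(narration_path)
    per_image = narration.duration / len(image_paths)  # show each image for an equal slice
    clips = [ImageClip(p).set_duration(per_image) for p in image_paths]
    video = concatenate_videoclips(clips, method="compose").set_audio(narration)
    video.write_videofile(out_path, fps=24)
    return out_path
```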
User flow
Here’s a video of the experimental user experience. Obviously, this is far from polished, but it does help me explore potential user value and also forces me to build all the backend services for real.
Here’s what happened.
I browsed the BBC and identified two articles that might be of interest to me.
However, I didn’t have time to read them. Instead of sending myself an email note with the link (refer to BrainExpanded – Timeline), a note I would probably never get back to, I share the article with BrainExpanded via iOS’s share sheet.
The article’s link makes its way to the BrainExpanded memory.
The agents that are listening for Link memories get notified and process each link in turn.
Each agent downloads the article’s content. When agents start working together, this step could be shared.
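That download step can be as simple as fetching the page and pulling out its paragraph text; here’s a rough sketch of the idea.

```python
# Sketch: naive article text extraction with requests + BeautifulSoup.
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return "\n".join(paragraphs)
```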
The summarization agent summarizes the content of the article and adds the summary as a new memory. These new summary memories appear in the iOS app’s timeline. An evolution of the experience would allow me to read them. At this point, it’s worth noting that the timeline doesn’t really render the memories in a user-friendly way. Remember… it’s just a prototype 🙂
The video summary agent also summarizes the article (another step that could be shared between agents), generates images related to the article’s content, generates the narration of the summary, and stitches everything together into a video. The process takes about a minute on my MacBook Pro M1 Max (64GB RAM). I haven’t spent much time on the image generation yet. The agent uses Llama to generate the image-generation prompts from the summary, and Stable Diffusion generates the images, which aren’t great. A better implementation would reuse images from the article, or even do research to find similar articles and relevant images from the Web.
I didn’t record the sound when I captured the video above. As the user swipes left/right, the summary videos start automatically and the generated voiceover for each article plays back.
The user can now watch the article summary videos, much like TikTok videos.
Lessons learned
Obviously, the app isn’t useful to end users (yet). However, building the end-to-end system for real helped me learn a lot. Here are some highlights:
GraphQL as an interface to graph-organized data.
Building iOS apps.
Using GitHub Copilot to make progress as if I had a small team of engineers working with me. Since there are many technologies involved, Copilot helped me find solutions to problems I encountered along the way. I still had to debug a lot of the code it wrote, but I totally appreciate its usefulness. No, I don’t think the art of programming is gone. If anything, one has to understand all the technologies involved and how they are integrated in order to have a meaningful interaction with a code agent.
Conclusion
It has been fun coding as a way of learning, this time with new tools at my disposal. I am not done with the BrainExpanded journey. If anything, I think it is just starting. I have some ideas around agent frameworks/technologies, user privacy, hosting of the LLMs on personal hardware, and more. All of them deserve some investigation time.
We aren’t far from a robotics revolution in terms of scale and adoption. Every robot will effectively be an agent. It will have to sense and understand the real world. Then, it will have to act. Having access to the user’s digital world will help it be even more useful.
I am going to continue experimenting with more agents and with integration with sources such as Obsidian (second brain), calendar, email, and more. It’s going to be fun.