BrainExpanded – Copilot

Happy New Year everyone!

I was planning for my next BrainExpanded post to be a report on the entire end-to-end experience and how the backend works. However, a coding-related moment with Microsoft’s GitHub Copilot led me to write this note instead.

Over the holidays, I made good progress on BrainExpanded. It’s been so much fun!

The initial prototype of the Timeline was all in Python. However, because I was continuously experimenting with and rethinking parts of the end-to-end system, the code became harder to refactor and reshape. Most likely it’s just me, but Python was getting in my way.

> `mv Timeline Memory`

I rewrote most of what I already had in C#. The “Timeline” became just “Memory”. The thinking is that a temporal view of my recorded memories is just one of a few projections I can create over the contents of the memory store.

As I did with my first Python implementation, I created memory abstractions in C# so that I can have different implementations for my experiments: in-memory, SQLite-based, and soon a graph DB one. I used Entity Framework for the SQLite-based one. There is now an orchestrator process that uses the independent and reusable memory component.

Much of the LLM-related tooling prioritizes the Python ecosystem. So, I implemented a separate, very simple service in Python (using Flask) to host the implementation of my AI agents.

I have been, of course, using Visual Studio Code for all my prototyping needs. Copilot has been extremely useful. Its suggestions have saved me so much code-typing time. It even created additional unit tests for me. However, the reason for this post is what it did when I asked it to generate some additional artifacts for me.

I have seen examples on the web of coding assistants writing entire programs after developer prompting. Impressive stuff.

The Memory

The Memory component’s interface looks like this (only listing part of it):

public interface IMemory
{
    // Add new memories
    Task<IMemoryEntry> Add<T>(
        T content,
        IMemorySource source,
        IEnumerable<Guid> inferredFrom)
        where T : MemoryEntryContent;
    // Subscribe to events for new memories
    IObservable<IMemoryEntry> NewEntries { get; }
}

The Add() method returns an IMemoryEntry, which is, as its name suggests, a representation of a memory entry. The developer must provide the content types for these entries, each of which inherits from MemoryEntryContent.

public abstract class MemoryEntryContent
{
    // Store the schema ID so that the Type property is actually populated.
    public MemoryEntryContent(Uri type) { Type = type; }
    public Uri Type { get; }
}

Side note: The above should have been an interface. It’s a long story on why it’s a class (thank you Entity Framework for not letting me model things the way I wanted).

Now, I can create C# types for the different memories I manually record (via some app) or the agents produce. Here are some:

public class Note : MemoryEntryContent
{
    public Note(): base(type: SchemaIds.Note) { }
    public string Text { get; set; } = string.Empty;
}
public class ToDo : MemoryEntryContent
{
    public ToDo(): base(type: SchemaIds.ToDo) { }
    public string Text { get; set; } = string.Empty;
    public DateTime Reminder { get; set; } = DateTime.MinValue;
}
public class WebLink : MemoryEntryContent
{
    public WebLink(): base(type: SchemaIds.WebLink) { }
    // "about:blank" is a valid absolute URI; new Uri("#") would throw a UriFormatException.
    public Uri Link { get; set; } = new Uri("about:blank");
    public string Notes { get; set; } = string.Empty;
}
public class DocText : MemoryEntryContent
{
    public DocText(): base(type: SchemaIds.DocText) { }
    public string Text { get; set; } = string.Empty;
}
public class Summary : MemoryEntryContent
{
    public Summary(): base(type: SchemaIds.Summary) { }
    public string Text { get; set; } = string.Empty;
}

Relationships between memory entries are handled by the memory (see the IMemory.Add() method). When a new memory entry is introduced, the memory triggers those agents that have subscribed to be notified (see the IMemory.NewEntries property). This way, when I record a WebLink as a memory, the WebDocumentRetrieverAgent detects it, downloads the web page, extracts the main text from it, and produces a new DocText memory which is added into the memory. The addition of a DocText memory triggers the WebDocumentSummarizerAgent which will create a Summary memory. It will also trigger the TopicsExtractorAgent which will create a Topics memory (the schema of which isn’t shown above).
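The chain of agents described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual C# implementation: `InMemoryMemory`, `MemoryEntry`, `fake_fetch_text`, and the stub agents are hypothetical names, the agents run synchronously, and no page is downloaded or summarized by an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from uuid import UUID, uuid4


@dataclass
class MemoryEntry:
    kind: str                      # e.g. "WebLink", "DocText", "Summary"
    content: str
    inferred_from: List[UUID] = field(default_factory=list)
    id: UUID = field(default_factory=uuid4)


class InMemoryMemory:
    """Stores entries and notifies every subscriber about each new one."""

    def __init__(self) -> None:
        self.entries: List[MemoryEntry] = []
        self._subscribers: List[Callable[[MemoryEntry], None]] = []

    def subscribe(self, callback: Callable[[MemoryEntry], None]) -> None:
        self._subscribers.append(callback)

    def add(self, kind: str, content: str, inferred_from=()) -> MemoryEntry:
        entry = MemoryEntry(kind, content, list(inferred_from))
        self.entries.append(entry)
        # Synchronous notification; a real system would likely be asynchronous.
        for callback in list(self._subscribers):
            callback(entry)
        return entry


def fake_fetch_text(url: str) -> str:
    # Hypothetical stand-in for downloading a page and extracting its text.
    return f"Main text of {url}"


def attach_agents(memory: InMemoryMemory) -> None:
    def retriever(entry: MemoryEntry) -> None:
        if entry.kind == "WebLink":
            memory.add("DocText", fake_fetch_text(entry.content), [entry.id])

    def summarizer(entry: MemoryEntry) -> None:
        if entry.kind == "DocText":
            # Placeholder "summary": the first 20 characters of the text.
            memory.add("Summary", entry.content[:20], [entry.id])

    def topics_extractor(entry: MemoryEntry) -> None:
        if entry.kind == "DocText":
            memory.add("Topics", "topic-a, topic-b", [entry.id])  # placeholder

    for agent in (retriever, summarizer, topics_extractor):
        memory.subscribe(agent)


memory = InMemoryMemory()
attach_agents(memory)
link = memory.add("WebLink", "https://example.com/article")
print([e.kind for e in memory.entries])
# → ['WebLink', 'DocText', 'Summary', 'Topics']
```

Each derived entry records the ID of the entry it was inferred from, which mirrors the role of the inferredFrom parameter of IMemory.Add().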

So far, the above are implementation details of what I had pretty much discussed in my previous post (BrainExpanded – The Timeline).

Notice that each of the types must have a schema ID, which it passes as a URI to its base class. That URI acts as a discriminator that we can use during JSON deserialization. The IDs look like these:

internal static class SchemaIds
{
    public static Uri DocText => new Uri("https://schemas.brainexpanded.org/2024/doc_text");
    public static Uri WebLink => new Uri("https://schemas.brainexpanded.org/2024/web_link");
    public static Uri Note => new Uri("https://schemas.brainexpanded.org/2024/note");
    public static Uri Topics => new Uri("https://schemas.brainexpanded.org/2024/topics");
    public static Uri Summary => new Uri("https://schemas.brainexpanded.org/2024/summary");
    public static Uri ToDo => new Uri("https://schemas.brainexpanded.org/2024/todo");
}
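To show how such a discriminator might drive deserialization, here is a minimal Python sketch. It assumes the schema ID travels in a "type" property and that the other property names are lowercase (as in the WebLink JSON Schema later in this post); the `TYPES_BY_SCHEMA_ID` registry and the `deserialize` helper are hypothetical, not part of the actual code.

```python
import json
from dataclasses import dataclass

BASE = "https://schemas.brainexpanded.org/2024/"


@dataclass
class Note:
    text: str = ""


@dataclass
class WebLink:
    link: str = ""
    notes: str = ""


# Registry from schema ID (the discriminator) to the Python type.
TYPES_BY_SCHEMA_ID = {
    BASE + "note": Note,
    BASE + "web_link": WebLink,
}


def deserialize(payload: str):
    """Pick the target class from the "type" property, then build it."""
    data = json.loads(payload)
    schema_id = data.pop("type")
    cls = TYPES_BY_SCHEMA_ID.get(schema_id)
    if cls is None:
        raise ValueError(f"Unknown schema ID: {schema_id}")
    return cls(**data)


entry = deserialize(json.dumps({
    "type": BASE + "web_link",
    "link": "https://example.com",
    "notes": "read later",
}))
print(entry)
# → WebLink(link='https://example.com', notes='read later')
```

Rejecting unknown schema IDs up front keeps a mismatch between the two processes visible instead of silently producing the wrong type.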

Not contract-first 🙁

Since I separated the main orchestration from the LLM-based agents, I had to recreate the above types in Python. This way, when the .NET process communicates with the AI agents implemented in Python, both sides agree on the exchanged JSON data structures.

I should have known better. Back in the service-oriented days, it was good practice to start with the contracts first before writing any code.

Even though I already had the set of classes representing memories in C# and in Python, I thought that I should still capture the implied contracts using some declarative notation, let’s say JSON Schema. I could then pretend that the implementations were actually based on those contracts. But I didn’t want to spend time writing the JSON Schema contracts, so I asked Copilot.

Copilot generates the contracts

I used the “edit with Copilot” feature in Visual Studio Code. I added the relevant C# classes as context and used the prompt below.

Create the json schemas for the DocText, Note, Summary, Topics, and WebLinks C# classes. For the "id" property, use a const value. For the const values use the actual schema IDs from the SchemaIds.cs class.

Copilot did a perfect job and created exactly the files I needed, just like this one:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "id": "https://schemas.brainexpanded.org/2024/web_link",
  "type": "object",
  "properties": {
    "link": {
      "type": "string",
      "format": "uri"
    },
    "notes": {
      "type": "string"
    },
    "type": {
      "type": "string",
      "const": "https://schemas.brainexpanded.org/2024/web_link"
    }
  },
  "required": ["link", "notes", "type"]
}

That’s just awesome. It did exactly what I needed.
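As an illustration of how such a contract could be put to work, the sketch below checks a payload against the WebLink schema in plain Python. It is a deliberately minimal, hand-rolled check that understands only the "required", string "type", and "const" keywords used above; it ignores "format" and is not a full JSON Schema validator. The `check` function is a hypothetical helper, and a real project would more likely rely on an existing validator library such as `jsonschema`.

```python
SCHEMA_ID = "https://schemas.brainexpanded.org/2024/web_link"

WEB_LINK_SCHEMA = {
    "type": "object",
    "properties": {
        "link": {"type": "string", "format": "uri"},
        "notes": {"type": "string"},
        "type": {"type": "string", "const": SCHEMA_ID},
    },
    "required": ["link", "notes", "type"],
}


def check(instance, schema):
    """Return a list of problems; an empty list means the checks passed."""
    if not isinstance(instance, dict):
        return ["instance is not an object"]
    problems = []
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property '{name}'")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue
        value = instance[name]
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append(f"'{name}' must be a string")
        elif "const" in rules and value != rules["const"]:
            problems.append(f"'{name}' must equal {rules['const']!r}")
    return problems


good = {"link": "https://example.com", "notes": "", "type": SCHEMA_ID}
bad = {
    "link": "https://example.com",
    "type": "https://schemas.brainexpanded.org/2024/note",
}

print(check(good, WEB_LINK_SCHEMA))  # → []
for problem in check(bad, WEB_LINK_SCHEMA):
    print(problem)
```

The second payload fails twice: it has no "notes" property, and its "type" carries the schema ID of a Note rather than a WebLink.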

Savas Parastatidis

Savas Parastatidis works at Amazon as a Sr. Principal Engineer in Alexa AI. Previously, he worked at Microsoft where he co-founded Cortana and led the effort as the team's architect. While at Microsoft, Savas also worked on distributed data storage and high-performance data processing technologies. He was involved in various e-Science projects while at Microsoft Research where he also investigated technologies related to knowledge representation & reasoning. Savas also worked on language understanding technologies at Facebook. Prior to joining Microsoft, Savas was a Principal Research Associate at Newcastle University where he undertook research in the areas of distributed, service-oriented computing and e-Science. He was also the Chief Software Architect at the North-East Regional e-Science Centre where he oversaw the architecture and the application of Web Services technologies for a number of large research projects. Savas worked as a Senior Software Engineer for Hewlett Packard where he co-led the R&D effort for the industry's Web Service transactions service and protocol. You can find out more about Savas at https://savas.me/about