BrainExpanded - Copilot - savas parastatidis

Happy New Year everyone!

I had planned for my next BrainExpanded post to report on the entire end-to-end experience with a working backend. However, a coding-related moment with Microsoft’s GitHub Copilot led me to write this note instead.

Over the holidays, I made good progress on BrainExpanded. It’s been so much fun!

The initial prototype of the Timeline was all in Python. However, because I was continuously experimenting with and rethinking parts of the end-to-end system, the code became harder to refactor and reshape. Most likely it’s just me, but Python was getting in my way.

> `mv Timeline Memory`

I rewrote most of what I already had in C#. The “Timeline” became just “Memory”. The thinking is that a temporal view of my recorded memories is just one of the few projections I can create over the contents of the memory store.

As I did with my first Python implementation, I created memory abstractions in C# so that I can swap in different implementations for my experiments: in-memory, SQLite-based, and soon a graph DB one. I used Entity Framework for the SQLite-based implementation. There is now an orchestrator process that uses the independent, reusable memory component.

Much of the LLM-related tooling prioritizes the Python ecosystem. So, I implemented a separate, very simple service in Python (using Flask) to host the implementation of my AI agents.

I have been, of course, using Visual Studio Code for all my prototyping needs. Copilot has been extremely useful. Its suggestions have saved me so much code-typing time. It even created additional unit tests for me. However, the reason for this post is what it did when I asked it to generate some additional artifacts for me.

I have seen examples on the web of coding assistants writing entire programs after developer prompting. Impressive stuff.

The Memory

The Memory component’s interface looks like this (only listing part of it):

public interface IMemory
{
    // Add new memories
    Task<IMemoryEntry> Add<T>(
        T content,
        IMemorySource source,
        IEnumerable<Guid> inferredFrom)
        where T : MemoryEntryContent;
    // Subscribe to events for new memories
    IObservable<IMemoryEntry> NewEntries { get; }
}

The Add() method returns an IMemoryEntry which, as its name suggests, represents a memory entry. The developer must provide the content types for these entries, each inheriting from MemoryEntryContent.

public abstract class MemoryEntryContent
{
    // Store the schema ID that each derived type passes in.
    public MemoryEntryContent(Uri type) { Type = type; }
    public Uri Type { get; }
}

Side note: The above should have been an interface. Why it’s a class is a long story (thank you, Entity Framework, for not letting me model things the way I wanted).

Now, I can create C# types for the different memories I manually record (via some app) or the agents produce. Here are some:

public class Note : MemoryEntryContent
{
    public Note(): base(type: SchemaIds.Note) { }
    public string Text { get; set; } = string.Empty;
}
public class ToDo : MemoryEntryContent
{
    public ToDo(): base(type: SchemaIds.ToDo) { }
    public string Text { get; set; } = string.Empty;
    public DateTime Reminder { get; set; } = DateTime.MinValue;
}
public class WebLink : MemoryEntryContent
{
    public WebLink(): base(type: SchemaIds.WebLink) { }
    public Uri Link { get; set; } = new Uri("about:blank");
    public string Notes { get; set; } = string.Empty;
}
public class DocText : MemoryEntryContent
{
    public DocText(): base(type: SchemaIds.DocText) { }
    public string Text { get; set; } = string.Empty;
}
public class Summary : MemoryEntryContent
{
    public Summary(): base(type: SchemaIds.Summary) { }
    public string Text { get; set; } = string.Empty;
}

Relationships between memory entries are handled by the memory (see the inferredFrom parameter of the IMemory.Add() method). When a new memory entry is added, the memory notifies the agents that have subscribed (see the IMemory.NewEntries property). This way, when I record a WebLink as a memory, the WebDocumentRetrieverAgent detects it, downloads the web page, extracts the main text from it, and produces a new DocText memory, which is added to the memory. The addition of a DocText memory triggers the WebDocumentSummarizerAgent, which creates a Summary memory. It also triggers the TopicsExtractorAgent, which creates a Topics memory (the schema of which isn’t shown above).
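To make the chain of reactions concrete, here is a minimal sketch in Python rather than the C# of the actual implementation. Everything in it is an assumption made for illustration: the InMemoryMemory class, the subscribe() and add() names, and the string "kind" labels are hypothetical stand-ins for IMemory, NewEntries, and the real content types, and the agents only fake their work.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from uuid import UUID, uuid4


@dataclass
class MemoryEntry:
    kind: str                      # hypothetical label, e.g. "web_link"
    content: dict
    inferred_from: List[UUID] = field(default_factory=list)
    id: UUID = field(default_factory=uuid4)


class InMemoryMemory:
    """Hypothetical in-memory store that notifies subscribers of new entries."""

    def __init__(self) -> None:
        self.entries: List[MemoryEntry] = []
        self._subscribers: List[Callable[[MemoryEntry], None]] = []

    def subscribe(self, callback: Callable[[MemoryEntry], None]) -> None:
        self._subscribers.append(callback)

    def add(self, kind, content, inferred_from=()):
        entry = MemoryEntry(kind, content, list(inferred_from))
        self.entries.append(entry)
        # Notify every subscriber; each one decides whether it cares.
        for callback in list(self._subscribers):
            callback(entry)
        return entry


def web_document_retriever(memory: InMemoryMemory) -> None:
    def on_entry(entry: MemoryEntry) -> None:
        if entry.kind == "web_link":
            # A real agent would download the page; the text is stubbed here.
            text = f"text of {entry.content['link']}"
            memory.add("doc_text", {"text": text}, [entry.id])
    memory.subscribe(on_entry)


def web_document_summarizer(memory: InMemoryMemory) -> None:
    def on_entry(entry: MemoryEntry) -> None:
        if entry.kind == "doc_text":
            # A real agent would call an LLM; truncation stands in for it.
            memory.add("summary", {"text": entry.content["text"][:20]}, [entry.id])
    memory.subscribe(on_entry)


memory = InMemoryMemory()
web_document_retriever(memory)
web_document_summarizer(memory)
memory.add("web_link", {"link": "https://example.com", "notes": ""})
kinds = [e.kind for e in memory.entries]
```

Recording a single web link leaves three entries in the store, each pointing back to the entry it was inferred from. The sketch notifies subscribers synchronously, which keeps it short; an observable-based design would more likely dispatch asynchronously.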

So far, these are implementation details of what I largely discussed in my previous post (BrainExpanded – The Timeline).

Notice that each of the types must pass a schema ID, which is a URI, to its base class. That URI acts as a discriminator that we can use during JSON deserialization. The IDs look like these:

internal static class SchemaIds
{
    public static Uri DocText => new Uri("https://schemas.brainexpanded.org/2024/doc_text");
    public static Uri WebLink => new Uri("https://schemas.brainexpanded.org/2024/web_link");
    public static Uri Note => new Uri("https://schemas.brainexpanded.org/2024/note");
    public static Uri Topics => new Uri("https://schemas.brainexpanded.org/2024/topics");
    public static Uri Summary => new Uri("https://schemas.brainexpanded.org/2024/summary");
    public static Uri ToDo => new Uri("https://schemas.brainexpanded.org/2024/todo");
}

Not contract-first 🙁

Since I separated the main orchestration from the LLM-based agents, I had to recreate the above types in Python. This way, when the .NET process communicates with the AI agents implemented in Python, both sides agree on the exchanged JSON data structures.
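As a rough idea of what the Python side of that agreement could look like, the sketch below mirrors two of the C# types as dataclasses and uses the schema ID as the discriminator when reading JSON. It is not the actual agent code: the lowercase property names, the REGISTRY mapping, and the from_json() and to_json() helpers are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass

NOTE = "https://schemas.brainexpanded.org/2024/note"
WEB_LINK = "https://schemas.brainexpanded.org/2024/web_link"


@dataclass
class Note:
    text: str = ""
    type: str = NOTE


@dataclass
class WebLink:
    link: str = "about:blank"
    notes: str = ""
    type: str = WEB_LINK


# Map each schema ID (the discriminator) to the class that models it.
REGISTRY = {NOTE: Note, WEB_LINK: WebLink}


def from_json(payload: str):
    """Pick the class from the "type" property and build an instance."""
    data = json.loads(payload)
    cls = REGISTRY.get(data.get("type"))
    if cls is None:
        raise ValueError(f"unknown schema ID: {data.get('type')!r}")
    return cls(**data)


def to_json(entry) -> str:
    """Serialize a dataclass instance, including its "type" discriminator."""
    return json.dumps(asdict(entry))


incoming = json.dumps({
    "link": "https://example.com",
    "notes": "read later",
    "type": WEB_LINK,
})
entry = from_json(incoming)
```

The catch is that nothing here is checked against the C# definitions: if a property is renamed on one side only, the mismatch surfaces at runtime, which is exactly the gap a shared contract is meant to close.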

I should have known better. Back in the service-oriented days, it was good practice to start with the contracts first before writing any code.

Even though I already had the set of classes representing memories in C# and in Python, I thought that I should still capture the implied contracts using some declarative notation, let’s say JSON Schema. I could then pretend that the implementations were actually based on those contracts. But I didn’t want to spend time writing the JSON Schema contracts, so I asked Copilot.

Copilot generates the contracts

I used the “edit with Copilot” feature in Visual Studio Code. I added the relevant C# classes as context and used the prompt below.

Create the json schemas for the DocText, Note, Summary, Topics, and WebLinks C# classes. For the "id" property, use a const value. For the const values use the actual schema IDs from the SchemaIds.cs class.

Copilot did a perfect job and created exactly the files I needed, just like this one:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "id": "https://schemas.brainexpanded.org/2024/web_link",
  "type": "object",
  "properties": {
    "link": {
      "type": "string",
      "format": "uri"
    },
    "notes": {
      "type": "string"
    },
    "type": {
      "type": "string",
      "const": "https://schemas.brainexpanded.org/2024/web_link"
    }
  },
  "required": ["link", "notes", "type"]
}

That’s just awesome. It did exactly what I needed.
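With a contract like this in hand, either side can check a payload before trusting it. The sketch below is a deliberately small, hand-rolled check in Python that only understands the keywords appearing in the schema above ("required", "type": "string", and "const"); it ignores "format" and is not a full JSON Schema validator. The check() function and the WEB_LINK_SCHEMA name are hypothetical.

```python
WEB_LINK_SCHEMA = {
    "type": "object",
    "properties": {
        "link": {"type": "string", "format": "uri"},
        "notes": {"type": "string"},
        "type": {
            "type": "string",
            "const": "https://schemas.brainexpanded.org/2024/web_link",
        },
    },
    "required": ["link", "notes", "type"],
}


def check(payload, schema):
    """Return a list of problems; an empty list means the payload passed."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    problems = [
        f"missing required property: {name}"
        for name in schema.get("required", [])
        if name not in payload
    ]
    for name, rules in schema.get("properties", {}).items():
        if name not in payload:
            continue
        value = payload[name]
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} is not a string")
        if "const" in rules and value != rules["const"]:
            problems.append(f"{name} must equal {rules['const']}")
    return problems


good = {
    "link": "https://example.com",
    "notes": "",
    "type": "https://schemas.brainexpanded.org/2024/web_link",
}
bad = {"link": "https://example.com", "type": "not-a-web-link"}
good_problems = check(good, WEB_LINK_SCHEMA)
bad_problems = check(bad, WEB_LINK_SCHEMA)
```

In practice a dedicated JSON Schema library would be the better choice; the point of the sketch is only to show how the "const" on the "type" property turns the schema ID into something that can be enforced.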
