Every call to ovn-nbctl is a fork. A new process, a new connection to the OVSDB socket, one operation, exit. At a certain scale, on a busy node, this becomes your networking stack's bottleneck.

Back in 2020, OVN-Kubernetes was doing a lot of forking.

Some background

OVSDB is the database that backs Open vSwitch and OVN. It speaks a JSON-RPC protocol, defined in RFC 7047, over a Unix socket or TCP. The ovn-nbctl and ovs-vsctl tools are command-line clients for reading and changing that database: invoke them, they connect, do the thing, and exit.

The way most CMSes (cloud management systems, in OVN's terminology) drive OVSDB is via a long-lived connection: call the Monitor() RPC to be notified of every change to the database, and Transact() when you want to change some state.
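
Under the hood there isn't much magic: Monitor() and Transact() are just methods in that JSON-RPC conversation. As a minimal sketch (the socket path here is an assumption for a typical OVN install), you can open the northbound database socket yourself and issue one of the RFC 7047 methods by hand:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

func main() {
	// Assumed socket path for an OVN northbound database; adjust for your install.
	conn, err := net.Dial("unix", "/var/run/ovn/ovnnb_db.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// "list_dbs" is one of the methods defined in RFC 7047; "monitor" and
	// "transact" requests have exactly the same shape.
	if _, err := conn.Write([]byte(`{"method":"list_dbs","params":[],"id":0}`)); err != nil {
		panic(err)
	}

	// Decode the single JSON value the server sends back, e.g.
	// {"result":["OVN_Northbound"],"error":null,"id":0}
	var reply map[string]any
	if err := json.NewDecoder(conn).Decode(&reply); err != nil {
		panic(err)
	}
	fmt.Println(reply["result"])
}
```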

Open vSwitch ships a C IDL and a Python IDL that make it easy to build such a client... But what if you're building something in the cloud-native lingua franca (Golang)?

Enter libovsdb

Where libovsdb came from

I first wrote libovsdb in 2014 at Socketplane - a startup building container networking on top of Open vSwitch, before Docker had a real networking story. We'd previously done this in Java, for OpenDaylight... but doing it from Go meant we needed a new library.

Docker acquired Socketplane in 2015. The networking work eventually flowed into libnetwork and was used for a while in Docker Swarm, but was ultimately abandoned as we opted to move away from Open vSwitch. libovsdb was left languishing and received relatively few updates from me and the other maintainers.

The problem at scale

OVN-Kubernetes is a Container Network Interface (CNI) plugin. It's built on Open Virtual Network (OVN), which is itself built on Open vSwitch. Back in 2020, the way it configured OVSDB was to call the ovn-nbctl and ovs-vsctl binaries whenever it needed to make a change. Every logical port created, every ACL updated, every address set change was a new process. Fork, connect, operate, exit. At Kubernetes scale, that cost added up.
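
To make "fork, connect, operate, exit" concrete, every change boiled down to something like the sketch below: a process spawn, a fresh connection to the database and some output parsing, all for one operation (simplified, and the flags are illustrative):

```go
package main

import (
	"fmt"
	"os/exec"
)

// addLogicalSwitchPort shells out to ovn-nbctl, the way OVN-Kubernetes used to
// for every change: fork/exec, a new connection to the database, one operation,
// exit. The exact flags shown here are illustrative.
func addLogicalSwitchPort(switchName, portName string) error {
	out, err := exec.Command("ovn-nbctl", "--timeout=15",
		"--", "--may-exist", "lsp-add", switchName, portName).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ovn-nbctl failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	if err := addLogicalSwitchPort("node1", "pod-a"); err != nil {
		fmt.Println(err)
	}
}
```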

The qualitative problems were visible: slow pod startup under load, intermittent failures under concurrent network events, high CPU overhead from process spawning on busy nodes. The kind of flakiness that's hard to reproduce deterministically and expensive to debug.

Looking through the codebase, it was also clear that there were several second-order issues:

  • No easy way to reason about concurrency - who's doing what to the database in parallel
  • No way to check whether something already exists in the database without creating a transaction
  • No way to perform an action in response to something changing in the database (i.e. a callback)
  • Not easy to test - it required a real OVSDB server

If only someone had written a Go library that could do some of this!

Revamping libovsdb

It would be fair to say that the original libovsdb had bitrotted just a little bit. eBay had a better-maintained fork, but we were still the canonical upstream. I reached out to them so we could start bringing their changes back into the upstream, while also closing out some of the issues in the patch queue. A lot of time was spent hardening libovsdb: ensuring the encoding was RFC compliant, working on performance, making sure Monitor() behaved correctly, and so on. Not only that, but OVSDB (the protocol) had gained some new RPCs in the meantime - Monitor2 and Monitor3, better known as monitor_cond and monitor_cond_since - that we needed to support too. Believe it or not, not all of that work is finished and there is still more to do for any interested parties - reach out if you would like some pointers on where to start.

One of the big selling points of libovsdb was that it let you work with Go structs instead of having to encode things in OVSDB-style JSON yourself. One of the largest tasks ahead was going to be modelling all of the tables used by OVS and OVN. Thankfully a colleague of mine, Adrián Moreno, happened to have a similar problem in another project. He ended up contributing a code-generation tool that generates the Go structs directly from the OVSDB schema, and a mapper layer that lets them be used fluently with the core libovsdb API!
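
To give a flavour, the generated models are plain Go structs whose fields map to columns via struct tags. The one below is hand-written and trimmed for illustration; the real files are emitted by the generator from the OVN_Northbound schema:

```go
// LogicalSwitch is roughly what the generator produces for the OVN_Northbound
// Logical_Switch table: one tagged field per column, which the mapper layer
// uses to translate between Go values and OVSDB rows. Trimmed for brevity.
type LogicalSwitch struct {
	UUID        string            `ovsdb:"_uuid"`
	Name        string            `ovsdb:"name"`
	Ports       []string          `ovsdb:"ports"`
	ACLs        []string          `ovsdb:"acls"`
	ExternalIDs map[string]string `ovsdb:"external_ids"`
}
```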

This unlocked a much better API design for users, and with better cache handling, connection retry logic and so on, we were ready to start looking at migration.
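
Roughly what that looks like from the caller's side, using the LogicalSwitch model sketched above (a hedged sketch rather than production code: the socket path is an assumption, and exact option names and signatures may have drifted since I last looked):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ovn-org/libovsdb/client"
	"github.com/ovn-org/libovsdb/model"
)

func main() {
	ctx := context.Background()

	// One client model per database, built from the generated structs.
	dbModel, err := model.NewClientDBModel("OVN_Northbound",
		map[string]model.Model{"Logical_Switch": &LogicalSwitch{}})
	if err != nil {
		log.Fatal(err)
	}

	// One long-lived connection for the whole process, instead of a fork per change.
	nb, err := client.NewOVSDBClient(dbModel,
		client.WithEndpoint("unix:/var/run/ovn/ovnnb_db.sock"))
	if err != nil {
		log.Fatal(err)
	}
	if err := nb.Connect(ctx); err != nil {
		log.Fatal(err)
	}
	defer nb.Close()

	// Monitor keeps a local, in-memory cache in sync with the server.
	if _, err := nb.MonitorAll(ctx); err != nil {
		log.Fatal(err)
	}

	// Writes are expressed against the Go models and committed as one transaction.
	ops, err := nb.Create(&LogicalSwitch{Name: "demo-switch"})
	if err != nil {
		log.Fatal(err)
	}
	if _, err := nb.Transact(ctx, ops...); err != nil {
		log.Fatal(err)
	}

	// Reads are served from the cache populated by the monitor, not the database.
	var switches []LogicalSwitch
	if err := nb.List(ctx, &switches); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d logical switches\n", len(switches))
}
```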

I bravely ran away!

My mission was done. The library was working well enough, and I'd done a proof of concept migrating one or two OVN-Kubernetes controllers to the new model, showing that the result was more stable and easier to maintain going forward. At that point I wrote up a document, did a few hand-off meetings, passed the baton to Jaime Caamaño Ruiz and departed for pastures new.

The payoff was clear. A single persistent connection instead of thousands of ephemeral ones. Reads served from a local cache without hitting the database. Writes batched into atomic transactions, with semantics that were much clearer to callers... mainly thanks to Jaime's work on abstracting this further through the libovsdb_ops library in ovn-kubernetes.
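
As a small illustration of the batching point, continuing with the nb client and ctx from the earlier sketch, several changes can be folded into a single atomic transaction instead of several round trips:

```go
// Build operations from several model changes, then commit them in one atomic
// OVSDB transaction: either both switches exist afterwards or neither does.
// (Sketch only; real callers go through ovn-kubernetes' helper layer.)
opsA, _ := nb.Create(&LogicalSwitch{Name: "tenant-a"})
opsB, _ := nb.Create(&LogicalSwitch{Name: "tenant-b"})
if _, err := nb.Transact(ctx, append(opsA, opsB...)...); err != nil {
	log.Printf("transaction failed: %v", err)
}
```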

Closing thought

libovsdb is still actively maintained and used in production. A library written for a startup that no longer exists, for a product that pivoted away from it, ended up as foundational infrastructure in one of the most widely deployed Kubernetes networking stacks and in NVIDIA's GeForce NOW platform.

This taught me that software has a longer half-life than the organisations that produce it... and you never know when something you build is going to turn out to be incredibly valuable in future.