Introducing Bazel 7.2’s Output Service protocol

By Chi Wang and Ed Schouten on 23 July 2024

Introduction

One of the exciting new features in Bazel 7.2 is support for the Bazel Output Service, which allows Bazel to lazily materialize outputs when you access them with normal filesystem operations. This lets you keep visibility into the entire output tree while still saving network bandwidth.

You can enable it with the command line flags --experimental_remote_output_service and --experimental_remote_output_service_output_path_prefix. In this blog post we describe what the Bazel Output Service is and why you'd want to use it.

Background

When performing builds, Bazel stores output files generated by build actions in the bazel-out/ directory. This directory is used for two different purposes:

During the build, actions that run locally use this directory to access input files that were generated by previous actions (e.g., a linker action consuming object files generated by a compiler).
After the build, users may use this directory to access the results of the build, either by accessing bazel-out/ directly or by invoking bazel run.

For builds that have remote caching or remote execution enabled, the contents of this directory are populated by downloading objects from the Content Addressable Storage (CAS) to the local system. Unfortunately, this can frequently become a bottleneck. This is why Bazel 0.25.0 added the --remote_download_minimal command line flag, which limits downloading to just those outputs that are needed to run the build to completion. In Bazel 7, this "Build without the Bytes" behavior became the default in the form of --remote_download_toplevel, which additionally downloads the outputs of the requested top-level targets. The --remote_download_regex flag lets users download further build results whose paths match a given pattern.
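To make download-by-pattern concrete, the following Go sketch filters a list of output paths the way a --remote_download_regex-style pattern might. It is not Bazel's code. It assumes the pattern must match the entire exec-root-relative path, and it uses Go's RE2 syntax in place of the Java-style regular expressions Bazel accepts. The output paths and the selectDownloads helper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectDownloads returns the outputs that a --remote_download_regex-style
// filter would force to be downloaded. Assumptions: the pattern must match the
// whole exec-root-relative path, and Go's RE2 syntax stands in for Java's.
func selectDownloads(pattern string, outputs []string) ([]string, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}
	var selected []string
	for _, path := range outputs {
		if re.MatchString(path) {
			selected = append(selected, path)
		}
	}
	return selected, nil
}

func main() {
	// Hypothetical outputs of a remotely executed build.
	outputs := []string{
		"bazel-out/k8-fastbuild/bin/app/app",
		"bazel-out/k8-fastbuild/bin/app/app.debug",
		"bazel-out/k8-fastbuild/bin/lib/lib.o",
	}
	selected, err := selectDownloads(`.*\.debug`, outputs)
	if err != nil {
		panic(err)
	}
	fmt.Println(selected) // [bazel-out/k8-fastbuild/bin/app/app.debug]
}
```

The catch is that such a pattern has to be chosen before the build starts: anything it does not match stays in the remote cache only.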

Even though --remote_download_minimal is very good at making your builds run fast and consume a minimal amount of network traffic, it has one downside: it makes build results harder to explore. Within CI this is often not an issue, as the Build Event Protocol can be used to record URLs of output files stored in the remote cache. However, developers at their desks often just want to run find bazel-out/ to see which files were built, and potentially inspect these files using disassemblers and other development tools. IDEs need some of the intermediate outputs to function, but it's hard to list in advance every output an IDE will need. Debuggers need debug symbols, but always downloading them is wasteful when you only debug occasionally, and if you didn't tell Bazel to download them up front, it's too late once you want to start debugging.

The Bazel Output Service protocol

In Bazel 7.2, we’ve addressed this by introducing the Bazel Output Service protocol. This protocol can be used by Bazel to offload the creation of output files in bazel-out/ belonging to remote actions to a separate helper process. Bazel will continue to write locally created output files into bazel-out/ as it does right now. This means that Bazel and the Bazel Output Service are jointly responsible for managing the bazel-out/ directory.

A simple implementation of the Bazel Output Service could download and store the output files when Bazel requests that they are created. However, this behavior wouldn’t be appealing, as it is identical to invoking Bazel with --remote_download_all (i.e., downloading all output files). Modern operating systems provide a couple of features that an implementation of the Bazel Output Service can leverage to provide better performance. For example:

On systems like Linux, it could use FUSE to fully replace bazel-out/ with a virtual file system. Functions like readdir(3) and stat(2) could immediately return information for remote output files, while downloading their contents from the CAS could be delayed until the first call to read(2).
On systems like macOS, it could use the File Provider API to create ‘dataless’ files that are managed by a File Provider extension. When any attempt is made to access these files, a call is made by the operating system to the extension to request that the file is materialized.
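Both approaches rely on decoupling metadata from contents. The Go sketch below models a single remote output under that idea: Size plays the role of stat(2) and is answered from the REv2 digest alone, while ReadAt plays the role of read(2) and downloads the contents on first use. It is a simplified model, not FUSE or File Provider code; lazyFile, digestOf, and the fetch callback are hypothetical names, and the CAS is faked with an in-memory map.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"sync"
)

// Digest identifies a CAS object by its SHA-256 hash and size, as in REv2.
type Digest struct {
	Hash      string
	SizeBytes int64
}

func digestOf(data []byte) Digest {
	sum := sha256.Sum256(data)
	return Digest{Hash: hex.EncodeToString(sum[:]), SizeBytes: int64(len(data))}
}

// lazyFile models a remote output exposed through a virtual bazel-out/.
// Metadata comes from the digest; contents are fetched on the first read.
type lazyFile struct {
	digest  Digest
	fetch   func(Digest) ([]byte, error) // hypothetical stand-in for a CAS download
	once    sync.Once
	data    []byte
	err     error
	fetches int // how often the CAS was contacted, for illustration
}

// Size answers stat(2)-style queries without any network traffic.
func (f *lazyFile) Size() int64 { return f.digest.SizeBytes }

// ReadAt answers read(2)-style queries, downloading the contents at most once.
func (f *lazyFile) ReadAt(p []byte, off int64) (int, error) {
	f.once.Do(func() {
		f.fetches++
		f.data, f.err = f.fetch(f.digest)
	})
	if f.err != nil {
		return 0, f.err
	}
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	if off >= int64(len(f.data)) {
		return 0, io.EOF
	}
	n := copy(p, f.data[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

func main() {
	content := []byte("hello, bazel-out")
	cas := map[Digest][]byte{digestOf(content): content} // fake remote CAS
	f := &lazyFile{
		digest: digestOf(content),
		fetch: func(d Digest) ([]byte, error) {
			if data, ok := cas[d]; ok {
				return data, nil
			}
			return nil, errors.New("object not found in CAS")
		},
	}
	fmt.Println("size:", f.Size(), "fetches:", f.fetches) // size: 16 fetches: 0
	buf := make([]byte, 5)
	n, err := f.ReadAt(buf, 7)
	fmt.Println(string(buf[:n]), err, "fetches:", f.fetches) // bazel <nil> fetches: 1
}
```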

The Bazel Output Service protocol also provides methods for computing hashes of files, tracking/reporting of file system modifications, and cleaning the contents of the output directory. This significantly reduces the number of file system operations Bazel needs to perform.
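A minimal Go sketch of why this saves work, assuming a simplified in-memory output tree whose names are hypothetical and do not reflect the protocol's actual messages: digests of remote outputs are already known and can be returned without reading anything, only locally written files have to be hashed, and the service keeps a list of modified paths so the client does not need to rescan the whole tree.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// entry is one file in a hypothetical output tree managed by an output service.
type entry struct {
	remoteHash string // digest already known for outputs of remote actions
	local      []byte // contents of outputs written by local actions
}

type outputTree struct {
	files    map[string]*entry
	modified map[string]bool // paths changed since the client last asked
	hashed   int             // number of times file contents were actually hashed
}

func newOutputTree() *outputTree {
	return &outputTree{files: map[string]*entry{}, modified: map[string]bool{}}
}

// AddRemote registers a remote output by digest, without any file contents.
// Assumption: the client requested it, so it is not reported as a change.
func (t *outputTree) AddRemote(path, hash string) {
	t.files[path] = &entry{remoteHash: hash}
}

// WriteLocal stores a locally written file and records the modification.
func (t *outputTree) WriteLocal(path string, data []byte) {
	t.files[path] = &entry{local: data}
	t.modified[path] = true
}

// Hash returns a file's hex SHA-256 digest, reading contents only if needed.
func (t *outputTree) Hash(path string) (string, bool) {
	e, ok := t.files[path]
	if !ok {
		return "", false
	}
	if e.remoteHash != "" {
		return e.remoteHash, true
	}
	t.hashed++
	sum := sha256.Sum256(e.local)
	return hex.EncodeToString(sum[:]), true
}

// TakeModified reports paths changed since the previous call, so the client
// can recheck just those instead of walking the whole tree.
func (t *outputTree) TakeModified() []string {
	paths := make([]string, 0, len(t.modified))
	for p := range t.modified {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	t.modified = map[string]bool{}
	return paths
}

func main() {
	tree := newOutputTree()
	tree.AddRemote("bazel-out/bin/app.o", "hypothetical-digest")
	tree.WriteLocal("bazel-out/bin/notes.txt", []byte("hi"))
	h, _ := tree.Hash("bazel-out/bin/app.o")
	fmt.Println(h, tree.hashed)      // hypothetical-digest 0
	fmt.Println(tree.TakeModified()) // [bazel-out/bin/notes.txt]
}
```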

bb_clientd: an implementation of the Bazel Output Service

The Bazel Output Service was first implemented back in 2021 as part of the Buildbarn project. Even though it's part of the Buildbarn project and builds on top of many of its abstractions, it is intended to also be compatible with other remote cache/execution services. Together with changes to Bazel, we released a tool named bb_clientd. In addition to being a great tool for inspecting the contents of the remote cache and debugging actions, it provides a FUSE-based implementation of the Bazel Output Service.

When using bb_clientd as a Bazel Output Service, the overall setup looks like the diagram above. As you can see, bb_clientd makes use of a couple of auxiliary data stores:

Local CAS cache: At the REv2 protocol level, it is preferable to read files sequentially and entirely, as this makes it possible to validate their contents. However, when accessing output files through the FUSE file system, it is possible to access files partially and at random. In addition to preventing redundant downloads, the local CAS cache provides efficient random access to CAS objects.
File pool: As mentioned earlier, Bazel is still permitted to write output files of local actions into bazel-out/. Storing these files in the local CAS cache isn’t feasible, as they are mutable and cannot easily be evicted. bb_clientd therefore stores these files in a separate data store named the file pool.
Snapshots: When a build completes, bb_clientd can write a snapshot of the contents of bazel-out/ to disk. This allows bb_clientd to reload them after restarting. These snapshots are very compact, as they don’t store any file contents; only their digests.
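The local CAS cache can be sketched in Go under simplifying assumptions: each blob is downloaded sequentially and in full, checked against its SHA-256 digest and size, and only then used to serve random-access ReadAt calls. casCache, fakeCAS, and the opener callback are hypothetical names, and unlike a real cache this one keeps everything in memory and never evicts.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// Digest identifies a CAS object by its SHA-256 hash and size, as in REv2.
type Digest struct {
	Hash      string
	SizeBytes int64
}

func digestOf(data []byte) Digest {
	sum := sha256.Sum256(data)
	return Digest{Hash: hex.EncodeToString(sum[:]), SizeBytes: int64(len(data))}
}

// fakeCAS is a hypothetical in-memory stand-in for a sequential CAS reader.
func fakeCAS(blobs ...[]byte) func(Digest) (io.Reader, error) {
	byDigest := map[Digest][]byte{}
	for _, b := range blobs {
		byDigest[digestOf(b)] = b
	}
	return func(d Digest) (io.Reader, error) {
		b, ok := byDigest[d]
		if !ok {
			return nil, errors.New("object not found in CAS")
		}
		return bytes.NewReader(b), nil
	}
}

// casCache fetches each blob sequentially and in full, verifies it against
// its digest, and then serves random-access reads from local memory.
type casCache struct {
	open      func(Digest) (io.Reader, error)
	blobs     map[Digest][]byte // unbounded here; a real cache evicts
	downloads int
}

func newCASCache(open func(Digest) (io.Reader, error)) *casCache {
	return &casCache{open: open, blobs: map[Digest][]byte{}}
}

// ReadAt reads part of a CAS object, downloading and validating it first if needed.
func (c *casCache) ReadAt(d Digest, p []byte, off int64) (int, error) {
	blob, ok := c.blobs[d]
	if !ok {
		r, err := c.open(d)
		if err != nil {
			return 0, err
		}
		data, err := io.ReadAll(r)
		if err != nil {
			return 0, err
		}
		if digestOf(data) != d {
			return 0, errors.New("CAS object does not match its digest")
		}
		c.blobs[d] = data
		c.downloads++
		blob = data
	}
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	if off >= int64(len(blob)) {
		return 0, io.EOF
	}
	n := copy(p, blob[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

func main() {
	data := []byte("0123456789")
	cache := newCASCache(fakeCAS(data))
	buf := make([]byte, 3)
	for _, off := range []int64{4, 0} {
		n, err := cache.ReadAt(digestOf(data), buf, off)
		fmt.Println(string(buf[:n]), err)
	}
	fmt.Println("downloads:", cache.downloads) // downloads: 1
}
```

Validating at whole-object granularity keeps partial, out-of-order reads from the FUSE file system cheap: after the first verified download, every further read is served locally.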

In addition to implementing the Bazel Output Service protocol, bb_clientd also acts as a proxy for REv2 traffic. This allows bb_clientd to automatically extract credentials from “Authorization” headers, and reuse those to read objects from the CAS when files in the FUSE file system are accessed.
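One way to model this credential reuse is sketched below in Go. It assumes a single set of credentials and represents gRPC metadata as a plain map[string][]string with lowercase keys. credentialCache is a hypothetical name, and bb_clientd's actual logic (for example around token expiry or multiple backends) may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// credentialCache remembers the most recent "authorization" value seen on
// REv2 requests passing through the proxy, so that later lazy CAS reads
// triggered by file system access can reuse it.
type credentialCache struct {
	mu    sync.Mutex
	value string
}

// Observe inspects the metadata of a forwarded request. Metadata is modeled
// as map[string][]string with lowercase keys, the way gRPC represents it.
func (c *credentialCache) Observe(md map[string][]string) {
	if vals := md["authorization"]; len(vals) > 0 {
		c.mu.Lock()
		c.value = vals[len(vals)-1]
		c.mu.Unlock()
	}
}

// Attach returns a copy of md carrying the remembered credential, or false
// if no credential has been observed yet.
func (c *credentialCache) Attach(md map[string][]string) (map[string][]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.value == "" {
		return md, false
	}
	out := make(map[string][]string, len(md)+1)
	for k, v := range md {
		out[k] = v
	}
	out["authorization"] = []string{c.value}
	return out, true
}

func main() {
	var creds credentialCache
	// A request forwarded from Bazel to the remote cache.
	creds.Observe(map[string][]string{"authorization": {"Bearer example-token"}})
	// Later, a read(2) on a lazily materialized file needs a CAS download.
	md, ok := creds.Attach(map[string][]string{})
	fmt.Println(ok, md["authorization"]) // true [Bearer example-token]
}
```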

Pre-built binaries of bb_clientd are available on the project's GitHub Actions page. There are also container images, which are useful when running bb_clientd as a sidecar of build environments running on Kubernetes. In addition, the repository provides targets for building Debian packages.

Getting started with Bazel and bb_clientd

Prerequisites

Bazel 7.2 or newer.
Your project is already built with remote cache or remote execution.

Steps

Set up and run bb_clientd as described in the repository.
Note down the values of mount.mountPath and grpcServers.listenPaths.
- mountPath defaults to ~/bb_clientd
- listenPaths defaults to ~/.cache/bb_clientd/grpc
Add the flags --experimental_remote_output_service=${listenPaths} and --experimental_remote_output_service_output_path_prefix=${mountPath}/outputs to your .bazelrc.
Run Bazel as usual; you should notice that bazel-out/ is now managed by bb_clientd.
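If you prefer to script the flag setup, the Go sketch below prints .bazelrc lines for bb_clientd's default paths. It assumes the defaults listed above and places the flags under the build command, which test and run inherit in .bazelrc. Depending on how your Bazel version parses the endpoint, a UNIX socket path may need an explicit scheme prefix, so check the flag's documentation rather than relying on this sketch. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~" in a bb_clientd config path.
func expandHome(p, home string) string {
	if p == "~" {
		return home
	}
	if strings.HasPrefix(p, "~/") {
		return filepath.Join(home, p[2:])
	}
	return p
}

// bazelrcLines renders the two output service flags, given the values of
// mount.mountPath and grpcServers.listenPaths.
func bazelrcLines(mountPath, listenPath string) []string {
	return []string{
		"build --experimental_remote_output_service=" + listenPath,
		"build --experimental_remote_output_service_output_path_prefix=" + filepath.Join(mountPath, "outputs"),
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// bb_clientd defaults; adjust if your configuration differs.
	mountPath := expandHome("~/bb_clientd", home)
	listenPath := expandHome("~/.cache/bb_clientd/grpc", home)
	for _, line := range bazelrcLines(mountPath, listenPath) {
		fmt.Println(line)
	}
}
```

For example, go run gen_bazelrc.go >> .bazelrc would append the two lines to your project's .bazelrc (gen_bazelrc.go is just an example file name).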

Future work

There is still room for improvement. Please try it out and leave your feedback in the Bazel repo. We want to address any issues and stabilize the feature in Bazel 8.
