Bazel sync and resolved files

By Klaus Aehlig on 09 July 2018

When building against external dependencies, it is often desirable to closely follow the upstream development of those projects. On the other hand, reproducible builds can only be achieved if all dependencies are pinned to specific versions. So updating the pinned versions becomes a frequent task. We recently added (to Bazel at HEAD) a couple of changes that make this task easier. While we plan to further improve the workflow of pinning and updating versions of external dependencies, we encourage everybody to try out the steps below and provide feedback.

Return values of repository rules

Say you're following several git repositories, including protobuf. Then a WORKSPACE entry like

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
  name = "com_google_protobuf",
  remote = "https://github.com/google/protobuf",
  branch = "master",
)

will follow the master branch. When the rule is actually executed, it no longer returns None, because it has important information to report: the commit that was actually checked out. More precisely, it returns a dict of arguments that can be used to obtain the same checkout, even after master has moved ahead.

{
    "name": "com_google_protobuf",
    "remote": "https://github.com/google/protobuf",
    "commit": "78ba021b846e060d5b8f3424259d30a1f3ae4eef",
    "shallow_since": "2018-02-07",
    ...
}

In particular, the branch argument is replaced by the corresponding commit argument. A shallow_since argument is added as well, so that this commit can be fetched with a shallow clone.
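The rewrite of the arguments can be pictured as a small function from the original call to the returned dict. Skylark is a Python dialect, so the following Python sketch is a convenient model. It is hypothetical and is not the rule's actual implementation. The name reproducible_attrs is ours, and the sketch ignores the remaining attributes that the "..." in the dict above elides.

```python
# Hypothetical model (not Bazel's implementation) of how the dict returned by
# git_repository relates to the attributes it was called with.
def reproducible_attrs(original, commit, shallow_since):
    """Return attributes that reproduce a checkout independently of a branch."""
    attrs = dict(original)                  # do not mutate the caller's dict
    attrs.pop("branch", None)               # a moving branch is not reproducible
    attrs["commit"] = commit                # pin the commit actually checked out
    attrs["shallow_since"] = shallow_since  # allow a shallow clone of that commit
    return attrs


original = {
    "name": "com_google_protobuf",
    "remote": "https://github.com/google/protobuf",
    "branch": "master",
}
pinned = reproducible_attrs(
    original, "78ba021b846e060d5b8f3424259d30a1f3ae4eef", "2018-02-07")
print(sorted(pinned))
# ['commit', 'name', 'remote', 'shallow_since']
```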

The new flag `--experimental_repository_resolved_file`

To collect the values returned by the repository rules, we added a new option, --experimental_repository_resolved_file. If it is provided, Bazel records in the specified file all the repository rules that were actually executed, together with their arguments and return values. The file is valid Skylark, so it can later be loaded from build specifications; to do so, check it into the version control system of your project.

resolved = [
    ...,
    {
        "original_rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
        "original_attributes": {
            "name": "com_google_protobuf",
            "remote": "https://github.com/google/protobuf",
            "branch": "master"
        },
        "repositories": [
            {
                "rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
                "attributes": {
                    "remote": "https://github.com/google/protobuf",
                    "commit": "78ba021b846e060d5b8f3424259d30a1f3ae4eef",
                    "shallow_since": "2018-02-07",
                    "init_submodules": False,
                    "verbose": False,
                    "strip_prefix": "",
                    "patches": [],
                    "patch_tool": "patch",
                    "patch_args": [
                        "-p0"
                    ],
                    "patch_cmds": [],
                    "name": "com_google_protobuf"
                }
            }
        ]
    }
]

As you can see, we record two things separately. The first is the rule that was originally called, with the arguments it was called with. The second is the rule that is to be called instead, together with the attributes that reproduce the same checkout. In this case the new rule happens to be the same as the original one. Its attributes differ, though: branch has been replaced by commit and shallow_since, and default values have been filled in. The new rules are wrapped in a list so that the format is ready for an extension we plan to add in the future: rules that expand to several repositories. Such rules do not exist yet, and Bazel is not yet ready to support them. What we have in mind are rules that handle the interaction with some package manager and then expand to the list of packages that need to be fetched (perhaps simply as http_archives), including the packages they transitively depend on.
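To inspect such a file outside of Bazel, for example in a script, it can be parsed and walked entry by entry. The Python sketch below assumes that the file consists only of literal values (strings, lists, dicts, True/False), as in the excerpt above. Under that assumption, ast.literal_eval can evaluate it without executing any code. The helper names load_resolved and iter_repositories are hypothetical, and SAMPLE is an abbreviated copy of the entry shown above.

```python
import ast


def load_resolved(text):
    """Return the value assigned to `resolved` in a resolved file.

    Assumes the file contains only literals; ast.literal_eval rejects anything else.
    """
    module = ast.parse(text)
    for stmt in module.body:
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and stmt.targets[0].id == "resolved"):
            return ast.literal_eval(stmt.value)
    raise ValueError("no 'resolved = ...' assignment found")


def iter_repositories(resolved):
    """Yield (name, rule_class, attributes) for every recorded repository."""
    for entry in resolved:
        # Each original rule call maps to a *list* of repositories; today the
        # list has one element, but the format leaves room for several.
        for repo in entry["repositories"]:
            attrs = repo["attributes"]
            yield attrs["name"], repo["rule_class"], attrs


# Abbreviated version of the entry shown above.
SAMPLE = '''
resolved = [
    {
        "original_rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
        "original_attributes": {"name": "com_google_protobuf", "branch": "master"},
        "repositories": [
            {
                "rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
                "attributes": {
                    "name": "com_google_protobuf",
                    "commit": "78ba021b846e060d5b8f3424259d30a1f3ae4eef",
                    "verbose": False,
                },
            },
        ],
    },
]
'''

for name, rule_class, attrs in iter_repositories(load_resolved(SAMPLE)):
    print(name, attrs["commit"])
# com_google_protobuf 78ba021b846e060d5b8f3424259d30a1f3ae4eef
```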

Another use case for the resolved file is continuous integration. A CI system could always follow the development branch of your project and store the resolved file of each run. That preserves enough information to bisect later and find the breaking change. This approach is particularly interesting when testing the integration of several independently developed services.
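When such a CI run breaks, comparing the stored resolved file of the last good run with that of the first bad run narrows down which dependencies moved. The Python sketch below works on resolved lists that have already been loaded, for instance by parsing the stored files. The helper names and the com_example_lib entry are hypothetical.

```python
def pinned_commits(resolved):
    """Map repository name -> pinned commit, for entries that record a commit."""
    commits = {}
    for entry in resolved:
        for repo in entry["repositories"]:
            attrs = repo["attributes"]
            if "commit" in attrs:
                commits[attrs["name"]] = attrs["commit"]
    return commits


def changed_repositories(old_resolved, new_resolved):
    """Return {name: (old_commit, new_commit)} for every repository that moved.

    A repository missing from one snapshot shows up with None on that side.
    """
    old = pinned_commits(old_resolved)
    new = pinned_commits(new_resolved)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


def snapshot(**commits):
    # Builds a minimal, hypothetical resolved list holding only name and commit.
    return [
        {"repositories": [{"attributes": {"name": name, "commit": commit}}]}
        for name, commit in commits.items()
    ]


last_good = snapshot(
    com_google_protobuf="78ba021b846e060d5b8f3424259d30a1f3ae4eef",
    com_example_lib="0123abc",  # made-up repository and commit
)
first_bad = snapshot(
    com_google_protobuf="79700b56b99fa5c8c22ddef78e6c9557ff711379",
    com_example_lib="0123abc",
)
print(changed_repositories(last_good, first_bad))
# {'com_google_protobuf': ('78ba021b846e060d5b8f3424259d30a1f3ae4eef', '79700b56b99fa5c8c22ddef78e6c9557ff711379')}
```

Only repositories whose pinned commit differs are reported, so unchanged dependencies drop out of the bisection immediately.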

The new command `bazel sync`

The next building block is the newly added bazel sync command. It unconditionally executes all repository rules in the WORKSPACE file, as if every repository were out of date. So bazel sync --experimental_repository_resolved_file=resolved.bzl generates a snapshot of all the repositories mentioned in the WORKSPACE.

Using version snapshots

Now that we know how to generate a file with the frozen commit identifiers, the last step is to actually read and use them. As the file is Skylark, this is fortunately not too hard: we can simply import it with a load statement.

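# frozen_repos.bzl: re-declare every git_repository recorded in resolved.bzl,
# using the pinned attributes (commit, shallow_since, ...) stored there.
load("//:resolved.bzl", "resolved")
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

def frozen_repos():
    for entry in resolved:
        # Each entry lists the repositories its original rule call expanded to.
        for repo in entry["repositories"]:
            # Only git_repository entries are handled; other rule classes are skipped.
            if repo["rule_class"] == "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository":
                git_repository(**(repo["attributes"]))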
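This function knows only about git_repository and silently ignores all other rule classes. One way to make the dispatch explicit is a table from rule-class label to rule, which also reports the repositories it could not re-declare. Here is a Python model of that idea. It records calls through a stand-in function instead of declaring real repositories, and the //:custom.bzl%custom_rule label is hypothetical. In Skylark, functions are values too, so the same table should be able to map the label to the loaded git_repository rule.

```python
GIT_REPOSITORY = "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository"


def frozen_repos(resolved, rules_by_class):
    """Re-declare every recorded repository whose rule class is in the table.

    Returns the names of repositories that were skipped because their rule
    class has no entry in rules_by_class.
    """
    skipped = []
    for entry in resolved:
        for repo in entry["repositories"]:
            rule = rules_by_class.get(repo["rule_class"])
            if rule is None:
                skipped.append(repo["attributes"]["name"])
                continue
            rule(**repo["attributes"])
    return skipped


declared = []


def fake_git_repository(**kwargs):
    # Stand-in for git_repository: records the call instead of fetching anything.
    declared.append(kwargs)


resolved = [
    {"repositories": [{
        "rule_class": GIT_REPOSITORY,
        "attributes": {"name": "com_google_protobuf",
                       "commit": "78ba021b846e060d5b8f3424259d30a1f3ae4eef"},
    }]},
    {"repositories": [{
        "rule_class": "//:custom.bzl%custom_rule",  # hypothetical rule class
        "attributes": {"name": "some_custom_repo"},
    }]},
]

skipped = frozen_repos(resolved, {GIT_REPOSITORY: fake_git_repository})
print([d["name"] for d in declared], skipped)
# ['com_google_protobuf'] ['some_custom_repo']
```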

Since a repository may or may not have a recorded version snapshot, we use a helper function, usually called maybe, that adds an external repository only if no repository of that name is present yet.

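# maybe.bzl: declare a repository only if no rule of that name exists yet,
# so that an earlier (pinned) declaration takes precedence.
def maybe(repo_rule, **kwargs):
  if kwargs["name"] not in native.existing_rules():
    repo_rule(**kwargs)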
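Because maybe only declares a repository when no rule with that name exists yet, the order of declarations decides which version wins: anything declared earlier takes precedence. The following toy Python model makes that concrete. A plain dict stands in for native.existing_rules(), and a recording function stands in for git_repository. The repository com_example_lib and its URL are made up.

```python
def make_workspace():
    declared = {}  # name -> attributes; stands in for native.existing_rules()

    def repo_rule(**kwargs):
        # Stand-in for a repository rule: just records the declaration.
        declared[kwargs["name"]] = kwargs

    def maybe(rule, **kwargs):
        # Same logic as the Skylark maybe(): only add names not yet present.
        if kwargs["name"] not in declared:
            rule(**kwargs)

    return declared, repo_rule, maybe


declared, git_repository, maybe = make_workspace()

# 1. The pinned version is declared first (as frozen_repos() would do).
git_repository(name="com_google_protobuf",
               remote="https://github.com/google/protobuf",
               commit="78ba021b846e060d5b8f3424259d30a1f3ae4eef")

# 2. The branch-following fallback is skipped for that name...
maybe(git_repository, name="com_google_protobuf",
      remote="https://github.com/google/protobuf", branch="master")

# ...but still applies to a repository without a pinned entry (hypothetical).
maybe(git_repository, name="com_example_lib",
      remote="https://example.com/lib.git", branch="master")

print(declared["com_google_protobuf"].get("branch"),
      declared["com_example_lib"]["branch"])
# None master
```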

In the WORKSPACE file, we just put those ingredients together.

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
load("//:maybe.bzl", "maybe")
load("//:frozen_repos.bzl", "frozen_repos")

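# Declare all repositories pinned in resolved.bzl first.
frozen_repos()

# Follow the master branch only if com_google_protobuf is not pinned.
maybe(git_repository,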
  name = "com_google_protobuf",
  remote = "https://github.com/google/protobuf",
  branch = "master",
)

...

So if resolved.bzl contains a pinned version of a repository (identified by its name), that version is used. Otherwise, the top-level specification of which branch to follow is used. To advance to a newer snapshot, remove the repositories that should be freshly synced from resolved.bzl and run bazel sync --experimental_repository_resolved_file=resolved.bzl again. In particular, moving all repositories to a new snapshot is as simple as echo 'resolved = []' > resolved.bzl; bazel sync --experimental_repository_resolved_file=resolved.bzl. As resolved.bzl is pretty-printed, the diff is meaningful; it could, for example, look as follows.

diff --git a/resolved.bzl b/resolved.bzl
index 683dd1f..8f25dfa 100644
--- a/resolved.bzl
+++ b/resolved.bzl
@@ -55,8 +55,8 @@ resolved = [
                 "rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
                 "attributes": {
                     "remote": "https://github.com/google/protobuf",
-                    "commit": "78ba021b846e060d5b8f3424259d30a1f3ae4eef",
-                    "shallow_since": "2018-02-07",
+                    "commit": "79700b56b99fa5c8c22ddef78e6c9557ff711379",
+                    "shallow_since": "2018-03-07",
                     "init_submodules": False,
                     "verbose": False,
                     "strip_prefix": "",

After reviewing and testing, that updated resolved.bzl file can be committed, so that everyone can work with the new snapshot in a reproducible way.
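The step of removing the repositories that should be freshly synced can also be scripted. The following Python sketch uses hypothetical helper names. It drops every entry that produced one of the given repositories and writes the remaining list back using repr(). This assumes the recorded attributes are plain strings, lists, dicts and booleans, whose Python literal form Skylark also accepts. The written file is a single unformatted line, but the next bazel sync with --experimental_repository_resolved_file rewrites resolved.bzl pretty-printed.

```python
def unpin(resolved, names):
    """Return the entries of `resolved` that produced none of `names`."""
    names = set(names)
    return [
        entry
        for entry in resolved
        if not any(repo["attributes"]["name"] in names
                   for repo in entry["repositories"])
    ]


def render(resolved):
    # For plain strings, lists, dicts and booleans, repr() yields literals
    # that Skylark accepts as well, so frozen_repos() can load the result.
    return "resolved = " + repr(resolved) + "\n"


GIT = "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository"
pinned = [
    {"repositories": [{"rule_class": GIT,
                       "attributes": {"name": "com_google_protobuf",
                                      "commit": "78ba021b846e060d5b8f3424259d30a1f3ae4eef"}}]},
    {"repositories": [{"rule_class": GIT,
                       "attributes": {"name": "com_example_lib",  # hypothetical
                                      "commit": "0123abc"}}]},
]

# Unpin protobuf only; com_example_lib stays at its recorded commit.
print(render(unpin(pinned, ["com_google_protobuf"])))
```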
