Bazel Blog

First-class resolved files

In an earlier blog post, we described how a resolved file can be used to freeze external dependencies.

  • Repository rules may indicate how their arguments have to be changed to produce identical output, for example by replacing a branch to follow with a fixed commit. The Starlark version of the git_repository rule already does that, and other rules will follow suit soon.

  • With bazel sync there is a command to unconditionally fetch all external repositories, and with the --experimental_repository_resolved_file option all the reproducible descriptions can be collected in a Starlark value that is written to a file.

In this post, we describe some recently added features. These changes have been committed to the HEAD revision of Bazel and will be part of the 0.19 release.

Directly reading a resolved file instead of the WORKSPACE file

The resolved file can be used as a proper substitute for the WORKSPACE file. The option to enable this is --experimental_resolved_file_instead_of_workspace. If specified, the WORKSPACE file will be completely ignored, and all information about external repositories will be taken from the specified file.

  • In this way, the WORKSPACE file gets back its natural shape of describing which upstream repositories a build follows. It no longer needs to be aware of the use of a resolved file. Thus, this approach can be used for existing projects following floating branches without changing the project in question. For example, if following protobuf, the WORKSPACE file simply reads as follows.
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
  name = "bazel_skylib",
  remote = "https://github.com/bazelbuild/bazel-skylib",
  branch = "master",
)

git_repository(
  name = "com_google_protobuf",
  remote = "https://github.com/google/protobuf",
  branch = "master",
)
  • Getting a snapshot of the upstream repositories followed is still a simple bazel sync --experimental_repository_resolved_file=resolved.bzl, optionally followed by committing the newly obtained resolved.bzl after testing.

  • bazel build --experimental_resolved_file_instead_of_workspace=resolved.bzl ... will take all information about external repositories from the file resolved.bzl. Thus, the build is fixed to the snapshot taken by the bazel sync.

Verifying the output of a repository rule

The main purpose of freezing dependencies is to be able to replay a particular build later, and also on a different machine. While git is very good at producing the same directory when the same commit hash is specified, the programmatic transformations applied afterwards (such as patches and patch commands) may cause observable differences between multiple invocations. For example, a build differing on two machines might be due to the tools (such as patch, sed, find) being only mostly the same on each machine.

To detect such problems, we've added a new entry output_tree_hash to the dict describing a repository. For example, the entry for com_google_protobuf in the resolved file now looks as follows.

resolved = [
    ...
    {
        "original_rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
        "original_attributes": {
            "name": "com_google_protobuf",
            "remote": "https://github.com/google/protobuf",
            "branch": "master"
        },
        "repositories": [
            {
                "rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
                "output_tree_hash": "a776ce4f591327c6b23d88d367d6208a88af6ad889e08f7b86a0edfc76fcfd96",
                "attributes": {
                    "remote": "https://github.com/google/protobuf",
                    "commit": "a6e1cc7e328c45a0cb9856c530c8f6cd23314163",
                    "shallow_since": "2018-09-17",
                    "init_submodules": False,
                    "verbose": False,
                    "strip_prefix": "",
                    "patches": [],
                    "patch_tool": "patch",
                    "patch_args": [
                        "-p0"
                    ],
                    "patch_cmds": [],
                    "name": "com_google_protobuf"
                }
            }
        ]
    }

]

This new output_tree_hash entry is a hash of the directory generated by the repository rule. It includes the names, contents, and executability bit of all files. However, information that is likely to be different between various users and won't affect most builds (like owner of the files or the modification time) is ignored. Additionally, for symlinks to files outside the repository, the content of the file is hashed, rather than the link path itself; so the link generated by a build_file argument is not a problem.
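To make the idea concrete, here is a minimal Python sketch of a tree hash with the properties listed above. It is not Bazel's actual algorithm, and its digests will not match the output_tree_hash values Bazel writes; the function name tree_hash and the record layout are assumptions made purely for illustration. The sketch also skips empty directories and does not handle dangling symlinks.

```python
import hashlib
import os
import stat


def tree_hash(root):
    """Illustrative hash over names, contents, and executable bits under root.

    Hypothetical helper: NOT the format Bazel uses for output_tree_hash.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep traversal order deterministic
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append((os.path.relpath(path, root), path))

    digest = hashlib.sha256()
    for relpath, path in sorted(entries):
        # os.stat() and open() follow symlinks, so a link to a file outside
        # the tree contributes the target's content, not the link path.
        mode = os.stat(path).st_mode
        executable = bool(mode & stat.S_IXUSR)
        with open(path, "rb") as f:
            content_digest = hashlib.sha256(f.read()).hexdigest()
        # Owner and modification time are deliberately left out of the record.
        record = "%s\0%s\0%s\n" % (
            relpath.replace(os.sep, "/"),
            "x" if executable else "-",
            content_digest,
        )
        digest.update(record.encode("utf-8"))
    return digest.hexdigest()
```

With this sketch, touching a file (changing only its modification time) leaves the digest unchanged, while flipping the executable bit or editing the content changes it.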

The resolved file from which to take the hashes can be specified with the --experimental_repository_hash_file option. Of course, we do not expect reproducible content from every kind of external repository. For example, the cc_autoconf rule is specifically designed to detect the local C++ toolchain, which might well differ from machine to machine. You can therefore use the --experimental_verify_repository_rules option to specify which rule classes should be verified. For example, bazel build --experimental_repository_hash_file=resolved.bzl --experimental_verify_repository_rules=@bazel_tools//tools/build_defs/repo:git.bzl%git_repository ... will verify the hashes of all git repositories, but will not do any verification for repositories generated by other rules.
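The selection by rule class can be pictured with a small Python sketch that reads a resolved file and collects the expected hashes for the chosen rule classes only. It assumes the file contains a single top-level resolved = [...] assignment made of plain literals, as in the example above; real resolved files may contain constructs this parser rejects. The helper names load_resolved and expected_hashes, and the second sample entry with its rule class and hash, are hypothetical.

```python
import ast

GIT_RULE = "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository"

# Trimmed sample; the second entry is made up for illustration.
SAMPLE_RESOLVED = '''
resolved = [
    {
        "original_rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
        "original_attributes": {"name": "com_google_protobuf", "branch": "master"},
        "repositories": [
            {
                "rule_class": "@bazel_tools//tools/build_defs/repo:git.bzl%git_repository",
                "output_tree_hash": "a776ce4f591327c6b23d88d367d6208a88af6ad889e08f7b86a0edfc76fcfd96",
                "attributes": {
                    "name": "com_google_protobuf",
                    "commit": "a6e1cc7e328c45a0cb9856c530c8f6cd23314163",
                    "verbose": False,
                },
            }
        ],
    },
    {
        "original_rule_class": "//hypothetical:local.bzl%local_toolchain",
        "original_attributes": {"name": "local_toolchain"},
        "repositories": [
            {
                "rule_class": "//hypothetical:local.bzl%local_toolchain",
                "output_tree_hash": "placeholder-hash-not-a-real-value",
                "attributes": {"name": "local_toolchain"},
            }
        ],
    },
]
'''


def load_resolved(text):
    """Return the literal list assigned to `resolved`, without executing the file."""
    for stmt in ast.parse(text).body:
        if isinstance(stmt, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "resolved" for t in stmt.targets
        ):
            return ast.literal_eval(stmt.value)
    raise ValueError("no top-level 'resolved' assignment found")


def expected_hashes(resolved, verified_rule_classes):
    """Map repository name to expected hash, for the given rule classes only."""
    hashes = {}
    for entry in resolved:
        for repo in entry.get("repositories", []):
            if repo.get("rule_class") in verified_rule_classes and "output_tree_hash" in repo:
                hashes[repo["attributes"]["name"]] = repo["output_tree_hash"]
    return hashes


to_verify = expected_hashes(load_resolved(SAMPLE_RESOLVED), {GIT_RULE})
```

Here to_verify contains an entry for com_google_protobuf only; the hypothetical local_toolchain repository is skipped because its rule class was not selected.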

If a mismatch is found, for example, because you added the not-so-hermetic patch_cmds = ["date +%s > .timestamp"] to the rule for com_google_protobuf, you will get an error like the following.

$ bazel-exp build @com_google_protobuf//:protobuf
Starting local Bazel server and connecting to it...
INFO: Repository rule 'com_google_protobuf' returned: {"remote": "https://github.com/google/protobuf", "commit": "a6e1cc7e328c45a0cb9856c530c8f6cd23314163", "shallow_since": "2018-09-17", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": ["date +%s > .timestamp"], "name": "com_google_protobuf"}
ERROR: Skipping '@com_google_protobuf//:protobuf': no such package '@com_google_protobuf//': git_repository rule //external:com_google_protobuf failed to create a directory with expected hash 416a412dbbb1fa4f822374844dffedeb0b582fda6ffda95afb7936fb2f378ca0
WARNING: Target pattern parsing failed.
ERROR: no such package '@com_google_protobuf//': git_repository rule //external:com_google_protobuf failed to create a directory with expected hash 416a412dbbb1fa4f822374844dffedeb0b582fda6ffda95afb7936fb2f378ca0
INFO: Elapsed time: 4.606s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
$

This way, you can be sure to build against the same source tree of your external dependencies that you (or someone else) used when generating the resolved.bzl file.

.bazelrc to specify the use of resolved.bzl

As the whole workflow is controlled only by flags, all of this can be set up once in your configuration file. Just add the following to your .bazelrc.

sync --experimental_repository_resolved_file=resolved.bzl
build --experimental_resolved_file_instead_of_workspace=resolved.bzl
build --experimental_repository_hash_file=resolved.bzl
build --experimental_verify_repository_rules=@bazel_tools//tools/build_defs/repo:git.bzl%git_repository

And then all steps are as simple as they could be.

  • To update your snapshot of external dependencies, simply type bazel sync. You might want to commit the updated resolved.bzl once you have tested that the new snapshot works for your project.

  • To build and test with the frozen dependencies, simply call bazel build ... or bazel test ... as usual. Note that you need a resolved.bzl first: either one committed to your repository, or one generated by an earlier bazel sync. You will stay at the fixed snapshot recorded in your resolved.bzl until you update it with another bazel sync.

And, whenever an external git repository is fetched, the hash of the resulting directory (with all the local transformations specified in the patches and patch_cmds arguments already applied) is verified automatically.
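Before committing an updated resolved.bzl, it can help to see which pinned commits moved between the old and the new snapshot. The following Python sketch compares two already-parsed resolved lists that follow the entry layout shown earlier in this post. The helper names pinned_commits and changed_pins are hypothetical, and the "new" commit is an obvious placeholder, not a real protobuf commit.

```python
def pinned_commits(resolved):
    """Map repository name to its pinned commit, for entries that have one."""
    pins = {}
    for entry in resolved:
        for repo in entry.get("repositories", []):
            attrs = repo.get("attributes", {})
            if "name" in attrs and "commit" in attrs:
                pins[attrs["name"]] = attrs["commit"]
    return pins


def changed_pins(old_resolved, new_resolved):
    """Return {name: (old_commit, new_commit)} for every pin that differs."""
    old = pinned_commits(old_resolved)
    new = pinned_commits(new_resolved)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


def make_snapshot(commit):
    # Builds a one-entry resolved list, trimmed to the fields used here.
    return [
        {
            "repositories": [
                {"attributes": {"name": "com_google_protobuf", "commit": commit}}
            ]
        }
    ]


OLD = make_snapshot("a6e1cc7e328c45a0cb9856c530c8f6cd23314163")
NEW = make_snapshot("0" * 40)  # placeholder commit, for illustration only

moved = changed_pins(OLD, NEW)
```

In this example, moved has a single key, com_google_protobuf, mapped to the old and the placeholder commit; comparing a snapshot with itself yields an empty dict.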

Your feedback needed

Resolved files can now be used to fix external dependencies. But does the way it is implemented now fit your needs? We don't know, and that's why the feature is marked as experimental. Please help us make fixing dependencies suit your needs by sending feedback to our discussion mailing list.