Remote repositories are the way to use dependencies from "outside" of the Bazel world in Bazel. Using them, you can download binaries from the internet or use some from your own host. You can even use Skylark to define your own repository rules to depend on a custom package manager or to implement auto-configuration rules.
This post explains when Skylark repositories are invalidated and hence when they are executed.
Dependencies
The implementation attribute of the repository_rule defines a function (the fetch operation) that is executed inside a Skyframe function. This function is executed whenever one of its dependencies changes.
For repositories that are declared local (set local = True in the call to the repository_rule function), the fetch operation is performed on every call of the Skyframe function.
Since a lot of dependencies can trigger this execution (for instance, a change to any part of the WORKSPACE file), a supplemental mechanism ensures that the fetch operation is re-executed only when strictly needed for non-local repository rules (see the design doc for more details).
After cr.bazel.build/8218 is released, Bazel will re-perform the fetch operation if and only if any of the following dependencies changes:
- Skylark files needed to define the repository rule.
- Declaration of the repository rule in the WORKSPACE file.
- Value of any environment variable declared with the environ attribute of the repository_rule function. The value of those environment variables can be enforced from the command line with the --action_env flag (but this flag will invalidate every action of the build).
- Content of any file used and referred to using a label (e.g., //mypkg:label.txt, not mypkg/label.txt).
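As an illustration of the label case, here is a minimal sketch of a rule that refers to a file through a label attribute. The file name mytool.bzl, the rule name mytool_repository, and the config attribute are hypothetical, and the exact set of repository_ctx calls that register a file as a dependency may differ between Bazel versions.

```python
# mytool.bzl (hypothetical file name)
def _mytool_impl(repository_ctx):
    # The template is passed as a label, so the content of the labeled
    # file is expected to count as a dependency of the fetch operation.
    repository_ctx.template(
        "config.txt",
        repository_ctx.attr.config,
        substitutions = {},
    )
    repository_ctx.file("BUILD", 'exports_files(["config.txt"])\n')

mytool_repository = repository_rule(
    implementation = _mytool_impl,
    attrs = {"config": attr.label(allow_single_file = True)},
)

# WORKSPACE
load("//:mytool.bzl", "mytool_repository")

mytool_repository(
    name = "mytool",
    # A label, not the plain path "mypkg/label.txt".
    config = "//mypkg:label.txt",
)
```

Under these assumptions, editing mytool.bzl, changing the mytool_repository(...) call in the WORKSPACE file, or modifying mypkg/label.txt should each cause a refetch. The label only resolves if mypkg is a package, that is, if it contains a BUILD file.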
Good practices regarding refetching
Declare your repository as local very carefully
First and foremost, declare a repository as local only for rules that need to be eagerly invalidated and are fast to update. Among native rules, this is used only for local_repository and new_local_repository.
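A rule that fits this description might look like the following sketch. The rule name host_info and the generated file names are hypothetical, and the uname command assumes a Unix-like host.

```python
def _host_info_impl(repository_ctx):
    # Cheap fetch operation: a single quick command, so re-running it
    # on every evaluation costs little.
    result = repository_ctx.execute(["uname", "-m"])
    repository_ctx.file("BUILD", 'exports_files(["arch.txt"])\n')
    repository_ctx.file("arch.txt", result.stdout)

host_info = repository_rule(
    implementation = _host_info_impl,
    # Eager invalidation: the fetch operation is not skipped even when
    # none of the tracked dependencies changed.
    local = True,
)
```

A rule that downloads or compiles something would be a poor candidate for local = True, since that work would be repeated on every call.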
Put all slow operations at the end, resolve dependencies first
A dependency might not be resolved yet when the function asks for it. In that case, the function is executed up to the point where the dependency is requested, and that whole part is replayed once the dependency has been resolved. Put file dependencies at the top; for instance, prefer
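def _impl(repository_ctx):
    # Request the label dependency first; this step is cheap to replay.
    repository_ctx.file("BUILD", repository_ctx.attr.build_file)
    # Slow operation last, so that it is not replayed.
    repository_ctx.download("BIGFILE", sha256 = "...")

myrepo = repository_rule(_impl, attrs = {"build_file": attr.label()})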
over
def _impl(repository_ctx):
    # The slow download runs before the label dependency is requested.
    repository_ctx.download("BIGFILE")
    repository_ctx.file("BUILD", repository_ctx.attr.build_file)

myrepo = repository_rule(_impl, attrs = {"build_file": attr.label()})
(In the latter example, the download operation is re-executed if build_file is not resolved when the fetch operation runs.)
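When a rule has several label attributes, one way to apply this advice is to convert all of them to paths at the very top of the implementation function. The sketch below assumes two hypothetical label attributes (build_file and config_file) and a placeholder URL. It uses repository_ctx.path and repository_ctx.symlink rather than the calls shown above, so treat it as one possible arrangement rather than the canonical one.

```python
def _impl(repository_ctx):
    # Resolve every label up front. If one of them is not available yet,
    # the function restarts from here, before any expensive work is done.
    build_file = repository_ctx.path(repository_ctx.attr.build_file)
    config_file = repository_ctx.path(repository_ctx.attr.config_file)

    # Slow step, reached only once all label dependencies are resolved.
    # Hypothetical URL; real rules should also pass a sha256.
    repository_ctx.download(
        url = "https://example.com/big.tar.gz",
        output = "big.tar.gz",
    )

    repository_ctx.symlink(build_file, "BUILD")
    repository_ctx.symlink(config_file, "config.txt")

myrepo = repository_rule(
    implementation = _impl,
    attrs = {
        "build_file": attr.label(),
        "config_file": attr.label(),
    },
)
```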
Declare your environment variables
To avoid spurious refetches of repository rules (and because it is impossible to track every use of environment variables), only environment variables declared through the environ attribute of the repository_rule function invalidate the repositories.
Therefore, if your rule should re-run when an environment variable changes (as for auto-configuration rules), declare that variable as a dependency, or your users will have to run bazel clean --expunge each time they change their environment.
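For instance, a small auto-configuration rule that depends on the CC variable could be declared as in this sketch. The rule name cc_autoconf, the "gcc" fallback, and the generated file names are hypothetical.

```python
def _cc_autoconf_impl(repository_ctx):
    # Read the variable that is declared in environ below.
    cc = repository_ctx.os.environ.get("CC", "gcc")
    repository_ctx.file("BUILD", 'exports_files(["cc_path.txt"])\n')
    repository_ctx.file("cc_path.txt", cc + "\n")

cc_autoconf = repository_rule(
    implementation = _cc_autoconf_impl,
    # Declaring CC here means that a change to its value invalidates
    # the repository; undeclared variables do not.
    environ = ["CC"],
)
```

With this declaration, changing CC in the shell, or forcing it with something like --action_env=CC=clang, should cause the repository to be refetched without a bazel clean --expunge.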