Introducing Bazel Aquery

By Joe Le-Ba on 15 February 2019

tl;dr: bazel aquery is a new bazel command that queries the action graph, and thus allows you to gain insights about the actions executed in a build (inputs, outputs, command line, …). aquery’s API is now stable and supported by the Bazel team.

Why `bazel aquery`?

When performing a bazel build, you may find yourself wondering about the under-the-hood details of the build process, particularly about the actions executed:

"What was the exact command line that produced this file?"

"Did the new change in my rule implementation affect the actions previously generated by the rule?"

"Which actions have file X as an input?"

Those are some of the questions which can be answered with aquery. The aquery command allows you to query for actions to be executed in your build. It operates on the post-analysis action graph and exposes information about actions, artifacts and their relationships.

An example use of aquery can be found in Bazel issue #6861, where we are migrating legacy CROSSTOOL fields. In this case, Bazel users run a migration tool and then use aquery to verify that the tool worked properly, in particular:

The actions generated while building the same target before & after running the migration tool are the same.
The command lines run for each action are the same.

This check is implemented in the aquery_differ tool, which also serves as an example of how tools can be built on top of aquery.

Background & Motivation

Apart from providing the ability to build & test your projects, Bazel also offers insights into how those processes happen with query and cquery. These existing tools have been very helpful for answering questions about the dependencies of targets in your Bazel project.

The Bazel build process consists of 3 phases¹: loading, analysis and execution. query operates on the post-loading target graph, which makes it unaware of the configurations of these targets. cquery goes one step further down the build process and queries the post-analysis configured target graph, and thus includes the actual configurations.

The topology of the configured target graph closely resembles the dependency graph of targets established by the BUILD files. It offers information on the dependencies between targets in a build, but not on the actual build actions that will be run to execute that build. To gain insights into the exact actions executed in a build, we have to go one level deeper, to the action graph.

Enter aquery.

[Figure: bazel queries and phases]

aquery runs on the configured target graph and queries the action graph. The action graph² is the result of the analysis phase. It is a bipartite graph with the following types of nodes:

Artifacts: source files or output files produced by an action
Actions: functional steps that take a list of artifacts as input and produce a list of artifacts as output

Note that any (output) artifact is produced by exactly one action. The action graph conveys explicit step-by-step instructions on how the build would be executed.

With aquery, it is now possible to tap into that knowledge.
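To make the structure concrete, here is a minimal Python sketch of such a bipartite graph. It is only an illustration, not Bazel's internal representation: the `Action` and `ActionGraph` classes, the file names and the choice of mnemonics are all made up for this example. The one property it enforces is the one noted above, that each output artifact has exactly one producing action.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """A build step: consumes input artifacts, produces output artifacts."""
    mnemonic: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class ActionGraph:
    """Toy bipartite graph; artifacts are plain path strings (an assumption)."""
    actions: list[Action] = field(default_factory=list)
    # Maps each output artifact to the single action that produces it.
    producer: dict[str, Action] = field(default_factory=dict)

    def add(self, action: Action) -> None:
        # Reject a second producer before mutating any state.
        for out in action.outputs:
            if out in self.producer:
                raise ValueError(f"{out} is already produced by another action")
        for out in action.outputs:
            self.producer[out] = action
        self.actions.append(action)

    def is_source(self, artifact: str) -> bool:
        # An artifact without a producing action is treated as a source file.
        return artifact not in self.producer

    def consumers(self, artifact: str) -> list[Action]:
        # All actions that list this artifact among their inputs.
        return [a for a in self.actions if artifact in a.inputs]


graph = ActionGraph()
graph.add(Action("CppCompile", ("main.cc",), ("main.o",)))
graph.add(Action("CppLink", ("main.o",), ("app",)))
```

Walking `producer` backwards from an output answers "what produced this file?", while `consumers` answers "which actions have file X as an input?", the two kinds of question aquery is meant for.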

How To Use `bazel aquery`

aquery is useful when we are interested in the properties of the actions/artifacts in the action graph. It uses the same query language as query and cquery, with some additional aquery-specific functions. The basic structure of aquery output is as follows:

$ bazel aquery '//some:label'
action 'Writing file some_file_name'
  Mnemonic: ...
  Target: ...
  Configuration: ...
  ActionKey: ...
  Inputs: [...]
  Outputs: [...]
...

Each action entry encapsulates all the information you need to know about how this action is to be executed: the actual commands run, the configuration in which the action is run, its input/output artifacts, and other attributes.
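If you want to post-process the text output in a script, a rough parser could look like the sketch below. It assumes the layout shown above (an `action '...'` header followed by indented `Key: value` lines) and only handles single-line fields; the sample values and the `parse_text_output` helper are hypothetical. For real tooling, a machine-readable output format is probably a better fit than scraping text.

```python
import re

# Hypothetical sample in the shape of the text output shown above.
SAMPLE = """\
action 'Writing file some_file_name'
  Mnemonic: FileWrite
  Target: //some:label
  Inputs: []
  Outputs: [bazel-out/some_file_name]

action 'Compiling some/main.cc'
  Mnemonic: CppCompile
  Target: //some:label
  Inputs: [some/main.cc, some/main.h]
  Outputs: [bazel-out/some/main.o]
"""


def parse_text_output(text):
    """Parse aquery-style text into a list of dicts, one per action."""
    actions = []
    current = None
    for line in text.splitlines():
        header = re.match(r"action '(.*)'$", line)
        if header:
            current = {"description": header.group(1)}
            actions.append(current)
        elif current is not None and line.startswith("  ") and ": " in line:
            key, value = line.strip().split(": ", 1)
            # Bracketed values such as "[a, b]" become Python lists.
            if value.startswith("[") and value.endswith("]"):
                value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
            current[key] = value
    return actions


parsed = parse_text_output(SAMPLE)
for action in parsed:
    print(action["Mnemonic"], "->", action["Outputs"])
```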

Another nifty feature of aquery is the ability to filter actions based on their inputs, outputs and mnemonics. This is useful for answering questions like: “Which action, from which target, is responsible for creating file foo.out?”

# List all actions generated while building all dependencies of //src/target_a
$ bazel aquery 'deps(//src/target_a)'

# List all actions generated while building all dependencies of //src/target_a
# that have C++ files in their inputs.
$ bazel aquery 'inputs(".*cc", deps(//src/target_a))'

# Which action generated `foo.out` while building all dependencies of //src/target_a?
$ bazel aquery 'outputs(".*foo.out", deps(//src/target_a))'
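To build some intuition for what these filters do, the following Python sketch mimics them on a hand-written list of actions. It is not how aquery is implemented: the `ACTIONS` data and the `filter_actions` helper are invented, and it assumes, for illustration, that a pattern has to match a whole artifact path (which would explain the leading `.*` in the examples above).

```python
import re

# Hypothetical actions, standing in for the result of deps(//src/target_a).
ACTIONS = [
    {
        "target": "//src/target_a",
        "mnemonic": "CppCompile",
        "inputs": ["src/a.cc", "src/a.h"],
        "outputs": ["bazel-out/src/a.o"],
    },
    {
        "target": "//src/lib_b",
        "mnemonic": "Genrule",
        "inputs": ["src/gen.sh"],
        "outputs": ["bazel-out/src/foo.out"],
    },
]


def filter_actions(actions, field, pattern):
    """Keep actions where any path in `field` fully matches `pattern`."""
    regex = re.compile(pattern)
    return [a for a in actions if any(regex.fullmatch(p) for p in a[field])]


# Roughly: inputs(".*cc", deps(//src/target_a))
with_cc_inputs = filter_actions(ACTIONS, "inputs", r".*cc")

# Roughly: outputs(".*foo.out", deps(//src/target_a))
producing_foo = filter_actions(ACTIONS, "outputs", r".*foo.out")

for action in producing_foo:
    print(action["target"], action["mnemonic"])
```

Because each match carries its target and mnemonic, the result answers both halves of the question: which action, and from which target.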

Apart from these basic features, aquery offers customizations for your specific use cases with its various flags and tools.

Output Formats

aquery supports 3 different output formats: text (default, human-readable with formatting), proto and textproto (a human-readable representation of the proto output).

`--skyframe_state`

A common use case of aquery is to find the action responsible for generating a particular file foo.out. However, it is often the case that multiple build commands for different targets were run prior to the query. Imagine the following sequence:

Run bazel build //target_a
Run bazel build //target_b
File foo.out was generated.

One could run bazel aquery 'outputs("foo.out", //target_a)' and bazel aquery 'outputs("foo.out", //target_b)' to figure out the action responsible for creating foo.out, and in turn the target. However, the number of different targets previously built can be larger than 2, which makes running multiple aquery commands a hassle.

As an alternative, the --skyframe_state flag can be used:

# Find all actions on Skyframe that have "foo.out" as an output
$ bazel aquery --skyframe_state --output=proto 'outputs(".*foo.out")'

In --skyframe_state mode, aquery takes the action graph that Skyframe³ keeps on the current instance of the Bazel server, (optionally) filters it, and outputs the result, without re-running the analysis phase.

Note that with --skyframe_state, the target label is omitted from the query expression. More details on this flag can be found in the aquery documentation.
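The difference between the two approaches is easy to see if you generate the command lines from a script. The Python sketch below only builds the command strings and does not run Bazel; the helper names are hypothetical, and the flags are limited to the ones mentioned in this post.

```python
import shlex

OUTPUT_FORMATS = ("text", "proto", "textproto")


def aquery_command(expression, *, output="text", skyframe_state=False):
    """Build a shell-quoted `bazel aquery` command line (not executed here)."""
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output}")
    argv = ["bazel", "aquery"]
    if skyframe_state:
        argv.append("--skyframe_state")
    if output != "text":
        argv.append(f"--output={output}")
    argv.append(expression)
    return shlex.join(argv)


def per_target_commands(pattern, targets):
    # Without --skyframe_state: one query per previously built target.
    return [aquery_command(f'outputs("{pattern}", {t})') for t in targets]


def skyframe_command(pattern):
    # With --skyframe_state: a single query, and no target label.
    return aquery_command(
        f'outputs("{pattern}")', output="proto", skyframe_state=True
    )


for command in per_target_commands(".*foo.out", ["//target_a", "//target_b"]):
    print(command)
print(skyframe_command(".*foo.out"))
```

The first helper grows with the number of targets you have built; the second stays a single command no matter how many builds came before.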

Comparing Aquery Outputs With `aquery_differ`

There are times when you need to compare two different aquery outputs (for instance, when you change a rule definition and want to verify that the command lines being run are still the same). aquery_differ is the tool for that.

# The tool is available at https://github.com/bazelbuild/bazel/tree/master/tools/aquery_differ
$ bazel run //tools/aquery_differ -- \
--before=/path/to/before.proto \
--after=/path/to/after.proto \
--input_type=proto \
--attrs=cmdline \
--attrs=inputs

The above command returns the difference between the before and after aquery outputs (e.g. which actions were present in one but not the other, which actions have different command line/inputs in each aquery output, ...).
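Conceptually, this kind of comparison boils down to matching up actions from the two outputs and comparing selected attributes. The Python sketch below shows one way to do that on plain dictionaries. It is not the aquery_differ implementation: keying actions by their outputs is an assumption made here, and the `diff_actions` helper and the sample data are invented.

```python
def diff_actions(before, after, attrs=("cmdline", "inputs")):
    """Compare two lists of action dicts, matched by their output files."""

    def index(actions):
        # Assumption: an action is identified by the set of files it outputs.
        return {tuple(sorted(a["outputs"])): a for a in actions}

    old, new = index(before), index(after)
    report = {
        "only_in_before": sorted(set(old) - set(new)),
        "only_in_after": sorted(set(new) - set(old)),
        "changed": {},
    }
    for key in sorted(set(old) & set(new)):
        differing = [attr for attr in attrs if old[key][attr] != new[key][attr]]
        if differing:
            report["changed"][key] = differing
    return report


# Hypothetical data: a.o gains a flag, b.o disappears, c.o is new.
BEFORE = [
    {"outputs": ["a.o"], "cmdline": ["gcc", "-c", "a.cc"], "inputs": ["a.cc"]},
    {"outputs": ["b.o"], "cmdline": ["gcc", "-c", "b.cc"], "inputs": ["b.cc"]},
]
AFTER = [
    {"outputs": ["a.o"], "cmdline": ["gcc", "-O2", "-c", "a.cc"], "inputs": ["a.cc"]},
    {"outputs": ["c.o"], "cmdline": ["gcc", "-c", "c.cc"], "inputs": ["c.cc"]},
]

report = diff_actions(BEFORE, AFTER)
print(report)
```

An empty report (no additions, removals or changes) is the outcome you would hope for after a refactoring that is not supposed to affect the build.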

With this blog post, we declare aquery stable and supported by the Bazel team. Please give it a try and let us know what you think! For more details on aquery, check out the aquery documentation.

1: In the actual implementation of Bazel, we interleave the loading & analysis phases. The “Target Graph” at the end of the loading phase is only materialized with bazel query and not in actual builds.

2: For a more detailed overview of the action graph, check out Jin’s blog post.

3: Skyframe is the evaluation and incrementality model of Bazel. On each instance of Bazel server, Skyframe stores the dependency graph constructed from the previous runs of the analysis phase.
