diff --git a/docs/docs/writing-plugins/the-rules-api/concepts.mdx b/docs/docs/writing-plugins/the-rules-api/concepts.mdx index 3f5eb147349..7fc2604bfff 100644 --- a/docs/docs/writing-plugins/the-rules-api/concepts.mdx +++ b/docs/docs/writing-plugins/the-rules-api/concepts.mdx @@ -9,9 +9,9 @@ The core concepts of the Rules API. ## Rules -Plugin logic is defined in _rules_: [pure functions](https://en.wikipedia.org/wiki/Pure_function) that map a set of statically-declared input types to a statically-declared output type. +Plugin logic is defined in _rules_. A rule is a pure function (or, more precisely, a pure coroutine) that maps a set of statically-declared input types to a statically-declared output type. -Each rule is an `async` Python function annotated with the decorator `@rule`, which takes any number of parameters (including zero) and returns a value of one specific type. Rules must be annotated with [type hints](https://www.python.org/dev/peps/pep-0484/). +Each rule is an `async` Python function annotated with the decorator `@rule`. A rule can take any number of parameters, each of a specific type, and returns a value of a specific type. Rule parameters and return types must be annotated with [type hints](https://www.python.org/dev/peps/pep-0484/). For example, this rule maps `(int) -> str`. @@ -23,176 +23,256 @@ async def int_to_str(i: int) -> str: return str(i) ``` -Although any Python type, including builtin types like `int`, can be a parameter or return type of a rule, in almost all cases rules will deal with values of custom Python classes. +Rules are typically module-level functions. In some cases you can define rules in nested scopes, such as inside a class or function body. But this is useful only in specific, special cases in the Pants codebase, and you are unlikely to need to use this in practice. -Generally, rules correspond to a step in your build process. 
For example, when adding a new linter, you may have a rule that maps `(Target, Shellcheck) -> LintResult`: +Although any immutable Python type, including builtin types like `int`, can be a parameter or a return type of a rule, in almost all cases rules will deal with values of custom Python classes. These are typically implemented as [frozen dataclasses](https://docs.python.org/3/library/dataclasses.html), for reasons we'll get into [below](#dataclasses). + +Generally, a rule corresponds to a step in your build process. For example, when implementing a rule to run [shellcheck](https://www.shellcheck.net/) on a set of shell scripts, you could have a rule that maps `(Target, Shellcheck) -> LintResult`: ```python @rule async def run_shellcheck(target: Target, shellcheck: Shellcheck) -> LintResult: # Your logic. - return LintResult(stdout="", stderr="", exit_code=0) + return LintResult(stdout=..., stderr=..., exit_code=...) ``` -You do not call a rule like you would a normal function. In the above examples, you would not say `int_to_str(26)` or `run_shellcheck(tgt, shellcheck)`. Instead, the Pants engine determines when rules are used and calls the rules for you. +In this example the `target` argument points to the set of files to check, the `shellcheck` argument points to the `shellcheck` binary to run, and the return value contains the result of running `shellcheck` on those files. We will see later how the values of the rule parameters, `target` and `shellcheck` in this example, are provided. -Each rule should be pure; you should not use side effects like `subprocess.run()`, `print()`, or the `requests` library. Instead, the Rules API has its own alternatives that are understood by the Pants engine and which work properly with its caching and parallelism. +Although rules are implemented as Python coroutines, they differ from regular Python async code because their execution is controlled by the Pants engine and not by a standard Python event loop. 
-## The rule graph +The Pants engine provides the following benefits for rule execution: -All the registered rules create a rule graph, with each type as a node and the edges being dependencies used to compute those types. +- The engine analyzes the input and output types and can "fill in the blanks" of any input parameters not explicitly provided. This is why rule signatures must have complete type annotations. +- The engine invokes rules concurrently where possible, to make use of all available local and remote cores. This is why rule params and return values must be immutable. +- The engine applies memoization, so that if a rule has already run with the given params, the engine will supply the output value from the in-memory cache, instead of executing the rule. This is why rules must be [pure](https://en.wikipedia.org/wiki/Pure_function) and why rule params and return values must be hashable. -For example, the `list` goal uses this rule definition and results in the below graph: +This requirement of rule purity is worth emphasizing: a rule must yield the same output for a given set of inputs, and a rule must not directly or indirectly rely on side-effecting code like `print()`, `subprocess.run()`, or `requests`. The Rules API provides alternatives that are understood by the Pants engine and which work properly with its caching and concurrency mechanisms. -```python -@goal_rule -async def list_targets( - console: Console, addresses: Addresses, list_subsystem: ListSubsystem -) -> ListGoal: - ... - return ListGoal(exit_code=0) -``` +## Invoking other rules in a rule body -![](https://files.readme.io/7d5163f-Rule_graph_example-2.png) +One obvious way for a rule to depend on values of given types is to declare input parameters of those types. However it is very common to request extra values in the rule body by explicitly calling other rules. 
This is useful when you want programmatic control over the inputs to those other rules, or when you want to invoke other rules conditionally. -At the top of the graph will always be the goals that Pants runs, such as `list` and `test`. These goals are the entry-point into the graph. When a user runs `pants list`, the engine looks for a special type of rule, called a `@goal_rule`, that implements the respective goal. From there, the `@goal_rule` might request certain types like `Console` and `Addresses`, which will cause other helper `@rule`s to be used. To view the graph for a goal, see: [Visualize the rule graph](./tips-and-debugging.mdx#debugging-visualize-the-rule-graph). +To call a rule explicitly, you `await` it, and pass explicit and/or implicit params to it. The following contrived example shows a couple of rule calls (note that Pants ships with real shellcheck support that is more complicated; this example is simplified for clarity): -The graph also has several "roots", such as `Console`, `Specs`, and `OptionsBootstrapper` in this example. Those roots are injected into the graph as the initial input, whereas all other types are derived from those roots. +```python +from pants.engine.rules import implicitly, rule +from pants.engine.intrinsics import execute_process +from pants.engine.process import ( + ProcessResult, + FallibleProcessResult, + fallible_to_exec_result_or_raise, +) -The engine will find a path through the rules to satisfy the types that you are requesting. In this example, we do not need to explicitly specify `Specs`; we only specify `Addresses` in our rule's parameters, and the engine finds a path from `Specs` to `Addresses` for us. This is similar to [Dependency Injection](https://www.freecodecamp.org/news/a-quick-intro-to-dependency-injection-what-it-is-and-when-to-use-it-7578c84fa88f/), but with a typed and validated graph. +@rule +async def run_shellcheck(target: Target, shellcheck: Shellcheck) -> LintResult: + ... 
+ process_request = Process( + ["/bin/echo", str(target.address)], + description=f"Echo {target.address}", + ) + # Get a process result that allows failure. + fallible_process_result: FallibleProcessResult = await execute_process( + process_request, **implicitly() + ) + # Raise if the process failed, or return its info if it succeeded. + process_result: ProcessResult = await fallible_to_exec_result_or_raise( + fallible_process_result, **implicitly() + ) + return LintResult( + stdout=process_result.stdout, stderr=process_result.stderr, exit_code=0 + ) +``` -If the engine cannot find a path, or if there is ambiguity due to multiple possible paths, the rule graph will fail to compile. This ensures that the rule graph is always unambiguous. +The Pants engine will run your rule as straight-line Python code until it encounters the `await`, which will yield execution back to the engine. The engine will then see if it has a memoized result for the requested rule invocation. If not, it will execute the rule to obtain such a value. Once the engine gives back the resulting output value, control will be returned back to your Python code, until the next `await`. -:::caution Rule graph errors can be confusing -We know that rule graph errors can be intimidating and confusing to understand. We are planning to improve them. In the meantime, please do not hesitate to ask for help in the #plugins channel on [Slack](/community/getting-help). +In this example, we could not have requested the `process_result` as a parameter to our rule because we needed to create the `Process` object dynamically. -Also see [Tips and debugging](./tips-and-debugging.mdx#debugging-rule-graph-issues) for some tips for how to approach these errors. -::: +We will revisit process execution [below](#extra-context-for-implicit-parameters) and cover it in a lot more detail [here](./processes.mdx). -## `await Get` - awaiting results in a rule body +## Explicit vs. 
implicit rule parameters -In addition to requesting types in your rule's parameters, you can request types in the body of your rule. +### Explicit parameters -Add `await Get(OutputType, InputType, input)`, where the output type is what you are requesting and the input is what you're giving the engine for it to be able to compute the output. For example: +In simple cases, you can pass parameters directly to invoked rules: ```python -from pants.engine.rules import Get, rule +from pants.engine.environment import EnvironmentName +from pants.engine.fs import NativeDownloadFile +from pants.engine.intrinsics import download_file, run_id, run_interactive_process_in_environment +from pants.engine.process import InteractiveProcess +from pants.engine.rules import rule +... @rule -async def run_shellcheck(target: Target, shellcheck: Shellcheck) -> LintResult: - ... - process_request = Process( - ["/bin/echo", str(target.address)], - description=f"Echo {target.address}", +async def my_rule() -> MyResult: + # Takes no params. + rid = await run_id() + + # Takes one param. + downloaded_file = await download_file(NativeDownloadFile( + url="https://www.google.com/robots.txt", + expected_digest=FileDigest( + "988d5eecb5b9d346bb0ca87fe76ab029be332997c79c590af858cc0c6dd6d1a4", + 7153, + )) ) - process_result = await Get(ProcessResult, Process, process_request) - return LintResult(stdout=process_result.stdout, stderr=process_result.stderr, exit_code=0) + + # Takes two params. + interactive_process_result = await run_interactive_process_in_environment( + InteractiveProcess(...), + EnvironmentName("local") + ) + ... ``` -Pants will run your rule like normal Python code until encountering the `await`, which will yield execution to the engine. The engine will look in the pre-compiled rule graph to determine how to go from `Process -> ProcessResult`. Once the engine gives back the resulting `ProcessResult` object, control will be returned back to your Python code. 
+:::caution Explicit rule parameters must be passed positionally +Explicit rule parameters must be passed as positional arguments, as in the examples above. We hope +to support keyword arguments in the future. +::: -In this example, we could not have requested the type `ProcessResult` as a parameter to our rule because we needed to dynamically create a `Process` object. +### Implicit parameters -Thanks to `await Get`, we can write a recursive rule to compute a [Fibonacci number](https://en.wikipedia.org/wiki/Fibonacci_number): +In many cases it is very useful to call rules using _implicit_ parameters. These parameters are injected by the Pants engine instead of being provided explicitly by the caller. This is the "fill in the blanks" functionality mentioned earlier, and is part of what makes the Pants engine so powerful. + +To tell the engine to implicitly fill in any unspecified parameters, you use the `**implicitly()` idiom: ```python -@dataclass(frozen=True) -class Fibonacci: - val: int +from pants.engine.rules import implicitly, rule @rule -async def compute_fibonacci(n: int) -> Fibonacci: - if n < 2: - return Fibonacci(n) - x = await Get(Fibonacci, int, n - 2) - y = await Get(Fibonacci, int, n - 1) - return Fibonacci(x.val + y.val) +async def my_rule() -> MyResult: + # The engine implicitly provides the GlobalOptions param. + ll = await log_level(**implicitly()) + + # The user explicitly provides the EnvironmentVarsRequest param. + # The engine implicitly provides the CompleteEnvironmentVars param. + localization_vars = await environment_vars_subset( + EnvironmentVarsRequest(["LANG", "LC_ALL"]), **implicitly() + ) + ... ``` -Another rule could then "call" our Fibonacci rule by using its own `Get`: +Where does Pants get the values for implicit parameters? They can be: +- From external context, such as [option values](/using-pants/key-concepts/options), git state, or the set of targets provided on the Pants command line. 
+- From the input parameters of the calling rule. +- Computed from other params by (transitively) applying suitable rules. You can think of this as a form of dependency injection via type: Pants knows the type of the implicit parameter, and can traverse a path through rule execution to go from an initial set of values, known from context, to the needed value. + +Since explicit params must be provided positionally, they must be the first arguments to the rule. This means that when you write a rule, you should put the parameters expected to be passed explicitly before the parameters expected to be provided implicitly. + +### Extra context for implicit parameters + +As mentioned above, Pants can compute values for implicit parameters by transitively applying rules. In many cases the initial parameters for *those* rules are known from external context. But in some cases we need to provide extra context from the calling rule. To do so, we pass the contextual parameters as arguments to `**implicitly()`: ```python +from pants.engine.process import Process, fallible_to_exec_result_or_raise +from pants.engine.rules import implicitly, rule + @rule -async def call_fibonacci(...) -> Foo: - fib = await Get(Fibonnaci, int, 4) +async def my_rule(target: Target) -> MyResult: + process_result = await fallible_to_exec_result_or_raise( + **implicitly( + Process( + ["/bin/echo", str(target.address)], + description=f"Echo {target.address}", + ) + ) + ) ... ``` -:::note `Get` constructor shorthand -The verbose constructor for a `Get` object takes three parameters: `Get(OutputType, InputType, input)`, where `OutputType` and `InputType` are both types, and `input` is an instance of `InputType`. +In this example the `fallible_to_exec_result_or_raise()` rule takes a `FallibleProcessResult` and returns a `ProcessResult` by first checking the `FallibleProcessResult` for success and raising an exception if it failed. We saw this earlier, in the simplified shellcheck example. 
-Instead, you can use `Get(OutputType, InputType(constructor arguments))`. These two are equivalent: +But instead of explicitly passing a `FallibleProcessResult` as we did earlier, we now pass a `Process` as implicit context. The Pants engine then looks at all the rules it knows about to figure out how to compute a `FallibleProcessResult` from a `Process`. The `execute_process()` we encountered earlier fits the bill, and so the engine calls it on our `Process` and passes its return value into `fallible_to_exec_result_or_raise()`. Whereas earlier we called both rules explicitly, here we get the exact same behavior with just one call. -- `Get(ProcessResult, Process, Process(["/bin/echo"]))` -- `Get(ProcessResult, Process(["/bin/echo"]))` +In fact, since raising an exception on process failure is frequently what you want, we have an alias, `execute_process_or_raise`, to make the code more readable when using this common shorthand idiom. -However, the below is invalid because Pants's AST parser will not be able to see what the `InputType` is: +### Static analysis of parameter types -```python -process = Process(["/bin/echo"]) -Get(ProcessResult, process) -``` +It's important to note that the parameter types, and the corresponding rule matching, are computed _statically_, at engine startup time. Pants employs various static analysis heuristics to capture common cases. E.g., in the example above, the engine knows that the parameter passed to `**implicitly()` is intended to match the formal parameter type `Process` because it recognizes the explicit `Process()` initializer call. -::: +But in some cases the parameter value will have been created earlier, and the engine can't know its type from static analysis. In such cases you must provide the type explicitly, by passing a dict to `**implicitly()` mapping values to the formal parameter types they are intended to match: -:::note Why only one input? -Currently, you can only give a single input. 
It is not possible to do something like `Get(OutputType, InputType1(...), InputType2(...))`. +```python +from pants.engine.process import Process, ProductDescription, execute_process_or_raise +from pants.engine.rules import implicitly, rule -Instead, it's common for rules to create a "Request" data class, such as `PexRequest` or `SourceFilesRequest`. This request centralizes all the data it needs to operate into one data structure, which allows for call sites to say `await Get(SourceFiles, SourceFilesRequest, my_request)`, for example. +@rule +async def my_rule() -> MyResult: + process = Process(...) + ... + process_result = await execute_process_or_raise( + **implicitly({ + process: Process, + ProductDescription("Running echo"): ProductDescription, + }) + ) + ... -See [https://github.com/pantsbuild/pants/issues/7490](https://github.com/pantsbuild/pants/issues/7490) for the tracking issue. -::: +``` -### `MultiGet` for concurrency +As you can see above, this also allows you to pass multiple contextual params to `**implicitly()`. -Every time your rule has the `await` keyword, the engine will pause execution until the result is returned. This means that if you have two `await Get`s, the engine will evaluate them sequentially, rather than concurrently. +## Rule concurrency -You can use `await MultiGet` to instead get multiple results in parallel. +The engine pauses execution on each `await` in your rule until the result is returned. This means that if you have two consecutive `await`s, the engine will evaluate them sequentially. +If your rules can be executed concurrently (because neither depends on the result of the other) then you can use `concurrently(...)` to instead get multiple results in a single `await`: ```python -from pants.engine.rules import Get, MultiGet, rule +from pants.engine.rules import concurrently, implicitly, rule @rule -async def call_fibonacci(...) 
-> Foo: - results = await MultiGet(Get(Fibonnaci, int, n) for n in range(100)) +async def lint_single_target(target: Target) -> LintResult: ... -``` -The result of `MultiGet` is a tuple with each individual result, in the same order as the requests. +@rule +async def lint_all(targets: Targets) -> LintResults: + single_results = await concurrently( + lint_single_target(target, **implicitly()) for target in targets + ) + ... +``` -You should rarely use a `for` loop with `await Get` - use `await MultiGet` instead, as shown above. +The result of `concurrently` is a tuple with each individual result, in the same order as the requests. You should hardly ever call `await` in a loop - use `await concurrently` instead. -`MultiGet` can either take a single iterable of `Get` objects or take multiple individual arguments of `Get` objects. Thanks to this, we can rewrite our Fibonacci rule to parallelize the two recursive calls: +`concurrently` can either take an iterable of rule calls, as above, or take multiple individual rule calls. For example: ```python -from pants.engine.rules import Get, MultiGet, rule +from pants.engine.rules import concurrently, rule @rule -async def compute_fibonacci(n: int) -> Fibonacci: - if n < 2: - return Fibonacci(n) - x, y = await MultiGet( - Get(Fibonacci, int, n - 2), - Get(Fibonacci, int, n - 1), +async def my_rule() -> MyResult: + first_party_deps, third_party_deps = await concurrently( + FirstPartyDepsRequest(...), ThirdPartyDepsRequest(...) ) - return Fibonacci(x.val + y.val) ``` -## Valid types +## Recursive rules -Types used as inputs to `Get`s or `Query`s must be hashable, and therefore should be immutable. Specifically, the type must have implemented `__hash__()` and `__eq__()`. While the engine will not validate that your type is immutable, you should be careful to ensure this so that the cache works properly. 
+A rule can call itself recursively: -Because you should use immutable types, use these collection types: +```python +from dataclasses import dataclass +from pants.engine.rules import rule -- `tuple` instead of `list`. -- `pants.util.frozendict.FrozenDict` instead of the built-in `dict`. -- `pants.util.ordered_set.FrozenOrderedSet` instead of the built-in `set`. This will also preserve the insertion order, which is important for determinism. +@dataclass(frozen=True) +class Fibonacci: + val: int -Unlike Python in general, the engine uses exact type matches, rather than considering inheritance; even if `Truck` subclasses `Vehicle`, the engine will view these types as completely separate when deciding which rules to use. +@rule +async def fibonacci(n: int) -> Fibonacci: + if n < 2: + return Fibonacci(n) + x, y = await concurrently(fibonacci(n - 2), fibonacci(n - 1)) + return Fibonacci(x.val + y.val) +``` + +This is useful in cases such as compiling a JVM source file, which first requires compiling its direct dependencies. + +Rules can even be mutually recursive, that is, there can be circular calls between multiple rules. However in this case the rules must all be top-level functions in the same module. This is due to limitations of the engine's static analysis heuristics. In practice, mutual recursion between functions in different modules would create forbidden Python import cycles anyway, unless you used local imports or other unsavory workarounds. -You cannot use generic Python type hints in a rule's parameters or in a `Get()`. For example, a rule cannot return `Optional[Foo]`, or take as a parameter `Tuple[Foo, ...]`. To express generic type hints, you should instead create a class that stores that value. +## Valid types -To disambiguate between different uses of the same type, you will usually want to "newtype" the types that you use. Rather than using the builtin `str` or `int`, for example, you should define a new, declarative class like `Name` or `Age`. 
+Input params and output values must be hashable, and therefore must be immutable. Specifically, their types must implement `__hash__()` and `__eq__()`. While the engine will not validate that your type is immutable, you should be careful to ensure this so that the cache works properly. ### Dataclasses @@ -202,7 +282,7 @@ Python 3's [dataclasses](https://docs.python.org/3/library/dataclasses.html) wor 2. Dataclasses use type hints. 3. Dataclasses are declarative and ergonomic. -You do not need to use dataclasses. You can use alternatives like `attrs` or normal Python classes. However, dataclasses are a nice default. +You are not required to use dataclasses. You can use alternatives like `attrs` or normal Python classes with manual `__hash__()` and `__eq__()` implementations. However, dataclasses are convenient and idiomatic, and we encourage their use. You should set `@dataclass(frozen=True)` for Python to autogenerate `__hash__()` and to ensure that the type is immutable. @@ -246,6 +326,34 @@ class Example: ::: +### Exact type matching + +Recall that type annotations are used by the engine at runtime to "fill in the blanks" of implicit parameters. This is an unusual use of type hints, which are normally for the benefit of build time type checking by tools such as MyPy. + +Unlike type checkers, the engine uses _exact_ type matches and does not consider subtyping. Even if `Truck` subclasses `Vehicle`, the engine will view these types as completely unrelated when deciding how to fill in implicit parameters. The engine has a different way of expressing polymorphism, namely [unions](union-rules-advanced.mdx). + +### Type disambiguation + +To disambiguate between different uses of the same type, you will usually want to "newtype" the types that you use. 
For example, instead of using the builtin `str` or `int` to represent a name or age you can define new classes that nominally extend them: + +```python +class Name(str): + pass + +class Age(int): + pass +``` + +### Collections + +Fields of input params and output values may be collections, but you must use the following types: + +- `tuple` instead of `list`. +- `pants.util.frozendict.FrozenDict` instead of `dict`. +- `pants.util.ordered_set.FrozenOrderedSet` instead of `set`. + +The type annotations for parameters and return values must be just a type name. For example, a rule cannot return `Foo | None`, or take `tuple[Foo, ...]` as a parameter. + ### `Collection`: a newtype for `tuple` If you want a rule to use a homogenous sequence, you can use `pants.engine.collection.Collection` to "newtype" a tuple. This will behave the same as a tuple, but will have a distinct type. @@ -273,7 +381,7 @@ async def demo(results: LintResults) -> Foo: ### `DeduplicatedCollection`: a newtype for `FrozenOrderedSet` -If you want a rule to use a homogenous set, you can use `pants.engine.collection.DeduplicatedCollection` to "newtype" a `FrozenOrderedSet`. This will behave the same as a `FrozenOrderedSet`, but will have a distinct type. +If you want a rule to use a homogenous set, you can use `pants.engine.collection.DeduplicatedCollection` to newtype a `FrozenOrderedSet`. This will behave the same as a `FrozenOrderedSet`, but will have a distinct type. ```python from pants.engine.collection import DeduplicatedCollection @@ -289,7 +397,7 @@ async def demo(requirements: RequirementStrings) -> Foo: ... ``` -You can optionally set the class property `sort_input`, which will often result in more cache hits with the Pantsd daemon. +Setting the class property `sort_input` to `True` will often result in more cache hits, at the expense of time spent sorting. ## Registering rules in `register.py` @@ -334,3 +442,21 @@ async def run_fotran_test(...) 
-> TestResult: def rules(): return collect_rules() ``` + +## The rule graph + +As we mentioned above, at startup the Pants engine performs static analysis on the registered rules. The resulting analysis is represented as a _rule graph_. This is a directed graph where the nodes represent _queries_, rules or _params_, and the edges represent data dependencies. + +The queries are the roots of the graph - graph traversals always start at a query. When the user runs a Pants command, the engine looks for a special type of rule, annotated with `@goal_rule`, that implements the respective goal. For example, `pants list` triggers the `list` Goal rule, which in turn represents a query into the rule graph. + +The params are the leaves of the graph - they represent initial data that is provided from context, such as option values or command line arguments. All other intermediate types and the final goal type are computed from these params by traversing the graph and executing rules along the way. + +To view the graph for a goal, see: [Visualize the rule graph](./tips-and-debugging.mdx#debugging-visualize-the-rule-graph). + +If the engine cannot find a path, or if there is ambiguity due to multiple possible paths, rule graph construction will fail. + +:::caution Rule graph errors can be confusing +We know that rule graph errors can be intimidating and confusing to understand. We are planning to improve them. In the meantime, please do not hesitate to ask for help on [Slack](/community/getting-help). + +Also see [Tips and debugging](./tips-and-debugging.mdx#debugging-rule-graph-issues) for some tips for how to approach these errors. 
+::: diff --git a/docs/docs/writing-plugins/the-rules-api/file-system.mdx b/docs/docs/writing-plugins/the-rules-api/file-system.mdx index 1097542a9a7..f6790178717 100644 --- a/docs/docs/writing-plugins/the-rules-api/file-system.mdx +++ b/docs/docs/writing-plugins/the-rules-api/file-system.mdx @@ -46,13 +46,14 @@ A `Snapshot` is useful when you want to know which files a `Digest` refers to. F Given a `Digest`, you may use the engine to enrich it into a `Snapshot`: ```python -from pants.engine.fs import Digest, Snapshot -from pants.engine.rules import Get, rule +from pants.engine.fs import Snapshot +from pants.engine.intrinsics import digest_to_snapshot +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - snapshot = await Get(Snapshot, Digest, my_digest) + snapshot: Snapshot = await digest_to_snapshot(my_digest) ``` ## `CreateDigest`: create new files @@ -61,12 +62,15 @@ async def demo(...) -> Foo: ```python from pants.engine.fs import CreateDigest, Digest, FileContent -from pants.engine.rules import Get, rule +from pants.engine.intrinsics import create_digest +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - digest = await Get(Digest, CreateDigest([FileContent("f1.txt", b"hello world")])) + digest: Digest = await create_digest( + CreateDigest([FileContent("f1.txt", b"hello world")]) + ) ``` The `CreateDigest` constructor expects an iterable including any of these types: @@ -83,12 +87,15 @@ This does _not_ write the `Digest` to the build root. Use `Workspace.write_diges ```python from pants.engine.fs import Digest, PathGlobs -from pants.engine.rules import Get, rule +from pants.engine.intrinsics import path_globs_to_digest +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... 
- digest = await Get(Digest, PathGlobs(["**/*.txt", "!ignore_me.txt"])) + digest: Digest = await path_globs_to_digest( + PathGlobs(["**/*.txt", "!ignore_me.txt"]) + ) ``` - All globs must be relative paths, relative to the build root. @@ -120,16 +127,17 @@ PathGlobs( ) ``` -If you only need to resolve the file names—and don't actually need to use the file content—you can use `await Get(Paths, PathGlobs)` instead of `await Get(Digest, PathGlobs)` or `await Get(Snapshot, PathGlobs)`. This will avoid "digesting" the files to the LMDB Store cache as a performance optimization. `Paths` has two properties: `files: tuple[str, ...]` and `dirs: tuple[str, ...]`. +If you only need to resolve the file names—and don't actually need to use the file content—you can use `await path_globs_to_paths()` instead of `await path_globs_to_digest()` or `await digest_to_snapshot(**implicitly(PathGlobs(...)))`. This will avoid "digesting" the files to the LMDB Store cache, as a performance optimization. The returned `Paths` instance has two properties: `files: tuple[str, ...]` and `dirs: tuple[str, ...]`. ```python -from pants.engine.fs import Paths, PathGlobs -from pants.engine.rules import Get, rule +from pants.engine.fs import PathGlobs, Paths +from pants.engine.intrinsics import path_globs_to_paths +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - paths = await Get(Paths, PathGlobs(["**/*.txt", "!ignore_me.txt"])) + paths: Paths = await path_globs_to_paths(PathGlobs(["**/*.txt", "!ignore_me.txt"])) logger.info(paths.files) ``` @@ -139,12 +147,13 @@ async def demo(...) -> Foo: ```python from pants.engine.fs import Digest, DigestContents -from pants.engine.rules import Get, rule +from pants.engine.intrinsics import get_digest_contents +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... 
- digest_contents = await Get(DigestContents, Digest, my_digest) + digest_contents: DigestContents = await get_digest_contents(my_digest) for file_content in digest_contents: logger.info(file_content.path) logger.info(file_content.content) # This will be `bytes`. @@ -155,9 +164,9 @@ The result will be a sequence of `FileContent` objects, which each have a proper :::caution You may not need `DigestContents` Only use `DigestContents` if you need to read and operate on the content of files directly in your rule. -- If you are running a `Process`, you only need to pass the `Digest` as input and that process will be able to read all the files in its environment. If you only need a list of files included in the digest, use `Get(Snapshot, Digest)`. +- If you are running a `Process`, you only need to pass the `Digest` as input and that process will be able to read all the files in its environment. If you only need the list of files included in the digest, use `get_digest_entries()`. -- If you just need to manipulate the directory structure of a `Digest`, such as renaming files, use `DigestEntries` with `CreateDigest` or use `AddPrefix` and `RemovePrefix`. These avoid reading the file content into memory. +- If you only need to manipulate the directory structure of a `Digest`, by renaming files, use `DigestEntries` with `create_digest()` or use `add_prefix()` and `remove_prefix()`. These avoid reading the file content into memory. ::: @@ -172,13 +181,14 @@ Only use `DigestContents` if you need to read and operate on the content of file This is useful if you need to manipulate the directory structure of a `Digest` without actually needing to bring the file contents into memory (which is what occurs if you were to use `DigestContents`). 
```python -from pants.engine.fs import Digest, DigestEntries, Directory, FileEntry -from pants.engine.rules import Get, rule +from pants.engine.fs import DigestEntries, Directory, FileEntry +from pants.engine.intrinsics import get_digest_entries +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - digest_entries = await Get(DigestEntries, Digest, my_digest) + digest_entries: DigestEntries = await get_digest_entries(my_digest) for entry in digest_entries: if isinstance(entry, FileEntry): logger.info(entry.path) @@ -194,20 +204,24 @@ Often, you will need to provide a single `Digest` somewhere in your plugin—suc ```python from pants.engine.fs import Digest, MergeDigests -from pants.engine.rules import Get, rule +from pants.engine.intrinsics import merge_digests +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - digest = await Get( - Digest, - MergeDigests([downloaded_tool_digest, config_file_digest, source_files_snapshot.digest], + digest: Digest = await merge_digests( + MergeDigests([ + downloaded_tool_digest, + config_file_digest, + source_files_snapshot.digest + ]) ) ``` - It is okay if multiple digests include the same file, so long as they have identical content. -- If any digests have different content for the same file, the engine will error. Unlike Git, the engine does not attempt to resolve merge conflicts. -- It is okay if some digests are empty, i.e. `EMPTY_DIGEST`. +- If any digests have different content for the same file, the engine will error. +- It is okay if some digests are empty. The `pants.engine.fs.EMPTY_DIGEST` constant represents an empty digest. ## `DigestSubset`: extract certain files from a `Digest` @@ -215,13 +229,14 @@ To get certain files out of a `Digest`, use `DigestSubset`. 
```python from pants.engine.fs import Digest, DigestSubset, PathGlobs -from pants.engine.rules import Get, rule +from pants.engine.intrinsics import digest_subset_to_digest +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - new_digest = await Get( - Digest, DigestSubset(original_digest, PathGlobs(["file1.txt"]) + new_digest: Digest = await digest_subset_to_digest( + DigestSubset(original_digest, PathGlobs(["file1.txt"])) ) ``` @@ -233,13 +248,18 @@ Use `AddPrefix` and `RemovePrefix` to change the paths of every file in the dige ```python from pants.engine.fs import AddPrefix, Digest, RemovePrefix -from pants.engine.rules import Get, rule +from pants.engine.intrinsics import add_prefix, remove_prefix +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - added_prefix = await Get(Digest, AddPrefix(original_digest, "new_prefix/subdir")) - removed_prefix = await Get(Digest, RemovePrefix(added_prefix, "new_prefix/subdir")) + added_prefix: Digest = await add_prefix( + AddPrefix(original_digest, "new_prefix/subdir") + ) + removed_prefix: Digest = await remove_prefix( + RemovePrefix(added_prefix, "new_prefix/subdir") + ) assert removed_prefix == original_digest ``` @@ -256,7 +276,7 @@ from pants.engine.rules import goal_rule @goal_rule async def run_my_goal(..., workspace: Workspace) -> MyGoal: ... - # Note that this is a normal method; we do not use `await Get`. + # Note that this is a regular synchronous method; we do not use `await`. workspace.write_digest(digest) ``` @@ -277,7 +297,7 @@ for digest in all_digests: Good: ```python -merged_digest = await Get(Digest, MergeDigests(all_digests)) +merged_digest = await merge_digests(MergeDigests(all_digests)) workspace.write_digest(merged_digest) ``` @@ -286,8 +306,9 @@ workspace.write_digest(merged_digest) `DownloadFile` allows you to download an asset using a `GET` request. 
```python -from pants.engine.fs import DownloadFile, FileDigest -from pants.engine.rules import Get, rule +from pants.engine.fs import Digest, DownloadFile, FileDigest +from pants.engine.download_file import download_file +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: @@ -297,7 +318,9 @@ async def demo(...) -> Foo: "12937da9ad5ad2c60564aa35cb4b3992ba3cc5ef7efedd44159332873da6fe46", 2637138 ) - downloaded = await Get(Digest, DownloadFile(url, file_digest) + downloaded: Digest = await download_file( + DownloadFile(url, file_digest), **implicitly() + ) ``` `DownloadFile` expects a `url: str` parameter pointing to a stable URL for the asset, along with an `expected_digest: FileDigest` parameter. A `FileDigest` is like a normal `Digest`, but represents a single file, rather than a set of files/directories. To determine the `expected_digest`, manually download the file, then run `shasum -a 256` to compute the fingerprint and `wc -c` to compute the expected length of the downloaded file in bytes. diff --git a/docs/docs/writing-plugins/the-rules-api/goal-rules.mdx b/docs/docs/writing-plugins/the-rules-api/goal-rules.mdx index d630527b8ee..a110710532e 100644 --- a/docs/docs/writing-plugins/the-rules-api/goal-rules.mdx +++ b/docs/docs/writing-plugins/the-rules-api/goal-rules.mdx @@ -9,7 +9,7 @@ How to create new goals. For many [plugin tasks](../common-plugin-tasks/index.mdx), you will be extending existing goals, such as adding a new linter to the `lint` goal. However, you may instead want to create a new goal, such as a `publish` goal. This page explains how to create a new goal. -As explained in [Concepts](./concepts.mdx), `@goal_rule`s are the entry-point into the rule graph. When a user runs `pants my-goal`, the Pants engine will look for the respective `@goal_rule`. That `@goal_rule` will usually request other types, either as parameters in the `@goal_rule` signature or through `await Get`.
But unlike a `@rule`, a `@goal_rule` may also trigger side effects (such as running interactive processes, writing to the filesystem, etc) via `await Effect`. +As explained in [Concepts](./concepts.mdx), `@goal_rule`s are the entry-point into the rule graph. When a user runs `pants my-goal`, the Pants engine will look for the respective `@goal_rule`. That `@goal_rule` will usually request other types, either as parameters in the `@goal_rule` signature or through `await`ing another rule. But unlike a `@rule`, a `@goal_rule` may also trigger side effects (such as running interactive processes, writing to the filesystem, etc) via `await Effect`. Often, you can keep all of your logic inline in the `@goal_rule`. As your `@goal_rule` gets more complex, you may end up factoring out helper `@rule`s, but you do not need to start with writing helper `@rule`s. @@ -201,7 +201,14 @@ async def hello_world(console: Console, specs_paths: SpecsPaths) -> HelloWorld: `SpecsPaths.files` will list all files matched by the specs, e.g. `::` will match every file in the project (regardless of if targets own the files). -To convert `SpecsPaths` into a [`Digest`](./file-system.mdx), use `await Get(Digest, PathGlobs(globs=specs_paths.files))`. +To convert `SpecsPaths` into a [`Digest`](./file-system.mdx), use: + +```python +from pants.engine.intrinsics import path_globs_to_digest +... +await path_globs_to_digest(PathGlobs(globs=specs_paths.files)) +``` + :::note Name clashing It is very unlikely, but is still possible that adding a custom goal with an unfortunate name may cause issues when certain existing Pants options are passed in the command line. For instance, executing a goal named `local` with a particular option (in this case, the global `local_cache` option), e.g. `pants --no-local-cache local ...` would fail since there's no `--no-cache` flag defined for the `local` goal. 
diff --git a/docs/docs/writing-plugins/the-rules-api/index.mdx b/docs/docs/writing-plugins/the-rules-api/index.mdx index eb243a93d3d..6636f7d6e77 100644 --- a/docs/docs/writing-plugins/the-rules-api/index.mdx +++ b/docs/docs/writing-plugins/the-rules-api/index.mdx @@ -18,3 +18,4 @@ Adding logic to your plugin. - [Logging and dynamic output](./logging-and-dynamic-output.mdx) - [Testing rules](./testing-plugins.mdx) - [Tips and debugging](./tips-and-debugging.mdx) +- [Migrating from call-by-type](./migrating-gets.mdx) \ No newline at end of file diff --git a/docs/docs/writing-plugins/the-rules-api/installing-tools.mdx b/docs/docs/writing-plugins/the-rules-api/installing-tools.mdx index cddc89be501..68284ecf794 100644 --- a/docs/docs/writing-plugins/the-rules-api/installing-tools.mdx +++ b/docs/docs/writing-plugins/the-rules-api/installing-tools.mdx @@ -19,14 +19,18 @@ If you instead want to allow the binary to be located anywhere on a user's machi from pants.core.util_rules.system_binaries import ( BinaryPathRequest, BinaryPaths, - ProcessResult, + find_binary, +) +from pants.engine.process import ( Process, + ProcessResult, + execute_process_or_raise ) +from pants.engine.rules import implicitly @rule async def demo(...) -> Foo: - docker_paths = await Get( - BinaryPaths, + docker_paths: BinaryPaths = await find_binary( BinaryPathRequest( binary_name="docker", search_path=["/usr/bin", "/bin"], @@ -35,7 +39,9 @@ async def demo(...) -> Foo: docker_bin = docker_paths.first_path if docker_bin is None: raise OSError("Could not find 'docker'.") - result = await Get(ProcessResult, Process(argv=[docker_bin.path, ...], ...)) + result: ProcessResult = await execute_process_or_raise( + **implicitly(Process(argv=[docker_bin.path, ...], ...)) + ) ``` `BinaryPaths` has a field called `paths: tuple[BinaryPath, ...]`, which stores all the discovered absolute paths to the specified binary. 
Each `BinaryPath` object has the fields `path: str`, such as `/usr/bin/docker`, and `fingerprint: str`, which is used to invalidate the cache if the binary changes. The results will be ordered by the order of `search_path`, meaning that earlier entries in `search_path` will show up earlier in the result. @@ -113,26 +119,35 @@ You must also define the methods `generate_url`, which is the URL to make a GET Because an `ExternalTool` is a subclass of [`Subsystem`](./options-and-subsystems.mdx), you must also define an `options_scope`. You may optionally register additional options from `pants.option.option_types`. -In your rules, include the `ExternalTool` as a parameter of the rule, then use `Get(DownloadedExternalTool, ExternalToolRequest)` to download and extract the tool. +In your rules, include the `ExternalTool` as a parameter of the rule, then `await download_external_tool()` to download and extract the tool. ```python -from pants.core.util_rules.external_tool import DownloadedExternalTool, ExternalToolRequest +from pants.core.util_rules.external_tool import ( + DownloadedExternalTool, + ExternalToolRequest, + download_external_tool, +) from pants.engine.platform import Platform +from pants.engine.process import ( + Process, + ProcessResult, + execute_process_or_raise +) +from pants.engine.rules import implicitly @rule -async def demo(shellcheck: Shellcheck, ...) -> Foo: - shellcheck = await Get( - DownloadedExternalTool, - ExternalToolRequest, +async def demo(shellcheck: Shellcheck, platform: Platform) -> Foo: + shellcheck: DownloadedExternalTool = await download_external_tool( shellcheck.get_request(platform) ) - result = await Get( - ProcessResult, - Process(argv=[shellcheck.exe, ...], input_digest=shellcheck.digest, ...) + result: ProcessResult = await execute_process_or_raise( + **implicitly( + Process(argv=[shellcheck.exe, ...], input_digest=shellcheck.digest, ...) 
+ ) ) ``` -A `DownloadedExternalTool` object has two fields: `digest: Digest` and `exe: str`. Use the `.exe` field as the first value of a `Process`'s `argv`, and use the `.digest` in the `Process's` `input_digest`. If you want to use multiple digests for the input, call `Get(Digest, MergeDigests)` with the `DownloadedExternalTool.digest` included. +A `DownloadedExternalTool` object has two fields: `digest: Digest` and `exe: str`. Use the `.exe` field as the first value of a `Process`'s `argv`, and use the `.digest` in the `Process's` `input_digest`. If you want to use multiple digests for the input, call `merge_digests()` with the `DownloadedExternalTool.digest` included. ## `Pex`: Install binaries through pip @@ -146,13 +161,15 @@ from pants.backend.python.util_rules.pex import ( PexProcess, PexRequest, PexRequirements, + create_pex, ) +from pants.engine.intrinsics import execute_process from pants.engine.process import FallibleProcessResult +from pants.engine.rules import implicitly @rule async def demo(...) -> Foo: - pex = await Get( - Pex, + pex: Pex = await create_pex( PexRequest( output_filename="black.pex", internal_only=True, @@ -161,9 +178,8 @@ async def demo(...) -> Foo: main=ConsoleScript("black"), ) ) - result = await Get( - FallibleProcessResult, - PexProcess(pex, argv=["--check", ...], ...), + result: FallibleProcessResult = await execute_process( + **implicitly(PexProcess(pex, argv=["--check", ...], ...)), ) ``` @@ -181,9 +197,9 @@ There are several other optional parameters that may be helpful. The resulting `Pex` object has a `digest: Digest` field containing the built `.pex` file. This digest should be included in the `input_digest` to the `Process` you run. -Instead of the normal `Get(ProcessResult, Process)`, you should use `Get(ProcessResult, PexProcess)`, which will set up the environment properly for your Pex to execute. 
There is a predefined rule to go from `PexProcess -> Process`, so `Get(ProcessResult, PexProcess)` will cause the engine to run `PexProcess -> Process -> ProcessResult`. +Instead of the usual `execute_process(Process)`, you should use `execute_process(**implicitly(PexProcess))`, which will set up the environment properly for your Pex to execute. There is a rule to convert `PexProcess -> Process`, so this will cause the engine to run `PexProcess -> Process -> FallibleProcessResult`. -`PexProcess` requires arguments for `pex: Pex`, `argv: Iterable[str]`, and `description: str`. It has several optional parameters that mirror the arguments to `Process`. If you specify `input_digest`, be careful to first use `Get(Digest, MergeDigests)` on the `pex.digest` and any of the other input digests. +`PexProcess` requires arguments for `pex: Pex`, `argv: Iterable[str]`, and `description: str`. It has several optional parameters that mirror the arguments to `Process`. If you specify `input_digest`, be careful to first use `merge_digests()` on the `pex.digest` and any of the other input digests. :::note Use `PythonToolBase` when you need a Subsystem Often, you will want to create a [`Subsystem`](./options-and-subsystems.mdx) for your Python tool @@ -221,7 +237,7 @@ Then, you can set up your `Pex` like this: ```python @rule async def demo(black: Black, ...) -> Foo: - pex = await Get(Pex, PexRequest, black.to_pex_request()) + pex = await create_pex(black.to_pex_request(...)) ``` ::: diff --git a/docs/docs/writing-plugins/the-rules-api/migrating-gets.mdx b/docs/docs/writing-plugins/the-rules-api/migrating-gets.mdx new file mode 100644 index 00000000000..972d4eeb6d2 --- /dev/null +++ b/docs/docs/writing-plugins/the-rules-api/migrating-gets.mdx @@ -0,0 +1,81 @@ +--- + title: Migrating from Get + sidebar_position: 11 +--- + +Migrating away from the old call-by-type Rules API.
+ +--- + +## `Get` and `MultiGet` + +As [we've seen](./concepts.mdx), rules invoke other rules directly by name: + +```python +from pants.engine.fs import NativeDownloadFile +from pants.engine.intrinsics import download_file +from pants.engine.rules import rule +... + +@rule +async def my_rule() -> MyResult: + downloaded_file = await download_file( + NativeDownloadFile("https://www.google.com/robots.txt") + ) + ... +``` + +However, a previous version of the Rules API had a different idiom, call-by-type. This was achieved using a construct called `Get`: + +```python +from pants.engine.fs import Digest, NativeDownloadFile +from pants.engine.rules import Get, rule +... + +@rule +async def my_rule() -> MyResult: + downloaded_file = await Get(Digest, NativeDownloadFile("https://www.google.com/robots.txt")) + ... +``` + +A `Get(OutputType, InputType, input)` or `Get(OutputType, InputType(...))` invoked the engine's "fill in the blanks" mechanism to find a rule or a cascade of rules that could produce a value of `OutputType` from the given input type (plus any contextual parameters). + +To achieve concurrency you could `await MultiGet()`. + +## `Get` and `MultiGet` are deprecated + +`Get` and `MultiGet` are now deprecated, and will be removed entirely soon. Pants itself no longer uses them internally, but external plugins still might. Plugin authors must migrate from `Get`/`MultiGet` to call-by-name syntax as soon as possible. + +## Migrating to call-by-name + +Migrating to call-by-name is fairly straightforward: + +First, replace all `MultiGet`s with `concurrently`, imported from `pants.engine.rules`. `MultiGet` is now just an alias for `concurrently`, so this is trivial to do with a find/replace. + +Then, replace `await Get(OutputType, InputType, input)` with `await rule_name(...)` where `rule_name` is the rule that returns a value of type `OutputType`.
+ +- If the rule has exactly one parameter, of type `InputType`, then this is as simple as: + +```python +val = await rule_name(input) +``` + +- If the rule has other parameters that should be passed implicitly, then add an empty `**implicitly()`: + +```python +val = await rule_name(input, **implicitly()) +``` + +- If the rule that returns `OutputType` does not take `InputType` directly, but rather some `IntermediateType`, then we need to tell the Pants engine to fill in the blanks, just as the `Get` would have: + +```python +val = await rule_name(**implicitly({input: InputType})) +``` + +There may be various corner cases that require slightly more refactoring. We encourage you to ask for help on [Slack](/community/getting-help) with those. + +For examples of migrations, you can review [pull requests](https://github.com/pantsbuild/pants/pulls?q=is%3Apr+%22call-by-name%22+is%3Aclosed) made to the Pants repository. Additionally, there are some simplified examples in the form of [Pants integration tests](https://github.com/pantsbuild/pants/blob/main/src/python/pants/goal/migrate_call_by_name_integration_test.py). + +## Migrating union `Get`s to call-by-name + +If you're relying on polymorphic dispatch via a [union](./union-rules-advanced.mdx), then you must make sure that your implementation rule has the same signature as the "base" rule (the `@rule(polymorphic=True)` rule) - the same type annotations for parameters and return value - except with your subtype in place of the union type. If things aren't working and the base rule is a standard Pants rule, you can examine its signature in the Pants source code. diff --git a/docs/docs/writing-plugins/the-rules-api/processes.mdx b/docs/docs/writing-plugins/the-rules-api/processes.mdx index eef43b06105..a7cdd65820b 100644 --- a/docs/docs/writing-plugins/the-rules-api/processes.mdx +++ b/docs/docs/writing-plugins/the-rules-api/processes.mdx @@ -7,27 +7,26 @@ How to safely run subprocesses in your plugin.
--- -It is not safe to use `subprocess.run()` like you normally would because this can break caching and will not leverage Pants's parallelism. Instead, Pants has safe alternatives with `Process` and `InteractiveProcess`. +It is not safe to use `subprocess.run()` like you normally would, because this can break caching and will not leverage Pants's concurrency mechanisms. Instead, Pants has the safe alternatives `Process` and `InteractiveProcess`. ## `Process` ### Overview -`Process` is similar to Python's `subprocess.Popen()`. The process will run in the background, and you can run multiple processes in parallel. +`Process` is similar to Python's `subprocess.Popen()`. The process will run in the background, and you can run multiple processes concurrently. ```python -from pants.engine.process import Process, ProcessResult -from pants.engine.rules import Get, rule +from pants.engine.process import Process, ProcessResult, execute_process_or_raise +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: - result = await Get( - ProcessResult, + result: ProcessResult = await execute_process_or_raise(**implicitly( Process( argv=["/bin/echo", "hello world"], description="Demonstrate processes.", ) - ) + )) logger.info(result.stdout.decode()) logger.info(result.stderr.decode()) ``` @@ -36,6 +35,8 @@ This will return a `ProcessResult` object, which has the fields `stdout: bytes`, The process will run in a temporary directory and is hermetic, meaning that it cannot read any arbitrary file from your project and that it will be stripped of environment variables. This sandbox is important for reproducibility and to allow running your `Process` anywhere, such as through remote execution. +If the process fails (i.e., returns a non-zero exit code) then an exception will be raised. If non-zero exit codes are not errors, and you want your code to be able to handle them, instead call `pants.engine.intrinsics.execute_process`, which returns a `FallibleProcessResult`.
Like `ProcessResult`, `FallibleProcessResult` has the attributes `stdout: bytes`, `stderr: bytes`, and `output_digest: Digest`, and it adds `exit_code: int`. + :::note Debugging a `Process` Setting the [`--keep-sandboxes=always`](./tips-and-debugging.mdx#debugging-look-inside-the-chroot) flag will cause the sandboxes of `Process`es to be preserved and logged to the console for inspection. @@ -55,12 +56,15 @@ The `EnvironmentVars` type contains a subset of the environment that Pants was r ```python from pants.engine.env_vars import EnvironmentVarsRequest, EnvironmentVars -from pants.engine.rules import Get, rule +from pants.engine.rules import rule +from pants.core.util_rules.env_vars import environment_vars_subset @rule async def partial_env(...) -> Foo: - relevant_env_vars = await Get(EnvironmentVars, EnvironmentVarsRequest(["RELEVANT_VAR", "PATH"])) + relevant_env_vars: EnvironmentVars = await environment_vars_subset( + EnvironmentVarsRequest(["RELEVANT_VAR", "PATH"]) + ) ... ``` @@ -95,14 +99,6 @@ async def demo(...) -> Foo: `ProcessCacheScope` supports other options as well, including `ALWAYS`. ::: -### FallibleProcessResult - -Normally, a `ProcessResult` will raise an exception if the return code is not `0`. Instead, a `FallibleProcessResult` allows for any return code. - -Use `Get(FallibleProcessResult, Process)` if you expect that the process may fail, such as when running a linter or tests. - -Like `ProcessResult`, `FallibleProcessResult` has the attributes `stdout: bytes`, `stderr: bytes`, and `output_digest: Digest`, and it adds `exit_code: int`. - ## `InteractiveProcess` `InteractiveProcess` is similar to Python's `subprocess.run()`. The process will run in the foreground, optionally with access to the workspace. @@ -134,6 +130,11 @@ The `Effect` will return an `InteractiveProcessResult`, which has a single field A `Process` can be retried by wrapping it in a `ProcessWithRetries` and requesting a `ProcessResultWithRetries`. 
The last result, whether succeeded or failed, is available with the `last` parameter. For example, the following will allow for up to 5 attempts at running `my_process`: ```python -results = await Get(ProcessResultWithRetries, ProcessWithRetries(my_process, 5)) +from pants.engine.intrinsics import execute_process_with_retry +from pants.engine.process import ProcessWithRetries, ProcessResultWithRetries + +results: ProcessResultWithRetries = await execute_process_with_retry( + ProcessWithRetries(my_process, 5) +) last_result = results.last ``` diff --git a/docs/docs/writing-plugins/the-rules-api/rules-and-the-target-api.mdx b/docs/docs/writing-plugins/the-rules-api/rules-and-the-target-api.mdx index 346ed8d9535..d4692575cbc 100644 --- a/docs/docs/writing-plugins/the-rules-api/rules-and-the-target-api.mdx +++ b/docs/docs/writing-plugins/the-rules-api/rules-and-the-target-api.mdx @@ -117,50 +117,72 @@ For most [Common plugin tasks](../common-plugin-tasks/index.mdx), like adding a Given targets, you can find their direct and transitive dependencies. See the below section "The Dependencies field". -You can also find targets by writing your own `Spec`s, rather than using what the user provided. (The types come from `pants.base.specs`.) +You can also find targets by writing your own `Spec`s, rather than using what the user provided: ```python -# Inside an `@rule`, use `await Get` like this. -await Get( - Targets, - RawSpecs( - description_of_origin="my plugin", # Used in error messages for invalid specs. - # Each of these keyword args are optional. 
- address_literals=( - AddressLiteralSpec("my_dir", target_component="tgt"), # `my_dir:tgt` - AddressLiteralSpec("my_dir", target_component="tgt", generated_component="gen"), # `my_dir:tgt#gen` - AddressLiteralSpec("my_dir/f.ext", target_component="tgt"), # `my_dir/f.ext:tgt` - ), - file_literals=(FileLiteralSpec("my_dir/f.ext"),), # `my_dir/f.ext` - file_globs=(FileGlobSpec("my_dir/*.ext"),), # `my_dir/*.ext` - dir_literals=(DirLiteralSpec("my_dir"),), # `my_dir/` - dir_globs=(DirGlobSpec("my_dir"),), # `my_dir:` - recursive_globs=(RecursiveGlobSpec("my_dir"),), # `my_dir::` - ancestor_globs=(AncestorGlobSpec("my_dir"),), # i.e. `my_dir` and all ancestors - ) +from pants.base.specs import ( + AddressLiteralSpec, + AncestorGlobSpec, + DirGlobSpec, + DirLiteralSpec, + FileGlobSpec, + FileLiteralSpec, + RawSpecs, + RecursiveGlobSpec ) +from pants.engine.internals.graph import resolve_targets +from pants.engine.target import Targets + + +@rule +async def my_rule() -> MyReturnValue: + ... + targets: Targets = await resolve_targets( + **implicitly( + RawSpecs( + description_of_origin="my plugin", # Used in error messages for invalid specs. + # Each of these keyword args is optional. + address_literals=( + AddressLiteralSpec("my_dir", target_component="tgt"), # `my_dir:tgt` + AddressLiteralSpec("my_dir", target_component="tgt", generated_component="gen"), # `my_dir:tgt#gen` + AddressLiteralSpec("my_dir/f.ext", target_component="tgt"), # `my_dir/f.ext:tgt` + ), + file_literals=(FileLiteralSpec("my_dir/f.ext"),), # `my_dir/f.ext` + file_globs=(FileGlobSpec("my_dir/*.ext"),), # `my_dir/*.ext` + dir_literals=(DirLiteralSpec("my_dir"),), # `my_dir/` + dir_globs=(DirGlobSpec("my_dir"),), # `my_dir:` + recursive_globs=(RecursiveGlobSpec("my_dir"),), # `my_dir::` + ancestor_globs=(AncestorGlobSpec("my_dir"),), # i.e. `my_dir` and all ancestors + ) + ) + ) + ... ``` Finally, you can look up an `Address` given a raw address string, using `AddressInput`.
This is often useful to allow a user to refer to targets in [Options](./options-and-subsystems.mdx) and in `Field`s in your `Target`. For example, this mechanism is how the `dependencies` field works. This will error if the address does not exist. ```python from pants.engine.addresses import AddressInput, Address -from pants.engine.rules import Get, rule +from pants.engine.internals.build_files import resolve_address +from pants.engine.rules import rule, implicitly @rule async def example(...) -> Foo: - address = await Get( - Address, - AddressInput, - AddressInput.parse("project/util:tgt", description_of_origin="my custom rule"), + address: Address = await resolve_address( + **implicitly({ + AddressInput.parse( + "project/util:tgt", description_of_origin="my custom rule" + ): AddressInput + }) ) ``` Given an `Address`, there are two ways to find its corresponding `Target`: ```python -from pants.engine.addresses import AddressInput, Address, Addresses -from pants.engine.rules import Get, rule +from pants.engine.addresses import Address, AddressInput, Addresses +from pants.engine.internals.graph import resolve_target, resolve_targets +from pants.engine.rules import implicitly, rule from pants.engine.target import Targets, WrappedTarget, WrappedTargetRequest @rule @@ -168,47 +190,58 @@ async def example(...) -> Foo: address = Address("project/util", target_name="tgt") # Approach #1 - wrapped_target = await Get( - WrappedTarget, + wrapped_target: WrappedTarget = await resolve_target( WrappedTargetRequest(address, description_of_origin="my custom rule"), + **implicitly(), ) target = wrapped_target.target # Approach #2 - targets = await Get(Targets, Addresses([address])) + targets = await resolve_targets(**implicitly(Addresses([address]))) target = targets[0] ``` ## The `Dependencies` field -The `Dependencies` field is an `AsyncField`, which means that you must use the engine to hydrate its values, rather than using `Dependencies.value` like normal.
+The `Dependencies` field is an `AsyncField`, which means that you must use the engine to hydrate its values, rather than directly inspecting `Dependencies.value`. ```python +from pants.engine.addresses import Addresses +from pants.engine.internals.graph import resolve_dependencies, resolve_targets from pants.engine.target import Dependencies, DependenciesRequest, Targets -from pants.engine.rules import Get, rule +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: ... - direct_deps = await Get(Targets, DependenciesRequest(target.get(Dependencies))) -``` + # Hydrate to targets. + targets: Targets = await resolve_targets( + **implicitly(DependenciesRequest(target.get(Dependencies))) + ) -`DependenciesRequest` takes a single argument: `field: Dependencies`. The return type `Targets` is a `Collection` of individual `Target` objects corresponding to each direct dependency of the original target. + # Or, hydrate to addresses. + direct_deps: Addresses = await resolve_dependencies( + DependenciesRequest(target.get(Dependencies)), **implicitly() + ) +``` -If you only need the addresses of a target's direct dependencies, you can use `Get(Addresses, DependenciesRequest(target.get(Dependencies))` instead. (`Addresses` is defined in `pants.engine.addresses`.) +`DependenciesRequest` takes a single argument: `field: Dependencies`. The return type `Targets` is a `Collection` of individual `Target` objects corresponding to each direct dependency of the original target. If you only need the addresses of a target's direct dependencies, you can hydrate to `Addresses` instead. ### Transitive dependencies with `TransitiveTargets` -If you need the transitive dependencies of a target—meaning both the direct dependencies and those dependencies' dependencies—use `Get(TransitiveDependencies, TransitiveTargetsRequest)`. +If you need the transitive dependencies of a target, use `transitive_targets()`.
```python +from pants.engine.internals.graph import transitive_targets from pants.engine.target import TransitiveTargets, TransitiveTargetsRequest -from pants.engine.rules import Get, rule +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: ... - transitive_targets = await Get(TransitiveTargets, TransitiveTargetsRequest([target.address])) + # Use a distinct local name so that it does not shadow the imported `transitive_targets()`. + transitive_tgts: TransitiveTargets = await transitive_targets( + TransitiveTargetsRequest([target.address]), **implicitly() + ) ``` `TransitiveTargetsRequest` takes an iterable of `Address`es. @@ -232,61 +265,70 @@ class MyTarget(Target): core_fields = (..., PackagesField) ``` -Then, to resolve the addresses, you can use `UnparsedAddressInputs`: +Then, to resolve the addresses, you can use `resolve_unparsed_address_inputs()`: ```python from pants.engine.addresses import Addresses, UnparsedAddressInputs +from pants.engine.internals.graph import resolve_targets, resolve_unparsed_address_inputs from pants.engine.target import Targets -from pants.engine.rules import Get, rule +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: - addresses = await Get( - Addresses, - UnparsedAddressInputs, - my_tgt[MyField].to_unparsed_address_inputs() + addresses: Addresses = await resolve_unparsed_address_inputs( + my_tgt[MyField].to_unparsed_address_inputs(), **implicitly() ) - # Or, use this: - targets = await Get( - Targets, - UnparsedAddressInputs, - my_tgt[MyField].to_unparsed_address_inputs() + + # Or, to directly get targets: + targets: Targets = await resolve_targets( + **implicitly({ + my_tgt[MyField].to_unparsed_address_inputs(): UnparsedAddressInputs, + }) ) ``` -Pants will include your special-cased dependencies with `pants dependencies`, `pants dependents`, and `pants --changed-since`, but the dependencies will not show up when using `await Get(Addresses, DependenciesRequest)`. 
+Pants will include your special-cased dependencies in `pants dependencies`, `pants dependents`, and `pants --changed-since`, but they will not show up when calling `await resolve_dependencies()`. ## `SourcesField` -`SourceField` is an `AsyncField`, which means that you must use the engine to hydrate its values, rather than using `Sources.value` like normal. +`SourcesField` is an `AsyncField`, which means that you must use the engine to hydrate its values, rather than directly inspecting `SourcesField.value`. Some Pants targets like `python_test` have the field `source: str`, whereas others like `go_package` have the field `sources: list[str]`. These are represented by the fields `SingleSourceField` and `MultipleSourcesField`. When you're defining a new target type, you should choose which of these to subclass. However, when operating over sources generically in your `@rules`, you can use the common base class `SourcesField` so that your rule works with both formats. ```python +from pants.engine.internals.graph import hydrate_sources from pants.engine.target import HydratedSources, HydrateSourcesRequest, SourcesField -from pants.engine.rules import Get, rule +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: ... - sources = await Get(HydratedSources, HydrateSourcesRequest(target[SourcesField])) + sources: HydratedSources = await hydrate_sources( + HydrateSourcesRequest(target[SourcesField]), **implicitly() + ) ``` `HydrateSourcesRequest` expects a `SourcesField` object. This can be a subclass, such as `PythonSourceField` or `GoPackageSourcesField`. `HydratedSources` has a field called `snapshot: Snapshot`, which allows you to see what files were resolved by calling `hydrated_sources.snapshot.files` and to use the resulting [`Digest`](./file-system.mdx) in your plugin with `hydrated_sources.snapshot.digest`. 
-Typically, you will want to use the higher-level `Get(SourceFiles, SourceFilesRequest)` utility instead of `Get(HydrateSources, HydrateSourcesRequest)`. This allows you to ergonomically hydrate multiple `SourcesField`s objects in the same call, resulting in a single merged snapshot of all the input source fields. +Typically, you will want to use the higher-level `determine_source_files()` utility instead of calling `hydrate_sources()` directly. This allows you to ergonomically hydrate multiple `SourcesField`s objects in the same call, resulting in a single merged snapshot of all the input source fields. ```python -from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest +from pants.core.util_rules.source_files import ( + SourceFiles, + SourceFilesRequest, + determine_source_files, +) from pants.engine.target import SourcesField -from pants.engine.rules import Get, rule +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - sources = await Get(SourceFiles, SourceFilesRequest([tgt1[SourcesField], tgt2[SourcesField]])) + sources: SourceFiles = await determine_source_files( + SourceFilesRequest([tgt1[SourcesField], tgt2[SourcesField]]) + ) ``` `SourceFilesRequest` expects an iterable of `SourcesField` objects. `SourceFiles` has a field `snapshot: Snapshot` with the merged snapshot of all resolved input sources fields. 
@@ -297,8 +339,15 @@ To convert a list of target addresses to existing source file names, you can req from itertools import chain from pants.engine.addresses import Addresses from pants.engine.collection import DeduplicatedCollection -from pants.engine.rules import Get, MultiGet, rule -from pants.engine.target import (HydratedSources, HydrateSourcesRequest, SourcesField, UnexpandedTargets) +from pants.engine.internals.graph import hydrate_sources, resolve_unexpanded_targets +from pants.engine.internals.selectors import concurrently +from pants.engine.rules import implicitly, rule +from pants.engine.target import ( + HydratedSources, + HydrateSourcesRequest, + SourcesField, + UnexpandedTargets +) class ProjectSources(DeduplicatedCollection[str]): @@ -307,13 +356,17 @@ class ProjectSources(DeduplicatedCollection[str]): @rule async def addresses_to_source_files(addresses: Addresses) -> ProjectSources: - targets = await Get(UnexpandedTargets, Addresses, addresses) - all_sources = await MultiGet(Get(HydratedSources, HydrateSourcesRequest(tgt.get(SourcesField))) for tgt in targets) - return ProjectSources(chain.from_iterable(sources.snapshot.files for sources in all_sources)) + targets: UnexpandedTargets = await resolve_unexpanded_targets(addresses) + all_sources = await concurrently( + hydrate_sources(HydrateSourcesRequest(tgt.get(SourcesField)), **implicitly()) + for tgt in targets + ) + return ProjectSources( + chain.from_iterable(sources.snapshot.files for sources in all_sources) + ) ``` -This is often useful when you need to pass target addresses to commands that are not Pants goals and would not -be able to interpret them properly. +This is often useful when you need to pass target addresses to commands that are not Pants goals and would not be able to interpret them properly. 
### Enabling codegen @@ -322,19 +375,20 @@ If you want your plugin to work with code generation, you must set the argument ```python from pants.backend.python.target_types import PythonSourceField from pants.core.target_types import ResourceSourceField +from pants.engine.internals.graph import hydrate_sources from pants.engine.target import HydratedSources, HydrateSourcesRequest, SourcesField -from pants.engine.rules import Get, rule +from pants.engine.rules import implicitly, rule @rule async def demo(...) -> Foo: ... - sources = await Get( - HydratedSources, + sources: HydratedSources = await hydrate_sources( HydrateSourcesRequest( target.get(SourcesField), enable_codegen=True, for_sources_types=(PythonSourceField, ResourceSourceField) - ) + ), + **implicitly(), ) ``` @@ -345,15 +399,18 @@ If the provided `SourcesField` object is already a subclass of one of the `for_s ```python from pants.backend.python.target_types import PythonSourceField from pants.core.target_types import ResourceSourceField -from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest +from pants.core.util_rules.source_files import ( + SourceFiles, + SourceFilesRequest, + determine_source_files +) from pants.engine.target import SourcesField -from pants.engine.rules import Get, rule +from pants.engine.rules import rule @rule async def demo(...) -> Foo: ... - sources = await Get( - SourceFiles, + sources: SourceFiles = await determine_source_files( SourceFilesRequest( [target.get(SourcesField)], enable_codegen=True, @@ -366,24 +423,35 @@ async def demo(...) -> Foo: You may sometimes want to remove source roots from files, i.e. go from `src/python/f.py` to `f.py`. This can make it easier to work with tools that would otherwise be confused by the source root. -To strip source roots, use `Get(StrippedSourceFiles, SourceFiles)`. 
+To strip source roots, use `strip_source_roots()`: ```python -from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest -from pants.core.util_rules.stripped_source_files import StrippedSourceFiles -from pants.engine.rules import Get, rule +from pants.core.util_rules.source_files import ( + SourceFiles, + SourceFilesRequest, + determine_source_files, +) +from pants.core.util_rules.stripped_source_files import ( + StrippedSourceFiles, + strip_source_roots, +) +from pants.engine.rules import rule from pants.engine.target import SourcesField @rule async def demo(...) -> Foo: ... - unstripped_sources = await Get(SourceFiles, SourceFilesRequest([target.get(SourcesField)])) - stripped_sources = await Get(StrippedSourceFiles, SourceFiles, unstripped_sources) + unstripped_sources: SourceFiles = await determine_source_files( + SourceFilesRequest([target.get(SourcesField)]) + ) + stripped_sources: StrippedSourceFiles = await strip_source_roots( + unstripped_sources + ) ``` `StrippedSourceFiles` has a single field `snapshot: Snapshot`. -You can also use `Get(StrippedSourceFiles, SourceFilesRequest)`, and the engine will automatically go from `SourceFilesRequest -> SourceFiles -> StrippedSourceFiles)`. +You can also use `await strip_source_roots(**implicitly(SourceFilesRequest(..)))`, and the engine will automatically go from `SourceFilesRequest -> SourceFiles -> StrippedSourceFiles`. ## `FieldSet`s diff --git a/docs/docs/writing-plugins/the-rules-api/testing-plugins.mdx b/docs/docs/writing-plugins/the-rules-api/testing-plugins.mdx index 67e97b38b0b..81f29d4d248 100644 --- a/docs/docs/writing-plugins/the-rules-api/testing-plugins.mdx +++ b/docs/docs/writing-plugins/the-rules-api/testing-plugins.mdx @@ -78,7 +78,7 @@ For Approach #4, you should use `setup_tmpdir()` to set up BUILD files. 
## Approach 2: `run_rule_with_mocks()` (unit tests for rules) -`run_rule_with_mocks()` will run your rule's logic, but with each argument to your `@rule` provided explicitly by you and with mocks for any `await Get`s. This means that the test is fully mocked; for example, `run_rule_with_mocks()` will not actually run a `Process`, nor will it use the file system operations. This is useful when you want to test the inlined logic in your rule, but usually, you will want to use Approach #3. +`run_rule_with_mocks()` will run your rule's logic, but with each argument to your `@rule` provided explicitly by you and with mocks for any `await`s. This means that the test is fully mocked; for example, `run_rule_with_mocks()` will not actually run a `Process`, nor will it use the file system operations. This is useful when you want to test the inlined logic in your rule, but usually, you will want to use Approach #3. To use `run_rule_with_mocks`, pass the `@rule` as its first arg, then `rule_args=[arg1, arg2, ...]` in the same order as the arguments to the `@rule`. @@ -95,11 +95,11 @@ async def int_to_str(i: int) -> str: def test_int_to_str() -> None: - result: str = run_rule_with_mocks(int_to_str, rule_args=[42], mock_gets=[]) + result: str = run_rule_with_mocks(int_to_str, rule_args=[42], mock_calls={}) assert result == "42" ``` -If your `@rule` has any `await Get`s or `await Effect`s, set the argument `mock_gets=[]` with `MockGet`/`MockEffect` objects corresponding to each of them. A `MockGet` takes three arguments: `output_type: type`, `input_types: tuple[type, ...]`, and `mock: Callable[..., InputType]`, which is a function that takes an instance of each of the `input_types` and returns a single instance of the `output_type`. +If your `@rule` has any `await`s, set the argument `mock_calls={}` to a mapping from the fully-qualified name of the awaited rule, to the function that mocks it. 
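One way to avoid typos in the `mock_calls` keys is to derive each key from the function object instead of typing the dotted path by hand. The sketch below assumes the key has the shape "module path, then function name"; `fully_qualified_name` is a hypothetical helper, not part of `pants.testutil`, and it is demonstrated on a standard-library function so that it runs without Pants installed. Whether the result matches the key the engine expects for a decorated `@rule` should be checked against your Pants version.

```python
import json
from typing import Callable


def fully_qualified_name(func: Callable) -> str:
    """Hypothetical helper: build a dotted "module.function" name for `func`.

    Assumption: `mock_calls` keys follow this "module path + function name" shape.
    """
    return f"{func.__module__}.{func.__qualname__}"


# Demonstrated on a stdlib function; in a test you would pass the awaited rule.
key = fully_qualified_name(json.dumps)
print(key)
```

In a test, such a helper could be used to build the mapping, for example `mock_calls={fully_qualified_name(some_rule): some_mock}`, where `some_rule` and `some_mock` are placeholders for your own rule and mock function.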
For example, given this contrived rule to find all targets with `sources` with a certain filename included (find a "needle in the haystack"): @@ -110,8 +110,14 @@ from dataclasses import dataclass from pathlib import PurePath from pants.engine.collection import Collection -from pants.engine.rules import Get, MultiGet, rule -from pants.engine.target import HydratedSources, HydrateSourcesRequest, SourcesField, Target +from pants.engine.internals.graph import hydrate_sources +from pants.engine.rules import concurrently, implicitly, rule +from pants.engine.target import ( + HydratedSources, + HydrateSourcesRequest, + SourcesField, + Target +) @dataclass(frozen=True) @@ -129,13 +135,15 @@ class TargetsWithNeedle(Collection[Target]): @rule async def find_needle_in_haystack(find_needle: FindNeedle) -> TargetsWithNeedle: - all_hydrated_sources = await MultiGet( - [Get(HydratedSources, HydrateSourcesRequest(tgt.get(SourcesField))) for tgt in find_needle.targets] - ) + all_hydrated_sources = await concurrently([ + hydrate_sources(HydrateSourcesRequest(tgt.get(SourcesField)), **implicitly()) + for tgt in find_needle.targets + ]) return TargetsWithNeedle( tgt for tgt, hydrated_sources in zip(find_needle.targets, all_hydrated_sources) - if any(PurePath(fp).name == find_needle.needle_filename for fp in hydrated_sources.snapshot.files) + if any(PurePath(fp).name == find_needle.needle_filename + for fp in hydrated_sources.snapshot.files) ) ``` @@ -144,8 +152,9 @@ We can write this test: ```python from pants.engine.addresses import Address from pants.engine.fs import EMPTY_DIGEST, Snapshot +from pants.engine.internals.graph import hydrate_sources from pants.engine.target import HydratedSources, HydrateSourcesRequest, Target, Sources -from pants.testutil.rule_runner import MockGet, run_rule_with_mocks +from pants.testutil.rule_runner import run_rule_with_mocks class MockTarget(Target): alias = "mock_target" @@ -172,13 +181,9 @@ def test_find_needle_in_haystack() -> None: result: TargetsWithNeedle = 
run_rule_with_mocks( find_needle_in_haystack, rule_args=[find_needles_request], - mock_gets=[ - MockGet( - output_type=HydratedSources, - input_types=(HydrateSourcesRequest,), - mock=mock_hydrate_sources, - ) - ], + mock_calls={ + "pants.engine.internals.graph.hydrate_sources": mock_hydrate_sources, + }, ) assert list(result) == [tgt2] ``` @@ -378,11 +383,11 @@ Now that you have your `RuleRunner` set up, along with any options and the conte Unlike Approach #2, you will not explicitly say which `@rule` you want to run. Instead, look at the return type of your `@rule`. Use `rule_runner.request(MyOutput, [input1, ...])`, where `MyOutput` is the return type. -`rule_runner.request()` is equivalent to how you would normally use `await Get(MyOuput, Input1, input1_instance)` in a rule (See [Concepts](./concepts.mdx)). For example, if you would normally say `await Get(Digest, MergeDigests([digest1, digest2])`, you'd instead say `rule_runner.request(Digest, [MergeDigests([digest1, digest2])`. +`rule_runner.request()` is a way to request a value of an output type given a set of values of input types. It invokes the engine's "fill in the blanks" capability (See [Concepts](./concepts.mdx)). -You will also need to add a `QueryRule` to your `RuleRunner` setup, which gives a hint to the engine for what requests you are going to make. The `QueryRule` takes the same form as your `rule_runner.request()`, except that the inputs are types, rather than instances of those types. +You will need to add a `QueryRule` to your `RuleRunner` setup, which tells the engine what requests you are going to make. The `QueryRule` takes the same form as your `rule_runner.request()`, except that the inputs are types, rather than instances of those types. 
-For example, given this rule signature (from the above Approach #2 example): +For example, given this rule signature (from the Approach #2 example above): ```python @rule diff --git a/docs/docs/writing-plugins/the-rules-api/tips-and-debugging.mdx b/docs/docs/writing-plugins/the-rules-api/tips-and-debugging.mdx index 9644d98594a..bec4432803f 100644 --- a/docs/docs/writing-plugins/the-rules-api/tips-and-debugging.mdx +++ b/docs/docs/writing-plugins/the-rules-api/tips-and-debugging.mdx @@ -11,35 +11,44 @@ We would love to help you with your plugin. Please reach out through [Slack](/co We also appreciate any feedback on the Rules API. If you find certain things confusing or are looking for additional mechanisms, please let us know. ::: -## Tip: Use `MultiGet` for increased concurrency +## Tip: Use `concurrently` for increased concurrency -Every time your rule has `await`, Python will yield execution to the engine and not resume until the engine returns the result. So, you can improve concurrency by instead bundling multiple `Get` requests into a single `MultiGet`, which will allow each request to be resolved through a separate thread. +Every time your rule `await`s, Python will yield execution to the engine and not resume until the engine returns the result. So, you can improve concurrency by instead bundling multiple `await` requests into a single `concurrently`, which will allow requests to execute concurrently on multiple threads. Okay: ```python -from pants.core.util_rules.source_files import SourceFilesRequest, SourceFiles -from pants.engine.fs import AddPrefix, Digest -from pants.engine.internals.selectors import Get +from pants.core.util_rules.source_files import ( + SourceFilesRequest, + determine_source_files +) +from pants.engine.fs import AddPrefix +from pants.engine.internals.selectors import concurrently +from pants.engine.intrinsics import add_prefix +from pants.engine.rules import rule @rule async def demo(...) 
-> Foo: - new_digest = await Get(Digest, AddPrefix(original_digest, "new_prefix")) - source_files = await Get(SourceFiles, SourceFilesRequest(sources_fields)) + new_digest = await add_prefix(AddPrefix(original_digest, "new_prefix")) + source_files = await determine_source_files(SourceFilesRequest(sources_fields)) ``` Better: ```python -from pants.core.util_rules.source_files import SourceFilesRequest, SourceFiles -from pants.engine.fs import AddPrefix, Digest -from pants.engine.internals.selectors import Get, MultiGet +from pants.core.util_rules.source_files import ( + SourceFilesRequest, + determine_source_files +) +from pants.engine.fs import AddPrefix +from pants.engine.intrinsics import add_prefix +from pants.engine.rules import concurrently, rule @rule async def demo(...) -> Foo: - new_digest, source_files = await MultiGet( - Get(Digest, AddPrefix(original_digest, "new_prefix")), - Get(SourceFiles, SourceFilesRequest(sources_fields)), + new_digest, source_files = await concurrently( + add_prefix(AddPrefix(original_digest, "new_prefix")), + determine_source_files(SourceFilesRequest(sources_fields)), ) ``` @@ -82,7 +91,7 @@ Rule graph issues can be particularly hard to figure out - the error messages ar We encourage you to reach out in #plugins on [Slack](/community/getting-help) for help. -Often the best way to debug a rule graph issue is to isolate where the problem comes from by commenting out code until the graph compiles. The rule graph is formed solely by looking at the types in the signature of your `@rule` and in any `Get` statements - none of the rest of your rules matter. To check if the rule graph can be built, simply run `pants --version`. +Often the best way to debug a rule graph issue is to isolate where the problem comes from by commenting out code until the graph compiles. The rule graph is formed solely by looking at the types in the signature of your `@rule` and in any `await`s - none of the rest of your rules matter. 
To check if the rule graph can be built, simply run `pants --version`. We recommend starting by determining which backend—or combination of backends—is causing issues. You can run the below script to find this. Once you find the smallest offending combination, focus on fixing that first by removing all irrelevant backends from `backend_packages` in `pants.toml`—this reduces the surface area of where issues can come from. (You may need to use the option `--no-verify-config` so that Pants doesn't complain about unrecognized options.) @@ -130,13 +139,13 @@ if __name__ == "__main__": main() ``` -Once you've identified the smallest combination of backends that fail, and you have updated `pants.toml`, you can try isolating which rules are problematic by commenting out `Get`s and the parameters to `@rule`s. +Once you've identified the smallest combination of backends that fail, and you have updated `pants.toml`, you can try isolating which rules are problematic by commenting out `await`s and the parameters to `@rule`s. Some common sources of rule graph failures: - Dependent rules are not registered. - This is especially common when you only have one backend activated entirely. We recommend trying to get each backend to be valid regardless of what other backends are activated. Use the above script to see if this is happening. - - To fix this, see which types you're using in your `@rule` signatures and `Get`s. If they come from another backend, activate their rules. For example, if you use `await Get(Pex, PexRequest)`, you should activate `pants.backend.python.util_rules.pex.rules()` in your `register.py`. + - To fix this, see which types you're using in your `@rule` signatures and `await`s. If they come from another backend, activate their rules. For example, if you use `await create_pex(PexRequest)`, you should activate `pants.backend.python.util_rules.pex.rules()` in your `register.py`. - Not "newtyping". 
- It's possible and sometimes desirable to use types already defined in your plugin or core Pants. For example, you might want to define a new rule that goes from `MyCustomClass -> Process`. However, sometimes this makes the rule graph more complicated than it needs to be. - It's often helpful to create a result and request type for each of your `@rule`s, e.g. `MyPlugin` and `MyPluginRequest`. diff --git a/docs/docs/writing-plugins/the-rules-api/union-rules-advanced.mdx b/docs/docs/writing-plugins/the-rules-api/union-rules-advanced.mdx index 635d62972bf..d56e09860e0 100644 --- a/docs/docs/writing-plugins/the-rules-api/union-rules-advanced.mdx +++ b/docs/docs/writing-plugins/the-rules-api/union-rules-advanced.mdx @@ -9,19 +9,44 @@ Polymorphism for the engine. Union rules solve the same problem that polymorphism solves in general: how to write generic code that operates on types not known about at the time of writing. -For example, Pants has many generic goals like `lint` and `test`. Those `@goal_rule` definitions cannot know about every concrete linter or test implementation ahead-of-time. +For example, Pants has many generic goals like `lint` and `test`. Those `@goal_rule` definitions cannot know about every concrete linter or test implementation ahead of time. -Unions allow a specific linter to be registered with `UnionRule(LintTargetsRequest, ShellcheckRequest)`, and then for `lint.py` to access its type: +The solution involves two related declarations: + +1) A registration mechanism, `UnionRule`, that declares a specific class to be a member of a generic union. +2) A `polymorphic` keyword on the `@rule` decorator, that tells the engine that calls to the rule should be dispatched to some other rule based on the runtime type of the union member provided. 
+ +This is best understood via example: ```python title="pants/core/goals/lint.py" -from pants.engine.rules import Get, MultiGet, goal_rule +from abc import ABC +from dataclasses import dataclass +from pants.engine.rules import concurrently, goal_rule, implicitly, rule from pants.engine.target import Targets -from pants.engine.unions import UnionMembership +from pants.engine.unions import UnionMembership, union -.. + +@union +class LintTargetsRequest(ABC): + # The union base for all specific linters. + # Can have fields common to all linter requests here. + ... + + +@dataclass(frozen=True) +class LintResults: + ... + + +@rule(polymorphic=True) +async def lint_target(req: LintTargetsRequest) -> LintResults: + # If no implementation for the member type is found, this generic + # implementation will be invoked. In this case that is not useful, + # so we raise. + raise NotImplementedError(f"Must be implemented for {type(req)}") + @goal_rule -async def lint(..., targets: Targets, union_membership: UnionMembership) -> Lint: +async def lint(targets: Targets, union_membership: UnionMembership) -> Lint: lint_request_types = union_membership[LintTargetsRequest] concrete_requests = [ request_type( @@ -31,37 +56,49 @@ async def lint(..., targets: Targets, union_membership: UnionMembership) -> Lint ) for request_type in lint_request_types ] - results = await MultiGet( - Get(LintResults, LintTargetsRequest, concrete_request) + results = await concurrently( + lint_target(**implicitly({concrete_request: LintTargetsRequest})) for concrete_request in concrete_requests ) ``` ```python title="pants-plugins/bash/shellcheck.py" -from pants.core.goals.lint import LintTargetsRequest +from pants.core.goals.lint import LintResults, LintTargetsRequest +from pants.engine.rules import collect_rules, rule +from pants.engine.unions import UnionRule +# It is common for the union member to also subclass the union base. +# It's not strictly required, but it may be in the future, so it is +# good practice today. class ShellcheckRequest(LintTargetsRequest): ... -... 
+@rule +async def shellcheck_target(req: ShellcheckRequest) -> LintResults: + # At runtime, calls to the generic `lint_target()` on a + # `ShellcheckRequest` will be dispatched here. + ... def rules(): - return [*ShellcheckRequest.rules()] + return [ + *collect_rules(), + UnionRule(LintTargetsRequest, ShellcheckRequest), + ] ``` -This example will find all registered linter implementations by looking up `union_membership[LintTargetsRequest]`, which returns a tuple of all `LintTargetsRequest ` types that were registered with a `UnionRule`, such as `ShellcheckRequest` and `Flake8Request`. +This example will find all registered linter implementations by looking up `union_membership[LintTargetsRequest]`, which returns a tuple of all `LintTargetsRequest` members that were registered with a `UnionRule`, such as `ShellcheckRequest` or `Flake8Request`. ## How to create a new Union -To set up a new union, create a class for the union "base". Typically, this should be an [abstract class](https://docs.python.org/3/library/abc.html) that is subclassed by the union members, but it does not need to be. Mark the class with `@union`. +To set up a new union, create a class for the union base. Typically, this should be an [abstract class](https://docs.python.org/3/library/abc.html) that is subclassed by the union members. Mark the class with `@union`. ```python from abc import ABC, abstractmethod -from pants.engine.unions import union +from pants.engine.unions import UnionRule, union + @union class Vehicle(ABC): @@ -77,8 +114,13 @@ class Truck(Vehicle): def num_wheels(self) -> int: return 4 + def rules(): return [UnionRule(Vehicle, Truck)] ``` Now, your rules can request `UnionMembership` as a parameter in the `@rule`, and then look up `union_membership[Vehicle]` to get a tuple of all relevant types that are registered via `UnionRule`. + +There are many instructive examples of Union use in the Pants codebase. 
+ +We hope to simplify this mechanism in the future to rely more heavily on Python subclassing, instead of the Pants-specific boilerplate currently required. This is why we strongly recommend making your union members subclasses of the union base.
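The dispatch behaviour described on this page can be modelled in plain Python. The following is a toy sketch, not the Pants engine: `UNION_MEMBERS`, `IMPLEMENTATIONS`, `register`, and `dispatch` are hypothetical names standing in for `UnionRule` registration and polymorphic rule dispatch, `asyncio.gather` stands in for `concurrently`, and the code runs without Pants installed.

```python
import asyncio
from abc import ABC
from typing import Awaitable, Callable

# Toy stand-ins: union base -> registered member types, member type -> implementation.
UNION_MEMBERS: dict[type, list[type]] = {}
IMPLEMENTATIONS: dict[type, Callable[..., Awaitable[str]]] = {}


class LintTargetsRequest(ABC):
    """Union base (plays the role of a class marked with `@union`)."""


class ShellcheckRequest(LintTargetsRequest):
    """Union member."""


async def shellcheck_target(req: ShellcheckRequest) -> str:
    # Member-specific implementation.
    return "shellcheck ran"


def register(base: type, member: type, impl: Callable[..., Awaitable[str]]) -> None:
    # Roughly what `UnionRule(base, member)` plus a member-specific rule provide.
    UNION_MEMBERS.setdefault(base, []).append(member)
    IMPLEMENTATIONS[member] = impl


async def dispatch(req: LintTargetsRequest) -> str:
    # Pick the implementation from the runtime type of the request.
    impl = IMPLEMENTATIONS.get(type(req))
    if impl is None:
        raise NotImplementedError(f"Must be implemented for {type(req)}")
    return await impl(req)


async def main() -> list[str]:
    # Build one request per registered member and run them concurrently.
    return await asyncio.gather(
        *(dispatch(member()) for member in UNION_MEMBERS[LintTargetsRequest])
    )


register(LintTargetsRequest, ShellcheckRequest, shellcheck_target)
results = asyncio.run(main())
print(results)
```

The generic caller only ever names the union base; which implementation runs is decided by the runtime type of each request, which is the property the `polymorphic` rule and `UnionRule` registration provide in the real engine.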