Skip to content

Conversation

@dabla
Copy link
Contributor

@dabla dabla commented Aug 29, 2025

This PR tries to fix the rendering of templated fields with start from trigger args without breaking architectural assumptions, which the initial PR 53071 did violate and was reverted in PR 55037.

This PR could be simplified without the need of parsing the DAG if the render_template_fields would be re-usable in the triggerers, as of now it's part of the BaseOperator and thus not re-usable in BaseTrigger. As discussed with @ashb , apparently @amoghrajesh would be working on this (e.g. making template renderers reusable).

As opposed to previous attempt, here the rendering of the templates isn't done while creating the trigger into the database within the TriggerRunnerSupervisor. Here, the TriggerRunnerSupervisor will only retrieve the serialized DAG from the DagModel, and pass it to the workloads RunTrigger as an extra parameter (e.g named dag_data), which in turn will be used by the TriggerRunner in the subprocess in which there it will have access to the XCom's, and thus, shouldn't raise the: ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner'.

So, if template rendering would be reusable outside the BaseOperator, then we wouldn't need to parse the DAG from the DagBag to retrieve the task (e.g. BaseOperator) to be able to do the template rendering, which would make the solution even easier but also less heavy (e.g. more performant) for the triggerer subprocess.

After further reflection, also regarding the work I'm doing for AIP-88, the current solution will still need to get the task (e.g. Operator) for the trigger when yielding multiple events. Same for the rendering of templates, you need the context to be able to render those, and to get the context, you need at least a RuntimeTaskInstance, which of course, requires guess what? a task (e.g. BaseOperator). So even if the rendering of templates would be reusable across operator and triggerer, you will still need the context, thus, in this case, wouldn't change that much to the current solution.

As loading DAG code is prohibited in the triggerer, we load the serialized DAG version from the database. As a consequence, callables aren't supported as kwargs when start_from_trigger is enabled. Thus, when enabling an operator as start_from_trigger, we will automatically validate the kwargs and check if those don't contain callables, otherwise an AirflowException will be raised stating which kwargs has a callable.

image

Also the test dag from @kaxil is working:

image image

This PR only fixes the start_from_trigger with rendered templates for non-expanded tasks, to allow start_from_trigger with expanded tasks, another PR will be needed, but at least, we can already support start_from_trigger on non-expanded tasks.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added area:Executors-core LocalExecutor & SequentialExecutor area:Triggerer labels Aug 29, 2025
@dabla dabla marked this pull request as draft August 29, 2025 12:30
@ashb
Copy link
Member

ashb commented Sep 3, 2025

https://github.com/apache/airflow/blob/3.0.0rc3/airflow-core/newsfragments/aip-66.significant.rst

Dag bundles are not initialized in the triggerer. In practice, this means that triggers cannot come from a dag bundle. This is because the triggerer does not deal with changes in trigger code over time, as everything happens in the main process. Triggers can come from anywhere else on sys.path instead.

(Emphasis mine)

@ramitkataria
Copy link
Contributor

This would also be very useful for async callbacks (currently used for Deadline Alerts) running in the triggerer! Once this is merged in, I could create a followup PR to replace the implementation in #55241

@dabla dabla requested a review from ashb September 9, 2025 12:47
@dabla
Copy link
Contributor Author

dabla commented Sep 9, 2025

@ashb I've introduced a SerializedDagBag class, which acts as a simple cache in the same as the DagBag but then only uses serialized DAG's from the DB instead of from the filesystem. The SerializedDagBag needs dag_id and dag_version_id to be able to return the corresponding DAG.

The reason why I still retrieve the serialized DAG from within the update_triggers (which has DB access) instead of the scheduler to render the templates there, as you proposed, is to still be able to construct the RuntimeTaskInstance from within the update_triggers (which needs a task to be able to construct) so that we can assign that instance to the trigger instead of the serialized TI from the workloads module, as this will also be needed to be able to run the multiple yielded trigger events for AIP-88 (as those will need the template context to be able to do the pagination from within the triggerer while yielding the events). I know this is a very complicated explanation, let me know if you want some clarification there.

I still need to write a unit test for SerializedDagBag though.

@ashb
Copy link
Member

ashb commented Sep 9, 2025

SerializedDagBag already exists in the form of SchedulerDagBag I think -- rather than a new one, it might be better to rename that one if it otherwise fits your need

@dabla
Copy link
Contributor Author

dabla commented Sep 9, 2025

SerializedDagBag already exists in the form of SchedulerDagBag I think -- rather than a new one, it might be better to rename that one if it otherwise fits your need

You're right, it's called DBDagBag, it serves the same purpose, so will replace it with that one, thx @ashb for pointing this out 👍

@dabla
Copy link
Contributor Author

dabla commented Sep 9, 2025

SerializedDagBag already exists in the form of SchedulerDagBag I think -- rather than a new one, it might be better to rename that one if it otherwise fits your need

I ditched it and won't be using it in this PR as I need the SerializedDagModel, not the SerializedDAG, but I'm already using it for AIP-88.

@dabla dabla changed the title Second attempt to fix rendering of template fields with start from trigger Fix rendering of template fields with start from trigger Sep 9, 2025
@dabla dabla marked this pull request as ready for review September 10, 2025 07:24
@dabla dabla marked this pull request as draft September 10, 2025 16:51
@dabla dabla marked this pull request as ready for review September 12, 2025 16:30
@dabla
Copy link
Contributor Author

dabla commented Sep 12, 2025

So small explanation what I did in this PR.

  • @kaxil First of all I had to re-implement (a simplified version) of the defer_task method in TaskInstance, which was called by the schedule_tis method of the DagRun but which was commented out as start_from_trigger wasn't working correctly.
  • @ashb proposed to try to have a common defer_task classmethod in the Trigger, and which would be re-useable across the execution api and the schedule_tis method of the DagRun, but that didn't work, as there the update of the TaskInstance is done differently, and doing it this way didn't work in the schedule_tis method as modification weren't picked up by the scheduler, thus the task instance got stuck. I even tried refreshing the ti on the session, flushing and even committing it but no avail. Thus at the moment we have a little of duplication there. I've added a TODO to keep that in mind as at the moment my main focus was to make the rendering of template fields working in triggerers.
  • I also had to exclude the base Trigger module from the check-sdk-imports, as there I had to import the Templater from the sdk (which was previously in airflow.template.templater).
  • @kaxil Your test dag is also working with start_from_trigger, see screenshot at top of PR.

@dabla dabla force-pushed the fix/render-templated-fields-start-from-trigger-in-trigger-subprocess branch from b48d4eb to 85f5e2b Compare January 8, 2026 08:21
@dabla dabla requested review from ferruzzi and uranusjr January 8, 2026 13:28
@dabla
Copy link
Contributor Author

dabla commented Jan 8, 2026

Re-openen this PR as I realised I needed it to be able to finish AIP-88.

self.task_instance.run_id,
self.task_instance.map_index,
if not self.task_instance:
raise AirflowException(f"TaskInstance not set on {self.__class__.__name__}!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wei is correct. We don't want to use AirflowExecption for new use cases.

if not self.task_instance:
raise AirflowException(f"TaskInstance not set on {self.__class__.__name__}!")

if not isinstance(self.task_instance, RuntimeTaskInstance):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for for 3.0..3.1 right? Might be worth a comment saying when we could hit this case.

:param session: Sqlalchemy session
"""
if not self.task_instance:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to fail on Airflow 2.x? Or maybe on Airflow 3.1? It feels like it's going to fail sometime when it should still be supported.

self._BaseOperator__init_kwargs.update(kwargs) # type: ignore

# Validate trigger kwargs
if hasattr(self, "_validate_start_from_trigger_kwargs"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we define a _validate_start_from_trigger_kwargs on L1415 how can this ever be False?

"""
return self

def expand_start_from_trigger(self, *, context: Context) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this function? Given the doc string I would have expected to see a different implementation of this for mapped operators, but it seems to be working without it, so maybe this fn isn't needed?

@dabla dabla requested a review from shahar1 as a code owner January 19, 2026 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Executors-core LocalExecutor & SequentialExecutor area:Triggerer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants