Comments (9)
There's a little more information here: #7.
I think what might be happening is that the scheduler is kicking off from the date of the 'start_date', ignoring the hours/mins/secs, and then running every 5 minutes to "catch up" until present time.
But that would seem to be contradicted by the documentation:
start_date (datetime.datetime) – The timestamp ...
Note datetime.datetime (not datetime.date), and "timestamp" reference...
from airflow.
schedule_interval currently can only be defined at the DAG level, when creating a DAG instance. If you write dag = DAG(default_args=args, schedule_interval=timedelta(minutes=5)) you'll get a 5-minute DAG. The schedule_interval at the task level is currently disregarded and the one from the DAG level is forced. In theory you could have a DAG with tasks running at different intervals, but that may be slightly confusing so I switched that feature off for now. I could re-enable it easily.
The scheduler will start on start_date, and attempt to run for every schedule_interval from that point. I think the scheduler is set to trigger every minute by default.
from airflow.
Thanks for your speedy answer Maxime! If I pass in datetime.datetime(2015, 6, 15, 7, 28, 48, 163296) as my start_date, then does the scheduler start:
- At datetime.datetime(2015, 6, 15, 7, 28, 48, 163296), or
- At datetime.datetime(2015, 6, 15, 0, 0, 0, 0)?
(BTW, I agree - schedule_interval at the DAG and task level seems confusing, good to hear that's being removed.)
from airflow.
The key for a task instance is (dag_id, task_id, execution_date). Your first execution_date is equal to your start_date, then your following execution_date will be start_date + schedule_interval (and so on).
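To make the arithmetic above concrete, here is a minimal sketch in plain Python (no Airflow import). The helper name execution_dates and the dag_id/task_id values are made up for illustration; it only reproduces the start_date + n * schedule_interval rule described in this comment, not the scheduler's actual code.

```python
from datetime import datetime, timedelta
from itertools import islice


def execution_dates(start_date, schedule_interval):
    """Yield start_date, start_date + interval, start_date + 2 * interval, ..."""
    current = start_date
    while True:
        yield current
        current += schedule_interval


# start_date taken from the question above, microseconds included.
start_date = datetime(2015, 6, 15, 7, 28, 48, 163296)
interval = timedelta(minutes=5)

# Hypothetical dag_id / task_id, used only to show the shape of the key.
keys = [
    ("example_dag", "example_task", execution_date)
    for execution_date in islice(execution_dates(start_date, interval), 3)
]

for key in keys:
    print(key)
```

Under this rule, an untruncated start_date means every execution_date carries the same trailing seconds and microseconds.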
So, you should definitely round down your start_date to match your schedule_interval. Meaning if you run an hourly DAG, your start_date should be truncated to a specific hour; otherwise it's just confusing to have trailing minutes/seconds.
Also, if we were to support DAGs with heterogeneous schedule_interval (say a mix of hourly and daily), you'd want them to align on the execution_date, meaning every 24 hours you want the hourly job to line up with your daily job, since the dependencies are checked based on the execution_date.
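A small sketch of the rounding advice, again in plain Python. floor_to_interval is a hypothetical helper, not an Airflow function; it assumes the interval evenly divides a day and floors relative to midnight of the same day. It also checks the alignment point: a truncated hourly series hits the next daily execution_date, while an untruncated one does not.

```python
from datetime import datetime, timedelta


def floor_to_interval(moment, schedule_interval):
    """Round `moment` down to a multiple of `schedule_interval` since midnight.

    Assumes the interval evenly divides 24 hours (e.g. 5 minutes, 1 hour, 1 day).
    """
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = moment - midnight
    # timedelta // timedelta gives the whole number of intervals elapsed.
    whole_intervals = elapsed // schedule_interval
    return midnight + whole_intervals * schedule_interval


raw_start = datetime(2015, 6, 15, 7, 28, 48, 163296)
hour = timedelta(hours=1)

hourly_start = floor_to_interval(raw_start, hour)
daily_start = floor_to_interval(raw_start, timedelta(days=1))

# Two days of hourly execution dates, with and without truncation.
truncated_hourly = {hourly_start + n * hour for n in range(48)}
untruncated_hourly = {raw_start + n * hour for n in range(48)}

# The daily job's second execution_date (midnight of the following day).
next_daily = daily_start + timedelta(days=1)

print(hourly_start, daily_start)
print(next_daily in truncated_hourly, next_daily in untruncated_hourly)
```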
from airflow.
Many thanks @mistercrunch - your description doesn't exactly match what I saw, but it might well have been pilot error on my part. I'll give it another try and get back to you either way...
from airflow.
I was trying this schedule_interval example and cannot get it to work with the current pypi version (1.0.1) and github version. If I pass in the schedule_interval to DAG it applies it, but as part of the default args it is not used.
from airflow.
Current pypi is 1.3.1 I believe, but schedule_interval is a DAG-only param for now. I could add heterogeneous schedule_interval support easily, but I need to put some safeguards on it first:
Related: #221
from airflow.
I can't remember exactly, but I think I was providing it to DAG; it only worked if given explicitly, not as one of the default args. Or are the default args not for the DAG?
from airflow.
default_args is really just for operator constructor calls. Maybe it should be called task_default_args. It's useful there because people typically create many tasks and tasks share a lot of parameters.
I clarified the docs yesterday around schedule_interval. I'd make schedule_interval a required arg for DAG, but I don't want to break existing DAGs...
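The split between DAG-level and task-level arguments can be illustrated without Airflow at all. The sketch below uses two made-up stand-in classes, FakeDAG and FakeTask, to mimic the behaviour described in this thread: default_args is merged into each task's constructor arguments, while schedule_interval is read only from the DAG's own keyword. The one-day fallback interval is an assumption of the stand-in. This is an illustration of the idea, not Airflow's implementation.

```python
from datetime import datetime, timedelta


class FakeDAG:
    """Hypothetical stand-in for a DAG: keeps its own schedule_interval."""

    def __init__(self, dag_id, default_args=None, schedule_interval=timedelta(days=1)):
        self.dag_id = dag_id
        self.default_args = dict(default_args or {})
        # Read only from the explicit keyword, never from default_args.
        self.schedule_interval = schedule_interval


class FakeTask:
    """Hypothetical stand-in for an operator: default_args fill in missing kwargs."""

    def __init__(self, task_id, dag, **kwargs):
        merged = {**dag.default_args, **kwargs}  # explicit kwargs win
        self.task_id = task_id
        self.owner = merged.get("owner")
        self.start_date = merged.get("start_date")


args = {
    "owner": "someone",
    "start_date": datetime(2015, 6, 15),
    "schedule_interval": timedelta(minutes=5),  # ignored by FakeDAG
}

# schedule_interval hidden in default_args has no effect on the DAG...
dag_from_defaults = FakeDAG("from_defaults", default_args=args)
# ...while passing it explicitly does.
dag_explicit = FakeDAG("explicit", default_args=args, schedule_interval=timedelta(minutes=5))

task = FakeTask("t1", dag_explicit, owner="override")

print(dag_from_defaults.schedule_interval, dag_explicit.schedule_interval)
print(task.owner, task.start_date)
```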
from airflow.