Add DAG to remove old entries from metadata database #245
This looks great! Thanks for putting this together to keep our Airflow environment healthy.
When I looked at the DAG definition in Airflow, I noticed the `cleanup_sessions` task instance does not have an upstream dependency on `print_configuration`. I assume this is ok? The dependency tree just doesn't match the rest of the tasks.
Do you think it's also worthwhile to enable Sentry -> Slack alerting by adding the `alert_after_max_retries` parameter to the `PythonOperator`s?
Neither of these suggestions blocks the merge.
* dag to remove old entries from metadata database
* reformat dag file
* update `pre-commit` steps and configuration
* downgrade `pre-commit` `sqlfluff` step
* fix `cleanup_RenderedTaskInstanceFields` error
* add variable to change retention period
* toggle boolean to start deleting db entries
* add `alert_after_max_retries` callback function

Co-authored-by: lucas zanotelli <[email protected]>
This PR adds a maintenance DAG to prune the content of Airflow’s metadata database.

By default, this DAG removes old entries from the `job`, `dag_run`, `task_instance`, `log`, `xcom`, `sla_miss`, `dags`, `task_reschedule`, `task_fail` and `import_error` tables. In the DAG, review the list of tables and decide whether old entries must be removed from them. In general, cleaning the `log`, `task_instance`, `dag_run` and `xcom` tables saves the most space. To exclude a table from cleanup, modify the DAG and comment out the corresponding items in the `DATABASE_OBJECTS` list.

The default retention period is 30 days. To change it, create the `max_db_entry_age_in_days` variable and set it to an integer number of days for which entries are retained.

The DAG has already run successfully in the `test-hubble` environment: https://he02f27a661269b05p-tp.appspot.com/dags/cleanup_metadata_dag/grid

This workflow was created mainly following the guide: https://cloud.google.com/composer/docs/cleanup-airflow-database
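As a rough illustration of how the retention period and the table list interact, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names `parse_retention_days` and `cleanup_cutoff` are hypothetical, the `DATABASE_OBJECTS` entries are simplified stand-ins for the real list, and the value of `max_db_entry_age_in_days` is passed in directly instead of being read from an Airflow variable.

```python
from datetime import datetime, timedelta, timezone

# Default retention stated in the PR description.
DEFAULT_MAX_DB_ENTRY_AGE_IN_DAYS = 30

# Hypothetical, simplified stand-in for the DAG's DATABASE_OBJECTS list.
# Commenting out an item excludes that table from cleanup.
DATABASE_OBJECTS = [
    {"table": "log"},
    {"table": "task_instance"},
    {"table": "dag_run"},
    {"table": "xcom"},
    # {"table": "sla_miss"},  # excluded from cleanup
]


def parse_retention_days(raw_value=None):
    """Return the retention period in days, falling back to the default."""
    if raw_value is None:
        return DEFAULT_MAX_DB_ENTRY_AGE_IN_DAYS
    days = int(raw_value)  # the variable may arrive as a string
    if days < 0:
        raise ValueError("retention period must not be negative")
    return days


def cleanup_cutoff(now, raw_value=None):
    """Entries older than the returned timestamp are candidates for deletion."""
    return now - timedelta(days=parse_retention_days(raw_value))


if __name__ == "__main__":
    now = datetime(2023, 6, 30, tzinfo=timezone.utc)
    print(cleanup_cutoff(now))       # default 30-day retention
    print(cleanup_cutoff(now, "7"))  # variable set to 7 days
    print([obj["table"] for obj in DATABASE_OBJECTS])
```

With the variable unset, the cutoff falls 30 days before the current time; setting it to a smaller value deletes more aggressively, so it may be worth trying a new value in a test environment first.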