idealista / airflow-role
Ansible role to install Apache Airflow
License: Apache License 2.0
The airflow-flower service needs to be added with enabled: yes and state: started, and a restart airflow-flower handler notified from every task where the other restart handlers are notified.

It seems the scheduler process is started after deployment; however, the jobs can't be scheduled to execute.
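A minimal sketch of the service task and handler wiring described above; the task shapes and the template name are assumptions, only the handler name comes from the issue:

- name: Airflow | Ensure airflow-flower service is running
  service:
    name: airflow-flower
    state: started
    enabled: yes

- name: Airflow | Render flower config
  template:
    src: flower.j2                # hypothetical template name
    dest: /etc/airflow/flower.cfg
  notify:
    - restart airflow-flower      # notify flower alongside the other restart handlers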
First, thanks for making this project open source; it's saved me a bunch of time in getting Airflow successfully deployed.
I thought I'd note this down since others may run into this issue.
My environment: airflow-role installed via Galaxy, with the default configuration:

---
- hosts: someserver
  roles:
    - idealista/airflow-role
I had an import of an uninstalled package in one of my DAGs, which I expected to cause Ansible to fail on the task "Airflow | Initializing DB", since it has to import the DAGs as part of this process. Instead, the exit signal is not received by Ansible and the task hangs.
I discovered this by adding this to my task definition:
async: 60
poll: 60
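In context, the task would look something like the sketch below; the command and variable names are assumptions, only async/poll come from my change:

- name: Airflow | Initializing DB
  command: "{{ airflow_executable }} initdb"   # hypothetical executable variable
  become: true
  become_user: "{{ airflow_user }}"
  async: 60   # give the command at most 60 seconds to finish
  poll: 60    # check the result after 60 seconds instead of blocking forever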
I'm fairly new to Ansible and not suggesting that async is the best way to handle this, but the expected behavior on a bad import is that the Ansible task should fail instead of hanging.
I'm running the role with almost the default config, so I have an /etc/airflow/airflow.cfg with a MySQL connection parameter. But during these tasks:
https://github.com/idealista/airflow-role/blob/master/tasks/users.yml#L3-L8
another Airflow config directory is created in the root home directory with another airflow.cfg. Thus the following tasks use this airflow.cfg with the default SQLite.
Note that I'm installing Airflow 2.1.0, and that I bypassed the installation tasks with pre-role tasks:
# See https://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-tools
- name: Install some dependencies
  package:
    name: "{{ item }}"
  with_items:
    - freetds-bin
    - krb5-user
    - ldap-utils
    - libffi6
    - libsasl2-2
    - libsasl2-modules
    - libssl1.1
    - locales
    - lsb-release
    - sasl2-bin
    - sqlite3
    - unixodbc
# See https://github.com/idealista/airflow-role/blob/master/tasks/install.yml
- name: Airflow | Install pip "{{ airflow_pip_version }}" version
  pip:
    name: pip
    version: "{{ airflow_pip_version }}"
  when: airflow_pip_version is defined

- name: Airflow | Install virtualenv
  pip:
    executable: "{{ airflow_pip_executable }}"
    name: virtualenv

- name: Airflow | Check if exists virtualenv
  stat:
    path: "{{ airflow_app_home }}/pyvenv.cfg"
  register: virtualenv_check

- name: Airflow | Set a virtualenv
  become: true
  become_user: "{{ airflow_user }}"
  command: "virtualenv -p python{{ airflow_python_version | default(omit) }} {{ airflow_app_home }}"
  when: not virtualenv_check.stat.exists

- name: Airflow | Install airflow
  pip:
    name: apache-airflow
    version: "{{ airflow_version }}"
    extra_args: "--constraint '{{ contraint_url }}'"
    virtualenv: "{{ airflow_app_home }}"
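Returning to the users.yml problem above, a possible workaround (our assumption, not a role feature) is to pin the config location on those tasks with the task-level environment keyword, so they stop falling back to a fresh airflow.cfg in the invoking user's home:

- name: Airflow | Create admin user              # hypothetical task shape
  command: "{{ airflow_app_home }}/bin/airflow users create ..."
  become: true
  become_user: "{{ airflow_user }}"
  environment:
    AIRFLOW_HOME: /etc/airflow                   # keep Airflow away from ~root/airflow
    AIRFLOW_CONFIG: /etc/airflow/airflow.cfg     # use the MySQL-backed config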
Some tasks, like Airflow | Ensure Airflow group, fail when run without escalated privileges.
Expected behavior:
TASK [airflow : Airflow | Ensure Airflow group] ********************************************************************************************************************************************************************************
ok: [server-name]
Actual behavior:
TASK [airflow : Airflow | Ensure Airflow group] ********************************************************************************************************************************************************************************
fatal: [1riv-dev-air]: FAILED! => {"changed": false, "msg": "groupadd: Permission denied.\ngroupadd: cannot lock /etc/group; try again later.\n", "name": "airflow"}
Reproduces how often: every time
Version(s) where the behavior is noticed:
idealista.airflow-role 1.7.2
Ubuntu Bionic 18.04
I resolved this issue by adding become: true to the role import, but this should be documented or added to the tasks:

- hosts: airflow
  tasks:
    - import_role:
        name: airflow
      become: true
In airflow.cfg, the variable dagbag_import_timeout cannot have an empty value when CeleryExecutor is used: it raises the warning invalid literal for int() with base 10: '' and the DAGs are not listed. The value of this variable is set in the role in defaults/main.yml with the variable airflow_dagbag_import_time and templated afterwards in templates/airflow.cfg.j2. Simply setting a numeric value instead of leaving it blank fixes this issue.
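A minimal sketch of the fix in defaults/main.yml; the value 30 is our example, only the variable name comes from the role:

airflow_dagbag_import_time: 30   # seconds; an empty string breaks int() parsing in airflow.cfg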
Due to complications in making Vagrant and Docker tests compatible with Molecule, the first .travis.yml version will use ansible-playbook commands to test the playbook's syntax, first run, and idempotence, and to check that the webserver is properly deployed.
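A sketch of what that .travis.yml could look like, following the common ansible-playbook testing pattern; the file contents are our assumption, not the repo's actual file:

language: python
install:
  - pip install ansible
script:
  # Syntax check
  - ansible-playbook tests/test.yml -i tests/inventory --syntax-check
  # First run
  - ansible-playbook tests/test.yml -i tests/inventory --connection=local --become
  # Idempotence: a second run must report no changes
  - >
    ansible-playbook tests/test.yml -i tests/inventory --connection=local --become
    | grep -q 'changed=0.*failed=0'
    && (echo 'Idempotence test: pass' && exit 0)
    || (echo 'Idempotence test: fail' && exit 1)
  # Webserver smoke test
  - curl -sf http://localhost:8080 > /dev/null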
The Molecule version is so old that it's a bit of a nightmare to reproduce the tests locally. Upgrade its version.
Just run the playbook on Ubuntu 18.04.
Expected behavior: I was expecting it to deploy Airflow without any error.
Actual behavior:
TASK [deploy_airflow : Airflow | Check Admin user (> 2.0)] *******************************************************************************************
fatal: [dmpServer]: FAILED! => {"changed": false, "cmd": ["/opt/airflow/bin/airflow", "users", "list"], "delta": "0:00:04.620829", "end": "2021-06-22 01:24:22.929163", "msg": "non-zero return code", "rc": 1, "start": "2021-06-22 01:24:18.308334", "stderr": "/opt/airflow/lib/python3.6/site-packages/flask_appbuilder/models/sqla/interface.py:62 SAWarning: relationship 'DagRun.serialized_dag' will copy column serialized_dag.dag_id to column dag_run.dag_id, which conflicts with relationship(s): 'DagRun.task_instances' (copies task_instance.dag_id to dag_run.dag_id), 'TaskInstance.dag_run' (copies task_instance.dag_id to dag_run.dag_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. To silence this warning, add the parameter 'overlaps="dag_run,task_instances"' to the 'DagRun.serialized_dag' relationship.\n/opt/airflow/lib/python3.6/site-packages/flask_appbuilder/models/sqla/interface.py:62 SAWarning: relationship 'SerializedDagModel.dag_runs' will copy column serialized_dag.dag_id to column dag_run.dag_id, which conflicts with relationship(s): 'DagRun.task_instances' (copies task_instance.dag_id to dag_run.dag_id), 'TaskInstance.dag_run' (copies task_instance.dag_id to dag_run.dag_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. 
To silence this warning, add the parameter 'overlaps="dag_run,task_instances"' to the 'SerializedDagModel.dag_runs' relationship.\nTraceback (most recent call last):\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 209, in add_paths\n self.add_operation(path, method)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 173, in add_operation\n pass_context_arg_name=self.pass_context_arg_name\n File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/init.py", line 8, in make_operation\n return spec.operation_cls.from_spec(spec, *args, **kwargs)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/openapi.py", line 138, in from_spec\n **kwargs\n File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/openapi.py", line 89, in init\n pass_context_arg_name=pass_context_arg_name\n File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/abstract.py", line 96, in init\n self._resolution = resolver.resolve(self)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/resolver.py", line 40, in resolve\n return Resolution(self.resolve_function_from_operation_id(operation_id), operation_id)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/resolver.py", line 66, in resolve_function_from_operation_id\n raise ResolverError(str(e), sys.exc_info())\nconnexion.exceptions.ResolverError: <ResolverError: columns>\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/opt/airflow/bin/airflow", line 8, in \n sys.exit(main())\n File "/opt/airflow/lib/python3.6/site-packages/airflow/main.py", line 40, in main\n args.func(args)\n File "/opt/airflow/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command\n return func(*args, **kwargs)\n File "/usr/lib/python3.6/contextlib.py", line 52, in inner\n return func(*args, **kwds)\n File "/opt/airflow/lib/python3.6/site-packages/airflow/cli/commands/user_command.py", line 35, in users_list\n appbuilder = cached_app().appbuilder # pylint: disable=no-member\n File "/opt/airflow/lib/python3.6/site-packages/airflow/www/app.py", line 135, in cached_app\n app = create_app(config=config, testing=testing)\n File "/opt/airflow/lib/python3.6/site-packages/airflow/www/app.py", line 120, in create_app\n init_api_connexion(flask_app)\n File "/opt/airflow/lib/python3.6/site-packages/airflow/www/extensions/init_views.py", line 172, in init_api_connexion\n specification='v1.yaml', base_path=base_path, validate_responses=True, strict_validation=True\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apps/flask_app.py", line 57, in add_api\n api = super(FlaskApp, self).add_api(specification, **kwargs)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apps/abstract.py", line 156, in add_api\n options=api_options.as_dict())\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 111, in init\n self.add_paths()\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 216, in add_paths\n self._handle_add_operation_error(path, method, err.exc_info)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 231, in _handle_add_operation_error\n raise value.with_traceback(traceback)\n File "/opt/airflow/lib/python3.6/site-packages/connexion/resolver.py", line 61, in resolve_function_from_operation_id\n return self.function_resolver(operation_id)\n File 
"/opt/airflow/lib/python3.6/site-packages/connexion/utils.py", line 111, in get_function_from_name\n module = importlib.import_module(module_name)\n File "/usr/lib/python3.6/importlib/init.py", line 126, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File "", line 994, in _gcd_import\n File "", line 971, in _find_and_load\n File "", line 955, in _find_and_load_unlocked\n File "", line 665, in _load_unlocked\n File "", line 678, in exec_module\n File "", line 219, in _call_with_frames_removed\n File "/opt/airflow/lib/python3.6/site-packages/airflow/api_connexion/endpoints/connection_endpoint.py", line 26, in \n from airflow.api_connexion.schemas.connection_schema import (\n File "/opt/airflow/lib/python3.6/site-packages/airflow/api_connexion/schemas/connection_schema.py", line 42, in \n class ConnectionSchema(ConnectionCollectionItemSchema): # pylint: disable=too-many-ancestors\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow/schema.py", line 117, in new\n dict_cls=dict_cls,\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 94, in get_declared_fields\n fields.update(mcs.get_auto_fields(fields, converter, opts, dict_cls))\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 108, in get_auto_fields\n for field_name, field in fields.items()\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 110, in \n and field_name not in opts.exclude\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 28, in create_field\n return converter.field_for(model, column_name, **self.field_kwargs)\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/convert.py", line 171, in field_for\n return self.property2field(prop, **kwargs)\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/convert.py", line 146, in property2field\n field_class = field_class or self._get_field_class_for_property(prop)\n File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/convert.py", line 210, in _get_field_class_for_property\n column = prop.columns[0]\n File "/opt/airflow/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 1240, in getattr\n return self._fallback_getattr(key)\n File "/opt/airflow/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 1214, in _fallback_getattr\n raise AttributeError(key)\nAttributeError: columns", "stderr_lines": ["/opt/airflow/lib/python3.6/site-packages/flask_appbuilder/models/sqla/interface.py:62 SAWarning: relationship 'DagRun.serialized_dag' will copy column serialized_dag.dag_id to column dag_run.dag_id, which conflicts with relationship(s): 'DagRun.task_instances' (copies task_instance.dag_id to dag_run.dag_id), 'TaskInstance.dag_run' (copies task_instance.dag_id to dag_run.dag_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. 
To silence this warning, add the parameter 'overlaps="dag_run,task_instances"' to the 'DagRun.serialized_dag' relationship.", "/opt/airflow/lib/python3.6/site-packages/flask_appbuilder/models/sqla/interface.py:62 SAWarning: relationship 'SerializedDagModel.dag_runs' will copy column serialized_dag.dag_id to column dag_run.dag_id, which conflicts with relationship(s): 'DagRun.task_instances' (copies task_instance.dag_id to dag_run.dag_id), 'TaskInstance.dag_run' (copies task_instance.dag_id to dag_run.dag_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. To silence this warning, add the parameter 'overlaps="dag_run,task_instances"' to the 'SerializedDagModel.dag_runs' relationship.", "Traceback (most recent call last):", " File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 209, in add_paths", " self.add_operation(path, method)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 173, in add_operation", " pass_context_arg_name=self.pass_context_arg_name", " File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/init.py", line 8, in make_operation", " return spec.operation_cls.from_spec(spec, *args, **kwargs)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/openapi.py", line 138, in from_spec", " **kwargs", " File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/openapi.py", line 89, in init", " pass_context_arg_name=pass_context_arg_name", " File "/opt/airflow/lib/python3.6/site-packages/connexion/operations/abstract.py", line 96, in init", " self._resolution = resolver.resolve(self)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/resolver.py", line 40, in resolve", " return Resolution(self.resolve_function_from_operation_id(operation_id), operation_id)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/resolver.py", line 66, in resolve_function_from_operation_id", " raise ResolverError(str(e), sys.exc_info())", "connexion.exceptions.ResolverError: <ResolverError: columns>", "", "During handling of the above exception, another exception occurred:", "", "Traceback (most recent call last):", " File "/opt/airflow/bin/airflow", line 8, in ", " sys.exit(main())", " File "/opt/airflow/lib/python3.6/site-packages/airflow/main.py", line 40, in main", " args.func(args)", " File "/opt/airflow/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command", " return func(*args, **kwargs)", " File "/usr/lib/python3.6/contextlib.py", line 52, in inner", " return func(*args, **kwds)", " File "/opt/airflow/lib/python3.6/site-packages/airflow/cli/commands/user_command.py", line 35, in users_list", " appbuilder = cached_app().appbuilder # pylint: disable=no-member", " File "/opt/airflow/lib/python3.6/site-packages/airflow/www/app.py", line 135, in cached_app", " app = create_app(config=config, testing=testing)", " File "/opt/airflow/lib/python3.6/site-packages/airflow/www/app.py", line 120, in create_app", " init_api_connexion(flask_app)", " File "/opt/airflow/lib/python3.6/site-packages/airflow/www/extensions/init_views.py", line 172, in init_api_connexion", " specification='v1.yaml', base_path=base_path, validate_responses=True, strict_validation=True", " 
File "/opt/airflow/lib/python3.6/site-packages/connexion/apps/flask_app.py", line 57, in add_api", " api = super(FlaskApp, self).add_api(specification, **kwargs)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/apps/abstract.py", line 156, in add_api", " options=api_options.as_dict())", " File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 111, in init", " self.add_paths()", " File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 216, in add_paths", " self._handle_add_operation_error(path, method, err.exc_info)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/apis/abstract.py", line 231, in _handle_add_operation_error", " raise value.with_traceback(traceback)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/resolver.py", line 61, in resolve_function_from_operation_id", " return self.function_resolver(operation_id)", " File "/opt/airflow/lib/python3.6/site-packages/connexion/utils.py", line 111, in get_function_from_name", " module = importlib.import_module(module_name)", " File "/usr/lib/python3.6/importlib/init.py", line 126, in import_module", " return _bootstrap._gcd_import(name[level:], package, level)", " File "", line 994, in _gcd_import", " File "", line 971, in _find_and_load", " File "", line 955, in _find_and_load_unlocked", " File "", line 665, in _load_unlocked", " File "", line 678, in exec_module", " File "", line 219, in _call_with_frames_removed", " File "/opt/airflow/lib/python3.6/site-packages/airflow/api_connexion/endpoints/connection_endpoint.py", line 26, in ", " from airflow.api_connexion.schemas.connection_schema import (", " File "/opt/airflow/lib/python3.6/site-packages/airflow/api_connexion/schemas/connection_schema.py", line 42, in ", " class ConnectionSchema(ConnectionCollectionItemSchema): # pylint: disable=too-many-ancestors", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow/schema.py", line 117, in new", " dict_cls=dict_cls,", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 94, in get_declared_fields", " fields.update(mcs.get_auto_fields(fields, converter, opts, dict_cls))", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 108, in get_auto_fields", " for field_name, field in fields.items()", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 110, in ", " and field_name not in opts.exclude", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 28, in create_field", " return converter.field_for(model, column_name, **self.field_kwargs)", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/convert.py", line 171, in field_for", " return self.property2field(prop, **kwargs)", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/convert.py", line 146, in property2field", " field_class = field_class or self._get_field_class_for_property(prop)", " File "/opt/airflow/lib/python3.6/site-packages/marshmallow_sqlalchemy/convert.py", line 210, in _get_field_class_for_property", " column = prop.columns[0]", " File "/opt/airflow/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 1240, in getattr", " return self._fallback_getattr(key)", " File "/opt/airflow/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 1214, in _fallback_getattr", " raise AttributeError(key)", "AttributeError: 
columns"], "stdout": "", "stdout_lines": []}
Reproduces how often: 100%
Ubuntu 18.04
Additional information: the machine already has MySQL and Tomcat installed, and I now want to install Airflow alongside them.
In the new versions of Airflow, airflow.cfg has more fields (supporting Kubernetes, for instance). We should add support for these new features.
Hello!
I'm using a part of your great project!
I have noticed a potential issue in the template located at templates/gunicorn-logrotate.j2. On the master branch, it's currently written this way:
{{ airflow_logs_folder }}/gunicorn-*.log {
    daily
    missingok
    rotate 7
    size 500M
    compress
    notifempty
    create 644 {{ airflow_user }} {{ airflow_group }}
    sharedscripts
    postrotate
        [ -f {{ airflow_pidfile_folder }}-webserver/webserver.pid ] && kill -USR1 `cat {{ airflow_pidfile_folder }}/webserver.pid`
    endscript
}
My issue concerns this piece of code on the line before the last:
kill -USR1 `cat {{ airflow_pidfile_folder }}/webserver.pid`
I think it needs "-webserver" right after "{{ airflow_pidfile_folder }}", as written at the beginning of the line; otherwise the logrotate script will fail.
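The corrected postrotate line would then read:

[ -f {{ airflow_pidfile_folder }}-webserver/webserver.pid ] && kill -USR1 `cat {{ airflow_pidfile_folder }}-webserver/webserver.pid`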
Thanks for your time!
PS: I tried to push a PR, but PR access seems to be forbidden for non-contributors.
Add support for installing Airflow 2.0. It's a major release with new features.
Added a proper airflow.cfg for this version.
Because the scheduler is not constantly up and running and reboots itself every five seconds, the Service(airflow_service).is_running assertion fails depending on the moment it is checked. To fix this, we are going to add the @retry decorator from the retrying Python module and make test_airflow_services check whether the services are running for a maximum of 5 seconds:
from retrying import retry

@retry(stop_max_delay=5000)
def test_airflow_services(Service, AnsibleDefaults):
    airflow_services = AnsibleDefaults["airflow_services"]
    for airflow_service in airflow_services:
        if airflow_services[airflow_service]["enabled"]:
            assert Service(airflow_service).is_enabled
            assert Service(airflow_service).is_running
I'm getting this error on Ubuntu 18.04, at TASK [airflow-role : Airflow | Installing Airflow]:
TASK [airflow-role : Airflow | Installing Airflow] ****************************************************************************************************************************************************************
fatal: [192.168.33.13]: FAILED! => {"changed": false, "cmd": ["/usr/bin/pip", "install", "--no-cache-dir", "apache-airflow==1.10.2"], "msg": "stdout: Collecting apache-airflow==1.10.2\n\n:stderr: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5beffb2f10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/apache-airflow/\n Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5beffb2b10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/apache-airflow/\n Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5beffb2d50>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/apache-airflow/\n Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5bf0a655d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/apache-airflow/\n Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5bf0a65510>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/apache-airflow/\nException:\nTraceback (most recent call last):\n File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 215, in main\n status = self.run(options, args)\n File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 342, in run\n requirement_set.prepare_files(finder)\n File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 380, in prepare_files\n ignore_dependencies=self.ignore_dependencies))\n File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 554, in _prepare_file\n require_hashes\n File "/usr/lib/python2.7/dist-packages/pip/req/req_install.py", line 278, in populate_link\n self.link = finder.find_requirement(self, upgrade)\n File "/usr/lib/python2.7/dist-packages/pip/index.py", line 465, in find_requirement\n all_candidates = self.find_all_candidates(req.name)\n File "/usr/lib/python2.7/dist-packages/pip/index.py", line 423, in find_all_candidates\n for page in self._get_pages(url_locations, project_name):\n File "/usr/lib/python2.7/dist-packages/pip/index.py", line 568, in _get_pages\n page = self._get_page(location)\n File "/usr/lib/python2.7/dist-packages/pip/index.py", line 683, in _get_page\n return HTMLPage.get_page(link, session=self.session)\n File "/usr/lib/python2.7/dist-packages/pip/index.py", line 792, in get_page\n "Cache-Control": "max-age=600",\n File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/sessions.py", line 533, in get\n return self.request('GET', url, **kwargs)\n File "/usr/lib/python2.7/dist-packages/pip/download.py", line 386, in request\n return super(PipSession, self).request(method, url, *args, **kwargs)\n File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/sessions.py", line 520, in request\n resp = self.send(prep, 
**send_kwargs)\n File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/sessions.py", line 630, in send\n r = adapter.send(request, **kwargs)\n File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/adapters.py", line 508, in send\n raise ConnectionError(e, request=request)\nConnectionError: HTTPSConnectionPool(host='pypi.python.org', port=443): Max retries exceeded with url: /simple/apache-airflow/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5bf0a65710>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))\n"}
The PATH variable set in the environment file for the service doesn't expand the expression $PATH.
Expected behavior:
The command of the task should be executed without errors.
Actual behavior:
The BashOperator, and maybe other operators, are broken.
Reproduces how often:
Always
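A sketch of the underlying cause and fix: systemd performs no shell expansion in environment files, so $PATH reaches the service as a literal string; the search path has to be written out in full. The file path and entries below are assumptions:

# /etc/default/airflow-webserver (hypothetical environment file)
# Broken: $PATH is not expanded by systemd and reaches the operators verbatim
# PATH=/opt/airflow/bin:$PATH
# Working: spell the whole search path out explicitly
PATH=/opt/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin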
When a DAG dependence is set in dags_dependencies, after installing it the task notifies an unknown restart handler because of a typo: the handler is, e.g., "restart airflow-webserver", while the task has "restart airflow_webserver".
Expected behavior:
The handler restarts the services normally.
Actual behavior:
The handler misses the restart of the services because of the typo.
Reproduces how often:
Whenever a DAG dependence is set.
Template service files can't be replaced from the playbook; they are always taken from the role.
Expected behavior: when you have templates in the playbook and specify their path in a variable, the role should take the templates from the path you give.
Actual behavior: the role takes the templates from itself.
Reproduces how often: 100%
1.8.1
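The usual Ansible pattern for making this overridable (a sketch; the variable and file names here are hypothetical, not the role's actual ones) is to put the template path in a default variable so a playbook can point it elsewhere:

# defaults/main.yml
airflow_webserver_service_template: airflow-webserver.service.j2

# tasks/config.yml
- name: Airflow | Render webserver service file
  template:
    src: "{{ airflow_webserver_service_template }}"   # a playbook can override this path
    dest: /lib/systemd/system/airflow-webserver.service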
There are a few small bugs in the installation playbook:

- block breaks the playbook "style" a little, so from now on the condition will be checked individually in both tasks.
- Airflow | Installing dependencies is buggy on Debian Docker images, because they come with no apt update done and the package list is therefore empty. update_cache: yes is added to prevent this and to avoid older versions of packages.
- Airflow | Installing Airflow Extra Packages fails if the airflow_extra_packages variable is empty, so this task should only be run when the variable is not empty.

A sketch of the last two fixes follows below.
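Task shapes here are assumed, not copied from the role:

- name: Airflow | Installing dependencies
  apt:
    name: "{{ airflow_required_libs }}"
    state: present
    update_cache: yes        # Debian Docker images ship with an empty apt cache

- name: Airflow | Installing Airflow Extra Packages
  pip:
    name: "apache-airflow[{{ item }}]"
  with_items: "{{ airflow_extra_packages }}"
  when: airflow_extra_packages is defined and airflow_extra_packages | length > 0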
The PrivateTmp option is set to true in the service unit; it would be desirable for it to be an optional parameter.
Expected behavior: we can choose true or false for PrivateTmp in the service.
Actual behavior: it is set to true and can't be changed.
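A sketch of how the option could be exposed (the variable name is hypothetical):

# defaults/main.yml
airflow_private_tmp: true

# templates/airflow-webserver.service.j2
PrivateTmp={{ airflow_private_tmp }}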
The Ansible playbook installs to /opt/airflow/airflow.db.
The Airflow process launched with systemd uses /opt/airflow/airflow.db.
ansible/create_user uses /var/lib/airflow/airflow/airflow.db.
/opt/airflow/bin/airflow also uses /var/lib/airflow/airflow/airflow.db.
Just looking for ideas on how to solve this. I changed a few settings in webserver.service and the environment variable files to make Airflow work on Ubuntu Server LTS 20.04, but this part was broken before that as well.
sudo -u airflow bash -c "/opt/airflow/bin/airflow config get-value core sql_alchemy_conn"
sqlite:////var/lib/airflow/airflow/airflow.db
sudo -u airflow bash -c "export AIRFLOW_CONFIG=/etc/airflow/airflow.cfg; /opt/airflow/bin/airflow config get-value core sql_alchemy_conn"
sqlite:////opt/airflow/airflow.db
If I need to install from a source repository other than PyPI, I can't.
Add an extra_args field to the pip installation.
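A sketch of what that could look like; the index URL is an example and the task shape is assumed:

- name: Airflow | Install airflow
  pip:
    name: apache-airflow
    version: "{{ airflow_version }}"
    extra_args: "--index-url https://pypi.example.org/simple"   # point pip at another repository
    virtualenv: "{{ airflow_app_home }}"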
Hello there,
This is more like a discussion than a support request, but I didn't know where to ask it.
In your role, you set SCHEDULER_RUNS to 1000, which corresponds to airflow_scheduler_num_runs, the total number of runs a scheduler does before shutting down.
I saw on Stack Overflow that this number could be set to 5 in some Docker/Kubernetes environments. I also read that this parameter should be set to -1 in some cases, which makes the scheduler run indefinitely, and that this behavior should be the norm (link here).
So I would like to know your point of view on this matter, and the reason you set this parameter to 1000.
Based on http://molecule.readthedocs.io/en/stable-1.22/usage.html#travis-ci, we are going to try testing our role in Travis with Molecule.
Some dependencies weren't up to date; updated them.
Expected behavior: installs the latest Airflow version with pip3 and its constraints.
Actual behavior: installs a deprecated Airflow version by default with pip.
Reproduces how often: 100%
1.8.0 (latest)
In airflow.cfg, data_profiler_filter and superuser_filter cannot be left blank. We can't default these values, so the lines should only be added when they exist. We therefore have to add two if conditionals in airflow.cfg.j2 and add these lines (separately) only when airflow_ldap_superuser_filter or airflow_ldap_data_profiler_filter, respectively, exist and are not empty.
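A sketch of those conditionals for airflow.cfg.j2 (the exact template content is assumed):

{% if airflow_ldap_superuser_filter is defined and airflow_ldap_superuser_filter %}
superuser_filter = {{ airflow_ldap_superuser_filter }}
{% endif %}
{% if airflow_ldap_data_profiler_filter is defined and airflow_ldap_data_profiler_filter %}
data_profiler_filter = {{ airflow_ldap_data_profiler_filter }}
{% endif %}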
When LDAP is enabled in Airflow, it crashes with the same output as this question on Stack Overflow. The problem is the pip package pyasn1, which happens to be at an outdated version. Upgrading it solves the problem.
airflow_version: 1.10.0
airflow_webserver_authenticate: True
airflow_webserver_auth_backend: airflow.contrib.auth.backends.ldap_auth
Expected behavior: Airflow web UI working.
Actual behavior: Airflow web UI not working.
Reproduces how often: Always
Since 1.7.0 (first compatible with Airflow 1.10.0)
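A sketch of the fix described above (the task shape is assumed):

- name: Airflow | Upgrade pyasn1
  pip:
    name: pyasn1
    state: latest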
Hi, idealista, all of you.
Thanks for this repository; it has helped my work. However, I have one concern about the systemd service file: there is a "-n" option in airflow-scheduler.service which sets the number of scheduler loops. It now seems not to be needed (cf. apache/airflow#19219), so I suggest that this setting be removed, or set to "-1" as the default value.
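A sketch of the suggested change in the unit file (the ExecStart line is assumed, not copied from the role):

# airflow-scheduler.service
# Before: stop after a fixed number of scheduler loops
# ExecStart=/opt/airflow/bin/airflow scheduler -n {{ airflow_scheduler_num_runs }}
# After: run indefinitely
ExecStart=/opt/airflow/bin/airflow scheduler -n -1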
At the moment, when the workers are restarted, the work they're performing is interrupted. Following this Stack Overflow answer provided by @juanriaza, the workers can be gracefully restarted by sending them a SIGINT kill signal.
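One way to wire that in is via the worker unit file; KillSignal is a standard systemd option, but its use here is our assumption, not necessarily the role's implementation:

# airflow-worker.service
[Service]
KillSignal=SIGINT   # let Celery finish in-flight tasks before exiting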
When specifying more than one Airflow extra package, the task Airflow | Installing Airflow Extra Packages fails with this error:
(item=['apache-airflow[celery]', 'apache-airflow[postgres]']) => {"changed": false, "item": ["apache-airflow[celery]", "apache-airflow[postgres]"], "msg": "'version' argument is ambiguous when installing multiple package distributions. Please specify version restrictions next to each package in 'name' argument."}
airflow_version: 1.10.0
airflow_extra_packages: [celery,postgres]
Expected behavior: deployment works.
Actual behavior: deployment fails.
Reproduces how often: every time.
1.7.2
This also generates a warning:
[DEPRECATION WARNING]: Invoking "pip" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: "apache-airflow[{{ item }}]"`, please use `name: '{{
airflow_extra_packages }}'` and remove the loop. This feature will be removed in version 2.11.
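Following the warning's advice, one possible fix (the exact expression is ours, not the role's) is to build the full package list and pass it to name in a single call; note that version must be dropped, since it is ambiguous with multiple packages:

- name: Airflow | Installing Airflow Extra Packages
  pip:
    name: "{{ airflow_extra_packages | map('regex_replace', '(.+)', 'apache-airflow[\\1]') | list }}"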
We need to modify these files:

- celery_version: due to some incompatibilities between Airflow 1.8.x and Celery 4.x, it is interesting to be able to specify the Celery version we want to install.
- celery_extra_packages: a dictionary where we can specify whether we want to install, for instance, the Redis Celery package. These apply when the airflow_executor variable is CeleryExecutor.

To guarantee support and improve performance for current and future versions of Airflow, I propose adding Python 3.8 and using virtualenv to avoid modifying the system Python.
Expected behavior: Airflow will run with Python 3.8 in a virtualenv.
Actual behavior: Airflow runs with the system Python version.
1.8.0
Found some problems when you want to install only some of the services, not all of them.
Try to configure a node without some service, like airflow-worker.
Expected behavior:
It runs smoothly.
Actual behavior:
The undesired service is up and running.
Reproduces how often:
Always
Setting airflow_version to 1.10.0 and launching the playbook fails in the "Installing Airflow" task with the following output:
fatal: [airflow]: FAILED! => {"changed": false, "cmd": "/usr/bin/pip install --no-cache-dir apache-airflow==1.10.0", "msg": "stdout: Collecting apache-airflow==1.10.0\n Downloading https://files.pythonhosted.org/packages/da/2a/6e9efcd40193850e2f636c7306eede2ff5607aa9f81ff9f7a151d9b13ff8/apache-airflow-1.10.0.tar.gz (4.3MB)\n Complete output from command python setup.py egg_info:\n Traceback (most recent call last):\n File \"<string>\", line 1, in <module>\n File \"/root/pip-build-u203T5/apache-airflow/setup.py\", line 393, in <module>\n do_setup()\n File \"/root/pip-build-u203T5/apache-airflow/setup.py\", line 258, in do_setup\n verify_gpl_dependency()\n File \"/root/pip-build-u203T5/apache-airflow/setup.py\", line 49, in verify_gpl_dependency\n raise RuntimeError(\"By default one of Airflow's dependencies installs a GPL \"\n RuntimeError: By default one of Airflow's dependencies installs a GPL dependency (unidecode). To avoid this dependency set SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when you install or upgrade Airflow. To force installing the GPL version set AIRFLOW_GPL_UNIDECODE\n \n ----------------------------------------\n\n:stderr: Command \"python setup.py egg_info\" failed with error code 1 in /root/pip-build-u203T5/apache-airflow/\n"}
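A sketch of a workaround taken from the error message itself: export SLUGIFY_USES_TEXT_UNIDECODE when installing (the task shape is assumed):

- name: Airflow | Installing Airflow
  pip:
    name: apache-airflow
    version: "1.10.0"
  environment:
    SLUGIFY_USES_TEXT_UNIDECODE: "yes"   # avoid the GPL unidecode dependency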
A colleague asked for an automatic way to import DAGs and plugins from a Git repo every 5 minutes via a cron job. Several changes were made to do so:

defaults/main.yml:

- airflow_required_libs
- dags_dependencies dictionary: meant to be the Python dependencies demanded by the DAGs. Defaults to {}. If set, it should follow this example:

  scrapinghub:
    version: 2.0.1

- dags_repository dictionary: meant to be the Git repositories containing the DAGs or plugins that we want to check. Defaults to {}. If set, it should follow this example:

  dags:
    src: https://github.com/apache/incubator-airflow/
    repo_subfolder: airflow/example_dags
    host_subfolder: "{{ airflow_dags_folder }}"

tasks/install.yml:
tasks/config.yml:
downloading role from https://github.com/idealista/airflow-role/archive/1.3.1.tar.gz
[ERROR]: failed to download the file: HTTP Error 404: Not Found
Unable to download any version of airflow-role.
$ ansible --version
ansible 2.4.0.0
config file = /media/asg-airflow-terraform/ansible/ansible.cfg
configured module search path = [u'/home/osboxes/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]
The build and the default Molecule scenario are broken due to the psycopg2 package.
Expected behavior:
psycopg2 installs in the scenario without problems.
Actual behavior:
The installation of psycopg2 is broken.
Reproduces how often:
Always
Using psycopg2-binary fixes the problem.
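A sketch of the fix (the task shape and variable are assumed):

- name: Airflow | Install psycopg2
  pip:
    name: psycopg2-binary        # binary wheel; avoids building from source in the container
    virtualenv: "{{ airflow_app_home }}"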
In order to use Travis and Molecule, it is mandatory to use Docker: Travis doesn't support Vagrant. The problem was that the tests use the Ansible module and need an Ansible backend connection; when the Docker driver is specified, Molecule runs Testinfra with the argument --connection=docker, making the tests fail, as they don't find any host. To make it work, we just have to force Testinfra to use the same arguments as with the Vagrant driver: --connection=ansible --ansible-inventory=.molecule/ansible_inventory. So in molecule.yml we specify these arguments in the verifier section:
verifier:
  name: testinfra
  options:
    connection: ansible
    ansible-inventory: .molecule/ansible_inventory
The Docker driver configuration is a bit tricky itself: as our role uses systemd, the default image configuration won't work. To make the Docker container use systemd, we have to add this to the container configuration in molecule.yml:
privileged: True
cap_add:
  - SYS_ADMIN
volume_mounts:
  - '/sys/fs/cgroup:/sys/fs/cgroup:ro'
command: '/lib/systemd/systemd'
Finally, it looks like the default Debian images come with Python 2.7.9, which somehow makes pip break after the Airflow installation, so we have to use the Python images. Knowing all of this, the Docker section in molecule.yml ends up like this:
docker:
  containers:
    - name: airflow.vm
      ansible_groups:
        - airflow
      image: python
      image_version: 2.7.13-jessie
      port_bindings:
        80: 80
        8080: 8080
        5555: 5555
      privileged: True
      cap_add:
        - SYS_ADMIN
      volume_mounts:
        - '/sys/fs/cgroup:/sys/fs/cgroup:ro'
      command: '/lib/systemd/systemd'
The Log view of tasks is incorrectly configured by default, so it doesn't show any log.
Expected behavior:
Shows the DAG task execution log as normal.
Actual behavior:
The log is missing, wrongly configured, or fails to load.
Reproduces how often:
Always, with the default airflow role options.