
Data Engineering Project

Build Status Coverage Status Python 3.8

Data Engineering Project is an implementation of a data pipeline that consumes the latest news from RSS feeds and makes it available to users via a handy API. The pipeline infrastructure is built with popular open-source projects.

Access the latest news and headlines in one place. 💪

Table of Contents

Architecture diagram

MVP Architecture

How it works

Data Scraping

An Airflow DAG is responsible for executing the Python scraping modules. It runs every X minutes, producing micro-batches.

  • The first task updates the proxypool. Using proxies in combination with rotating user agents helps scrapers get past most anti-scraping measures and avoid being detected as a scraper.

  • The second task extracts news from the RSS feeds listed in the configuration file, validates data quality, and sends the records to Kafka topic A. The extraction process uses validated proxies from the proxypool.
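The proxy and user-agent rotation described above can be sketched as follows. This is a minimal illustration, not the project's actual module; the pool contents and user-agent strings are placeholders, and in the real pipeline the proxypool is refreshed by the first DAG task.

```python
import random

# Hypothetical pools -- real values come from the proxypool task
# and the project's configuration, not from this sketch.
PROXIES = [
    "http://181.129.70.82:46752",
    "http://185.74.4.47:8080",
]
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

def rotate_request_settings(rng=random):
    """Pick a random proxy/user-agent pair for the next request."""
    proxy = rng.choice(PROXIES)
    return {
        # Same proxy for both schemes, as in the DAG logs below.
        "proxies": {"http": proxy, "https": proxy},
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
    }
```

Each micro-batch request would then pass `proxies` and `headers` to the HTTP client, so consecutive fetches look like they come from different clients.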

Data flow

  • Kafka Connect Mongo Sink consumes data from Kafka topic A and stores news in MongoDB, using upsert semantics based on the _id field.
  • Debezium MongoDB Source tracks a MongoDB replica set for document changes in databases and collections, recording those changes as events in Kafka topic B.
  • Kafka Connect Elasticsearch Sink consumes data from Kafka topic B and upserts news into Elasticsearch. Replicating data between topics A and B keeps MongoDB and Elasticsearch synchronized. The Command Query Responsibility Segregation (CQRS) pattern allows separate models for updating and reading information.
  • Kafka Connect S3-Minio Sink consumes records from Kafka topic B and stores them in MinIO (a high-performance object store) to ensure data persistence.
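The upsert-by-_id behaviour that keeps the stores deduplicated can be illustrated with a dict-backed toy store. This is only a sketch of the semantics; the real work is done by the Kafka Connect sinks, and the field names here are placeholders.

```python
def upsert(store, record):
    """Insert or update a record keyed by its '_id' field,
    mimicking the Mongo Sink's upsert behaviour."""
    store[record["_id"]] = record
    return store

news_store = {}
upsert(news_store, {"_id": "abc", "title": "First headline"})
upsert(news_store, {"_id": "abc", "title": "Updated headline"})
# The second call updates in place: one document, latest title.
```

Because both MongoDB and Elasticsearch apply the same keyed upsert to the same stream of events, re-delivered or re-scraped records overwrite rather than duplicate.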

Data access

  • Data gathered by the previous steps can be accessed through the API service's public endpoints.

Prerequisites

Install the software required to run the project:

Running the project

The manage.sh script, a wrapper around docker-compose, serves as the project's management tool.

  • Build the project infrastructure
./manage.sh up
  • Stop the project infrastructure
./manage.sh stop
  • Delete the project infrastructure
./manage.sh down

Testing

The run_tests.sh script executes unit tests against the Airflow scraping modules and the Django REST Framework applications.

./run_tests.sh

API service

Read the detailed documentation on how to interact with the data collected by the pipeline using the search endpoints.

Example searches:

  • see all news
http://127.0.0.1:5000/api/v1/news/
  • with search_fields title and description, see all news containing the phrase Robert Lewandowski
http://127.0.0.1:5000/api/v1/news/?search=Robert%20Lewandowski
  • find news containing the phrase Lewandowski in their titles
http://127.0.0.1:5000/api/v1/news/?search=title|Lewandowski
  • see all Polish news containing the phrase Lewandowski
http://127.0.0.1:5000/api/v1/news/?search=lewandowski&language=pl
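The query strings above can be composed programmatically. This is a small sketch, assuming only the URL syntax shown in the examples; the helper name and the local API root are placeholders.

```python
from urllib.parse import urlencode

API_ROOT = "http://127.0.0.1:5000/api/v1/news/"  # local default from the examples

def build_search_url(phrase, field=None, **filters):
    """Compose a search URL; 'field|phrase' narrows the search to a
    single field, matching the endpoint's documented syntax."""
    search = f"{field}|{phrase}" if field else phrase
    query = {"search": search, **filters}
    return f"{API_ROOT}?{urlencode(query)}"

build_search_url("Robert Lewandowski")
# urlencode uses '+' for spaces; '%20' (as in the examples) works too
build_search_url("lewandowski", language="pl")
```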

References

Inspired by the following code, articles, and videos:

Contributions

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Please feel free to contact me if you have any questions. Damian Kliś @DamianKlis

Contributors

damklis, dependabot[bot], szczeles

Issues

Token Key Attribute

Hi,
I have successfully generated the token. The next step is to include the token in the request header. What is the key attribute for the token header?
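Assuming the API uses Django REST Framework's default TokenAuthentication (check the project's settings to confirm), the token goes in the Authorization header with the Token keyword:

```python
# Assumes DRF TokenAuthentication; the token value is hypothetical.
TOKEN = "9944b09199c62bcf9418ad846dd0e4bbdfc6ee4b"

def auth_headers(token):
    """Build the header DRF's TokenAuthentication expects."""
    return {"Authorization": f"Token {token}"}

# e.g. requests.get("http://127.0.0.1:5000/api/v1/news/", headers=auth_headers(TOKEN))
```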

Improve proxy healthcheck

Currently, if there is no valid proxy within 5 tries on any exporter, the DAG fails. The usual cause is a proxy health issue (even after 5 retries), as in the logs below:

*** Reading local file: /usr/local/airflow/logs/rss_news_dag/exporting_101greatgoals_news_to_broker/2020-10-11T08:10:00+00:00/1.log
[2020-10-11 08:21:15,995] {{taskinstance.py:655}} INFO - Dependencies all met for <TaskInstance: rss_news_dag.exporting_101greatgoals_news_to_broker 2020-10-11T08:10:00+00:00 [queued]>
[2020-10-11 08:21:16,029] {{taskinstance.py:655}} INFO - Dependencies all met for <TaskInstance: rss_news_dag.exporting_101greatgoals_news_to_broker 2020-10-11T08:10:00+00:00 [queued]>
[2020-10-11 08:21:16,029] {{taskinstance.py:866}} INFO - 
--------------------------------------------------------------------------------
[2020-10-11 08:21:16,029] {{taskinstance.py:867}} INFO - Starting attempt 1 of 1
[2020-10-11 08:21:16,029] {{taskinstance.py:868}} INFO - 
--------------------------------------------------------------------------------
[2020-10-11 08:21:16,053] {{taskinstance.py:887}} INFO - Executing <Task(PythonOperator): exporting_101greatgoals_news_to_broker> on 2020-10-11T08:10:00+00:00
[2020-10-11 08:21:16,055] {{standard_task_runner.py:53}} INFO - Started process 2164 to run task
[2020-10-11 08:21:16,162] {{logging_mixin.py:112}} INFO - Running %s on host %s <TaskInstance: rss_news_dag.exporting_101greatgoals_news_to_broker 2020-10-11T08:10:00+00:00 [running]> cfc5513180c6
[2020-10-11 08:21:16,195] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,195] {{retry_on_exception.py:14}} INFO - Retries: 5
[2020-10-11 08:21:16,201] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,200] {{conn.py:378}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: connecting to kafka:9092 [('172.19.0.10', 9092) IPv4]
[2020-10-11 08:21:16,201] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,201] {{conn.py:1195}} INFO - Probing node bootstrap-0 broker version
[2020-10-11 08:21:16,202] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,202] {{conn.py:407}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: Connection complete.
[2020-10-11 08:21:16,307] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,307] {{conn.py:1257}} INFO - Broker version identified as 1.0.0
[2020-10-11 08:21:16,307] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,307] {{conn.py:1259}} INFO - Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
[2020-10-11 08:21:16,366] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:16,366] {{main.py:20}} INFO - {'http': 'http://181.129.70.82:46752', 'https': 'http://181.129.70.82:46752'}
[2020-10-11 08:21:46,395] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,394] {{web_parser.py:34}} INFO - Error occurred: HTTPSConnectionPool(host='www.101greatgoals.com', port=443): Max retries exceeded with url: /feed/ (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7373c37190>, 'Connection to 181.129.70.82 timed out. (connect timeout=30)'))
[2020-10-11 08:21:46,395] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,395] {{kafka.py:471}} INFO - Closing the Kafka producer with 9223372036.0 secs timeout.
[2020-10-11 08:21:46,396] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,396] {{conn.py:916}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connected> [IPv4 ('172.19.0.10', 9092)]>: Closing connection. 
[2020-10-11 08:21:46,397] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,396] {{retry_on_exception.py:20}} INFO - Error occured: Not a valid XML document
[2020-10-11 08:21:46,397] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,397] {{retry_on_exception.py:29}} INFO - Retries: 4
[2020-10-11 08:21:46,399] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,399] {{conn.py:378}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: connecting to kafka:9092 [('172.19.0.10', 9092) IPv4]
[2020-10-11 08:21:46,400] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,399] {{conn.py:1195}} INFO - Probing node bootstrap-0 broker version
[2020-10-11 08:21:46,400] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,400] {{conn.py:407}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: Connection complete.
[2020-10-11 08:21:46,505] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,505] {{conn.py:1257}} INFO - Broker version identified as 1.0.0
[2020-10-11 08:21:46,505] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,505] {{conn.py:1259}} INFO - Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
[2020-10-11 08:21:46,513] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,513] {{main.py:20}} INFO - {'http': 'http://185.74.4.47:8080', 'https': 'http://185.74.4.47:8080'}
[2020-10-11 08:21:46,743] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,743] {{web_parser.py:34}} INFO - Error occurred: HTTPSConnectionPool(host='www.101greatgoals.com', port=443): Max retries exceeded with url: /feed/ (Caused by ProxyError('Cannot connect to proxy.', ConnectionResetError(104, 'Connection reset by peer')))
[2020-10-11 08:21:46,744] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,743] {{kafka.py:471}} INFO - Closing the Kafka producer with 9223372036.0 secs timeout.
[2020-10-11 08:21:46,744] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,744] {{conn.py:916}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connected> [IPv4 ('172.19.0.10', 9092)]>: Closing connection. 
[2020-10-11 08:21:46,745] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,745] {{retry_on_exception.py:20}} INFO - Error occured: Not a valid XML document
[2020-10-11 08:21:46,745] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,745] {{retry_on_exception.py:29}} INFO - Retries: 3
[2020-10-11 08:21:46,748] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,748] {{conn.py:378}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: connecting to kafka:9092 [('172.19.0.10', 9092) IPv4]
[2020-10-11 08:21:46,748] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,748] {{conn.py:1195}} INFO - Probing node bootstrap-0 broker version
[2020-10-11 08:21:46,749] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,749] {{conn.py:407}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: Connection complete.
[2020-10-11 08:21:46,854] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,854] {{conn.py:1257}} INFO - Broker version identified as 1.0.0
[2020-10-11 08:21:46,854] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,854] {{conn.py:1259}} INFO - Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
[2020-10-11 08:21:46,856] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,856] {{kafka.py:461}} INFO - Kafka producer closed
[2020-10-11 08:21:46,859] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:46,858] {{main.py:20}} INFO - {'http': 'http://165.22.36.75:8888', 'https': 'http://165.22.36.75:8888'}
[2020-10-11 08:21:47,811] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,811] {{web_parser.py:32}} INFO - Bad response
[2020-10-11 08:21:47,812] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,812] {{kafka.py:471}} INFO - Closing the Kafka producer with 9223372036.0 secs timeout.
[2020-10-11 08:21:47,812] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,812] {{conn.py:916}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connected> [IPv4 ('172.19.0.10', 9092)]>: Closing connection. 
[2020-10-11 08:21:47,813] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,813] {{retry_on_exception.py:20}} INFO - Error occured: Not a valid XML document
[2020-10-11 08:21:47,813] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,813] {{retry_on_exception.py:29}} INFO - Retries: 2
[2020-10-11 08:21:47,816] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,816] {{conn.py:378}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: connecting to kafka:9092 [('172.19.0.10', 9092) IPv4]
[2020-10-11 08:21:47,817] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,816] {{conn.py:1195}} INFO - Probing node bootstrap-0 broker version
[2020-10-11 08:21:47,817] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,817] {{conn.py:407}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: Connection complete.
[2020-10-11 08:21:47,923] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,923] {{conn.py:1257}} INFO - Broker version identified as 1.0.0
[2020-10-11 08:21:47,923] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,923] {{conn.py:1259}} INFO - Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
[2020-10-11 08:21:47,927] {{logging_mixin.py:112}} INFO - [2020-10-11 08:21:47,927] {{main.py:20}} INFO - {'http': 'http://139.5.71.199:8080', 'https': 'http://139.5.71.199:8080'}
[2020-10-11 08:22:17,949] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,949] {{web_parser.py:34}} INFO - Error occurred: HTTPSConnectionPool(host='www.101greatgoals.com', port=443): Max retries exceeded with url: /feed/ (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7373a8ae50>, 'Connection to 139.5.71.199 timed out. (connect timeout=30)'))
[2020-10-11 08:22:17,950] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,950] {{kafka.py:471}} INFO - Closing the Kafka producer with 9223372036.0 secs timeout.
[2020-10-11 08:22:17,951] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,950] {{conn.py:916}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connected> [IPv4 ('172.19.0.10', 9092)]>: Closing connection. 
[2020-10-11 08:22:17,951] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,951] {{retry_on_exception.py:20}} INFO - Error occured: Not a valid XML document
[2020-10-11 08:22:17,952] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,951] {{retry_on_exception.py:29}} INFO - Retries: 1
[2020-10-11 08:22:17,954] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,954] {{conn.py:378}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: connecting to kafka:9092 [('172.19.0.10', 9092) IPv4]
[2020-10-11 08:22:17,955] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,955] {{conn.py:1195}} INFO - Probing node bootstrap-0 broker version
[2020-10-11 08:22:17,956] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:17,955] {{conn.py:407}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('172.19.0.10', 9092)]>: Connection complete.
[2020-10-11 08:22:18,061] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,060] {{conn.py:1257}} INFO - Broker version identified as 1.0.0
[2020-10-11 08:22:18,061] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,061] {{conn.py:1259}} INFO - Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
[2020-10-11 08:22:18,065] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,065] {{main.py:20}} INFO - {'http': 'http://185.74.4.47:8080', 'https': 'http://185.74.4.47:8080'}
[2020-10-11 08:22:18,302] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,302] {{web_parser.py:34}} INFO - Error occurred: HTTPSConnectionPool(host='www.101greatgoals.com', port=443): Max retries exceeded with url: /feed/ (Caused by ProxyError('Cannot connect to proxy.', ConnectionResetError(104, 'Connection reset by peer')))
[2020-10-11 08:22:18,302] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,302] {{kafka.py:471}} INFO - Closing the Kafka producer with 9223372036.0 secs timeout.
[2020-10-11 08:22:18,303] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,303] {{conn.py:916}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connected> [IPv4 ('172.19.0.10', 9092)]>: Closing connection. 
[2020-10-11 08:22:18,304] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:18,304] {{retry_on_exception.py:20}} INFO - Error occured: Not a valid XML document
[2020-10-11 08:22:18,304] {{taskinstance.py:1128}} ERROR - Not a valid XML document
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/xml/etree/ElementTree.py", line 1637, in close
    self.parser.Parse("", 1) # end of data
xml.parsers.expat.ExpatError: no element found: line 1, column 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/atoma/utils.py", line 33, in parse_xml
    return defused_xml_parse(xml_content)
  File "/usr/local/lib/python3.7/site-packages/defusedxml/common.py", line 105, in parse
    return _parse(source, parser)
  File "/usr/local/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "/usr/local/lib/python3.7/xml/etree/ElementTree.py", line 605, in parse
    self._root = parser.close()
  File "/usr/local/lib/python3.7/xml/etree/ElementTree.py", line 1639, in close
    self._raiseerror(v)
  File "/usr/local/lib/python3.7/xml/etree/ElementTree.py", line 1531, in _raiseerror
    raise err
  File "<string>", line None
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 966, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/modules/retry/retry_on_exception.py", line 22, in wrapper
    self._raise_on_condition(self._retries, err)
  File "/usr/local/airflow/modules/retry/retry_on_exception.py", line 27, in _raise_on_condition
    raise exception
  File "/usr/local/airflow/modules/retry/retry_on_exception.py", line 17, in wrapper
    return function(*args, **kwargs)
  File "/usr/local/airflow/modules/rss_news/main.py", line 21, in export_news_to_broker
    for news in NewsProducer(rss_feed).get_news_stream(proxy):
  File "/usr/local/airflow/modules/rss_news/rss_news_producer.py", line 34, in get_news_stream
    news_feed_items = self._extract_news_feed_items(proxies)
  File "/usr/local/airflow/modules/rss_news/rss_news_producer.py", line 30, in _extract_news_feed_items
    news_feed = atoma.parse_rss_bytes(content)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/atoma/rss.py", line 217, in parse_rss_bytes
    root = parse_xml(BytesIO(data)).getroot()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/atoma/utils.py", line 35, in parse_xml
    raise FeedXMLError('Not a valid XML document')
atoma.exceptions.FeedXMLError: Not a valid XML document
[2020-10-11 08:22:18,307] {{taskinstance.py:1185}} INFO - Marking task as FAILED.dag_id=rss_news_dag, task_id=exporting_101greatgoals_news_to_broker, execution_date=20201011T081000, start_date=20201011T082115, end_date=20201011T082218
[2020-10-11 08:22:21,351] {{logging_mixin.py:112}} INFO - [2020-10-11 08:22:21,351] {{local_task_job.py:103}} INFO - Task exited with return code 1

Consider running a healthcheck on each proxy (e.g. a GET against google.com) before export_news_to_broker, to ensure that a task failure indicates an error in the exporter and not a lack of working proxies in the pool.
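The suggested healthcheck could look roughly like this. It is only a sketch: the function names are hypothetical, and the HTTP fetcher is injected so the filter can be exercised without the network (in practice it would wrap something like requests.get with the proxy configured).

```python
HEALTHCHECK_URL = "https://www.google.com"  # any reliably reachable site

def is_proxy_healthy(proxy, fetch):
    """Return True if a trivial GET through the proxy succeeds.
    'fetch(url, proxy)' is injected and should return an HTTP status."""
    try:
        return fetch(HEALTHCHECK_URL, proxy) == 200
    except Exception:
        # Timeouts, resets, proxy errors -> treat the proxy as dead.
        return False

def filter_healthy(proxies, fetch):
    """Keep only proxies that pass the healthcheck, so the export
    task starts with a pool known to work."""
    return [p for p in proxies if is_proxy_healthy(p, fetch)]
```

Running this filter before export_news_to_broker would make a subsequent "Not a valid XML document" failure point at the feed or the exporter, not at dead proxies.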

BAD REQUEST (400)

I am getting BAD REQUEST (400) when trying to connect to any URL, e.g. http://0.0.0.0:5000/api/v1/news/. What are the steps to resolve this?
Output of ./manage.sh up

Creating infrastructure...
Recreating mongo ... done
MongoDB shell version v4.2.17
connecting to: mongodb://localhost:27017/rss_news?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("59975a30-4c37-429f-a574-f5178dc7f588") }
MongoDB server version: 4.2.17
{
	"ok" : 0,
	"errmsg" : "command replSetInitiate requires authentication",
	"code" : 13,
	"codeName" : "Unauthorized"
}
bye
Initiated replica set
MongoDB shell version v4.2.17
connecting to: mongodb://localhost:27017/admin?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("22b505ad-c940-46bf-ba2b-2620b4bacd36") }
MongoDB server version: 4.2.17
2021-10-11T04:28:05.628+0000 E  QUERY    [js] uncaught exception: Error: couldn't add user: command createUser requires authentication :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DB.prototype.createUser@src/mongo/shell/db.js:1413:11
@(shell):1:1
2021-10-11T04:28:05.628+0000 E  QUERY    [js] uncaught exception: Error: command grantRolesToUser requires authentication :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DB.prototype.grantRolesToUser@src/mongo/shell/db.js:1635:15
@(shell):1:1
bye
MongoDB shell version v4.2.17
connecting to: mongodb://localhost:27017/admin?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("0e95023f-9a00-4201-8a07-7f13608ed26b") }
MongoDB server version: 4.2.17
{
	"ok" : 0,
	"errmsg" : "not master",
	"code" : 10107,
	"codeName" : "NotWritablePrimary"
}
2021-10-11T04:28:05.745+0000 E  QUERY    [js] uncaught exception: Error: couldn't add user: not master :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DB.prototype.createUser@src/mongo/shell/db.js:1413:11
@(shell):1:1
bye
Recreating postgres ... 
Recreating postgres      ... done
Recreating elasticsearch ... done
Recreating zookeeper     ... done
Creating minio           ... done
Recreating redis           ... done
Recreating airflow       ... done
Recreating api           ... done
Recreating proxy           ... done
Recreating kafka         ... done
Recreating schema-registry ... done
Recreating connect         ... done

Output of the docker ps command:

CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS                             PORTS                                        NAMES
0bc8ef3be938        dataengineeringproject_connect          "./register_connecto…"   40 seconds ago      Up 39 seconds (health: starting)   0.0.0.0:8083->8083/tcp, 9092/tcp             connect
11abb03946fb        confluentinc/cp-schema-registry:5.3.1   "/etc/confluent/dock…"   41 seconds ago      Up 40 seconds                      8081/tcp                                     schema-registry
96ffc1d68f8b        dataengineeringproject_kafka            "./create_default_to…"   41 seconds ago      Up 40 seconds                      9092/tcp                                     kafka
cf36bccafb4b        dataengineeringproject_proxy            "/docker-entrypoint.…"   42 seconds ago      Up 40 seconds                      0.0.0.0:5000->5000/tcp, 8080/tcp             proxy
49266edc8a5f        dataengineeringproject_api              "./run_api.sh"           42 seconds ago      Up 42 seconds                                                                   api
3f776a6022ed        dataengineeringproject_airflow          "/entrypoint.sh webs…"   43 seconds ago      Up 41 seconds (healthy)            5555/tcp, 8793/tcp, 0.0.0.0:8080->8080/tcp   airflow
2bfc84caa93f        redis:alpine                            "docker-entrypoint.s…"   43 seconds ago      Up 40 seconds                      0.0.0.0:6379->6379/tcp                       redis
ada4f43e8a2e        confluentinc/cp-zookeeper:5.3.1         "/etc/confluent/dock…"   43 seconds ago      Up 41 seconds                      2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp   zookeeper
d9e5f3169391        dataengineeringproject_elasticsearch    "/tini -- /usr/local…"   43 seconds ago      Up 42 seconds                      0.0.0.0:9200->9200/tcp, 9300/tcp             elasticsearch
d9ce588ea654        postgres:9.6                            "docker-entrypoint.s…"   43 seconds ago      Up 42 seconds                      5432/tcp                                     postgres
bbb7edd6cf8d        mongo:4.2                               "docker-entrypoint.s…"   55 seconds ago      Up 54 seconds                      0.0.0.0:27017->27017/tcp                     mongo
