Name: Mauricio Fernando Bautista Lopez
Type: User
Company: Kellogg Company
Bio: I'm a data engineer. I have worked with Scrum and use tools like Azure and AWS. I really love working with data, and my favorite languages are Python and R.
Location: State of Mexico, Mexico
Blog: [email protected]
Mauricio Fernando Bautista Lopez's Projects
First Platzi Master technical challenge: a Flask application that evaluates an input
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
A full pipeline developed for movie analytics with AWS, Apache Airflow, and PySpark
Scripts from the computational thinking course on Platzi
Repository used for the Hadoop course on Platzi
Repo for the Python course, using Python 3
Introductory Spark course by Platzi 💚
Dagster crash course https://dagster.io/blog/dagster-crash-course-oct-2022
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Roadmap to becoming a data engineer in 2021
Data Engineering Bootcamp 2021
# data_engeniering_python

## Description

This is a data engineering pipeline prototype that extracts data from news sites, then transforms and aggregates the different sources, and finally loads the data into a database.

## Data Sources

The project consumes different news sites; at the moment it scrapes:

- https://elpais.com
- http://www.eluniversal.com.mx

## Development

The parameters needed for configuration are in the file `config.yaml`, which contains:

* **news_sites:**
  * sitename:
    * url:
    * queries:
      * homepage_article_links:
      * article_body:
      * article_title:

### Requirements and Installation

Directory and file structure:

```
LH4_AMPPS_DASH/
|---extract/
|   |---common.py
|   |---config.yaml
|   |---main.py
|   |---news_page_objects.py
|---transform/
|   |---main.py
|---load/
|   |---article.py
|   |---base.py
|   |---main.py
|---.gitignore
|---README.md
|---newspaper.db
|---pipeline.py
```

It requires Python 3.6 or higher, so check your Python version first. [requirements.txt](requirements.txt) lists all the Python libraries the pipeline depends on; install them with:

`pip install -r requirements.txt`

To start scraping the sites, execute the [pipeline.py](pipeline.py) file:

`python pipeline.py`

This will run the ETL process and write the output to the specified output location.
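The extract → transform → load flow the README describes can be sketched in a few lines. This is a minimal illustration only: the function bodies, the in-memory stand-ins for scraping, and the `load` return value are hypothetical, not the project's actual code, which scrapes the sites with the queries in `config.yaml` and writes to `newspaper.db`.

```python
# Hypothetical sketch of the extract -> transform -> load stages.
def extract(sites):
    # The real pipeline scrapes each site's homepage for article links;
    # here we fabricate one placeholder article per site.
    return [{"site": s, "title": f"article from {s}", "body": "..."} for s in sites]

def transform(articles):
    # Clean and aggregate the different sources into a uniform shape.
    return [{**a, "title": a["title"].strip().title()} for a in articles]

def load(articles, db_path="newspaper.db"):
    # The real pipeline inserts rows into a SQLite database;
    # this stand-in just reports how many rows would be written.
    return len(articles)

sites = ["elpais.com", "eluniversal.com.mx"]
rows = load(transform(extract(sites)))
print(rows)  # → 2
```

Keeping each stage a pure function of its input, as `pipeline.py` chains them, makes each step easy to test in isolation.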
The Athena adapter plugin for dbt (https://getdbt.com)
Project from the course "Django: creando 3 webs"
Build a Docker image with GitHub actions
The idea of this repository is to understand the fundamentals of Git
Solution to the Platzi Master future-value logic challenge
Example of how to create an ETL process orchestrated with Apache Airflow
Python MLP for face recognition. This is a personal project to understand how the multilayer perceptron works, applied to the computer vision field
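A forward pass of a multilayer perceptron, the model this project studies, can be sketched in pure Python. The layer sizes, random weights, and the 4-value input standing in for pixels are arbitrary illustrative choices, not the project's actual configuration:

```python
import math
import random

random.seed(0)

def relu(x):
    # Elementwise ReLU activation.
    return [max(0.0, v) for v in x]

def dense(x, W, b):
    # Fully connected layer: one dot product per output unit, plus bias.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hypothetical tiny MLP: 4 input "pixels" -> 3 hidden units -> 2 classes.
x = [0.2, 0.8, 0.5, 0.1]
W1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
b1 = [0.0] * 3
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b2 = [0.0] * 2

hidden = relu(dense(x, W1, b1))
logits = dense(hidden, W2, b2)

# Softmax turns the logits into class probabilities.
exps = [math.exp(v) for v in logits]
total = sum(exps)
probs = [v / total for v in exps]
print(len(probs))  # → 2
```

For real face recognition the input would be a flattened image and the layers far wider, but the per-layer computation is exactly this dot-product-plus-activation step.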
This is from the FastAPI course
Airflow fundamentals
Git and GitHub course with Freddy Vega - Platzi - 2019