Data Engineer & Python Developer | Building Robust & Scalable Data Solutions
This project implements a robust web crawling solution. It features automatic retries and connection-drop handling, and respects server load through throttling. Data is reliably stored in a partitioned Bronze Layer with failure tracking for debugging, and critical issues trigger Slack notifications. The entire system is containerized with a custom Docker image and managed by a streamlined CI/CD pipeline for efficient deployment and operation on my Airflow home server.
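A minimal sketch of the retry-and-throttle idea, assuming a requests-based fetcher and a date-partitioned Bronze Layer on local disk; the helper names, paths, and Slack hook are illustrative, not the project's actual code.

```python
import time
import json
import datetime as dt
from pathlib import Path

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session() -> requests.Session:
    """Session with automatic retries and exponential backoff on transient failures."""
    retry = Retry(total=5, backoff_factor=1.0,
                  status_forcelist=[429, 500, 502, 503, 504])
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

def crawl(urls: list[str], bronze_root: Path, delay_s: float = 1.0) -> None:
    """Fetch each URL politely and land raw payloads in a date-partitioned Bronze Layer."""
    session = build_session()
    partition = bronze_root / f"ingest_date={dt.date.today().isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    failures = []
    for i, url in enumerate(urls):
        try:
            resp = session.get(url, timeout=30)
            resp.raise_for_status()
            (partition / f"page_{i}.html").write_text(resp.text, encoding="utf-8")
        except requests.RequestException as exc:
            failures.append({"url": url, "error": str(exc)})  # failure tracking for debugging
        time.sleep(delay_s)  # throttle to respect server load
    if failures:
        (partition / "_failures.json").write_text(json.dumps(failures, indent=2))
        # a Slack webhook call for critical issues would go here
```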
Leveraging Airflow on a dedicated Docker VM within Proxmox, I orchestrate dynamic data pipelines. The setup uses Docker-out-of-Docker (DooD), the commonly recommended alternative to Docker-in-Docker, for clean container management from Airflow tasks, ensuring efficient data ingestion and ETL. Integrated with Proxmox's scheduled backups, this architecture provides a stable and resilient data processing environment for my home projects.
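A hedged sketch of the DooD pattern with Airflow's DockerOperator: the Airflow container talks to the host's Docker daemon through the mounted socket rather than running its own daemon. The DAG, image name, and command are placeholders, not my actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="example_dood_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_crawler = DockerOperator(
        task_id="run_crawler",
        image="my-crawler:latest",                # custom image built by the CI/CD pipeline
        command="python -m crawler.main",
        docker_url="unix://var/run/docker.sock",  # host socket mounted into Airflow (DooD)
        mount_tmp_dir=False,
    )
```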
This multi-agent chatbot system uses a supervisor agent pattern with nested ReAct agents (Researcher, Reporter, Chart Generator) to support project managers. By querying project data through its API, it generates customized reports and charts and can answer even niche questions. Built with LangGraph and LangSmith for traceability, the solution streamlines project insights and enables fast feedback to the client.
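As an illustration of the nested-agent idea, here is a minimal LangGraph sketch with a single ReAct agent wrapping a placeholder project-data tool; the real system adds a supervisor node that routes between the Researcher, Reporter, and Chart Generator agents, and the tool, model, and prompt here are assumptions.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o-mini")

@tool
def fetch_project_data(project_id: str) -> str:
    """Fetch raw project data from the project API (placeholder implementation)."""
    return f"tasks, milestones and risks for project {project_id}"

# One nested ReAct agent; the full system wires several of these under a supervisor.
researcher = create_react_agent(llm, tools=[fetch_project_data])

result = researcher.invoke(
    {"messages": [("user", "Summarise the open risks for project 42")]}
)
print(result["messages"][-1].content)
```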
This project showcases a scalable RAG (Retrieval Augmented Generation) demo using AWS Bedrock. It features a Streamlit application deployed on EC2, designed for rapid setup and customer demonstrations. A key design choice is the efficient use of OpenSearch as a single, combined vector store with filtering, which significantly reduces operational costs. The solution also incorporates secure access links via public/private key encryption, simplifying sharing while maintaining data privacy across diverse knowledge bases, including web data and S3 content.
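A rough sketch of the combined-store idea, assuming Bedrock Titan embeddings and an OpenSearch k-NN index that supports filtered vector queries; the model id, index name, field names, and filter are placeholders rather than the demo's actual configuration.

```python
import json

import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    """Embed text with a Bedrock Titan embedding model (model id is an assumption)."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

query_vector = embed("What does the pricing page say about volume discounts?")
response = client.search(
    index="rag-demo",
    body={
        "size": 4,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_vector,
                    "k": 4,
                    # metadata filter lets one index serve several knowledge bases
                    "filter": {"term": {"source": "s3"}},
                }
            }
        },
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_source"]["text"][:80])
```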
I wrote this script before Deep Research was released. This experimental AI agent implements a deep-search concept for web exploration. Employing a self-calling architecture with a dynamic scratchpad, it iteratively refines answers by ranking URLs and progressively building knowledge. The agent leverages AI for structured content analysis, confidence scoring, and adaptive thresholds, offering a comprehensive and accurate approach to web-based question answering, well suited to complex information retrieval.
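A stripped-down, dependency-free sketch of the loop, with stub functions standing in for the LLM calls that rank URLs, summarize pages, and score confidence; the names and stopping threshold are illustrative only.

```python
def rank_urls(question: str, urls: list[str]) -> list[str]:
    return urls  # the real agent asks the model to score relevance

def read_page(url: str) -> str:
    return f"notes extracted from {url}"  # the real agent fetches and summarizes the page

def assess(question: str, scratchpad: list[str]) -> tuple[str, float]:
    answer = " / ".join(scratchpad) or "no answer yet"
    return answer, 0.2 * len(scratchpad)  # the real agent returns an LLM confidence score

def deep_search(question: str, seed_urls: list[str],
                threshold: float = 0.8, max_rounds: int = 5) -> str:
    scratchpad: list[str] = []
    frontier = list(seed_urls)
    answer = "no answer yet"
    for _ in range(max_rounds):
        batch, frontier = frontier[:2], frontier[2:]   # the real agent also discovers new links
        for url in rank_urls(question, batch):
            scratchpad.append(read_page(url))          # progressively build knowledge
        answer, confidence = assess(question, scratchpad)
        if confidence >= threshold:                    # adaptive threshold in the real agent
            return answer
    return answer

print(deep_search("Who maintains library X?",
                  ["https://example.com/a", "https://example.com/b"]))
```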
This project merges Scrapy's robust crawling with Playwright's modern browser automation for advanced web scraping. It demonstrates logging into 2dehands.be to extract user-saved searches, showcasing the scraping of dynamic, JavaScript-heavy websites. I also make the case for Playwright's streamlined, developer-friendly approach and simple syntax as a compelling alternative to traditional Selenium-based methods for efficient data extraction.
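A hedged sketch of the Scrapy-plus-Playwright combination using scrapy-playwright's PageMethod hooks for the login flow; the selectors, credentials, and CSS classes are placeholders, not the real 2dehands.be markup.

```python
import scrapy
from scrapy_playwright.page import PageMethod

class SavedSearchesSpider(scrapy.Spider):
    name = "saved_searches"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        yield scrapy.Request(
            "https://www.2dehands.be/",
            meta={
                "playwright": True,
                "playwright_page_methods": [
                    PageMethod("fill", "#email", "user@example.com"),  # placeholder selector
                    PageMethod("fill", "#password", "secret"),         # placeholder selector
                    PageMethod("click", "button[type=submit]"),
                    PageMethod("wait_for_selector", ".saved-search"),  # placeholder selector
                ],
            },
        )

    def parse(self, response):
        # Scrapy still handles parsing and item export once Playwright has rendered the page
        for item in response.css(".saved-search"):
            yield {"search": item.css("::text").get()}
```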
This project demonstrates how to set up Azure Machine Learning infrastructure using Terraform for scalability and efficiency. It utilizes MLflow for tracking experiments and ensuring reproducibility. A key feature is the ability to deploy and host a model with a single command via the MLflow server. The repository documents the entire project, including the learning process for the technologies involved.
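A small sketch of the MLflow side under assumed names: the tracking URI would come from the Terraform outputs for the Azure ML workspace, and the experiment, model, and data here are purely illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

mlflow.set_tracking_uri("azureml://<workspace-tracking-uri>")  # placeholder from Terraform outputs
mlflow.set_experiment("terraform-azureml-demo")

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_r2", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="demo-model")

# Serving the registered model can then be a single command, e.g.:
#   mlflow models serve -m "models:/demo-model/1" --port 5001
```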
This is a foundational machine learning project focused on predicting house prices using a dataset scraped from the internet. It applies Linear Regression and Random Forest models to forecast prices based on property features. Undertaken as part of an AI trainee program, the project, including theoretical study and implementation, was completed in four days. The repository includes code to train multiple models, which are then saved for further use.
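A condensed sketch of the training step, assuming a scraped CSV with illustrative column names; the repository's actual features and preprocessing differ.

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("houses.csv")                      # placeholder for the scraped listings
X = df[["area", "bedrooms", "postal_code"]]         # placeholder feature columns
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in [("linear_regression", LinearRegression()),
                    ("random_forest", RandomForestRegressor(random_state=42))]:
    model.fit(X_train, y_train)
    print(name, "R² on test:", round(model.score(X_test, y_test), 3))
    joblib.dump(model, f"{name}.joblib")            # saved for reuse by the deployment project
```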
This project deploys the previously built house price prediction model. It uses FastAPI to create an API for developer access and Streamlit for the end-user interface, and the application is containerized with Docker. This project was completed in five days as part of an AI trainee program at BeCode. Live demos for both the frontend and backend are available, though they may be slow since they run on a free server.
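A minimal sketch of the FastAPI layer, assuming the model file and field names from the training project above; the actual schema and file layout are placeholders.

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="House price API")
model = joblib.load("random_forest.joblib")  # model saved by the training project

class House(BaseModel):
    area: float
    bedrooms: int
    postal_code: int

@app.post("/predict")
def predict(house: House) -> dict:
    features = [[house.area, house.bedrooms, house.postal_code]]
    return {"predicted_price": float(model.predict(features)[0])}

# Run locally with: uvicorn main:app --reload
```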
This project builds a chatbot using Python 3.12, Retrieval-Augmented Generation (RAG), and an OpenAI Vector Store. The process involves using Scrapy to crawl data from the ArcelorMittal website and job portal, loading it into the vector store, and then serving a chat interface via a Streamlit app. It requires an OpenAI API key to run. The chatbot leverages the scraped data to answer user queries accurately.
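A hedged sketch of the load-and-query flow, assuming a recent openai SDK (older releases expose the vector store methods under client.beta) and placeholder file and model names.

```python
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY

# 1. Create a vector store and upload the Scrapy output (placeholder file name)
store = client.vector_stores.create(name="arcelormittal-demo")
with open("scraped_pages.txt", "rb") as f:
    client.vector_stores.files.upload_and_poll(vector_store_id=store.id, file=f)

# 2. Answer a question grounded in the stored documents (the Streamlit app wraps this)
response = client.responses.create(
    model="gpt-4o-mini",
    input="Which engineering jobs are currently open?",
    tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
)
print(response.output_text)
```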
This project is a coding challenge to create a production plan for power plants. Given a specific load, it calculates how much power each plant should generate based on energy costs (gas, kerosine) and each plant's minimum/maximum output, without using an existing linear-programming solver. The project was completed for a job interview and served as an introduction to FastAPI and Test-Driven Development (TDD). It can be run locally using Docker.
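A simplified sketch of the merit-order idea behind the plan: sort plants by cost per MWh and fill the load greedily within pmin/pmax. It deliberately skips the pmin redistribution cases the full challenge requires, and the plant data is illustrative.

```python
from dataclasses import dataclass

@dataclass
class Plant:
    name: str
    cost_per_mwh: float  # derived from gas/kerosine price and plant efficiency
    pmin: float
    pmax: float

def production_plan(load: float, plants: list[Plant]) -> dict[str, float]:
    """Greedy merit-order allocation: cheapest plants first, no LP solver."""
    plan: dict[str, float] = {}
    remaining = load
    for plant in sorted(plants, key=lambda p: p.cost_per_mwh):
        power = min(plant.pmax, remaining)
        if power < plant.pmin:
            power = 0.0  # naive: the full solution redistributes load to honour pmin
        plan[plant.name] = round(power, 1)
        remaining -= power
    return plan

print(production_plan(480.0, [
    Plant("windpark1", 0.0, 0.0, 150.0),
    Plant("gasfired1", 13.4, 100.0, 460.0),
    Plant("turbojet1", 50.8, 0.0, 16.0),
]))
```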