Gerrit Geeraerts

Data Engineer & Python Developer | Building Robust & Scalable Data Solutions

VDAB Job Scraper
Python Scrapy Docker Airflow ETL Pipelines CI/CD DevOps Linux

VDAB Job Scraper ⭐⭐⭐

This project implements a robust web crawler for job listings on VDAB. It automatically retries failed requests, handles dropped connections, and throttles its request rate to respect server load. Data is stored reliably in a partitioned Bronze layer, with failure tracking for debugging, and critical issues trigger Slack notifications. The entire system is containerized in a custom Docker image and deployed through a streamlined CI/CD pipeline to my Airflow home server.
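
As a minimal sketch, the retry and throttling behaviour could be configured through Scrapy's built-in settings, and the Bronze layer could use date-partitioned paths. The setting values, the hypothetical `bronze_path` helper and the directory layout are assumptions for illustration, not the project's actual configuration.

```python
# settings.py (sketch): all values are illustrative assumptions.
from datetime import date
from pathlib import PurePosixPath

# Retry failed requests. Scrapy's RetryMiddleware also retries
# connection-level errors such as dropped connections and timeouts.
RETRY_ENABLED = True
RETRY_TIMES = 5
RETRY_HTTP_CODES = [408, 429, 500, 502, 503, 504, 522, 524]

# AutoThrottle adapts the download delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0


def bronze_path(root: str, source: str, run_date: date, kind: str = "data") -> PurePosixPath:
    """Hypothetical Hive-style partition path.

    kind="failures" keeps failed items in a separate tree for debugging.
    """
    return (
        PurePosixPath(root)
        / source
        / kind
        / f"year={run_date:%Y}"
        / f"month={run_date:%m}"
        / f"day={run_date:%d}"
    )
```

For example, `bronze_path("bronze", "vdab", date(2024, 3, 5))` yields `bronze/vdab/data/year=2024/month=03/day=05`, so each run lands in its own partition.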

Airflow Home Server
Airflow MLflow ETL Pipelines Docker DevOps Linux PostgreSQL CI/CD VM

Airflow Home Server ⭐⭐⭐

I run Airflow on a dedicated Docker VM in Proxmox to orchestrate dynamic data pipelines. The setup uses Docker-out-of-Docker (DooD): Airflow launches task containers through the host's Docker daemon, which avoids the complexities of Docker-in-Docker that experienced practitioners warn against. This keeps data ingestion and ETL efficient, and together with Proxmox's scheduled backups it provides a stable, resilient data processing environment for my home projects.
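
Docker-out-of-Docker boils down to bind-mounting the host's Docker socket into the Airflow container, so every container Airflow starts is created by the host daemon as a sibling rather than nested inside it. The sketch below only builds the corresponding `docker run` command; the image and container names are placeholders, and in practice the same mount is often declared as a volume in a Compose file.

```python
import shlex
from typing import Sequence

DOCKER_SOCKET = "/var/run/docker.sock"


def dood_run_command(image: str, name: str, extra_args: Sequence[str] = ()) -> list[str]:
    """Build a `docker run` argv for a container that talks to the host's Docker daemon."""
    return [
        "docker", "run", "--detach", "--name", name,
        # Bind-mount the host's Docker socket instead of running a nested daemon:
        # containers started from inside become siblings on the host (DooD).
        "--volume", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}",
        *extra_args,
        image,
    ]


if __name__ == "__main__":
    # "my-airflow-image" is a placeholder image name.
    print(shlex.join(dood_run_command("my-airflow-image", "airflow-scheduler")))
```

The user inside the Airflow container also needs permission to access the mounted socket, for example through the host's `docker` group ID.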

Chat with TIA
Python Streamlit Docker DevOps AWS Cloud LangSmith LangGraph LangChain Linux LLM Agents

Chat with TIA ⭐⭐⭐

This multi-agent chatbot helps project managers through a supervisor agent that coordinates nested ReAct agents (Researcher, Reporter, Chart Generator). By querying project data through the API, it generates customized reports and charts and can answer even niche questions. Built with LangGraph, with LangSmith providing traceability, the solution streamlines access to project insights and enables fast feedback to the client.
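
The supervisor pattern can be reduced to a routing loop: a supervisor repeatedly picks the next worker based on the shared state until it decides the task is finished. This plain-Python sketch replaces the LLM calls with hard-coded rules and simple functions; it illustrates the control flow only and does not use LangGraph's API.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    request: set[str]                      # what the user asked for, e.g. {"report", "chart"}
    notes: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)


# Stand-ins for the nested ReAct agents; in the real system each worker
# is an LLM agent with its own tools, not a plain function.
def researcher(state: State) -> None:
    state.notes.append("project data fetched from the API")


def reporter(state: State) -> None:
    state.outputs.append("report")


def chart_generator(state: State) -> None:
    state.outputs.append("chart")


WORKERS = {"researcher": researcher, "reporter": reporter, "chart_generator": chart_generator}


def supervisor(state: State) -> str:
    """Pick the next worker; a fixed rule stands in for the LLM's routing decision."""
    if not state.notes:
        return "researcher"
    for kind, worker in (("report", "reporter"), ("chart", "chart_generator")):
        if kind in state.request and kind not in state.outputs:
            return worker
    return "FINISH"


def run(request: set[str], max_steps: int = 10) -> State:
    state = State(request=request)
    for _ in range(max_steps):  # hard cap guards against routing loops
        next_worker = supervisor(state)
        state.trace.append(next_worker)
        if next_worker == "FINISH":
            break
        WORKERS[next_worker](state)
    return state
```

Calling `run({"report", "chart"})` visits the researcher first, then the reporter and the chart generator, and finally returns control with `FINISH`.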

AWS Knowledge Base Demo
Python Streamlit RAG LLM AWS Cloud Elastic Search Linux

AWS Knowledge Base Demo ⭐⭐⭐

This project showcases a scalable RAG (Retrieval-Augmented Generation) demo built on AWS Bedrock. It features a Streamlit application deployed on EC2, designed for rapid setup and customer demonstrations. A key design choice is using OpenSearch as a single, combined vector store with metadata filtering, which significantly reduces operating costs. The solution also provides secure access links based on public/private key cryptography, which simplify sharing while keeping data private across diverse knowledge bases, including web data and S3 content.
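
One way to share a single OpenSearch vector store across several knowledge bases is to tag every chunk with its knowledge base and filter inside the k-NN query. The sketch below only builds the query body; the field names `embedding` and `kb_id` and the shared-index layout are assumptions, and a `filter` clause inside `knn` requires an OpenSearch version and k-NN engine that support efficient filtering.

```python
import json


def knn_query(embedding: list[float], knowledge_base: str, k: int = 5) -> dict:
    """Build an OpenSearch k-NN query restricted to one knowledge base.

    Assumes one shared index whose documents carry a hypothetical `kb_id`
    keyword field next to a hypothetical `embedding` vector field.
    """
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": embedding,
                    "k": k,
                    # The filter is applied during the vector search, so the
                    # hits come only from the requested knowledge base.
                    "filter": {"term": {"kb_id": knowledge_base}},
                }
            }
        },
    }


print(json.dumps(knn_query([0.1, 0.2, 0.3], "kb-web-docs", k=3), indent=2))
```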

Ask Website
LangGraph LangSmith Python Linux LLM Agents

Deep Search ⭐⭐⭐

I wrote this script before Deep Research was released. This experimental AI agent implements a novel deep search concept for web exploration. Using a self-calling architecture with a dynamic scratchpad, it iteratively refines its answer by ranking URLs by relevance and progressively building up knowledge. The agent uses an LLM for structured content analysis, confidence scoring, and adaptive thresholds, aiming for comprehensive and accurate answers to complex web-based questions.
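
The self-calling loop can be sketched as a recursive function that always follows the best-ranked unvisited URL, appends what it learns to a scratchpad, and stops once confidence clears a threshold that relaxes with depth. In the agent an LLM does the ranking, reading and confidence scoring; here `rank` and `read` are hypothetical callables so the control flow runs on its own, and the 0.9 starting threshold and 0.1 step are arbitrary choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# read(question, url) -> (extracted fact, confidence in [0, 1], newly found links)
Reader = Callable[[str, str], tuple[str, float, list[str]]]


@dataclass
class Scratchpad:
    facts: list[str] = field(default_factory=list)
    visited: set[str] = field(default_factory=set)
    confidence: float = 0.0


def deep_search(
    question: str,
    candidates: list[str],
    rank: Callable[[str, str], float],
    read: Reader,
    pad: Optional[Scratchpad] = None,
    depth: int = 0,
    max_depth: int = 3,
    base_threshold: float = 0.9,
) -> Scratchpad:
    """Recursively visit the best-ranked unvisited URL until confident enough."""
    pad = pad if pad is not None else Scratchpad()
    # Adaptive threshold: accept a slightly lower confidence at each extra level.
    threshold = base_threshold - 0.1 * depth
    if pad.confidence >= threshold or depth >= max_depth:
        return pad
    unvisited = [u for u in candidates if u not in pad.visited]
    if not unvisited:
        return pad
    url = max(unvisited, key=lambda u: rank(question, u))
    pad.visited.add(url)
    fact, confidence, links = read(question, url)
    pad.facts.append(fact)  # the scratchpad accumulates knowledge across calls
    pad.confidence = max(pad.confidence, confidence)
    return deep_search(question, candidates + links, rank, read, pad, depth + 1, max_depth, base_threshold)
```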

Scrapy Playwright Demo
Python Scrapy Playwright Selenium Linux

Scrapy Playwright Demo ⭐⭐

This project combines Scrapy's robust crawling with Playwright's modern browser automation for advanced web scraping. It demonstrates logging into 2dehands.be to extract a user's saved searches, showing how to scrape dynamic, JavaScript-heavy websites. I also make the case for choosing Playwright over traditional Selenium-based methods: its streamlined approach is more developer-friendly and its syntax is simpler, which makes data extraction more efficient.
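
Merging the two tools is mostly a matter of configuration with the scrapy-playwright plugin: its download handler replaces Scrapy's default one, and Twisted must run on the asyncio reactor. The sketch below shows such a `settings.py`; the browser choice is an assumption, and `PLAYWRIGHT_REQUEST_META` is a hypothetical helper constant, not a plugin setting.

```python
# settings.py (sketch): route Scrapy downloads through Playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Hypothetical helper: only requests whose meta contains {"playwright": True}
# are rendered in the browser; "playwright_include_page" hands the Page object
# to the callback, e.g. to fill in a login form before extracting data.
PLAYWRIGHT_REQUEST_META = {"playwright": True, "playwright_include_page": True}
```

A spider would pass this dict as `meta` on `scrapy.Request`; when `playwright_include_page` is set, the callback is responsible for closing the page.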

Azure ML Infrastructure with Terraform and MLflow
Python Terraform IaC Azure Cloud MLflow Linux ML

Azure ML with Terraform and MLflow ⭐⭐

This project demonstrates how to set up Azure Machine Learning infrastructure with Terraform for scalability and efficiency. It uses MLflow to track experiments and ensure reproducibility. A key feature is deploying and hosting a model with a single command via the MLflow server. The repository documents the entire project, including the learning process for the technologies involved.
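
Once a model is logged, serving it locally is a single MLflow CLI call such as `mlflow models serve -m runs:/<run_id>/model --port 5001`, after which clients POST JSON to the `/invocations` endpoint. The standard-library sketch below only builds such a request; the port, artifact path, column names and values are placeholders.

```python
import json
import urllib.request


def scoring_request(base_url: str, columns: list[str], rows: list[list[float]]) -> urllib.request.Request:
    """Build a POST request for a model served with `mlflow models serve`."""
    # "dataframe_split" is one of the JSON input formats accepted by the
    # /invocations endpoint of MLflow 2.x scoring servers.
    body = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = scoring_request("http://localhost:5001", ["x1", "x2"], [[1.0, 2.0]])
# urllib.request.urlopen(req) would send it to a running scoring server.
```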

Immo Prediction together with Charlie
Python Pandas NumPy scikit-learn Linux ML

Immo Prediction ⭐⭐

This is a foundational machine learning project that predicts house prices from a dataset scraped from the internet. It applies Linear Regression and Random Forest models to predict prices from property features. Undertaken as part of an AI trainee program, the project, including theoretical study and implementation, was completed in four days. The repository includes code to train multiple models, which are then saved for later use.

Immo Prediction App with Charlie
Python Docker DevOps Streamlit FastAPI Pydantic Linux ML

Immo Prediction App ⭐⭐

This project deploys the previously built house price prediction model. It uses FastAPI to expose an API for developers and Streamlit for the end-user interface, and the application is containerized with Docker. The project was completed in five days as part of an AI trainee program at BeCode. Live demos of both the frontend and the backend are available, though they may be slow because they run on a free server.
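
In the app, Pydantic models validate incoming requests before they reach the model. To illustrate that idea without dependencies, the sketch below swaps in a frozen dataclass with manual checks; the field names and the Belgian four-digit postal-code rule are assumptions, and unlike Pydantic this does not coerce or type-check values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyFeatures:
    """Hypothetical request schema; the app itself uses Pydantic for this."""

    living_area_m2: float
    bedrooms: int
    postal_code: int

    def __post_init__(self) -> None:
        # Reject invalid input before it reaches the model, as a Pydantic
        # validator would, so the API can answer with a clear error.
        if self.living_area_m2 <= 0:
            raise ValueError("living_area_m2 must be positive")
        if self.bedrooms < 0:
            raise ValueError("bedrooms cannot be negative")
        if not 1000 <= self.postal_code <= 9999:
            raise ValueError("postal_code must be a four-digit Belgian postal code")


def parse_request(payload: dict) -> PropertyFeatures:
    """Build validated features; unknown or missing keys raise TypeError."""
    return PropertyFeatures(**payload)
```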

ArcelorMittal Chatbot
Python OpenAI RAG neo4j Scrapy Streamlit

ArcelorMittal Chatbot ⭐

This project builds a chatbot with Python 3.12, Retrieval-Augmented Generation (RAG), and an OpenAI vector store. Scrapy crawls data from the ArcelorMittal website and job portal, the data is loaded into the vector store, and a Streamlit app provides the chat interface. Running it requires an OpenAI API key. The chatbot uses the scraped data to answer user queries accurately.
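
The retrieve-then-generate flow can be sketched without any external service. In the project, the OpenAI vector store performs the similarity search; here a simple word-overlap score is swapped in as a stand-in, the chunks are made-up examples, and the final LLM call is left out.

```python
import re

# Made-up chunks standing in for the crawled pages.
CHUNKS = [
    "Open positions include maintenance technician and process engineer.",
    "The company produces steel for automotive and construction markets.",
    "Applications are submitted through the online job portal.",
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap; a stand-in for vector similarity search."""
    q = tokens(question)
    return sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the user question with the retrieved context for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```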

Powerplant Coding Challenge
Python Docker FastAPI Test Linux

Powerplant Coding Challenge ⭐⭐⭐

This project is a coding challenge to create a production plan for power plants. Given a specific load, it calculates how much power each plant should generate based on energy costs (gas, kerosine) and each plant's minimum/maximum output, without using an existing linear-programming solver. The project was completed for a job interview and served as an introduction to FastAPI and Test-Driven Development (TDD). It can be run locally using Docker.
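
A common way to approach this challenge is merit-order dispatch: sort plants by marginal cost per MWh (fuel price divided by efficiency, zero for wind) and fill the load from the cheapest plant upward. The greedy sketch below is one such approach, not necessarily the one used in the project. The fuel keys and plant fields are assumptions, CO2 costs and rounding are ignored, and a plant whose minimum output exceeds the remaining load is simply skipped, so the plan is not guaranteed to be optimal when minimum outputs interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plant:
    name: str
    kind: str          # "gasfired", "turbojet" or "windturbine"
    efficiency: float
    pmin: float        # MW
    pmax: float        # MW


def marginal_cost(p: Plant, fuels: dict) -> float:
    """Euro per MWh of electricity produced."""
    if p.kind == "windturbine":
        return 0.0
    price = fuels["gas"] if p.kind == "gasfired" else fuels["kerosine"]
    return price / p.efficiency


def available_max(p: Plant, fuels: dict) -> float:
    # Wind output is capped by the current wind percentage.
    return p.pmax * fuels["wind"] / 100 if p.kind == "windturbine" else p.pmax


def production_plan(load: float, plants: list[Plant], fuels: dict) -> dict[str, float]:
    remaining = load
    plan: dict[str, float] = {}
    for p in sorted(plants, key=lambda p: marginal_cost(p, fuels)):
        if remaining <= 0 or remaining < p.pmin:
            plan[p.name] = 0.0  # not needed, or cannot run below its minimum
            continue
        plan[p.name] = min(available_max(p, fuels), remaining)
        remaining -= plan[p.name]
    if remaining > 1e-9:
        raise ValueError(f"load cannot be met: {remaining} MW short")
    return plan
```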