Global Macroeconomics & News Data Pipeline
Python · GCP · BigQuery · dbt · Apache Airflow
This project is an automated, end-to-end ELT pipeline that ingests and processes real-time news and macroeconomic data from multiple APIs and RSS feeds. The architecture centers on durability, using Apache Airflow to orchestrate complex dependency graphs across ingestion scripts and warehouse transformations.
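The ordering Airflow enforces across those dependency graphs amounts to a topological sort of the task graph. The sketch below illustrates that idea in plain Python; the task names and edges are hypothetical stand-ins, not the production DAG (in Airflow itself the same dependencies would be declared with the `>>` operator):

```python
from collections import deque

# Illustrative task graph: each task maps to its upstream dependencies.
DEPENDENCIES = {
    "ingest_news_rss": [],
    "ingest_macro_api": [],
    "load_to_bigquery": ["ingest_news_rss", "ingest_macro_api"],
    "dbt_transform": ["load_to_bigquery"],
}

def execution_order(deps):
    """Return a valid run order via Kahn's algorithm (topological sort)."""
    indegree = {task: len(upstream) for task, upstream in deps.items()}
    downstream = {task: [] for task in deps}
    for task, upstream in deps.items():
        for dep in upstream:
            downstream[dep].append(task)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in downstream[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("cycle detected in task graph")
    return order
```

Airflow resolves this ordering automatically and adds retries, scheduling, and backfills on top; the sketch only shows why a transform can never run before its loads complete.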
To handle the variety of incoming data, I developed an extensible transformation layer using dbt that merges disparate formats, such as raw text and structured numeric indicators. This approach optimizes BigQuery compute costs through incremental materializations while ensuring the system remains scalable as new data sources are integrated.
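The actual transformation layer is dbt SQL (incremental models use `is_incremental()` and a `unique_key`), but the merge semantics that keep BigQuery scans cheap can be sketched in Python: only rows past the current high-water mark are considered, and existing keys are updated rather than duplicated. Column names below are illustrative, not the real schema:

```python
def incremental_merge(target, new_rows, key="id", updated_at="updated_at"):
    """Merge new_rows into target the way an incremental model would:
    skip rows at or below the high-water mark, upsert the rest by key.
    `target` and `new_rows` are lists of row dicts."""
    high_water = max((r[updated_at] for r in target), default=0)
    by_key = {r[key]: r for r in target}
    for row in new_rows:
        if row[updated_at] > high_water:   # already-processed rows are skipped
            by_key[row[key]] = row         # insert new key or overwrite stale row
    return list(by_key.values())
```

The cost saving comes from the high-water mark: each run touches only the fresh slice of source data instead of rebuilding the whole table.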
European Union Energy Grid Monitor
Apache Kafka · Python · Docker · SQL
This system is a decoupled microservices architecture designed to process over 13 million records of real-time energy generation and pricing data from the ENTSO-E API. The streaming solution utilizes Apache Kafka to decouple ingestion from storage, ensuring high availability and system resilience.
A key technical challenge was maintaining data integrity during high-write loads and deep-fetch backfilling. I engineered an idempotent storage service to persist enriched data, guaranteeing consistency under message replay so that system failures could not cause data loss or duplication.
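The core of that idempotency is a keyed upsert: replaying the same Kafka message writes over the existing row instead of inserting a duplicate. The sketch below uses SQLite's `ON CONFLICT` clause as a stand-in for the production database, and the table and field names are illustrative, not the real service schema:

```python
import sqlite3

def store(conn, message):
    """Persist an enriched record idempotently: a replayed message
    (same msg_id) updates in place rather than duplicating."""
    conn.execute(
        """INSERT INTO readings (msg_id, zone, price_eur_mwh)
           VALUES (:msg_id, :zone, :price_eur_mwh)
           ON CONFLICT(msg_id) DO UPDATE SET
               zone = excluded.zone,
               price_eur_mwh = excluded.price_eur_mwh""",
        message,
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (msg_id TEXT PRIMARY KEY, zone TEXT, price_eur_mwh REAL)"
)
msg = {"msg_id": "entsoe-0001", "zone": "DE_LU", "price_eur_mwh": 87.4}
store(conn, msg)
store(conn, msg)  # replayed delivery: row count stays at 1
```

Because the write is safe to repeat, the consumer can use at-least-once delivery and aggressive backfills without any deduplication pass downstream.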
Freelance Automation Project
Python · Selenium
Developed for a freelance client, this automation pipeline bulk-inputs productive hours for over 500 missed job records daily. The solution replaced a manual, fragmented workflow, saving staff approximately 8-16 hours of labor per run and eliminating a multi-day backlog.
The pipeline was engineered for high stability during continuous 2-hour execution cycles. By implementing custom wait conditions and robust error handling, the script ensures 100% accuracy in data entry without human intervention and without crashing during long-running tasks.
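Selenium provides `WebDriverWait(driver, timeout).until(...)` for this; the retry discipline behind such custom wait conditions can be sketched framework-free. The helper below is an illustrative simplification, not the client script: it polls a condition, swallows transient errors (the Selenium analogue would be stale-element or not-found exceptions), and fails loudly on timeout instead of hanging:

```python
import time

def wait_until(condition, timeout=30.0, poll=0.5, ignored=(Exception,)):
    """Poll `condition` until it returns a truthy value.

    Exceptions in `ignored` are treated as "not ready yet" and retried;
    a TimeoutError is raised if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # transient failure (e.g. element not rendered yet): retry
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll)
```

Wrapping every form interaction in a wait like this, rather than fixed sleeps, is what keeps a 2-hour run both fast and stable: slow pages get more time, fast pages do not pay for it.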