Data & Analytics

Portfolio

Applied data science projects in health, sustainability and public sector analytics — using Python, SQL, Power BI and machine learning to answer real business questions.

Python, SQL, Power BI, Tableau, Machine Learning, Anomaly Detection, Data Visualisation, EDA, Scikit-learn, PCA

Projects

Complete Sustainability & Energy

Building Energy Anomaly Detection

“Which homes in England and Wales are anomalously energy-inefficient, and how can local authorities prioritise them for retrofit intervention?”

Applied IQR, Isolation Forest, and One-Class SVM to more than 1 million EPC records for England and Wales to identify likely retrofit priorities. Isolation Forest produced the clearest separation, with ~78% agreement against the IQR statistical baseline.

1M+ EPC Records Analysed
~2% Flagged Anomalous
3 Models Compared
78% Method Agreement
Python, IQR, Isolation Forest, One-Class SVM, PCA, EPC Open Data
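
As a rough illustration of the statistical baseline and the agreement figure in this project, the sketch below flags outliers with Tukey's IQR rule and compares them with a second set of flags. The energy-use values, the stand-in model flags, and the definition of agreement (overlap among records flagged by either method) are all assumptions made for this example, not the project's actual data or metric.

```python
from statistics import quantiles


def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def flag_agreement(flags_a, flags_b):
    """Share of records flagged by either method that both methods flag."""
    both = sum(a and b for a, b in zip(flags_a, flags_b))
    either = sum(a or b for a, b in zip(flags_a, flags_b))
    return both / either if either else 1.0


# Hypothetical energy-use values (kWh/m2/year), not real EPC records.
energy_use = [180, 210, 195, 205, 220, 190, 200, 215, 640, 185, 20, 208]
baseline = iqr_flags(energy_use)

# Stand-in for a model's output (e.g. Isolation Forest); assumed for illustration.
model_flags = [False] * len(energy_use)
model_flags[8] = True  # the model flags only the 640 record

print([v for v, f in zip(energy_use, baseline) if f])
print(flag_agreement(baseline, model_flags))
```

In practice the second set of flags would come from a fitted model rather than being set by hand, and other agreement measures (for example, the share of identical labels across all records) would give different numbers.
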
Complete Healthcare

NHS Referral Demand Forecasting

“Can referral demand be predicted 4–8 weeks ahead to support workforce and capacity planning in NHS outpatient services?”

Time series forecasting of NHS outpatient referral volumes using ARIMA and LSTM models. The project is aimed at healthcare analytics roles in the NHS and public sector.

2.4% ARIMA MAPE
8 wk Forecast Horizon
ARIMA Recommended Model
vs LSTM Challenger Tested
Python, ARIMA, LSTM, Time Series, NHS Open Data
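
For readers unfamiliar with the MAPE metric quoted in this project, a minimal sketch of how mean absolute percentage error can be computed over a hold-out window follows. The weekly referral counts and forecasts are hypothetical and are not drawn from the NHS data used in the project.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical weekly referral counts over an 8-week hold-out window.
actual = [1000, 1040, 980, 1020, 1100, 1060, 990, 1010]
forecast = [1020, 1014, 1000, 1000, 1078, 1080, 1010, 990]

print(f"MAPE: {mape(actual, forecast):.1f}%")
```

MAPE is undefined when any actual value is zero, so it suits series such as referral volumes that stay well above zero.
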
Complete Sustainability & AI

Construction Sustainability Research Agent

“Can a multi-agent AI system autonomously research, analyse, and synthesise Scope 3 and embodied carbon intelligence for UK construction projects — reducing research time from days to minutes?”

Three-agent CrewAI pipeline that autonomously researches and synthesises UK construction sustainability intelligence from live regulatory sources. Produces compliance briefings with concrete actions in under 10 minutes.

3 AI Agents
9 Sources per Run
<10 min Briefing Time
>95% Time Saving
Python, CrewAI, LangChain, Gemini, Agentic AI, Scope 3, Embodied Carbon
Complete Education & People Analytics

Student Retention Risk Modelling

“Which student cohorts are at highest risk of non-continuation after Year 1, and what institutional factors drive early withdrawal?”

XGBoost classification model on HESA continuation data identifying high-risk cohort profiles by provider type, entry tariff, and subject area. SHAP explainability surfaces the top drivers for student support intervention.

0.847 AUC-ROC
+11.3% vs Baseline
23% High-Risk Cohorts
2,400 Cohorts Scored
Python, XGBoost, SHAP, Scikit-learn, HESA Data, Education Analytics, ROC-AUC
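
To make the AUC-ROC figure in this project more concrete, the sketch below computes the metric directly from its ranking interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The labels and risk scores are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this hand-rolled version.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the probability a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort outcomes (1 = high non-continuation) and model risk scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

print(roc_auc(labels, scores))
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a small illustration but not for large datasets.
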
Complete Sustainability & Carbon Economics

Construction Scope 3 Cost Forecasting

“How will Scope 3 emissions-related cost pressures evolve for UK construction subcontractors over the next 12–24 months?”

Forecasts Scope 3 cost pressures for UK construction subcontractors using ONS PPI and HMRC trade data. SARIMA, ETS and Monte Carlo simulation (N=10,000) quantify material price risk and financial exposure for a representative £50M subcontractor.

+7.1% Steel PPI Uplift
£127k P50 Exposure
3 Models Used
10k MC Simulations
Python, SARIMA, ETS, Monte Carlo, ONS Data, UK ETS, Scope 3
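
The Monte Carlo step in this project can be pictured with a small sketch: draw many price-uplift scenarios, convert each to a cost exposure, and read off percentiles such as P50. The spend figure, the normal distribution, and its parameters are hypothetical choices for illustration. They are not the project's inputs and are not intended to reproduce its reported figures.

```python
import random
from statistics import quantiles


def simulate_exposure(spend, mean_uplift, sd_uplift, n=10_000, seed=42):
    """Draw n price-uplift scenarios and return the extra cost in each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [spend * rng.gauss(mean_uplift, sd_uplift) for _ in range(n)]


# Hypothetical inputs: 2M annual material spend, uplift ~ Normal(7%, 3%).
exposures = simulate_exposure(spend=2_000_000, mean_uplift=0.07, sd_uplift=0.03)

# quantiles(..., n=100) returns the 99 percentile cut points P1..P99.
pct = quantiles(exposures, n=100)
p5, p50, p95 = pct[4], pct[49], pct[94]
print(f"P5 GBP {p5:,.0f}  P50 GBP {p50:,.0f}  P95 GBP {p95:,.0f}")
```

Reporting a percentile band rather than a single number shows both the central estimate and the spread of plausible exposures.
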

Yenlik Gaisina, MPH

Data Analyst with a background in public health research and sustainability consulting. Currently completing the Data Science with Machine Learning & AI programme at the University of Cambridge, building a portfolio of applied analytical projects across health, education and sustainability.

Connect