Sr. Data Scientist · ML Systems · LLM Engineering · AdTech
Senior Data Scientist specializing in end-to-end ML system design, spanning quantile regression, temporal clustering, causal inference, anomaly detection, and LLM-powered agent pipelines. I build production systems for latency-sensitive AdTech environments, where statistical rigor and engineering precision drive measurable outcomes at scale.
EDA         EMBED        MODEL          SERVE        FEEDBACK
───         ─────        ─────          ─────        ────────
Polars  ──▶ Word2Vec ──▶ LightGBM   ──▶ Lambda   ──▶ Thompson
Kafka   ──▶ BERT     ──▶ DCN        ──▶ O(1)     ──▶ Elasticity
DSP     ──▶ HMM      ──▶ Iso.Forest ──▶ RT infer ──▶ AutoLoop
  ▲                                                    │
  └──────── hourly recalibration feedback arc ─────────┘
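The FEEDBACK column pairs Thompson sampling with price elasticity. Below is a minimal sketch, assuming a Beta-Bernoulli bandit over a hypothetical grid of floor prices: each arm's outcome is whether a bid clears that floor, and each arm is scored by floor price times a sampled clear rate, so the demand response to price is learned from outcomes rather than modeled up front. The `FloorBandit` class, the floor grid, and the simulated clear rates are illustrative, not the production design.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FloorBandit:
    """Beta-Bernoulli Thompson sampling over a hypothetical grid of floor prices."""

    floors: list[float]
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def __post_init__(self) -> None:
        # Beta(1, 1) prior on each floor's probability of clearing.
        self.alpha = [1.0] * len(self.floors)
        self.beta = [1.0] * len(self.floors)

    def choose_floor(self) -> int:
        # Draw one plausible clear rate per arm, then maximize expected revenue
        # (floor price x sampled clear rate) under that draw.
        scores = [
            floor * self.rng.betavariate(a, b)
            for floor, a, b in zip(self.floors, self.alpha, self.beta)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, cleared: bool) -> None:
        # Conjugate update: a clear is a success, a miss is a failure.
        if cleared:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


# Simulated auctions with made-up clear rates; expected revenue per floor
# is 0.45, 0.70 and 0.40, so the 1.0 floor is the best arm.
bandit = FloorBandit(floors=[0.5, 1.0, 2.0])
true_clear = [0.9, 0.7, 0.2]
env = random.Random(1)
for _ in range(2000):
    arm = bandit.choose_floor()
    bandit.update(arm, env.random() < true_clear[arm])

# Pulls per arm = posterior counts minus the Beta(1, 1) prior.
pulls = [int(a + b - 2) for a, b in zip(bandit.alpha, bandit.beta)]
print(pulls)  # most of the 2000 pulls should land on the 1.0 floor
```

Sampling from the posterior keeps occasional traffic on uncertain floors until the data rules them out. Tracking demand drift would need something extra, such as discounting old counts at each recalibration.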
┌────────────┐   ┌─────────────┐   ┌─────────────┐   ┌───────────────┐   ┌────────────┐
│ User Query │──▶│ Planner     │──▶│ Tool Use    │──▶│ Reflection    │──▶│ Response   │
│            │   │ LangGraph   │   │ MCP · RAG   │   │ self-critique │   │ grounded   │
└────────────┘   └─────────────┘   └─────────────┘   └───────────────┘   └────────────┘
                        │                 │
                        ▼                 ▼
                  ┌────────────┐   ┌──────────────────┐
                  │ Memory     │   │ Vector DB        │
                  │ store      │   │ FAISS · pgvector │
                  └────────────┘   └──────────────────┘
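The flow above is a plan → act → reflect loop. A minimal, framework-free sketch of that control flow follows, assuming hypothetical `plan`, `call_tool`, and `critique` callables in place of LangGraph nodes, MCP/RAG tools, and an LLM critic. It shows only the loop structure and the bounded number of revisions; it does not use or imitate any LangGraph API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    query: str
    evidence: list[str] = field(default_factory=list)   # tool outputs (memory-store stand-in)
    critiques: list[str] = field(default_factory=list)  # reflection notes fed back to the planner
    draft: str = ""


def run_agent(
    query: str,
    plan: Callable[[AgentState], list[str]],
    call_tool: Callable[[str], str],
    critique: Callable[[AgentState], Optional[str]],
    max_revisions: int = 2,
) -> AgentState:
    """Plan -> tool use -> reflection, repeated until the critic accepts or the budget runs out."""
    state = AgentState(query=query)
    for _ in range(max_revisions + 1):
        # Planner: choose tool calls given the query, evidence and past critiques.
        for step in plan(state):
            # Tool use: record each result (e.g. retrieved passages).
            state.evidence.append(call_tool(step))
        # Response draft grounded only in collected evidence.
        state.draft = " | ".join(state.evidence)
        # Reflection: None means the critic accepts the draft.
        feedback = critique(state)
        if feedback is None:
            break
        state.critiques.append(feedback)
    return state


# Hypothetical stubs standing in for an LLM planner, an MCP/RAG tool and an LLM critic.
def toy_plan(state: AgentState) -> list[str]:
    return [f"search:{state.query}"]


def toy_tool(step: str) -> str:
    return f"result for {step}"


def toy_critic(state: AgentState) -> Optional[str]:
    return None if len(state.evidence) >= 2 else "need more evidence"


final = run_agent("floor price for slot A", toy_plan, toy_tool, toy_critic)
print(final.critiques)  # ['need more evidence']
```

In a graph framework the same loop would typically be separate nodes joined by a conditional edge out of the reflection step; the `max_revisions` cap is what keeps a never-satisfied critic from looping forever.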
| Layer | Method | Detail |
|---|---|---|
| Embedding | Dense + sparse hybrid | BERT, Word2Vec, BM25 fusion |
| Indexing | HNSW approximate NN | 200M+ document scale |
| Re-ranking | Cross-encoder | Precision boost post-retrieval |
| Entity linking | Custom taxonomy mapper | Publisher content → audience graph |
| Storage | FAISS · pgvector | On-prem and cloud portable |
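The table names BM25 fusion with dense embeddings but not the fusion rule. One common choice is reciprocal rank fusion (RRF), which combines ranked lists using ranks only, so BM25 scores and cosine similarities never have to be put on the same scale. A minimal sketch follows, assuming RRF with the conventional constant k = 60; whether this stack uses RRF or a weighted score blend is not stated here, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs; each doc scores sum(1 / (k + rank)) over lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["d3", "d1", "d7"]   # hypothetical HNSW / embedding results
sparse = ["d1", "d9", "d3"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense, sparse])
print([doc for doc, _ in fused])  # ['d1', 'd3', 'd9', 'd7']
```

Documents found by both retrievers (d1, d3) rise to the top; the fused list would then go to the cross-encoder re-ranker, which only needs to score the first few candidates.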
Query complexity assessment
   │
   ├──▶ Simple retrieval    ──▶ Haiku  (fast · cheap)
   ├──▶ Structured output   ──▶ Sonnet (balanced)
   └──▶ Complex generation  ──▶ Opus   (frontier)
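The routing tree sends each query to the cheapest model tier that can handle it. A minimal sketch of that dispatch follows, assuming a hypothetical keyword-and-length heuristic for "complexity"; the cue words, the 60-word threshold, and the `Tier` labels are illustrative placeholders, not real model identifiers or the actual scoring rule.

```python
import re
from enum import Enum


class Tier(Enum):
    HAIKU = "simple retrieval (fast, cheap)"
    SONNET = "structured output (balanced)"
    OPUS = "complex generation (frontier)"


# Hypothetical cue words; matched as whole tokens so "plan" does not fire on "explanation".
STRUCTURED_CUES = {"json", "table", "schema", "extract"}
GENERATIVE_CUES = {"write", "draft", "design", "compare", "plan"}


def route(query: str, long_query_words: int = 60) -> Tier:
    words = re.findall(r"[a-z0-9]+", query.lower())
    tokens = set(words)
    # Open-ended generation or very long prompts go to the frontier tier.
    if tokens & GENERATIVE_CUES or len(words) > long_query_words:
        return Tier.OPUS
    # Requests for a fixed output shape go to the balanced tier.
    if tokens & STRUCTURED_CUES:
        return Tier.SONNET
    # Everything else is treated as simple retrieval.
    return Tier.HAIKU


print(route("What was yesterday's fill rate?"))       # Tier.HAIKU
print(route("Extract publisher and CPM as JSON"))      # Tier.SONNET
print(route("Draft a bid floor strategy for Q3"))      # Tier.OPUS
```

Keyword rules are brittle; the point is the shape of the dispatch, which stays the same if `route` is swapped for a small learned classifier.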
Tools & Frameworks: LangGraph · LangSmith · PydanticAI · MCP · FAISS · pgvector · RAG
01 INGEST        02 FEATURES      03 MODEL         04 SERVE         05 FEEDBACK
─────────        ───────────      ────────         ────────         ───────────
Polars · Kafka   Word2Vec         LightGBM QR      Lambda           Thompson MAB
Bidstream EDA    HMM states       DCN features     O(1) lookup      Auto-calibrate
DSP signals      GloVe embeds     Anomaly detect   Real-time        Price elasticity
    │                │                │                │                │
    └────────────────┴────────────────┴────────────────┴────────────────┘
                  Feedback loop (hourly recalibration)
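The "LightGBM QR" stage fits a conditional quantile rather than a mean by minimizing the pinball loss; LightGBM exposes this through `objective="quantile"` with the target quantile set by `alpha`. The dependency-free sketch below shows only the loss itself and checks, on made-up clearing prices, that the constant prediction minimizing it is the empirical τ-quantile. Reading a low quantile of clearing price as a floor candidate is an illustrative assumption, not a statement of the production rule.

```python
def pinball_loss(y_true: list[float], y_pred: float, tau: float) -> float:
    """Mean pinball (quantile) loss of a constant prediction at quantile tau."""
    total = 0.0
    for y in y_true:
        diff = y - y_pred
        # Under-prediction costs tau per unit; over-prediction costs (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


# Hypothetical clearing prices (CPM) for one ad slot, sorted for readability.
prices = [0.8, 1.1, 1.3, 1.7, 2.0, 2.4, 3.1, 4.5, 5.0, 6.2]
tau = 0.25

# Search the observed values; the minimizer is the 3rd smallest price,
# i.e. the empirical 25th percentile (n * tau = 2.5 rounds up to 3).
best = min(prices, key=lambda c: pinball_loss(prices, c, tau))
print(best)  # 1.3
```

A gradient-boosted model minimizes the same loss, except that the prediction varies with features (slot, hour, publisher) instead of being one constant.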
| Metric | Result |
|---|---|
| Bid request reduction | 50%+ |
| GCPM gain | 2×+ |
| Pipeline speedup | 10× |
| Directional decision accuracy | 76% |
| Daily revenue lift | $44–$500 |
| ID5 integration revenue | $10K+/day |
| Infra cost reduction | 61% ($7.63 → $3/hr) |
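As a quick check, the infra cost row follows from the two hourly rates:

$$
\frac{7.63 - 3}{7.63} = \frac{4.63}{7.63} \approx 0.607 \approx 61\%
$$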
Bid floor optimization   ███████████████████░ 95%
Quantile regression      ██████████████████░░ 92%
LLM agents · LangGraph   ██████████████████░░ 88%
Anomaly detection · IVT  ██████████████████░░ 88%
Embedding · HNSW · RAG   █████████████████░░░ 87%
NLP · BERT · Word2Vec    █████████████████░░░ 86%
Thompson Sampling · MAB  █████████████████░░░ 83%
Hidden Markov Models     ████████████████░░░░ 82%
Causal inference         ████████████████░░░░ 80%
Core ML
LLM & Agents
Data & Infrastructure
Cloud
Addis Ababa, ET · Open to remote · AdTech · ML Systems · LLM Engineering



