Candidate Name
Applied ML Research (LLM Evaluation & Reliable Agents): I build reproducible evaluation pipelines for tool-using LLMs and agentic workflows, grounded in real deployments (TerraTide for climate and health guidance). I’ve built agentic RAG + alerting systems using LangChain + FAISS with Flask/Node backends and MongoDB. I translate observed failures into measurable tests (retrieval grounding, hallucination-to-action risk, tool selection + argument/schema validity, prompt/context sensitivity) and run controlled ablations to identify mitigations that generalize. I’ve also trained LSTM forecasting models, and I bring an evaluation-first approach that connects model behavior analysis with deployment constraints.
01/09/2025
01/08/2025
01/06/2026