🕵 We’re hiring GPU Performance Engineers to accelerate AI inference at scale for Amazon Bedrock! If you’re passionate about optimizing GPU workloads, building high-performance distributed inference solutions, and unlocking the efficiency of state-of-the-art foundation models, we should talk. Send your CV to kaibud [at] amazon [dot] com.
I am currently a Team Lead (Sr. Applied Scientist) in the AWS Deep Science for Systems and Services team, where I work at the intersection of machine learning and systems. Our team’s goal is to optimize foundation models for inference in Amazon Bedrock: driving higher hardware utilization, lower latency, and lower cost. We develop algorithms (e.g., for quantization, speculative decoding, structured sparsity, and accelerating multi-LoRA inference) and optimize systems (e.g., inference engines like vLLM, kernel tuning, and identifying inference performance bottlenecks) that power GenAI workloads in Amazon Bedrock without compromising model accuracy. Learn more about our team’s recent public work: (Park et al., 2024)(Kübler et al., 2025).
I joined the Amazon Research Lablet Tübingen (part of AWS AI) in 2020, where I developed algorithms and tools to help businesses explain the complex cause-effect relationships underlying their business problems, and led a cross-org effort within Amazon to launch them in production (Budhathoki & Blöbaum, 2022)(Budhathoki, 2021)(Götz & Budhathoki, 2022).
Businesses like Amazon Supply Chain and Amazon Ads actively use those solutions for effect estimation and for root cause analysis of changes and outliers.
Those algorithmic solutions were also open-sourced in the Python DoWhy library under a new package called gcm (Götz & Budhathoki, 2022)(Blöbaum et al., 2023)(Kiciman, 2022). This collaboration with Microsoft Research led to a new GitHub organization, PyWhy, whose mission is to build an open-source ecosystem for causal machine learning (Götz & Budhathoki, 2022)(Kiciman, 2022).
For a brief period, I also led the cross-org science effort within Amazon to deliver bias mitigation solutions for the first family of Amazon’s in-house multimodal foundation models, the Amazon Titan Multimodal Embeddings model and the Amazon Titan Image Generation model, ahead of their re:Invent 2023 release (Kleindessner et al., 2025)(Barth, 2023)(Ali et al., 2023).
In 2020, I received my doctoral degree in Computer Science from Saarland University, having conducted my doctoral research at the Max Planck Institute for Informatics. During my PhD, I also interned at the Amazon Research Lablet Tübingen for 2.5 months in the spring of 2019. Earlier, in 2015, I completed my Master’s degree in Computer Science with honours at Saarland University. Prior to that, I worked as a Software Developer at ImmuneSecurity A/S (now LogPoint) between 2011 and 2013. I completed my Bachelor’s degree in Computer Engineering at the Institute of Engineering, Pulchowk Campus, in Nepal (2006–2010).
I have been fortunate to collaborate with great colleagues across diverse research topics, but the common thread is a customer-centric approach to machine learning. Despite shifts in direction, my work consistently aims to create ML systems that deliver real value for customers. See up-to-date publications on Google Scholar.
Selected artifacts
gcm package in DoWhy, with a major refactoring of the DoWhy codebase