Varun Raj
Software Engineer at Google

FELLOW MEMBER
Over approximately eight years, Varun Raj has built a strong professional record in large-scale distributed systems and AI/ML infrastructure, with a focus on performance optimization, release automation, observability, and developer productivity on high-traffic technology platforms. His work has consistently centered on designing infrastructure that improves system efficiency, reliability, and deployment velocity in environments that process enormous traffic volumes. Across projects at Google and Oracle, he has contributed to the practical advancement of computer science through systems that enhance the performance of distributed services, improve visibility into high-scale execution environments, and automate the safe release of complex software systems.
One of the most technically significant initiatives in his career was the State-Aware RPC Load Balancing and Probing Framework developed during his time at Google, where he worked at the intersection of Google Ads Serving and the Core RPC teams. In this effort, Varun Raj contributed to modernizing load-balancing mechanisms across large-scale clusters by architecting a system capable of routing requests based on real-time server state, including queue depth and CPU availability. The innovation of this project came from replacing static distribution methods with a state-aware routing model that actively evaluated server health before dispatching traffic. Through his work designing the evaluation framework and implementing probing mechanisms, the system improved RPC efficiency for highly latency-sensitive services, including infrastructure supporting Google Search, Ads, and Gemini.
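The idea of routing on probed server state can be illustrated with a minimal sketch. It is not the framework described above: the `ServerState` fields, the `pick_server` function, and the selection rule (lowest queue depth, ties broken by most available CPU) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerState:
    """Hypothetical snapshot of one server, as returned by its latest probe."""
    name: str
    queue_depth: int      # requests currently waiting on the server
    cpu_available: float  # fraction of CPU idle, from 0.0 to 1.0
    healthy: bool = True


def pick_server(states):
    """Choose a healthy server with the shortest queue.

    Ties on queue depth are broken in favor of the server with more idle CPU.
    This rule is an assumption for the sketch, not a documented policy.
    """
    candidates = [s for s in states if s.healthy]
    if not candidates:
        raise ValueError("no healthy servers available")
    return min(candidates, key=lambda s: (s.queue_depth, -s.cpu_available))
```

A static policy such as round-robin would ignore these fields entirely; the sketch differs only in that each routing decision consults the most recent probe results.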
Another major contribution was his work on the Automated Profile Guided Optimization (AutoPGO) Framework, a system designed to optimize critical binaries using actual production traffic behavior. In this project, his objective was to bridge the gap between real-world workload patterns and compiler-level optimization. He helped architect a framework that automatically generated representative load tests from sampled production queries and fed execution profiles back into the compiler so optimized binaries could be produced based on live usage patterns. This work was particularly innovative because it automated the feedback loop between production traffic and compiler tuning, allowing binaries to be continuously optimized in alignment with actual system behavior. His contributions included designing the optimization pipeline, integrating automated profile generation into release workflows, and coordinating closely with compiler and infrastructure teams to ensure that the generated profiles were used accurately during compilation.
Varun Raj also contributed significantly to observability and performance diagnostics through the Companion Stats Real-Time P99 Latency and CPU Profiling System. His role in this project centered on rearchitecting a diagnostic platform used to analyze tail latency in large-scale services. Rather than profiling every request, the redesigned system introduced a more scalable model in which requests reported latency to a central controller that calculated live thresholds and selectively sampled only the slowest traffic. This innovation enabled high-fidelity insight into tail latency while minimizing overhead on the broader system. By redesigning the system into a sharded architecture and implementing real-time threshold logic, he helped create a platform capable of supporting much larger traffic volumes while delivering precise diagnostic insights for developers working on latency-sensitive services.
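The threshold-based sampling described above can be sketched in a few lines. This is an illustrative single-process model only: the class name `TailLatencyController`, the sliding-window size, the warm-up count, and the nearest-rank quantile method are all assumptions, and the sharding mentioned in the text is not modeled.

```python
import math
from collections import deque


class TailLatencyController:
    """Hypothetical controller that flags only the slowest requests for profiling."""

    def __init__(self, window=1000, quantile=0.99, min_samples=100):
        self.latencies = deque(maxlen=window)  # sliding window of recent latencies
        self.quantile = quantile
        self.min_samples = min_samples

    def threshold(self):
        """Return the live threshold, or None until enough samples have arrived."""
        if len(self.latencies) < self.min_samples:
            return None
        ordered = sorted(self.latencies)
        rank = math.ceil(self.quantile * len(ordered))  # nearest-rank quantile
        return ordered[rank - 1]

    def report(self, latency_ms):
        """Record one latency and return True if the request should be profiled."""
        limit = self.threshold()  # computed from previously reported requests
        self.latencies.append(latency_ms)
        return limit is not None and latency_ms > limit
```

Under this model, the vast majority of requests pay only the cost of reporting a single number, and the heavier profiling path is reserved for requests that exceed the current threshold.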
Another project that demonstrated his leadership in infrastructure automation was the Velocity and Vigilance Automated Release and Regression Framework. Here, his objective was to reduce the growing operational burden created by fragmented release processes that depended heavily on manual intervention. He designed a unified qualification framework that standardized release validation across multiple binaries and automated regression testing workflows. The innovation in this system came from converging previously separate release and testing pipelines into one automated framework with consistent validation standards. Through his work building rollout automation tools, defining unified test conditions, and coordinating across SRE and engineering productivity teams, the framework improved release stability while meaningfully reducing operational toil.
He also contributed to experimentation infrastructure through the Ad Entity Ablation and Rapid Prototyping Infrastructure, a system designed to let engineers selectively disable or modify ad entities during experiments without incurring prohibitive memory costs in production binaries. The core innovation of this project lay in a specialized low-latency key-value lookup engine that enabled dynamic entity-level ablation during serving. Varun Raj architected this fast lookup system and integrated it with release and regression pipelines so experimentation could occur safely and efficiently. This infrastructure gave engineering teams the ability to evaluate the incremental effect of specific ad components at scale, accelerating product experimentation and data-driven iteration.
Earlier in his career at Oracle, he served as technical lead for the Fusion FastBuild Incremental Build System, an initiative aimed at modernizing build infrastructure for a large microservices ecosystem. His objective was to replace monolithic full-build processes with a dependency-aware incremental build system that could rebuild only modified components and their dependents. The technical innovation of this work came from applying dependency graph analysis to large-scale CI/CD pipelines, enabling faster and more selective builds across thousands of services. Through his leadership in designing the system architecture and coordinating adoption across teams, the platform significantly reduced build times and improved engineering productivity throughout the ecosystem.
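The core of a dependency-aware incremental build, rebuilding only modified components and their dependents, can be sketched as a graph traversal. The function names and the shape of the `deps` mapping (component name to the components it depends on) are assumptions for illustration and do not describe the actual FastBuild implementation. The sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from collections import defaultdict, deque
from graphlib import TopologicalSorter


def affected_components(deps, modified):
    """Return the modified components plus everything that transitively depends on them."""
    # Invert the graph: map each component to the components that depend on it.
    dependents = defaultdict(set)
    for component, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(component)

    affected = set(modified)
    queue = deque(modified)
    while queue:  # breadth-first walk over reverse dependencies
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


def rebuild_order(deps, modified):
    """Return the affected components in an order that builds dependencies first."""
    affected = affected_components(deps, modified)
    # Keep only edges between affected components; unaffected ones are already built.
    subgraph = {c: [d for d in deps.get(c, ()) if d in affected] for c in affected}
    return list(TopologicalSorter(subgraph).static_order())
```

For a hypothetical graph in which `web` depends on `api`, and both `api` and `batch` depend on `core`, a change to `api` rebuilds only `api` and `web`, while a change to `core` rebuilds all four components.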
Taken together, Varun Raj’s body of work reflects sustained achievement in distributed systems engineering, infrastructure optimization, diagnostic observability, experimentation systems, and release automation. His contributions have repeatedly addressed core technical challenges in large-scale computing environments, including efficiency, latency, resilience, and developer velocity. By building performance-critical systems and scalable automation frameworks that support modern distributed services, he has made meaningful contributions to the practical advancement of computer science in high-scale production infrastructure.