What solver does Amazon use to assign 2 Million VMs to 100,000 servers?

At Amazon’s scale, assigning millions of virtual machines (VMs) to hundreds of thousands of servers is an optimization problem of exceptional size. Traditional MILP-based solvers such as Gurobi and Xpress cannot handle this level of complexity. To address the problem, Amazon uses Hexaly, a global optimization solver designed to work efficiently on very large-scale problems.
Amazon’s Challenge
The core question is: what solver can assign 2 million VMs to 100,000 servers while respecting operational constraints and performance requirements? Classical approaches from operations research quickly reach their limits. Amazon needed a tool that could provide feasible, high-quality solutions at scale.
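To make the shape of the problem concrete, here is a minimal Python sketch of a classical greedy baseline, multi-dimensional first-fit decreasing, on a toy instance. It is not the method Amazon or Hexaly uses. The two resource dimensions (vCPUs and memory), the server capacities, the sample VMs, and names such as `first_fit_decreasing` are illustrative assumptions. Real placement involves many more constraints and objectives than this.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    cpu: int  # remaining vCPUs (illustrative unit)
    mem: int  # remaining memory in GiB (illustrative unit)
    vms: list = field(default_factory=list)


def first_fit_decreasing(vms, server_cpu, server_mem):
    """Hypothetical greedy placement: sort VMs (name, cpu, mem) largest-first,
    put each on the first open server with room, else open a new server."""
    servers = []
    # Rank VMs by their larger normalized demand across the two resources.
    order = sorted(
        vms,
        key=lambda v: max(v[1] / server_cpu, v[2] / server_mem),
        reverse=True,
    )
    for name, cpu, mem in order:
        if cpu > server_cpu or mem > server_mem:
            raise ValueError(f"VM {name} does not fit on an empty server")
        for s in servers:
            if s.cpu >= cpu and s.mem >= mem:
                break  # first server with enough of both resources
        else:
            s = Server(server_cpu, server_mem)  # no fit: open a new server
            servers.append(s)
        s.cpu -= cpu
        s.mem -= mem
        s.vms.append(name)
    return servers


# Toy instance: (name, vCPUs, GiB) on servers with 16 vCPUs and 64 GiB.
vms = [("a", 8, 32), ("b", 4, 16), ("c", 6, 8), ("d", 2, 4), ("e", 10, 20)]
servers = first_fit_decreasing(vms, server_cpu=16, server_mem=64)
print([s.vms for s in servers])  # [['e', 'c'], ['a', 'b', 'd']]
```

Greedy rules like this scale easily, but they offer no quality guarantee and do not accommodate the soft constraints and competing objectives of a production system.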
Pragmatic Operations Research
Rubén Ruiz, Principal Applied Scientist at Amazon and former Professor of Statistics and Operations Research at Universitat Politècnica de València, explains that solving these kinds of problems requires a pragmatic mindset. Rather than focusing solely on mathematical optimality, he advocates what he calls Pragmatic OR: using tools that actually scale to production needs. He presented a talk entitled “Pragmatic OR: solving large-scale optimization problems in fast-moving environments” at the EURO Online Seminar Series.
Here is the abstract of Rubén Ruiz’s seminar, “Pragmatic OR: solving large-scale optimization problems in fast-moving environments”:
We argue for the use of heuristic solvers and simplified modeling techniques that prioritize speed, adaptability, and ease of implementation over strict optimality or complex approaches. This angle is particularly valuable when dealing with estimated input data, where pursuing optimality may be less meaningful.
This talk examines the gap between academic Operations Research and real-world industrial applications, particularly in environments like Amazon and AWS where sheer scale and delivery speed are important factors to consider. While academic research often prioritizes complex algorithms and optimal solutions, large-scale industrial problems demand more pragmatic approaches. These real-world scenarios frequently involve multiple objectives, soft constraints, and rapidly evolving business requirements that require flexibility and quick adaptation.
The presentation will showcase various examples, including classical routing and scheduling problems, as well as more complex scenarios like virtual machine placement in Amazon EC2. These cases illustrate how pragmatic methods can effectively address real-world challenges, offering robust and maintainable solutions that balance performance with operational efficiency. The goal is to demonstrate that in many industrial applications, a small optimality gap is an acceptable trade-off for significantly improved flexibility and reduced operational overhead.
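The “small optimality gap” in the abstract is usually measured as the relative distance between the best feasible solution found and a lower bound on the true optimum. The Python sketch below, under simplifying assumptions, computes a trivial per-resource lower bound on the number of servers for a toy VM list, plus one common definition of the relative gap for a minimization problem (solvers differ in the exact formula). The function names and data are hypothetical.

```python
def servers_lower_bound(vms, server_cpu, server_mem):
    """Trivial lower bound on servers needed for VMs given as (name, cpu, mem):
    each resource alone requires at least ceil(total demand / capacity)."""
    total_cpu = sum(cpu for _, cpu, _ in vms)
    total_mem = sum(mem for _, _, mem in vms)
    # Integer ceiling division avoids floating-point rounding on large totals.
    cpu_bound = -(-total_cpu // server_cpu)
    mem_bound = -(-total_mem // server_mem)
    return max(cpu_bound, mem_bound)


def optimality_gap(heuristic_value, lower_bound):
    """Relative gap (UB - LB) / UB for a minimization objective.
    0.0 means the heuristic solution is provably optimal."""
    if heuristic_value <= 0:
        raise ValueError("heuristic_value must be positive")
    return (heuristic_value - lower_bound) / heuristic_value


vms = [("a", 8, 32), ("b", 4, 16), ("c", 6, 8), ("d", 2, 4), ("e", 10, 20)]
lb = servers_lower_bound(vms, server_cpu=16, server_mem=64)
print(lb)                        # 2
print(optimality_gap(3, lb))     # 0.3333333333333333
```

A heuristic that reaches the bound proves its own optimality; otherwise the gap bounds how much could still be gained, which is the quantity a pragmatic approach is willing to trade for speed and flexibility.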
Customer Testimonial
Rubén Ruiz highlights why Hexaly was chosen at Amazon:
Recently, we have devised new models to balance LLM workloads over specialized hardware within Bedrock, a core product in AWS that leverages genAI through foundational models, allowing our customers to easily implement AI workloads. In this new system we directly started using Hexaly instead of trying first mathematical solvers to see if we could solve to optimality and if not, move to Hexaly. We have grown confident in Hexaly’s abilities so as not to waste time trying to get optimality proofs for extremely large problems. To our surprise, Hexaly is giving us optimal solutions for up to five different lexicographic objectives in CPU times that are well below our latency requirements. This is allowing rapid iteration cycles with our customers to keep extending and refining the model without worrying about scale, convergence or solution quality.
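“Lexicographic objectives” means the objectives are ranked by priority: a lower-priority objective only matters among solutions that tie on every higher-priority one. The Python sketch below illustrates that ordering with tuple comparison, which Python performs lexicographically. The quote does not name Amazon’s five objectives, so the three objectives and all names here are hypothetical. A solver handling lexicographic objectives searches for such solutions directly rather than ranking a fixed list of candidates as done here.

```python
from typing import NamedTuple


class Objectives(NamedTuple):
    # Hypothetical objectives in priority order, all minimized.
    unplaced_requests: int
    peak_device_load: float
    migrations: int


def lexicographically_best(candidates):
    """Return the (name, objectives) pair with the smallest objective tuple.
    Tuples compare element by element, so a later objective only breaks
    ties on all earlier ones."""
    return min(candidates, key=lambda c: c[1])


candidates = [
    ("plan_a", Objectives(0, 0.92, 40)),
    ("plan_b", Objectives(0, 0.85, 120)),
    ("plan_c", Objectives(1, 0.60, 5)),
]
best_name, best_obj = lexicographically_best(candidates)
print(best_name)  # plan_b
```

Here `plan_c` loses despite its low load and few migrations because it leaves a request unplaced, and `plan_b` beats `plan_a` on peak load even though it needs more migrations.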