This ICML 2024 tutorial covers the challenges of evaluating language models (LMs): fundamental evaluation methods, common pitfalls, and best practices for reliable assessment. It aims to give attendees an overview of current evaluation practice, the problems it faces, and open research directions in LM evaluation. Key topics include measurement methods, reproducibility challenges, and the effect of prompt sensitivity on evaluation outcomes.
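To make the prompt-sensitivity topic concrete, here is a minimal sketch of one way it could be measured: score the same items under several prompt templates and report the spread in accuracy. Everything in it is an illustrative assumption and not material from the tutorial: the dataset, the templates, and `toy_model`, which is a deliberately brittle stand-in for a real LM call.

```python
import re

# Hypothetical toy dataset: (question, gold answer) pairs.
DATASET = [("2 + 2", "4"), ("3 + 5", "8"), ("7 - 4", "3"), ("6 + 1", "7")]

# Hypothetical templates that phrase the same task in different ways.
TEMPLATES = [
    "Q: {question}\nA:",
    "Compute: {question} =",
    "What is {question}? Answer:",
]


def toy_model(prompt: str) -> str:
    """Stand-in for a real LM call (assumption, not a real API)."""
    a, op, b = re.search(r"(\d+)\s*([+-])\s*(\d+)", prompt).groups()
    # Simulated brittleness: under the "Compute:" format,
    # subtraction is misread as addition.
    if prompt.startswith("Compute:"):
        op = "+"
    result = int(a) + int(b) if op == "+" else int(a) - int(b)
    return str(result)


def accuracy(template: str) -> float:
    """Exact-match accuracy of toy_model on DATASET under one template."""
    hits = [toy_model(template.format(question=q)) == gold for q, gold in DATASET]
    return sum(hits) / len(hits)


def prompt_sensitivity(templates):
    """Return per-template accuracy and the max-min spread across templates."""
    scores = {t: accuracy(t) for t in templates}
    spread = max(scores.values()) - min(scores.values())
    return scores, spread


scores, spread = prompt_sensitivity(TEMPLATES)
for template, score in scores.items():
    print(f"{template!r}: accuracy={score:.2f}")
print(f"spread across templates: {spread:.2f}")
```

Reporting the spread (or the full per-template distribution) alongside a single headline score is one simple way to show how much a result depends on prompt wording. A real evaluation would replace `toy_model` with actual model calls and use many more items and templates.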