LLM Evals for Structured Outputs

Use the gold dataset to calculate accuracy for all LLMs

- Use Marimo to calculate the “diff” between correct answer and LLM’s answer (demo)
- Calculate accuracy for each LLM
- Save the diff as a permanent record into a CSV file for future reference
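
The three steps in these notes (diff against the gold dataset, per-LLM accuracy, CSV record) might be sketched roughly as follows. This is a minimal illustration in plain Python that could be pasted into a Marimo notebook cell; it is not the code from the demo. The record IDs, field names, model names, and the field-level exact-match definition of accuracy are all assumptions made for the example.

```python
import csv
import io

# Hypothetical gold dataset: record id -> expected structured output.
GOLD = {
    "r1": {"name": "Ada", "age": 36},
    "r2": {"name": "Linus", "age": 54},
}

# Hypothetical structured answers from two LLMs (model names are placeholders).
ANSWERS = {
    "model_a": {
        "r1": {"name": "Ada", "age": 36},
        "r2": {"name": "Linus", "age": 45},  # wrong age on purpose
    },
    "model_b": {
        "r1": {"name": "Ada", "age": 36},
        "r2": {"name": "Linus", "age": 54},
    },
}

FIELDNAMES = ["model", "record_id", "field", "expected", "actual", "match"]


def diff_rows(gold, answers):
    """Build one diff row per (model, record, field), comparing to the gold value."""
    rows = []
    for model, preds in answers.items():
        for record_id, expected in gold.items():
            got = preds.get(record_id, {})  # a missing record counts as all-wrong
            for field, want in expected.items():
                have = got.get(field)
                rows.append({
                    "model": model,
                    "record_id": record_id,
                    "field": field,
                    "expected": want,
                    "actual": have,
                    "match": have == want,
                })
    return rows


def accuracy_by_model(rows):
    """Accuracy here is assumed to be the fraction of fields that match exactly."""
    totals, hits = {}, {}
    for row in rows:
        model = row["model"]
        totals[model] = totals.get(model, 0) + 1
        hits[model] = hits.get(model, 0) + int(row["match"])
    return {model: hits[model] / totals[model] for model in totals}


def write_diff_csv(rows, fh):
    """Write the diff rows to an open text file handle as CSV."""
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


rows = diff_rows(GOLD, ANSWERS)
accuracy = accuracy_by_model(rows)

# An in-memory buffer keeps the sketch side-effect free; for a permanent record,
# pass open("llm_diff.csv", "w", newline="") instead (the file name is hypothetical).
buffer = io.StringIO()
write_diff_csv(rows, buffer)
csv_text = buffer.getvalue()

print(accuracy)
```

Keeping one row per field, rather than one row per record, makes the saved CSV show exactly which field an LLM got wrong. A stricter record-level accuracy (all fields must match) could be derived from the same rows.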