LLM Accuracy Comparisons for Structured Outputs

Use DataBlist to generate the gold dataset

Compare the model outputs for each field whose accuracy you want to measure
Use human judgment to decide the value when there is no majority answer
DataBlist demo
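
The gold-dataset steps above (take the majority answer, fall back to human judgment, then score each model) can be sketched in plain Python. This is a minimal illustration and not DataBlist's API: the function names, the `outputs` layout (model name mapped to a list of records aligned by index), the `human_labels` mapping, and the sample data are all assumptions made for the example. It also assumes "majority" means more than half of the models agree on the same value.

```python
from collections import Counter


def majority_value(values):
    """Return the value given by more than half of the models, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def build_gold(outputs, field, human_labels=None):
    """Build gold values for one field from several models' outputs.

    outputs: {model_name: [record_dict, ...]}, records aligned by index
             (hypothetical layout chosen for this sketch).
    human_labels: {record_index: value} supplied by a human reviewer for
                  records where the models reach no majority.
    Returns (gold_values, indices_still_needing_human_review).
    """
    human_labels = human_labels or {}
    n_records = len(next(iter(outputs.values())))
    gold, needs_review = [], []
    for i in range(n_records):
        values = [records[i][field] for records in outputs.values()]
        value = majority_value(values)
        if value is None:
            if i in human_labels:
                value = human_labels[i]   # human judgment breaks the tie
            else:
                needs_review.append(i)    # leave unresolved for now
        gold.append(value)
    return gold, needs_review


def accuracy(records, gold, field):
    """Share of records whose field matches the gold value.

    Records whose gold value is still unresolved (None) are skipped.
    """
    scored = [(r[field], g) for r, g in zip(records, gold) if g is not None]
    if not scored:
        return 0.0
    return sum(v == g for v, g in scored) / len(scored)


# Made-up sample data: three models extracting a "city" field.
outputs = {
    "model_a": [{"city": "Paris"}, {"city": "Rome"}, {"city": "Berlin"}],
    "model_b": [{"city": "Paris"}, {"city": "Milan"}, {"city": "Berlin"}],
    "model_c": [{"city": "Paris"}, {"city": "Turin"}, {"city": "Munich"}],
}

# First pass: record 1 has no majority, so it is flagged for review.
gold, needs_review = build_gold(outputs, "city")

# Second pass: a reviewer supplies the value for the flagged record.
gold_final, remaining = build_gold(outputs, "city", human_labels={1: "Rome"})

scores = {name: accuracy(recs, gold_final, "city") for name, recs in outputs.items()}
```

Values are compared with exact equality here, so in practice you would likely normalize them first (case, whitespace, date formats) before voting and scoring. Exact comparison also requires the field values to be hashable, which holds for strings and numbers but not for nested lists or dicts.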