Run the experiment for four different LLMs