Abstract: Tools based on Large Language Models (LLMs) have improved the teaching of computer programming, automated feedback, facilitated program repair, and enabled ...
Abstract: Simply measuring a language model’s performance after deployment is not enough. Rigorous and inclusive benchmarking isn’t just a checkbox along the way—it’s the foundation upon which ...
Background: Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various natural language processing tasks, particularly in text generation. However, their ...