
Marthe Ballon
Biography
Marthe Ballon is a PhD student under the supervision of Prof. Dr. Vincent Ginis. She graduated from the Vrije Universiteit Brussel in 2024 with a degree in Mathematics, specialising in Financial and Applied Mathematics. Her research lies at the intersection of mathematics and AI, with the aim of understanding the abilities of reasoning and non-reasoning large language models.
Present projects
Estimating the difficulty of problems with large language models without ground truth. Several metrics are used to assess the complexity of a problem, but they all require ground-truth data, which is often unavailable or resource-intensive to obtain. We propose a general framework that computes difficulty scores from pairwise comparisons judged by LLMs: the LLM is queried to compare two problems at a time and decide which one it finds more difficult. Based on these comparisons, we compute Bradley–Terry scores for all problems, yielding a difficulty score that is independent of any ground-truth data or human labelling. Joint work with Andres Algaba, Brecht Verbeken and Vincent Ginis.
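The comparison-to-score step can be sketched as follows. The iterative update for Bradley–Terry strengths is standard, but the function name, data format, and the toy comparisons are illustrative assumptions, not the project's actual pipeline.

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """Estimate Bradley-Terry strengths from pairwise outcomes.

    comparisons: list of (winner, loser) pairs; here the problem the
    LLM judged *more difficult* "wins" the comparison.
    Returns a dict mapping each problem to a strength (higher = harder).
    """
    items = {p for pair in comparisons for p in pair}
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[frozenset((w, l))] += 1

    strength = {p: 1.0 for p in items}
    for _ in range(iters):
        new = {}
        for i in items:
            # Standard minorize-maximize update for the Bradley-Terry MLE:
            # p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n = pair_counts[frozenset((i, j))]
                if n:
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        # Rescale so strengths stay comparable across iterations.
        total = sum(new.values())
        strength = {p: s * len(items) / total for p, s in new.items()}
    return strength
```

For example, if problem A is judged harder than B and C in most comparisons, and B harder than C, the fitted strengths order the problems A > B > C without ever consulting ground-truth answers.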
Injecting the reasoning traces of LLMs to improve their performance. In a past project, we found that longer reasoning correlates with reduced accuracy in LLMs, even when accounting for the difficulty of the problems. One possible explanation is that longer reasoning chains have an inherently higher probability of arriving at a wrong final answer, indicating that computing accuracy on the final answer alone is not a comprehensive measure of LLMs' abilities. With this project, we aim to discover how LLMs' reasoning traces contribute to the correctness of their final answer and how performance evolves along the reasoning chain. As a first step, we will split reasoning chains into chunks and ask the model to guess the answer based on each partial chain. Joint work with Desi R. Ivanova, Andres Algaba and Vincent Ginis.
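That first step could look roughly like the sketch below: split a trace into chunks and build one probe prompt per prefix of the chain. The chunking granularity and prompt wording are assumptions made for illustration, not the project's actual protocol.

```python
def chunk_trace(trace, n_chunks=4):
    """Split a reasoning trace into at most n_chunks word-level chunks."""
    words = trace.split()
    size = -(-len(words) // n_chunks)  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def probe_prompts(problem, chunks):
    """Build one probe prompt per prefix of the reasoning chain:
    after each chunk, ask the model for its current best guess."""
    prompts, prefix = [], ""
    for chunk in chunks:
        prefix = (prefix + " " + chunk).strip()
        prompts.append(
            f"Problem: {problem}\n"
            f"Reasoning so far: {prefix}\n"
            "Based only on the reasoning so far, give your best guess "
            "for the final answer."
        )
    return prompts
```

Querying the model with each successive prompt would trace how the probability of a correct guess evolves as more of the chain is revealed.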
Past projects
Analysing the relationship between token use and performance in LLMs. Currently, two scaling laws govern the performance of LLMs. First, a model gets predictably better if you increase the amount of data, parameters, or compute exponentially. With the arrival of reasoning models came a second scaling law: if you increase the amount of test-time compute exponentially, the model also gets predictably better. Both of these laws are nearly saturated. By analysing the number of reasoning tokens o1 and o3 use when solving Olympiad-level mathematics problems, we found that there is a more efficient way to scale the abilities of LLMs, namely the amount of reinforcement learning in post-training. Our paper shows that more RL leads to more effective reasoning, underscoring that thinking harder is not the same as thinking longer. Joint work with Andres Algaba and Vincent Ginis. Paper.
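Why both laws saturate can be pictured with a toy log-linear curve (all numbers hypothetical, not fitted to any data): score grows linearly in the logarithm of the scaled resource, so each constant gain demands an exponential increase in data, parameters, or compute (law 1) or in test-time reasoning tokens (law 2).

```python
import math

def toy_scaling_curve(resource, slope=5.0, intercept=10.0):
    """Hypothetical log-linear scaling law:
    score = slope * log10(resource) + intercept.
    A fixed gain in score always costs a constant *multiplicative*
    factor in the resource, i.e. exponential cost for linear benefit."""
    return slope * math.log10(resource) + intercept

# Scaling from 1e4 to 1e6 resource units buys the same score gain as
# scaling from 1e6 to 1e8 -- each step costs 100x more resource.
```

Under such a curve, squeezing out further gains by scaling the resource alone quickly becomes uneconomical, which is what makes an alternative axis such as RL in post-training attractive.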
Location
Pleinlaan 5
1050 Elsene
Belgium