Implicit and Explicit Reasoning in LLMs
Do LLMs introspect?
Humans can reason. However, when humans are asked to explain explicitly how they reached a conclusion, they often give responses that are not compatible with the implicit reasoning that actually guides their behavior. This indicates that humans have limited metacognitive abilities.
What about Large Language Models? They too can reason, or at least generate reasoning-like language. Some researchers have tried to investigate this reasoning by, in effect, asking the LLM to introspect. This approach assumes that the explicit reasoning an LLM generates is likely to match the implicit reasoning that produced a particular behavior. Do LLMs differ from humans in this respect? That is, do they have strong metacognition?
An ideal domain in which to investigate this question is linguistic generalization. Here, a model's behavior on novel forms can be observed directly and compared with the explanation the model gives for that behavior. The goal of this project is to measure how well explicit responses to questions about how language patterns generalize to novel forms predict the behaviors actually observed. In other words, it asks to what degree explicit rationales for linguistic behavior align with the implicit patterns that govern it.
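One possible way to quantify this alignment is sketched below. The sketch assumes a design that the project description does not specify. Each novel (nonce) item is coded into a category such as "regular" (for example, adding -ed) or "irregular" (for example, a vowel change). It is coded twice: once from the form the model actually produces (implicit), and once from the form predicted by the rule the model states when asked (explicit). Raw agreement and Cohen's kappa then measure how well the explicit account predicts the behavior; kappa additionally corrects for agreement expected by chance. The function names and the six toy labels are hypothetical. The labels are invented for illustration and are not results.

```python
from collections import Counter


def agreement_rate(stated, observed):
    """Fraction of items where the category predicted by the stated (explicit)
    rule matches the category of the form actually produced (implicit)."""
    if len(stated) != len(observed) or not stated:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(s == o for s, o in zip(stated, observed))
    return matches / len(stated)


def cohens_kappa(stated, observed):
    """Cohen's kappa: observed agreement corrected for the agreement the two
    marginal category distributions would produce by chance."""
    p_observed = agreement_rate(stated, observed)  # also validates the inputs
    n = len(stated)
    stated_counts = Counter(stated)
    observed_counts = Counter(observed)
    # Chance agreement: sum over categories of the product of the marginals.
    p_chance = sum(
        stated_counts[c] * observed_counts[c] for c in stated_counts
    ) / (n * n)
    if p_chance == 1.0:
        # Degenerate case: both codings use one identical category throughout.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy coding of six nonce verbs (invented for illustration, not data).
stated = ["regular", "regular", "irregular", "regular", "irregular", "regular"]
observed = ["regular", "irregular", "irregular", "regular", "regular", "regular"]

print(round(agreement_rate(stated, observed), 3))  # 0.667
print(round(cohens_kappa(stated, observed), 3))    # 0.25
```

Kappa is reported alongside raw agreement because one category, such as regular forms, may dominate. In that case, raw agreement can be high even when the stated rule carries little information about the behavior. In the toy example, 4 of 6 items agree, but chance alone would produce an agreement of 20/36, so kappa is only 0.25.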