The Center for Education and Research in Information Assurance and Security (CERIAS)

Reasoning Capabilities and Limitations of Large-scale Natural Language Understanding Models

Principal Investigator: Abulhair Saparov

My research focuses on the reasoning capabilities of large-scale language understanding models, including large language models (LLMs) and large reasoning models (LRMs). My research interests can be divided into two broad directions:


1. What are the reasoning capabilities of current AI models? In this direction, I want to better understand the extent to which current LLMs and LRMs are able to solve reasoning-intensive tasks. Specifically, can we identify fundamental shortcomings of large models that can help explain their mistakes on such tasks? I am interested in studying LLM capabilities in different kinds of reasoning, such as deductive reasoning, inductive/abductive reasoning, and social reasoning. For example, overgeneralization in inductive reasoning can explain some instances of hallucination. I believe this is particularly important now, as LLMs are being deployed more widely in socially impactful settings, such as college admissions, the evaluation of job applications, student evaluations, financial and banking-related decisions, and software engineering.


2. What are the reasoning capabilities of future AI models? Here, I aim to find an “upper bound” on the reasoning capabilities of large-scale AI models. For example, are there certain tasks or abilities on which LLMs struggle, regardless of how much data and compute are available to the model? If we augment the training of these models, for example by utilizing curriculum learning or reinforcement learning, can these fundamental limitations be overcome? Or are these limitations inherent to the architecture (autoregressive decoder-only transformers)?


I am also broadly interested in applications of AI to problems in medicine and legal reasoning, and in developing AI systems that are robust against hallucinations and able to verify the soundness of their own reasoning steps.

Keywords: AI evaluation, AI safety, Artificial Intelligence, large language models, machine learning, reasoning