CERIAS 2025 Annual Security Symposium


2026 Symposium Posters

PhishWife: A Psychometric Research Platform for Adaptive Phishing Simulation and Susceptibility Measurement


Primary Investigator:
Jamie Davis

Project Members
Jamie Davis, Andrew Rozema
Abstract
Phishing remains the leading initial access vector across all sectors, yet security awareness training programs lack validated psychometric frameworks for measuring individual susceptibility and organizational resilience over time. PhishWife is an open-source research platform that adds a psychometric analytics layer and an adaptive LLM agent layer on top of GoPhish, an established phishing simulation engine, without reimplementing its core simulation capabilities. PhishWife introduces three novel measurement constructs. The Gullibility Index (GI) is a per-user susceptibility score derived from weighted simulation actions (opened, clicked, submitted, reported) and normalized against template difficulty as rated on the NIST Phish Scale. The Collective Immunity Index (CII) measures organizational resilience through time-to-first-report, click-to-report ratio, and inoculation rate across rolling time windows. The NIST Phish Scale scorer automatically rates template difficulty across 23 cues in five categories (urgency, authority, social proof, personalization, technical sophistication). The adaptive LLM pipeline uses LiteLLM-backed agents to generate phishing templates calibrated to a target difficulty level on the NIST scale. The template generation agent checks the difficulty of each generated template with the NIST scorer and retries when the score deviates from the target. A per-user difficulty progression engine raises the challenge as users demonstrate resistance, creating personalized learning trajectories grounded in Item Response Theory. The platform is implemented in Python 3.11 with FastAPI, SQLAlchemy 2.0, PostgreSQL 16, and Streamlit, and is unit-tested with all external services mocked. The primary research question, whether simulation performance predicts real-world phishing resistance, will be evaluated in a planned IRB study (n ≥ 200 participants).
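
The abstract does not give the GI formula or the details of the generate-and-validate loop, so the following is only a minimal sketch of how the two ideas might look in Python. Everything in it is an assumption made for illustration: the action weights, the 0 to 1 difficulty scale, the rule that easier templates count more, and the names `SimulationEvent`, `gullibility_index`, and `generate_calibrated_template`. It does not call GoPhish, LiteLLM, or any NIST scorer; the generator and scorer are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical weights: riskier actions raise the score, reporting lowers it.
ACTION_WEIGHTS = {"opened": 0.1, "clicked": 0.4, "submitted": 1.0, "reported": -0.5}


@dataclass(frozen=True)
class SimulationEvent:
    actions: frozenset[str]  # subset of ACTION_WEIGHTS keys
    difficulty: float        # assumed scale: 0.0 (easiest) to 1.0 (hardest)


def gullibility_index(events: Iterable[SimulationEvent]) -> float:
    """Return a susceptibility score in [0, 1]; higher means more susceptible."""
    weighted_sum = 0.0
    weight_total = 0.0
    for event in events:
        raw = sum(ACTION_WEIGHTS[action] for action in event.actions)
        raw = min(max(raw, 0.0), 1.0)  # clip each event to [0, 1]
        # Assumed normalization: falling for an easy template counts more
        # than falling for a hard one.
        weight = 1.5 - event.difficulty
        weighted_sum += raw * weight
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


def generate_calibrated_template(
    generate: Callable[[float], str],
    score: Callable[[str], float],
    target: float,
    tolerance: float = 0.1,
    max_attempts: int = 3,
) -> Optional[str]:
    """Regenerate until the scored difficulty is within tolerance of target."""
    for _ in range(max_attempts):
        template = generate(target)
        if abs(score(template) - target) <= tolerance:
            return template
    return None  # caller decides what to do when calibration fails


if __name__ == "__main__":
    history = [
        SimulationEvent(frozenset({"opened", "clicked", "submitted"}), 0.0),
        SimulationEvent(frozenset({"reported"}), 1.0),
    ]
    print(gullibility_index(history))
```

In this sketch a user who submits credentials on the easiest template and reports the hardest one ends up with a high score, because the easy-template failure carries three times the weight of the hard-template success. A real implementation would need weights and a difficulty mapping validated against data, which is the kind of question the planned study is meant to address.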