The Center for Education and Research in Information Assurance and Security (CERIAS)

Promoting Inter- and Intra-Organizational Learning from Software Failures: Towards a Failure-Aware Software Development Lifecycle

Research Areas: Cyber-Physical Systems

Principal Investigator: Jamie Davis

The goal of this project is to help software engineers incorporate the lessons learned from prior failures throughout the software development lifecycle. Since all engineered systems fail, one fundamental principle of software engineering is to learn from failures in order to mitigate their recurrence. Although this principle has been recommended to software engineers by standards bodies (e.g., the ISO) and organizations such as Google (e.g., the SRE book), it has received little critical examination. We are conducting foundational empirical research to understand current and best practices related to software failure feedback. We will use this knowledge to develop and evaluate innovations in failure-aware software development processes.

Some software engineering practices, such as postmortems and retrospectives, render failure knowledge into lessons learned. These lessons may be incorporated into other engineering processes and artifacts, e.g., design reviews and style guides. The problem is that we lack basic empirical data to inform the application of this feedback mechanism. For example, we do not know: (1) What are the best practices in failure knowledge collection and sharing? (2) How well is failure knowledge leveraged in the software development lifecycle? (3) How and to what extent do engineers study failures in other organizations’ products to inform their own work? (4) What is the cost of failure analysis relative to the benefit in future failure elimination? Answering such questions could be transformative.

Our research goal is to collect these basic empirical data. To do so, we will develop tooling and conduct human-subjects work to facilitate inter- and intra-organizational learning. The expected outcomes are knowledge about current and best practices, and preliminary evaluations of innovations in failure-aware development practices. If successful, our results will enable engineers to analyze failures and apply lessons throughout the software development lifecycle, and will enable engineering decision-makers to determine which failures to focus on and how to assess the cost/benefit trade-offs in their context.

Personnel

Students: Dharun Anandayuvaraj, PhD student

Representative Publications

  • FAIL: Analyzing Software Failures from the News Using LLMs.
    Anandayuvaraj, Campbell, Tewari, and Davis.
    Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2024.

  • Learning From Software Failures: A Case Study at a National Space Research Center.
    Anandayuvaraj, Hammadeh, Lund, Holloway, and Davis.
    arXiv 2025.

  • Incorporating Failure Knowledge into Design Decisions for IoT Systems: A Controlled Experiment on Novices.
    Anandayuvaraj, Thulluri, Figueroa, Shandilya, and Davis.
    5th International Workshop on Software Engineering Research & Practices for the Internet of Things (SERP4IoT) 2023.

  • Reflecting on Recurring Failures in IoT Development.
    Anandayuvaraj and Davis.
    Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering: New Ideas and Emerging Results track (ASE-NIER) 2022.

  • A Unified Taxonomy and Evaluation of IoT Security Guidelines.
    Chen, Anandayuvaraj, Davis, and Rahaman.
    arXiv 2023.

  • An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures.
    Singla, Anandayuvaraj, Kalu, Schorlemmer, and Davis.
    Proceedings of the 2nd ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED) 2023.

Keywords: AI4SE, Cybersecurity, failure analysis, failure knowledge, FMEA, large language models, ML4SE, software engineering