2023 Symposium Posters

Trustworthy Re-use of Pre-trained Neural Networks


Primary Investigator:
Jamie Davis

Project Members:
Wenxin Jiang, Taylor R. Schorlemmer, James C. Davis
Abstract
Deep Neural Networks (DNNs) are increasingly being adopted in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior works have studied reuse practices for traditional software packages to guide software engineers towards better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of pre-trained model reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful (and commonly omitted) attributes for model reuse, including provenance, reproducibility, and portability. We substantiate the identified challenges with systematic measurements of trust and model risks in the Hugging Face ecosystem. Additionally, we publish HFTorrent as an open dataset for Hugging Face models. Our work informs future directions for optimizing deep learning ecosystems by automatically measuring useful attributes and potential attacks, and envisions future research on infrastructure and standardization for DL model registries.
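The abstract highlights provenance, reproducibility, and portability as reuse attributes that model cards commonly omit. A minimal sketch of how such an audit might look is below; the attribute-to-field mapping and the model card schema are illustrative assumptions, not the study's actual measurement methodology.

```python
# Hypothetical sketch: flag which reuse-relevant fields are missing from a
# model card. The field names below are illustrative assumptions, not the
# paper's schema or the Hugging Face model card specification.

REUSE_ATTRIBUTES = {
    "provenance": ["author", "base_model", "training_data"],
    "reproducibility": ["training_script", "hyperparameters", "random_seed"],
    "portability": ["framework", "framework_version", "exported_formats"],
}

def missing_attributes(card: dict) -> dict:
    """Return, per attribute group, the fields absent from a model card."""
    return {
        group: [field for field in fields if field not in card]
        for group, fields in REUSE_ATTRIBUTES.items()
    }

# Toy model card with only partial documentation.
card = {
    "author": "example-org",
    "base_model": "bert-base-uncased",
    "framework": "pytorch",
}

gaps = missing_attributes(card)
# gaps["provenance"] -> ["training_data"]; all reproducibility fields missing.
```

A real audit over a registry would instead fetch model card metadata at scale (e.g., via the registry's API) and aggregate these gaps across models, which is the kind of systematic measurement the abstract describes.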