Patients-Trainees-Supervisors: clinical learning's triad of trust
In health professions education (HPE), relationships among patients, trainees, and supervisors do not always coalesce spontaneously or effectively. Within this triad, a web of uncertainties and motivations makes each party vulnerable to the others. The mitigation of uncertainty, alignment of motivation, and acceptance of vulnerability all require trust. Patients seeking care may distrust their providers, but they must at least hold some trust in their providers’ ability to match them with the care they need. Trainees are motivated not only by direct care of their patients, but also by their own learning and development. Alignment of patient care with learning thus requires all parties’ acceptance of the vulnerability inherent in receiving and delivering care that is also crafted as an opportunity for learning.
We seek to understand how trainees build and receive trust within this dynamic, and how this process can be improved from both a learning and a patient-outcomes perspective. Within HPE, a recent focus on trust emerged around the trainee-supervisor dyad because of its relevance to the framework of entrustment in assessment. While trainees’ opportunities for learning may depend on the trust their supervisors have in them, they depend upon their patients’ trust in them as well. Patient-provider trust has also been a topic of strong interest, so we would like to understand how these two dyads of trust interface, with a particular focus on learning and patient care outcomes.
Blurring qualitative and quantitative approaches: investigating clinical trust with AI language models
Advances in artificial intelligence (AI) and natural language processing (NLP) promise new insights into the analysis of narrative data. Computational and algorithmic developments in deep learning neural networks have led to the advent of language models (LMs, including large language models, or LLMs) with unprecedented abilities not only to encode the meaning of words, sentences, and entire bodies of text, but also to use such encodings to generate new text. Within health professions education research, these LMs promise to bridge, and potentially blur, the distinction between qualitative analysis and quantitative measurement approaches.
Our research focuses on exploring the interpersonal dynamics of clinical entrustment by developing LM methodologies to augment the reach of traditional qualitative methods for analyzing large narrative datasets. We have employed LMs to assist narrative analysis in several ways, including: 1) to discover qualitative themes and measure associated constructs in unlabeled narratives, and 2) to uncover latent constructs underlying the classification of pre-labeled narratives. In 1), querying how supervisors and trainees may differentially approach entrustment decisions, we developed a transfer learning strategy (applying LLMs trained on datasets apart from the study dataset) to identify and measure constructs in a large dataset of feedback narratives. The constructs algorithmically identified included features of clinical task performance and sentiment characterizing the language used in the feedback. The LLMs provided consistent measurement of these constructs across the entire dataset, enabling statistical analysis of differences in how supervisors and trainees reflect on entrustment decisions and respond to potential sources of bias. In 2), querying how entrustment decisions shape feedback, we trained an LM from scratch to predict entrustment ratings from feedback narratives, using a training set consisting of narratives paired with entrustment ratings. By deconstructing the trained LM, we uncovered latent constructs the LM used to make its predictions. Such constructs included the narrative’s level of detail and the degree to which the feedback was reinforcing versus constructive.
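To make the logic of approach 2) concrete, the sketch below substitutes a deliberately transparent stand-in for the trained language model: a bag-of-words logistic regression fit on narrative–rating pairs, whose learned weights can then be ranked to surface candidate latent constructs. The corpus, labels, and features here are illustrative assumptions, not study data or the actual model architecture.

```python
import math
import re
from collections import Counter

# Toy corpus of feedback narratives paired with binary entrustment labels
# (1 = higher entrustment, 0 = lower). Purely illustrative examples.
corpus = [
    ("managed the airway independently and communicated a clear plan", 1),
    ("performed the exam confidently with thorough detailed documentation", 1),
    ("needed prompting on each step and missed key history questions", 0),
    ("required close guidance because the plan was vague and incomplete", 0),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

vocab = sorted({tok for text, _ in corpus for tok in tokenize(text)})

def featurize(text):
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]

# Logistic regression trained by stochastic gradient descent: a simple,
# inspectable proxy for an LM trained to predict entrustment ratings.
weights = [0.0] * len(vocab)
bias = 0.0
lr = 0.5
for _ in range(200):
    for text, label in corpus:
        x = featurize(text)
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - label
        bias -= lr * err
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]

# "Deconstruction": rank features by learned weight to surface the words
# driving predictions toward low versus high entrustment.
ranked = sorted(zip(vocab, weights), key=lambda wv: wv[1])
print("leaning toward low entrustment:", [t for t, _ in ranked[:3]])
print("leaning toward high entrustment:", [t for t, _ in ranked[-3:]])
```

The same inspect-the-model move scales poorly to deep networks, which is why the actual work required purpose-built deconstruction methods; the sketch only shows the principle that a predictive model's internals can be mined for constructs.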
While LMs offer the advantages of consistent construct measurement and applicability to large datasets, they also carry the disadvantages of algorithmic bias and lack of transparency. In 1), we identified gender bias in the LLM we had trained to measure sentiment; this bias originated from its training dataset. To mitigate this bias, we developed a strategy that masked gender-identifying words from the LLM during both training and measurement of sentiment. This allowed us to identify small but significant biases in the study data itself, which revealed that entrustment ratings appeared to be less susceptible to bias than the language used to convey them. With respect to transparency, while our work in 2) enabled the deconstruction of an LM designed for a specific task (i.e. prediction of entrustment), larger LLMs used in generative AI (including GPT-4 and LLaMA) currently lack the ability to trace their outputs to their sources. Our ongoing work focuses on developing LLM-based strategies that support transparency in narrative analysis, and on developing theory to characterize the epistemology and limitations of knowledge both represented within and derived from LLMs.
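The masking strategy can be sketched as a preprocessing step applied to every narrative before training and before sentiment measurement. The word list below is an illustrative assumption; the study's actual lexicon of gender-identifying terms is not given in the source.

```python
import re

# Illustrative (assumed) lexicon of gender-identifying tokens; a real
# lexicon would also cover names, pronoun variants, and titles.
GENDERED = ("he", "she", "him", "her", "his", "hers",
            "himself", "herself", "mr", "mrs", "ms")

_TOKEN = re.compile(r"\b(" + "|".join(GENDERED) + r")\b", re.IGNORECASE)

def mask_gender(text: str) -> str:
    """Replace gender-identifying words with a neutral mask token so the
    model cannot condition on them during training or measurement."""
    return _TOKEN.sub("[MASK]", text)

print(mask_gender("She presented her plan and Mr. Chen agreed."))
```

Because the mask is applied identically at training and measurement time, any residual gender-associated signal the model reports must come from the remaining (non-identifying) language, which is what let the study data's own biases be isolated.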