From Black Box to Transparency: Pulling the Curtain Back on Machine Learning

By Michelle Marlborough, Chief Product Officer

The introduction of Good Machine Learning Practices (GMLP) and the increasing buzz around the need for transparency and standardization of machine learning (ML) are significant steps toward encouraging adoption of, and trust in, these tools across the healthcare industry. Getting there, though, requires turning these ideals from mere concepts into actionable processes without setting an unrealistic bar for developers. Before we can expect progress, we must clearly define what we mean when we talk about transparency of ML, and articulate how it can be achieved in practice without leaking proprietary information.

Understanding the Breadth of Machine Learning 

We need to create a standard for what can and should be traced within an ML model. This starts by expanding how we define ML. End users of AI in healthcare – clinicians, patients, pharmaceutical sponsors, and more – can be skeptical of AI’s mysterious nature, often viewing it as a “magic” tool with unexplainable logic behind its output. The reality is that the algorithm, or the “magic” piece, is only one small part of machine learning, and the overall performance of a model is not determined by the algorithm alone. There is a system built around it that includes how we collect data, how an algorithm is trained and tested and what data sets are chosen for each, what hypothesis the developers set out to test, what generated a specific output, and so on. Examining this set of processes tells us more about a model’s quality and accuracy than examining the algorithm’s code.

Without this differentiation of the algorithm from the system, it is easy to assume that when we talk about transparency into a model’s development, it means giving away proprietary information about the algorithm’s code and “magic.” Instead, we aim to gain visibility into the controls around the algorithm. By tracing a system’s ways of working, we can help eliminate this shroud of mystery that deters many AI users, all the while safeguarding intellectual property.

Traceability in Practice

Traceability of a model’s architecture is not only doable but essential. Developers need to be able to show the workflow of a system and how one component flows into the next. As regulations evolve beyond GMLP, there could come a day when developers are required to deliver audit trail-like reports that act as a seal of approval for a model. As part of this, developers will need to document the pedigree of an ML model, spelling out the inner workings of the model’s system to show it was built in a quality manner for its intended patient population. When doing so, it is important to remember that many of the end users of ML don’t live and breathe it every day, so there’s a fine balance between offering visibility and causing information overload. To effectively improve users’ confidence, the ML and AI industry needs to build a bridge between the complexities of data science and general usability for those outside the AI field, ensuring any report-outs are digestible and easy to understand.
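To make the idea of a pedigree concrete, here is a minimal sketch in Python of what one entry in such a report might hold. It is an illustration under stated assumptions, not a regulatory format: the `PedigreeRecord` fields, the `fingerprint` helper, and all sample values are hypothetical. The point it shows is that datasets can be identified by a hash, so the record documents the system around the algorithm without exposing the data or the algorithm’s code.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(rows):
    """Return a SHA-256 digest that identifies a dataset without exposing its contents."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass(frozen=True)
class PedigreeRecord:
    """Hypothetical record of the controls around a model, not the model itself."""
    hypothesis: str
    intended_population: str
    data_collection: str
    training_data_fingerprint: str
    test_data_fingerprint: str

    def summary(self):
        """Render the record as plain 'Label: value' lines for non-specialist readers."""
        return "\n".join(
            f"{name.replace('_', ' ').capitalize()}: {value}"
            for name, value in asdict(self).items()
        )


# Made-up illustrative values; a real record would come from the development pipeline.
train_rows = [{"age": 34, "outcome": 0}, {"age": 61, "outcome": 1}]
test_rows = [{"age": 47, "outcome": 1}]

record = PedigreeRecord(
    hypothesis="(hypothetical) the model can flag patients at risk of outcome X",
    intended_population="(hypothetical) adults aged 30 to 70",
    data_collection="(hypothetical) retrospective records from two sites",
    training_data_fingerprint=fingerprint(train_rows),
    test_data_fingerprint=fingerprint(test_rows),
)
print(record.summary())
```

The plain-language `summary` reflects the balance described above: the full record can serve an auditor, while the rendered lines stay digestible for readers outside data science.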

Part of building industry-wide trust also involves an ongoing responsibility to the ML one puts out in the world. An algorithm may perform perfectly on a predetermined dataset that seems representative of real-world populations, but there needs to be an obligation to monitor whether it is being used within those parameters once implemented in the real world. Just as software companies commit to providing users with the latest software updates, there needs to be an investment in tracing a model’s performance and refining it accordingly. This ongoing, real-time monitoring will help demonstrate that the way a model was developed, as documented in its “pedigree report,” matches how it works in reality, and help orchestrate a system of checks and balances to hold developers accountable.
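One simple form this monitoring could take is checking whether real-world inputs fall inside the ranges seen during development. The Python sketch below is only an illustration: the function names, the `age` feature, and the sample values are assumptions, and a production system would likely use richer statistical drift checks than a minimum and maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRange:
    low: float
    high: float


def learn_ranges(training_rows):
    """Record the minimum and maximum of each numeric feature in the development data."""
    names = training_rows[0].keys()
    return {
        name: FeatureRange(
            min(row[name] for row in training_rows),
            max(row[name] for row in training_rows),
        )
        for name in names
    }


def out_of_range_rate(ranges, live_rows):
    """Return the fraction of live inputs with at least one feature outside its documented range."""
    if not live_rows:
        return 0.0
    flagged = sum(
        1
        for row in live_rows
        if any(not (ranges[name].low <= row[name] <= ranges[name].high) for name in ranges)
    )
    return flagged / len(live_rows)


# Made-up illustrative values: development data covers ages 30 to 70.
training_rows = [{"age": 30}, {"age": 52}, {"age": 70}]
live_rows = [{"age": 45}, {"age": 82}]  # the second patient is outside the documented range

ranges = learn_ranges(training_rows)
rate = out_of_range_rate(ranges, live_rows)
print(f"Share of live inputs outside documented ranges: {rate:.0%}")
```

A rising rate would suggest the model is being used on a population it was not built for, which is the kind of signal that should prompt review and refinement.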

Mirroring Our Own Healthcare Industry

The FDA’s robust evidence requirements around drug development mean pharmaceutical companies are responsible for attentively tracing trial data and design to make the case for a drug’s efficacy. Developers of ML and AI for the industry should mirror this vigilance and make the case for the safety and accuracy of their models, especially when a model’s output could affect a patient care decision. Only good can come from being more transparent about our work in this fast-moving industry. It will not only advance broader understanding among those who might be hesitant, but also encourage collaboration, advance best practices and spur innovation.