This is just a short post about the criteria one sets for a model to fulfil when building it. In our paper,
we decided to strictly separate criteria for model calibration (based on data used to develop the model) and validation (based on data not used in model creation, which assess the model after development has finished). Drawing
a parallel to the world of machine learning, one could say that calibration
criteria correspond to training data, while validation criteria correspond to
testing [1] data. In machine learning, the use of a test set to assess model quality is so firmly established that it is hard to imagine performance on the training set being reported. Returning from machine learning
to the literature on cardiac computer modelling, it becomes rather apparent
that the waters are much murkier here. Most modelling papers do not seem to specify whether the model was directly calibrated to produce the reported behaviours, or whether they constitute independent validation in our paper’s sense. It does not help that some papers use the term “validation” to mean true independent validation, while others use it more in the sense of “evaluation”, covering a mix of calibration, independent validation, and
results where it is impossible to tell. Is this important at all though, or is
it just a linguistic exercise without a deeper point? I would argue it is actually quite important; as a reader, I would very much prefer to know precisely what is calibration and what is not. This is linked to how much one can hope the model will work for phenomena it was not directly fitted to (which is arguably the main interesting application).
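To make the distinction concrete, here is a minimal sketch (my own illustration, not the paper’s protocol) of keeping calibration and validation criteria strictly separate in code; all criterion names and numeric ranges are illustrative assumptions, not reference values.

```python
def check_criteria(criteria, outputs):
    """Return, for each criterion, whether the model output falls within its range."""
    return {name: lo <= outputs[name] <= hi for name, (lo, hi) in criteria.items()}

# Criteria the model is actively fitted to (analogous to ML training data).
calibration_criteria = {
    "apd90_ms": (180, 440),              # action potential duration at 90% repolarisation (illustrative)
    "resting_potential_mV": (-90, -85),  # resting membrane potential (illustrative)
}

# Criteria only checked once development is frozen (analogous to ML test data).
validation_criteria = {
    "s1s2_restitution_slope": (0.5, 1.5),  # illustrative range
}

# Hypothetical outputs of a candidate model.
model_outputs = {"apd90_ms": 270, "resting_potential_mV": -87, "s1s2_restitution_slope": 1.1}

print(check_criteria(calibration_criteria, model_outputs))  # consulted while tuning the model
print(check_criteria(validation_criteria, model_outputs))   # reported only at the very end
```

Only the calibration set would be consulted while tuning the model; the validation set is checked once, after development has finished.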
I believe
that cardiac models are generally somewhat less prone to overfitting than
machine learning models. They are constrained by data on their structure, by
which currents and fluxes are included, and by the data underlying these –
consequently, it’s harder to produce an utterly nonsensical model that nevertheless shows good behaviours [2]. At the same time, harder does not mean hard. The problem with a purely calibration-driven approach is that by adjusting a model to fulfil a
calibration criterion, one may violate other desirable properties of the model
or even other calibration criteria. This can in turn be “patched” by further changes to the model, but those may spawn yet other problems. If one fulfils
all the calibration criteria, is it because the model is great, or because all
the problems were simply moved outside the “observed range” of calibration
criteria? If the latter, it’s what I’d call overfitting in the context of
cardiac models. And the validation set of criteria is precisely what guards
against this. If the calibration criteria were achieved at the cost of the
model being nonsensical, the validation criteria are quite likely to point that
out and that’s where their importance lies.
One extra point at the end: it’s good to mind differences in species, protocols, or conditions when designing the calibration and validation criteria. Also, how heterogeneous is a particular feature across papers [3]?
Traditionally, models are created so that a single model replicates multiple behaviours across different studies, and when a model fails to replicate many papers at once, this is considered problematic. However, can we be sure that any living cell from one experimental study would tick all the boxes of the other studies? Not really. An approach that
nicely appreciates and tackles this issue is Populations of Models, where a
baseline cardiac model is used to generate a population of its clones with
changes in conductances of ionic currents or other properties. The population is then typically calibrated against experimental data (e.g., on action potential duration [4]), so that grossly unrealistic models are discarded. Thus, some form of
heterogeneity is achieved and one model does not have to fulfil the potentially
unrealistic requirement of replicating every single possible behaviour.
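As a rough illustration of the Populations of Models idea described above, here is a minimal sketch (not the exact method of any particular paper): scale the ionic conductances of a baseline model, simulate each variant, and keep only those within an experimentally plausible range. The surrogate simulate_apd90 function, the conductance names, and the acceptance range are all illustrative assumptions.

```python
import random

random.seed(1)

BASELINE = {"g_Na": 1.0, "g_CaL": 1.0, "g_Kr": 1.0, "g_Ks": 1.0}  # illustrative conductance scalings

def simulate_apd90(conductances):
    """Placeholder for running the cardiac cell model; returns APD90 in ms.
    A toy surrogate stands in for the real simulation here."""
    return 280 * conductances["g_CaL"] / (0.5 * conductances["g_Kr"] + 0.5 * conductances["g_Ks"])

def make_population(n_models=1000, spread=0.5):
    """Generate candidate models by randomly scaling each conductance of the baseline."""
    return [{g: v * random.uniform(1 - spread, 1 + spread) for g, v in BASELINE.items()}
            for _ in range(n_models)]

# Calibration step: keep only variants whose APD90 falls within an (illustrative)
# experimentally plausible range, discarding grossly unrealistic models.
APD90_RANGE = (180, 440)  # ms, illustrative
candidates = make_population()
accepted = [m for m in candidates
            if APD90_RANGE[0] <= simulate_apd90(m) <= APD90_RANGE[1]]

print(f"{len(accepted)} of {len(candidates)} candidate models accepted")
```

The accepted variants then form a heterogeneous population, so no single model has to reproduce every reported behaviour on its own.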
[1] It is
slightly unfortunate that validation
criteria don’t really correspond to validation
data in machine learning, which may cause confusion. However, validation of a
computer model is an already established term and it makes sense to stick to
this tradition.
[2] The model
may also be overconstrained to the degree that it’s even hard to achieve the
good behaviours in the first place. Here, I have to mention the statement “you can fit these models to
anything” that I’ve heard from multiple experimental researchers I talked to
about computer models. It was particularly interesting in one stage of ToR-ORd
development, where I was seriously stuck, health was so-so, and simply nothing
worked to achieve the calibration criteria (Part 4 describes what was happening
and how it was solved). And during a meeting with an experimental physiologist about a different project, at this miserable stage of my work, I heard him say “you can fit these models to anything” in a fairly dismissive way.
I’m still proud to this day that I managed to contain the agony-filled
scream of mad desperate laughter inside my head.
[3] For example, the calcium transient amplitude in human was reported in different studies to be ca. 350 nM (Coppini et al. 2013) or over 800 nM (Piacentino III et al. 2012), so a model strongly fitted to one of these datasets would fail miserably on the other. Another example is the
slope of S1-S2 restitution, where some articles report steep curves, with other
articles reporting flat ones. It’s thus good to be aware of criteria on which the literature does not reach a quantitative consensus, and not to focus on these too heavily.
[4] That said, we don’t have enough data on the true biological
variability of ionic channel conductances etc. at the moment. Therefore, there is no guarantee
that the simulated heterogeneity corresponds to heterogeneity which might arise
from biological diversity and/or noise in experimental measurements.