
Making a model: Part 1 - Development strategy


This is a short post about the criteria one sets for a model to fulfil when making it. In our paper, we decided to strictly separate criteria for model calibration (based on data used to develop the model) and validation (based on data not used in model creation, which assess the model after development has finished). Drawing a parallel to the world of machine learning, one could say that calibration criteria correspond to training data, while validation criteria correspond to testing [1] data. In machine learning, the use of a testing set to assess model quality is so firmly established that it is hard to imagine performance being reported on the training set alone. Returning from machine learning to the literature on cardiac computer modelling, it becomes rather apparent that the waters are much murkier here. Most modelling papers do not seem to specify whether the model was directly made to reproduce the reported behaviours, or whether these represent independent validation in our paper’s sense. It does not help that some papers use the term “validation” to mean true independent validation, while in others it is used more as “evaluation” and contains a mix of calibration, independent validation, and results where it is impossible to tell which is which. Is this important at all, or is it just a linguistic exercise without a deeper point? I would argue it is actually quite important, and as a reader I would personally very much prefer to know precisely what is calibration and what is not. It is linked to how much one can hope the model will work for phenomena it was not directly fitted to (which is arguably the main interesting application).
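To make the bookkeeping concrete, here is a minimal sketch (in Python; not code from our paper) of keeping the two sets of criteria strictly separate, by analogy with a train/test split. The biomarker names, acceptable ranges, and simulated values are invented purely for illustration:

```python
# Calibration vs. validation criteria kept as explicitly separate sets.
# All names and ranges below are illustrative placeholders.

# criteria the model is tuned against during development ("training data")
calibration_criteria = {
    "APD90_ms": (180.0, 440.0),          # action potential duration at 90% repolarisation
    "CaT_amplitude_nM": (200.0, 900.0),  # calcium transient amplitude
}

# criteria held back and checked only once development has finished ("testing data")
validation_criteria = {
    "APD90_shortening_at_2Hz_pct": (10.0, 30.0),
}

def check(criteria, biomarkers):
    """Report pass/fail for each criterion given a dict of simulated biomarkers."""
    return {name: (name in biomarkers and lo <= biomarkers[name] <= hi)
            for name, (lo, hi) in criteria.items()}

# example usage with made-up simulated biomarker values
simulated = {"APD90_ms": 271.0, "CaT_amplitude_nM": 350.0,
             "APD90_shortening_at_2Hz_pct": 18.0}
print(check(calibration_criteria, simulated))  # consulted while tuning the model
print(check(validation_criteria, simulated))   # reported only once development is over
```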

I believe that cardiac models are generally somewhat less prone to overfitting than machine learning models. They are constrained by data on their structure, on which currents and fluxes are included, and by the data underlying these; consequently, it’s harder to produce an utterly nonsensical model that nevertheless shows good behaviours [2]. At the same time, harder does not mean hard. The problem with a purely calibration-driven approach is that by adjusting a model to fulfil one calibration criterion, one may violate other desirable properties of the model or even other calibration criteria. This can again be “patched” by further changes to the model, but those may in turn spawn other problems. If one fulfils all the calibration criteria, is it because the model is great, or because all the problems were simply moved outside the “observed range” of the calibration criteria? If the latter, it’s what I’d call overfitting in the context of cardiac models. And the validation set of criteria is precisely what guards against this. If the calibration criteria were achieved at the cost of the model becoming nonsensical, the validation criteria are quite likely to point that out, and that is where their importance lies.

One extra point at the end: it’s good to mind differences in species, protocols, and conditions when designing the calibration and validation criteria, as well as how heterogeneous a particular feature is between papers [3]. Traditionally, models are created so that a single model replicates multiple behaviours across different studies, and when it fails to replicate many papers at once, this is considered problematic. However, can we be sure that any living cell from an experimental study would tick all the boxes of other studies? Not really. An approach that nicely appreciates and tackles this issue is Populations of Models, where a baseline cardiac model is used to generate a population of its clones with varied conductances of ionic currents or other properties. This population is then typically calibrated to experimental data (e.g., on action potential duration [4]) to make sure that grossly unrealistic models are discarded. Thus, some form of heterogeneity is achieved, and no single model has to fulfil the potentially unrealistic requirement of replicating every single possible behaviour.
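Again purely as an illustration (and not the code of any particular published study), a rough Python sketch of the Populations-of-Models idea might look as follows; the toy_apd90() surrogate, the sampled conductances, and the acceptance range are all placeholders for a real cell model and real experimental ranges:

```python
# Populations of Models, sketched: scale selected conductances of a baseline model,
# simulate each variant, and keep only those whose APD90 falls within an
# experimentally plausible range. Everything below is illustrative.
import numpy as np

rng = np.random.default_rng(0)
conductances = ["GKr", "GKs", "GNaL", "GCaL", "Gto"]  # example parameters to vary
baseline_apd90 = 270.0  # ms, illustrative baseline action potential duration

def toy_apd90(scaling):
    """Crude surrogate for a simulator: less IKr/IKs prolongs APD, more ICaL prolongs it."""
    return baseline_apd90 * (1.0
                             + 0.4 * (1.0 - scaling["GKr"])
                             + 0.1 * (1.0 - scaling["GKs"])
                             + 0.2 * (scaling["GCaL"] - 1.0))

population = []
for _ in range(1000):
    # sample multiplicative scaling factors for each conductance, here uniformly in [0.5, 2]
    scaling = dict(zip(conductances, rng.uniform(0.5, 2.0, size=len(conductances))))
    apd90 = toy_apd90(scaling)
    if 180.0 <= apd90 <= 440.0:  # calibration step: reject grossly unrealistic clones
        population.append(scaling)

print(f"accepted {len(population)} of 1000 sampled model variants")
```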





[1] It is slightly unfortunate that validation criteria don’t really correspond to validation data in machine learning, which may cause confusion. However, validation of a computer model is an already established term and it makes sense to stick to this tradition.
[2] The model may also be overconstrained to the degree that it’s even hard to achieve the good behaviours in the first place. Here, I have to mention the statement “you can fit these models to anything” that I’ve heard from multiple experimental researchers I talked to about computer models. It was particularly interesting at one stage of ToR-ORd development, where I was seriously stuck, my health was so-so, and simply nothing worked to achieve the calibration criteria (Part 4 describes what was happening and how it was solved). And when having a meeting with an experimental physiologist about a different project at this miserable stage of my work, I heard him say “you can fit these models to anything” in a fairly dismissive way. I’m still proud to this day that I managed to contain the agony-filled scream of mad desperate laughter inside my head.
[3] For example, the calcium transient amplitude in human was reported to be ca. 350 nM in one study (Coppini et al. 2013) and over 800 nM in another (Piacentino III et al. 2012), so a model strongly fitted to one of these datasets would fail miserably on the other. Another example is the slope of S1-S2 restitution, where some articles report steep curves and others flat ones. It’s thus good to be aware of criteria where the literature doesn’t reach a quantitative consensus, and not to focus on these overly.
[4] That said, we don’t have enough data on the true biological variability of ionic channel conductances etc. at the moment. Therefore, there is no guarantee that the simulated heterogeneity corresponds to the heterogeneity which might arise from biological diversity and/or noise in experimental measurements.
