In this
section, we’ll go over the basic principles of the genetic algorithm (GA), its
multicriterial/multiobjective variant (MGA), and why I think it is a good tool
for the task of optimizing computer cellular models. Of course, feel free to
search the internet for other webpages about GAs, as it is a technique relevant to
many fields and may take different flavors depending on the application. If
you’re using Matlab to run your cell simulations, there are nice implementations
of GA (ga.m) and MGA (gamultiobj.m) which may be of use.
Let’s start
with a simple, single-criterial genetic algorithm. This is an optimization
technique which generates a population of creatures
(encodings of candidate solutions to the problem) and then performs artificial
evolution on them, gradually improving their quality according to set criteria.
This is done by taking a starting population, and iterating the following three
steps:
- Generate offspring using crossover of the parents.
- Perform mutations on the offspring.
- Perform selection to create a new generation of creatures.
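The loop above can be sketched as follows; this is a minimal illustrative sketch (not the Matlab implementations mentioned earlier), and `fitness`, `crossover`, `mutate`, and `select` are placeholders for the operators discussed in the sections below:

```python
import random

def genetic_algorithm(init_population, fitness, crossover, mutate, select,
                      n_generations=30):
    """Minimal single-criterial GA loop (smaller fitness = better)."""
    population = list(init_population)
    for _ in range(n_generations):
        # Step 1: generate offspring via crossover of randomly paired parents.
        offspring = []
        while len(offspring) < len(population):
            parent1, parent2 = random.sample(population, 2)
            offspring.extend(crossover(parent1, parent2))
        # Step 2: perform mutations on the offspring.
        offspring = [mutate(child) for child in offspring]
        # Step 3: select a new generation from parents and offspring.
        population = select(population + offspring, len(population), fitness)
    return min(population, key=fitness)
```

Any concrete representation of creatures works, as long as the supplied operators agree on it.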
Creatures
Creatures
typically encode a solution to a problem (which may or may not be a good
solution). A simple toy problem might be finding x, y satisfying a given
system of two inequalities. A creature would then be a pair of candidate
values of x and y.
We can call the vector encoding a creature “DNA” to maintain the
biologically-inspired perspective.
In the context of
model development, a creature’s DNA is usually a list of parameters of the
cardiac model (or their multipliers [1]);
these could be for example conductances of ionic currents [2].
When designing what a creature is, one has to take into account the following:
- Including too few parameters may mean that the genetic algorithm does not evolve towards anything useful, because not all parameters relevant to achieving a good solution were included.
- Including too many [3] parameters is also not great, because it slows down the convergence of the genetic algorithm. Also, with a very large number of parameters, you risk overfitting.
- It is also good to consider constraints on the parameters (most implementations allow constraining the value of a particular parameter to a given range). Are there any data giving us prior information about what a parameter can be? If yes, definitely enforce this – there are few things more annoying than obtaining a beautifully behaved model and then realizing it cannot be used because some parameters are completely unrealistic. Also, overly wide ranges are likely to produce many solutions that perform poorly, and the computational time spent on these is essentially wasted.
Fitness
The quality
of a creature is measured using a fitness
function, which takes a single creature’s DNA and returns a single number
determining its quality (the smaller the better [4]).
For the toy
problem in a previous section, the fitness could be defined as the distance
from a feasible solution, summed over the two inequalities. E.g., DNA (2,2)
leads to both left hand sides of the inequalities being 48 – this fails the
first constraint and satisfies the second – so the penalty for the first
constraint is 48, and 0 for the other one, producing a total fitness of 48. DNA
(1,1) produces left hand sides 9 and -1, providing fitness of 10. DNA (-1,0)
satisfies both inequalities and thus has fitness 0, being the best of the
three.
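The penalty idea can be sketched as follows; since the actual toy system of inequalities was given as a figure, the two constraints below are hypothetical stand-ins, written in the form g(x, y) ≤ 0:

```python
def penalty_fitness(dna, constraints):
    """Sum of constraint violations; fitness 0 means a feasible solution."""
    x, y = dna
    return sum(max(0.0, g(x, y)) for g in constraints)

# Hypothetical stand-in constraints of the form g(x, y) <= 0
# (the actual toy system from the text was shown as a figure).
constraints = [
    lambda x, y: x + y - 1,   # encodes x + y <= 1
    lambda x, y: -x - 2,      # encodes x >= -2
]
```

With these stand-ins, a creature (2, 2) violates the first constraint by 3 and satisfies the second, giving fitness 3, while (0, 0) satisfies both and has fitness 0.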
The
estimation of the fitness of a computer model is more complex. It may involve simulating
a cell model with the given parameters and measuring various properties or
biomarkers. This may be, for example, the amplitude of the calcium transient, with
the fitness being its similarity to experimentally observed values. Creatures
evolving according to this fitness function should eventually achieve data-like
calcium transient amplitude. In the ToR-ORd development, we used a more
comprehensive fitness, which we’ll revisit in the section about
multicriteriality.
It is good
to carry out preliminary simulations to test how many beats are needed for the
model to get reasonably close to stable behavior within the
fitness computation [5].
With too short a simulation, the optimization runs faster, but it is possible
to obtain a model which works very nicely only after the short
stimulation period, but not in the longer term. On the other hand, with too
long simulations, the optimizer can take ages to finish. When developing
ToR-ORd, I found 50 beats to be a very bare minimum, 100 beats still not overly
stable, 200 beats getting there, and 500 beats pretty stable. 150 beats were
used in the end as a compromise value.
Crossover
This
operator is one of the hallmarks of genetic algorithms. To generate new creatures
from the “parental” generation, two [6] creatures are selected, a random point in their “DNA” is chosen, and the
offspring are created by swapping the parents’ DNA around that point.
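A minimal sketch of single-point crossover on list-valued DNA (a hypothetical helper, not code from any particular toolbox):

```python
import random

def single_point_crossover(parent1, parent2):
    """Create two offspring by swapping the parents' DNA around a random point."""
    assert len(parent1) == len(parent2)
    point = random.randint(1, len(parent1) - 1)  # cut strictly inside the DNA
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```

For parents [1, 1, 1, 1] and [9, 9, 9, 9] and crossover point 2, this yields [1, 1, 9, 9] and [9, 9, 1, 1].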
The
rationale behind crossover is that different parts of the DNA may encode
different subproblems, and by combining a creature that does one thing well
with a creature that’s good at something else, we may get a creature good at
both things. A concrete example with ToR-ORd could be e.g., calcium transient
(which depends mainly on ICaL, SERCA, and NCX) versus reaction to IK1 block (which is mainly about the background currents during diastole) – as
these two phenomena are pretty much independent, relying on different parts of
DNA [7],
they may be combined well to produce creatures good at both things at once. Out
of curiosity, I tried to see how the genetic algorithm works for model
development when the crossover is turned off – and it converged much more slowly,
strongly suggesting that for the task of model fitting, crossover is very
useful.
Sidenote: Implementing
crossover is straightforward when the DNA is a simple list of numbers that
don’t have any particular dependence on the rest. However, this may not always
be the case – for example, if a creature is a permutation encoding a candidate
solution to the travelling salesman problem, a direct crossover would most
likely generate creatures that are not valid permutations. For such cases, more
refined and task-specific crossover operators have to be created (there is plenty of
literature about that, e.g., on partially mapped crossover, edge recombination
crossover, etc.).
Mutation
The
mutation operator receives each of the offspring produced by the crossover and
applies a random mutation to each position with a given probability. When a
mutation happens at a particular position, the value stored there is changed. A
popular type of mutation is a Gaussian one, where a normally distributed number
(zero-mean normal distribution) is added to the position.
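A sketch of such a Gaussian mutation operator; the per-position mutation probability and standard deviation below are illustrative values, not the ones used in any particular study:

```python
import random

def gaussian_mutation(dna, p_mut=0.1, sigma=0.1):
    """With probability p_mut per position, add zero-mean Gaussian noise."""
    return [gene + random.gauss(0.0, sigma) if random.random() < p_mut
            else gene
            for gene in dna]
```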
There is
always the question of how to select the mutation probability. With a low
probability of mutation, truly new creatures are generated slowly, making the
exploration of the solution space slow. On the other hand, with a very high mutation
probability, the exploration is fast (many new models are created), however,
when a good model is found, it may be lost (by further mutations or selection)
before it manages to reproduce. Beyond selecting one concrete value of mutation
probability, this can be also selected in a time-dependent way (e.g., having
high probability of mutation initially, reducing it with subsequent
generations, acting similarly to simulated annealing) or dynamically (e.g., when
the fitness stops improving, increase mutation probability to help generate new
creatures).
An
important point linked to mutations as well as to how exactly creatures encode
a cardiac model is the problem that arises when multipliers (e.g., current
multipliers) form the creature DNA (the solution is briefly discussed in our
paper). Consider the probability density function of a current multiplier
of 1 after a zero-mean Gaussian mutation is applied: it is “symmetrical” in the
sense that the probability density at 0.5
is the same as at 1.5.
However,
when multipliers are stored in the DNA, this constitutes a profoundly asymmetrical
mutation. All possible reductions in current density are
squeezed into the interval 0–1 (we don’t consider negative channel conductance
multipliers), but all the increases in current density range from 1 to
infinity. E.g., the symmetrical case to a current multiplier of 0.5 is 2 (halving
vs doubling); however, the probability density at 2 is noticeably smaller than at 0.5. This
problem is relevant both to mutations and to the generation of the initial population (a
population “symmetrically” generated around a current multiplier of 1, such as
25–175%, will be biased towards current reduction).
The
solution is, similarly to the problem of how a group of Ewoks can destroy an
AT-ST, a log. If we represent the creature using DNA consisting of logarithms
of multipliers (i.e., no change in current is encoded by 0), a Gaussian
mutation becomes symmetrical. This follows from the relationship log(1/x) = −log(x).
E.g., for x = 0.5,
this says that the logarithms of 0.5 and of 2 differ only in their sign – their
distance from 0 (no change in current multiplier) is the same, so the Gaussian probability
density function of the mutation has the same value at both points, giving the
desired symmetry.
The log-transform
then of course requires exponentiating the DNA within the fitness
function to recover valid current multipliers.
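A sketch of the log-space mutation (a hypothetical helper; the σ value is illustrative). Mutating the logarithm and exponentiating back makes halving and doubling of a current equally likely:

```python
import math
import random

def mutate_multiplier(multiplier, sigma=0.7):
    """Apply a zero-mean Gaussian mutation in log space and map back."""
    log_value = math.log(multiplier)        # DNA stores log(multiplier)
    log_value += random.gauss(0.0, sigma)   # symmetric Gaussian mutation
    return math.exp(log_value)              # exponentiate for the model
```

Starting from a multiplier of 1, the fraction of mutations that at least halve the current now matches the fraction that at least double it, and the result can never go negative.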
Selection
Once we
have generated new creatures and perturbed them via mutations, it is time to
carry out selection to create a new generation. Typically, the crossover is used
to generate twice as many creatures as the size of the original population,
with the selection removing 50% to get back to the original population size.
The
question is how to pick the 50% of creatures that survive into the following
generation. An immediately appealing answer is “take the creatures with
top-50% fitness”. Similarly to many immediately appealing answers, this doesn’t
work very well in practice. In a typical GA, high-fitness creatures are
overrepresented in the population anyway (i.e., the population typically
contains a large amount of near-identical high-fitness creatures).
Consequently, always taking top 50% creatures leads to a rapid loss of
diversity and the GA tends to get stuck in a local optimum.
Two popular
alternatives are tournament and roulette. In a simple tournament,
creatures in the pre-selection population are paired, and the creature with
higher fitness proceeds to the next generation with a given probability (something
like 0.6-0.8). In roulette, an imaginary roulette wheel is formed, where each
creature has a bin whose width is proportional to its quality (better creatures
have a wider bin; with a minimized fitness, this means using, e.g., inverse or
rank-based weights). Then, a point is chosen randomly on the roulette wheel and
the creature in whose bin it falls is selected for the next population. This
is repeated until the new population is fully filled. Both these approaches thus
show a preference towards picking high-fitness creatures, but this preference
is not exclusive – consequently, the GA better preserves population diversity,
which is beneficial.
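A sketch of the tournament variant under this article’s smaller-is-better fitness convention (the winning probability of 0.7 is illustrative):

```python
import random

def tournament_select(population, n_survivors, fitness, p_win=0.7):
    """Pairwise tournaments: the fitter (smaller-fitness) creature wins
    with probability p_win; otherwise the weaker one survives."""
    survivors = []
    while len(survivors) < n_survivors:
        a, b = random.sample(population, 2)
        better, worse = (a, b) if fitness(a) <= fitness(b) else (b, a)
        survivors.append(better if random.random() < p_win else worse)
    return survivors
```

Because the preference for fitter creatures is probabilistic rather than absolute, some weaker creatures survive, which helps preserve diversity.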
That said,
things are not black and white, and within yin is yang and vice versa [8]. What if our problem has the property that really good solutions are
surrounded by poor or even unfeasible solutions? If we painstakingly find a
great solution after many generations and then lose it because it got unlucky
in a tournament/roulette, it’s not great, as finding it may take aeons again.
To counter this, elitism is often
employed – a modification of the selection process which ensures that the top x%
of creatures (x is usually small, e.g., 1-5) are always passed into the next
generation. Using this, the best fitness in the population can never worsen throughout a GA run.
Multicriteriality
Above, we
focused on the basic genetic algorithm, which has a single fitness value for
each creature. However, what if our problem is intrinsically multicriterial?
E.g., we may want to optimize the shape of an action potential at the same time
as the shape of a calcium transient. One option is to compute both qualities
and then sum them with given weights to produce a single-number fitness. This
tends to work fine, although it may require some work to find appropriate
weighting. An alternative approach is to use a multicriterial/multiobjective GA
(MGA) instead. We can formulate a fitness function so that it returns two or
more fitness values and have the GA work with that. In such a case, the notion
of “global optimality” tends to lose sense and is replaced by
“Pareto-optimality”. We say that a creature dominates
another creature if it is at least as good in all the considered fitness
dimensions and strictly better in at least one. A creature is Pareto-optimal if it is
not dominated by any other creature in a population. The collection of all Pareto-optimal
creatures is termed a Pareto front. This
can be visualized with a toy problem of how different cars
perform on a rally track versus a Formula 1 track.
I.e., some
cars can be very good at one type of track, can be fairly good at both at once,
or can just be universally slow on both. The non-dominated cars form the
Pareto front. In an MGA, the whole Pareto front is optimized towards, rather
than a single value. This has an important advantage, and that is maintaining
population diversity. Most population-based techniques (GA, Particle swarm
optimization, etc.) tend to suffer from progressive loss of diversity in the
evolved populations, which is due to the fact that all creatures optimize
towards a single fitness value. Conversely, in an MGA, there may be creatures
“focusing” on being excellent in the first criterion, some other creatures on
the second one, and some creatures focusing on having a good balance of both
criteria, while not being excellent in either. This produces a great pool of
population information, which may be leveraged using the genetic operators.
Interestingly, having a multicriterial fitness does not really complicate the
GA very much. Fitness is used only in the process of selection, and e.g. the
tournament can work similarly to the single-criterial fitness. A creature which
dominates another may be selected for the next generation with a probability of
0.7, and if the creatures don’t dominate each other, it’s 50:50. Likewise, roulette
can be easily adapted to 2D (or more-D), by forming a 2D grid where creatures
correspond to rectangles (the better a creature is in a criterion, the longer
the side-length in that dimension). A point in the 2D space is then picked, and
the corresponding rectangle within which it falls is selected.
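The dominance relation and Pareto-front extraction can be sketched as follows (smaller fitness is better in each dimension):

```python
def dominates(fit_a, fit_b):
    """fit_a dominates fit_b if it is at least as good (smaller) in every
    dimension and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fit_a, fit_b))
            and any(a < b for a, b in zip(fit_a, fit_b)))

def pareto_front(fitnesses):
    """Indices of creatures not dominated by any other creature."""
    return [i for i, fi in enumerate(fitnesses)
            if not any(dominates(fj, fi)
                       for j, fj in enumerate(fitnesses) if j != i)]
```

E.g., for fitness pairs [(1, 5), (2, 2), (5, 1), (3, 3)], the first three creatures form the Pareto front, while (3, 3) is dominated by (2, 2).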
An
important question is how many dimensions to choose, particularly if your task
has many different criteria (e.g., model calibration to 10 features). In one
extreme, each criterion corresponds to a single fitness dimension. In the other
extreme, a weighted sum is used to aggregate all criteria into a single fitness
number. Then there are intermediate cases, where a given number of criteria are
aggregated into a lower number of fitness dimensions. I did some preliminary
experimentation, and for the ToR-ORd development (a population of 2500
creatures, 30 generations), 2-dimensional fitness worked best by far, using
weighted summation to fit the considered criteria into two fitness dimensions [9].
For 3 and more dimensions, I got many solutions that just were not great. While
the points on the Pareto front are formally not comparable with regards to
what’s better or worse, it is often the case that points at the extremes (very
good in one criterion, while very bad in the other) are less useful, and one
always looks for solutions that don’t suck in any criterion (e.g., there is not
much point having a little better shape of action potential, if it means that
the calcium transient amplitude changes from 400 nM to 4000 nM). Unfortunately,
when a 3-dimensional fitness was used, it happened all too often that a point
on the Pareto front would be very bad in one or two criteria – consequently,
all these many points were evolving through the generations, but they were not
doing anything overly useful. Only fairly few creatures in the
population had sensible properties in all three fitness dimensions, but
because the population was spread over a Pareto front with many
elements (most of them useless), these few overall-good creatures didn’t have
their surroundings explored quickly enough, and by enough creatures, to achieve good
performance of the MGA. That said, the appropriate number of dimensions will be
very specific to the task and population size – the greater the population, the
more dimensions can be used with good success.
Summary: why MGAs work for
model development
- The crossover operator allows sharing partial solutions to the problem of finding a good model.
- An MGA implicitly preserves population diversity.
- MGAs are easy to parallelize/distribute. Given that fitness computation (which involves simulating the model) is the main bottleneck, the overhead of parallelism is not as much of a problem as in other areas.
I do not
claim that MGA is the only approach with such advantages, nor that it is
globally the best approach (I tried other methods too, of course, but one
cannot test them all), only that it appeared to work well.
Supplementary subsection:
Lamarckism vs. Darwinism
One
interesting concept in the GA selection is Lamarckism versus Darwinism, mapping
to the competing evolutionary theories of Lamarck and Darwin. While Darwin
believed that evolution is driven primarily by natural selection, Lamarck
proposed that physical traits obtained during one’s life can be passed to the
offspring (i.e. the phenotype can affect the genotype in today’s language). The
different approaches are often illustrated using a blacksmith, and why sons of
blacksmiths are stronger than sons of weavers. In the Darwinist framework, this
can be explained by blacksmiths being recruited from people who are strong.
Given that the strength is driven by a particular genotype, the babies of
blacksmiths have a higher probability of being strong again, because the
underlying genotype gets passed to offspring. However, in the Lamarckist
framework, blacksmith’s babies are strong because their fathers gained high
strength and such gained traits can be passed to the offspring [10] (i.e., the effect is fairly direct, rather than mediated by underlying
factors). While Darwinism is now the dominant theory of evolution and there is
very little evidence of Lamarckism taking place, that does not mean that it
cannot be used in GAs. An example of a Lamarckist approach is local search,
which can be implemented as follows: when evaluating the fitness of a creature,
the creature is replaced by its fittest version in its vicinity within the
space of parameters (sort of like a blacksmith “knowing” he or she will be more
successful when strong, and thus becoming such). I.e., similar but
not-identical creatures are generated for one considered creature, and the best
one is picked, along with its fitness. The advantage is that the locally best
version of each creature is chosen, likely having higher fitness than if the
local search was not used. On the other hand, this usually requires many
more evaluations of fitness of creatures generated by the local search. I do
not think it is practical for development of cardiac models, as the fitness
computation is the main bottleneck. Such Lamarckist local search is of utility
predominantly in cases where there are many unfeasible solutions in the
parameter space. In such a case, crossover/mutations are quite likely to take
good creatures and convert them to useless unfeasible ones, which is not great.
A local search around the produced offspring can act as a repair operator,
having a greater chance of finding a feasible (and even good) solution.
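A minimal sketch of such a Lamarckist local-search step (the neighborhood size and noise level are illustrative values):

```python
import random

def lamarckian_local_search(creature, fitness, n_neighbors=10, sigma=0.05):
    """Replace a creature by the best of several nearby variants in
    parameter space. The original creature competes as well, so the
    returned fitness can never be worse than the creature's own."""
    candidates = [creature] + [
        [gene + random.gauss(0.0, sigma) for gene in creature]
        for _ in range(n_neighbors)
    ]
    best = min(candidates, key=fitness)
    return best, fitness(best)
```

Note the cost: each call spends n_neighbors extra fitness evaluations, which is why this is impractical when simulating the model dominates the runtime.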
[1] See below a
comment in the section on mutations, explaining why a logarithm of a multiplier
is probably a better choice.
[2] At the same
time, there are also other options. One thing that I did not try, but which is
nevertheless potentially useful, would be having an integer number encoding
which of multiple possible formulations of an ionic current is used – the
evolution would then pick the most suitable current formulation implicitly.
[3] Indeed, these two points are in direct
conflict. What I think is a sensible strategy is to guess the minimum number of
parameters you think are important, try running the genetic algorithm, and if
it does not produce anything useful, try expanding the list of parameters
further.
[4] This is a general
convention in optimization – the fitness usually corresponds to an error
function, which is natural to minimize. We could call fitness “unfitness”
instead, but given that the convention for GA is the term “fitness”, this would
be potentially even more confusing.
[5] Also keep
in mind there is no single model at this stage – the genetic algorithm will
generate many different versions of the same model, whose stable states lie at
different distances from the initial state (which is usually determined in
advance). This is also why it does not work to prepace the control
model for aeons to get to a stable-ish state and then just simulate one beat in
the genetic algorithm – the control model is already at the stable-ish state,
but its perturbations are not.
[6] The approach may be also scaled to more creatures, using multiple
crossover points. This is then termed gangbang
crossover, which was, I believe, coined by Dr. Roman Neruda.
[7] Given how crossover works (pick a point in the DNA and swap around it), it may be useful to organize your
creature’s DNA so that related concepts/current multipliers are
adjacent to each other. A computer model may be thought of as consisting of modules associated with depolarization
(INa, INaL, ICaL, Ito), repolarization
(IKr, IKs, IK1), exchangers (INaCa,
INaK), background currents (INab, ICab,…), and
SR calcium handling (Jrel, Jup). For the creatures to be
able to share the complete modules efficiently, it’s good when the elements of
each module are adjacent.
[8] Let us
conveniently ignore the fact that yin and yang are depicted in black and white.
[9] One
dimension measured AP morphology versus data, the second one aggregated calcium
transient properties, IK1 block behavior, sodium block and contractility, and
L-type calcium current block and APD shortening.
[10] It’s worth
noting that while there is no obvious mechanism for achieving this given
today’s knowledge of genetics, both Lamarckism and Darwinism came well before
the discovery of genes, and without the knowledge of mechanisms of
heritability, it was not as clear which theory was more accurate.