In this
section, we’ll go over the basic principles of the genetic algorithm (GA), its
multicriterial/multiobjective variant (MGA), and why I think it is a good tool
for the task of optimizing computer cellular models. Of course, feel free to
search the internet for other webpages about GAs, as it is a technique relevant to
many fields and may take different flavors depending on the application. If
you’re using Matlab to run your cell simulations, there are nice implementations
of GA (ga.m) and MGA (gamultiobj.m) which may be of use.
Let’s start
with a simple, single-criterial genetic algorithm. This is an optimization
technique which generates a population of creatures
(encodings of candidate solutions to the problem) and then performs artificial
evolution on them, gradually improving their quality according to set criteria.
This is done by taking a starting population, and iterating the following three
steps:
- Generate offspring using crossover of the parents.
- Perform mutations on the offspring.
- Perform selection to create a new generation of creatures.
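The loop above can be sketched as follows; this is a minimal illustrative sketch (not the Matlab implementations mentioned earlier), and `fitness`, `crossover`, `mutate`, and `select` are placeholders for the operators discussed in the sections below:

```python
import random

def genetic_algorithm(init_population, fitness, crossover, mutate, select,
                      n_generations=30):
    """Minimal single-criterial GA loop (smaller fitness = better)."""
    population = list(init_population)
    for _ in range(n_generations):
        # Step 1: generate offspring via crossover of randomly paired parents.
        offspring = []
        while len(offspring) < len(population):
            parent1, parent2 = random.sample(population, 2)
            offspring.extend(crossover(parent1, parent2))
        # Step 2: perform mutations on the offspring.
        offspring = [mutate(child) for child in offspring]
        # Step 3: select a new generation from parents and offspring.
        population = select(population + offspring, len(population), fitness)
    return min(population, key=fitness)
```

Any concrete representation of creatures works, as long as the supplied operators agree on it.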
Creatures
Creatures
typically encode a solution to a problem (which may or may not be a good
solution). A simple toy problem might be finding x, y satisfying a given
system of two inequalities. A creature would then be a pair of candidate
values of x and y.
We can call the vector encoding a creature “DNA” to maintain the
biologically-inspired perspective.
In the context of
model development, a creature’s DNA is usually a list of parameters of the
cardiac model (or their multipliers [1]);
these could be for example conductances of ionic currents [2].
When designing what a creature is, one has to take into account the following:
- Including too few parameters may mean that the genetic algorithm does not evolve towards anything useful, because not all parameters relevant to achieving a good solution were included.
- Including too many [3] parameters is also not great, because it slows down the convergence of the genetic algorithm. Also, with a very large number of parameters, you risk overfitting.
- It is also good to consider constraints on the parameters (most implementations allow constraining the value of a particular parameter to a given range). Are there any data giving us prior information about what a parameter can be? If yes, definitely enforce this – there are few things more annoying than obtaining a beautifully behaved model and then realizing it cannot be used because some parameters are completely unrealistic. Also, overly wide ranges are likely to produce many solutions that perform poorly, and the computational time spent on these is essentially wasted.
Fitness
The quality
of a creature is measured using a fitness
function, which takes a single creature’s DNA and returns a single number
determining its quality (the smaller the better [4]).
For the toy
problem in a previous section, the fitness could be defined as the distance
from a feasible solution, summed over the two inequalities. E.g., DNA (2,2)
leads to both left hand sides of the inequalities being 48 – this fails the
first constraint and satisfies the second – so the penalty for the first
constraint is 48, and 0 for the other one, producing a total fitness of 48. DNA
(1,1) produces left hand sides 9 and -1, providing fitness of 10. DNA (-1,0)
satisfies both inequalities and thus has fitness 0, being the best of the
three.
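The penalty idea can be sketched as follows; since the actual toy system of inequalities was given as a figure, the two constraints below are hypothetical stand-ins, written in the form g(x, y) ≤ 0:

```python
def penalty_fitness(dna, constraints):
    """Sum of constraint violations; fitness 0 means a feasible solution."""
    x, y = dna
    return sum(max(0.0, g(x, y)) for g in constraints)

# Hypothetical stand-in constraints of the form g(x, y) <= 0
# (the actual toy system from the text was shown as a figure).
constraints = [
    lambda x, y: x + y - 1,   # encodes x + y <= 1
    lambda x, y: -x - 2,      # encodes x >= -2
]
```

With these stand-ins, a creature (2, 2) violates the first constraint by 3 and satisfies the second, giving fitness 3, while (0, 0) satisfies both and has fitness 0.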
The
estimation of the fitness of a computer model is more complex. It may involve simulating
a cell model with the given parameters and measuring various properties or
biomarkers. This may be, for example, the amplitude of the calcium transient, with
the fitness being its similarity to experimentally observed values. Creatures
evolving according to this fitness function should eventually achieve data-like
calcium transient amplitude. In the ToR-ORd development, we used a more
comprehensive fitness, which we’ll revisit in the section about
multicriteriality.
It is good
to carry out preliminary simulations to test how many beats are needed for the
model to get reasonably close to stable behavior within the
fitness computation [5].
With too short a simulation, the optimization runs faster, but it is possible
to obtain a model which works very nicely only after the short
stimulation period, but not in the longer term. On the other hand, with too
long simulations, the optimizer can take ages to finish. When developing
ToR-ORd, I found 50 beats to be a very bare minimum, 100 beats still not overly
stable, 200 beats getting there, and 500 beats pretty stable. 150 beats were
used in the end as a compromise value.
Crossover
This
operator is one of the hallmarks of genetic algorithms. To generate new creatures
from the “parental” generation, two [6] creatures are selected, a random point in their “DNA” is chosen, and the
offspring are created by swapping the parents’ DNA around that point.
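A minimal sketch of single-point crossover on list-valued DNA (a hypothetical helper, not code from any particular toolbox):

```python
import random

def single_point_crossover(parent1, parent2):
    """Create two offspring by swapping the parents' DNA around a random point."""
    assert len(parent1) == len(parent2)
    point = random.randint(1, len(parent1) - 1)  # cut strictly inside the DNA
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```

For parents [1, 1, 1, 1] and [9, 9, 9, 9] and crossover point 2, this yields [1, 1, 9, 9] and [9, 9, 1, 1].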
The
rationale behind crossover is that different parts of the DNA may encode
different subproblems, and by combining a creature that does one thing well
with a creature that’s good at something else, we may get a creature good at
both things. A concrete example with ToR-ORd could be e.g., calcium transient
(which depends mainly on ICaL, SERCA, and NCX) versus reaction to IK1 block (which is mainly about the background currents during diastole) – as
these two phenomena are pretty much independent, relying on different parts of
DNA [7],
they may be combined well to produce creatures good at both things at once. Out
of curiosity, I tried to see how the genetic algorithm works for model
development when the crossover is turned off – and it converged much more slowly,
strongly suggesting that for the task of model fitting, crossover is very
useful.
Sidenote: Implementing
crossover is straightforward when the DNA is a simple list of numbers that
don’t have any particular dependence on the rest. However, this may not always
be the case – for example, if a creature is a permutation encoding a candidate
solution to the travelling salesman problem, a direct crossover would most
likely generate creatures that are not valid permutations. For such cases, more
refined and task-specific crossover operators have to be created (there is plenty of
literature about that, e.g., on partially mapped crossover, edge recombination
crossover, etc.).
Mutation
The
mutation operator receives each of the offspring produced by the crossover and
applies a random mutation to each position with a given probability. When a
mutation happens at a particular position, the value stored there is changed. A
popular type of mutation is a Gaussian one, where a normally distributed number
(zero-mean normal distribution) is added to the position.
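A sketch of such a Gaussian mutation operator; the per-position mutation probability and standard deviation below are illustrative values, not the ones used in any particular study:

```python
import random

def gaussian_mutation(dna, p_mut=0.1, sigma=0.1):
    """With probability p_mut per position, add zero-mean Gaussian noise."""
    return [gene + random.gauss(0.0, sigma) if random.random() < p_mut
            else gene
            for gene in dna]
```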
There is
always the question of how to select the mutation probability. With a low
probability of mutation, truly new creatures are generated slowly, making the
exploration of the solution space slow. On the other hand, with a very high mutation
probability, the exploration is fast (many new models are created), however,
when a good model is found, it may be lost (by further mutations or selection)
before it manages to reproduce. Beyond selecting one concrete value of mutation
probability, this can be also selected in a time-dependent way (e.g., having
high probability of mutation initially, reducing it with subsequent
generations, acting similarly to simulated annealing) or dynamically (e.g., when
the fitness stops improving, increase mutation probability to help generate new
creatures).
An
important point linked to mutations as well as to how exactly creatures encode
a cardiac model is the problem that arises when multipliers (e.g., current
multipliers) form the creature DNA (the solution is briefly discussed in our
paper). Consider the probability density function of a current multiplier
of 1 after a zero-mean Gaussian mutation is applied: it is “symmetrical” in the
sense that the probability density at 0.5
is the same as at 1.5.
However,
when multipliers are stored in the DNA, this constitutes a profoundly asymmetrical
mutation. All possible reductions in current density are
squeezed into the interval 0–1 (we don’t consider negative channel conductance
multipliers), but all the increases in current density range from 1 to
infinity. E.g., the symmetrical case to a current multiplier of 0.5 is 2 (halving
vs doubling); however, the probability density at 2 is noticeably smaller than at 0.5. This
problem is relevant both to mutations and to the generation of the initial population (a
population “symmetrically” generated around a current multiplier of 1, such as
25–175%, will be biased towards current reduction).
The
solution is, similarly to the problem of how a group of Ewoks can destroy an
AT-ST, a log. If we represent the creature using DNA consisting of logarithms
of multipliers (i.e., no change in current is encoded by 0), a Gaussian
mutation becomes symmetrical. This follows from the relationship log(1/x) = −log(x).
E.g., for x = 0.5,
this says that the logarithms of 0.5 and of 2 differ only in their sign – their
distance from 0 (no change in current multiplier) is the same, so the Gaussian probability
density function of the mutation has the same value at both points, giving the
desired symmetry.
The log-transform
then of course requires exponentiating the DNA within the fitness
function to recover valid current multipliers.
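A sketch of the log-space mutation (a hypothetical helper; the σ value is illustrative). Mutating the logarithm and exponentiating back makes halving and doubling of a current equally likely:

```python
import math
import random

def mutate_multiplier(multiplier, sigma=0.7):
    """Apply a zero-mean Gaussian mutation in log space and map back."""
    log_value = math.log(multiplier)        # DNA stores log(multiplier)
    log_value += random.gauss(0.0, sigma)   # symmetric Gaussian mutation
    return math.exp(log_value)              # exponentiate for the model
```

Starting from a multiplier of 1, the fraction of mutations that at least halve the current now matches the fraction that at least double it, and the result can never go negative.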
Selection
Once we
have generated new creatures and perturbed them via mutations, it is time to
carry out selection to create a new generation. Typically, the crossover is used
to generate twice as many creatures as the size of the original population,
with the selection removing 50% to get back to the original population size.
The
question is how to pick the 50% of creatures that survive into the following
generation. An immediately appealing answer is “take the creatures with
top-50% fitness”. Similarly to many immediately appealing answers, this doesn’t
work very well in practice. In a typical GA, high-fitness creatures are
overrepresented in the population anyway (i.e., the population typically
contains a large amount of near-identical high-fitness creatures).
Consequently, always taking top 50% creatures leads to a rapid loss of
diversity and the GA tends to get stuck in a local optimum.
Two popular
alternatives are tournament and roulette. In a simple tournament,
creatures in the pre-selection population are paired, and the creature with
higher fitness proceeds to the next generation with a given probability (something
like 0.6-0.8). In roulette, an imaginary roulette wheel is formed, where each
creature has a bin whose width is proportional to its quality (better creatures
have a wider bin; with a minimized fitness, this means using, e.g., inverse or
rank-based weights). Then, a point is chosen randomly on the roulette wheel and
the creature in whose bin it falls is selected for the next population. This
is repeated until the new population is fully filled. Both these approaches thus
show a preference towards picking high-fitness creatures, but this preference
is not exclusive – consequently, the GA better preserves population diversity,
which is beneficial.
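A sketch of the tournament variant under this article’s smaller-is-better fitness convention (the winning probability of 0.7 is illustrative):

```python
import random

def tournament_select(population, n_survivors, fitness, p_win=0.7):
    """Pairwise tournaments: the fitter (smaller-fitness) creature wins
    with probability p_win; otherwise the weaker one survives."""
    survivors = []
    while len(survivors) < n_survivors:
        a, b = random.sample(population, 2)
        better, worse = (a, b) if fitness(a) <= fitness(b) else (b, a)
        survivors.append(better if random.random() < p_win else worse)
    return survivors
```

Because the preference for fitter creatures is probabilistic rather than absolute, some weaker creatures survive, which helps preserve diversity.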
That said,
things are not black and white, and within yin is yang and vice versa [8]. What if our problem has the property that really good solutions are
surrounded by poor or even unfeasible solutions? If we painstakingly find a
great solution after many generations and then lose it because it got unlucky
in a tournament/roulette, it’s not great, as finding it may take aeons again.
To counter this, elitism is often
employed – a modification of the selection process which ensures that the top x%
of creatures (x is usually small, e.g., 1-5) are always passed into the next
generation. Using this, the best fitness in the population can never worsen throughout a GA run.
Multicriteriality
Above, we
focused on the basic genetic algorithm, which has a single fitness value for
each creature. However, what if our problem is intrinsically multicriterial?
E.g., we may want to optimize the shape of an action potential at the same time
as the shape of a calcium transient. One option is to compute both qualities
and then sum them with given weights to produce a single-number fitness. This
tends to work fine, although it may require some work to find appropriate
weighting. An alternative approach is to use a multicriterial/multiobjective GA
(MGA) instead. We can formulate a fitness function so that it returns two or
more fitness values and have the GA work with that. In such a case, the notion
of “global optimality” tends to lose sense and is replaced by
“Pareto-optimality”. We say that a creature dominates
another creature if it is at least as good in all the considered fitness
dimensions and strictly better in at least one. A creature is Pareto-optimal if it is
not dominated by any other creature in a population. The collection of all Pareto-optimal
creatures is termed a Pareto front. This
can be visualized with a toy problem of how different cars
perform on a rally track versus a Formula 1 track.
I.e., some
cars can be very good at one type of track, can be fairly good at both at once,
or can just be universally slow on both. The non-dominated cars form the
Pareto front. In an MGA, the whole Pareto front is optimized towards, rather
than a single value. This has an important advantage, and that is maintaining
population diversity. Most population-based techniques (GA, Particle swarm
optimization, etc.) tend to suffer from progressive loss of diversity in the
evolved populations, which is due to the fact that all creatures optimize
towards a single fitness value. Conversely, in an MGA, there may be creatures
“focusing” on being excellent in the first criterion, some other creatures on
the second one, and some creatures focusing on having a good balance of both
criteria, while not being excellent in either. This produces a great pool of
population information, which may be leveraged using the genetic operators.
Interestingly, having a multicriterial fitness does not really complicate the
GA very much. Fitness is used only in the process of selection, and e.g. the
tournament can work similarly to the single-criterial fitness. A creature which
dominates another may be selected for the next generation with a probability of
0.7, and if the creatures don’t dominate each other, it’s 50:50. Likewise, roulette
can be easily adapted to 2D (or more-D), by forming a 2D grid where creatures
correspond to rectangles (the better a creature is in a criterion, the longer
the side-length in that dimension). A point in the 2D space is then picked, and
the corresponding rectangle within which it falls is selected.
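The dominance relation and Pareto-front extraction can be sketched as follows (smaller fitness is better in each dimension):

```python
def dominates(fit_a, fit_b):
    """fit_a dominates fit_b if it is at least as good (smaller) in every
    dimension and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fit_a, fit_b))
            and any(a < b for a, b in zip(fit_a, fit_b)))

def pareto_front(fitnesses):
    """Indices of creatures not dominated by any other creature."""
    return [i for i, fi in enumerate(fitnesses)
            if not any(dominates(fj, fi)
                       for j, fj in enumerate(fitnesses) if j != i)]
```

E.g., for fitness pairs [(1, 5), (2, 2), (5, 1), (3, 3)], the first three creatures form the Pareto front, while (3, 3) is dominated by (2, 2).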
An
important question is how many dimensions to choose, particularly if your task
has many different criteria (e.g., model calibration to 10 features). In one
extreme, each criterion corresponds to a single fitness dimension. In the other
extreme, a weighted sum is used to aggregate all criteria into a single fitness
number. Then there are intermediate cases, where a given number of criteria are
aggregated into a lower number of fitness dimensions. I did some preliminary
experimentation, and for the ToR-ORd development (a population of 2500
creatures, 30 generations), 2-dimensional fitness worked best by far, using
weighted summation to fit the considered criteria into two fitness dimensions [9].
For 3 and more dimensions, I got many solutions that just were not great. While
the points on the Pareto front are formally not comparable with regards to
what’s better or worse, it is often the case that points at the extremes (very
good in one criterion, while very bad in the other) are less useful, and one
always looks for solutions that don’t suck in any criterion (e.g., there is not
much point having a little better shape of action potential, if it means that
the calcium transient amplitude changes from 400 nM to 4000 nM). Unfortunately,
when a 3-dimensional fitness was used, it happened all too often that a point
on the Pareto front would be very bad in one or two criteria – consequently,
all these many points were evolving through the generations, but they were not
doing anything overly useful. Only fairly few creatures in the
population had sensible properties in all three fitness dimensions, but
because the population was spread over a Pareto front with many
elements (most of them useless), these few overall-good creatures didn’t have
their surroundings explored quickly enough, and by enough creatures, to achieve good
performance of the MGA. That said, the appropriate number of dimensions will be
very specific to the task and population size – the greater the population, the
more dimensions can be used with good success.
Summary: why MGAs work for
model development
- The crossover operator allows sharing partial solutions to the problem of finding a good model.
- An MGA implicitly preserves population diversity.
- MGAs are easy to parallelize/distribute. Given that fitness computation (which involves simulating the model) is the main bottleneck, the overhead of parallelism is not as much of a problem as in other areas.
I do not
claim that MGA is the only approach with such advantages, nor that it is
globally the best approach (I tried other methods too, of course, but one
cannot test them all), only that it appeared to work well.
Supplementary subsection:
Lamarckism vs. Darwinism
One
interesting concept in the GA selection is Lamarckism versus Darwinism, mapping
to the competing evolutionary theories of Lamarck and Darwin. While Darwin
believed that evolution is driven primarily by natural selection, Lamarck
proposed that physical traits obtained during one’s life can be passed to the
offspring (i.e. the phenotype can affect the genotype in today’s language). The
different approaches are often illustrated using a blacksmith, and why sons of
blacksmiths are stronger than sons of weavers. In the Darwinist framework, this
can be explained by blacksmiths being recruited from people who are strong.
Given that the strength is driven by a particular genotype, the babies of
blacksmiths have a higher probability of being strong again, because the
underlying genotype gets passed to offspring. However, in the Lamarckist
framework, blacksmith’s babies are strong because their fathers gained high
strength and such gained traits can be passed to the offspring [10] (i.e., the effect is fairly direct, rather than mediated by underlying
factors). While Darwinism is now the dominant theory of evolution and there is
very little evidence of Lamarckism taking place, that does not mean that it
cannot be used in GAs. An example of a Lamarckist approach is local search,
which can be implemented as follows: when evaluating the fitness of a creature,
the creature is replaced by its fittest version in its vicinity within the
space of parameters (sort of like a blacksmith “knowing” he or she will be more
successful when strong, and thus becoming such). I.e., similar but
not-identical creatures are generated for one considered creature, and the best
one is picked, along with its fitness. The advantage is that the locally best
version of each creature is chosen, likely having higher fitness than if the
local search was not used. On the other hand, this usually requires many
more evaluations of fitness of creatures generated by the local search. I do
not think it is practical for development of cardiac models, as the fitness
computation is the main bottleneck. Such Lamarckist local search is of utility
predominantly in cases where there are many unfeasible solutions in the
parameter space. In such a case, crossover/mutations are quite likely to take
good creatures and convert them to useless unfeasible ones, which is not great.
A local search around the produced offspring can act as a repair operator,
having a greater chance of finding a feasible (and even good) solution.
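A minimal sketch of such a Lamarckist local-search step (the neighborhood size and noise level are illustrative values):

```python
import random

def lamarckian_local_search(creature, fitness, n_neighbors=10, sigma=0.05):
    """Replace a creature by the best of several nearby variants in
    parameter space. The original creature competes as well, so the
    returned fitness can never be worse than the creature's own."""
    candidates = [creature] + [
        [gene + random.gauss(0.0, sigma) for gene in creature]
        for _ in range(n_neighbors)
    ]
    best = min(candidates, key=fitness)
    return best, fitness(best)
```

Note the cost: each call spends n_neighbors extra fitness evaluations, which is why this is impractical when simulating the model dominates the runtime.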
[1] See below a
comment in the section on mutations, explaining why a logarithm of a multiplier
is probably a better choice.
[2] At the same
time, there are also other options. One thing that I did not try, but which is
nevertheless potentially useful, would be having an integer number encoding
which of multiple possible formulations of an ionic current is used – the
evolution would then pick the most suitable current formulation implicitly.
[3] Indeed, these two points are in direct
conflict. What I think is a sensible strategy is to guess the minimum number of
parameters you think are important, try running the genetic algorithm, and if
it does not produce anything useful, try expanding the list of parameters
further.
[4] This is a general
convention in optimization – the fitness usually corresponds to an error
function, which is natural to minimize. We could call fitness “unfitness”
instead, but given that the convention for GA is the term “fitness”, this would
be potentially even more confusing.
[5] Also keep
in mind there is no single model at this stage – the genetic algorithm will
generate many different versions of the same model, whose stable states lie at
different distances from the initial state (which is usually determined in
advance). This is also why it does not work to prepace the control
model for aeons to get to a stable-ish state and then just simulate one beat in
the genetic algorithm – the control model is already at the stable-ish state,
but its perturbations are not.
[6] The approach may be also scaled to more creatures, using multiple
crossover points. This is then termed gangbang
crossover, which was, I believe, coined by Dr. Roman Neruda.
[7] Given how crossover works (pick a point in the DNA and swap around it), it may be useful to organize your
creature’s DNA so that related concepts/current multipliers are
adjacent to each other. A computer model may be thought of as consisting of modules associated with depolarization
(INa, INaL, ICaL, Ito), repolarization
(IKr, IKs, IK1), exchangers (INaCa,
INaK), background currents (INab, ICab,…), and
SR calcium handling (Jrel, Jup). For the creatures to be
able to share the complete modules efficiently, it’s good when the elements of
each module are adjacent.
[8] Let us
conveniently ignore the fact that yin and yang are depicted in black and white.
[9] One
dimension measured AP morphology versus data, the second one aggregated calcium
transient properties, IK1 block behavior, sodium block and contractility, and
L-type calcium current block and APD shortening.
[10] It’s worth
noting that while there is no obvious mechanism for achieving this given
today’s knowledge of genetics, both Lamarckism and Darwinism came well before
the discovery of genes, and without the knowledge of mechanisms of
heritability, it was not as clear which theory was more accurate.