A computationalist view of selection
A high-level review of some parallels between natural selection and gradient descent
Gradient descent, in particular stochastic gradient descent, can be thought of as analogous to natural selection in the context of evolution. Given a dataset as large as genome-scale data across multiple species, we would not want to use plain, full-batch gradient descent: the cost of every iteration scales with the size of the entire dataset, since the gradient must be computed over all of it, and the parameter space needed to encapsulate all of the emergent genetic diversity would be enormous. One can see how a stochastic approach, which estimates the gradient from a small random sample at each step, would be a more efficient means for nature to achieve selection, and efficiency is highly valuable when the energy budget of an ecological system is finite.
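To make the efficiency point concrete, here is a minimal sketch in Python; the dataset, sizes, and learning rate are all invented, with a random matrix standing in for genome-scale data. Full-batch gradient descent touches every row on every step, while the stochastic variant samples a small mini-batch, so its per-step cost stays flat as the data grows.

```python
import numpy as np

# Toy linear model fit by squared error; everything here is invented for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))                     # stand-in for a huge dataset
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=100_000)

def grad(w, Xb, yb):
    """Gradient of mean squared error on the batch (Xb, yb)."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

w, lr = np.zeros(10), 0.1

# Full-batch gradient descent: one step costs O(dataset size).
w = w - lr * grad(w, X, y)

# Stochastic (mini-batch) gradient descent: one step costs O(batch size).
batch = rng.choice(len(X), size=32, replace=False)
w = w - lr * grad(w, X[batch], y[batch])
```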
We should probably define the key terms for those unfamiliar. Gradient descent is a mathematical optimization method that iteratively steps the parameters along the negative gradient of an objective function until it converges to a local minimum of that (often highly non-convex) function, indicating, in most cases, that an optimal theta, the set of weights being searched for, has been closely approximated.
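Since the update rule is easier to show than to describe in prose, here is a minimal sketch of plain gradient descent in Python, on a made-up one-dimensional objective:

```python
# Gradient descent on f(theta) = (theta - 3)^2, whose minimum sits at theta = 3.
# The objective, starting point, and learning rate are invented for illustration.

def f_grad(theta):
    return 2 * (theta - 3)  # derivative of (theta - 3)^2

theta = 0.0   # initial guess at the weights being searched for
lr = 0.1      # learning rate (more on this hyperparameter below)

for _ in range(100):
    theta = theta - lr * f_grad(theta)  # step along the negative gradient

print(theta)  # ~3.0: the optimal theta has been closely approximated
```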
In the reinforcement learning domain, the same machinery is used to maximize an objective framed as reward/utility, which again translates into nature optimizing for the set of traits that yields a higher relative fitness. In this context, that entails a reward function with output contained in {0,1}: the selection "algorithm" acts as a critic, with 1 signaling positive reward that feeds into the relative fitness score. This search occurs over a mostly continuous space, a prerequisite for the differentiability that gradient methods rely on.

Generally, I would have liked to define gradient descent at a deeper, more technical level, but since Substack doesn't seem to support any sort of mathematical notation, that can't really work here. This post is also not concerned with low-level mathematical trivialities so much as with the high-level parallels that can be found between these seemingly unrelated disciplines. It is more of a "thought-jerker" than an academic paper that would warrant deeper, often more tedious, conceptual breakdowns. I could attach screenshots of the mathematical explanations, but that doesn't seem like a good option either; I might as well just attach a link for those interested in digging further.
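Back to the reward framing, here is what that {0,1} critic might look like as a minimal sketch; the trait values, threshold, and population are invented purely for illustration:

```python
import random

def reward(trait_value, env_threshold=0.5):
    """Toy critic: 1 if a trait clears the environmental bar, else 0."""
    return 1 if trait_value > env_threshold else 0

random.seed(0)
population = [random.random() for _ in range(10)]   # hypothetical trait values
signals = [reward(t) for t in population]

# A crude relative fitness score: the fraction of individuals rewarded.
print(sum(signals) / len(signals))
```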
One can find parallels in natural selection, which functions by traversing a high-dimensional, continuous space of trait/phenotype expressions in nature, much as an optimizer traverses the highly non-convex landscape of an objective function, here one whose optimum corresponds to the best relative fitness score given the environment. As nature's optimization algorithm, it works to maximize that objective, which in this context is analogous to a greater relative fitness score. Given the continuous nature of evolution, the optima themselves would be continually shifting, subject to a myriad of extraneous variables, iteratively, for millennia, as has been the case since the Precambrian. This idea can be broken down into varying levels of granularity, following the hierarchical organization of life: a bottom-up approach views it at the level of the molecular make-up of nucleotides in DNA; the DNA sequences themselves; and the prevalence of favorable alleles in the gene pool.
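That moving-optimum dynamic is easy to sketch: below, the optimizer chases the minimum of a toy quadratic while the target, standing in for the environment, drifts a little each generation. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, target, lr = 0.0, 3.0, 0.1

for generation in range(1_000):
    theta -= lr * 2 * (theta - target)   # selection step toward the current optimum
    target += rng.normal(scale=0.01)     # the environment drifts, moving the optimum

print(theta, target)  # theta tracks the moving target rather than settling for good
```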
Given the contextual framework laid out so far, one can see how genetic diversity comes about as an emergent property of optimizing this objective function. Evolutionary forces such as drift, mutation, and recombination keep supplying variation, and what persists of that variation reflects the convergent pressure on allele and gene "fit" in a given population.
Learning rate as a precursor to allele/gene fixation in a population
The learning rate, in the context of machine learning, is a primary hyperparameter that determines the extent to which the weights are changed at each update, given the discrepancy between ground truth and model output that stochastic gradient descent is minimizing. In other words, the learning rate determines how far to step in the direction the gradient suggests. One can see how this parallels allele/gene fixation in evolutionary biology: one could posit that, in this analogy, the counterpart of the learning rate is what governs how quickly alleles/genes fix in a population, hopefully conferring advantageous traits.
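One way to ground the analogy: in population genetics, the per-generation change in an allele's frequency p under weak selection is approximately s * p * (1 - p), where the selection coefficient s plays much the same role as a learning rate. A minimal sketch, with the starting frequency and coefficients invented for illustration:

```python
def generations_to_fixation(p, s, cutoff=0.99):
    """Generations until allele frequency p crosses `cutoff`, using the
    weak-selection approximation delta_p ~= s * p * (1 - p)."""
    gens = 0
    while p < cutoff:
        p += s * p * (1 - p)   # larger s means bigger steps, like a larger learning rate
        gens += 1
    return gens

print(generations_to_fixation(p=0.01, s=0.01))   # small "learning rate": slow fixation
print(generations_to_fixation(p=0.01, s=0.10))   # tenfold larger: far fewer generations
```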
Evolutionary biologists are generally divided regarding the mode and tempo of evolutionary change over time. The gradualist school of thought posits that morphological evolutionary change, on the scale of species and populations, occurs in a "slow and steady" manner, meaning that it takes a long stretch of time for morphological changes due to selection and other evolutionary forces to be expressed in any given population. Drawing the parallel to the computationalist view, this looks like a low learning rate, which tends to require many more training epochs before convergence; a higher number of epochs entails a long wait for fixation in the population. The punctuated equilibria perspective instead posits long periods of stasis punctuated by brief periods of rapid change. When the learning rate is larger or well tuned, and/or the weights are already closely approximated, one can see how such rapid change would occur over comparatively few updates.
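The tempo difference falls straight out of the toy objective from earlier; the tolerance and the two learning rates below are invented for illustration:

```python
def steps_to_converge(lr, tol=1e-6):
    """Count gradient descent steps on f(theta) = (theta - 3)^2 until
    theta is within `tol` of the optimum at 3."""
    theta, steps = 0.0, 0
    while abs(theta - 3) > tol:
        theta -= lr * 2 * (theta - 3)   # same update rule as before
        steps += 1
    return steps

print(steps_to_converge(lr=0.01))   # "gradualist" tempo: hundreds of small steps
print(steps_to_converge(lr=0.4))    # "punctuated" tempo: done in about ten steps
```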
This computationalist view of evolutionary biology is obviously based primarily on conjecture, with little empirical evidence aside from the parallels one can draw oneself given a fairly shallow understanding of both of the main concepts being addressed. As a disclaimer: please do not take any of my ideas too seriously; this is just me fleshing out my way of understanding concepts by analogizing them and attempting to ground them in a myriad of fairly unrelated disciplines. Given more time, I would like to write more on this topic, but due to prior engagements and other factors I cannot control at the moment, I simply do not have the time to dive into it in the deeper, more detailed fashion I would ideally prefer, so I suppose this shallow rendition will have to suffice for now.