Machine Learning — A historical perspective

“Historical perspective differs from history in that the object of historical perspective is to sharpen one’s vision of the present, not the past.” Barbara S. Lawrence


Figure 1: Succession of different periods of data-based predictions and ML, spanning over a period of 5000 years.

The historical perspective of ML spans from the earliest predictive models, which predate writing, to today’s cognitive models, which mimic human thought processes. ML’s past can be traced back centuries, or even further: to the origin of the syllogism in the logic of Aristotle (384 BC–322 BC), to Euclid’s work on geometry, “The Elements”, or even to the earliest known predictive models, such as those used to predict eclipses accurately thousands of years ago.


Figure 2: Illustration from the National University of Singapore (NUS) showing markers around the Earth (marker), with the ‘Moon-marker’ as well as the ‘Sun-marker’. The Moon-marker, when displaced each day, can accurately predict the dates of lunations in advance. (PREDICTING ECLIPSES WITH THE STONEHENGE, NUS)

3000 BC–700 AD

Stonehenge was originally considered an ancient observatory for marking midsummer and midwinter. However, more recent theories suggest it was used to predict solar and lunar eclipses year-round [4]. The system relies on the optical illumination of configurations of stones and monoliths as a mechanism for predicting lunar and solar eclipses. It works like an analog mechanical computing device, and a wide variety of mechanical devices have been used for analog computing since then. [5]

The other ancient analog computer, known as the Antikythera Mechanism, has been dated to about 205 BC, close to the time of Archimedes. Cicero later wrote that Archimedes may have built a machine that resembles the Antikythera Mechanism. The mechanism was designed to predict eclipses accurately by combining ancient Babylonian and Greek mathematical geometry applied to astronomy. A recent re-examination of the data found that the Antikythera Mechanism can predict the time of an eclipse to the hour [6].

These earliest computing machines were analog rather than digital, because the numerical quantities they capture are “continuous”, such as the continuous measurement of the angle of rotation of a shaft. [7] Analog computers, in contrast to digital computers, do not break data down into binary code. They use physical phenomena such as optical illumination or mechanical or hydraulic quantities to model the problem being solved. They are operated by moving objects or by turning a hand crank that triggers the movement of other gears.

The Antikythera Mechanism is an ancient and sophisticated analog computer, and many of its functions are still unknown. However, researchers believe that it could be “programmed” once and would then be able to calculate moon phases and eclipse cycles. Recent findings provide evidence that it may have been used for prediction through analog calculation of arithmetic progression cycles.

Figure 3: Illustration of the design of the Antikythera Mechanism [8], which operates by turning a hand crank that triggers the movement of all of the gears.

Although today’s microprocessors, such as CPUs, GPUs and soon QPUs, are taking over from mechanical computers, analog computers will continue to be used in various emerging technologies, even at the micro-scale. They will be used in areas where combining mechanical functions with computational and control functions is not feasible through purely electronic processing. [9]

700–1500 AD

Al-Khwarizmi made practical arguments for developing new mathematical methods. These methods solved quadratic equations by the process of completing the square. Al-Khwarizmi [12] used text and geometry to solve quadratic equations of the form x² + 10x = 39.

Figure 4: Geometric solutions of a quadratic equation elaborated by Al-Khwarizmi.

Al-Khwarizmi worked through numerous cases stated in words rather than symbols. The equation x² + 10x = 39, for example, is stated as “A square and ten Roots are equal to thirty nine Dirhams.” [13]
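The “square and ten roots” case can be followed step by step. The short Python sketch below is only a modern illustration of completing the square, not Al-Khwarizmi’s own procedure or notation; the function name complete_the_square and its parameters b and c are assumptions introduced here, and it handles only the positive root of x² + bx = c with positive b and c.

```python
import math


def complete_the_square(b: float, c: float) -> float:
    """Return the positive root of x^2 + b*x = c (illustrative helper, b and c > 0)."""
    half = b / 2
    # Adding (b/2)^2 to both sides turns the left side into the square (x + b/2)^2.
    completed = c + half ** 2
    # Take the side of the completed square, then remove the b/2 that was added.
    return math.sqrt(completed) - half


root = complete_the_square(10, 39)
print(root)  # 3.0, since 39 + 25 = 64 and sqrt(64) - 5 = 3
```

Geometrically, the 25 added to both sides is the small corner square that completes the figure into a larger square of side 8.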

Al-Khwarizmi’s book on algebra played a role similar to that of Euclid’s Elements for geometry. The book was later translated into Latin, and the decimal system of numbers was disseminated by Leonardo of Pisa, also known as Fibonacci, in his work Liber Abaci. [14]

The developments made during this period laid the foundations of scientific and proof theories (Peter Lynch, 2014). Ibn Sinān (d. 946) applied the mathematical proof approach to produce concise geometrical constructions, such as finding the area of a segment of a parabola.

1500–1900 AD

Figure 5: Front Panel of Pascal Calculator (Paul E. Dunne) — Mechanical Aids to Computation and the Development of Algorithms. [16]

Mathematical algorithms were introduced and further developed to carry out tasks translated into mechanical analogues. This period also saw the development of methods and machines for calculation, not only for astronomy but also for economics and insurance, such as the early life insurance model introduced by Edmond Halley (1656–1742). [17]

Continued interest in developing machines for calculation led to the appearance of the first digital calculating machine as early as 1642, when Blaise Pascal invented his mechanical calculator. This was followed by Gottfried Leibniz’s work on the binary numeral system and on a universal calculus of reasoning (an alphabet of human thought) by which arguments could be decided mechanically. [18]

In terms of techniques for solving data problems, Bayes’ theorem was published in 1763. As a follow-up, Pierre-Simon Laplace published the “Théorie Analytique des Probabilités” in 1812, expanding on Bayes’ theorem. This was followed by the invention of Boolean algebra by George Boole in 1854. Another breakthrough had occurred in 1805 with the discovery of the least squares method, now widely used in data fitting. Adrien-Marie Legendre developed the approach under the French title “Méthode des Moindres Carrés”. [19]

The great advances in mathematical astronomy that continued during the early years of the nineteenth century were due in no small part to the development of the method of least squares. Legendre (1752–1833) wrote on astronomy, the theory of numbers, elliptic functions, the calculus, higher geometry, mechanics and physics.

Maxwell published a paper in 1868 on speed regulation mechanisms based on feedback interaction, which led to the emergence of cybernetics under Norbert Wiener (1894–1964) [20]. Cybernetics emerged to address this very complex type of interaction within and between systems. This is relevant to Machine Learning, which is also a process for discovering, from data, the relationships (including functional relationships) between variables and ensembles of variables in these systems.

1900–1950 AD

The publication of Principia Mathematica by Bertrand Russell and Alfred North Whitehead, completed in 1913, revolutionized formal logic. The same year, Andrey Markov described techniques he used to analyse a poem, based on what are known today as Markov chains. A Markov chain, as its name implies, is a sequence or chain of possible events in which the probability of each event depends only on the state attained in the previous event. [22] The technique is also used to help identify genes in DNA and to power algorithms for voice recognition. [23]
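The defining property of a Markov chain, that the next event depends only on the current state, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two states echo Markov’s vowel and consonant analysis of text, but the transition probabilities and the names TRANSITIONS, next_state and simulate are invented here and are not Markov’s measured values.

```python
import random

# Hypothetical transition table: TRANSITIONS[current][next] is the probability
# of moving from `current` to `next`. The numbers are made up for illustration.
TRANSITIONS = {
    "vowel": {"vowel": 0.1, "consonant": 0.9},
    "consonant": {"vowel": 0.7, "consonant": 0.3},
}


def next_state(current, rng):
    """Draw the next state using only the current state (the Markov property)."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, steps, seed=0):
    """Return the start state followed by `steps` randomly drawn states."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1], rng))
    return chain


print(simulate("vowel", 10))
```

Note that next_state never looks at the earlier history of the chain, only at its last element, which is exactly what distinguishes a Markov chain from a general random sequence.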

Between the 1920s and 1930s, Ludwig Wittgenstein and Rudolf Carnap led philosophical work into the logical analysis of knowledge. Artificial neural networks began to emerge in 1943 with the work of Warren Sturgis McCulloch and Walter Pitts, published under the title “A Logical Calculus of the Ideas Immanent in Nervous Activity”. In 1947, John von Neumann launched a stored-program electronic computer, also referred to as the IAS computer of the Institute for Advanced Study in Princeton.

Figure 6: The “Arithmometre Electromagnetique” consists of five discs representing the dividend, with three cranks representing the divisor and three hands representing the divisor (Leonardo Torres y Quevedo, 1920).

1950–2000 AD

The Turing Machine refers to an abstract computer that executes an algorithm by reading and writing symbols on a tape. It is considered the first complete mathematical model of computation, combining mathematical abstraction and logic. The machine consists of a logical control that governs processing, a tape made up of a sequence of storage cells containing digital values, and a scanner for reading values from and writing values to the tape. [25]
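The three parts described above (control, tape and scanner) can be mimicked in a small Python sketch. It is a toy simulator written for illustration only: the function run_turing_machine, the rule-table layout and the INVERT example are assumptions introduced here, not a standard API or Turing’s original formulation.

```python
from collections import defaultdict


def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a rule table on a tape and return the tape contents when the run ends.

    rules maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is "L" or "R". The run ends in state "halt" or after max_steps.
    """
    # The tape: storage cells indexed by position, blank wherever nothing is written.
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0  # position of the scanner
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells[head]                         # scanner reads
        write, move, state = rules[(state, symbol)]  # logical control decides
        cells[head] = write                          # scanner writes
        head += 1 if move == "R" else -1             # scanner moves one cell
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)


# Hypothetical rule table: flip every bit, then halt at the first blank cell.
INVERT = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(INVERT, "1011"))  # 0100
```

The rule table plays the role of the logical control, the dictionary of cells plays the role of the tape, and the head index plays the role of the scanner.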

In terms of early learning techniques, and as a follow-up to McCulloch and Pitts’ work on neural networks, Marvin Minsky and Dean Edmonds built the first neural network machine with the ability to learn in 1951. The year after, in 1952, Arthur Samuel launched some of the very first machine learning programs. The term “artificial intelligence” was coined in 1956 by John McCarthy at a conference at Dartmouth College, New Hampshire. In 1957, Frank Rosenblatt invented what is known today as the perceptron, the first neural network for computers to simulate the thought processes of the human brain. [26]

Figure 7: A simplified Turing Machine consists of a tape and a tape scanner for reading and writing.

More ML-related techniques started to emerge, such as the “nearest neighbor” algorithm, written in 1967, which allowed computers to begin using very basic pattern recognition. In 1968, Wallace and Boulton developed a program for unsupervised classification (clustering) using the Bayesian Minimum Message Length criterion, a mathematical realization of Occam’s razor. In 1970, Seppo Linnainmaa developed the concept behind automatic backpropagation, also known as automatic differentiation (AD) of discrete connected networks of nested differentiable functions.

The rediscovery of backpropagation caused a resurgence in machine learning research. In 1981, Gerald DeJong introduced the concept of Explanation Based Learning (EBL), in which a computer analyses training data and creates a general rule it can follow by discarding unimportant data. [27]

From the mid-1980s, neural networks became widely used with the backpropagation algorithm. In 1981, Danny Hillis designed the Connection Machine, which used parallel computing to bring new power to AI and to computation in general. The year after, in 1982, John Hopfield developed the Hopfield network, a type of recurrent neural network that can serve as a content-addressable memory system. The year 1989 saw the emergence of Reinforcement Learning, with Christopher Watkins developing Q-learning (Chapter 7), which improved the practicality and feasibility of reinforcement learning.

The 1990s saw a remarkable shift in ML from a knowledge-driven approach to a data-driven approach. Scientists began creating programs for computers to analyse large amounts of data and draw conclusions, or “learn”, from the results. Support vector machines (SVMs) and recurrent neural networks (RNNs) became popular during this period. In 1995, Tin Kam Ho published a paper describing random decision forests (RF). In the same year, Corinna Cortes and Vladimir Vapnik published their work on support vector machines. Two years later, in 1997, Sepp Hochreiter and Jürgen Schmidhuber invented long short-term memory (LSTM) recurrent neural networks, which greatly improved the efficiency and practicality of RNNs. In 1998, the MNIST database was compiled by a team led by Yann LeCun. It consists of a mix of handwritten digits from American Census Bureau employees and American high school students. Since then, the MNIST database has become a benchmark for evaluating handwriting recognition and ML applications.

2000–Present

From 2010, several platforms for machine learning competitions were launched, involving a combination of machine learning, natural language processing and information retrieval techniques. These platforms excel at carrying out machine learning massively in parallel and at providing recommendations. In 2016, machine learning combined with tree search techniques beat humans at the game of Go. One year later, in 2017, the approach was generalized to chess and other two-player games with AlphaZero.

Another remarkable achievement is the release of the Qubit Processing Unit (QPU) quantum computer, which will drive processing power even further. In 2017, an IBM 5-qubit quantum computer was made available over the Internet to the general public as a programmable quantum computer. The half-angle bracket (ket) notation |⟩ is conventionally used to indicate qubits, as opposed to ordinary bits. [28]

Figure 8: QPU (Qubit Processing Unit)

Several companies are developing and delivering quantum computing as a service. According to the report by James A. Martin [29], there are two main groups of players in this development: the first comes from the world of classical computing, and the second consists of quantum computing start-ups.

Today, most of what is being framed as Artificial Intelligence (AI) is based on Machine Learning [30]. ML is a critical ingredient for intelligent applications and provides the opportunity to further accelerate discovery processes and enhance decision-making processes. These trends promise that every sector will be data-driven and will use machine learning in the cloud to incorporate artificial intelligence applications and, ultimately, to supplement existing analytical and decision-making tools.



[3] “Stonehenge earlier than believed”. BBC News. 9 October 2008.


[5] The Modern History of Computing, First published Mon Dec 18, 2000; substantive revision Fri Jun 9, 2006, Stanford University.

[6] Eclipse Prediction on the Ancient Greek Astronomical Calculating Machine Known as the Antikythera Mechanism — Tony Freeth / Luis M. Rocha, Editor

[7] Horsburgh …

[8] Wikipedia —

[9] John H. Reif () Mechanical Computation: its Computational Complexity and Technologies, Chapter, Encyclopedia of Complexity and Systems Science

[10] Elizabeth Bussa (2007)

[11] Luke Hodgkin (2005) A History of Mathematics From Mesopotamia to Modernity, Oxford University Press, UK

[12] Evelyne Barbin, Uffe Thomas Jankvist, and Tinne Hoff Kjeldsen (2015). History and Epistemology in Mathematics Education, Proceedings of the Seventh European Summer University ESU 7, Copenhagen, Denmark, 14–18 July 2015. Danish School of Education, Copenhagen, Denmark.

[13] David Eugene Smith (1925). History of Mathematics, Dover Publications, Inc. NY.

[14] Peter Lynch (2014) Arabic enlightenment and the emergence of algebra, The Irish Times

[15] Paul E. Dunne, Mechanical Aids to Computation and the Development of Algorithms, Mechanical Calculators prior to the 19th Century

[16] Mechanical Calculators prior to the 19th Century —

[17] Trevor J Barnes and Matthew W Wilson (2014).

[18] Bernard Marr (2016). A Short History of Machine Learning

[19] Translated from the French by Professor Henry A Ruger and Professor Helen M, Walker, Teachers College, Columbia University, New York City.

[20] Governors and Feedback Control at

[21] Short Account on Leonardo Torres’ Endless Spindle (Federico Thomas, Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens Artigues 4–6, 08028 Barcelona, Spain)


[23] Brian Hayes, First Links in the Markov Chain, American Scientist

[24] University of Washington (2006) The History of Artificial Intelligence

[25] John H. Reif () Mechanical Computation: its Computational Complexity and Technologies, Chapter, Encyclopedia of Complexity and Systems Science


[27] Bernard Marr

[28] IBM Research and the IBM QX team — 2017

[29] James A. Martin (2018). Who’s developing quantum computers? Network World

[30] Michael Jordan (2018) Artificial Intelligence — The Revolution Hasn’t Happened Yet — Medium


This article is one of the chapters of my book entitled “Machine Learning @ Work: Speeding up Discovery”.
