Machine Learning — A historical perspective
“Historical perspective differs from history in that the object of historical perspective is to sharpen one’s vision of the present, not the past.” Barbara S. Lawrence
Studying the history of ML can sharpen our understanding of its present and help shape its future. According to Barbara Lawrence: “Historical perspective expands research horizons by encouraging study of the relative stability of phenomena, providing alternative explanations for phenomena, and aiding problem formulation and research design.”
The historical perspective of ML spans from the earliest predictive models, which predate writing, to today’s cognitive models, which mimic human thought processes. ML’s past can be traced back centuries, or even further: to the origin of the syllogism in the logic of Aristotle (384–322 BC), to Euclid’s work on geometry, “The Elements”, or to the earliest known predictive models, such as those used to predict eclipses accurately thousands of years ago.
3000 BC – 700 AD
Accurate predictions of solar and lunar eclipses can be traced back to 3000 BC, 500 years earlier than originally reported, based on a re-examination of the Stonehenge relics in recent years. Stonehenge consists of 56 Welsh bluestones standing in a ring of 87 m.
Originally, Stonehenge was considered an ancient observatory for marking midsummer and midwinter. However, more recent theories suggest that it was used to predict solar and lunar eclipses year-round. The ancient Stonehenge system relies on the way light falls on configurations of stones and monoliths as a mechanism for predicting lunar and solar eclipses. It works like an analog computing device, and a wide variety of mechanical devices have been used for analog computing since then.
Another ancient analog computer, known as the Antikythera Mechanism, has been dated to about 205 BC, close to the time of Archimedes. Cicero later wrote that Archimedes may have built a machine that resembles the Antikythera Mechanism. The mechanism was designed to predict eclipses accurately by combining ancient Babylonian and Greek geometry applied to astronomy. A recent re-examination of the data found that the Antikythera Mechanism could predict the time of an eclipse to the hour.
These earliest computing machines were analog rather than digital, because the numerical quantities they capture are “continuous”, such as the angle of rotation of a shaft. Analog computers, in contrast to digital computers, do not break data down into binary code. They use physical phenomena such as optical illumination or mechanical or hydraulic quantities to model the problem being solved. They are operated by moving objects or turning a hand crank, which in turn drives the movement of other gears.
Many functions of the Antikythera Mechanism, an ancient and sophisticated analog computer, are still unknown. However, researchers believe that it could be “programmed” once and would then be able to calculate moon phases and eclipse cycles. Recent findings provide evidence that it may have been used for prediction through analog calculation of arithmetic progression cycles.
Although today’s microprocessors, such as the CPU, the GPU and soon the QPU, are taking over from mechanical computers, analog computers will continue to be used in various emerging technologies, even at the micro-scale. They will be used in areas where combining mechanical functions with computational and control functions is not feasible through purely electronic processing.
This period saw major breakthroughs in mathematics, especially in computing the roots of difficult equations using algebra, a word that means restoring or balancing equations. Al-Khwarizmi (780–860) treated algebra as a way to describe algorithms in mathematical terms without the need for Euclidean geometrical proof, and wrote a text on solving geometric problems. This process of using algebra to solve geometrical problems is known as an algorithm, a word derived, in slightly altered form, from his name.
Al-Khwarizmi made practical arguments for developing these new mathematical methods, which solve quadratic equations by the process of completing the square. He used text and geometry, rather than symbols, to solve quadratic equations of the form:
x² + 10x = 39
Al-Khwarizmi worked through numerous cases in words, such as the equation above, which he stated as “A square and ten Roots are equal to thirty nine Dirhams.”
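The process of completing the square for “a square and ten roots equal thirty-nine” can be sketched in a few lines of Python. This is only a modern illustration in symbolic form, not Al-Khwarizmi’s own presentation (which was verbal and geometric); the function name and its parameters b and c are assumptions introduced here, and only the positive root is returned, as in the classical treatment.

```python
import math

def solve_by_completing_square(b, c):
    """Positive root of x^2 + b*x = c (with b, c > 0) by completing the square."""
    half_b = b / 2
    # Adding (b/2)^2 to both sides turns the left side into (x + b/2)^2.
    completed = c + half_b ** 2
    # Take the positive square root, then subtract b/2 to isolate x.
    return math.sqrt(completed) - half_b

# "A square and ten roots are equal to thirty-nine": x^2 + 10x = 39
root = solve_by_completing_square(10, 39)
print(root)  # 3.0
```

Here 39 + 25 = 64, the square root of 64 is 8, and 8 − 5 = 3, which satisfies 3² + 10·3 = 39.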
Al-Khwarizmi’s book on algebra played a role similar to that of Euclid’s “Elements” for geometry. The book was later translated into Latin, and the decimal number system was disseminated by Leonardo of Pisa, also known as Fibonacci, in his work Liber Abaci.
The developments made during this period laid the foundations of scientific and proof theories (Peter Lynch, 2014). Ibn Sinān (946) applied the mathematical proof approach to produce concise geometrical constructions, such as finding the area of a segment of a parabola.
The advances made in mathematics during the previous period, along with the spectacular growth of mathematical understanding during the 16th and 17th centuries, led to breakthroughs in both mathematical methods and machines for simplifying calculations, particularly those related to astronomy. The advances made during these two successive periods are considered of lasting importance.
Mathematical algorithms were developed further and translated into mechanical analogues that could carry out tasks. This period also saw the development of methods and machines for calculation applied not only to astronomy but also to economics and insurance, such as the early life insurance model introduced by Edmond Halley (1656–1742).
Continued interest in machines for calculation led to the appearance of the first digital calculating machine as early as 1642, when Blaise Pascal invented his mechanical calculator. Gottfried Leibniz followed, working on his own calculating machine from around 1672 and developing the binary numeral system as part of a universal calculus of reasoning (an alphabet of human thought) by which arguments could be decided mechanically.
In terms of techniques for solving data problems, Bayes’ Theorem was published in 1763. Pierre-Simon Laplace followed up in 1812 with the “Théorie Analytique des Probabilités”, which expanded Bayes’ Theorem. George Boole then invented Boolean algebra in 1854. Another breakthrough had occurred in 1805 with the discovery of the least squares method, now widely used in data fitting. The approach was published by Adrien-Marie Legendre under the French title “Méthode des Moindres Carrés”.
The great advances in mathematical astronomy that continued during the early years of the nineteenth century were due in no small part to the development of the method of least squares. Legendre (1752–1833) wrote on astronomy, the theory of numbers, elliptic functions, the calculus, higher geometry, mechanics and physics.
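As an illustration of least squares in data fitting, the sketch below fits a straight line y = slope·x + intercept by minimizing the sum of squared vertical errors, using the standard closed-form solution. The function name and the sample data are assumptions made for this example; it is not a reconstruction of Legendre’s original calculations.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

With noisy observations the same formulas return the line that makes the total squared error as small as possible.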
In 1868, Maxwell published a paper on a speed regulation mechanism based on feedback interaction, which led to the emergence of cybernetics with Norbert Wiener (1894–1964). Cybernetics emerged to address this very complex type of interaction or relationship within and between systems. This is relevant to Machine Learning, which is also a process of discovering, from data and through diverse mechanisms, the relationships (including functional relationships) between variables and ensembles of variables in these systems.
The beginning of the 1900s saw the emergence of machine automation, credited to Leonardo Torres y Quevedo (1852–1936), who invented a remarkable chess-playing robot named “El Ajedrecista”, Spanish for “The Chess Player”. Torres Quevedo built several other analogue calculating machines capable of automatically solving algebraic equations of up to eight terms and obtaining their roots with a precision of one thousandth (1/1000). The key element of Torres’ machine was the endless spindle, an analog mechanical device designed to compute log(a+b) from log(a) and log(b).
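The relationship the endless spindle embodies can be written numerically using the identity log(a+b) = log(a) + log(1 + b/a). The sketch below only shows that arithmetic identity, assuming base-10 logarithms; it says nothing about how the mechanical device itself realizes it, and the function name is introduced here for illustration.

```python
import math

def log_sum(log_a, log_b):
    """Return log10(a + b) given only log10(a) and log10(b)."""
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    # a + b = a * (1 + b/a), and b/a = 10 ** (log(b) - log(a)).
    return hi + math.log10(1 + 10 ** (lo - hi))

# Check against a direct computation for a = 2, b = 3.
result = log_sum(math.log10(2), math.log10(3))
print(math.isclose(result, math.log10(5)))  # True
```

Working from the larger logarithm keeps the exponent non-positive, which avoids overflow when the two inputs differ greatly in size.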
The publication of Principia Mathematica by Bertrand Russell and Alfred North Whitehead, the final volume of which appeared in 1913, revolutionized formal logic. The same year, Andrey Markov described the techniques he used to analyse a poem. The technique, known today as a Markov chain, is, as its name implies, a sequence or chain of possible events in which the probability of each event depends only on the state attained in the previous event. It is also used to help identify genes in DNA and to power algorithms for voice recognition.
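The idea that each event depends only on the previous state can be made concrete by estimating transition probabilities from a sequence of letters classified as vowels and consonants. This is a minimal sketch in the spirit of a text analysis, not Markov’s actual procedure or data; the function names and the sample phrase are assumptions.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next state | current state) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }

def vowel_consonant_states(text):
    """Map each letter to 'V' (vowel) or 'C' (consonant), skipping non-letters."""
    return ["V" if ch in "aeiou" else "C" for ch in text.lower() if ch.isalpha()]

# Hypothetical sample text; any longer passage could be used instead.
states = vowel_consonant_states("machine learning")
for state, followers in transition_probabilities(states).items():
    print(state, followers)
```

Each row of the result sums to one, and together the rows form the transition matrix of a two-state chain.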
Between the 1920s and 1930s, Ludwig Wittgenstein and Rudolf Carnap led philosophical work on the logical analysis of knowledge. Artificial neural networks began to emerge in 1943 with the work of Warren Sturgis McCulloch and Walter Pitts, published under the title “A Logical Calculus of the Ideas Immanent in Nervous Activity”. In 1947, John von Neumann launched a stored-program electronic computer, also referred to as the IAS computer after the Institute for Advanced Study in Princeton.
From 1950, Machine Learning as such began to emerge with character recognition and pattern recognition. Following von Neumann, Alan Turing proposed the “Turing Test” in 1950 to determine whether a programmed computer can mimic human intelligence and reasoning. The Turing Test is a model for measuring “intelligence” that AI researchers still strive toward.
A Turing Machine is an abstract computer that reads and writes symbols on a tape in order to execute an algorithm. It is considered the first complete mathematical model of computation, built on mathematical abstraction and logic. The machine consists of a logical control unit, a tape made up of a sequence of storage cells containing digital values, and a scanner for reading values from and writing values to the tape.
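The three parts described above (control, tape and scanner) can be simulated in a short program. The sketch below is a simplified illustration; the rule format, the state names "start" and "halt", and the bit-flipping example machine are all assumptions chosen for brevity.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape machine; rules map (state, symbol) -> (write, move, next_state)."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0                       # scanner position
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)             # read
        write, move, state = rules[(state, symbol)]  # control logic
        cells[head] = write                          # write
        head += 1 if move == "R" else -1             # move the scanner
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Hypothetical example machine: flip every bit, halt at the first blank cell.
FLIP_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(FLIP_BITS, "1011"))  # 0100
```

The rule table plays the role of the logical control, the dictionary of cells is the tape, and the head index is the scanner.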
In terms of early learning techniques, and following McCulloch and Pitts’ work on neural networks, Marvin Minsky and Dean Edmonds built the first neural network machine with the ability to learn in 1951. The year after, in 1952, Arthur Samuel launched some of the very first machine learning programs. The term “artificial intelligence” was coined by John McCarthy for a conference held in 1956 at Dartmouth College, New Hampshire. In 1957, Frank Rosenblatt invented what is known today as the perceptron, the first neural network designed for computers to simulate the thought processes of the human brain.
More ML-related techniques began to emerge, such as the “nearest neighbor” algorithm, written in 1967, which allowed computers to begin using very basic pattern recognition. In 1968, Wallace and Boulton developed a program for unsupervised classification (clustering) using the Bayesian Minimum Message Length criterion, a mathematical realization of Occam’s razor. In 1970, Seppo Linnainmaa developed the method underlying backpropagation, also known as automatic differentiation (AD) of discrete connected networks of nested differentiable functions.
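The basic pattern recognition offered by the nearest neighbor rule amounts to labelling a new point with the label of the closest known example. The following is a minimal sketch assuming Euclidean distance and a tiny made-up training set; the function name and data are hypothetical.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query`."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label

# Hypothetical labelled examples: (feature vector, label).
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

print(nearest_neighbor(training_data, (2.0, 1.0)))  # small
print(nearest_neighbor(training_data, (7.5, 8.0)))  # large
```

No training step is needed: the stored examples are the model, which is why the method is often described as instance-based.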
The rediscovery of backpropagation caused a resurgence in machine learning research. In 1981, Gerald DeJong introduced the concept of Explanation Based Learning (EBL), in which a computer analyses training data and creates a general rule it can follow by discarding unimportant data.
By the mid-1980s, neural networks trained with the backpropagation algorithm had become widely used. In 1981, Danny Hillis designed the Connection Machine, which uses parallel computing to bring new power to AI and to computation in general. The year after, in 1982, John Hopfield developed what is known as the Hopfield network, a type of recurrent neural network that can serve as a content-addressable memory system. The year 1989 saw the emergence of Reinforcement Learning, with Christopher Watkins developing Q-learning (Chapter 7), which made reinforcement learning far more practical and feasible.
The 1990s saw a remarkable shift in ML from a knowledge-driven approach to a data-driven approach. Scientists began creating programs for computers to analyse large amounts of data and draw conclusions (that is, to “learn”) from the results. Support vector machines (SVMs) and recurrent neural networks (RNNs) became popular during this period. In 1995, Tin Kam Ho published a paper describing random decision forests (RF). In the same year, Corinna Cortes and Vladimir Vapnik published their work on SVMs. Two years later, in 1997, Sepp Hochreiter and Jürgen Schmidhuber invented long short-term memory (LSTM) recurrent neural networks, which greatly improved the efficiency and practicality of RNNs. In 1998, the MNIST database was compiled by a team led by Yann LeCun. It consists of a mix of handwritten digits from American Census Bureau employees and American high school students. Since then, the MNIST database has become a benchmark for evaluating handwriting recognition and ML applications.
2000 — Present
From 2000 onwards, machine learning became even more popular and widespread, starting with the first release of the Torch machine learning software library in 2002. In 2006, deep learning was recognized as a new subfield of ML by Geoffrey Hinton, as computers began to distinguish objects in images and videos. The same year, work started on another benchmark database under the name ImageNet, a visual database conceived by Fei-Fei Li of Stanford University. As with the MNIST database, ImageNet helped researchers develop machine learning algorithms and apply them to the real world, which helped AI thrive. Deep learning has made ML an integral part of many widely used software services and applications.
From 2010, several platforms for machine learning competitions were launched, involving a combination of machine learning, natural language processing and information retrieval techniques. These platforms excel at carrying out machine learning in a massively parallel fashion and at providing recommendations. In 2016, machine learning combined with tree search techniques beat human players at the game of Go. One year later, in 2017, the approach was generalized to chess and other two-player games with AlphaZero.
Another remarkable achievement is the release of the Qubit Processing Unit (QPU) quantum computer, which will drive processing power even further. In 2017, an IBM 5-qubit quantum computer was made available over the Internet to the general public as a programmable quantum computer. The half-angle bracket notation |⟩ is conventionally used to indicate qubits, as opposed to ordinary bits.
Several companies are developing and delivering quantum computing as a service. According to the report by James A. Martin, there are two main groups of players in this development: the first comes from the world of classical computing, and the second consists of quantum computing start-ups.
Today, most of what is being framed as Artificial Intelligence (AI) is based on Machine Learning. ML is a critical ingredient of intelligent applications and provides the opportunity to further accelerate discovery processes and enhance decision-making processes. These trends suggest that every sector will become data-driven and will use machine learning in the cloud to incorporate artificial intelligence applications and, ultimately, to supplement existing analytical and decision-making tools.
 Barbara S. Lawrence, 1984, Historical Perspective: Using the Past to Study the Present, The Academy of Management Review, Vol. 9, №2 (Apr., 1984), pp. 307–312.
 PREDICTING ECLIPSES WITH THE STONEHENGE http://www.math.nus.edu.sg/aslaksen/gem-projects/hm/0102-1-stonehenge/eclipses.htm
 “Stonehenge earlier than believed”. BBC News. 9 October 2008.
 The Modern History of Computing, First published Mon Dec 18, 2000; substantive revision Fri Jun 9, 2006, Stanford University.
 Eclipse Prediction on the Ancient Greek Astronomical Calculating Machine Known as the Antikythera Mechanism — Tony Freeth / Luis M. Rocha, Editor
 Horsburgh …
 Wikipedia — https://en.wikipedia.org/wiki/Antikythera_mechanism
 John H. Reif, Mechanical Computation: its Computational Complexity and Technologies, Chapter, Encyclopedia of Complexity and Systems Science
 Elizabeth Bussa (2007)
 Luke Hodgkin (2005) A History of Mathematics From Mesopotamia to Modernity, Oxford University Press, UK
 Evelyne Barbin, Uffe Thomas Jankvist, and Tinne Hoff Kjeldsen (2015). History and Epistemology in Mathematics Education, Proceedings of the Seventh European Summer University ESU 7, Copenhagen, Denmark, 14–18 July 2015. Danish School of Education, Copenhagen, Denmark.
 David Eugene Smith (1925). History of Mathematics, Dover Publications, Inc. NY.
 Peter Lynch (2014) Arabic enlightenment and the emergence of algebra, The Irish Times
 Paul E. Dunne, Mechanical Aids to Computation and the Development of Algorithms, Mechanical Calculators prior to the 19th Century
 Mechanical Calculators prior to the 19th Century — http://cgi.csc.liv.ac.uk/~ped/teachadmin/histsci/htmlform/lect3.html
 Trevor J Barnes and Matthew W Wilson (2014).
 Bernard Marr (2016). A Short History of Machine Learning
 Translated from the French by Professor Henry A Ruger and Professor Helen M, Walker, Teachers College, Columbia University, New York City.
 Governors and Feedback Control at http://www.clerkmaxwellfoundation.org/Governors.pdf
 Short Account on Leonardo Torres’ Endless Spindle, Federico Thomas, Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens Artigues 4–6, 08028 Barcelona, Spain
 BRIAN HAYES, First Links in the Markov Chain, American Scientist — https://www.americanscientist.org/article/first-links-in-the-markov-chain
 University of Washington (2006) The History of Artificial Intelligence
 IBM Research and the IBM QX team — 2017
 James A. Martin (2018). Who’s developing quantum computers? Network World
 Michael Jordan (2018) Artificial Intelligence — The Revolution Hasn’t Happened Yet — Medium