The solution of a system of linear equations is by far the most important problem in Applied Mathematics. It is important both in itself and because it is an intermediate step in many other important problems. Gaussian elimination is nowadays the standard method for solving this problem numerically on a computer and it was the first numerical algorithm to be subjected to rounding error analysis. In


Alan Turing made several contributions that are considered fundamental in Mathematics and Computer Science and that are widely known by all mathematicians and computer scientists. What is more, the names of some of these contributions are very well known even by many educated people who are not necessarily specialists, as, for instance, the name

Many numerical analysts know that Alan Turing was one of the first researchers working on the error analysis of GE. This is clearly explained in some standard references on Numerical Analysis. In particular, an excellent text that gives a detailed account of Turing’s contributions to the analysis of GE is Nicholas Higham’s

“The experiences of Fox, Huskey, and Wilkinson prompted Turing to write a remarkable paper “Rounding-off errors in matrix processes” (Turing,

I have not mentioned above all of the contributions of Turing listed by N. Higham (

Probably most mathematicians and computer scientists, and certainly most laypeople, are unaware that Alan Turing was not the only great mathematician working on the error analysis of GE in the 1940s. Numerical analysts, however, know well that, before him, other giants of Mathematics considered the error analysis of GE a very important problem and worked on it in the 1940s. In fact, although Alan Turing certainly made a number of key and original contributions, some of the results presented in Turing (

In this context, the main goal of this manuscript is to bring to the attention of the widest possible audience the work of Alan Turing on GE and to explain why this problem was (

Before starting, let me say a few words about what this paper is not.

The paper is organized as follows. In Section 2, a brief history of GE is presented and the classic and modern descriptions of GE are refreshed for those readers who have forgotten GE or who are not familiar with its modern treatment. Section 3 describes the historical context, from the point of view of Mathematics and Computer Science, in which the paper was published. Since the title of Turing (

Classic books on the History of Mathematics, as well as recent studies on this subject, place the origins of GE in a variety of ancient texts from different places and times: China, Greece, Rome, India, the medieval Arab world, and Renaissance Europe. However, in my opinion, it is not accurate to say that these ancient, medieval, and Renaissance texts describe what we understand today as the

The developments of GE that include explicit statements of algorithmic rules can be organized essentially in three periods (Grcar,

The

The way GE is presented in high school textbooks is highly inefficient for solving moderately large systems of linear equations via

The

These difficulties motivated Gauss to modify the high school elimination method of Newton in a nontrivial way and this is the start of the

Gauss’s method was significantly improved by Myrick Doolittle (1881), André-Louis Cholesky (1924), who adapted it for use with mechanical multiplying calculators, and Prescott Durand Crout (1941), who developed a method valid for general systems of equations and not only for normal equations. This essentially closes the

The

It is important to observe that, since the 1940s, research on different aspects of GE has remained very active to this day. The interested reader is invited to consult the wide collection of references included in Higham (

In this section, I refresh the method of GE as it appears in high school textbooks via an example. Later, I will use the same example to illustrate how modern GE is presented in university-level Numerical Analysis textbooks. So, I propose that readers imagine themselves to be young again, living the good times of high school, and, to make this exercise even more exciting, that they imagine that Newton is their teacher!

Consider that we are asked to solve the following system of equations.

The key point of GE is the elimination of the unknown x_1 in all equations below the first one via the following

Next, we perform the elimination of the unknown x_2 in all equations below the second one via the replacement operations: replace “equation(3)” by “equation(3) – (–4)×equation(2)”; and replace “equation(4)” by “equation(4) – 2×equation(2)”. So, we obtain the linear system

Finally, we perform the elimination of the unknown x_3 in all equations below the third one via the replacement operation: replace “equation(4)” by “equation(4) – (–7)×equation(3)”. This leads to the following

This is the end of the GE process that has transformed the linear system (1), which we did not know how to solve, into the linear system (2), which has the same solution and can be solved very easily: from equation(4) we compute x_4; next, from equation(3), we compute x_3; next, from equation(2), we compute x_2; and, finally, from equation(1), we compute x_1. This procedure for solving (2) is known as back substitution.

The process above can be easily generalized to systems with any number of equations and unknowns.
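To make the procedure concrete, here is a minimal Python sketch of GE followed by back substitution. The 3 × 3 system is an illustrative example of mine, not the system (1) of the text, and no row interchanges are performed:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Reduce A x = b to upper triangular form by row replacement
    operations, then solve by back substitution (no row interchanges)."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    # Elimination: zero out the entries below the diagonal, column by column.
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]        # multiplier
            A[i, k:] -= m * A[k, k:]     # equation(i) <- equation(i) - m*equation(k)
            b[i] -= m * b[k]
    # Back substitution: compute x_n, then x_{n-1}, ..., down to x_1.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# Hypothetical example system (not the system of the text).
A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 5.0])
x = gaussian_elimination(A, b)
```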

The linear system (1) has some key features that ought to be mentioned. First, note that (1) has the same number of equations as unknowns and that its solution is unique. In more advanced mathematical language, this is equivalent to saying that the coefficient matrix of (1) is nonsingular.

Another feature of (1) is that GE has run without any interchange of equations. This is not always possible: an interchange is needed if, for instance, the unknown x_3 does not appear in equation(3). I will discuss later how equations are interchanged when GE is currently implemented on computers. Finally, the readers might recall that at high school they used, in addition to replacement and interchange operations,

I am almost sure that most readers, apart from recalling many happy memories of high school, have also remembered that to perform GE

The most important mathematical concept of modern GE is the LU matrix factorization. It allows us to state in a compact and elegant matrix language the GE method described above, it plays an important role in the implementation of GE on modern computers, and, finally, it is essential for facilitating the rounding error analysis of the algorithm. To explain the LU factorization, we use again the linear system (1). To begin with, let us write (1) in matrix form

The matrix

where

The numbers –2, 3, 1, –4, 2, and –7 that multiply rows in each row replacement operation in (6) are called the multipliers of GE.

So far, the matrix

which is the very famous LU factorization of the matrix

I would like to mention that we have constructed (8) via the multipliers of GE and the final matrix
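To see how the multipliers assemble the factorization in practice, the following Python sketch (with a hypothetical 3 × 3 matrix, not the matrix of the text) stores each multiplier in a unit lower triangular matrix L and checks that the product LU reproduces A:

```python
import numpy as np

def lu_no_pivoting(A):
    """Compute A = L U by storing the GE multipliers in a unit lower
    triangular matrix L; U is the final upper triangular matrix of GE."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = A[i, k] / A[k, k]      # multiplier, stored for free
            A[i, k:] -= L[i, k] * A[k, k:]
    return L, np.triu(A)

# Hypothetical example matrix (not the matrix of the text).
A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
L, U = lu_no_pivoting(A)
```

Note that no extra arithmetic is needed beyond GE itself: the multipliers are computed anyway, and storing them in L costs nothing.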

The LU factorization is not the only factorization of a matrix involving triangular factors that is important in Numerical Analysis. In fact, accurate and efficient algorithms for computing different triangular factorizations of matrices were considered among the

Nowadays, the solution of a linear system Ax = b via GE is organized in the following three steps:

1. Compute the LU factorization of A.

2. Solve the lower triangular system Ly = b by forward substitution: compute y_1 = b_1 from the first equation, then use y_1 to compute the second unknown y_2 from the second equation, then use y_1, y_2 to compute the third unknown y_3 from the third equation, and so on.

3. Solve the upper triangular system Ux = y by back substitution, i.e., by computing the unknowns from the last one to the first one.

It is easy to see that these steps compute the solution, because if we substitute A = LU into Ax = b, then Ax = L(Ux) = Ly = b holds precisely when Ux = y and Ly = b.
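The three steps can be sketched as follows. This is an illustrative Python implementation of mine, under the assumption that no pivot vanishes; the example matrix is hypothetical, not the one in the text:

```python
import numpy as np

def solve_via_lu(A, b):
    """Solve A x = b in three steps: (1) factor A = L U,
    (2) forward substitution on L y = b, (3) back substitution on U x = y."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    # Step 1: LU factorization (no pivoting; assumes nonzero pivots).
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = A[i, k] / A[k, k]
            A[i, k:] -= L[i, k] * A[k, k:]
    U = np.triu(A)
    # Step 2: forward substitution on L y = b (L has unit diagonal).
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    # Step 3: back substitution on U x = y.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
b = np.array([4.0, 10.0, 24.0])   # chosen so that the solution is (1, 1, 1)
x = solve_via_lu(A, b)
```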

Explicit algorithms for solving the triangular systems

Observe that Algorithm 1 consists only of

The computational cost of Algorithm 1 is 2n^3/3 + O(n^2) arithmetic operations, and this is also the cost of the three-step approach for solving Ax = b, since each of the two triangular systems can be solved with only O(n^2) additional operations.
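The operation count is easy to verify: at step k, GE performs (n − k) divisions to form the multipliers and (n − k)^2 multiplications plus (n − k)^2 subtractions to update the remaining entries. The snippet below sums these counts and compares the total with the leading term 2n^3/3:

```python
def ge_flops(n):
    """Exact count of arithmetic operations in the elimination stage of GE
    on an n x n matrix: at step k there are (n - k) divisions plus
    (n - k)**2 multiplications and (n - k)**2 subtractions."""
    return sum((n - k) + 2 * (n - k) ** 2 for k in range(1, n))

# The exact count agrees with 2*n**3/3 up to an O(n**2) remainder.
n = 1000
assert abs(ge_flops(n) - 2 * n ** 3 / 3) < 10 * n ** 2
```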

Algorithm 1 may produce huge errors when it is implemented on a computer if a very small pivot a_{kk} appears in some step of the process.

For describing partial pivoting, define A^{(1)} := A and denote by A^{(k)} the matrix produced by GE at the start of the k-th step, so that A^{(n)} := U. The entries of A^{(k)} are denoted by a_{ij}^{(k)}, as usual. Recall that the

and after that a standard step of GE is applied. Consider the matrix A^{(1)} in (4). Note that the entry with largest absolute value in the first column of A^{(1)} is 6, in position (3, 1). Then rows 1 and 3 are interchanged and a step of GE produces A^{(2)}. This is summarized in the following equation:

Next, observe that the entry with largest absolute value on or below the diagonal in the second column of A^{(2)} determines the pivot; after the corresponding row interchange, a step of GE produces A^{(3)}. This is summarized in the following equation:

Next, observe that the entry with largest absolute value on or below the diagonal in the third column of A^{(3)} determines the pivot; after the corresponding row interchange, a step of GE produces A^{(4)} =: U_P, and the process of GE with partial pivoting terminates.

Once the row exchanges that have been done by partial pivoting are known, it is clear that the process is mathematically equivalent to permuting the rows of A first and then applying GE without interchanges. This yields a factorization P A = L_P U_P, where P is a permutation matrix, L_P is unit lower triangular with the multipliers below its diagonal, and U_P is upper triangular.

The comparison of the LU factorization of A with P A = L_P U_P shows that partial pivoting guarantees that all the entries of L_P, that is, all the multipliers of GEPP, have absolute value at most 1.

Partial pivoting can be easily and elegantly incorporated in Algorithm 1. The details are omitted, but can be found in Demmel (

Input: a nonsingular n × n matrix A and an n × 1 vector b.

Output: an approximate solution x of Ax = b.

1. Compute the factorization P A = L U via GE with partial pivoting.

2. Solve the lower triangular system L y = P b by forward substitution.

3. Solve the upper triangular system U x = y by back substitution.
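As an illustration (a sketch of mine, not the author's pseudocode), GE with partial pivoting combined with back substitution can be implemented in a few lines of Python:

```python
import numpy as np

def solve_gepp(A, b):
    """GE with partial pivoting: at step k, swap rows so the pivot is the
    entry of largest absolute value on or below the diagonal of column k."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # row of the largest pivot candidate
        A[[k, p]] = A[[p, k]]                 # interchange equations k and p
        b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]             # |m| <= 1 thanks to pivoting
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper triangular system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# A hypothetical system whose first pivot is zero, so an interchange is forced.
A = np.array([[0.0, 2.0], [3.0, 1.0]])
b = np.array([2.0, 4.0])
x = solve_gepp(A, b)
```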

The term “partial pivoting” was introduced by Wilkinson (

In the 1940s there were three very famous papers giving error analyses of GE. The first one was written by Harold Hotelling in

The three papers were written before modern computers existed, but they were motivated by the existence of several projects for constructing the first modern computers.

In the 1940s, as well as today, one of the most important numerical problems was the solution of “large” (the precise meaning of “large” changes continuously with time) systems of linear equations, since they appear in many applications. In addition, in Turing’s own words (

The paper by Hotelling obtained an error bound for GE that grows with the size n of the system like 4^{n-1}. This error bound led Hotelling (

Hotelling’s results led to general pessimism in the mid 1940s about the practical use of GE for solving large systems of equations and motivated the papers by von Neumann and Goldstine, since an error bound of order 4^{n-1} amounts to the loss of about n log_{10} 4 ≈ 0.6 n significant decimal digits.

The error analyses developed by von Neumann and Goldstine (

The results in von Neumann and Goldstine (

In contrast, Turing (

Next, two additional points on the three papers are discussed. The three papers (Hotelling,

I will not explain in detail the mathematical contributions of John von Neumann (see

Harold Hotelling and Herman Goldstine (see

Herman Goldstine was born in Chicago in 1913. He was awarded bachelor’s (1933), master’s (1934) and PhD (1936) degrees in mathematics from the University of Chicago. In 1941 he wrote the technical description for ENIAC (Electronic Numerical Integrator And Computer), which was the first electronic computer, operational from 1946 to 1955. He joined the Army in 1942, when the United States entered World War II, persuaded the Army to fund the construction of ENIAC in 1943, and subsequently became programme manager of the project. Although ENIAC was thousands of times faster than previously available electro-mechanical machines and was programmable, there was no way to issue orders at electronic speed (modern programs)

A piece of additional information on the fascinating 1940s may be of interest for readers: it is often said that von Neumann’s famous report

Many projects for constructing modern computers got underway in the 1940s. A brief account of them may be found in Grcar (

Turing was involved in the NPL Pilot ACE (National Physical Laboratory Pilot Automatic Computing Engine) project developed in Teddington, England. Turing worked at NPL from 1945 to 1948, and during this period he also worked on the rounding error analysis of GE. Turing produced the first design of the Pilot ACE in 1946, which was probably too ambitious for the resources of NPL and was never constructed. The Pilot ACE started to work in May 1950, without Turing, based mainly on ideas of Harry Huskey and James Wilkinson (see more comments in Wilkinson,

Turing moved to The University of Manchester in September 1948. There, he collaborated in the Baby/Mark 1 project. The Small-Scale Experimental Machine, known as the “Baby”, made its first successful run of a program on June 21st, 1948. It was the first machine that had all the components now regarded as characteristic of a modern computer. Most importantly, it was the first computer that could store not only data but any user program in electronic memory and process it at electronic speed. From the “Baby”, a full-sized machine was designed and built, the Manchester Mark 1, which by April 1949 was generally available for computation in scientific research at the University of Manchester. Turing worked on the project and helped to design the programming language of the computer.

Von Neumann started to lead the computer project at the Institute for Advanced Study in Princeton (USA) (the “IAS computer project”) in 1946, and Goldstine joined him from the very beginning. In this period they became interested in rounding errors in GE. The first IAS computer started to work in 1951, i.e., later than its UK competitors. However, its influence on modern computers is, probably, more important, since several clones of the IAS computer were built from 1952 to 1957, including the first IBM mainframe.

After explaining the history of GE and its particular historical context in the 1940s, now the key properties of the rounding errors committed by GE are considered. This is the most technical section of the paper and, to encourage the reader, I will start with a quotation from the Preface of one of the most popular textbooks on Numerical Linear Algebra, written by Lloyd N. Trefethen and David Bau (

“...

In plain words, this means that GE is recognized by professional numerical analysts as very easy to explain, but

Rounding errors in computers come from two facts. First, computers can only represent a finite subset of the real numbers, which is called the set of floating point numbers and is denoted here by F.

In current computers, u = 2^{-53} ≈ 1.11 × 10^{-16} in double precision and u = 2^{-24} ≈ 5.96 × 10^{-8} in single precision. In Axiom 1, the exact meaning of the sentence “x ∈ R lies in the range of F” is that the absolute value of x is smaller than or equal to the largest absolute value of the numbers in F and larger than or equal to the smallest absolute value of the nonzero numbers in F.
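These axioms can be observed directly in any language that uses IEEE double precision arithmetic, as Python does; the snippet below checks the value of u and the rounding behaviour around 1:

```python
import sys

# Unit roundoff of IEEE double precision, the format used by Python floats.
u = 2.0 ** -53
assert u == sys.float_info.epsilon / 2   # machine epsilon is 2u = 2**-52

# Axiom 1: rounding a real number to the nearest float commits a relative
# error of at most u; in particular 1 + u rounds back to 1 (ties to even),
# while 1 + 2u is exactly representable.
assert 1.0 + u == 1.0
assert 1.0 + 2 * u > 1.0

# Axiom 2: each arithmetic operation is exact up to a relative error <= u.
# The computed value of 0.1 + 0.2 differs from the float 0.3 by a few u,
# due to the roundings described by the two axioms.
delta = abs((0.1 + 0.2) - 0.3)
assert 0 < delta < 4 * u
```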

Axiom 2 is sometimes called the standard model of floating point arithmetic.

From a historical point of view, it should be noted that Axioms 1 and 2 of floating point arithmetic were introduced by Wilkinson in

The main reason why Hotelling obtained a rounding error bound for GE that increases exponentially with the size of the matrix is easy to understand by combining Axioms 1 and 2 (which Hotelling did not know!) with Algorithm 1, whose basic operation is a_{ij} = a_{ij} - (a_{ik} a_{kj}) / a_{kk}.

Here a_{ij}, a_{ik}, a_{kj}, and a_{kk} are entries of the matrix A^{(k)},

Now, let us proceed with a simplified analysis and denote by ε_k the order of the relative errors present in the entries of A^{(k)},

i.e., the entries of A^{(k)} carry relative errors of order ε_k, with ε_1 = u.

where 2nd-order terms in the errors have been discarded. Of course, there are still more errors, coming from Axiom 2, when computing (10) in floating point arithmetic, but their effect is of the same order and can be absorbed into ε_{k+1}. The key point is that ε_{k+1} ≈ 2 ε_k, and so ε_n ≈ 2^{n-1} ε_1. Since ε_1 = u ≈ 10^{-16} and GE performs (n − 1) steps, the bound

is finally obtained. The error bound in (13) is really huge even for modestly sized matrices: for n = 44, ε_n ≈ 10^{-3}; for n = 61, ε_n ≈ 10^{2}; and, for n = 78, ε_n ≈ 10^{7}. Therefore, for moderately large matrices this informal bound predicts a complete loss of accuracy.
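The arithmetic behind these figures is easy to reproduce; the snippet below evaluates the informal bound 2^{n-1} u for double precision, assuming the doubling argument above:

```python
# Informal exponential bound: relative errors roughly double at each of the
# n - 1 elimination steps, so eps_n ~ 2**(n - 1) * u (u = unit roundoff).
u = 2.0 ** -53

def eps_n(n):
    return 2.0 ** (n - 1) * u

# The bound is already of order 1e-3 for n = 44, of order 1e2 for n = 61,
# and of order 1e7 for n = 78: useless long before n = 100.
```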

Before stating Wilkinson’s famous result, it is necessary to establish effective ways to measure differences between matrices (and vectors), i.e., to introduce norms.

Nowadays, every numerical analyst is familiar with matrix norms, but this was not the case in the 1940’s. In fact, the paper by von Neumann and Goldstine (

where A^{(1)} = A, A^{(2)}, ..., A^{(n)} = U are the matrices appearing in the GE process as they were defined in Section 2.4.

The proof of Theorem 1 is not difficult with the tools currently available, but it requires some technical work, so it is omitted here. Interested readers can find two different modern proofs in Higham (

Some modern texts (Higham,

James Wilkinson was a Cambridge-trained English mathematician who worked as Turing’s assistant at NPL (1946–48). He is considered the founder of modern rounding error analysis for his systematic use of backward errors in the analysis of many numerical algorithms for matrix computations. He wrote two influential books on Numerical Analysis: Rounding Errors in Algebraic Processes (1963) and The Algebraic Eigenvalue Problem (1965). In his early numerical tests, Wilkinson observed that the solutions x̂ computed by GEPP always had residuals satisfying ||b − A x̂||_∞ ≈ u ||A||_∞ ||x̂||_∞, a fact that is deeply connected to backward errors, as we will discuss in Section 4.6. Wilkinson claimed at that time

The backward error bound (14) for GE with partial pivoting (GEPP) includes the growth factor of the matrix A, defined as

The maximum absolute value of the entries of A^{(2)}, A^{(3)}, and A^{(4)} is 11.76. Therefore

Note that the growth factor is larger than or equal to one for any matrix A.

The key question in this context is to determine whether or not there exist matrices with very large growth factors. The answer is given in Theorem 2 and is

Combining Theorem 2 with the bound in (14) for the unit roundoff u = 2^{-53} of double precision, one obtains

which is a huge bound even for matrices of moderate size.
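Wilkinson's classical worst-case example, a matrix with 1 on the diagonal, −1 below it, and 1 in the last column, actually attains this exponential growth, and partial pivoting performs no interchanges on it. The following sketch (my own illustration, not code from the paper) computes the growth factor numerically:

```python
import numpy as np

def growth_factor_gepp(A):
    """Growth factor of GE with partial pivoting:
    max over k of |a_ij^(k)|, divided by max |a_ij|."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    initial_max = np.abs(A).max()
    overall_max = initial_max
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # partial pivoting
        A[[k, p]] = A[[p, k]]
        for i in range(k + 1, n):
            A[i, k:] -= (A[i, k] / A[k, k]) * A[k, k:]
        overall_max = max(overall_max, np.abs(A).max())
    return overall_max / initial_max

def wilkinson_example(n):
    # 1 on the diagonal, -1 strictly below it, and 1 in the last column:
    # the last column doubles at every step, reaching 2**(n-1).
    W = np.eye(n) - np.tril(np.ones((n, n)), -1)
    W[:, -1] = 1.0
    return W
```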

How should we pose precisely this unsolved problem? One option is to consider random matrices whose entries are independent random variables and to prove that the probability of encountering a growth factor > α decreases extremely fast as α increases (perhaps, exponentially fast!). More information on the solution of this problem, including a money prize, can be found in Trefethen (

The fact that a numerical algorithm is backward stable does not imply that the computed solution x̂ is close to the exact solution x, i.e., that the relative forward error ||x − x̂||_∞/||x||_∞ is small, where

where it is assumed that ||A^{-1}||_∞ ||ΔA||_∞ < 1. By discarding second order terms in the perturbation ||ΔA||_∞ and by using (14), equation (17) becomes

The inequalities in (18) tell us that if u ||A||_∞ ||A^{-1}||_∞ is small, then the forward error of GEPP is small, while if u ||A||_∞ ||A^{-1}||_∞ is large, the forward error may be large. The number ||A||_∞ ||A^{-1}||_∞ plays a fundamental role in the perturbation theory of the solution of linear systems and in the “forward errors” committed by GEPP. It is the very famous condition number of the matrix A, denoted by κ_∞(A).

I want to insist on a fact that is very familiar to numerical analysts, but that is still surprising for many users of numerical software. The condition number κ_∞(

The condition number κ_∞(
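As an illustration, κ_∞(A) = ||A||_∞ ||A^{-1}||_∞ can be computed directly; the two 2 × 2 matrices below are my own examples, not taken from the text:

```python
import numpy as np

def cond_inf(A):
    """kappa_inf(A) = ||A||_inf * ||A^{-1}||_inf, computed by forming the
    inverse explicitly (fine for small illustrative matrices)."""
    return np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

# A well conditioned and a nearly singular example (illustrative choices).
well_conditioned = np.array([[2.0, 1.0], [1.0, 3.0]])
nearly_singular = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])
```

A solver may have a tiny backward error on both matrices, but only for the well conditioned one does that guarantee a tiny forward error.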

Theorem 1 presents backward errors of GEPP, but it does not show how Wilkinson reached this “at a first-glance unnatural” way of presenting/analysing rounding errors. I have already commented in the last paragraph of Section 4.3 that Wilkinson was motivated by a few numerical tests that always produced tiny residuals. In fact, we will see in Section 5 that this was also Turing’s motivation for undertaking his error analysis of GE. Therefore, I discuss in this section the deep connection existing between backward errors and residuals and how residuals can be used to give sharp optimal estimates of backward errors. The results can be applied to any algorithm for solving linear systems and not just to GEPP.

First, observe that if the approximate solution x̂ of Ax = b satisfies (A + ΔA) x̂ = b with ||ΔA||_∞ = u ||A||_∞, then ||b − A x̂||_∞ ≤ ||ΔA||_∞ ||x̂||_∞ = u ||A||_∞ ||x̂||_∞. In plain words, this means that a tiny relative backward error of order u implies a tiny relative residual ||b − A x̂||_∞/(||A||_∞ ||x̂||_∞), also of order u.

According to the discussion in Higham (

Now, the reader can fully appreciate why the fact that GEPP computed solutions with relative residuals ||b − A x̂||_∞/(||A||_∞ ||x̂||_∞) =

and store it in the MATLAB program (Higham, ). Its condition number is huge, κ_∞(A) ≈ 10^{22}. Define the 3 × 1 vector b and compute in MATLAB

The growth factor (15) of this matrix is moderate and, due to the huge condition number, the computed solution has no correct digits in double precision, where u ≈ 10^{-16}. However, the relative residual is of order u.
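The same phenomenon is easy to reproduce with any sufficiently ill conditioned matrix; the sketch below uses the 8 × 8 Hilbert matrix (my own illustrative choice, not the matrix of the text) and checks that the relative residual is of order u while the forward error is far larger:

```python
import numpy as np

# The 8 x 8 Hilbert matrix is a classical ill conditioned example.
n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_exact = np.ones(n)
b = A @ x_exact

x_hat = np.linalg.solve(A, b)        # LAPACK's GE with partial pivoting
r = b - A @ x_hat

# Relative residual: of order u, as backward stability predicts.
rel_residual = np.linalg.norm(r, np.inf) / (
    np.linalg.norm(A, np.inf) * np.linalg.norm(x_hat, np.inf))
# Forward error: may be as large as kappa_inf(A) * u, many orders bigger.
forward_error = np.linalg.norm(x_exact - x_hat, np.inf)
```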

Alan Turing wrote his famous “rounding-off errors” paper (

10^{-10}

In the modern language introduced in Section 4.6, what Turing, Wilkinson, and coworkers observed was that the relative residual was of the order of the unit roundoff. Turing (

Turing believed,

log_{10} 2

Therefore,

However, it should also be remarked that Turing’s analysis is non-standard from a modern point of view. In particular, it is based on the key assumption (Turing,

ε is made. How this is to be secured need not be specified, but it is clear that the number of figures to be retained in will have to depend on the values of the .”

The difficulty with this assumption is that no computer, past or present, can guarantee a rounding error bound like this in finite precision arithmetic, since such an ε cannot be fixed in advance of the computation. Even with the unrealistic and ideal assumption ε = u ||A||_∞, Turing’s error bound for the approximate solution computed by GEPP becomes a non-optimal bound of the type ||x − x̂||_∞/||x||_∞ ≲ (||A||_∞ ||A^{-1}||_∞)^2 u, instead of the optimal bound ||x − x̂||_∞/||x||_∞ ≲ (||A||_∞ ||A^{-1}||_∞) u (

Since Wilkinson’s pioneering paper was published in

As examples of very recent activities on the error analysis of GE, I discuss here very briefly the research presented in the papers Grigori, Demmel and Xiang (

I have reviewed at an introductory level the first research works on the rounding error analysis of one of the most important numerical algorithms in Mathematics: Gaussian elimination for solving systems of linear equations. The pioneering work on this topic published by Alan Turing in

The author thanks Profs. Manuel de León, David Ríos Insua, and Jesús María Sanz Serna, from the

This work was partially supported by the Ministerio de Economía y Competitividad of Spain through grant MTM-2009-09281 and was originated by a talk with the same title presented in the International Symposium "The Alan Turing Legacy" held in Madrid (Spain) in October 23-24, 2012. This symposium was organized and funded by the Real Academia de Ciencias Exactas, Físicas y Naturales of Spain and Fundación Ramón Areces.

The title of Grcar (

Readers should note that, for systems having the same number of equations as unknowns, this is, by far, the most frequent case in practice, since the probability that a square matrix is singular is zero.

Note that a_{kk} is the pivot at the k-th step of GE; a_{kk} = 0 may happen and, in this case, Algorithm 1 fails.

It should be noted that, today, it is widely known that the use of the normal equations for solving least squares problems may be unstable, and they are never used in professional software. The standard algorithm for least squares problems is based on another famous matrix factorization: the QR factorization (Demmel,

Therefore ENIAC is not considered a “modern computer”.

In this informal analysis, it is assumed that all the entries a_{ij}^{(k)}, for 1 ≤