The Unreasonable Effectiveness of Data? Time will tell.
David Donoho on Deepnet Spectra and the two cultures of data science
Prof. David Donoho, Stanford University, is a mathematician who has made fundamental contributions to theoretical and computational statistics, as well as to signal processing and harmonic analysis. His algorithms have contributed significantly to our understanding of the maximum entropy principle, of the structure of robust procedures, and of sparse data description. On November 8, 2019, he delivered the inaugural Stiefel Lecture on Deepnet Spectra and the two cultures of data science, with references to Eduard Stiefel’s and Magnus R. Hestenes’ seminal paper Methods of Conjugate Gradients for Solving Linear Systems.
Abstract: Machine learning became a remarkable media story of the 2010s largely owing to its ability to focus researcher energy on attacking prediction challenges like ImageNet. Media extrapolation of complete transformation of human existence has (predictably) ensued.
Unfortunately, machine learning has a troubled relationship with understanding the foundation of its achievements well enough to face demanding real-world requirements outside the challenge setting. For example, its literature is admittedly corrupted by anti-intellectual and anti-scholarly tendencies. It is beyond irresponsible to build a revolutionary transformation on such a shaky pseudo-foundation.
In contrast, more traditional subdisciplines of data science like numerical linear algebra, applied probability, and theoretical statistics provide time-tested tools for designing reliable processes with understandable performance. Moreover, positive improvements in human well-being have repeatedly been constructed using these foundations.
To illustrate these points we will review a recent boomlet in the ML literature: the study of eigenvalues of deepnet Hessians. A variety of intriguing patterns in these eigenvalues were observed and speculated about in ML conference papers. We describe the work of Vardan Papyan showing that the traditional subdisciplines, properly deployed, can offer insights about these objects that ML researchers had been seeking.
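The spectral objects at issue can be previewed at toy scale. The sketch below is an illustration only, not Papyan's method: it computes the Hessian of a small least-squares training loss by central differences and extracts its eigenvalue spectrum. For a quadratic loss the exact Hessian (here X.T @ X / m) is known, which makes the sketch easy to check.

```python
import numpy as np

def numerical_hessian(loss, w, eps=1e-4):
    """Central-difference approximation of the Hessian of a scalar loss at w."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # Mixed second partial via four loss evaluations
            H[i, j] = (loss(w + e_i + e_j) - loss(w + e_i - e_j)
                       - loss(w - e_i + e_j) + loss(w - e_i - e_j)) / (4 * eps**2)
    return H

# Toy "training loss": least squares on random data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
loss = lambda w: 0.5 * np.mean((X @ w - y) ** 2)

H = numerical_hessian(loss, np.zeros(5))
eigs = np.linalg.eigvalsh(H)  # the (tiny) analogue of a deepnet Hessian spectrum
```

For real deepnets the Hessian is far too large to form explicitly; spectra are instead estimated matrix-free, e.g. via Lanczos-type iterations that need only Hessian-vector products.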
The annual Stiefel Lectures were created in honor of Eduard Stiefel (1909-1978), who was professor of mathematics at ETH Zürich. Stiefel was the driving force behind establishing “electronic scientific computing” with ERMETH (Elektronische Rechenmaschine der ETH), a landmark in the computational and mathematical sciences with a huge impact on a broad range of applications in engineering and the natural sciences. Stiefel made fundamental and lasting contributions to mathematics, including the introduction of the Stiefel-Whitney classes, the Stiefel manifold, and the conjugate gradient method. Stiefel advised 63 PhD students, many of whom became leaders in their fields.
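The conjugate gradient method referenced above remains a workhorse of numerical linear algebra. A minimal sketch of the Hestenes-Stiefel iteration for a symmetric positive-definite system Ax = b (variable names and tolerances here are illustrative choices, not taken from the 1952 paper) might look like:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve Ax = b for symmetric positive-definite A by conjugate gradients."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x          # initial residual
    p = r.copy()           # first search direction
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next direction, A-conjugate to the last
        rs_old = rs_new
    return x

# Small SPD example
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

Each iteration needs only one matrix-vector product, and in exact arithmetic the method terminates in at most n steps, which is what made it so well suited to the early electronic computers Stiefel championed.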