Scanning ACM tech news recently, I came across a piece that spoke to my inner nerd; I hope it will appeal to some of you also. The discovery will have no impact on markets or investments or probably anyone outside theories of machine learning. Its appeal is simply in the beauty of connecting a profound but obscure corner of mathematical logic to a hot domain in AI.

There is significant activity in theories of machine learning, to figure out how best to optimize neural nets, to understand what bounds we can put on the accuracy of results and generally to add the predictive power you would expect in any scientifically/mathematically well-grounded discipline. Some of this is fairly close to implementation and some delves into the foundations of machine learning.

In foundational theory, one question is whether it is possible to prove, within some appropriate framework, that an objective is learnable (or not). Identifying cat and dog breeds is simple enough – just throw enough samples at the ML and eventually you’ll cover all the useful variants. But what about identifying patterns in very long strings of numbers or letters? Since we can’t easily cross-check that ML found just those cases it should and no others, and since potentially the sample size could be boundless – think of data streams in a network - finding a theoretical approach to validate learnability can look pretty attractive.

There’s a well-established mathematical framework for this analysis called “probably approximately correct (PAC) learning” in which a learning system reads in samples and must build a generalization function from a class of possible functions to represent the learning. The use of “functions” rather than implementation details is intentional; the goal is to support a very general analysis abstracted from any implementation. The target function is simply a map between an input sample (data set) and the output value, match or no match. There is a method in this theory to characterize how many training samples will be needed for any given problem, which apparently has been widely and productively used in ML applications.

However – when a theory uses sets (of data) and functions on those sets, it strays onto mathematical logic turf and becomes subject to known limitations in that domain. A group of mathematicians at the Technion-Israel Institute of Technology in Haifa have demonstrated that there exist families of sets together with target learning problems for which learnability can neither be proved nor disproved within the standard axioms of mathematics; learnability is undecidable (or more precisely, independent of the base mathematical system, to distinguish this from computability undecidability).

If you ever read “Gödel, Escher and Bach” or anything else on Gödel, this should sound familiar. He proved, back in 1931, that it is impossible for any mathematical system to prove all truths about the integers. There will always be statements about the integers that cannot be proved either true or false. The same restriction applies to ML it seems; there are learning problems for which learnability cannot be proved or disproved. More concretely, as I understand it, it is not possible to determine for this class of problem an upper bound to the number of training samples you would need to supply to adequately train the system. (Wait, what about proving this from the halting problem? The authors used Gödelian methods, so that's what I describe here.)

This is unlikely to affect ML as we know it. Even in mathematics, Gödelian traps are few and far between, many quite specialized although a few like Goodstein’s theorem are quite simple. And of course we know other problems, like the traveling salesman problem which are theoretically unbounded yet are still managed effectively every day in chip physical design. So don’t sell your stock in ML-based enterprises. None of this will perturb their efforts even slightly. But it is pretty, nonetheless.

Bernard Murphy