Tag: machine learning

  • Why AI is not AI until it wonders why

    Reflections on Judea Pearl’s science of causal reasoning

    Dr. Mark Freestone lecturing on the Alan Turing stage at CogX19.

    Back in June I was lucky enough to attend (indeed, exhibit at) this year’s CogX Festival of AI and Emerging Technology in London. It’s a fantastic event stuffed full of fascinating presentations (I urge you to come next year if you can), but of all the great talks I saw and encounters I had, one in particular stood out enough to make me want to sit down and write a blog post about it.

    The presentation in question was by Dr. Mark Freestone. Freestone is a Senior Lecturer in the Centre for Psychiatry at the Wolfson Institute of Preventive Medicine; he’s also an Alan Turing fellow. What he had to say chimed powerfully with the contents of a book I’d read just a few weeks before, one which for my money sits alongside Daniel Kahneman’s “Thinking, Fast and Slow” as one of the books of the decade.

    This was Judea Pearl’s “The Book of Why” (co-written with Dana Mackenzie), which is about cause and effect in statistics. As a student of philosophy and psychology and a part-time data scientist, I’ve spent a fair chunk of my intellectual life pondering these things, which is why I picked it up in the first place. And what a revelation it proved to be.

    A historical timeline info board about the importance of Judea Pearl and Bayesian networks in the history of AI, from the “AI: More than Human” exhibition at London’s Barbican Centre, August 2019

    Statistics tells us an enormous amount about the world, and now, thanks to analytical techniques of various flavours (from logistic regression to neural nets), we’re baking statistical analysis at scale into the extraordinary data structures we’ve been building since the invention of the microprocessor and, more recently, the Internet.

    We’ve now, with our usual hubris (and our usual slavish adherence to the dictates of marketing), decided to call this development artificial intelligence, despite the fact that it’s not really intelligence at all but rather pattern recognition and statistical analysis.

    I don’t mean to demean the achievements that have been made in these fields. But compared with the processes at work in human or animal brains, they are akin to those at the “automated function” end, such as object persistence or facial recognition in vision. They are closer, therefore, to sensory perception than to the abstract cortical processing and decision-making that we generally refer to as “intelligence” (unless you work in marketing).

    As every statistician knows, you see, “correlation is not causation.” But as Pearl points out in “The Book of Why”:

    > Unfortunately, statistics has fetishized this commonsense observation. It tells us that correlation is not causation, but it does not tell us what causation is…. Students [of statistics] are not allowed to say that X is the cause of Y, only that X and Y are “related” or “associated.”
    — Judea Pearl, “The Book of Why”

    Even more than that, statisticians maintained for decades that correlation was enough, and that causation was either unfathomable or not required. So perhaps it’s no surprise that correlation, the crowning achievement of the discipline that produced today’s AI, is what the current crop of AI technology does, and does very well indeed.

    This is fine if we want an AI to tell an image of a dog from an image of a cat; to recognise a face, a voice, a word or a cancer cell in the midst of healthy tissue; to calculate routes and identify cars and pedestrians; even to work out how to win at video games. Iterated pattern recognition over labelled data, with backpropagation for error correction, bolstered by a range of other techniques that simulate the contributions of human memory or the layering of the mammalian visual cortex, can handle all of this admirably. Ally these techniques to more traditional AI methods such as tree search and rule-based logic, and you can start beating the world’s best players at chess or Go, and start building (or attempting to build) self-driving cars.

    The trouble starts when we want to ask why something happened, or to predict what might happen if we acted, in systems as unconfined and messy as the untrammelled physical world rather than in closed, rule-bound environments such as a Go board or the neatly laid out traffic grid of the average Midwestern US town. Observational data sets of the kind used to train neural networks in pattern recognition do not contain the answers to these kinds of questions, you see (q.v. “correlation is not causation”). When it comes to causal or predictive questions (predictive in the sense of predicting the future, rather than predicting the likelihood of a classification), “data are profoundly dumb,” as Pearl puts it.
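
    The sketch below makes “correlation is not causation” concrete with a toy world. The world is entirely hypothetical and is not taken from Pearl’s book. A hidden common cause z drives both x and y, while x has no effect on y at all. Observational samples show a strong correlation between x and y (about 0.87 in theory for these parameters). When x is set at random, as an experimenter would set it, that correlation disappears. The observational sample alone gives no hint that x is idle.

```python
import random


def simulate(n, intervene=False, seed=0):
    """Sample (x, y) pairs from a hypothetical toy world.

    A hidden common cause z drives both x and y; x has NO effect on y.
    If intervene=True, x is set at random, independently of z, as in an experiment.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                  # unobserved common cause
        if intervene:
            x = rng.gauss(0, 1)              # do(X): x no longer listens to z
        else:
            x = z + rng.gauss(0, 0.5)        # observational regime: x tracks z
        y = 2 * z + rng.gauss(0, 0.5)        # y depends on z only, never on x
        xs.append(x)
        ys.append(y)
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


obs_r = pearson(*simulate(10_000))
do_r = pearson(*simulate(10_000, intervene=True))
print(f"correlation in observational data: {obs_r:.2f}")
print(f"correlation when X is set by fiat:  {do_r:.2f}")
```

    The same data could equally have been produced by a world in which x really does drive y. Telling the two apart takes either an experiment or an explicit causal assumption.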

    In other words, in the realm of actual thinking, rather than the processing that our visual cortices perform on the patterns of light that play across our retinae, these processes do not replicate what is going on in our heads. We do not use correlation to work out what might happen next. It’s part of the toolkit we might deploy, but it isn’t by any stretch the core mechanic of how we think.

    When it comes to figuring out causation, we instead use scenarios and counterfactuals. We use fictions, not facts (a point that appeals to the novelist in me, as you might guess). These fictions have their basis in fact (well, most of the time), but even so they are built on relatively few immediate data points. Instead, they are largely constructed from multiple reconstituted examples from our experience: what we call “common sense”. They are also inherently probabilistic, something they have in common with correlation. What they don’t share with correlation, however, is the ability “to predict the effects of an intervention without actually enacting it,” as Pearl puts it in his book.

    Well, I hear you say, doesn’t AlphaGo do exactly that? And the answer is no, it does not. AlphaGo enacts millions of virtual scenarios along multiple forking paths of action to produce highly complex statistical analyses of possible outcomes, which are then encoded into the weights of its deep neural nets. This is incredibly effective, and even capable of producing previously unappreciated insight into Go’s game mechanics (AlphaGo’s now famous move 37 and, subsequently, Lee Sedol’s move 78). It may even be, in the broadest sense, akin to what Lee Sedol himself is doing when he’s playing Go. But it’s not what Lee Sedol is doing when he’s trying to work out what to buy his daughter for her birthday.

    When Lee Sedol does that, he is spinning up counterfactual scenarios involving various versions of his daughter and himself, various gift options, and a whole range of family situations possibly stretching well into the future: scenarios that “reflect the very structure of [his] world model.” None of these scenarios will have happened in the past, and none of them will happen in the future, but he will make a choice based on whichever of them conforms most closely to his world model. Then, when he sees his daughter’s (and his wife’s) reaction to the gift, he may refine his world model according to the difference between his prediction and the perceived reality, thus deploying a training set not of AlphaGo’s millions of examples, but of just one.

    What’s strange about this is that the empirical observation can never fully confirm or refute the counterfactual. And yet counterfactuals are the primary tools we have for guiding our journey through the world in a cybernetic fashion, and thus are “the building blocks of moral behaviour as well as scientific thought.”

    Current AI does not benefit from this mode of human thought. The importance of Pearl’s work, as encapsulated in “The Book of Why”, is that over the last three decades he has developed a method, a “causal calculus”, that enables the “algorithmization of counterfactuals” and thus makes them available for use by thinking machines.

    What is causal calculus? In essence, it’s a way of modelling the probability (P) of an event (L) happening if an action (X) takes place. It takes two kinds of variable into account. Mediating variables let the calculus model indirect as well as direct relationships between action and outcome. Influencing variables let it quantify and/or isolate other factors that may confuse, complicate or obscure the key relationship being interrogated, much as the paragon of this form, the randomised controlled trial, seeks to do.
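
    In Pearl’s notation the quantity described here is written P(L | do(X = x)). One standard result from his framework, the backdoor adjustment formula, computes it from purely observational data. The formula assumes we have measured a set of influencing variables Z (our label, not the post’s) that meets two conditions. Z must block every “backdoor” path from X to L, that is, every path that starts with an arrow pointing into X. Z must also contain no descendants of X. The first line below is the adjustment; the second is ordinary conditioning, shown for comparison:

$$
P(L \mid do(X = x)) = \sum_{z} P(L \mid X = x, Z = z)\, P(Z = z)
$$

$$
P(L \mid X = x) = \sum_{z} P(L \mid X = x, Z = z)\, P(Z = z \mid X = x)
$$

    The only difference is the weight given to each stratum of Z: P(Z = z) versus P(Z = z | X = x). The two agree whenever X is independent of Z, which is exactly what randomisation in a controlled trial arranges.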

    Pearl and his collaborators have developed a visual vernacular for mapping the causal relationships between these various elements for any given situation. These causal graphs in turn allow the construction of counterfactuals: how the mapped causal relationships would play out if, for example, an influencing variable impinged on this node instead of that one, or if a mediating relationship turned out to be reciprocal rather than one-way. Once the pathways have been mapped, it becomes possible to take data generated in one scenario and test its validity or plausibility in another, apparently comparable, scenario.
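
    Pearl evaluates a counterfactual on such a model in three steps:

    1. Abduction: infer the unobserved background factors from what actually happened.
    2. Action: override the equation for the variable we want to change.
    3. Prediction: recompute everything downstream.

    The sketch below applies these steps to a hypothetical linear model with an action X, a mediator M and an outcome L. The class name, coefficients and observations are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Hypothetical linear structural causal model: X -> M -> L plus a direct X -> L edge.

    M = a*X + u_m and L = b*M + c*X + u_l, where u_m, u_l are unobserved background factors.
    """
    a: float = 0.5   # effect of the action X on the mediator M
    b: float = 2.0   # effect of the mediator M on the outcome L
    c: float = 1.0   # direct effect of X on L

    def forward(self, x, u_m, u_l):
        """Compute M and L from X and the background factors."""
        m = self.a * x + u_m
        l_out = self.b * m + self.c * x + u_l
        return m, l_out

    def abduce(self, x, m, l_out):
        """Step 1, abduction: recover the background factors consistent with what was observed."""
        return m - self.a * x, l_out - self.b * m - self.c * x

    def counterfactual(self, observed, x_new):
        """Steps 2 and 3, action and prediction: force X = x_new, keep the background, recompute."""
        u_m, u_l = self.abduce(*observed)
        return self.forward(x_new, u_m, u_l)


scm = LinearSCM()
# We observed X=1, M=1.5, L=5. What would M and L have been had X been 0?
m_cf, l_cf = scm.counterfactual(observed=(1.0, 1.5, 5.0), x_new=0.0)
print(m_cf, l_cf)  # 1.0 3.0
```

    The background factors inferred from this particular observation are carried over into the recomputation. The answer is therefore about this individual case, not a population average, and that is what distinguishes a counterfactual from an ordinary interventional prediction.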

    This is much more than a Bayesian prior, though priors can play an important role in estimating the initial conditions of a causal model. But the do-calculus (Pearl’s set of rules for reasoning, on a causal graph, about the effects of interventions) goes far beyond Bayesian techniques in its power and implications. It compartmentalises and tracks the conditions that make up the system under study, rather than taking a global probability snapshot at a given stage and feeding it back into the evolving prediction calculation. (For a nice summary of the technique, and a second recommendation of the book, check out this Medium post by data scientist Ken Tsui.)
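
    The gap between conditioning and intervening can be shown numerically. This sketch builds a small, made-up joint distribution over an influencing variable Z, an action X and an outcome L, in which X raises the chance of L by exactly 0.1 within every stratum of Z. Conditioning on X, the purely Bayesian move, reports a difference of 0.34 because it mixes in the effect of Z. The backdoor adjustment, P(L | do(X = x)) = Σ_z P(L | X = x, Z = z) P(Z = z), computed from the same table, recovers the 0.1. The adjustment is valid only under the assumption that Z is the sole confounder, and that is a modelling assumption the data cannot confirm. Python’s fractions module keeps the arithmetic exact.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model used only to generate an "observational" joint distribution P(Z, X, L).
P_Z1 = F(1, 2)                                    # P(Z=1)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}           # Z pushes towards taking action X
P_L1_GIVEN_XZ = {(0, 0): F(1, 10), (1, 0): F(2, 10),
                 (0, 1): F(5, 10), (1, 1): F(6, 10)}  # X adds exactly 0.1 in each stratum of Z


def bern(p1, value):
    """P(V=value) for a binary variable V with P(V=1) = p1."""
    return p1 if value == 1 else 1 - p1


joint = {(z, x, l): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_L1_GIVEN_XZ[(x, z)], l)
         for z, x, l in product((0, 1), repeat=3)}


def prob(**fixed):
    """Probability, read off the joint table, that the named variables take the given values."""
    names = ("z", "x", "l")
    return sum(p for cell, p in joint.items()
               if all(cell[names.index(k)] == v for k, v in fixed.items()))


def conditional(x):
    """Ordinary (Bayesian) conditioning: P(L=1 | X=x)."""
    return prob(x=x, l=1) / prob(x=x)


def interventional(x):
    """Backdoor adjustment: P(L=1 | do(X=x)) = sum_z P(L=1 | X=x, Z=z) * P(Z=z)."""
    return sum(prob(z=z, x=x, l=1) / prob(z=z, x=x) * prob(z=z) for z in (0, 1))


print("P(L=1|X=1)     - P(L=1|X=0)     =", conditional(1) - conditional(0))
print("P(L=1|do(X=1)) - P(L=1|do(X=0)) =", interventional(1) - interventional(0))
```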

    As you’d expect, in “The Book of Why” Pearl gives plenty of good toy examples of the do-calculus. What’s particularly interesting about them is how even very simple causal graphs, with only five or six nodes, can help unpick incredibly thorny issues. Examples include demonstrating the causal relationship between smoking and lung cancer, or weighing the comparative impacts of nature and nurture on personality.
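
    For the smoking case the book discusses Pearl’s front-door adjustment. Assume the graph S → T → C, where smoking S affects cancer C only through a measurable mediator such as tar deposits T. Assume also a hidden factor U (say, a genetic one) that affects both S and C. Under those assumptions the causal effect of S on C can be recovered without ever measuring U. The sketch below uses an invented model in which every probability is made up. It checks that the front-door estimate, computed only from the observed S, T and C, matches the true effect computed with access to U. The naive comparison of smokers with non-smokers does not match.

```python
from fractions import Fraction as F
from itertools import product

# Invented model: hidden U raises both smoking S and cancer C; S affects C only via tar T.
P_U1 = F(1, 2)
P_S1_GIVEN_U = {0: F(1, 5), 1: F(4, 5)}
P_T1_GIVEN_S = {0: F(1, 10), 1: F(9, 10)}
P_C1_GIVEN_TU = {(0, 0): F(1, 10), (1, 0): F(3, 10),
                 (0, 1): F(3, 10), (1, 1): F(5, 10)}


def bern(p1, value):
    """P(V=value) for a binary variable V with P(V=1) = p1."""
    return p1 if value == 1 else 1 - p1


# Observational joint P(S, T, C): U is summed out because nobody gets to measure it.
obs = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_S1_GIVEN_U[u], s)
         * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c))
    obs[(s, t, c)] = obs.get((s, t, c), F(0)) + p


def prob(**fixed):
    """Probability, from observational data only, that the named variables take the given values."""
    names = ("s", "t", "c")
    return sum(p for cell, p in obs.items()
               if all(cell[names.index(k)] == v for k, v in fixed.items()))


def front_door(s):
    """P(C=1 | do(S=s)) = sum_t P(t | s) * sum_s2 P(C=1 | t, s2) * P(s2)."""
    total = F(0)
    for t in (0, 1):
        inner = sum(prob(c=1, t=t, s=s2) / prob(t=t, s=s2) * prob(s=s2) for s2 in (0, 1))
        total += prob(t=t, s=s) / prob(s=s) * inner
    return total


def ground_truth(s):
    """True P(C=1 | do(S=s)), computed with access to the hidden U."""
    return sum(bern(P_U1, u) * bern(P_T1_GIVEN_S[s], t) * P_C1_GIVEN_TU[(t, u)]
               for u, t in product((0, 1), repeat=2))


naive = prob(c=1, s=1) / prob(s=1) - prob(c=1, s=0) / prob(s=0)
print("naive difference:     ", naive)
print("front-door difference:", front_door(1) - front_door(0))
print("true causal effect:   ", ground_truth(1) - ground_truth(0))
```

    The front-door formula is exact here only because the invented model satisfies its conditions: tar carries all of smoking’s effect, and U touches tar only through S. With real data those conditions are assumptions to be argued for, not facts read off the data.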

    It’s in these test cases, too, that we are able to see the profound impact of this approach on the entire discipline of statistics. It means no less than:

    > the mantra “Correlation does not imply causation” should give way to “Some correlations do imply causation.”
    — Judea Pearl, “The Book of Why”

    The do-calculus is the technique that allows us — or our computers — to interrogate situations and their counterfactuals to work out which correlations those are.

    Dr. Freestone continues his lecture at CogX19, with an example of a full causal graph.

    So now, I hope, it should be apparent why Mark Freestone’s talk at CogX 2019 excited me so much. It was the first example I’d come across since reading Pearl’s book of someone applying the do-calculus in the wild. As you can see from the photograph above, the causal graph of actions, outcomes, influences and mediators gets pretty crazy pretty fast when you’re trying to understand cause and effect in a situation as complex as the development of risk models for prediction and management of violence in mental health services (the focus of Freestone’s study).

    The approach is also already beginning to make an impact on robotics. Start-up Realtime Robotics is making great progress on enabling interactive movement in machines by using counterfactual causal models, creating a specialised processor and scripting language (Indigolog) specifically to enable it. DeepMind has been mucking about in this area too, as you’d expect. Check out some of their research findings here.

    I don’t pretend to be an expert in the do-calculus by any stretch. I’m writing this post partly to celebrate Pearl’s work, partly to tell you that it’s worthy of your attention if you haven’t come across it before, and partly to help me explain it to myself. To really grasp it I need to reread the whole book and then start working through some trial examples; if I manage to get round to this while trying to close Hospify’s seed round (which is keeping me pretty busy), I’ll let you know.

    In the meantime, do go and read “The Book of Why” for yourself. I promise you it’s worth it.

  • What AI wants

    A review of “Architects of Intelligence” by Martin Ford (Packt, 2018)

    Towards the end of last year I was sent a review copy of a new book on AI. I read it over Christmas and, since it was pretty good, I thought I’d review it here; I’ve been busy managing a growth spurt at Hospify lately and haven’t published on my Medium blog for a while.

    The book is called “Architects of Intelligence: The Truth about AI from the People Building It” and, as the title suggests, it’s a collection of interviews conducted by Martin Ford, whose earlier book “Rise of the Robots” (which I haven’t read) won various plaudits, including the Financial Times Business Book of the Year.

    When I received the book, I have to confess I wasn’t overwhelmed with enthusiasm. I read a fair number of AI newsletters and listen to quite a few AI podcasts (TWIML AI is my favourite; shout out to Sam Charrington). Glancing through the list of interviewees, it felt a bit like a collection of musings by the usual suspects: Yoshua Bengio, Geoffrey Hinton, Yann LeCun, Demis Hassabis, Ray Kurzweil, Gary Marcus, Andrew Ng, Jeff Dean et al. This is undoubtedly a list of extremely smart people, all of whom have excelled in their fields and have an enormous amount to say. Still, I felt I’d already read or listened to plenty of interviews with all of them, and I wondered what another set of “in conversation” transcripts could add. After all, AI is a technical field, and there’s a point at which general discussion about it loses its charm. What you want from practitioners is a deeper dive into the nuts and bolts of the techniques involved (one of the things TWIML AI strives for, and often achieves), rather than yet another overview of the landscape.

    However, as I worked my way through the 23 interviews in the book, it became apparent that Ford was doing something quite interesting. The big news in AI at the moment is of course the advances made in deep learning by many of the luminaries listed above — Bengio, Hinton, Hassabis, Ng, LeCun in particular. Ford groups these guys in the first half of the collection, interspersing them with philosophical covering fire from the likes of Nick Bostrom and Ray Kurzweil, the latter making the core point in his interview that “connectionism can emulate a rule-based approach, [but] a rule-based system really cannot emulate a connectionist system, so the converse statement is not the case.”

    This is all great as far as it goes. Don’t get me wrong: I love deep learning. I first came across Hinton’s work (with Rumelhart and McClelland) in connectionism (i.e. deep learning and neural nets) as a post-grad back in 1992, and it blew my mind. I think the work that’s been done in recent years to implement it has been amazing, and you don’t need me to tell you how transformative it has been for computing: it has re-energised the whole industry. I’ve even coded some deep learning projects of my own.

    At the same time, however, as a philosophy and neurophysiology student I was (and remain) sceptical of claims that deep learning neural nets, which only approximate how we think the brain works in the vaguest sense, are all there is to intelligence. After all, intelligence is not just about learning. It’s also about wanting. The philosopher I loved best as a post-grad was Gilles Deleuze, who described the abstract processes that underpin life (and thus intelligence) as “desiring machines”. Deep learning captures the machinic aspect of this pairing as well as any process we’ve been able to instantiate on a computer. What it doesn’t capture is the “desiring”. So far, that bit is still provided, at every juncture, by the human beings who build the system. It is what the system wants: the thing that justifies its existence, causes it to act and assesses its outcomes. Until the machine takes over at least some aspect of that, “artificial intelligence” will remain a misnomer. What we’ll have, as we have now, is “augmented intelligence” or “machine learning”, the term I prefer.

    This niggle is the point at which “Architects of Intelligence” starts to gain purchase and gets really interesting. Ford has grouped the sceptics in the second half of the book, so just when you think the argument is done, the voices of dissent begin to surface. By sceptics I mean those who believe, as I do, that this is not a mere niggle but the whole crux of the problem. The psychologist and technologist Gary Marcus is something of their cheerleader. Many others here have strong hands-on experience, particularly in robotics: Rana El Kaliouby, Rodney Brooks, Cynthia Breazeal, Joshua Tenenbaum. Repeatedly and convincingly, they point out a hard truth. Deep learning is fantastic for pulling patterns out of vast data mines and finding signal in noise way beyond human abilities, but it is just not well suited to generalising from small samples. In the world of text, voice or image analysis this isn’t a problem, at least not now that we have the vast resources of the global internet to throw at it. But once you get out of server farms and into the physical world it’s a showstopper. Generalisation from small samples is how genuinely (as opposed to artificially) intelligent agents extrapolate a desire-driven route through a world about which, in fact, they know very little.

    The conceptual covering fire in this second half of the book is provided by Judea Pearl. Like Hinton, Pearl was inspired by the work of David Rumelhart, and like Kurzweil, he has genuine technical chops. He also offers serious insights into the nature of probability and causal modelling (and thus desire, to continue with my earlier terminology). In doing so he demonstrates a philosophical depth well beyond the somewhat excitable futurist extrapolations of his counterparts in the first half of the book.

    Ford summarises the point perfectly: “causation can never be learned from data alone.” In statistics, this is almost a cliché. “Causation is not correlation” is a phrase statisticians regularly repeat to each other (and to their students) to remind themselves not to read too much into raw data. But the cliché, as clichés so often are, is a flag pinned on the site of a profound truth. Not only is causation not correlation; the two are entirely different beasts, one of which (as any quantum physicist will tell you) we can only barely claim to understand. To quote Pearl:
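
    Here is a minimal sketch of why data alone cannot settle causal direction. It defines two hypothetical models: in one, X causes Y; in the other, Y causes X. The two are tuned to produce exactly the same joint distribution of X and Y, yet they disagree about what happens if we intervene to set X = 1. The function names and numbers are invented. Only an assumption about the causal graph, or an actual experiment, can separate the two models.

```python
from fractions import Fraction as F
from itertools import product


def bern(p1, value):
    """P(V=value) for a binary variable V with P(V=1) = p1."""
    return p1 if value == 1 else 1 - p1


def model_x_causes_y(do_x=None):
    """Hypothetical model A: X -> Y. Returns the joint P(X, Y); do_x replaces X's mechanism."""
    joint = {}
    for x, y in product((0, 1), repeat=2):
        p_x = bern(F(1, 2), x) if do_x is None else F(int(x == do_x))
        p_y = bern(F(3, 4) if x == 1 else F(1, 4), y)   # Y listens to X
        joint[(x, y)] = p_x * p_y
    return joint


def model_y_causes_x(do_x=None):
    """Hypothetical model B: Y -> X, tuned to give the same observational joint as model A."""
    joint = {}
    for x, y in product((0, 1), repeat=2):
        p_y = bern(F(1, 2), y)                           # Y ignores X
        p_x = bern(F(3, 4) if y == 1 else F(1, 4), x) if do_x is None else F(int(x == do_x))
        joint[(x, y)] = p_x * p_y
    return joint


def p_y1(joint):
    """Marginal P(Y=1) under a joint distribution over (x, y)."""
    return sum(p for (x, y), p in joint.items() if y == 1)


print(model_x_causes_y() == model_y_causes_x())   # True: the data cannot tell them apart
print(p_y1(model_x_causes_y(do_x=1)))             # 3/4
print(p_y1(model_y_causes_x(do_x=1)))             # 1/2
```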

    > A child learns causal structure by playful manipulation, and this is how a scientist learns causal structure — playful manipulation. But we have to have the abilities and the template to store what we learn from this playful manipulation so we can use it, test it and change it. […] That is the first thing that we have to learn; we have to program computers to accommodate and manage that template.
    — Judea Pearl

    Neural nets take sample sizes of thousands or millions, and discover patterns in the data. Humans take a sample size of one and generalise patterns from that seed. This is the difference between learning and intelligence. We’ve built machines that exhibit the former. We’re a very long way from building them to exhibit the latter.

    If you stick with it through all of its 500 or so pages, therefore (and I suggest that you do, and that you read the interviews in order), I think you’ll find, as I did, that “Architects of Intelligence” is far from an ad hoc round-up of the self-aggrandising thoughts of the current crop of machine learning geniuses. It is in fact a dense and comprehensive overview of the field, and a great map of both its achievements and its very real challenges.

    While it doesn’t have much technical detail, what it does have is genuine philosophical and conceptual nuance. I came away from reading it confirmed in what I’ve been feeling for some time: that artificial general intelligence is much further off than many would have us believe. Far from being a disappointment, this means the current excitement around deep learning is only the first of many such flowerings that will keep humanity busily engaged for the next few centuries at least, as we edge our way towards an increasingly profound understanding of what it means to build a mind.