Research in Reading

Navigating the C++ forest

2018-05-07T02:17:00.000-07:00

I have recently been working my way through 'C++ Primer', 5th edition, by Lippman, Lajoie and Moo. The front cover says the book has been a bestseller since 1986, and that the most recent edition has been completely rewritten for the new C++11 standard. The authors have worked in leading companies and laboratories such as Bell Laboratories, Pixar, Microsoft, IBM and AT&T, and they have worked closely with the creator of C++, Bjarne Stroustrup. The contents of the book live up to the billing on the cover!

I learnt some C++ a long time ago with the help of a book called 'C++: A Beginner's Guide', 2nd edition by Schildt and a university course based on the book 'Guide to Scientific Computing in C++' by Pitt-Francis and Whiteley. These books are both fairly introductory. I would recommend them, not least because they are both much shorter than C++ Primer.

The advantages of C++ Primer, in my opinion, are that it gives a more comprehensive overview of the language and has a strong emphasis on Modern C++, e.g., features that were introduced in the C++11 standard. Somehow C++ Primer achieves this without being dry!

Another popular C++ book is Bjarne Stroustrup's 'The C++ Programming Language'. I borrowed this from my university library before I bought C++ Primer and didn't really like it. I think it covers advanced material in more depth than C++ Primer, but it reads much more like a reference book than a tutorial. It is longer than C++ Primer, and I suspect it contains a lot of material that is not relevant to what I'm doing.

The main thing that I've learnt over the last few months is that C(++) is not really one programming language.

Although you can compile and run programs written in C, old-style C++, and Modern C++ using the same compiler, they look and feel very different from each other. When I only knew C and old-style C++ I found it very difficult to understand programs written in Modern C++.

The C language first appeared in 1972. If you write in C, then you have to learn about pointers and use arrays if you want to do calculations with vectors and matrices. Dynamic memory allocation (requesting memory at run time and remembering to release it yourself) is particularly awkward. There are not that many concepts to learn, and I think it is not a bad language to learn if you are prepared to learn some basic low-level programming and want to write code that runs quickly. Because C is so simple, it tends to be easier to interface with other languages, such as Fortran and Matlab.

C++ introduces features that are not available in C, such as function overloading, inheritance, and data abstraction. It is designed to facilitate Object-Oriented Programming (OOP). Object-oriented code separates interface from implementation: the idea is to give full control to library developers and stop users from doing anything stupid. I generally find object-oriented code difficult to read. I like to work in an environment where I can implement algorithms myself, but also have access to a library of algorithms that I can call easily. Old-style C++ does not really seem to be set up for this.

In contrast, it is relatively easy to learn Modern C++, at least to the level where you can use it for simple programming tasks. There doesn't seem to be a precise definition of Modern C++, but this page has a fairly good summary - https://docs.microsoft.com/en-gb/cpp/cpp/welcome-back-to-cpp-modern-cpp. I think it is fair to say that I knew almost nothing about Modern C++ from the initial learning I did (which was around 10 years ago).

Although I am not generally a fan of new standards for programming languages, I have noticed that a lot of answers on Stack Overflow to C++ questions use C++11. Often solutions for different standards are presented, and the C++11 version looks nicer / more elegant.

If you want to write programs that run fast without having to learn much / anything about low-level programming, Modern C++ is probably the way to go. The vector and string classes in the Standard Library implement dynamic memory allocation without the user having to know anything about how this is implemented. So it is possible to write clean looking programs that are actually doing quite sophisticated operations.

Really understanding Modern C++, to the point where you can write high-quality libraries, requires a lot of effort. To give you some idea, C++ Primer starts with a section called The Basics, which runs to about 300 pages. It is long because there is a lot to cover, not because the writing is verbose! Object-Oriented Programming (including concepts such as inheritance) is not discussed until p600, and the section on Advanced Topics starts on p715.

In summary, the world of C++ is difficult to navigate, at least in my experience. C++ Primer is the best guide that I have found. Chapter 1 of C++ Primer on Getting Started (only 30 pages) is particularly impressive for the range of ideas that it introduces. I would recommend C++ Primer to people who are literally just getting started with programming as well as to people who want to write high-quality libraries.

an ancient annal of computer science

2018-03-29T01:44:00.000-07:00

Over the last year I have been interested in developing my programming / coding skills, to get to the point where I can be more confident about sharing my code with other people, and also to be able to contribute to general purpose numerical / statistical software.

As part of this effort I have dipped into The Art of Computer Programming (TAOCP) by Donald Knuth. The cover says "this multivolume work is widely recognized as the definitive description of classical computer science." American Scientist listed it as one of the 12 top physical-science monographs of the 20th century alongside monographs by the likes of Albert Einstein, Bertrand Russell, von Neumann and Wiener - http://web.mnstate.edu/schwartz/centurylist2.html.

I am sure there are many other books that cover similar material at a more introductory level, but I find something exciting about going back to the source and reading an author who was personally involved in fundamental discoveries and developments.

There are also probably more modern accounts of computer programming that better reflect more recent innovations. Knuth himself encourages readers of TAOCP to look at his more recent work on Literate Programming. But I also think it is worth dwelling on things that have proven to be useful to a wide range of people over an extended period of time.

I have Volume 1 in the Third Edition of TAOCP, published in 1997, which is already prehistoric in some senses - it is before Google was founded (1998) and way before Facebook was launched (2004). However parts of the book date a lot further back than that - Knuth's advice on how to write complex and lengthy programs was mostly written in 1964!

Here is a summary of that advice (p191-193 of TAOCP Volume 1):

Step 1: develop a rough sketch of the main top-level program. Make a list of the subroutines / functions that you will need to write. "It usually pays to extend the generality of each subroutine a little."
Step 2: create a first working program, starting from the lowest-level subroutines and working up to the main program.
Step 3: re-examine your code, starting from the main program and working down, studying for each subroutine all the calls made on it. Refactor your program and subroutines.

Knuth suggests that at the end of Step 3 "it is often a good idea to scrap everything and start again". He goes on to say "some of the best computer programs ever written owe much of their success to the fact that all the work was unintentionally lost, at about this stage, and the authors had to begin again." - quite a thought-provoking statement!

Step 4: check that when you execute your program, everything is taking place as expected, i.e., debugging. "Many of today's best programmers will devote nearly half their programs to facilitating the debugging process in the other half; the first half, which usually consists of fairly straightforward routines that display relevant information in a readable format, will eventually be thrown away, but the net result is a surprising gain in productivity."

I don't know whether today's best programmers still do this. I know some pretty good programmers and have been surprised by how much effort they devote to the kind of activity that Knuth is describing. Personally, I now rely quite a lot on the debugger in Visual Studio, and (indirectly) on compilers, to give me most of the debugging information I need for not much effort.

Pensions for professors

2018-03-02T02:26:00.001-08:00

It is not often that universities make front page news but the recent strike by university lecturers seems to have got quite a lot of media coverage.

On the surface it looks like quite a straightforward dispute about money. University vice-chancellors (represented by a body called Universities UK) are proposing to reduce the pensions that university staff will receive in the future. The reason they are doing this is that existing contributions to the pension fund for universities (the USS) are not expected to cover the cost of future pensions.

Daniel Finkelstein, a political commentator I have a lot of respect for, has said that lecturers are striking against themselves. He argues that increased contributions from universities to the USS would have a damaging effect on university lecturers: to pay for the increased contributions, universities would have to pay lecturers a lower salary, employ fewer of them, or both.

He also argues that it would be unfair for the government to increase funding to universities in order to pay generous pensions at a time when the NHS is strapped for cash, prisons seem to be nearing a state of anarchy and universities are already generously funded by students through expensive tuition fees. A large chunk of these tuition fees may end up being paid by the government if students are unable to pay back their loans.

While I find this line of reasoning quite persuasive, it seems to be predicated on the assumption that there will be an indefinite squeeze on the nation's finances. As a country we have had around 7 years of government austerity. Recent news suggests that this austerity has been successful in eliminating the government deficit from around £100bn a year down to zero - https://www.ft.com/content/3f7db634-1cac-11e8-aaca-4574d7dabfb6.

So will the squeeze be indefinite or are we approaching the end of it? Nobody really knows. As of 12 months ago, the OBR, which produces official forecasts of the government deficit, was still forecasting a large deficit for 2018-19. But tax receipts have been a lot stronger than expected. Speaking from personal experience, these things are difficult to forecast!

My view is that economic growth and tax receipts will be stronger than they have been for much of the last 10 years. As a result, the USS will probably not run out of money and if it does, the government should inject some extra cash to keep it afloat. There are many competing spending priorities for the government, but I think that attracting and retaining bright people across the public sector is essential. While there are many who are drawn to the public sector purely with a desire to contribute to society, generous public sector pensions do play a big role in encouraging people to stay. I think these pensions should continue so that public services can flourish as they ought to.

tools for writing code, life, the universe and everything - can anything beat emacs?

2017-12-08T08:36:00.000-08:00

I have recently finished a 6 month placement with NAG (the Numerical Algorithms Group) based in Oxford. One of the things I picked up there was how to use emacs for writing code and editing other text.

Previously I have always written code in programs that are designed for specific languages, such as RStudio or Matlab.

Emacs is designed to be a more generic tool that, in principle, can be tailored to any kind of text editing, including coding. As a popular open source project emacs has many contributed packages. I used it mainly for writing code in Fortran, but it has modes for pretty much every widely used programming language. I also used it for writing LaTeX and for writing / editing To Do lists using Org mode.

Beyond its usefulness as a text editor, emacs has many other functions. For example, it has a shell, which behaves similarly to a command-line terminal but with the useful property that you can treat printed output as you would any other text. I find myself quite frequently wanting to copy and paste from terminal output, or to search for things, such as error messages. This is quick and easy in emacs.

So will I ever use anything other than emacs again,... for anything? I think truly hardcore emacs fans do use it for literally everything - email, web browsing, even games (the emacs amusements). But I am not part of that (increasingly exclusive) club. I find emacs a pain for things that you do infrequently - a shortcut isn't really a shortcut if you have to use google to remind you what it is!

I think the two main selling points of emacs are (i) anything that you do repeatedly using a mouse, you will be able to do at least as quickly in emacs, (ii) it does great syntax highlighting of pretty much any kind of text.

Statistics in medicine

2017-03-30T09:06:00.001-07:00

Last week I went to the AZ MRC Science Symposium organised jointly by AstraZeneca and the MRC Biostatistics Unit. Among a line-up of great speakers was Stephen Senn, who has an impressively encyclopaedic knowledge of statistics and its history, particularly relating to statistics in medicine. Unfortunately his talk was only half an hour and in the middle of the afternoon when I was flagging a bit, so I came away thinking 'that would have been really interesting if I had understood it.' In terms of what I remember, he made some very forceful remarks directed against personalised medicine, i.e., giving different treatments to different people based on their demography or genetics. This was particularly memorable because several other speakers seemed to have great hopes for the potential of personalised medicine to transform healthcare.

His opposition to personalised medicine was based on the following obstacles, which I presume he thinks are insurmountable.

Large sample sizes are needed to test for effects by sub-population. This makes it much more expensive to run a clinical trial than the more traditional case where you only test for effects at the population level.
The analysis becomes more complicated when you include variables that cannot be randomized. Most demographic or genetic variables fall into this category. He talked about Nelder's theory of general balance, which can apparently account for this in a principled way. Despite being developed in the 1970s it has been ignored by a lot of people due to its complexity.
Personalised treatment is difficult to market. I guess this point is about making things as simple as possible for clinicians. It is easier to say use treatment X for disease Y, instead of use treatment X_i for disease variant Y_j in sub-population Z_k.
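The first point can be made concrete with a standard back-of-the-envelope calculation (my own illustration, not something from the talk). Assume a two-arm trial with n patients in total, equal allocation, a continuous outcome with variance σ² in every group, and a binary sub-population variable that splits each arm exactly in half. Let Δ be the overall treatment effect, Δ_1 and Δ_2 the effects within the two sub-populations, and Δ_1 - Δ_2 the treatment-by-subgroup interaction.

```latex
\begin{align*}
\operatorname{Var}(\hat\Delta) &= \frac{\sigma^2}{n/2} + \frac{\sigma^2}{n/2} = \frac{4\sigma^2}{n}
  && \text{(overall effect)} \\
\operatorname{Var}(\hat\Delta_k) &= \frac{\sigma^2}{n/4} + \frac{\sigma^2}{n/4} = \frac{8\sigma^2}{n}, \quad k = 1, 2
  && \text{(effect within one sub-population)} \\
\operatorname{Var}(\hat\Delta_1 - \hat\Delta_2) &= \frac{8\sigma^2}{n} + \frac{8\sigma^2}{n} = \frac{16\sigma^2}{n}
  = 4 \operatorname{Var}(\hat\Delta)
  && \text{(interaction)}
\end{align*}
```

Under these assumptions, estimating an interaction as precisely as the overall effect takes four times as many patients, and detecting an interaction half the size of the overall effect with the same power takes sixteen times as many.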

Proponents of personalised medicine would argue that all these problems can be solved through the effective use of computers. For example,

Collecting data from GPs and hospitals may make it possible to analyse large samples of patients without needing to recruit any additional subjects for clinical trials.
There is already a lot of software that automates part or all of complicated statistical analysis. There is scope for further automation, enabling the more widespread use of complex statistical methodology.
It should be possible for clinicians to have information on personalised effects at their fingertips. It may even be possible to automate medical prescriptions.

It's difficult to know how big these challenges are. Some of the speakers at the AZ MRC symposium said things along the lines of 'ask me again in 2030 whether what I'm doing now is a good idea.' This doesn't exactly inspire confidence, but it is at least an open and honest assessment.

As well as commenting on the future, Stephen Senn has also written a lot about the past. I particularly like his description of the origins of Statistics in chapter 2 of his book 'Statistical Issues in Drug Development':

Statistics is the science of collecting, analysing and interpreting data. Statistical theory has its origin in three branches of human activity: first the study of mathematics as applied to games of chance; second, the collection of data as part of the art of governing a country, managing a business or, indeed, carrying out any other human enterprise; and third, the study of errors in measurement, particularly in astronomy. At first, the connection between these very different fields was not evident but gradually it came to be appreciated that data, like dice, are also governed to a certain extent by chance (consider, for example, mortality statistics), that decisions have to be made in the face of uncertainty in the realms of politics and business no less than at the gaming tables, and that errors in measurement have a random component. The infant statistics learned to speak from its three parents (no wonder it is such an interesting child) so that, for example, the word statistics itself is connected to the word state (as in country) whereas the words trial and odds come from gambling and error (I mean the word!), has been adopted from astronomy.

Pushing the boundaries of what we know

2017-03-06T01:41:00.000-08:00

I have recently been dipping into a book called 'What We Cannot Know' by Marcus du Sautoy. Each chapter looks at a different area of physics. The fall of a dice is used as a running example to explain things like probability, Newton's Laws, and chaos theory. There are also chapters on quantum theory and cosmology. It's quite a wide-ranging book, and I found myself wondering how the author had found time to research all these complex topics, which are quite different from each other. That is related to one of the messages of the book - that one person cannot know everything that humans have discovered. It seems like Marcus du Sautoy has had a go at learning everything, and found that even he has limits!

I think the main message of the book is that many (possibly all) scientific fields have some kind of knowledge barrier beyond which it is impossible to pass. There are fundamental assumptions which, if taken to be true, explain empirical phenomena. The ideal in science (at least for physicists) is to be able to explain a wide range of (or perhaps even all) empirical phenomena from a small set of underlying assumptions. But science cannot explain why its most fundamental assumptions are true. They just are.

This raises an obvious question: where is the knowledge barrier? And how close are we to reaching it? Unfortunately this is another example of something we probably cannot know.

In my own field of Bayesian computation, I think there are limits to knowledge of a different kind. In Bayesian computation it is very easy to write down what we want to compute - the posterior distribution. It is not even that difficult to suggest ways of computing the posterior with arbitrary accuracy. The problem is that, for a wide range of interesting statistical models, all the methods that have so far been proposed for accurately computing the posterior are computationally intractable.

Here are some questions that could (at least in principle) be answered using Bayesian analysis. What will earth's climate be like in 100 years' time? Or, given someone's current pattern of brain activity (e.g. EEG or fMRI signal), how likely are they to develop dementia in 10-20 years' time?

These are both questions for which it is unreasonable to expect a precise answer. There is considerable uncertainty. I would go further and argue that we do not even know how uncertain we are. In the case of climate we have a fairly good idea of what the underlying physics is. The problem is in numerically solving physical models at a resolution that is high enough to be a good approximation to the continuous field equations. In the case of neuroscience, I am not sure we even know enough about the physics. For example, what is the wiring diagram (or connectome) for the human brain? We know the wiring diagram for the nematode worm brain - a relatively recent discovery that required a lot of work. The human brain is a lot harder! And even if we do get to the point of understanding the physics well enough, we will come up against the same problem with numerical computation that we have for the climate models.

There is a different route that can be followed to answering these questions, which is to simplify the model so that computation is tractable. Some people think that global temperature trends are fitted quite well by a straight line (see Nate Silver's book 'The Signal and the Noise'). When it comes to brain disease, if you record brain activity in a large sample of people and then wait 10-20 years to see whether they get the disease, it may be possible to construct a simple statistical model that predicts people's likelihood of getting the disease given their pattern of brain activity. I went to a talk by Will Penny last week, and he has made some progress in this area using an approach called Dynamic Causal Modelling.

I see this as a valuable approach, but somewhat limited. For its success it relies on ignoring things that we know. Surely by including more of what we know it should be possible to make better predictions? I am sometimes surprised by how often the answer to this question is 'not really' or 'not by much'.

What is computable with Bayesian analysis is still an open question. This is both frustrating and motivating. Frustrating because a lot of things that people try don't work, and we have no guarantee that there are solutions to the problems we are working on. Motivating because science as a whole has a good track record of making the seemingly unknowable known.

Writing tools, silence & parenthood

2016-12-06T03:55:00.000-08:00

Collaborative writing tools

I have been working on a paper recently with two co-authors. It has been a bit of a challenge finding the right pieces of software that will allow us to track edits while remaining in LaTeX. When I worked in the civil service, Word was the de facto software for producing written documents. It was a lot better than I thought it would be, and I still think the Track Changes functionality beats everything else I have tried hands down when it comes to collaborative editing. I also learnt that, using Word, you can produce documents with typesetting that looks professional, if you know what you are doing, and if someone has invested the time in creating a good template for your needs. However in the last couple of years I have returned to LaTeX, because it is what mathematicians use, and because I find it better for equations, and for references.

In the last few weeks I have been trying out Overleaf. This is one of a handful of platforms for LaTeX documents with collaboration tools. As with a lot of good user-friendly pieces of software you have to pay to get the most useful features. With Overleaf, the free service provides a workable solution. Overleaf allows you to access your LaTeX documents through a web browser, and multiple people can edit the same online version. In the free version there are some basic bells and whistles, like being able to archive your work. I found this a bit confusing at first because I thought it was setting up multiple active versions with some kind of forking process. However this is not the case.

By combining Overleaf with git I have been able to fork the development process: I can edit one branch on my local computer (using my preferred LaTeX editor and compiler), while another person edits a different branch in the online version, or potentially on another computer. Using git also makes it easy to create a change log, and visualise differences between different versions, although this doesn't work quite as well for paragraphs of text as it does for code. Unless you put lots of line breaks into your paragraphs, you can only see which paragraphs have changed, and not which individual sentences have changed.

In the news...

2016 is drawing to a close and it has been a pretty shocking year for a lot of people in terms of national and global news. In the last few weeks, I have found an increasing tendency for people to be silent - to not want to talk about certain issues any more (you know what I mean - the T word and the B word). I guess this is partly because some topics have been talked to death, and nothing new is emerging, while a lot of uncertainty remains. However I also find it a bit worrying, that people may no longer be capable of meaningful engagement with people of different opinions and backgrounds. One thing I have become more convinced of over the last year is that blogs and tweets etc. are not a particularly helpful way of sharing political views (a form of silent outrage!?) So maybe the less I say here the better, even though I do remain passionately interested in current affairs and am fairly opinionated.

And in other news...

I have a baby boy! Born 4 weeks ago - both he and my wife are doing well. In the first 2 weeks I took a break from my PhD, and it was a bit like being on holiday, in that we had a lot of time, and a lot of meals cooked for us (by my wonderful mum). It hasn't all been plain sailing, but I am now under oath not to share the dark side of parenthood - especially not with non-parents, in case it puts them off! For the last 2 weeks I have been getting back into my PhD. It is quite hard finding a schedule that works. We have a routine where he is supposed to be more active and awake between 5pm and 7pm, so that he sleeps well between 7pm and 7am. I have been trying to do a bit of work after he is settled in the evening and have found it fairly challenging to be motivated and focused at that time. I have been wondering whether it would work better to try and get up before him in the mornings. I guess it will probably be challenging either way.

Learning about learning

2016-09-06T07:41:00.001-07:00

I recently attended the INCF (International Neuroinformatics Coordinating Facility) short courses and congress in Reading. It was quite wide-ranging with some people working primarily on MRI imaging, others on modelling of synaptic plasticity and learning algorithms, and quite a few other topics.

One area I was not really aware of before the conference was neuromorphic computing, which is about designing and building computing hardware based on principles of how the brain does computation. At the INCF short courses, this was presented by Giacomo Indiveri, and I subsequently looked at an introductory article by Steve Furber, who has led the SpiNNaker project:

http://digital-library.theiet.org/content/journals/10.1049/iet-cdt.2015.0171

I am quite impressed by the dedication of people working in this field. Steve Furber says in his article that SpiNNaker has been 15 years in conception and 10 years in construction. This is enabling fast simulation of large-scale neural models, such as Spaun. On a standard computer, Spaun requires 2.5 hours of computation per second of real time. The system can perform simple cognitive tasks such as reinforcement learning and arithmetic. SpiNNaker aims to run Spaun in real time.
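Taking the quoted figure for a standard computer at face value, the speed-up needed to reach real time is simple arithmetic:

```latex
2.5\,\mathrm{h} \times 3600\,\frac{\mathrm{s}}{\mathrm{h}} = 9000\,\mathrm{s}
\ \text{of computation per simulated second}
\quad\Longrightarrow\quad
\text{real time requires a speed-up of roughly } 9000\times .
```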

In the next few years, as part of the Human Brain Project, SpiNNaker will be used for larger models, and presumably be tested on progressively more demanding cognitive tasks. From my perspective, I am interested to see how large-scale neural models of biological intelligence will compare to engineered intelligence systems such as deep neural networks.

Engineered intelligence is free from the constraint of having to be faithful to biology. This gives it a massive advantage over simulated neural models when it comes to performing tasks. Ideas from biology have been influential in machine learning and artificial intelligence, but they have been heavily supplemented by numerical analysis and statistical computing.

At the moment many machine learning algorithms require huge amounts of computing power, so it will be interesting to see whether any new hardware emerges that can bring this down. It would be cool if state-of-the-art machine learning algorithms that today require the use of a supercomputer could be run on an affordable battery-operated device. And it will be interesting to see whether the new neuromorphic machines that are emerging will drive engineers and scientists to further develop learning algorithms.

Summer reading

2016-08-01T05:44:00.001-07:00

I have recently been reading 'Grit: The Power of Passion and Perseverance' by Angela Duckworth, which I have found both fascinating and persuasive. Duckworth is a psychologist interested in what distinguishes high achievers from people who are talented but achieve relatively little. One of the main messages of the book is that talent counts, but effort counts twice.

Determination, persistence, constancy, tenacity, and focus, especially in the face of setbacks and challenges appear to have a much larger effect on what people achieve than natural talent or innate giftedness.

I wish I could say these were all things I possessed in abundance, but I do not think that is the case. Nevertheless there is cause for hope as grit appears to increase with age. And perhaps being more aware of the importance of these qualities helps to cultivate them more.

In parallel I have been reading The Pickwick Papers by Charles Dickens, which tells the story of a group of kind-hearted friends who travel around rural 19th-century England, making new friends and getting into various kinds of trouble. It is quite good fun but, in my opinion, not as well written as some of his later work, such as Great Expectations. Perhaps this is a case in point: passion and perseverance directed at a single goal over a long period of time can lead to great things.

ISBA 2016

2016-06-20T05:30:00.000-07:00

I got back from ISBA 2016 at the weekend, having spent a week at the picturesque Forte Village resort in Sardinia. Last weekend was also when UK astronaut Tim Peake returned from 6 months on the International Space Station. Although I am sure returning from space requires more adjustment than returning from an international conference, I do feel a bit like I have returned from another planet!

I cannot pretend to give an expert's view of the conference since there were many people there with decades more experience than me. The age distribution of the conference was heavily weighted towards young researchers (perhaps partly as a result of the generous travel support targeted towards this group). Nevertheless the age distribution was very wide, with a fair number of people there in their seventies. One of these was Adrian Smith, who came to the first of the Valencia meetings, and gave an interesting perspective on how Bayesians have gone from being outsiders regarded with a high degree of scepticism to being a dominant force in the world of statistical analysis. A simple illustration of this is attendance at the conference, which has grown from around 70 to around 700 over the course of around 40 years.

One feature of the conference that has remained the same (and perhaps a key ingredient in its continuing success!?) is the cabaret, which features Bayes-inspired entertainment. The proceedings of the first Valencia meeting (which can be found here - http://www.uv.es/bernardo/Valencia1.pdf) printed the song "There's no Theorem like Bayes Theorem", written by the distinguished statistician G.E.P. Box to the tune of "There's no Business like Show Business".

I would strongly advise against searching for a YouTube rendition of Box's song. I do not know whether Box was as good a musician as he was a lyricist (and statistician), but his followers certainly seem to have a rather deficient sense of pitch and harmony.

Here are a few reflections on the academic program of ISBA 2016.

A lot of the talks fell into one of two broad categories. On the one hand, some talks focused on general inference problems, and the development of methodology that should be applicable to a wide range of problems in various application areas. On the other hand, some talks focused more on a specific application area, and looked at the challenge of adapting quite general statistical ideas to specific research questions.

The presenters I found most stimulating were Adrian Raftery on demography and Sylvia Richardson on single-cell gene expression. These were both from the second category of talks (i.e. more oriented to a specific application), but both researchers have also done important work in the first category (i.e. development of generally applicable statistical methodology). For me, their work demonstrates the value of working in both areas. They both have an impressive ability to identify problems that benefit from Bayesian analysis. In Adrian Raftery's demography work, the novel application was the quantification of uncertainty in country-specific fertility rates by pooling information across countries through a hierarchical model. Sylvia Richardson's work on gene expression also used a hierarchical model, but in this case to quantify uncertainty in cell-specific gene expression levels, again by pooling information across cells. The main reason the Bayesian approach is so effective in these problems is the small amount of data that is available per country (in demography) or per cell (in gene expression).
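
To make the pooling idea concrete, here is a minimal sketch of a normal-normal hierarchical model with known variances. It is not the model used by Raftery or Richardson; the function name, the group labels, the variances and the data are hypothetical, chosen only to show that a group with less data is pulled more strongly towards the overall mean.

```python
from statistics import mean

def partial_pooling(groups, sigma2, tau2):
    """Posterior mean of each group's level under an assumed normal-normal model.

    Assumed (hypothetical) model:
        y_ij    ~ Normal(theta_j, sigma2)   observations within group j
        theta_j ~ Normal(mu, tau2)          group levels share a common population
    For simplicity mu is plugged in as the unweighted mean of the group means.
    """
    group_means = {name: mean(ys) for name, ys in groups.items()}
    mu = mean(list(group_means.values()))
    estimates = {}
    for name, ys in groups.items():
        data_precision = len(ys) / sigma2   # information in group j's own data
        prior_precision = 1.0 / tau2        # information borrowed from other groups
        w = data_precision / (data_precision + prior_precision)
        estimates[name] = w * group_means[name] + (1 - w) * mu
    return mu, estimates

# Hypothetical data: group A is well observed, group B has a single observation.
groups = {"A": [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, 2.0, 2.0], "B": [4.0]}
mu, est = partial_pooling(groups, sigma2=1.0, tau2=0.5)
print(mu, est)
```

Here the overall mean is 3.0; group A's mean of 2.0 only moves to about 2.2, whereas group B's single observation of 4.0 is pulled to about 3.33, because B carries much less information of its own.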

Although I found some of the presentations on general methodology quite stimulating (such as Peter Green's keynote lecture on structural learning), there were quite a few presentations which I felt were not well motivated, at least not in a way that I could understand. One area with quite a few presentations this year was Bayesian variable selection for linear regression models. In that setting you assume that the data are i.i.d., and variable selection can be thought of as a kind of model uncertainty, often encoded through the choice of prior. The reasons I am somewhat sceptical about this kind of research are that (i) the linear regression model may not be sufficiently complex to model the true relationships between the variables in the dataset, and (ii) if the linear regression model is appropriate, then the most important predictors for a given response can usually be picked up through a non-Bayesian analysis such as a forward selection or a backward elimination algorithm. This is based on my experience of fitting and using regression models as a practitioner in operational research.
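
For readers unfamiliar with forward selection, here is a minimal sketch in Python using NumPy: it greedily adds whichever predictor most reduces the residual sum of squares. The simulated data, the function names and the fixed `max_vars` stopping rule are illustrative assumptions; a practical version would normally stop using a criterion such as an F-test or AIC.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit with an intercept and the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def forward_selection(X, y, max_vars):
    """Greedily add the predictor that most reduces the RSS, up to max_vars predictors."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Simulated data: only columns 1 and 4 actually influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=200)
print(forward_selection(X, y, max_vars=2))
```

With signals this strong relative to the noise, the greedy search should pick out columns 1 and 4.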

To wrap up, I am deeply grateful to all the people who make the Bayesian analysis community what it is, through their research findings, through the hard administrative labour that must go into organising large scientific meetings, and through personal warmth and encouragement. I hope that it continues to be a vibrant community with vigorous exchanges between scientific applications and mathematical theory.

ABC in Helsinki

2016-05-23T05:02:00.000-07:00

Last week I went to a conference on ABC (Approximate Bayesian Computation) that was held on the Silja Symphony cruise ship between Helsinki and Stockholm. Thank you to Michael Gutmann and the other organisers for the opportunity to go on an exotic Nordic odyssey with a stimulating scientific programme!

I came to the conference as something of an outsider, as I have not used ABC in my research so far. The main reasons I am interested in the area are that my supervisor Richard has done some work on ABC, with applications mainly focused on population genetics, and that a lot of people who work with differential equation models are attracted to it.

I was interested in answering the following questions. When is ABC the preferred method compared to other statistical methods? And in what situations should ABC be avoided?

The short answer to the first question is that ABC is used when the likelihood of the model is not available. This may be because the likelihood is an intractable integral. For example, in population genetics the likelihood can only be obtained by integrating over the space of phylogenetic trees. As an aside, there is an alternative method for the population genetics example, which is the pseudo-marginal method. It is not clear to me whether ABC or pseudo-marginal methods are preferable for this example.
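
The core of ABC can be sketched in a few lines: draw parameters from the prior, run the simulator, and keep the draws whose summary statistics land within a tolerance of the observed ones. Below is basic rejection ABC in Python, assuming a toy Normal simulator with the sample mean as the summary statistic and a uniform prior; all names are hypothetical. In this toy case the likelihood is actually tractable, so the example only illustrates the mechanics.

```python
import random
from statistics import mean

def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def rejection_abc(observed, prior_sample, simulator, summary, tolerance, n_proposals, rng):
    """Basic rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_proposals):
        theta = prior_sample(rng)
        s_sim = summary(simulator(theta, len(observed), rng))
        if abs(s_sim - s_obs) < tolerance:   # distance between summary statistics
            accepted.append(theta)
    return accepted

rng = random.Random(1)
observed = simulate(2.0, 50, rng)            # pretend data generated with theta = 2
posterior = rejection_abc(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulator=simulate,
    summary=mean,
    tolerance=0.1,
    n_proposals=20000,
    rng=rng,
)
print(len(posterior), mean(posterior))
```

The accepted values form an approximate posterior sample; shrinking the tolerance makes the approximation better at the cost of accepting fewer proposals.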

As well as situations where the likelihood is intractable (another example of this is the transition density in nonlinear SDEs), there may be situations where it is not possible to write down an expression for the likelihood at all. I have not encountered any situations like this myself, although I am told that a lot of agent-based or rule-based models fall into this category. In Reading there are people using these kinds of models to describe the population dynamics of earthworms and their interaction with pesticides.

Several of the conference talks (Richard Everitt, Richard Wilkinson and, in an indirect way, Michael Gutmann) focused primarily on the use of ABC with expensive simulators, such as climate simulators or epidemiology models. These talks mainly confirmed my view prior to the conference that expensive simulators should be avoided wherever possible, because there are severe limitations on what you can do for parameter inference.

On a more positive note, there were examples in the talks where ABC was made tractable in models with 1 or 2 unknown parameters by combining ABC with other methods such as bootstrapping (RE) and Gaussian Processes (RW and MG). The basic idea is that if we knew the probability distribution of the model's summary statistics as a function of the unknown parameters, we would be done. Tools like bootstrapping and Gaussian Processes can be used to estimate this distribution, and they can also quantify the uncertainty in the resulting estimators. This is not ideal, since the relatively small number of simulator runs available for these estimators means that their error may be quite large and difficult to quantify accurately. However, if you are only interested in classifying parameter sets as plausible vs non-plausible, or you only need estimates of mean parameter values, you may not need that many samples.
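
A simpler relative of these ideas is the Gaussian "synthetic likelihood": at each candidate parameter value, run the simulator several times, fit a Normal distribution to the simulated summary statistic, and evaluate the observed summary under it. The sketch below is not the bootstrap or Gaussian Process methods from the talks; it assumes a hypothetical toy simulator, a single scalar summary statistic, a coarse grid for one parameter, and only 30 runs per grid point, which is where the estimation error discussed above comes from.

```python
import math
import random
from statistics import mean, stdev

def summary_of_simulation(theta, rng):
    """Hypothetical stochastic simulator, reduced to one summary statistic."""
    data = [rng.gauss(theta, 1.0) for _ in range(20)]
    return mean(data)

def synthetic_loglik(theta, s_obs, n_runs, rng):
    """Gaussian synthetic log-likelihood of the observed summary at theta.

    The simulator is run n_runs times, a Normal distribution is fitted to the
    simulated summaries, and the observed summary is evaluated under that fit.
    """
    sims = [summary_of_simulation(theta, rng) for _ in range(n_runs)]
    mu, sd = mean(sims), stdev(sims)
    return -0.5 * ((s_obs - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

rng = random.Random(0)
s_obs = 1.0
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
loglik = {theta: synthetic_loglik(theta, s_obs, n_runs=30, rng=rng) for theta in grid}
best = max(loglik, key=loglik.get)
print(best, loglik)
```

Increasing `n_runs` makes each estimated log-likelihood less noisy, which is exactly what becomes unaffordable with an expensive simulator.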

It is unclear to me what you would do when you have an expensive simulator with more than 2 unknown parameters. I am not sure that the methods presented in the conference would work that well in that setting, without (i) a lot of computing power, and/or (ii) strong assumptions on the probability distribution of the summary statistics as a function of the parameters.

One area that was not covered at the conference but that I would like to know more about is what to do when you have a deterministic simulator, like an ODE model. I have come across this situation in the literature, for example in the work of Michael Stumpf's group, where ABC has been used.

Suppose that a deterministic simulator is a perfect model of the process we are interested in. Then repeated samples from the true process should yield exactly the same data. Furthermore, we should be able to identify a parameter set that perfectly reproduces the experimental results (or multiple parameter sets if the model is non-identifiable). In practice, a far more common situation is that repeated samples from the true process give varied results even though the process model is deterministic, and a deterministic simulator has no way of accounting for this variability. Two options in this situation are (i) introduce some stochasticity into the model, for example observation noise, or (ii) define ranges for the summary statistics, so that parameter values are accepted if the deterministic simulation produces summary statistics that fall within the pre-specified ranges.

If we choose option (i), I would suggest using a likelihood-based approach first, and trying ABC only if that didn't work. If we choose option (ii), this fits within the ABC framework. However, in the standard ABC framework, the effect of reducing the tolerance on the posterior should diminish as the tolerance goes to 0. That is, for small tolerances, dividing the tolerance by 2 should have a very small effect on the credible regions for the parameters. With a deterministic simulator, the effect of reducing the tolerance does not diminish. I haven't worked all this out rigorously, but intuitively it seems that dividing the tolerance by 2 will always decrease the uncertainty by a factor of 2. Furthermore, the limit (tolerance = 0) is singular with respect to all non-zero tolerance values.
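
This intuition can be checked numerically in a toy case. The sketch below assumes a hypothetical deterministic simulator whose summary statistic is exactly 2θ, a uniform prior on [0, 1] represented by a fine grid, and an observed summary of 1.0. Because there is no noise, the accepted set is the interval [0.5 - ε/2, 0.5 + ε/2], so its width halves every time the tolerance ε is halved instead of settling down to a limiting posterior.

```python
def deterministic_simulator(theta):
    """Hypothetical deterministic model: the summary statistic is exactly 2 * theta."""
    return 2.0 * theta

def accepted_width(s_obs, tolerance, grid):
    """Width of the set of grid values accepted by ABC at the given tolerance."""
    accepted = [t for t in grid if abs(deterministic_simulator(t) - s_obs) <= tolerance]
    return max(accepted) - min(accepted)

grid = [i / 100000 for i in range(100001)]   # uniform prior on [0, 1] as a fine grid
for tol in [0.2, 0.1, 0.05, 0.025]:
    print(tol, accepted_width(1.0, tol, grid))
```

With a stochastic simulator, by contrast, the accepted set would stop shrinking once the tolerance fell well below the natural spread of the simulated summaries.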

A quick Google Scholar search shows that Richard Wilkinson has thought about these issues. The gist of what he is saying seems to be that, rather than treating the tolerance as a parameter that controls the accuracy of the approximation, it can be thought of as a parameter in a probabilistic model for the model error. If anyone knows of any other work that discusses these issues, please let me know! As you can see, I am still rather a long way from being able to provide practical guidance on the use of ABC.

Parallels between finance and academic research environments

2016-05-09T14:15:00.002-07:00

I recently went to an event titled 'Heart of the City: What is Really Wrong with our Financial System?' where John Kay and Joris Luyendijk were speaking. John Kay is an economist and columnist for the Financial Times. Joris Luyendijk is an anthropologist by training who has done some financial journalism for The Guardian.

It was interesting learning about the workings and culture of the City, and I also found myself wondering about parallels in other walks of life.

John Kay emphasised that, traditionally, the financial sector has provided a useful service by enabling capital accumulated by one individual or corporation to be borrowed and used for investment by another. In recent years this idea of banking as a service has been lost, as banks have sought to maximise short-term profits by trading worthless bits of paper for a lot of money and taking a cut each time (this is somewhat simplistic; John Kay explained it better!).

While it is useful to establish what is going on in the financial sector, it is also important to understand why it is happening. What are the causes of the unhealthy changes seen in finance? Joris Luyendijk's answer was that liquidity has replaced loyalty. By loyalty he meant the loyalty that employees of City banks feel towards the companies they work for. And by liquidity he meant how frequently City workers move between banks. His view was based on a fascinating collection of evidence obtained from interviewing over 200 City workers, which he wrote up as a book, "Swimming with Sharks."

For example, it is apparently very easy to lose your job in the City if you do not conform to what your manager asks of you, or if you miss the ambitious profit targets that banks set. People are terrified of losing their jobs, and this means that they are willing to compromise on almost anything, whether that is working extremely long hours or not blowing the whistle on others' malpractice. This leaves very little time for thinking about whether the bank's end-product has genuinely been of service to the bank's customers.

For me, all this raises some interesting questions for academic research. Should academic research be thought of as a service? If so, who is it a service for? I have met several academics who see their work primarily as finding out about interesting things for their own sake. This view is sometimes justified by saying that historically many important advances have come about through exactly that type of person being free to pursue their interests. Personally, I think it is useful to think of academic research as serving some purpose, as this helps to direct where effort should be applied. However, I do acknowledge that it is not always possible to identify in advance which discoveries will lead to the greatest impact.

In terms of liquidity, there are some shortcomings to the current academic research system. Academics can often be under a lot of pressure to produce short-term results in the form of publications in high-profile journals, so that they can win research funding and permanent positions. This can sometimes get in the way of longer-term research goals.

However, on the whole I think there should be both loyalty and liquidity in academia. It is a good thing for academic researchers to move between research groups, picking up different tools and perspectives in different places. And it is also good for academics to have long-term affiliations to a group or a particular academic sub-community, to allow relationships to develop and paths to be fully explored.

In (reflective) praise of DJCM

2016-04-20T03:37:00.003-07:00

The scientist, David J. C. MacKay, passed away recently. I have always been impressed by the wide range of people he influenced, but I was particularly surprised, at his passing, to see what a diverse range of people paid tribute to him, from young scientists working in a range of disciplines who were taught by him and use his Information Theory, Inference, and Learning Algorithms book, to public communicators of science, a relatively small and exclusive group which he effectively joined when he wrote his book 'Sustainable Energy - without the hot air.'

My favourite tribute was a tweet from David Spiegelhalter, who said of him,

"probably the most intelligent, principled, and fun person I shall ever know."

I am not sure there is much I can add to that. He taught me when I was a master's student in Cambridge. Among all the lectures I have been to, his were probably the most intelligent and fun. And to see his principles, one only had to look at his jumpers, which were clearly designed to eliminate any need for central heating.

I think it would not be an exaggeration to say that I idolised him. Some people idolise great rock stars, or great football players. For me, at the tender age of 21, it was David MacKay.

I really thought, here is a man who has the power to change the world through the force of his intelligence. I have a more nuanced view now, both of what it is possible to change, and of the types of people who are needed to achieve that change.

There are others who I think still see him as an idol. In his obituary, Mark Lynas quotes David MacKay saying, “Please don’t get me wrong: I’m not trying to be pro-nuclear. I’m just pro-arithmetic.” There is obviously something important here: we should try to quantify the costs and benefits of different options for energy production and consumption. However, there also seems to be an implicit suggestion, which I think is misguided, that if everyone were good at arithmetic, we would somehow be able to solve all the world's problems.

In the same obituary, David MacKay is described as a true polymath. Again there is truth here, in that he was able to move nimbly between different scientific fields, from error-correcting codes in computer science, to information theory in genetics, to neural networks for machine learning, to spin models for particle physics. The list could go on and on. However, there is a recurring theme, which is the description of a physical system by a mathematical model. David MacKay was great at analysing models and doing inference for them, and he had a very good understanding of probability that allowed him to apply a relatively small set of principles to a wide range of scientific problems.

Nevertheless, I am uncomfortable with the epitaph 'true polymath'. As far as I am aware, David MacKay had no great interests outside of science and the application of science in public policy. This is not a criticism of him as a person; I think it is very important that such people exist in society. However, there are many other things in life to enjoy and to be curious about - literature, food, music, philosophy etc. I think that a 'true polymath', of which there are very few, should be able to think in several different ways. Descartes is a good example. He invented Cartesian coordinates and he probed the nature of the soul / existence.

What does the future hold for the areas where David MacKay did have greatest impact? And how can we best ensure that his work and life live on in some sense? There is clearly still a lot of work to be done.

Within statistical science and machine-learning there is still far too much of a tendency for people to rely on statistical tests that they don't understand very well (such as calculating p-values), and to just keep on trying statistical methods until something works, without really thinking problems through from first principles. Statisticians need to spend more time (as David MacKay did during his lifetime) teaching people about probability, particularly scientists.

Within energy and climate policy, the all-out war between climate scientists and climate sceptics seems to have died down, but without either side really having learnt very much from the other. Government work is modularised between different departments, which can create deep divisions. An example is the division between the Treasury, which sees things through a 5-year economic prism, and the Department of Energy and Climate Change (DECC), which, put somewhat simplistically, thinks that reducing carbon emissions in the UK should be the number 1 priority of government. There is a lot more that could be said about this, and I am probably not the best-placed person to say it. However, I do think more needs to be done to develop a sustainable economy and preserve our environment in the long term (i.e. in the 100-year time frame), while also attending to the natural desire most people have to live a comfortable life in the short term (i.e. in the 5- to 10-year time frame).

I would like to finish on a personal note. I remember talking to David MacKay about doing a PhD with him. (This was before I knew he was leaving academia to do public policy work.) He asked me what I was interested in researching. His question caught me slightly off-guard, as I hadn't really thought about it that much. I was also a bit embarrassed about sharing the half-formed ideas I did have with someone who I was in awe of. In the end that conversation didn't directly lead to anything, but it did help to instill in me a strong sense of curiosity, and a desire to identify research problems that are interesting and important to me. This is something that I think all academics should be cultivating, both in themselves and in others.

Machines surpass humans in gaming world

2016-03-09T06:54:00.001-08:00

This week is significant because, for the first time, a computer has beaten the world's best Go player.

http://www.theguardian.com/technology/2016/mar/09/google-deepmind-alphago-ai-defeats-human-lee-sedol-first-game-go-contest

The contest is best-of-five, and only the first game has been played, so it could all change, but at the time of writing AlphaGo, the machine that Google DeepMind have trained to play Go, is ahead.

Why is this significant? In the 1990s IBM's Deep Blue beat the world chess champion Garry Kasparov. Two of the key ingredients of this success were brute-force search algorithms and a knowledge of chess strategy that was hard-wired into the computer program.

The number of possible Go games is much greater than the number of possible chess games, because there is a much wider choice of moves in Go than in chess. When professional Go players are asked how they decide their next move, they say they rely heavily on intuition, whereas a professional chess player will always know why they made a particular move, making it a lot easier to program a chess strategy into a computer. So AlphaGo is impressive because it is playing a harder game, and one in which professional players depend on highly developed subconscious decision-making processes.
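
A rough way to see the difference in scale is to estimate the size of the game tree as b^d, where b is the typical number of legal moves per turn and d is the typical game length. The figures used below (b ≈ 35, d ≈ 80 for chess; b ≈ 250, d ≈ 150 for Go) are commonly quoted ballpark estimates, not exact counts, and the function name is just for illustration.

```python
import math

def log10_tree_size(branching, depth):
    """log10 of branching**depth, i.e. the rough number of move sequences in a game tree."""
    return depth * math.log10(branching)

chess = log10_tree_size(35, 80)
go = log10_tree_size(250, 150)
print(f"chess ~ 10^{chess:.1f}, Go ~ 10^{go:.1f}")
```

This gives roughly 10^123 to 10^124 move sequences for chess and about 10^360 for Go, far too many for brute-force search alone.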

But what I find even more interesting is that the algorithms used to train AlphaGo are very general-purpose algorithms that have been applied to a diverse range of Artificial Intelligence problems. For example, Google DeepMind used the same ideas / framework to train a computer to play Atari video games at the level of a professional video games tester.

Demis Hassabis said he was more impressed by Garry Kasparov than by Deep Blue when Kasparov was beaten. Garry Kasparov can do many other things apart from play chess - speak several languages, write books, tie his shoe-laces etc., whereas Deep Blue is a specialised intelligence for playing chess. We are still a long way from machines being able to outperform humans in every domain, but I think it is fair to say that the last 20 years have seen a lot more progress in general-purpose machine intelligence than the last 200 years have seen in human intelligence.

Will machine intelligence reach some kind of saturation well below human levels? Or will it catch up? Or even surpass human intelligence? The third option is starting to seem like an increasingly plausible idea to me.

For more, watch Demis Hassabis's recent lecture in Oxford,

https://podcasts.ox.ac.uk/artificial-intelligence-and-future

A statistician's bookshelf

2016-02-02T02:12:00.002-08:00

What should a statistician have on their bookshelf? In an age where most academics have access to electronic journal archives and Wikipedia is a fairly reliable source of information for basic facts, some statisticians might say they don't need anything on their bookshelf.

I often find myself eyeing up other people's bookshelves. My co-supervisor in Reading, Ingo Bojak, has quite a voluminous bookshelf covering advanced particle physics and neuroscience. A large personal library can be quite imposing in a way, making you feel small. But it can also inspire curiosity. A couple of the conversations I have recently had with Ingo ended with him picking a book off his shelf and lending it to me. It can feel like a rite of passage, being lent a book, and in this case it was also practically useful because Ingo has books that are not in the University of Reading library.

My main supervisor, Richard, has quite a different approach to his bookshelf. As well as a handful of core Statistics books, mainly related to his undergraduate teaching, he has a lot of proceedings - from the Valencia meetings, a series of influential meetings in Bayesian Statistics that has now morphed into the ISBA conference, and from Read Papers at the Royal Statistical Society. So occasionally when we are meeting, he will say, 'Oh yes, I remember somebody talking about something to do with that at a conference / Read Paper I went to 15 years ago' and then be able to look it up in the proceedings to remind himself of more of the details.

The other interesting thing I have seen on his and on other people's bookshelves is undergraduate / postgraduate course material, which seems to provide useful reference material for some people, even several decades after the end of their course. My dad, who is an engineer for Total, recently commented that he wished he had had his own MSc thesis from the late 70s with him at a meeting, because he wanted to look something up in it that was relevant to what they were discussing.

I think most of the books I have on my bookshelf have grown out of successful undergraduate / early graduate lecture courses (see photo below). A good lecture course distills the lecturer's knowledge, bringing out the ideas and methods that they think are useful for practice or research in the subject. A good textbook is one where you feel like you can converse with the author(s), asking 'How would you go about... ?', and getting a specific answer back.

I have recently been finding the Shumway & Stoffer book on time series tremendously useful. It is freely available online, and I used it in that format for quite a while, but I am glad I have a hard copy of it now, as I find it much easier to read from paper. They have a very skilful way of presenting many important results from the time series literature (which is vast) in a coherent way, tying together time-domain and frequency-domain approaches, which are often treated separately by different authors.

The Mathematical Foundations of Neuroscience is a book I am currently borrowing from the Reading library, and has been quite useful for getting a better understanding of the model that I am doing statistical inference for. I think this is something that statisticians should do more of - finding out more about the models and underlying science in whatever application area they are working in. Anecdotally, I have found lots of scientists that need help from statisticians, but can only be helped if a statistician is prepared to invest the time in actually learning something about the scientist's subject.

The other textbooks on my bookshelf cover a varied range of topics within statistics and machine learning. They have proved practically useful on a number of occasions. Here are a few specific examples (which are far from exhaustive in terms of what the books cover):

Monte Carlo algorithms (MacKay)
Kalman Filters (Bishop)
Regression analysis (Ramsey & Schafer)
Boosting (Hastie, Tibshirani & Friedman)
Basic hierarchical models (Gelman et al)
Dynamic hierarchical models (Cressie & Wikle)

I sometimes flick through the books and think that there is still an awful lot there that I haven't looked at. Who knows what these books might still have to teach me?

The Laplace family

2016-01-05T04:02:00.001-08:00

Last year I went to a reading of letters written by my great-great-grandfather's family during the First World War. Although they were not particularly wealthy or famous, it was nevertheless fascinating to hear about what they were all doing in 1915. It was also quite humbling, in that they, along with millions of others across Britain and further afield, did great things - leaving home at a young age to fight in a foreign country, looking after the wounded both in the field and closer to home, and working in government at a very difficult time for the country. It was also fascinating to meet other members of my extended family, some of whom I had never met before. Despite not having met them, I did feel as though there was a shared identity, as if the experiences of the family from 100 years ago had somehow been transmitted down all branches of the family tree.

Around the same time last year, I went to the funeral of Alexei Likhtman, a maths professor at Reading who tragically died in a hiking accident. Hearing some of his closest colleagues and students speak at the funeral really brought it home to me how strong the bonds between academic colleagues are. Several of them described the relationship as being like family, and this analogy is especially apt for the relationship between a PhD supervisor and their student.

So all of this has made me more curious to find out more about my academic family. I have known for a while that my supervisor's supervisor was Peter Green, quite a prominent statistician, but until recently I hadn't traced my academic lineage back any further. The Mathematics Genealogy Project made it easy for me to find out more. The first thing I found out was that Peter Green's line goes all the way back to Laplace through Poisson and Dirichlet. Among other things, Laplace discovered Bayes' theorem (independently of Thomas Bayes), and applied it to problems in astronomy, as well as making significant contributions to probability theory.

By looking up several other prominent statisticians, I developed the family tree shown below. It is by no means complete - Laplace has around 88,000 descendants recorded on the Mathematics Genealogy Project! Nevertheless, it gives an interesting insight into the relationships between a few of today's top statisticians.

As well as the Laplace family, the Newton family is also quite interesting, although somewhat more exclusive at a mere 12,200 recorded descendants. There is a line directly from Newton to Fisher, as well as to Galton and Pearson. And every person on those lines studied / researched at the University of Cambridge. This is quite different from the Laplace family, who seem to have moved around a lot more. In the 20th century, the Newton family finally managed to break free from the Cambridge bubble. The descendants of Fisher, Pearson and Galton are spread across the UK and the USA, and include Andrew Gelman (from the Galton branch).

As with finding out about my biological family, I have found it quite fascinating and humbling to find out more about my academic family - where I come from and who I am related to. It is also tantalising, in the sense that there is so much information not recorded in the Mathematics Genealogy Project. Laplace is only recorded as having one student and Newton only two - surely they must have had direct influence on a much greater number of people!? And there is also no information about post-PhD collaborations. For example, Herbert Scarf's Wikipedia page says that he travelled between Bell Labs and Princeton every day in the summer of 1953 with John Tukey. So does that make Tukey an adopted member of the Laplace family, and my great-great-great-uncle? It seems we are all more closely related than we might think.

How to structure code for individual research projects

2015-12-09T01:53:00.000-08:00

I have been chatting a bit about coding recently with my supervisor (Richard Everitt). We have a new student joining the group soon, so I think Richard is thinking a bit about what skills they may need to learn, as well as what the rest of us in the group might benefit from. I have found it surprising how much my approach to coding continues to evolve, particularly the way that I organise my code.

I started coding in R during my master's where we had a lot of relatively short assignments that involved some computer programming. For this kind of task it was mostly possible to write a single script with all the code needed for a given assignment in it. That is quite easy to organise.

When I got into longer research projects that involved a significant amount of coding, I found that this approach was no longer very effective. It becomes difficult to keep track of which files are recent and being actively used and which are outdated and redundant. It also becomes difficult to find older versions of your code (e.g. when your most recent code has stopped working for some unknown reason and you want to find what you had last week when it was working).

My first solution to this problem was simply to number my scripts, starting at 1 and working upwards (e.g. r1.r, r2.r, r3.r, etc.). If you put all the scripts in the same folder and give the folder a project name, then it is still relatively easy to find your code, and you have a very easy way of switching between versions, provided that you increment your file names every time you make a significant change.

However this approach is still quite restrictive in the sense that it only really works well when you can put all the code for a given project in one script / file.

After a few iterative improvements, I am currently quite happy with the following approach to organising my code:

Create a new folder for a new project
Create a subfolder for code development that contains numbered scripts.
Develop code in numbered scripts and then when you have an end-product save it in the parent folder and give it a name that describes what it does.
Put any functions that are used in multiple scripts in a functions file so that you can re-use the same version of that function in multiple scripts. In general, the less redundant / duplicated code in your active scripts the better.
Create an output sub-folder to save the results of running long scripts. After you have run a long script and generated an output file, rename it by prefixing it with a date so that it doesn't get accidentally over-written in future.
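A minimal sketch of the renaming step. My own scripts are in R, so treat this Python version as an illustration; the `archive_output` name and the ISO `YYYY-MM-DD_` prefix are my own choices. ISO dates have the side benefit that an alphabetical file listing is also chronological.

```python
import datetime
import pathlib


def archive_output(path, today=None):
    """Rename an output file by prefixing it with a date (YYYY-MM-DD_),
    so that re-running the script cannot overwrite it."""
    path = pathlib.Path(path)
    today = today or datetime.date.today()
    new_path = path.with_name(f"{today.isoformat()}_{path.name}")
    if new_path.exists():
        # Refuse to clobber an earlier archived result from the same day.
        raise FileExistsError(new_path)
    return path.rename(new_path)  # returns the new Path (Python 3.8+)
```

For example, `archive_output("output/results.csv")` run on 2 January 2015 would leave `output/2015-01-02_results.csv`.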
Use a version control system like git with Bitbucket. Version control allows you to easily switch between different versions of your code / project, and to keep a well-structured record of the changes you have made.

It has taken me a long time to become convinced that version control systems like git are worth the effort for individual research projects, but I am a convert now. My main objection was that many of the things I wanted to do were easier to do using Dropbox (e.g. sharing code with other people, and being able to access my work on multiple computers). You can even find old versions of your files through Dropbox. However, in the end I found that going from Dropbox to git / Bitbucket was like going from Word to LaTeX. Once I got the hang of it, it suddenly seemed a lot quicker and a lot less hassle in the long term. RStudio has a nice graphical interface to git, which I now use every day for committing and pushing my code.

Version control is also useful if you ever want to work on software development in a group, or to make your code publicly available.

A (selective) history of Neuroscience in 10 minutes!

2015-11-11T02:00:00.002-08:00

I recently gave a talk in Oxford to the Computational Statistics & Machine Learning group (thanks again to Jeremy Heng for the invitation to speak). I was given a one-hour slot, which is longer than the usual 20-30 minutes that PhD students are given to present their work. Rather than drawing out my results over a longer period (which I would guess might have been quite boring for the audience), I decided to make my introduction longer and talk more generally about the field(s) I work in - Neuroscience and inference in differential equation models. I really enjoyed preparing this material and think it is something that PhD students should be given the opportunity to do more often. Here is what I chose to talk about.

Neuroscience and Artificial Intelligence: 1940-1975, 1990s & recent developments

This could easily be the subject of a book (perhaps in three volumes!). What I said was highly selective (not least because of the gaps in the timeline) and heavily biased by the things I've seen and found interesting. Here is a summary of what I said.

Research in the years between 1940 and 1975 laid the foundations for modern Neuroscience and Artificial Intelligence, and many of the key ideas in the field have their origins in this period. It was also a time when there was a lot of crossover between experimental work and computational / mathematical work:

For example, in the 1940s McCulloch & Pitts developed the model of the neuron that led to what we now know as artificial neural networks. For McCulloch and Pitts, this was a model of how the brain works: they saw neurons essentially as logic gates that were either on or off, each firing when a weighted sum of its inputs exceeded a threshold, and the brain as a network of such gates.
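A McCulloch-Pitts unit is simple enough to write in a few lines. The sketch below uses the weighted-sum-and-threshold form usually presented in textbooks (as far as I know, the original 1943 paper treated inhibitory inputs as an absolute veto rather than a negative weight); the function names are my own. It shows how the familiar logic gates fall out of a choice of weights and threshold.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def AND(a, b):
    # Both inputs must be on to reach the threshold of 2.
    return mcculloch_pitts([a, b], [1, 1], threshold=2)


def OR(a, b):
    # Either input on is enough to reach the threshold of 1.
    return mcculloch_pitts([a, b], [1, 1], threshold=1)


def NOT(a):
    # An inhibitory (negative) weight with threshold 0 inverts the input.
    return mcculloch_pitts([a], [-1], threshold=0)
```

No single unit of this kind can compute XOR; that needs a second layer of units, which is one way of seeing why networks, rather than single neurons, became the object of interest.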

In the 1950s, Hodgkin & Huxley developed a more realistic single-neuron model that linked ionic currents to the voltage inside the cell (what we now know as electrophysiology), for which they won the Nobel Prize in 1963. In my opinion it remains one of the finest achievements in computational science, because it combines a nonlinear differential equation model with experimental results, and because it has fundamentally changed the way people think about how neurons process information. For me, it makes things more mysterious in a way: how does all this electrophysiology end up generating a conscious self that is able to make (reasonably) effective decisions!? I think that is something scientists don't yet have a very good answer to.

Leaving aside those potentially unanswerable philosophical questions, there were also a whole host of practical questions that arose as a result of Hodgkin & Huxley's work. One that is quite important for the work I do is how to relate electrophysiology to the macroscopic observations obtained from EEG and MRI recordings. This is challenging because the human brain contains around 10^11 (roughly 86 billion) neurons, and the Hodgkin-Huxley equations only describe the activity of a single neuron. Coupling together that many neurons in a single model is (currently) computationally intractable, so in the early 1970s Wilson & Cowan developed what is called a mean-field model, which lumps together populations of neurons with similar properties (e.g. excitatory / inhibitory) and describes how the mean activity of each population evolves over time. These models are useful because they still give some insight into electrophysiology, but they can also be related to observations.
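To give a flavour of what a mean-field model looks like, here is a minimal sketch of a simplified Wilson-Cowan system (I have dropped the refractory term of the original equations). E and I are the mean activities of an excitatory and an inhibitory population, each driven through a sigmoid by the other population and by external inputs P and Q. The coupling weights, sigmoid parameters and function names are illustrative choices of mine, not values from the literature, and the integration is plain forward Euler.

```python
import math


def sigmoid(x, gain, threshold):
    """Population response function: fraction of neurons active given input x."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))


def wilson_cowan(P, Q=0.0, w_ee=12.0, w_ei=4.0, w_ie=13.0, w_ii=11.0,
                 tau_e=1.0, tau_i=1.0, dt=0.01, steps=5000, e0=0.1, i0=0.05):
    """Euler integration of a simplified Wilson-Cowan model.

    tau_e dE/dt = -E + S_e(w_ee * E - w_ei * I + P)
    tau_i dI/dt = -I + S_i(w_ie * E - w_ii * I + Q)
    Returns the list of (E, I) pairs, including the initial condition.
    """
    e, i = e0, i0
    trajectory = [(e, i)]
    for _ in range(steps):
        de = (-e + sigmoid(w_ee * e - w_ei * i + P, 1.3, 4.0)) / tau_e
        di = (-i + sigmoid(w_ie * e - w_ii * i + Q, 2.0, 3.7)) / tau_i
        e, i = e + dt * de, i + dt * di
        trajectory.append((e, i))
    return trajectory
```

The point of the exercise is the dimension count: two coupled equations stand in for the activity of a whole population of neurons, which is what makes it feasible to relate the model to EEG-type data.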

In the 1990s, research became more specialised, splitting into sub-fields of neuroscience and artificial intelligence:

Here I am using Statistical Neuroscience mainly to refer to the community of people who do statistical analysis of brain imaging data. And by Artificial Intelligence I mainly mean the development of artificial neural networks to solve problems such as classification.

Recently, more overlap has developed between the different fields (and there is also a lot more total research activity):

My PhD work is mainly in the intersection of Computational Neuroscience and Statistical Neuroscience. There is also interesting work going on in the intersection of Computational Neuroscience and Machine Learning (e.g. Demis Hassabis and Google DeepMind). At UCL, the Gatsby Computational Neuroscience Unit recently co-located with the new Sainsbury Wellcome Centre for Neural Circuits, which is led by Nobel Prize winner John O'Keefe. It will be exciting to see what happens in the intersection of the 3 circles in the future!

One slightly depressing development, from the viewpoint of statisticians, is that state-of-the-art neuroscience models tend to be computationally expensive and hence very challenging to do inference for. I think statisticians should be more vocal in arguing that it is better to have a simpler model whose parameters you can estimate than a more complex model whose parameters you can't.

Box's advice for statistical scientists

2015-10-05T03:04:00.000-07:00

I recently discovered a new word - mathematistry, coined in 1976 by the statistician / data scientist George Box. If I had to guess what it meant, I would think it might be something quite cool like a cross between mathematics and wizardry. In fact, the way George Box defined it, the word is closer to a cross between mathematics and sophistry.

Box's full definition of mathematistry is as follows:

'.. the development of theory for theory's sake, which, since it seldom touches down with practice, has a tendency to redefine the problem rather than solve it. Typically, there has once been a statistical problem with scientific relevance but this has long been lost sight of.'

According to Box, mathematistry also has a flip side which he calls 'cookbookery'. This is

'the tendency to force all problems into the molds of one or two routine techniques, insufficient thought being given to the real objectives of the investigation or to the relevance of the assumptions implied by the imposed methods.'

There is so much useful advice contained in those two concepts! Personally I find it quite easy to forget to 'touch down with practice'. How many times have I spent a long time coming up with a solution to something, only to find that the attempt to put it into practice reveals an obvious flaw that could have been identified quite early on? And I also find it much easier to focus on one way of solving a problem, rather than considering several different options, and investing time in evaluating which is the best approach.

I have recently been having a look at a Kaggle challenge called Springleaf. I am not particularly interested in trying to get to the top of the leaderboard, since the difference between the top 1000 entries seems pretty marginal to me, but I am interested in finding out what makes a big difference to predictive accuracy.

One of the things I have learnt is that being able to make use of all the variables in the data set makes a big difference. Regression trees are very easy to apply to multivariate data sets (e.g. with 100s or 1000s of variables), and I have found them to be quite effective for prediction. There are also relatively straightforward ways of training them to avoid over-fitting.

The other thing I have discovered is boosting. This still seems a bit magical to me. The basic idea is that by combining the predictions of an ensemble of different models, you get a better prediction. (In boosting, the models are fitted one after another, each concentrating on the errors of the ensemble so far, and their predictions are combined as a weighted sum.) To me, it seems like there must be a single model that would give the same predictions as the ensemble. In any case, I have found that boosting makes a surprisingly big difference to predictive accuracy compared with approaches that fit a single model, at least for regression trees.
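To make the mechanics concrete, here is a minimal sketch of least-squares gradient boosting with one-split regression trees ("stumps") on a single input variable. It is a generic textbook version, not the setup I used for Springleaf (which is a classification problem), and `fit_stump` and `boost` are names I have made up. Each new stump is fitted to the residuals of the current ensemble, and the prediction is the mean plus a shrunken sum of the stumps.

```python
def fit_stump(x, r):
    """Best single-split regression tree for residuals r on 1-D input x.

    Requires at least two distinct x values.
    """
    best = None
    xs = sorted(set(x))
    for a, b in zip(xs, xs[1:]):
        split = (a + b) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= split]
        right = [ri for xi, ri in zip(x, r) if xi > split]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda xi: lm if xi <= split else rm


def boost(x, y, n_rounds=100, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    stumps = []

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    for _ in range(n_rounds):
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict
```

The final `predict` is itself one function of x (in this one-dimensional case, a piecewise-constant one), which is one way of seeing that the ensemble is a single, more flexible model; boosting is a way of building it up step by step.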

In The Elements of Statistical Learning by Hastie, Tibshirani & Friedman, there is a performance comparison of different classification methods. They include boosted trees, bagged neural networks and random forests, which (I think) are all different ways of creating an ensemble of models. The one method that didn't create an ensemble was Bayesian neural networks, which use MCMC to explore the parameter space. Bayesian neural networks came out on top, but the ensemble methods were not far behind. This suggests to me that averaging over models and averaging over different parameter values within a sufficiently flexible model have a similar effect. However, it is something I would like to understand in more depth.

I would highly recommend The Elements of Statistical Learning as a great example of statistical science that avoids both mathematistry and cookbookery. The drawback of books is that they go out of date quite quickly, but this one is relatively recent and definitely still relevant. The only thing worth being careful with is references to software packages, as these seem to be evolving on a faster time-scale than the underlying concepts.

Late Summer Musings

2015-09-07T02:21:00.002-07:00

Yesterday I had a go at explaining what I do to someone who doesn't know much about my research area. I always find it interesting to discover what people think statisticians do. This person had been given the job of analysing the results of a survey that had been done in their organisation, which is something that requires some statistical training to do well. However, when it came to explaining what I am doing for my PhD, I found it quite difficult to make a connection between that type of work and what I do. It made me wonder whether the statistics community is more like a sprawling family that has some common ancestry but has ended up doing quite different things? Or are we a group of people from disparate backgrounds who have been drawn together by the need to solve a set of common problems?

I think both of these are potentially useful metaphors for understanding the statistics community. On the one hand there has been branching between people who are practitioners of statistics and people who study mathematical statistics. But at the same time I would argue that all statisticians are interested in being able to draw reliable conclusions from experimental and/or observational data.

Earlier in the summer I went on another APTS (Academy for PhD Training in Statistics) week in Warwick. The courses were Applied Stochastic Processes (given by Stephen Connor) and Computer Intensive Statistics (given by Adam Johansen). So what were these courses about, and how is the material in them helpful for drawing reliable conclusions from data?

Applied Stochastic Processes (ASP)

Stochastic Processes is a branch of probability theory. Some people say that you are either a statistician or a probabilist. Although there are people who study probability theory but are not statisticians, I think that all statisticians must have a little bit of a probabilist inside them! The university courses I have seen in maths and statistics teach probability distributions before they teach statistical inference. Understanding what a p-value is (one of the most basic tools used in statistical inference) requires you to know something about probability distributions.

More generally there are two main reasons (I can think of) why statisticians need probability theory. One is that they study phenomena that are intrinsically random, and are therefore best described by probability distributions. The other is that they use numerical methods that are intrinsically random, and so understanding the behaviour (e.g. convergence) of these numerical methods requires probability theory.

ASP was primarily concerned with the second area, and in particular with the convergence of MCMC algorithms, although quite a few of the ideas along the way were relevant to the first area.

Reliable numerical methods are pretty essential for drawing conclusions from data, and with MCMC it can be pretty challenging to ensure that the method is reliable. In an ideal world, probability theory would be able to tell you whether your numerical method was reliable. However, in the same way that mathematical theory cannot currently account for all physical phenomena, probability theory cannot currently account for all probabilistic numerical methods. Lots of the numerical methods that I use in my work are not particularly well understood from a theoretical point of view. However the mathematical theory that exists for special cases can provide some useful guidance for more general cases.
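In practice, one response to the gap between theory and practice is to check convergence empirically. Below is a minimal sketch (my own, not from the course) of a random-walk Metropolis sampler together with the Gelman-Rubin R-hat diagnostic, which compares the spread within and between several chains started from dispersed points. Values close to 1 are reassuring but do not prove convergence; values well above 1 are a clear warning. The standard normal target and the tuning values in the usage example are arbitrary choices.

```python
import math
import random
import statistics


def random_walk_metropolis(log_target, x0, n_iter, step, rng):
    """Propose x' = x + step * N(0, 1); accept with probability min(1, pi(x') / pi(x))."""
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_iter):
        proposal = x + step * rng.gauss(0.0, 1.0)
        lp_prop = log_target(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain.append(x)
    return chain


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m chains of equal length n."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)
    between = n * statistics.variance(means)
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)
```

With a sensible step size, chains started at -10, -3, 3 and 10 quickly agree and R-hat is close to 1; with a tiny step size the same chains stay near their starting points and R-hat is far above 1, even though each chain on its own may look perfectly stable.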

I would like to be more specific about how the theory is useful beyond the special cases, but it is something I am still in the process of trying to understand better.

Computer Intensive Statistics (CIS)

A lot of recent advances in statistics have been catalysed by increased computer power. Methods that would have been impractical 20-30 years ago are now routine. This helps to continually extend the scope of research questions that can be answered, both in applied statistics and in computational / mathematical statistics.

The easiest way to ensure that a numerical method is reliable (and hence enables you to draw reliable conclusions from the data) is to use one that is not computer-intensive. For a large part of the statistical community this approach works well, so many statisticians find that they never need to use computer-intensive methods.

Returning to our ideal world again for a moment, we would like statistical theory to be able to give us formulas that return the quantities we are interested in as a function of the data. However, in practice such results only exist for simple models. And if the phenomenon you are interested in is not well described by a simple model, it is unlikely you will be able to draw reliable conclusions without a computer-intensive method.
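The bootstrap is a good example of trading a formula for computation. The standard error of a sample mean has a simple formula, but the standard error of a sample median has no simple exact formula that doesn't depend on the unknown distribution. The minimal sketch below (function name and settings are my own) resamples the data with replacement many times and uses the spread of the recomputed statistic as the standard error.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard error of `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate recomputes the statistic on a resample of size n drawn with replacement.
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)
```

For the mean, the bootstrap answer should agree closely with the textbook formula, which is a useful sanity check before trusting it for the median or for something more complicated.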

The main areas covered in the course were the bootstrap and MCMC. One of the things I learnt about for the first time was the Swendsen-Wang algorithm, which is used in statistical physics. It does seem a bit like a magic trick to me, the way you can make dependent variables conditionally independent through the addition of auxiliary variables. Worth checking out if you like a bit of mathematical aerobics!
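Here is a minimal sketch of the trick, assuming the standard two-dimensional Ising model with coupling J = 1 and periodic boundaries (the post doesn't say which model the course used, and the function name is mine). Auxiliary "bond" variables are placed between aligned neighbouring spins with probability 1 - exp(-2 beta); given the bonds, the connected clusters are independent of each other, so each one can be flipped with probability 1/2.

```python
import math
import random


def swendsen_wang_step(spins, L, beta, rng):
    """One Swendsen-Wang update of an L x L Ising model (J = 1, periodic boundaries).

    spins: list of +1/-1 values; site (r, c) is stored at index r * L + c.
    """
    parent = list(range(L * L))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    p_bond = 1.0 - math.exp(-2.0 * beta)
    for r in range(L):
        for c in range(L):
            i = r * L + c
            for j in (r * L + (c + 1) % L, ((r + 1) % L) * L + c):
                # Auxiliary bond variables: only aligned neighbours can be bonded.
                if spins[i] == spins[j] and rng.random() < p_bond:
                    parent[find(i)] = find(j)
    # Given the bonds, clusters are conditionally independent: flip each with probability 1/2.
    flip = {}
    new = spins[:]
    for i in range(L * L):
        root = find(i)
        if root not in flip:
            flip[root] = rng.random() < 0.5
        if flip[root]:
            new[i] = -spins[i]
    return new
```

The payoff is that whole clusters move at once, so at low temperature the sampler can jump between mostly-up and mostly-down configurations in a single step, which single-spin updates find very hard.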

My Perspective on New Perspectives in MCMC

2015-06-14T09:36:00.002-07:00

I recently got back from a summer school in Valladolid on New Perspectives in MCMC.

Here are a few thoughts on the lectures that were given.

MCMC-based integrators for SDEs (Nawaf Bou-Rabee)

Nawaf's interests are mainly in numerical solvers for Stochastic Differential Equations (SDEs) that describe diffusion and drift processes. The first part of the course was about how standard MCMC algorithms can be seen as discretisations of SDEs. The aim of MCMC algorithms is to sample from a probability distribution. A valid MCMC algorithm can be thought of as an SDE discretisation that has the probability distribution of interest as its stationary distribution.
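As my own illustration of the connection (not material from the lectures), take the overdamped Langevin SDE dX = grad log pi(X) dt + sqrt(2) dW, whose stationary distribution is pi. Its Euler-Maruyama discretisation with step h is the unadjusted Langevin algorithm (ULA); adding a Metropolis-Hastings accept/reject step gives MALA, which makes pi exactly stationary again. The function and argument names below are hypothetical.

```python
import math
import random


def langevin_sampler(grad_log_pi, log_pi, x0, h, n_iter, rng, metropolis=True):
    """Euler-Maruyama discretisation of dX = grad log pi(X) dt + sqrt(2) dW.

    metropolis=False: unadjusted Langevin algorithm (ULA), only approximately pi-invariant.
    metropolis=True: MALA, where an accept/reject step makes pi exactly invariant.
    """
    def log_q(to, frm):
        # Log density (up to a constant) of one Euler-Maruyama step from frm to to:
        # a normal with mean frm + h * grad log pi(frm) and variance 2h.
        mean = frm + h * grad_log_pi(frm)
        return -((to - mean) ** 2) / (4.0 * h)

    x = x0
    chain = []
    for _ in range(n_iter):
        y = x + h * grad_log_pi(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        if metropolis:
            log_alpha = log_pi(y) - log_pi(x) + log_q(x, y) - log_q(y, x)
            if math.log(1.0 - rng.random()) >= log_alpha:
                y = x  # reject: stay at the current state
        x = y
        chain.append(x)
    return chain
```

For a standard normal target, ULA with step h has stationary variance 1 / (1 - h/2) rather than 1 (4/3 for h = 0.5); this discretisation bias is exactly what the Metropolis correction removes.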

There are also many ways of discretising an SDE that do not correspond to existing MCMC algorithms, and that is what the second part of the course was about. It made me wonder whether there is scope for designing new MCMC methods from these (or other) SDE numerical solvers?

Exact approximations of MCMC algorithms (Christophe Andrieu)

These lectures were about the pseudo-marginal approach, which I have discussed quite a bit in previous blog posts. The key idea is to find ways of approximating the likelihood or the acceptance probability in such a way that the resulting algorithm is still a valid MCMC algorithm, but is computationally much cheaper than the simple ideal algorithm (which might, for example, contain an intractable marginal likelihood term in the acceptance ratio).
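Here is a minimal sketch of pseudo-marginal Metropolis-Hastings on a toy latent-variable model of my own choosing: z_i ~ N(theta, 1) and y_i | z_i ~ N(z_i, 1), with a flat prior on theta. The marginal likelihood is replaced by an unbiased Monte Carlo estimate; the crucial detail is that the estimate for the current state is stored and reused rather than recomputed. The toy model was chosen because y_i ~ N(theta, 2) exactly, so the correct posterior, N(mean of y, 2/n), is known and the output can be checked. All names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood_estimate(theta, ys, n_particles, rng):
    """Log of an unbiased Monte Carlo estimate of p(ys | theta): for each observation,
    average p(y | z) over simulated z ~ N(theta, 1)."""
    total = 0.0
    for y in ys:
        mean = sum(normal_pdf(y, rng.gauss(theta, 1.0), 1.0)
                   for _ in range(n_particles)) / n_particles
        total += math.log(mean)
    return total


def pseudo_marginal_mh(ys, n_iter, step, n_particles, rng, theta0=0.0):
    """Random-walk pseudo-marginal MH with a flat prior on theta."""
    theta = theta0
    log_l = log_likelihood_estimate(theta, ys, n_particles, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)
        log_l_prop = log_likelihood_estimate(prop, ys, n_particles, rng)
        # The current state's noisy estimate log_l is reused, never refreshed.
        if math.log(1.0 - rng.random()) < log_l_prop - log_l:
            theta, log_l = prop, log_l_prop
        chain.append(theta)
    return chain
```

Despite the noise in every likelihood evaluation, the chain targets the exact posterior; fewer particles make the estimates noisier and the chain stickier, but not wrong.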

The thing that I understood better than before was an idea called Rejuvenation. (From what I understand, this is the idea used in the particle Gibbs variant of particle MCMC.) Without Rejuvenation, the denominator in the acceptance ratio can be very large if the likelihood estimate for the current state happens, by chance, to be a large overestimate. This means that the noisy acceptance probability can be much smaller than the acceptance probability of the ideal algorithm, and therefore the noisy algorithm can have a tendency to get stuck. Rejuvenation is a way of revising the noisy estimates in the algorithm from one iteration to the next in a way that preserves the validity of the algorithm and makes it less likely to get stuck.

MCMC in High Dimensions (Andrew Stuart)

These lectures had a very clear message that was repeated throughout, which was that thinking in infinite dimensions results in good methodology for high-dimensional problems, particularly high-dimensional inverse problems and inference for diffusions.

From these lectures I got a much better idea of what it means for measures to be singular with respect to each other. Understanding these concepts is helpful for designing valid proposals for MCMC algorithms in infinite-dimensional spaces, and leads naturally to good proposals in high-dimensional problems. Coincidentally the concepts of singularity and equivalence are also very important in Reversible Jump methodology. For me this illustrates how potent some mathematical ideas are for solving a wide range of problems and for drawing connections between seemingly disparate ideas.
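A well-known example of this way of thinking is the preconditioned Crank-Nicolson (pCN) proposal. With a Gaussian prior N(0, C) and a posterior proportional to exp(-phi(u)) relative to that prior, pCN proposes v = sqrt(1 - beta^2) u + beta xi with xi ~ N(0, C). The proposal is reversible with respect to the prior, so only phi enters the acceptance ratio and the acceptance rate does not degrade as the dimension grows. The sketch below compares it with a standard random walk using the same prior-shaped steps. The diagonal prior with standard deviations 1/k and the likelihood (a noisy observation of the first coordinate) are hypothetical choices for illustration.

```python
import math
import random


def pcn(phi, prior_sd, beta, n_iter, rng):
    """pCN MCMC for a target proportional to exp(-phi(u)) times N(0, diag(prior_sd**2)).

    Returns the final state and the acceptance rate.
    """
    u = [s * rng.gauss(0.0, 1.0) for s in prior_sd]  # start from a prior draw
    phi_u = phi(u)
    rho = math.sqrt(1.0 - beta ** 2)
    accepted = 0
    for _ in range(n_iter):
        v = [rho * ui + beta * s * rng.gauss(0.0, 1.0) for ui, s in zip(u, prior_sd)]
        phi_v = phi(v)
        # Only the likelihood term appears: the prior ratio cancels with the proposal.
        if math.log(1.0 - rng.random()) < phi_u - phi_v:
            u, phi_u = v, phi_v
            accepted += 1
    return u, accepted / n_iter


def rwm(phi, prior_sd, beta, n_iter, rng):
    """Standard random-walk Metropolis with the same prior-shaped steps, for comparison.
    Here the prior ratio must also enter the acceptance probability."""
    def log_post(u):
        return -phi(u) - 0.5 * sum((ui / s) ** 2 for ui, s in zip(u, prior_sd))

    u = [s * rng.gauss(0.0, 1.0) for s in prior_sd]
    lp = log_post(u)
    accepted = 0
    for _ in range(n_iter):
        v = [ui + beta * s * rng.gauss(0.0, 1.0) for ui, s in zip(u, prior_sd)]
        lp_v = log_post(v)
        if math.log(1.0 - rng.random()) < lp_v - lp:
            u, lp = v, lp_v
            accepted += 1
    return u, accepted / n_iter
```

With beta = 0.2, the pCN acceptance rate is essentially the same in 10 or 500 dimensions, while the random walk's acceptance rate collapses in 500 dimensions because its prior ratio involves all the coordinates.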

Course website - http://wmatem.eis.uva.es/npmcmc/?pc=41

Making intractable problems tractable

2015-06-01T01:43:00.000-07:00

I recently attended an i-like / SuSTaIn workshop in Bristol, organised by Christophe Andrieu. I-like is short for Intractable Likelihoods, although the scope of the workshop was a bit more general than that. Most of the work that was presented was about making intractable problems tractable. Often the most conceptually simple way of solving a problem is computationally intractable, i.e. it would take current computers months / years / aeons to solve. Mathematics can be used to reformulate or approximate problems in such a way that they become tractable, for example by identifying parts of the computation that make little difference to the final answer, or by identifying unnecessary computations that duplicate (or approximately duplicate) other computations.

Here is a selection of application areas that were talked about.

Foot & Mouth epidemics - intractable because of the need to model the interactions between thousands of different farms and model the stochastic dynamics of disease transmission.

High-risk behaviour in HIV-positive youth - challenging because of the discrepancies between reported and actual behaviour. Intractable because there are so many different permutations of actual behaviour that could underlie / explain reported behaviour.

Genetic recombination - given DNA sequences from a (current-day) population, scientists are interested in inferring how the DNA sequences have evolved / been modified over time. Recombination is one mechanism for DNA sequence modification. Inference in this setting is challenging because we cannot observe the past (i.e. ancestral DNA sequences). There is a connection with the HIV example above: the problem is intractable because there are so many different permutations of past DNA sequences that could have generated the DNA sequences of the current population.

Testing the predictions of general relativity using massive detectors - this reminded me of the little that I know about the Large Hadron Collider experiments. However, in this case the detectors pick up natural signals coming from space, and the physics is completely different. The general relativity detector problem is intractable because of the huge volume of data generated by the detector, and also because generating predictions using the general relativity equations is computationally expensive. (As an aside, it is quite awe-inspiring to think about the forces at work on such a large scale in our universe.)

Predicting driving times - the simplest way of predicting driving times is just to use speed limits and assume that there is little or no traffic. Anyone who has ever tried to get anywhere near a population centre in the UK on a Friday evening will know that this does not produce very accurate driving time predictions. Companies like Microsoft, Google, and Apple collect GPS data that is essentially a set of positions associated with time-stamps. From this it is straightforward to work out how long people's past journeys have taken. However, producing predictions for an arbitrary future journey is challenging because many future journeys will not follow exactly the same route as past journeys.

There were also a couple of presentations that focused more on methodology that could be applied in a wide range of different settings.

Exact-approximate MCMC - this is something I have talked about in previous blog posts and also goes under the name of the pseudo-marginal approach. The basic question that motivates this work is, 'When is it ok to approximate the Metropolis-Hastings acceptance probability?' There are lots of settings where it is possible to obtain a tractable approximation to the acceptance probability (State Space Models, Markov Random Fields, Spatial Processes, Trans-dimensional problems). Being able to use the tractable approximation makes it possible to do a lot more statistical inference for problems of this type.

Deep neural networks - this is something that is currently very popular in machine learning / computer science / artificial intelligence. There the interest is in training computers to produce predictions from data. This is sometimes referred to as black-box prediction, because the way in which the computer produces the prediction does not necessarily give you any insight into the underlying process that connects the x variables (determinants) with the y variable(s) (response). From my perspective it does seem a lot like regression with more flexibility and automation in the transformations that are applied to the data. Increasing flexibility means (at least to me) adding in more parameters. This makes it more challenging to ensure that the algorithms are computationally tractable.
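To make the "regression with learned transformations" view concrete, here is a minimal sketch of a network with a single hidden layer, f(x) = c + sum_j v_j tanh(w_j x + b_j), fitted by full-batch gradient descent on squared error. In ordinary regression the transformations of x are fixed in advance; here the w_j and b_j that shape the transformations are learned along with the regression coefficients v_j and c. The sizes, learning rate and initialisation are illustrative choices of mine; real deep networks have many layers and are trained with dedicated libraries.

```python
import math
import random


def train_mlp(xs, ys, hidden=10, lr=0.05, epochs=4000, seed=0):
    """Fit f(x) = c + sum_j v_j * tanh(w_j * x + b_j) by gradient descent on mean squared error.

    Returns the fitted function and the per-epoch training loss.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 2.0) for _ in range(hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    v = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    c = 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(wj * x + bj) for wj, bj in zip(w, b)]  # learned features
            err = c + sum(vj * hj for vj, hj in zip(v, h)) - y
            loss += err * err / n
            g = 2.0 * err / n  # derivative of the loss with respect to this prediction
            gc += g
            for j in range(hidden):
                gv[j] += g * h[j]
                dpre = g * v[j] * (1.0 - h[j] ** 2)  # back-propagate through tanh
                gw[j] += dpre * x
                gb[j] += dpre
        history.append(loss)
        c -= lr * gc
        for j in range(hidden):
            v[j] -= lr * gv[j]
            w[j] -= lr * gw[j]
            b[j] -= lr * gb[j]

    def predict(x):
        return c + sum(vj * math.tanh(wj * x + bj) for vj, wj, bj in zip(v, w, b))

    return predict, history
```

Even this tiny network has 31 parameters for a one-variable problem, which illustrates the point about flexibility being bought with parameters.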

For researcher names and presentation / poster abstracts, see http://www.sustain.bris.ac.uk/ws-ilike/ilike_programme.pdf.

Twelve twitter feeds worth following?

2015-05-05T01:30:00.000-07:00

Having spent a long time doing my best to ignore Twitter, I recently decided to take the plunge and start following some feeds. Below is a list of the ones I have found interesting, so far (in no particular order).

Tim Harford - this is the one that got me on Twitter in the first place. He mentioned his Twitter feed in one of his books. It's good for getting an evidence-based understanding of the news.

David Spiegelhalter - great statistician. Does a lot of good work communicating statistics / risk to everyone. Pretty funny as well. Look up micromorts to get a flavour of what he does, http://understandinguncertainty.org/micromorts.

Tom Whipple - science journalist at The Times. Also pretty funny.

Evan Davis - has done quite a wide range of news-related things for the BBC (Today programme, Newsnight). Some similarities with Tim Harford in that he is an economist who understands numbers and how to interpret them. I quite enjoyed his General Election Leader Interviews recently and his 'Mind the Gap' documentary last year.

Tim Montgomerie - writes for The Times. If anyone can persuade you that it is possible to care for the poor and be conservative (politically), it's probably him.

Tim Keller - Christian author and pastor in New York. Has written a lot of persuasive books contrasting the Christian worldview with the secular worldview. I have got more out of his books than his tweets, but he does have quite a pithy way of summing things up which works well on Twitter.

Stat Fact - a good feed for little statistics tips and links. Quite wide-ranging, not just about the theory of statistics.

Oxford Mindfulness - this feed is for a research group that has developed mindfulness practices and measured their effects in a scientific way. The feed sometimes has links to radio or TV that is related to their work.

Pizza Artisan Oxford - another Oxford-based feed, but sadly for something that can only be enjoyed in Oxford...

Nature News & Comment - good for getting an idea of what is going on in the top tier of science.

edX - I have enjoyed doing some courses on edX and Coursera. It's great how accessible these courses are, and how many of them there are. I have looked at Ancient Greek Heroes, Statistical Analysis of fMRI data, and Learning How to Learn.

Gresham College - lots of public lectures from academics on an eclectic range of topics.

Easter Miscellany

2015-04-14T02:20:00.003-07:00

Here are a few loosely connected things that I have been working on or thinking about over the past month or two...

Gaussian Processes

James Robert Lloyd, Zoubin Ghahramani and others in Ghahramani's group have developed a really neat tool called the Automatic Statistician. The basic idea is that you feed the Automatic Statistician some data and it estimates a model that fits the data well, and it produces an automatically generated report describing what it has done. The thing that I find most interesting about it is how flexible the model is. The Automatic Statistician can identify linear trends, periodicity, and other patterns in the data. It is the machinery of Gaussian Processes that makes this flexibility possible. This 'Kernel Cookbook' page (by David Duvenaud, a former member of Zoubin Ghahramani's group) gives some information about how to construct a simple Gaussian Process model.
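To show where the flexibility comes from, here is a minimal sketch of Gaussian Process regression with two standard kernels, squared-exponential (smooth variation) and periodic, plus their sum. This is not the Automatic Statistician's code: as I understand it, that tool searches over kernel structures and fits their hyperparameters, whereas here the kernels and hyperparameters are fixed by hand. It requires NumPy, and the function names are mine.

```python
import numpy as np


def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: smooth functions that vary on the given lengthscale."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel: functions that repeat exactly with the given period."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * np.abs(d) / period) ** 2 / lengthscale ** 2)


def gp_posterior_mean(kernel, x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean of a zero-mean GP regression model with Gaussian observation noise."""
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return kernel(x_test, x_train) @ alpha
```

Adding kernels corresponds to adding independent components (here, a slow trend plus a repeating pattern). Given a slowly rising sine wave, the sum kernel keeps extrapolating the oscillation beyond the data, whereas a short-lengthscale squared-exponential kernel alone fits the training points but decays back towards zero.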

I have had a go at fitting a Gaussian Process model to some bike-sharing data in order to forecast demand in bike-sharing schemes. This data is available here. I found that it was more difficult than I expected to find an appropriate Gaussian Process to model the daily pattern of usage. The key problem I faced was that most of the simplest kernels assume that the process you are trying to model is stationary. However, there is a lot of non-stationarity in the daily pattern of bike-sharing demand - there is a lot more variability in 7-9am (the rush-hour peak) than there is in, say, 9-11pm.
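One simple way to get a non-stationary kernel, offered as an illustration rather than what I actually used, is to multiply a stationary kernel by an input-dependent amplitude: k(x, x') = s(x) k_SE(x, x') s(x'). This remains a valid (positive semi-definite) kernel, and the prior variance at time x becomes s(x)^2. The amplitude function below, with its bump around 8am, is hypothetical. It requires NumPy.

```python
import numpy as np


def se_kernel(x1, x2, lengthscale=1.0):
    """Stationary squared-exponential kernel with unit variance."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def rush_hour_amplitude(hour):
    """Hypothetical amplitude: larger around the 7-9am peak than late in the evening."""
    return 0.5 + 2.0 * np.exp(-0.5 * ((hour - 8.0) / 1.0) ** 2)


def nonstationary_kernel(x1, x2, lengthscale=1.0):
    """k(x, x') = s(x) * k_SE(x, x') * s(x'): a stationary kernel rescaled by an
    input-dependent amplitude s, which is still positive semi-definite."""
    return (rush_hour_amplitude(x1)[:, None] * se_kernel(x1, x2, lengthscale)
            * rush_hour_amplitude(x2)[None, :])
```

The hard part, which this sketch sidesteps, is learning the amplitude function from data rather than choosing it by hand; that is part of what makes non-stationary Gaussian Processes more challenging.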

All this makes me keen to find out more about recent developments in Gaussian Processes, particularly for non-stationary processes. From what I have seen so far, it looks a lot more challenging. It will be interesting to see whether the Automatic Statistician can model non-stationary processes.

Mean Field Models for brain activity

Brain activity can be modelled at a variety of temporal and spatial scales, from milliseconds to minutes, and from single neurons to whole brain regions.

Mean Field Models describe the activity of populations of neurons, and can be used to model the evolution of a field of neural activity within a particular brain region. They are therefore sometimes called mesoscopic models (somewhere in the middle). This means that the models can be more biophysically realistic than models that describe interactions between brain regions. But it is still possible to do inference with these models and human brain imaging data such as EEG.

I am looking at a Mean Field Model developed by David Liley and Ingo Bojak that models the effect of anaesthesia on brain activity. This is proving quite challenging because there are so many unknowns: extracortical input, parameter values, initial conditions.

Broader Neuroscience reading

Some of the world's leading neuroscientists have written a collection of essays called The Future of the Brain. A lot of the essays describe 'Big Science' multi-million dollar research projects. There is a lot of focus on molecular biology experiments to probe the properties of single neurons, and on building large-scale microscopic models of brain activity that predict macroscopic effects.

As someone with a preference for simplicity in modelling, one thing I am interested in is to what extent mesoscopic (i.e. simpler) models can approximate microscopic (i.e. more complex) models. As far as I know this is an open question that could be answered by some of the large-scale neuroscience research projects that are currently underway.

The Pseudo-Marginal Miracle

2015-03-06T08:57:00.001-08:00

Over the last month I have been making some interesting connections between different areas of statistics. Explaining these connections (or even the things being connected) is perhaps an overly ambitious aim for a blog post, but I think it is worth a try!

The three problems below are all interconnected.

MCMC for doubly intractable problems

This typically occurs when you want to use MCMC to estimate parameters of a model where the normalization term is intractable and it is dependent on the parameters you are interested in estimating.

With the basic MCMC approach, the normalization term needs to be evaluated on every iteration of the algorithm. Recent advances in MCMC (the Pseudo-Marginal approach) have led to algorithms where a suitable approximation to the normalization term can be used and the algorithm still gives the same results (asymptotically) as the basic algorithm.

Reversible Jump MCMC

This occurs when one of the things you don't know is the number of parameters in your model (e.g. number of clusters or components in a mixture model).

With the basic MCMC approach it is difficult to make good proposals for the parameter values when your proposal changes the number of parameters in the model. However, the Pseudo-Marginal approach can also be applied to Reversible Jump MCMC. This results in parameter proposals that are more likely to be accepted in the MCMC, and therefore makes the method much more computationally efficient, because less time is spent generating proposals that subsequently get rejected.

MCMC parameter estimation for nonlinear stochastic processes

This is useful for some models in econometrics, epidemiology, and chemical kinetics. Suppose you have some data (e.g. stock prices, infection numbers, chemical concentrations) and a nonlinear stochastic model for how these quantities evolve over time. You may be interested in inferring posterior distributions for the parameter values in your model.

The Pseudo-Marginal approach was used to develop a method called particle MCMC that does this parameter estimation more efficiently than basic MCMC.
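The likelihood estimator at the heart of particle MCMC is a particle filter. Below is a minimal sketch of a bootstrap particle filter for a toy linear Gaussian state-space model of my own choosing: x_1 ~ N(0, p0), x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r). Its product of average weights is an unbiased estimate of the likelihood, which is what gets plugged into the pseudo-marginal acceptance ratio. The toy model is linear and Gaussian precisely so that the exact log-likelihood is available from a Kalman filter for comparison; for the nonlinear models mentioned above, only the particle filter would be usable. Function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def particle_log_likelihood(ys, a, q, r, n_particles, rng, p0=1.0):
    """Bootstrap particle filter estimate of log p(ys | a, q, r)."""
    particles = [rng.gauss(0.0, math.sqrt(p0)) for _ in range(n_particles)]
    log_lik = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate each particle through the state dynamics.
            particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        weights = [normal_pdf(y, x, r) for x in particles]
        log_lik += math.log(sum(weights) / n_particles)
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return log_lik


def kalman_log_likelihood(ys, a, q, r, p0=1.0):
    """Exact log-likelihood of the same linear Gaussian model, for comparison."""
    m, p = 0.0, p0
    ll = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            m, p = a * m, a * a * p + q
        s = p + r
        ll += math.log(normal_pdf(y, m, s))
        k = p / s
        m, p = m + k * (y - m), (1.0 - k) * p
    return ll
```

Note that it is the likelihood estimate, not its logarithm, that is unbiased; the log is used here only to avoid numerical underflow.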

In all three cases (doubly intractable, reversible jump, nonlinear stochastic processes), applying basic MCMC results in the need to evaluate the density of an intractable marginal probability distribution. Somewhat miraculously, the Pseudo-Marginal approach obviates the need to evaluate this probability density exactly, while still preserving key theoretical properties of the algorithm related to convergence and accurate estimation.