Tuesday, December 6, 2016

Writing tools, silence & parenthood

Collaborative writing tools

I have been working on a paper recently with two co-authors.  It has been a bit of a challenge finding the right pieces of software that will allow us to track edits while remaining in LaTeX.  When I worked in the civil service, Word was the de facto software for producing written documents.  It was a lot better than I thought it would be, and I still think the Track Changes functionality beats everything else I have tried hands down when it comes to collaborative editing.  I also learnt that, using Word, you can produce documents with typesetting that looks professional, if you know what you are doing, and if someone has invested the time in creating a good template for your needs.  However in the last couple of years I have returned to LaTeX, because it is what mathematicians use, and because I find it better for equations, and for references.

In the last few weeks I have been trying out Overleaf.  This is one of a handful of platforms for collaborating on LaTeX documents.  As with a lot of good, user-friendly pieces of software, you have to pay to get the most useful features, but with Overleaf the free service provides a workable solution.  Overleaf allows you to access your LaTeX documents through a web browser, and multiple people can edit the same online version.  The free version has some basic bells and whistles, like being able to archive your work.  I found this a bit confusing at first, because I thought it was setting up multiple active versions with some kind of forking process, but that is not the case.

By combining Overleaf with git I have been able to fork the development process: I can edit one branch on my local computer (using my preferred LaTeX editor and compiler), while another person edits a different branch in the online version, or potentially on another computer.  Using git also makes it easy to create a change log and visualise differences between versions, although this doesn't work quite as well for paragraphs of text as it does for code.  Unless you put lots of line breaks into your paragraphs, you can only see which paragraphs have changed, not which individual sentences.

In the news...

2016 is drawing to a close and it has been a pretty shocking year for a lot of people in terms of national and global news.  In the last few weeks, I have noticed an increasing tendency for people to be silent - to not want to talk about certain issues any more (you know what I mean - the T word and the B word).  I guess this is partly because some topics have been talked to death, and nothing new is emerging, while a lot of uncertainty remains.  However I also find it a bit worrying that people may no longer be capable of meaningful engagement with people of different opinions and backgrounds.  One thing I have become more convinced of over the last year is that blogs and tweets etc. are not a particularly helpful way of sharing political views (a form of silent outrage!?).  So maybe the less I say here the better, even though I do remain passionately interested in current affairs and am fairly opinionated.

And in other news...

I have a baby boy!  Born 4 weeks ago - both he and my wife are doing well.  In the first 2 weeks I took a break from my PhD, and it was a bit like being on holiday, in that we had a lot of time, and a lot of meals cooked for us (by my wonderful mum).  It hasn't all been plain sailing, but I am now under oath not to share the dark side of parenthood - especially not with non-parents, in case it puts them off!  For the last 2 weeks I have been getting back into my PhD.  It is quite hard finding a schedule that works.  We have a routine where he is supposed to be more active and awake between 5pm and 7pm, so that he sleeps well between 7pm and 7am.  I have been trying to do a bit of work after he is settled in the evening, and have found it fairly challenging to stay motivated and focused at that time.  I have been wondering whether it would work better to try and get up before him in the mornings.  I guess it will probably be challenging either way.

Tuesday, September 6, 2016

Learning about learning

I recently attended the INCF (International Neuroinformatics Coordinating Facility) short courses and congress in Reading.  It was quite wide-ranging, with some people working primarily on MRI, others on modelling of synaptic plasticity and learning algorithms, and quite a few other topics represented too.

One area I was not really aware of before the conference was neuromorphic computing, which is about designing and building computing hardware based on principles of how the brain does computation.  At the INCF short courses, this was presented by Giacomo Indiveri, and I subsequently looked at an introductory article by Steve Furber, who has led the SpiNNaker project:

http://digital-library.theiet.org/content/journals/10.1049/iet-cdt.2015.0171

I am quite impressed by the dedication of people working in this field.  Steve Furber says in his article that SpiNNaker has been 15 years in conception and 10 years in construction.  It is enabling fast simulation of large-scale neural models such as Spaun, a system that can perform simple cognitive tasks such as reinforcement learning and arithmetic.  On a standard computer, Spaun requires 2.5 hours of computation per second of real time; SpiNNaker aims to run it in real time.

In the next few years, as part of the Human Brain Project, SpiNNaker will be used for larger models, and presumably be tested on progressively more demanding cognitive tasks.  From my perspective, I am interested to see how large-scale neural models of biological intelligence will compare to engineered intelligence systems such as deep neural networks.

Engineered intelligence is free from the constraint of having to be faithful to biology.  This gives it a massive advantage over simulated neural models when it comes to performing tasks.  Ideas from biology have been influential in machine learning and artificial intelligence, but they have been heavily supplemented by numerical analysis and statistical computing.

At the moment many machine learning algorithms require huge amounts of computing power, so it will be interesting to see whether any new hardware emerges that can bring this down.  It would be cool if state-of-the-art machine learning algorithms that today require the use of a supercomputer could be run on an affordable battery-operated device.  And it will be interesting to see if the new neuromorphic machines that are emerging will drive engineers and scientists to further develop learning algorithms.

Monday, August 1, 2016

Summer reading

I have recently been reading 'Grit: The Power of Passion and Perseverance' by Angela Duckworth, which I have found both fascinating and persuasive.  Duckworth is a psychologist interested in the differences between people who are talented but achieve relatively little and people who are high achievers.  One of the main messages of the book is that talent counts, but effort counts twice.

Determination, persistence, constancy, tenacity, and focus - especially in the face of setbacks and challenges - appear to have a much larger effect on what people achieve than natural talent or innate giftedness.

I wish I could say these were all things I possessed in abundance, but I do not think that is the case.  Nevertheless there is cause for hope, as grit appears to increase with age.  And perhaps being more aware of the importance of these qualities helps to cultivate them.

In parallel I have been reading The Pickwick Papers by Charles Dickens, which tells the story of a group of kind-hearted friends who travel around rural 19th-century England, making new friends and getting into various kinds of trouble.  It is quite good fun, but, in my opinion, not as well written as some of his later work, such as Great Expectations.  Perhaps a case in point of how passion and perseverance towards a single goal over a long period of time can lead to great things.

Monday, June 20, 2016

ISBA 2016

I got back from ISBA 2016 at the weekend, having spent a week at the picturesque Forte Village resort in Sardinia.  Last weekend also happened to be when UK astronaut Tim Peake returned from spending 6 months on the International Space Station.  Although I am sure returning from space requires more adjustment than returning from an international conference, I do feel a bit like I have returned from another planet!

I cannot pretend to give an expert's view of the conference, since there were many people there with decades more experience than me.  The age distribution of the conference was heavily weighted towards young researchers (perhaps partly as a result of the generous travel support targeted towards this group).  Nevertheless the age distribution was very wide, with a fair number of people there in their seventies.  One of these was Adrian Smith, who came to the first of the Valencia meetings, and gave an interesting perspective on how Bayesians have gone from being outsiders, regarded with a high degree of scepticism, to being a dominant force in the world of statistical analysis.  A simple illustration of this is the attendance, which has grown from around 70 to around 700 over the course of around 40 years.

One feature of the conference that has remained the same (and perhaps a key ingredient to its continuing success!?) is the cabaret, which features Bayes-inspired entertainment.  The proceedings of the first Valencia meeting (which can be found here - http://www.uv.es/bernardo/Valencia1.pdf) printed the song "There's no Theorem like Bayes Theorem", written by the distinguished statistician G.E.P. Box to the tune of "There's no Business like Show Business".

I would strongly advise against searching for a YouTube rendition of Box's song.  I do not know whether Box was as good a musician as he was a lyricist (and statistician), but his followers certainly seem to have a rather deficient sense of pitch and harmony.

Here are a few reflections on the academic program of ISBA 2016.

A lot of the talks fell into one of two broad categories.  On the one hand, some talks focused on general inference problems, and the development of methodology that should be applicable to a wide range of problems in various application areas.  On the other hand, some talks focused more on a specific application area, and looked at the challenge of adapting quite general statistical ideas to specific research questions.

The presenters I found most stimulating were Adrian Raftery on demography and Sylvia Richardson on single-cell gene expression.  These were both from the second category of talks (i.e. more oriented to a specific application), but both researchers have also done important work in the first category (i.e. development of generally applicable statistical methodology).  For me, their work demonstrates the value of working in both areas.  They both have an impressive ability to identify problems that benefit from Bayesian analysis.  In Adrian Raftery's demography work, the novel application was the quantification of uncertainty in country-specific fertility rates by pooling information across countries through a hierarchical model.  Sylvia Richardson's work on gene expression also used a hierarchical model, but in this case to quantify uncertainty in cell-specific gene expression levels, again by pooling information across cells.  The main reason the Bayesian approach is so effective in these problems is the small amount of data that is available per country (in demography) or per cell (in gene expression).
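To make the pooling idea a little more concrete, here is a minimal sketch of the kind of hierarchical structure involved, written in generic notation of my own rather than the specific models from either talk.  For units i = 1, ..., n (countries or cells) with data y_i and unit-specific parameter theta_i,

$$ y_i \mid \theta_i \sim p(y_i \mid \theta_i), \qquad \theta_i \mid \mu, \tau \sim \mathcal{N}(\mu, \tau^2), \qquad (\mu, \tau) \sim \pi(\mu, \tau). $$

Because all the theta_i share the population-level parameters (mu, tau), the posterior for each unit borrows strength from all the others, which is what makes the approach work when the data per country or per cell are sparse.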

Although I found some of the presentations on general methodology quite stimulating (such as Peter Green's keynote lecture on structural learning), there were quite a few presentations which I felt were not well motivated, at least not in a way that I could understand.  One area where there were quite a few presentations this year was Bayesian variable selection for linear regression models.  In that setting you assume that the data are i.i.d., and the variable selection can be thought of as a kind of model uncertainty, often encoded through the choice of prior.  The reason I am somewhat sceptical about this kind of research is that (i) the linear regression model may not be sufficiently complex to model the true relationships that exist between the variables in the dataset, and (ii) if the linear regression model is appropriate, then the most important predictors for a given response can usually be picked up through a non-Bayesian analysis such as a forward selection or backwards elimination algorithm.  This is based on my experience of fitting and using regression models as a practitioner in operational research.
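As an aside, here is a minimal sketch of the kind of forward selection I have in mind, scored by BIC.  The function and the scoring choice are my own illustrative assumptions, not a standard library routine.

    import numpy as np

    def forward_selection(X, y, max_vars=None):
        # Greedy forward selection for linear regression, scored by BIC.
        # X is an (n, p) array of candidate predictors, y a length-n response.
        n, p = X.shape
        max_vars = max_vars or p
        selected, remaining = [], list(range(p))

        def bic(cols):
            # Least squares fit with an intercept plus the chosen columns.
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ beta) ** 2)
            return n * np.log(rss / n) + Z.shape[1] * np.log(n)

        best = bic(selected)
        while remaining and len(selected) < max_vars:
            scores = {j: bic(selected + [j]) for j in remaining}
            j_best = min(scores, key=scores.get)
            if scores[j_best] >= best:   # stop when no candidate improves the BIC
                break
            best = scores[j_best]
            selected.append(j_best)
            remaining.remove(j_best)
        return selected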

To wrap up, I am deeply grateful to all the people who make the Bayesian analysis community what it is, whether through their research findings, through the hard administrative labour that must go into organising large scientific meetings, or through personal warmth and encouragement.  I hope that it continues to be a vibrant community with vigorous exchanges between scientific applications and mathematical theory.

Monday, May 23, 2016

ABC in Helsinki

Last week I went to a conference on ABC (Approximate Bayesian Computation) that was held on the Silja Symphony cruise ship between Helsinki and Stockholm.  Thank you to Michael Gutmann and the other organisers for the opportunity to go on an exotic Nordic odyssey with a stimulating scientific programme!

I came to the conference as something of an outsider, as I have not used ABC in my research so far.  The main reasons I am interested in the area are that my supervisor Richard has done some work on ABC, with applications mainly focused on population genetics, and that a lot of people who work with differential equation models are attracted to it.

I was interested in answering the following questions.  When is ABC the preferred method compared to other statistical methods?  And in what situations should the use of ABC be avoided?

The short answer to the first question is that ABC is used when the likelihood of the model is not available.  This may be because the likelihood is an intractable integral.  For example, in population genetics the likelihood can only be obtained by integrating over the space of phylogenetic trees.  As an aside, there is an alternative method for the population genetics example, which is the pseudo-marginal method.  It is not clear to me whether ABC or pseudo-marginal methods are preferable for this example.

As well as situations where the likelihood is intractable (another example of this is the transition density in nonlinear SDEs), there may be situations where it is not possible to write down an expression for the likelihood at all.  I have not encountered any situations like this myself, although I am told that a lot of agent-based or rule-based models fall into this category.  In Reading there are people using these kinds of models to describe the population dynamics of earthworms and their interaction with pesticides.
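To fix ideas, here is a minimal sketch of the most basic likelihood-free approach, rejection ABC, in which the likelihood is never evaluated: draw parameters from the prior, simulate data, and keep the draws whose summary statistics land within a tolerance of the observed summaries.  The toy model and notation below are my own, purely for illustration.

    import numpy as np

    def abc_rejection(simulate, summarise, prior_sample, s_obs, eps, n_draws, rng):
        # Keep prior draws whose simulated summaries are within eps of the data.
        accepted = []
        for _ in range(n_draws):
            theta = prior_sample(rng)                   # draw parameters from the prior
            s_sim = summarise(simulate(theta, rng))     # simulate data, reduce to summaries
            if np.linalg.norm(s_sim - s_obs) < eps:
                accepted.append(theta)
        return np.array(accepted)

    # Toy example: infer the mean of a normal distribution from its sample mean.
    rng = np.random.default_rng(1)
    data = rng.normal(3.0, 1.0, size=50)
    s_obs = np.array([data.mean()])

    posterior_draws = abc_rejection(
        simulate=lambda theta, r: r.normal(theta, 1.0, size=50),
        summarise=lambda x: np.array([x.mean()]),
        prior_sample=lambda r: r.uniform(-10, 10),
        s_obs=s_obs, eps=0.1, n_draws=20000, rng=rng,
    )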

Several of the conference talks (Richard Everitt, Richard Wilkinson and, in an indirect way, Michael Gutmann) focused primarily on the use of ABC with expensive simulators, such as climate simulators or epidemiology models.  These talks mainly confirmed my view prior to the conference that expensive simulators should be avoided wherever possible, because there are severe limitations on what you can do for parameter inference.

On a more positive note, there were examples in the talks where ABC was made tractable in models with 1 or 2 unknown parameters by combining ABC with other methods such as bootstrapping (RE) and Gaussian Processes (RW and MG).  The basic idea is that if we knew the probability distribution of the model's summary statistics as a function of the unknown parameters, we would be done.  Tools like bootstrapping and Gaussian Processes can be used to estimate the summary statistics' probability distribution, and to quantify uncertainty in those estimates.  This is not ideal, since the relatively small sample size available for these estimators means that the error in the estimators may be quite large and difficult to quantify accurately.  However, if you are only interested in classifying parameter sets as plausible vs non-plausible, or you only need estimates of mean parameter values, you may not need that many samples.
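To make the emulation idea a bit more concrete, here is a rough sketch of using a Gaussian Process as a cheap surrogate for an expensive simulator's summary statistic.  The toy simulator, kernel choice and acceptance rule are my own illustrative assumptions, not the specific methods presented in the talks.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)

    def expensive_simulator(theta):
        # Stand-in for an expensive simulator returning one summary statistic.
        return theta ** 2 + 0.1 * rng.standard_normal()

    # A small design of simulator runs (the expensive part).
    theta_design = np.linspace(0.0, 2.0, 20)
    s_design = np.array([expensive_simulator(t) for t in theta_design])

    # Fit a GP emulator of the summary statistic as a function of the parameter.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(theta_design.reshape(-1, 1), s_design)

    # Cheap predictions (with uncertainty) on a fine parameter grid.
    theta_grid = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
    s_mean, s_std = gp.predict(theta_grid, return_std=True)

    # Classify parameter values as plausible if the emulated summary could
    # plausibly be within eps of the observed summary.
    s_obs, eps = 1.0, 0.1
    plausible = np.abs(s_mean - s_obs) < eps + 2.0 * s_std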

It is unclear to me what you would do when you have an expensive simulator with more than 2 unknown parameters.  I am not sure that the methods presented at the conference would work that well in that setting without (i) a lot of computing power, and/or (ii) strong assumptions on the probability distribution of the summary statistics as a function of the parameters.

One area that was not covered at the conference but that I would like to know more about is what to do when you have a deterministic simulator, like an ODE model.  I have come across this situation in the literature, for example in the work of Michael Stumpf's group, where ABC has been used.

Suppose that a deterministic simulator is a perfect model of the process we are interested in.  Then repeated samples from the true process should yield exactly the same data.  Furthermore, we should be able to identify a parameter set that perfectly reproduces the experimental results.  (Or multiple parameter sets if the model is non-identifiable.)  In practice, a far more common situation is that repeated samples from the true process give varied results even though the process model is deterministic.  With a deterministic simulator there is no way of accounting for this variability.  Two options in this situation are: (i) introduce some stochasticity into the model, for example observation noise; (ii) define ranges for the summary statistics, so that parameter values are accepted if the deterministic simulation results in summary statistics that fall within the pre-specified ranges.

If we choose option (i), I would then suggest using a likelihood-based approach, and if that didn't work, then trying ABC.  If we choose option (ii), this fits within the ABC framework.  However, in the standard ABC framework, the effect of reducing the tolerance on the posterior should diminish as the tolerance goes to 0.  I.e. for small tolerances, dividing the tolerance by 2 should have a very small effect on the credible regions for the parameters.  If you have a deterministic simulator, the effect of reducing the tolerance will not diminish.  I haven't got all this worked out rigorously, but intuitively it seems like dividing the tolerance by 2 will always decrease the uncertainty by a factor of 2.  Furthermore, the limit (tolerance = 0) is singular with respect to all non-zero tolerance values.
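Here is a rough way to make that intuition concrete, under my own simplifying assumptions (a one-dimensional parameter and a smooth summary statistic).  With a uniform kernel, the ABC posterior is

$$ \pi_\epsilon(\theta) \propto \pi(\theta)\, \mathbf{1}\{\, |s(\theta) - s_{\mathrm{obs}}| < \epsilon \,\}. $$

If the simulator is deterministic and s is differentiable with $s'(\theta_0) \neq 0$ at a point where $s(\theta_0) = s_{\mathrm{obs}}$, then locally the accepted region is approximately the interval

$$ |\theta - \theta_0| < \epsilon / |s'(\theta_0)|, $$

whose width $2\epsilon / |s'(\theta_0)|$ scales linearly with the tolerance: halving the tolerance really does halve the uncertainty, all the way down to the singular limit of zero tolerance.  With a stochastic simulator, by contrast, the ABC posterior settles down to the exact posterior as the tolerance goes to 0, so small changes in a small tolerance have little effect.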

A quick Google Scholar search shows that Richard Wilkinson has thought about these issues.  The gist of what he is saying seems to be that rather than treating the tolerance as a parameter that controls the accuracy of the approximation, it can be thought of as a parameter in a probabilistic model for the model error.  If anyone knows of any other work that discusses these issues, please let me know!  As you can see, I am still rather a long way from being able to provide practical guidance on the use of ABC.

Monday, May 9, 2016

Parallels between finance and academic research environments

I recently went to an event titled 'Heart of the City: What is Really Wrong with our Financial System?' where John Kay and Joris Luyendijk were speaking.  John Kay is an economist and columnist for the Financial Times.  Joris Luyendijk is an anthropologist by training who has done some financial journalism for The Guardian.

It was interesting learning about the workings and culture in the City, and I also found myself wondering about parallels in other walks of life.

John Kay emphasised that financial services have traditionally provided a useful service by enabling capital accumulated by one individual or corporation to be borrowed and used for investment by another.  In recent years this idea of banking as a service has been lost, as banks have sought to maximise short-term profits by trading worthless bits of paper for a lot of money and taking a cut each time (this is somewhat simplistic - John Kay explained it better!).

While it is useful to establish what is going on in the financial sector, it is also important to understand why it is happening.  What are the causes of the unhealthy changes seen in finance?  Joris Luyendijk's answer to this was that liquidity has replaced loyalty.  By loyalty he was referring to the loyalty that the employees of City banks have for the companies they work for.  And by liquidity he was referring to the frequency with which City workers move between banks.  His view was based on a fascinating collection of evidence obtained from interviewing over 200 City workers, which he wrote up as a book, "Swimming with Sharks."

For example, it is apparently very easy to lose your job in the City if you do not conform to what your manager asks of you, or if you miss the ambitious profit targets that banks set.  People are terrified of losing their jobs, and this means that they are willing to compromise on almost anything, whether that is working extremely long hours or not blowing the whistle on others' malpractice.  This leaves very little time for thinking about whether the bank's end-product has genuinely been of service to the bank's customers.

For me, all this raises some interesting questions for academic research.  Should academic research be thought of as a service?  If so, who is it a service for?  I have met several academics who see their work primarily as finding out about interesting things for their own sake.  This view is sometimes justified by saying that historically many important advances have come about through exactly that type of person being free to pursue their interests.  Personally I think it is useful to think of academic research as serving some purpose, as this helps to direct where effort should be applied.  However I do acknowledge that it is not always possible to identify in advance which discoveries will lead to the greatest impact.

In terms of liquidity, there are some shortcomings to the current academic research system.  Academics can often be under a lot of pressure to produce short-term results in the form of publications in high-profile journals, so that they can win research funding and permanent positions.  This can sometimes get in the way of longer-term research goals.

On the whole, however, I think there should be both loyalty and liquidity in academia.  It is a good thing for academic researchers to move between research groups, picking up different tools and perspectives in different places.  And it is also good for academics to have long-term affiliations to a group or a particular academic sub-community, to allow relationships to develop and paths to be fully explored.

Wednesday, April 20, 2016

In (reflective) praise of DJCM

The scientist David J. C. MacKay passed away recently.  I have always been impressed by the wide range of people he influenced, but I was particularly struck, at his passing, by what a diverse range of people paid tribute to him - from young scientists working in a range of disciplines who were taught by him and use his Information Theory, Inference, and Learning Algorithms book, to public communicators of science, a relatively small and exclusive group which he effectively joined when he wrote his book 'Sustainable Energy - without the hot air.'

My favourite tribute was a tweet from David Spiegelhalter, who said of him,

"probably the most intelligent, principled, and fun person I shall ever know."

I am not sure there is much I can add to that.  He taught me when I was a master's student in Cambridge.  Among all the lectures I have been to, his were probably the most intelligent and fun.  And to see his principles, one only had to look at his jumpers, which were clearly designed to eliminate any need for central heating.

I do not think it would be an overstatement to say that I idolised him.  Some people idolise great rock stars, or great football players.  For me, at the tender age of 21, it was David MacKay.

I really thought, here is a man who has the power to change the world through the force of his intelligence.  I have a more nuanced view now, both of what it is possible to change, and of the types of people who are needed to achieve that change.

There are others who I think still see him as an idol.  In his obituary, Mark Lynas quotes David MacKay saying, “Please don’t get me wrong: I’m not trying to be pro-nuclear. I’m just pro-arithmetic.”  There is obviously something important here, that we should try and quantify the costs and benefits of different options for energy production and consumption.  However, there also seems to be an implicit suggestion, which I think is misguided, that if everyone was good at arithmetic, we would somehow be able to solve all the world's problems.

In the same obituary, David MacKay is described as a true polymath.  Again there is truth here, in that he was able to move nimbly between different scientific fields, from error-correcting codes in computer science, to information theory in genetics, to neural networks for machine learning, to spin models in physics.  The list could go on and on.  However there is a recurring theme, which is the description of a physical system by a mathematical model.  David MacKay was great at analysing models and doing inference for them, and he had a very good understanding of probability that allowed him to apply a relatively small set of principles to a wide range of scientific problems.

Nevertheless, I am uncomfortable with the epitaph 'true polymath'.  As far as I am aware, David MacKay had no great interests outside of science and the application of science in public policy.  This is not a criticism of him as a person - I think it is very important that such people exist in society.  However, there are many other things in life to enjoy and to be curious about - literature, food, music, philosophy etc.  I think that a 'true polymath', of which there are very few, should be able to think in several different ways.  Descartes is a good example: he invented Cartesian coordinates and he probed the nature of the soul and of existence.

What does the future hold for the areas where David MacKay did have greatest impact?  And how can we best ensure that his work and life live on in some sense?  There is clearly still a lot of work to be done.

Within statistical science and machine learning there is still far too much of a tendency for people to rely on statistical tests that they don't understand very well (such as calculating p-values), and to keep trying statistical methods until something works, without really thinking problems through from first principles.  Statisticians need to spend more time (as David MacKay did during his lifetime) teaching people, particularly scientists, about probability.

Within energy and climate policy, the all-out war between climate scientists and climate sceptics seems to have died down, but without either side really having learnt very much from the other.  Government work is modularised between different departments, which can create deep divisions.  An example is the division between the Treasury, which sees things through a 5-year economic prism, and the Department of Energy and Climate Change (DECC), which, put somewhat simplistically, thinks that reducing carbon emissions in the UK should be the number 1 priority of government.  There is a lot more that could be said about this, and I am probably not the best-placed person to say it.  However I do think more needs to be done to develop a sustainable economy and preserve our environment in the long term (i.e. on a 100-year time frame), while also attending to the natural desire most people have to live a comfortable life in the short term (i.e. on a 5-10 year time frame).

I would like to finish on a personal note.  I remember talking to David MacKay about doing a PhD with him.  (This was before I knew he was leaving academia to do public policy work.)  He asked me what I was interested in researching.  His question caught me slightly off-guard, as I hadn't really thought about it that much.  I was also a bit embarrassed about sharing the half-formed ideas I did have with someone who I was in awe of.  In the end that conversation didn't directly lead to anything, but it did help to instill in me a strong sense of curiosity, and a desire to identify research problems that are interesting and important to me.  This is something that I think all academics should be cultivating, both in themselves and in others.

Wednesday, March 9, 2016

Machines surpass humans in gaming world

This week is significant as being the first time that a computer has beaten the world's best Go player.

http://www.theguardian.com/technology/2016/mar/09/google-deepmind-alphago-ai-defeats-human-lee-sedol-first-game-go-contest

The contest is a best of 5, with only the first game played, so it could all change, but at the time of writing AlphaGo, the machine that Google DeepMind have trained to play Go, is ahead.

Why is this significant?  In the 1990s, IBM's Deep Blue beat the world chess champion Garry Kasparov.  Two of the key ingredients of this success were brute-force search algorithms, and a knowledge of chess strategy that was hard-wired into the computer program.

The number of possible Go games is much greater than the number of possible chess games, because there is a much wider choice of moves in Go than in chess.  When professional Go players are asked how they decide their next move, they say they rely heavily on intuition, whereas a professional chess player will always know why they made a particular move, making it a lot easier to program a chess strategy into a computer.  So AlphaGo is impressive because it is playing a harder game, and one in which professional players depend on highly developed sub-conscious decision-making processes.
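As a very rough back-of-the-envelope comparison, using commonly quoted figures rather than anything precise: chess has on the order of 35 legal moves per position and games of roughly 80 plies, while Go has around 250 legal moves per position and games of roughly 150 plies, giving game-tree sizes of very roughly

$$ 35^{80} \approx 10^{123} \qquad \text{versus} \qquad 250^{150} \approx 10^{360}. $$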

But what I find even more interesting is that the algorithms used to train AlphaGo are very general-purpose algorithms that have been applied to a diverse range of Artificial Intelligence problems.  For example, Google DeepMind used the same ideas / framework to train a computer to play Atari video games at the level of a professional video games tester.

Demis Hassabis said he was more impressed by Garry Kasparov than by Deep Blue when Kasparov was beaten.  Garry Kasparov can do many other things apart from play chess - speak several languages, write books, tie his shoe-laces etc. - whereas Deep Blue is a specialised intelligence for playing chess.  We are still a long way from machines being able to outperform humans in every domain, but I think it is fair to say that the last 20 years have seen a lot more progress in general-purpose machine intelligence than the last 200 years have seen in human intelligence.

Will machine intelligence reach some kind of saturation well below human levels?  Or will it catch up?  Or even surpass human intelligence?  The third option is starting to seem like an increasingly plausible idea to me.

For more, watch Demis Hassabis's recent lecture in Oxford,

https://podcasts.ox.ac.uk/artificial-intelligence-and-future

Tuesday, February 2, 2016

A statistician's bookshelf

What should a statistician have on their bookshelf?  In an age where most academics have access to electronic journal archives and Wikipedia is a fairly reliable source of information for basic facts, some statisticians might say they don't need anything on their bookshelf.

I often find myself eyeing up other people's bookshelves.  My co-supervisor in Reading, Ingo Bojak, has quite a voluminous bookshelf covering advanced particle physics and neuroscience.  A large personal library can be quite imposing in a way, making you feel small.  But it can also inspire curiosity.  A couple of the conversations I have recently had with Ingo ended with him picking a book off his shelf and lending it to me.  It can feel like a rite of passage, being lent a book, and in this case it was also practically useful because Ingo has books that are not in the University of Reading library.

My main supervisor, Richard, has quite a different approach to his bookshelf.  As well as a handful of core Statistics books, mainly related to his undergraduate teaching, he has a lot of proceedings - from the Valencia meetings, a series of influential meetings in Bayesian Statistics that has now morphed into the ISBA conference, and from Read Papers at the Royal Statistical Society.  So occasionally when we are meeting, he will say, 'Oh yes, I remember somebody talking about something to do with that at a conference / Read Paper I went to 15 years ago' and then be able to look it up in the proceedings to remind himself of more of the details.

The other interesting thing I have seen on his and other people's bookshelves is undergraduate / postgraduate course material, which seems to provide useful reference material for some people, even several decades after the end of their course.  My dad, who is an engineer for Total, commented that he wished he had had his own MSc thesis from the late 70s with him at a recent meeting, because he wanted to look something up in it that was relevant to what they were discussing.

I think most of the books I have on my bookshelf have grown out of successful undergraduate / early graduate lecture courses (see photo below).  A good lecture course distills the lecturer's knowledge, bringing out the ideas and methods that they think are useful for practice or research in the subject.  A good textbook is one where you feel like you can converse with the author(s), asking 'How would you go about... ?', and getting a specific answer back.

I have recently been finding the Shumway & Stoffer book on time series tremendously useful.  It is freely available online, and I used it in that format for quite a while, but I am glad I have a hard copy of it now, as I find it much easier to read from paper.  They have a very skilful way of presenting many important results from the time series literature (which is vast) in a coherent way, tying together time-domain and frequency-domain approaches, which are often treated separately by different authors.

The Mathematical Foundations of Neuroscience is a book I am currently borrowing from the Reading library, and it has been quite useful for getting a better understanding of the model that I am doing statistical inference for.  I think this is something that statisticians should do more of - finding out more about the models and underlying science in whatever application area they are working in.  Anecdotally, I have found lots of scientists who need help from statisticians, but who can only be helped if a statistician is prepared to invest the time in actually learning something about the scientist's subject.

The other textbooks on my bookshelf cover a varied range of topics within statistics and machine-learning.  They have proved practically useful on a number of occasions.  Here are a few specific examples (which are far from exhaustive in terms of what the books cover),

  • Monte Carlo algorithms (MacKay)
  • Kalman Filters (Bishop)
  • Regression analysis (Ramsey & Schafer)
  • Boosting (Hastie, Tibshirani & Friedman)
  • Basic hierarchical models (Gelman et al)
  • Dynamic hierarchical models (Cressie & Wikle)

I sometimes flick through the books and think that there is still an awful lot there that I haven't looked at.  Who knows what these books might still have to teach me?


Tuesday, January 5, 2016

The Laplace family


Last year I went to a reading of letters written by my great-great-grandfather's family during the First World War.  Although they were not particularly wealthy or famous, it was nevertheless fascinating to hear about what they were all doing in 1915.  It was also quite humbling, in that they, along with millions of others across Britain and further afield, did great things - leaving home at a young age to fight in a foreign country, looking after the wounded both in the field and closer to home, and working in government at a very difficult time for the country.  It was also fascinating to meet other members of my extended family, some of whom I had never met before.  Despite not having met them, I did feel as though there was a shared identity, as though the experiences of the family from 100 years ago had somehow been transmitted down all branches of the family tree.

Around the same time last year, I went to the funeral of Alexei Likhtman, a maths professor at Reading who tragically died in a hiking accident.  Hearing some of his closest colleagues and students speak at the funeral really brought it home to me how strong the bonds between academic colleagues are.  Several of them described the relationship as being like family, and this analogy is especially apt for the relationship between a PhD supervisor and their student.

So all of this has made me curious to find out more about my academic family.  I have known for a while that my supervisor's supervisor was Peter Green, quite a prominent statistician, but until recently I hadn't traced my academic lineage back any further.  The Mathematics Genealogy Project made it easy for me to find out more.  The first thing I found out was that Peter Green's line goes all the way back to Laplace, through Poisson and Dirichlet.  Among other things, Laplace discovered Bayes' theorem (independently of Thomas Bayes), and applied it to problems in astronomy, as well as making significant contributions to probability theory.

By looking up several other prominent statisticians, I developed the family tree shown below.  It is by no means complete - Laplace has around 88,000 descendants recorded on the Mathematics Genealogy Project!  Nevertheless it gives an interesting insight into the relationships between a few of today's top statisticians.

As well as the Laplace family, the Newton family is also quite interesting, although somewhat more exclusive at a mere 12,200 recorded descendants.  There is a line directly from Newton to Fisher, as well as to Galton and Pearson.  And every person on those lines studied or researched at the University of Cambridge.  This is quite different from the Laplace family, who seem to have moved around a lot more.  In the 20th century, the Newton family finally managed to break free from the Cambridge bubble.  The descendants of Fisher, Pearson and Galton are spread across the UK and the USA, and include Andrew Gelman (from the Galton branch).

Similarly to finding out about my biological family, I have found it quite fascinating and humbling to find out more about my academic family - where I come from and who I am related to.  It is also tantalizing in the sense that the there is so much information not recorded in the Mathematics Genealogy Project.   Laplace is only recorded as having one student and Newton only two - surely they must have had direct influence on a much greater number of people!?  And there also no information about post-PhD collaborations.  For example, on Herbert Scarf's wikipedia page, it says that he travelled between Bell Labs and Princeton every day in the summer of 1953 with John Tukey.  So does that make Tukey an adopted member of the Laplace family, and a great-great-great Uncle of me?  It seems we are all more closely related than we might think.