Dr. James O. Berger is the Arts and Sciences Professor of Statistics at Duke University. He is the recipient of numerous major academic accolades for his research and service. He is a Fellow of the Institute of Mathematical Statistics (IMS), American Statistical Association (ASA), American Association for the Advancement of Science (AAAS), and the International Society for Bayesian Analysis (ISBA). He has served in numerous leadership roles such as President of the Institute of Mathematical Statistics (IMS), Chair of the Section of Bayesian Statistical Science of the ASA, President of the International Society for Bayesian Analysis, and Chair for the Advisory Committee for the NSF Directorate on Mathematical and Physical Sciences. He has directed 38 doctorates in statistics.
How did you become interested in statistics?
Prof. Berger completed all his degrees at Cornell University. He was originally interested in mathematics. Quite fortunately, there were three prominent statisticians in the mathematics department who piqued his interest: Larry Brown, Jack Kiefer, and Jack Wolfowitz. Eventually, he completed his thesis with Larry Brown and became a statistician.
Who were your influences?
“There were the three mentioned: Larry Brown, Jack Kiefer, and Jack Wolfowitz. At Cornell math there was a fourth influential statistician, Roger Farrell. All four of them have passed away, but they were my big influences at Cornell. Then I came directly to Purdue as an assistant professor. At Purdue, Herman Rubin was my big scientific influence. Having gone through a math program, there were a lot of things I did not know about statistics. Herman was the one who taught me a lot about statistics and statistical computing. Then there was Shanti Gupta, who was the department chairman; he was my mentor in many ways. Of course, the entire faculty was very strong in mathematical statistics, so I learned a lot from everyone.”
What was it like being involved in the department at Duke?
Prof. Berger claims “[he] was always extremely happy [at Purdue]”. There was a collection of reasons for transitioning to Duke. “I moved to Duke [from Purdue] after my children had grown up and I had spent my entire scientific career (23 years [as a professor at Purdue]). I was interested in trying something new. Duke was also primarily, and this is still very much true today, a Bayesian statistics department. There were always Bayesians at Purdue, but I wondered what it would be like to be in an environment where everyone was a Bayesian. It was also the time that MCMC had just taken off. The Duke people were all developing the latest MCMC techniques. I was thinking that this would be something very useful for my research.”
How do we classify Bayesian thoughts?
“The three main classifications, I would say, are subjective Bayes, objective Bayes, and robust Bayes. Of course, there is also a huge difference between theoretical work and applied work in Bayesian analysis. Lately MCMC – and computation more generally – has become almost a separate world. Indeed, the MCMC computational tools have become so prominent that there are now purported frequentists who ‘do MCMC,’ not understanding that it is Bayesian.”
You were a contributor to the ASA Statement on Statistical Inference and P-values. What are your thoughts on this?
“I did not disagree with the final statement, but I was disappointed that there was not more. The problem is that the statement does not say anything that was not known eighty years ago… These things have been stated repeatedly for at least eighty years. When ASA made this effort, I was very hopeful that they would take the next step – to actually recommend something specific as an alternative to p-values.
One can always say ‘don’t do this, don’t do that’, but the poor scientist is left wondering what to do instead and just falls back on continuing to use the P-value. I was hoping we could come up with some clear guidance on how to move forward.
The problem is that, while the contributors to the ASA statement could all agree that the way scientists use P-values is wrong and list the reasons it is wrong, the contributors could not reach any consensus about what to do instead.
The ASA allowed everyone to write their suggestions as to what should be done instead, and there is a supplementary website with these suggestions; but, again, most of the suggestions differ rather strongly, leaving the scientist with no clear guidance. There was also a conference on this, and an upcoming issue of TAS [The American Statistician] devoted to the issue. The continuing ASA effort on this is highly welcome.
There was one fairly broad consensus that arose from enlisting a number of leading scientists from a variety of disciplines – mainly social sciences – and statisticians. The 72 leaders contacted all agreed that statistical practice would be much improved if we declared significance to be at the .005 level rather than the current .05. Again, many of the leaders had differing reasons as to why this would be a good change, but all agreed with the change.” This article is ‘Redefine statistical significance’ by Daniel J. Benjamin et al. published in Nature Human Behaviour 2, 6-10 (2018).
How has Bayesian statistics changed in your career?
Prof. Berger chuckles and says “It’s a lot more fun!” He then elaborates with “I became a Bayesian through talking with Herman [Rubin] and others. From a foundational standpoint, I believed the axioms of rationality. It all made sense to me. At first I was a lip-service Bayesian, but eventually I came to understand that Bayesian statistics is a very natural way to address applied problems; it is much easier to answer the problem and it is much easier to communicate with scientists. On the frequentist side, you cannot answer every question the scientist poses. You can only answer a list of possible questions. If the scientist’s question is not on the list, you have to say sorry. On the Bayesian side, any question the scientist poses is one you can attempt to answer. Thus the more applications I became involved with, the more convinced I became that Bayes was the way to go.”
What are good resources, in terms of books and websites, for a non-statistician applied researcher hoping to learn about Bayesian statistics and integrate it into their research?
Prof. Berger recommends seeking out an introduction-to-Bayes book written in one’s own discipline as a starting point. There is a vast introductory Bayesian literature of this type.
For years you have expressed concern about false positives due to multiple testing effects and other biases. Do you see any progress in mitigating these problems (e.g., in the genetics field or in publication standards), and how do you see the way forward?
“Yes, there is definite progress. … From 1997-2007 in Genome Wide Association Studies, very few ‘discoveries’ replicated because they were not dealing with multiple testing right. Suddenly, in 2007, things started replicating because they started dealing with multiple testing properly. Many fields of science at first do not understand what a severe problem multiple testing is, and they do not adjust enough. Then gradually they realize all their science is wrong because they are not adjusting enough, and somebody does something to change the field. That is progress. This happens more in the hard sciences and biology and medicine. In the social sciences, there are a lot more failures to adjust for multiple testing and other biases, and there is very little checking for reproducibility. So the problems are still there.
The other issue is that adjusting for multiple testing can be so difficult that there are many problems for which we do not know how to do it. People feel the need to produce answers so they do something, even if they know they have not adjusted properly. This is a real area where we need more statistical research for the scientists – finding adjustments for multiple testing that the scientists can use.”
What do you think of the emerging field of data science?
“I was just at a conference and on a panel, and this was one of the questions that was asked of the panel. My response was ‘I don’t really want to talk about it.’ The reason is that I understand there is an emerging field of data science and I understand what it is all about. I also understand it is going to be extremely important. I am all in favor of statistics departments adapting themselves to this field, but I do not feel personally that I am capable of giving advice on how to adapt. There is still an enormous amount of standard statistical problems that need to be solved. If someone wants to move in the data science direction, that is great, but (most?) statisticians do not need to do so.
For this reason, it is difficult to give statistics departments advice as to how to adapt to data science. The one recommendation I might make, regarding curriculum changes, is that it is important to give students options, but pursuing a traditional statistics degree should certainly be one of the options!”
How have statistics departments changed over your tenure?
“When I got my PhD, the highly respected thing was mathematical statistics. Applied statistics was acknowledged as important, but secondary (in academia) to mathematical statistics. When I was applying to assistant professorships, my mentors gave Purdue a strong recommendation because it was [at the time] one of the best departments in mathematical statistics. This primacy of mathematical statistics has changed over the years to where we are today, where all kinds of statistical research are respected. Indeed, the pendulum may have shifted the other way, with something done for pure mathematical statistics reasons being sometimes denigrated. Indeed, for that reason I encourage my students today to do more interdisciplinary research than theorem proving. It’s still good to prove a nice theorem or two, but you do not have to do so. Also, data scientists today have little to do with mathematical statistics (not a good thing).”
Do tenure processes fulfill the world’s current need for scholars in statistics?
“I think so. I do not see any problems with it. In times past there have been problems, particularly in the departments with a large number of people who only appreciated mathematical statistics, when someone highly interdisciplinary came up for tenure. But I don’t see this as much of a problem now.”
Are we pushing the boundaries of statistics at the same rate that has been done before?
“We are probably faster. As we are engaging the problems of the world, rather than self-generated problems, we are forced to advance much more quickly… not that all the rushed advancement is necessarily good (witness the history of the multiple testing problem). One area I work in these days is uncertainty quantification. It is the interface of statistics and data with complex computer models of processes (e.g., climate models). There can almost never be a firm statistical answer for most problems in this area, but we have to proceed anyway … It can be unsettling to work in an area and not be sure if your answers are rigorously based.”
Do you think statistics communication poses difficulties that other science communication fields do not have, in the sense of statisticians communicating with non-statisticians? If yes, why?
“It is a learning process. I directed SAMSI (Statistics and Applied Mathematics Sciences Institute) and its entire purpose was to bring people from different fields together to communicate. We had an elaborate process, starting with the opening workshops, to encourage communication. It is a difficult process, but necessary.
It is probably easier for statisticians to communicate with other scientists than the reverse. Two scientists from other disciplines have a much worse time communicating with each other than with a statistician. Partly, this is because most scientists have some knowledge of statistics. And most statisticians have experience communicating with those in other disciplines. Indeed, I feel it is quite important for statistics graduate programs to give their students the experience of talking to people in other disciplines.”
What are ways statisticians can improve their communication skills?
“I think most graduate students have consulting experience. And many graduate students are involved with large, interdisciplinary collaborations these days, in which they can learn how to communicate. And there are programs like the SAMSI program that have structures for doing this. Today, unless graduate students stick their heads in the sand, they will get exposed to communication with other scientists.
That said, when one moves to interdisciplinary work in a new discipline, one has to go through the learning process all over. One may understand the communication pitfalls from previous experiences, but one still has to learn enough of the language of the other discipline to be able to communicate, and needs to assess the statistical level of the scientists to know what to communicate.”
— Interview by Will A. Eagan —