A New Website Is Born: Estimite.com

Forecasting the 2021 Norwegian Election using R and Stan

Bayesian modeling has proven its usefulness for poll aggregation and election forecasting – for instance through FiveThirtyEight. However, in Norway, media coverage of public opinion trends still tends to focus on a single poll at a time, or a simple average of polls at best. I was convinced it would be possible to do better, and after about three times as many long days and nights as I thought it would take, the result is finally live – both in Norwegian and English – at Estimite. [Read More]

The Problem with Google’s Journal Rankings

Why they should not be used to assess research quality

I keep seeing Google’s journal rankings being used to measure research quality, and I want to explain why I think this is a bad idea. One could, of course, argue that citations do not reflect research quality anyway and therefore are of little relevance, but that is not the point I want to make here. I will simply run a small set of simulations to illustrate a key problem with how Google’s h-index is being used. [Read More]
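To make the issue concrete, here is a minimal sketch in R (with made-up Poisson citation counts, not the post's actual simulations) of how an h-index rewards publication volume independently of per-paper citations:

```r
set.seed(42)

# h-index: the largest h such that h papers have at least h citations each
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

# Two hypothetical journals with the same citation distribution per paper,
# but very different publication volumes
small_journal <- rpois(50, lambda = 10)
large_journal <- rpois(500, lambda = 10)

h_index(small_journal)
h_index(large_journal)
```

With an identical per-paper citation distribution, the larger journal gets the higher h-index simply because it publishes more papers.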

How Efficient is Stan Compared to JAGS?

Conjugacy, pooling, centering, and posterior correlations

For a good while, JAGS was your best bet if you wanted to run MCMC on a distribution of your own choosing. Then Stan came along, potentially replacing JAGS as the black-box sampler of choice for many Bayesians. But how do the two compare in terms of performance? The obvious answer is: it depends. In fact, the question is nearly impossible to answer properly, as any comparison will be conditional on the data, the model specification, the test criteria, and more. [Read More]

Bayesian Hierarchical Modeling

Comparing partially pooled and unpooled models in R

I used to think so-called multilevel models were a little boring. I was interested in causal inference, and the people using these models did not seem to have better causal identification strategies than those running plain old regressions. I have gradually changed my mind about these models, although not because I think they solve the challenges of causal identification. Rather, it is because a large share of our data can be thought of as hierarchical, and proper modeling helps us make the most of such data. [Read More]
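As a taste of what partial pooling does, here is a small empirical-Bayes-style sketch in R (the simulated data and the known residual sd are assumptions for illustration; the post itself compares full models):

```r
set.seed(7)

# Simulate grouped data: 8 groups with a handful of noisy observations each
true_means <- rnorm(8, mean = 0, sd = 1)
n_per <- 5
y <- unlist(lapply(true_means, function(m) rnorm(n_per, mean = m, sd = 2)))
group <- rep(1:8, each = n_per)

# Unpooled estimates: each group's own sample mean
unpooled <- tapply(y, group, mean)

# Partial pooling (empirical Bayes sketch): shrink each group mean toward
# the grand mean, weighted by within- vs between-group variance
grand <- mean(y)
sigma2_within <- 2^2 / n_per  # sampling variance of a group mean (sd known here)
sigma2_between <- max(var(unpooled) - sigma2_within, 0)
w <- sigma2_between / (sigma2_between + sigma2_within)
pooled <- grand + w * (unpooled - grand)
```

Each partially pooled estimate sits between its unpooled counterpart and the grand mean, with more shrinkage when groups are small or noisy.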

An Introduction to Markov Chain Monte Carlo Sampling

Writing and diagnosing a Metropolis sampler in R

It is usually not too difficult to define priors and specify a likelihood function, which means we can calculate the unnormalized posterior for any combination of relevant parameter values. However, that is still insufficient to give us marginal posterior distributions for the parameters of interest. The grid method used in the previous post is not feasible when the number of parameters is large, and conjugate models with analytical solutions exist only for a limited class of problems. [Read More]
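As a preview, the core of such a sampler is only a few lines of R. This sketch targets the posterior mean of simulated normal data with known sd and a flat prior, assumptions chosen for brevity rather than the post's exact example:

```r
set.seed(1)

# Unnormalized log-posterior for the mean of normal data (known sd = 1),
# with a flat prior -- a simple stand-in target
y <- rnorm(100, mean = 2)
log_post <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

metropolis <- function(n_iter, init, proposal_sd) {
  draws <- numeric(n_iter)
  current <- init
  for (i in seq_len(n_iter)) {
    proposal <- rnorm(1, mean = current, sd = proposal_sd)
    # Accept with probability min(1, posterior ratio), on the log scale
    if (log(runif(1)) < log_post(proposal) - log_post(current)) {
      current <- proposal
    }
    draws[i] <- current
  }
  draws
}

draws <- metropolis(5000, init = 0, proposal_sd = 0.5)
mean(draws[-(1:1000)])  # after burn-in, close to the sample mean of y
```

With a flat prior, the posterior mean equals the sample mean, so the post-burn-in average of the draws provides a quick sanity check on the sampler.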

The Basics of Bayesian Inference

Evaluating continuous distributions over a grid in R

The goal of data analysis is typically to learn (more) about some unknown features of the world, and Bayesian inference offers a consistent framework for doing so. This framework is particularly useful when we have noisy, limited, or hierarchical data – or very complicated models. You may be aware of Bayes’ theorem, which states that the posterior is proportional to the likelihood times the prior. But what does that mean? This post offers a very basic introduction to key concepts in Bayesian statistics, with illustrations in R. [Read More]
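To illustrate "posterior proportional to likelihood times prior", here is a minimal grid evaluation in R for a binomial proportion; the flat Beta(1, 1) prior and the 7-successes-in-10-trials data are assumptions made for the example:

```r
# Grid evaluation of a posterior for a binomial proportion:
# posterior is proportional to likelihood * prior, normalized over the grid
theta <- seq(0.001, 0.999, length.out = 1000)
prior <- dbeta(theta, 1, 1)                       # flat prior
likelihood <- dbinom(7, size = 10, prob = theta)  # 7 successes in 10 trials
unnorm <- likelihood * prior
posterior <- unnorm / sum(unnorm)

# Posterior mean on the grid; analytically the posterior is
# Beta(8, 4), whose mean is 8/12, approximately 0.667
sum(theta * posterior)
```

Comparing the grid result against the known Beta(8, 4) answer is a useful check before applying the same recipe to models without analytical solutions.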