The pain of hypothesis testing — Frequentist vs. Bayesian
Bayesian statistics is about updating one’s belief about something as new information becomes available. It allows the analyst to incorporate prior knowledge (e.g., based on prior data, how much difference in user engagement can we expect between publication types?). As new data arrive, we continuously re-allocate credibility between the competing options until we have enough evidence. This makes the procedure an ideal candidate for meta-analysis in science and for continuous analyses in a business context. It may even save you resources: because the evidence can be monitored as the data come in, you can stop once the result is clear instead of committing to an analysis that might not turn out to be fruitful anyway. This is a big plus in terms of efficiency!
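Here is a minimal Python sketch of this updating cycle. It assumes that each reached user either engages or does not, and that our belief about the engagement rate is a Beta distribution. This is the conjugate textbook model, not necessarily the model used in the case study. The Beta(2, 18) prior and all counts are made up. The posterior after each batch serves as the prior for the next one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an engagement rate, encoded as a Beta(a, b) distribution."""
    a: float
    b: float

    def update(self, engaged: int, not_engaged: int) -> "BetaBelief":
        # Conjugate update: successes are added to a, failures to b
        return BetaBelief(self.a + engaged, self.b + not_engaged)

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


# Assumed expert prior: roughly 10 % engagement, worth about 20 observations
prior = BetaBelief(2, 18)

# Hypothetical data arriving in three batches: (engaged, not engaged)
batches = [(3, 27), (12, 88), (5, 45)]

belief = prior
for engaged, not_engaged in batches:
    belief = belief.update(engaged, not_engaged)  # yesterday's posterior is today's prior
    print(f"posterior mean after this batch: {belief.mean:.3f}")

# Analysing all data at once leads to exactly the same posterior
all_at_once = prior.update(sum(e for e, _ in batches), sum(n for _, n in batches))
assert belief == all_at_once
```

Updating batch by batch ends in the same posterior as a single analysis of all the data. This is why the analysis can simply be repeated whenever new data come in.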
The most fundamental difference between the two paradigms is arguably their reversed understanding of what probability is. The term frequentist means that probability refers to long-run frequencies, that is, to sampling distributions of hypothetical, repeatedly simulated data. Under the hood, the computation of a p-value relies on the distribution of imaginary test statistics (e.g., t-values) that we would obtain if there were no difference between A and B to begin with. This lets us say how unlikely it is to observe such data in light of the null hypothesis. By looking at the proportion of these imaginary outcomes that are at least as extreme as the one from our sample, we assess how unusual our observations are. In fact, this is exactly what the famous p-value expresses: the probability of observing data at least as extreme as ours, given that the null hypothesis is true. In contrast, Bayesian probability is based on the degree of reasonable belief in a specific hypothesis, given the data. So, the interpretation is exactly reversed: P(data | hypothesis) on the one hand, P(hypothesis | data) on the other.
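A permutation test makes the frequentist logic tangible. I use it here as a simulation-based stand-in for the t-test because it builds the null distribution explicitly. If A and B did not differ, the group labels would be arbitrary. Shuffling them therefore produces the ‘imaginary’ test statistics, and the p-value is the share of shuffles that are at least as extreme as the observed difference. The engagement scores are invented for illustration.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=10_000, seed=1):
    """Two-sided p-value for a difference in means, obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, every split of the pooled data is equally likely
        rng.shuffle(pooled)
        diff = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Adding one to numerator and denominator keeps the estimate from being exactly 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical engagement scores for two publication types
photos = [12, 15, 9, 11, 14, 10, 13, 12]
videos = [16, 18, 14, 17, 15, 19, 16, 13]

p = permutation_p_value(photos, videos)
print(f"p = {p:.4f}")  # P(a difference at least this large | no true difference)
```

The p-value conditions on the null hypothesis being true and lets only the data vary. The Bayesian reading turns this around.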
You would probably agree that this interpretation is more intuitive and in line with our natural way of thinking: we start with the data and make inferences about the presence of effects in nature instead of the other way around. After all, we are usually not interested in the likelihood of the data in light of the null hypothesis, which we do not actually believe in either. Bayesian statistics spares us these mental gymnastics. Moreover, it is a bit humbler because it embraces the uncertainty that comes with every statistical estimate. The reason is that Bayesian inference works with entire probability distributions rather than single point estimates. If we take user engagement as an example, the analysis is based on information about which values it usually takes and with what probabilities. We start with an idea, based on our expert knowledge, of how successful something (e.g., a piece of social media content) is likely to be, and then collect data to update this belief.
Bayesians analyse data exactly in the way learning works: we adjust our beliefs as we observe and thus continuously grow our knowledge.
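What does it mean in practice to work with a whole distribution? The Python sketch below summarises a Beta posterior for an engagement rate by a 95 % credible interval, estimated from random draws. The R packages discussed later handle this for you. The Beta(2, 18) prior, which encodes an assumed expert guess of about 10 % engagement, and the counts are illustrative assumptions.

```python
import random


def credible_interval(a, b, level=0.95, draws=20_000, seed=7):
    """Equal-tailed credible interval of Beta(a, b), estimated from random draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return lower, upper


prior_a, prior_b = 2, 18  # assumed expert prior: about 10 % engagement

# Same observed engagement rate (11 %), increasingly many observed users
for engaged, total in [(11, 100), (110, 1_000), (1_100, 10_000)]:
    lo, hi = credible_interval(prior_a + engaged, prior_b + total - engaged)
    print(f"n = {total:>6}: 95 % interval [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

With more data the interval narrows, but it never collapses to a single number. The remaining uncertainty is always part of the answer.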
If you are interested in more theoretical details on this topic, read my article on using Frequentist vs. Bayesian statistics to predict the weather on my wedding day.
The key differences in a nutshell
Bayesian A/B testing…
A short note on R packages
By now, there are two R packages that specialise in Bayesian A/B testing, and they rest on different assumptions. The bayesAB package (Portman, 2017) is based on the ‘Independent Beta Estimation’ (IBE) approach, which assumes that:
The Logit Transformation Testing (LTT) approach, implemented in the abtest package (Gronau, 2019) for R (R Core Team, 2020), overcomes these restrictions. This is what we will use in the upcoming case study, because it yields more informative results. If you are interested in the technical details, I recommend reading the paper by Hoffmann, Hofman & Wagenmakers (2020).
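As I understand the LTT model from Hoffmann, Hofman & Wagenmakers (2020), the success probabilities of the two groups are written on the logit scale as beta - psi/2 and beta + psi/2. Here beta is a grand mean and psi is the log odds ratio. Normal priors are placed on both parameters, and the null hypothesis sets psi = 0. Because both probabilities share beta, learning about one group also informs the other, and the null hypothesis can be tested with a Bayes factor.

The Python sketch below computes that Bayes factor by brute-force grid integration, chosen purely for transparency. It is not the abtest package’s implementation. The N(0, 1) priors, the grid settings and the counts are assumptions.

```python
import math


def log_sigmoid(x):
    """log(1 / (1 + exp(-x))), computed without overflow."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_lik(y, n, eta):
    """Binomial log-likelihood of y successes in n trials at logit eta.

    The constant log C(n, y) is dropped: it is identical under H0 and H1
    and cancels in the Bayes factor. log(1 - sigmoid(eta)) == log_sigmoid(-eta).
    """
    return y * log_sigmoid(eta) + (n - y) * log_sigmoid(-eta)


def log_normal_pdf(x, sd):
    """Log density of a normal distribution with mean 0 and standard deviation sd."""
    return -0.5 * (x / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ltt_bayes_factor(y1, n1, y2, n2, sd_beta=1.0, sd_psi=1.0, step=0.05, limit=5.0):
    """BF10 for H1: psi ~ N(0, sd_psi^2) against H0: psi = 0, via grid integration."""
    k = round(limit / step)
    grid = [i * step for i in range(-k, k + 1)]
    log_h0, log_h1 = [], []
    for beta in grid:  # beta: grand mean on the logit scale
        prior_beta = log_normal_pdf(beta, sd_beta)
        # H0: both groups share the same logit (psi = 0)
        log_h0.append(log_lik(y1, n1, beta) + log_lik(y2, n2, beta) + prior_beta)
        for psi in grid:  # psi: log odds ratio between the groups
            eta1, eta2 = beta - psi / 2, beta + psi / 2
            log_h1.append(log_lik(y1, n1, eta1) + log_lik(y2, n2, eta2)
                          + prior_beta + log_normal_pdf(psi, sd_psi))
    # Riemann sums: H0 integrates over beta only, H1 over beta and psi
    log_m0 = log_sum_exp(log_h0) + math.log(step)
    log_m1 = log_sum_exp(log_h1) + 2 * math.log(step)
    return math.exp(log_m1 - log_m0)


# Hypothetical counts (y successes out of n trials) for groups 1 and 2
print(ltt_bayes_factor(50, 100, 50, 100))  # equal rates: BF10 below 1 favours H0
print(ltt_bayes_factor(20, 100, 60, 100))  # clearly different rates: BF10 far above 1
```

Unlike a probability of superiority, the Bayes factor quantifies evidence for the absence of an effect as well as for its presence.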
Benchmark social media performance
A typical example from the field of marketing is analysing user engagement with a company’s website to improve its popularity. In this case study, we will do just that. Technically, user or customer engagement is defined as voluntary and potentially profitable behaviour towards a firm. For example, it can manifest itself in a person’s willingness to draw attention to a firm and to attract prospective customers through word of mouth, writing a comment, sharing information or referring to a product. This should ring a bell with marketers because user engagement is essentially free, creates a beneficial relationship with a brand, and may increase sales in the long run. But why not simply use the click-through rate, that is, the proportion of users who actually clicked on the post or advertisement after seeing it?
Because user engagement is a stronger indicator of the degree to which our social media content triggers the kind of interest that requires the user to pay attention. Specifically, we would like to know whether videos engage users more than photos. Databox even suggests that videos require users to focus more because they are not digested as easily as photos, and that they therefore generate twice as many clicks and 20–30 % more conversions.
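Framed as a Bayesian A/B test, the question becomes: how probable is it that videos have a higher engagement rate than photos? The Python sketch below answers it in the simplest possible way, in the spirit of the IBE approach mentioned earlier. Each post type gets an independent Beta(1, 1) prior, and we count how often a draw from the video posterior beats a draw from the photo posterior. The counts are hypothetical placeholders, not figures from the dataset. Treating ‘engaged users out of users reached’ as binomial data is likewise an assumption.

```python
import random


def prob_b_beats_a(succ_a, n_a, succ_b, n_b, draws=50_000, seed=42):
    """Monte Carlo estimate of P(theta_B > theta_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Each group's posterior depends only on its own data
        theta_a = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        theta_b = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += theta_b > theta_a
    return wins / draws


# Hypothetical counts: engaged users out of users reached, per post type
photo_engaged, photo_reached = 420, 4_000
video_engaged, video_reached = 60, 450

p = prob_b_beats_a(photo_engaged, photo_reached, video_engaged, video_reached)
print(f"P(video engagement rate > photo engagement rate) = {p:.3f}")
```

Because the two priors are independent, the data on photos say nothing about the engagement rate of videos. The output is also a probability of superiority rather than a test of the null hypothesis of no difference.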
The dataset we will use stems from a publicly available paper by Moro and Rita (2016). It includes 500 posts from the Facebook page of a world-renowned cosmetics brand, collected between the 1st of January and the 31st of December 2014. The authors analysed 12 outcome variables, such as the following:
To further characterise the posts we deal with, there are five other variables available: