Likelihood ≠ Bayesian with flat prior
The likelihood function, and the associated confidence interval, are not the same concept as a Bayesian posterior probability constructed with a flat (uniform) prior.
In parts 1 and 2 of this answer it is argued why the likelihood function should not be viewed as a Bayesian posterior probability based on a flat prior.
In part 3 an example is given where the confidence interval and the credible interval differ widely, and it is pointed out how this discrepancy arises.
1 Different behavior when the variable is transformed
Probabilities transform in a particular way. If we know the probability distribution $f_x(x)$, then we also know the distribution $f_\xi(\xi)$ of the variable $\xi$ defined by any function $x = \chi(\xi)$, according to the transformation rule:
$$f_\xi(\xi) = f_x(\chi(\xi)) \, \frac{d\chi}{d\xi}$$
If you transform a variable, then the mean and the mode may change due to this change of the distribution function. That means $\bar{x} \neq \chi(\bar{\xi})$ and $x_{\max f(x)} \neq \chi(\xi_{\max f(\xi)})$.
The likelihood function does not transform in this way. This is the contrast between the likelihood function and the posterior probability. The (maximum of the) likelihood function remains the same when you transform the variable:
$$\mathcal{L}_\xi(\xi) = \mathcal{L}_x(\chi(\xi))$$
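As a small illustration (a sketch with made-up data, not part of the original argument), the snippet below fits an exponential rate $\lambda$ by maximum likelihood and then refits after the reparameterization $\xi = \log \lambda$: the maximum maps through the transformation, while the mode of a flat-prior posterior density does not.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# hypothetical data: n = 5 draws from an exponential distribution
x = np.array([0.5, 1.2, 0.3, 2.0, 0.8])
n, s = len(x), x.sum()

# log-likelihood in the rate parameter lambda
def loglik_lam(lam):
    return n * np.log(lam) - lam * s

# the same log-likelihood after reparameterizing to xi = log(lambda)
def loglik_xi(xi):
    return loglik_lam(np.exp(xi))

lam_hat = minimize_scalar(lambda l: -loglik_lam(l), bounds=(1e-6, 10), method='bounded').x
xi_hat = minimize_scalar(lambda z: -loglik_xi(z), bounds=(-5, 5), method='bounded').x
print(lam_hat, np.exp(xi_hat))   # both equal n/s: the maximum is invariant

# a density does NOT behave this way: with a flat prior on lambda the
# posterior is Gamma(n+1, rate=s) with mode n/s, but after transforming
# that density to xi = log(lambda) its mode corresponds to (n+1)/s
print(n / s, (n + 1) / s)        # the mode moves under the transformation
```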
Related:
- The flat prior is ambiguous: it depends on the parameterization of the particular statistic. For instance, if $X$ is uniformly distributed (e.g. $U(0,1)$), then $X^2$ is not uniformly distributed (see the sketch after this list).
- There is no single flat prior that you can relate the likelihood function to. It is different when you define the flat prior for $X$ or for some transformed variable like $X^2$. For the likelihood this dependency does not exist.
- The boundaries of probabilities (credible intervals) will be different when you transform the variable (for likelihood functions this is not the case). E.g. for some parameter $a$ and a monotonic transformation $f(a)$ (e.g. the logarithm) you get the equivalent likelihood intervals:
$$a_{\min} < a < a_{\max}$$
$$f(a_{\min}) < f(a) < f(a_{\max})$$
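A minimal sketch of the first and last points; the values and the transformation (here the logarithm) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100_000)   # X ~ U(0,1): a flat density on (0,1)
y = x**2                         # Y = X^2 is NOT flat

# histogram heights approximate the densities
hx, _ = np.histogram(x, bins=10, range=(0, 1), density=True)
hy, _ = np.histogram(y, bins=10, range=(0, 1), density=True)
print(np.round(hx, 2))           # roughly [1, 1, ..., 1]
print(np.round(hy, 2))           # roughly 1/(2*sqrt(y)): piles up near 0

# a likelihood interval, in contrast, simply maps through any monotonic
# transformation f; hypothetical interval bounds for a parameter a:
a_min, a_max = 0.2, 0.7
print(np.log(a_min), np.log(a_max))   # the equivalent interval for log(a)
```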
2 Different concept: confidence intervals are independent of the prior
Suppose you sample a variable $X$ from a population with (unknown) parameter $\theta$, where the population itself is sampled from a super-population (with possibly varying values of $\theta$).
One can make an inverse statement, trying to infer what the original $\theta$ may have been, based on observing some values $x_i$ of the variable $X$.
- Bayesian methods do this by supposing a prior distribution for the possible values of $\theta$
- This contrasts with the likelihood function and the confidence interval, which are independent of the prior distribution.
The confidence interval does not use information from a prior the way the credible interval does (confidence is not a probability).
Regardless of the prior distribution (uniform or not), the x%-confidence interval will contain the true parameter in x% of the cases (confidence intervals refer to the success rate, the type I error, of the method, not to a particular case).
For the credible interval this concept (the fraction of the time that the interval contains the true parameter) is not even applicable. We may nonetheless interpret it in a frequentist sense, and then we observe that the credible interval will contain the true parameter only x% of the time when the (uniform) prior correctly describes the super-population of parameters that we may encounter. The interval may effectively perform higher or lower than x% (not that this matters, since the Bayesian approach answers different questions; it is just to note the difference).
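A simulation sketch of this difference in coverage, using the exponential-rate example from part 3 below; the Gamma(2,1) super-population is a hypothetical choice, not anything from the original question:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 4, 20_000
hits_conf = hits_cred = 0

for _ in range(trials):
    # super-population: the true rate itself is drawn from a (non-flat)
    # distribution -- here, hypothetically, a Gamma(2, 1)
    lam = rng.gamma(shape=2.0, scale=1.0)
    xbar = rng.exponential(scale=1/lam, size=n).mean()

    # 95% confidence interval: 2*n*lam*xbar ~ chi-squared with 2n d.o.f.
    lo_c = stats.chi2.ppf(0.025, 2*n) / (2*n*xbar)
    hi_c = stats.chi2.ppf(0.975, 2*n) / (2*n*xbar)

    # 95% credible interval with a flat (improper) prior on lam:
    # the posterior is Gamma(n+1, rate = n*xbar)
    lo_b, hi_b = stats.gamma.ppf([0.025, 0.975], a=n+1, scale=1/(n*xbar))

    hits_conf += lo_c <= lam <= hi_c
    hits_cred += lo_b <= lam <= hi_b

print(hits_conf / trials)  # ~0.95 whatever the super-population is
print(hits_cred / trials)  # below 0.95 here (about 0.91): the flat prior
                           # does not describe this super-population
```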
3 Difference between confidence and credible intervals
In the example below we examine the likelihood function for the exponential distribution as a function of the rate parameter $\lambda$, the sample mean $\bar{x}$, and the sample size $n$:
$$\mathcal{L}(\lambda, \bar{x}, n) = \frac{n^n}{(n-1)!} \bar{x}^{n-1} \lambda^n e^{-\lambda n \bar{x}}$$
This function expresses the probability of observing (for given $n$ and $\lambda$) a sample mean between $\bar{x}$ and $\bar{x} + d\bar{x}$.
Note: the rate parameter $\lambda$ runs from 0 to $\infty$ (unlike the OP's request, from 0 to 1), so the prior in this case will be an improper prior. The principle, however, does not change. I am using this perspective for easier illustration: distributions with parameters between 0 and 1 are often discrete distributions (difficult to draw continuous lines for) or a beta distribution (difficult to calculate with).
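As a sanity check (a sketch; the numbers are arbitrary): the formula above is the density of a $\text{Gamma}(n, \text{rate} = n\lambda)$ variable, which can be verified against scipy:

```python
import numpy as np
from scipy import stats
from scipy.special import factorial

# the likelihood function from the formula above
def L(lam, xbar, n):
    return n**n / factorial(n - 1) * xbar**(n - 1) * lam**n * np.exp(-lam * n * xbar)

# the sample mean of n exponential(lam) variables is Gamma(n, rate=n*lam)
# distributed, so the formula should agree with scipy's gamma density
n, lam, xbar = 4, 1.5, 0.9
print(L(lam, xbar, n))
print(stats.gamma.pdf(xbar, a=n, scale=1 / (n * lam)))   # same value
```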
The image below illustrates this likelihood function (the blue colored map) for sample size $n=4$, and also draws the boundaries for the 95% intervals (both confidence and credible).
The boundaries are created by obtaining the (one-dimensional) cumulative distribution function. But this integration/cumulation can be done in two directions.
The difference between the intervals occurs because the 5% areas are constructed in different ways.
The 95% confidence interval contains values of $\lambda$ for which the observed value $\bar{x}$ is not among the most extreme 5% of outcomes. In this way, whatever the true value of $\lambda$, we would make a wrong judgement in only 5% of the cases.
For any $\lambda$, the regions north and south of the boundaries (changing $\bar{x}$) each contain 2.5% of the weight of the likelihood function.
The 95% credible interval contains values of $\lambda$ that are most likely to have caused the observed value $\bar{x}$ (given a flat prior).
Even when the observed result $\bar{x}$ is less than 5% likely for a given $\lambda$, that particular $\lambda$ may still be inside the credible interval. In this particular example, higher values of $\lambda$ are 'preferred' by the credible interval.
For any $\bar{x}$, the regions west and east of the boundaries (changing $\lambda$) each contain 2.5% of the weight of the likelihood function.
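The two boundaries can be computed directly for a single observation. The sketch below uses $n=4$ as in the image and a hypothetical observed mean $\bar{x}=1$; the confidence bounds come from the pivot $2n\lambda\bar{x} \sim \chi^2_{2n}$ (cumulating over $\bar{x}$), the credible bounds from the flat-prior posterior $\text{Gamma}(n+1, n\bar{x})$ (cumulating over $\lambda$):

```python
import numpy as np
from scipy import stats

n, xbar = 4, 1.0   # sample size from the figure; hypothetical observed mean

# confidence interval: cumulate over xbar for fixed lambda (north-south);
# equivalently, use the pivot 2*n*lambda*xbar ~ chi-squared with 2n d.o.f.
conf = stats.chi2.ppf([0.025, 0.975], 2 * n) / (2 * n * xbar)

# credible interval: cumulate over lambda for fixed xbar (west-east);
# with a flat prior the posterior of lambda is Gamma(n+1, rate=n*xbar)
cred = stats.gamma.ppf([0.025, 0.975], a=n + 1, scale=1 / (n * xbar))

print(conf)   # approx [0.27, 2.19]
print(cred)   # approx [0.41, 2.56] -- shifted toward higher lambda
```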
A case where the confidence interval and the credible interval (based on an improper prior) coincide is the estimation of the mean of a Gaussian distributed variable (the distribution is illustrated here: https://stats.stackexchange.com/a/351333/164061).
An obvious case where the confidence interval and the credible interval do not coincide is illustrated here: https://stats.stackexchange.com/a/369909/164061. The confidence interval in that case may have one or even both of its (upper/lower) bounds at infinity.