Is there a plateau-shaped distribution?

30

I'm looking for a distribution where the probability density decreases quickly after some point away from the mean, or in my own words a "plateau-shaped distribution".

Something in between the Gaussian and the uniform.

distributions normal-distribution uniform

— dontloo

8

You could sum a Gaussian RV and a uniform RV.

— StrongBad

3

One sometimes hears of platykurtic distributions.

— J. M. is not a statistician

53

You may be looking for the distribution known under the names generalized normal (version 1), Subbotin distribution, or exponential power distribution. It is parametrized by location $\mu$, scale $\sigma$ and shape $\beta$, with pdf

$\frac{\beta}{2\sigma\Gamma(1/\beta)} \exp\left[-\left(\frac{|x-\mu|}{\sigma}\right)^{\beta}\right]$

As you can see, for $\beta=1$ it reduces to the Laplace distribution, for $\beta=2$ it reduces to the normal distribution, and as $\beta \to \infty$ it converges to the uniform distribution on $(\mu-\sigma, \mu+\sigma)$.
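The special cases can be checked numerically. Here is a minimal base-R sketch of the pdf above; the function name `dgnorm_sketch` is made up for illustration and is not part of any package.

```r
# Generalized normal (version 1) density, written directly from the pdf above.
# The name dgnorm_sketch is illustrative only.
dgnorm_sketch <- function(x, mu = 0, sigma = 1, beta = 2) {
  stopifnot(sigma > 0, beta > 0)
  beta / (2 * sigma * gamma(1 / beta)) * exp(-(abs(x - mu) / sigma)^beta)
}

x <- seq(-3, 3, by = 0.5)
# beta = 1: Laplace density with scale sigma
laplace_gap <- max(abs(dgnorm_sketch(x, beta = 1) - 0.5 * exp(-abs(x))))
# beta = 2: normal density with standard deviation sigma / sqrt(2)
normal_gap <- max(abs(dgnorm_sketch(x, beta = 2) - dnorm(x, sd = 1 / sqrt(2))))
# large beta: the height at the centre approaches 1 / (2 * sigma)
plateau_height <- dgnorm_sketch(0, beta = 50)
# the density should integrate to 1 for any valid shape
total_mass <- integrate(dgnorm_sketch, -Inf, Inf, beta = 8)$value
```

Note that with $\beta=2$ the scale $\sigma$ is not the standard deviation: the matching normal has standard deviation $\sigma/\sqrt{2}$.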

If you are looking for software that implements it, you can check the normalp library for R (Mineo and Ruggieri, 2005). What is nice about this package is that, among other things, it implements regression with generalized normally distributed errors, i.e. minimizing the $L_p$ norm.

Mineo, A. M., & Ruggieri, M. (2005). A software tool for the exponential power distribution: The normalp package. Journal of Statistical Software, 12(4), 1-24.

— Tim

20

@StrongBad's comment is a really good suggestion. The sum of a uniform RV and a Gaussian RV can give you exactly what you're looking for if you pick the parameters right. And it has a reasonably nice closed-form solution.

The pdf of this variable is given by the expression:

$\dfrac{1}{4a}\left[\mathrm{erf}\left(\dfrac{x+a}{\sigma\sqrt{2}}\right)-\mathrm{erf}\left(\dfrac{x-a}{\sigma\sqrt{2}}\right) \right]$

$a$ is the "radius" of the zero-mean uniform RV, and $\sigma$ is the standard deviation of the zero-mean Gaussian RV.
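As a rough check of the expression, here is a base-R sketch that evaluates the pdf and compares it with simulated sums. Base R has no `erf`, so the sketch uses the identity $\mathrm{erf}(z) = 2\Phi(z\sqrt{2}) - 1$; the function name and the parameter values are illustrative assumptions.

```r
# Density of U + Z with U ~ Uniform(-a, a) and Z ~ N(0, sigma^2).
# The name dunifnorm_sketch is illustrative only.
dunifnorm_sketch <- function(x, a = 1, sigma = 0.25) {
  # erf(z) = 2 * pnorm(z * sqrt(2)) - 1, so the erf difference over 4a
  # becomes a pnorm difference over 2a
  (pnorm((x + a) / sigma) - pnorm((x - a) / sigma)) / (2 * a)
}

total_mass <- integrate(dunifnorm_sketch, -Inf, Inf)$value

# Compare with simulation: share of draws falling in (-0.5, 0.5)
set.seed(1)
draws <- runif(1e5, -1, 1) + rnorm(1e5, sd = 0.25)
sim_share <- mean(abs(draws) < 0.5)
pdf_share <- integrate(dunifnorm_sketch, -0.5, 0.5)$value
```

A smaller `sigma` relative to `a` gives a flatter top and steeper sides.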

— Steve Cox

3

Reference: Bhattacharjee, G. P., Pandit, S. N. N., and Mohan, R. (1963). Dimensional chains involving rectangular and normal error-distributions. Technometrics, 5, 404–406.

— Tim

15

There is an infinite number of "plateau-shaped" distributions.

Were you after something more specific than "in between the Gaussian and the uniform"? That's somewhat vague.

Here's one easy one: you could always stick a half-normal at each end of a uniform.

You can control the "width" of the uniform relative to the scale of the normal, so you can have wider or narrower plateaus. This gives a whole class of distributions, which includes the Gaussian and the uniform as limiting cases.

The density is:

$\frac{h}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2\sigma^2}(x-\mu+w/2)^2} \mathbb{I}_{x\leq \mu-w/2} \\ + \:\frac{h}{\sqrt{2\pi}\sigma}\quad\mathbb{I}_{\mu-w/2< x\leq \mu+w/2} \\ + \frac{h}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2\sigma^2}(x-\mu-w/2)^2} \mathbb{I}_{x > \mu+w/2}$

where $h = \frac{1}{1 + w/(\sqrt{2\pi}\sigma)}$

As $\sigma \to 0$ for fixed $w$ , we approach the uniform on $(\mu-w/2,\mu+w/2)$ and as $w \to 0$ for fixed $\sigma$ we approach $N(\mu,\sigma^2)$ .

Here are some examples (with $\mu=0$ in each case):

We might perhaps call this density a "Gaussian-tailed uniform".
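The three pieces of this density collapse into one line if the distance beyond the plateau edge is computed first. A small base-R sketch, with an illustrative function name and arbitrary parameter values:

```r
# "Gaussian-tailed uniform" density from the formula above.
# The name dgtunif_sketch is illustrative only.
dgtunif_sketch <- function(x, mu = 0, sigma = 1, w = 2) {
  h <- 1 / (1 + w / (sqrt(2 * pi) * sigma))
  # distance beyond the plateau edges; zero on the plateau itself
  d <- pmax(abs(x - mu) - w / 2, 0)
  # h * dnorm(0, sd = sigma) is the plateau height h / (sqrt(2 * pi) * sigma)
  h * dnorm(d, mean = 0, sd = sigma)
}

# total probability for a wide plateau with narrow tails
total_mass <- integrate(dgtunif_sketch, -Inf, Inf, sigma = 0.5, w = 4)$value

# with w = 0 the density should coincide with N(mu, sigma^2)
x <- seq(-3, 3, by = 0.25)
normal_gap <- max(abs(dgtunif_sketch(x, w = 0) - dnorm(x)))
```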

— Glen_b -Reinstate Monica

1

Ach! I love attending formal balls wearing a Gaussian-tailed uniform! ;)

— Alexis

7

See my "Devil's tower" distribution in here [1]:

$f(x) = 0.3334$ , for $|x| < 0.9399$ ;
$f(x) = 0.2945/x^2$ , for $0.9399 \leq |x| < 2.3242$ ; and
$f(x) = 0$ , for $2.3242 \leq |x|$ .
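To see that the rounded constants give a proper density, here is a base-R sketch that integrates each smooth piece separately. The function names are illustrative. With these constants, both the total mass and the second moment should come out close to 1; the mean is zero by symmetry, so the second moment is the variance.

```r
# Piecewise "Devil's tower" density with the rounded constants quoted above.
# The name ddevil_sketch is illustrative only.
ddevil_sketch <- function(x) {
  ax <- abs(x)
  ifelse(ax < 0.9399, 0.3334,
         ifelse(ax < 2.3242, 0.2945 / x^2, 0))
}

# integrate over one smooth piece at a time to avoid the jump points
piece <- function(g, lo, hi) integrate(g, lo, hi)$value
second <- function(x) x^2 * ddevil_sketch(x)

# double the right half by symmetry
total_mass <- 2 * (piece(ddevil_sketch, 0, 0.9399) +
                   piece(ddevil_sketch, 0.9399, 2.3242))
second_moment <- 2 * (piece(second, 0, 0.9399) +
                      piece(second, 0.9399, 2.3242))
```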

The "slip-dress" distribution is even more interesting.

It is easy to construct distributions having whatever shape you want.

[1]: Westfall, P.H. (2014)
"Kurtosis as Peakedness, 1905 – 2014. R.I.P."
Am. Stat. 68(3): 191–195. doi:10.1080/00031305.2014.917055
public access pdf: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321753/pdf/nihms-599845.pdf

— Peter Westfall

Hi Peter -- I took the liberty of giving the function and inserting an image as well as giving a full reference. (If memory serves, I think Kendall and Stuart give the details of a similar debunking in their classic text. If I remember correctly - it has been a long while - I believe they also discuss that it's not heavy-tailedness.)

— Glen_b -Reinstate Monica

Thanks, Glen_b. I never said kurtosis measured what the tail-index numbers measure. Rather, my article proves kurtosis is, for a very broad class of distributions, nearly equal to E(Z^4 * I(|Z| > 1)). Thus, kurtosis clearly tells you nothing about the 'peak,' which is typically found in the range {Z: |Z| <1}. Rather, it is determined mostly by the tails. Call it E(Z^4 * I(|Z| > 1)) if the term "heavy-tailedness" has another meaning.

— Peter Westfall

Also, @Glen_b which tail-index are you referring to? There are infinitely many. Tail crossings don't define "tailedness" properly. According to some tail crossing definitions of tail heaviness, N(0,1) is more "heavy-tailed" than .9999*U(-1,1) + .0001*U(-1000,1000), although the latter is obviously more heavy tailed, despite having finite tails. And, BTW, the latter has extremely high kurtosis, unlike N(0,1).

— Peter Westfall

I can't find me saying "tail index" anywhere in my comment; I am not quite sure what you're referring to there when you say "which tail-index are you referring to". If you mean the bit about heavy-tailedness the best thing to do is check what Kendall and Stuart actually say; I believe there they actually compare the asymptotic ratio of densities for symmetric standardized variables, but it might have been survivor functions perhaps; the point was theirs, not mine

— Glen_b -Reinstate Monica

Strange. Well, in any event, Kendall and Stuart got it wrong. Kurtosis is obviously a measure of tail weight as my theorems prove.

— Peter Westfall

5

Lots of nice answers. The solution proffered here has 2 features: (i) that it has a particularly simple functional form, and (ii) that the resulting distribution necessarily produces a plateau-shaped pdf (not just as a special case). I'm not sure if this already has a name in the literature, but absent same, let us call it a Plateau distribution with pdf $f(x)$ :

$f(x) = k \frac{1}{1 + x^{2 a}} \quad \quad \text{for } x \in \mathbb{R}$

where:

parameter $a$ is a positive integer, and
$k$ is the normalizing constant: $k = \frac{a}{\pi} \sin \left(\frac{\pi}{2 a}\right)$

Here is a plot of the pdf, for different values of parameter $a$ :


As parameter $a$ becomes large, the density tends towards a Uniform(-1,1) distribution. The following plot also compares to a standard Normal (gray dashed):
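The constant $k$ can be checked numerically. Here is a short base-R sketch; the function name is an illustrative assumption. For $a = 1$ the density should coincide with the standard Cauchy.

```r
# Plateau density f(x) = k / (1 + x^(2a)) with k = (a / pi) * sin(pi / (2a)).
# The name dplateau_sketch is illustrative only.
dplateau_sketch <- function(x, a = 2) {
  k <- a / pi * sin(pi / (2 * a))
  k / (1 + x^(2 * a))
}

# total probability for a few values of a
masses <- sapply(c(1, 2, 5), function(a) {
  integrate(dplateau_sketch, -Inf, Inf, a = a)$value
})

# a = 1 is the standard Cauchy density
x <- seq(-4, 4, by = 0.5)
cauchy_gap <- max(abs(dplateau_sketch(x, a = 1) - dcauchy(x)))
```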

— wolfies

3

Another one (EDIT: I simplified it now. EDIT2: I simplified it even further, though now the picture doesn't really reflect this exact equation):

$f(x) = \frac{1}{3 \cdot \alpha} \cdot \log{\left( \frac{\cosh{\left(\alpha \cdot a\right)}+ \cosh{\left(\alpha \cdot x\right)}} {\cosh{\left(\alpha \cdot b\right)}+ \cosh{\left(\alpha \cdot x\right)}} \right)}$

Clunky, I know, but here I took advantage of the fact that $\log(\cosh(x))$ approaches a line as $x$ increases.

Basically you have control over how smooth the transition is ($\alpha$). If $a = 2$ and $b = 1$ I guarantee it's a valid probability density (integrates to 1). If you choose other values then you'll have to renormalize it.

Here is some sample code in R:

f = function(x, a, b, alpha){
  y = log((cosh(2*alpha*pi*a)+cosh(2*alpha*pi*x))/(cosh(2*alpha*pi*b)+cosh(2*alpha*pi*x)))
  y = y/pi/alpha/6
  return(y)
}

f is our distribution. Let's plot it for a sequence of x:

alphas = seq(0.1, 2, length.out = 10L)
plot(0, type = "n", xlim = c(-5,5), ylim = c(0,0.4))
x = seq(-100,100,length.out = 10001L)

for(i in 1:10){
  y = f(x = x, a = 2, b = 1, alpha = alphas[i])
  # Riemann sum; 0.02 is the grid spacing of x
  print(paste("integral =", round(sum(0.02*y), 3L)))
  lines(x, y, type = "l", col = rainbow(10, alpha = 0.5)[i], lwd = 4)
}
legend("topright", paste("alpha =", round(alphas, 3L)), col = rainbow(10), lwd = 4)

Console output:

#[1] "integral = 1"
#[1] "integral = 1"
#[1] "integral = 1"
#[1] "integral = 1"
#[1] "integral = 1"
#[1] "integral = 1"
#[1] "integral = 1"
#[1] "integral = NaN" #likely overflow rather than underflow: cosh() returns Inf for large arguments, giving Inf/Inf; the plots show no divergence at all
#[1] "integral = NaN"
#[1] "integral = NaN"

And plot:

You could change a and b, approximately the start and end of the slope respectively, but then further normalization would be needed, and I didn't calculate it (that's why I'm using a = 2 and b = 1 in the plot).
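The NaN values most likely come from overflow: `cosh()` returns `Inf` for large arguments, so the ratio becomes `Inf/Inf`. Below is a sketch of a rewrite that stays finite, using the identity $\cosh p + \cosh q = 2\cosh\frac{p+q}{2}\cosh\frac{p-q}{2}$ and a stable `log(cosh())`. The names `log_cosh` and `f_stable` are illustrative. The general normalization $s\,(a^2-b^2)$ with $s = 2\pi\alpha$ is my own assumption from a short calculation, not something given above; it reduces to the $6\pi\alpha$ used in `f` when $a = 2$ and $b = 1$.

```r
# log(cosh(u)) without overflow: |u| + log1p(exp(-2|u|)) - log(2)
log_cosh <- function(u) abs(u) + log1p(exp(-2 * abs(u))) - log(2)

f_stable <- function(x, a, b, alpha) {
  s <- 2 * pi * alpha  # same scaling as in f
  # log(cosh(s*a) + cosh(s*x)) up to a constant log(2) that cancels in the ratio
  num <- log_cosh(s * (a + x) / 2) + log_cosh(s * (a - x) / 2)
  den <- log_cosh(s * (b + x) / 2) + log_cosh(s * (b - x) / 2)
  # assumed normalization; equals 6 * pi * alpha for a = 2, b = 1
  (num - den) / (s * (a^2 - b^2))
}

x <- seq(-100, 100, length.out = 10001L)
y <- f_stable(x, a = 2, b = 1, alpha = 2)
riemann_sum <- sum(0.02 * y)       # alpha = 2 gave NaN with f
y_other <- f_stable(x, a = 3, b = 1.5, alpha = 0.3)
riemann_other <- sum(0.02 * y_other)
```

If the assumed normalization holds, both sums should be close to 1, so other choices of a and b would need no extra renormalization.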

— Firebug

2

If you are looking for something very simple, with a central plateau and the sides of a triangle distribution, you can for instance combine N triangle distributions, with N depending on the desired ratio between the plateau and the descent. Why triangles? Because their sampling functions already exist in most languages. You randomly draw from one of them.

In R that would give:

library(triangle)
rplateau = function(n=1){
  replicate(n, switch(sample(1:3, 1), rtriangle(1, 0, 2), rtriangle(1, 1, 3), rtriangle(1, 2, 4)))
}
hist(rplateau(1E5), breaks=200)
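If the triangle package is not available, the same mixture can be sketched in base R, since the sum of two independent Uniform(0, 1) draws is triangular on (0, 2) with mode 1. This assumes `rtriangle` places the mode at the midpoint by default, as the calls above rely on. The name `rplateau_base` is illustrative.

```r
# Vectorized base-R sampler for the same three-triangle mixture.
# The name rplateau_base is illustrative only.
rplateau_base <- function(n = 1) {
  # runif + runif is triangular on (0, 2) with mode 1;
  # the random shift of 0, 1 or 2 picks one of the three triangles
  sample(0:2, n, replace = TRUE) + runif(n) + runif(n)
}

set.seed(42)
draws <- rplateau_base(1e5)
# the plateau covers (1, 3) at height 1/3, so about 2/3 of the draws land there
plateau_share <- mean(draws > 1 & draws < 3)
```

Calling `hist(draws, breaks = 200)` should show the same shape as the histogram from `rplateau`.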

— agenis

2

Here's a pretty one: the product of two logistic functions.

(1/B) * 1/(1+exp(A*(x-B))) * 1/(1+exp(-A*(x+B)))

This has the benefit of not being piecewise.

B adjusts the width and A adjusts the steepness of the drop off. Shown below are B=1:6 with A=2. Note: I haven't taken the time to figure out how to properly normalize this.
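On the normalization: a short calculation suggests that the product of the two logistic factors integrates to $2B/(1-e^{-2AB})$, so replacing the $1/B$ prefactor with $(1-e^{-2AB})/(2B)$ should give a proper density. That constant is my own assumption, not something stated above, so the base-R sketch below checks it numerically. The function name is illustrative.

```r
# Product of two logistic functions with an assumed closed-form normalization.
# The name dlogisprod_sketch is illustrative only.
dlogisprod_sketch <- function(x, A = 2, B = 1) {
  # plogis(u) = 1 / (1 + exp(-u)); using it avoids overflow in exp()
  shape <- plogis(-A * (x - B)) * plogis(A * (x + B))
  # assumed normalizing constant replacing the 1/B prefactor
  (1 - exp(-2 * A * B)) / (2 * B) * shape
}

# total probability for B = 1:6 with A = 2, the values used in the plot
masses <- sapply(1:6, function(B) {
  integrate(dlogisprod_sketch, -Inf, Inf, A = 2, B = B)$value
})
```

For large $AB$ the constant is close to $1/(2B)$, which matches the picture of a plateau of height 1 and width $2B$.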

— Adjwilley