"Almost sorting" integers in linear time

I am interested in sorting an array of positive integer values $L = v_1, \ldots, v_n$ in linear time (in the RAM model with uniform cost measure, i.e., integers can have up to logarithmic size but arithmetic operations on them take unit time). Of course, this is impossible with comparison-based sorting algorithms, so I am interested in computing an "approximate" sort, i.e., computing some permutation $v_{\sigma(1)}, \ldots, v_{\sigma(n)}$ of $L$ which is not really sorted in general but is a "good approximation" of the sorted version of $L$. I will assume that we are sorting the integers in decreasing order because it makes the sequel a bit more pleasant to state, but of course one could phrase the problem the other way around.

One possible criterion for an approximate sort is the following (*): letting $N$ be $\sum_i v_i$, for every $1 \leq i \leq n$, we require that $v_{\sigma(i)} \leq N/i$ (i.e., the "quasi-sorted" list is bounded from above by the decreasing function $i \mapsto N/i$). It is easy to see that an actual sort satisfies this: $v_{\sigma(2)}$ must be no larger than $v_{\sigma(1)}$, so it is at most $(v_{\sigma(1)} + v_{\sigma(2)})/2$, which is $\leq N/2$, and in general $v_{\sigma(i)}$ must be no larger than $(\sum_{j \leq i} v_{\sigma(j)})/i$, which is $\leq N/i$.
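
To make criterion (*) concrete, here is a small Python sketch (the function name `satisfies_star` and the sample values are hypothetical, chosen only for illustration) that checks (*) for a candidate order. It confirms that a genuine decreasing sort passes, and also shows that (*) is strictly weaker than being sorted.

```python
from typing import Sequence


def satisfies_star(order: Sequence[int]) -> bool:
    """Return True if `order` meets criterion (*): for every 1-indexed
    position i, the element at position i is at most N / i, where N is
    the sum of all elements."""
    total = sum(order)
    # Test v <= N / i as v * i <= N, staying in exact integer arithmetic.
    return all(v * i <= total for i, v in enumerate(order, start=1))


values = [3, 9, 1, 4, 4, 2]                          # N = 23
print(satisfies_star(sorted(values, reverse=True)))  # True: [9, 4, 4, 3, 2, 1]
print(satisfies_star([1, 9, 4, 4, 3, 2]))            # True, although not sorted
print(satisfies_star([1, 2, 9, 4, 4, 3]))            # False: 9 * 3 = 27 > 23
```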

For instance, requirement (*) can be achieved by the algorithm below (suggested by @Louis). My question is: is there existing work on this task of "almost sorting" integers in linear time, by imposing some requirement like (*) that an actual sort would satisfy? Does the algorithm below, or some variant of it, have an established name?

Edit: fixed the algorithm and added more explanations.

Algorithm:

INPUT: V an array of size n containing positive integers
OUTPUT: T

N = Σ_{1≤i≤n} V[i]
Create n buckets indexed by 1..n
For i in 1..n
| Add V[i] into the bucket min(floor(N/V[i]),n)
+

For bucket 1 to bucket n
| For each element in the bucket
| | Append element to T
| +
+
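
A minimal runnable Python sketch of the pseudocode above, assuming the input is a Python list of positive integers (the function name `quasi_sort` is hypothetical). Buckets are Python lists rather than linked lists; both give O(1) (amortized) appends, and in the uniform-cost RAM model assumed in the question each `//` counts as one step.

```python
from typing import List


def quasi_sort(values: List[int]) -> List[int]:
    """Bucket positive integers so that the output satisfies (*).

    Element v goes to bucket min(floor(N / v), n), and buckets are
    concatenated in order 1..n, as in the pseudocode.
    """
    n = len(values)
    total = sum(values)  # N, computed in one linear pass
    # Index 0 is unused: since 0 < v <= N, floor(N / v) >= 1.
    buckets: List[List[int]] = [[] for _ in range(n + 1)]
    for v in values:
        if v <= 0:
            raise ValueError("all values must be positive integers")
        buckets[min(total // v, n)].append(v)  # integer floor division
    result: List[int] = []
    for bucket in buckets[1:]:  # buckets 1..n, in order
        result.extend(bucket)
    return result


print(quasi_sort([3, 9, 1, 4, 4, 2]))  # [9, 4, 4, 3, 1, 2]
```

With $N = 23$ and $n = 6$, the value 9 lands in bucket 2, both 4s in bucket 5, and 3, 1, 2 in bucket 6. The output is not sorted (1 precedes 2) but it does satisfy (*).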

This algorithm works as intended for the following reasons:

1. If an element $v$ is in bucket $j$, then $v ≤ N/j$.

   $v$ is put in bucket $j=\min(\lfloor N/v\rfloor,n)$, so $j ≤ \lfloor N/v\rfloor ≤ N/v$, i.e., $v ≤ N/j$.

2. If an element $v$ is in bucket $j$, then either $N/(j+1) < v$ or $j=n$.

   $v$ is put in bucket $j=\min(\lfloor N/v\rfloor,n)$, so $j = \lfloor N/v \rfloor$ or $j=n$. In the first case, $j=\lfloor N/v\rfloor$ means that $j ≤ N/v < j+1$, and thus $N/(j+1) < v$.

3. For $j<n$, there are at most $j$ elements in buckets $1$ to $j$.

   Let $j<n$ and let $k$ be the total number of elements in buckets $1..j$. If $k = 0$ there is nothing to prove, so assume $k ≥ 1$. By 2., every element $v$ in a bucket $i$ (with $i ≤ j$) is such that $N/(j+1) ≤ N/(i+1) < v$. So the sum $K$ of all elements in buckets $1$ to $j$ is greater than $k×N/(j+1)$. But this sum $K$ is also at most $N$, so $k×N/(j+1) < K ≤ N$, hence $k/(j+1) < 1$, which gives $k<j+1$, i.e., $k≤j$.

4. $T$ satisfies (*), i.e., the $j$-th element of $T$ is such that $T[j] ≤ N/j$.

   By 3. (applied to $j-1 < n$), buckets $1$ to $j-1$ contain at most $j-1$ elements, so the $j$-th element $T[j]$ of $T$ comes from a bucket $i$ with $i ≥ j$. By 1., $T[j] ≤ N/i ≤ N/j$.

5. This algorithm runs in linear time.

   Computing $N$ takes linear time. The buckets can be implemented with linked lists, which have $O(1)$ insertion and iteration. The outer loop visits the $n$ buckets, and the inner loop runs as many times as there are elements (i.e., $n$ times), so the second phase is also $O(n)$.
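
The claims above can be sanity-checked numerically. The following Python sketch uses hypothetical helpers `bucket_of` and `check_claims` (not part of the original algorithm description) and tests claims 1 to 4 on random inputs. Random testing is of course not a proof; it only guards against an off-by-one in the bucket rule.

```python
import random
from itertools import accumulate
from typing import List


def bucket_of(v: int, total: int, n: int) -> int:
    # Same rule as in the algorithm: min(floor(N / v), n).
    return min(total // v, n)


def check_claims(values: List[int]) -> None:
    n, total = len(values), sum(values)
    counts = [0] * (n + 1)
    for v in values:
        j = bucket_of(v, total, n)
        assert v * j <= total                 # Claim 1: v <= N / j
        assert j == n or total < v * (j + 1)  # Claim 2: N / (j+1) < v
        counts[j] += 1
    # counts[0] is always 0, so prefix[j] = number of elements in buckets 1..j.
    prefix = list(accumulate(counts))
    for j in range(1, n):
        assert prefix[j] <= j                 # Claim 3
    # Claim 4: concatenating buckets 1..n gives T with T[j] <= N / j.
    # A stable sort by bucket index reproduces that concatenation order
    # (used here only for checking; it is not the linear-time step).
    t = sorted(values, key=lambda v: bucket_of(v, total, n))
    assert all(v * j <= total for j, v in enumerate(t, start=1))


random.seed(0)
for _ in range(1000):
    size = random.randint(1, 20)
    check_claims([random.randint(1, 100) for _ in range(size)])
print("claims 1-4 held on all random trials")
```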

— a3nm

Not to dismiss the question (+1, it's a good question), but wouldn't radix sort do better than what you want?

— user541686

@Mehrdad: Thanks for your comment! Radix sort would sort the integers, but it would take time $O(n \log (\max_i v_i))$.

— a3nm

Could you comment on what exactly is undesirable about that time complexity? Do you have one very large integer and everything else is small, for example?

— user541686

@a3nm Radix sort is not $O(n \log n)$, it is $O(n)$, so it is linear if the size of the integers is fixed, e.g., 32 bits or 64 bits. Do the numbers you are sorting have variable size?

— Xavier Combelle

@XavierCombelle: Yes, I am working in the RAM model, and I cannot assume that the input integers are bounded by a constant.

— a3nm

This sounds similar to the ASort algorithm. See this paper by Giesen et al.:

https://www.inf.ethz.ch/personal/smilos/asort3.pdf

They show that approximately sorting $n$ items to within Spearman's footrule distance $n^2/\nu(n)$ has a lower bound of $n \log(\nu(n))$ comparisons (assuming $\nu(n) < n$).

EDIT, in response to the clarifications in the question:

What you're doing is simply a bucket sort. However, bucket sort isn't linear in this case. The problem: you have to sum the natural numbers and then perform a division on each one of them. Since the numbers are unbounded in size, $N/V[i]$ is no longer a constant-time operation, and it takes longer the more numbers you need to sum.

How much longer? Division depends on the number of digits, so it's $\lg(n)$ per division, times $n$ division operations. That probably sounds familiar. :)

— Trixie Wolf

Thanks for pointing us to this article! Indeed, it is a bit related to the question. However, my algorithm (neither the original version nor the slightly different revised version) is not so similar to ASort. First, I believe my algorithm runs in $O(n)$, not in superlinear time like ASort. Second, criterion (*) is pretty different from approximating Spearman's footrule distance; e.g., criterion (*) is more or less tight depending on the values of the integers, unlike the footrule distance. Third, although both my algorithm and ASort bucket elements, the criteria are pretty different.

— a3nm

@a3nm The clarification of what you posted above suggests you're using a bucket sort, which is linear (and not comparison-based, which means testing two items against each other). The problem is that it doesn't work for all mathematical integers. It only works if the integer size is bounded.

— Trixie Wolf

When you say "It only works if the integer size is bounded", I think this is only true if I were actually sorting the integers. But in general the algorithm I posted does not actually sort them, it only enforces the weaker criterion (*). So I do think it runs in linear time even when the integer size is not bounded.

— a3nm

@a3nm It isn't linear. See my expanded response above.

— Trixie Wolf

Thanks for the answer, and sorry about the delay. I think there is some confusion about the model. I am working in the RAM model with uniform time measure (as in van Emde Boas, Machine Models and Simulations, in Handbook of Computation): so the numbers that I manipulate can have logarithmic size, but arithmetic operations on these numbers have unit cost. I have edited my question accordingly. I think that, in this model, the algorithm that I propose really runs in linear time (but of course in this model the $n \log n$ lower bound for actual comparison-based sorting still applies).

— a3nm

As it turns out, my question is quite irrelevant after all. Indeed, I am working on the RAM machine with uniform cost measure (i.e., we have registers which are not necessarily of constant size but can store integers of at most logarithmic size in the input, and operations on these registers take constant time, including at least addition). And in fact, in this model, sorting integers (essentially by performing a radix sort) can be done in linear time. This is explained in the 1996 paper by Grandjean, "Sorting, linear time and the satisfiability problem".
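
To illustrate why radix sort becomes linear in this model, here is a textbook LSD radix sort in decreasing order. This is a standard sketch, not the construction from Grandjean's paper, and the function name `radix_sort_desc` is hypothetical. The assumption is that all values are at most $n^c$ for a constant $c$ (which is what registers of logarithmic size allow). With base $n$, each pass is a stable bucket pass costing $O(n)$, and there are $O(c)$ passes, so $O(cn)$ in total.

```python
from typing import List


def radix_sort_desc(values: List[int], base: int) -> List[int]:
    """LSD radix sort of non-negative integers into decreasing order.

    Each pass is a stable bucket pass on one base-`base` digit, taking
    O(len(values) + base) steps; the number of passes is the number of
    base-`base` digits of the largest value.
    """
    if base < 2:
        raise ValueError("base must be at least 2")
    if not values:
        return []
    out = list(values)
    largest = max(out)
    place = 1
    while place <= largest:
        buckets: List[List[int]] = [[] for _ in range(base)]
        for v in out:
            digit = (v // place) % base
            # Larger digits go to earlier buckets, giving decreasing order;
            # appending in scan order keeps each pass stable.
            buckets[base - 1 - digit].append(v)
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out


values = [3, 9, 1, 4, 4, 2]
print(radix_sort_desc(values, base=max(len(values), 2)))  # [9, 4, 4, 3, 2, 1]
```

Choosing `base = max(n, 2)` is the design choice that makes the pass count depend only on $c$: a value below $n^c$ has at most $c$ base-$n$ digits.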

(This does not answer my question of whether there are well-studied notions of "almost sorting" a set of integers, but for them to be interesting one would probably need these weaker notions to be easier to enforce, i.e., work on a weaker model or somehow run in sublinear time. However, I'm currently not aware of a sense in which this would be the case.)

— a3nm