Is it possible to optimize my integration code so that it runs faster?

9

double trap(double func(double), double b, double a, double N) {
  double j;
  double s;
  double h = (b-a)/(N-1.0); //Width of trapezia

  double func1 = func(a);
  double func2;

  for (s=0,j=a;j<b;j+=h){
    func2 = func(j+h);
    s = s + 0.5*(func1+func2)*h;
    func1 = func2;
  }

  return s;
}

Above is my C++ code for 1D numerical integration (using the extended trapezoidal rule) of func() between the limits $[a,b]$, using $N-1$ trapezoids.

I am actually doing a 3D integration, where this code is called recursively. I work with $N = 50$, which gives me decent results.

Other than reducing $N$ further, can anyone suggest how to optimize the code above so that it runs faster? Or even suggest a faster integration method?

c++ performance

— user2970116

5

This isn't really related to the question, but I'd suggest choosing better variable names: trapezoidal_integration instead of trap, sum or running_total instead of s (and also use += instead of s = s +), trapezoid_width or dx instead of h (or not, depending on your preferred notation for the trapezoidal rule), and rename func1 and func2 to reflect the fact that they are values, not functions. For example, func1 -> previous_value and func2 -> current_value, or something like that.

— David Z

5

Mathematically, your expression is equivalent to:

$I = h \left(\frac{1}{2}f_1 + f_2 + f_3 + \ldots + f_{n-1} + \frac{1}{2}f_n \right) + O\left(\frac{(b-a)^3 f''}{n^2} \right)$

So you could implement it that way. As has been said, the run time is probably dominated by the function evaluations, so to get the same accuracy you can use a better integration method that requires fewer function evaluations.

Gaussian quadrature is, these days, a bit more than a toy: it is only useful if you need very few evaluations. If you want something easy to implement, you can use Simpson's rule, but I wouldn't go beyond order $1/N^3$ without a good reason.

If the curvature of the function changes a lot, you could use an adaptive-step routine, which takes larger steps where the function is flat and smaller, more accurate steps where the curvature is higher.
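One common way to get this behaviour is adaptive Simpson quadrature, sketched below. This is only one possible adaptive scheme, and the names `adaptive_simpson`, `tol`, and `max_depth` are assumptions for the example. Each interval is split in half, and the split is kept only where the two half-interval estimates disagree with the whole-interval estimate by more than the tolerance.

```cpp
#include <cmath>

// Simpson estimate on one panel from three precomputed function values.
double simpson_panel(double fa, double fm, double fb, double a, double b) {
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Recursive refinement: subdivide only where the local error estimate is large.
double adaptive_step(double func(double), double a, double b,
                     double fa, double fm, double fb,
                     double whole, double tol, int depth) {
    const double m = 0.5 * (a + b);
    const double flm = func(0.5 * (a + m));   // Midpoint of left half
    const double frm = func(0.5 * (m + b));   // Midpoint of right half
    const double left = simpson_panel(fa, flm, fm, a, m);
    const double right = simpson_panel(fm, frm, fb, m, b);
    const double diff = left + right - whole;
    if (depth <= 0 || std::fabs(diff) <= 15.0 * tol) {
        return left + right + diff / 15.0;    // Richardson-style correction
    }
    return adaptive_step(func, a, m, fa, flm, fm, left, 0.5 * tol, depth - 1) +
           adaptive_step(func, m, b, fm, frm, fb, right, 0.5 * tol, depth - 1);
}

// Hypothetical entry point; tol is an absolute tolerance for the whole interval.
double adaptive_simpson(double func(double), double a, double b,
                        double tol, int max_depth = 25) {
    const double fa = func(a);
    const double fm = func(0.5 * (a + b));
    const double fb = func(b);
    const double whole = simpson_panel(fa, fm, fb, a, b);
    return adaptive_step(func, a, b, fa, fm, fb, whole, tol, max_depth);
}
```

Function values at shared points are passed down rather than recomputed, which matters when the integrand is the expensive part.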

— Davidmh

After going away and coming back to the problem, I decided to implement Simpson's rule. But can I check that the error in the composite Simpson's rule is in fact proportional to $1/N^4$ (not $1/N^3$, as you imply in your answer)?

— user2970116

1

You have formulas for $1/N^3$ as well as $1/N^4$. The first uses the coefficients $5/12, 13/12, 1, 1, \ldots, 1, 1, 13/12, 5/12$, and the second $1/3, 4/3, 2/3, 4/3, \ldots$.

— Davidmh

9

Chances are that evaluating the function is the most time-consuming part of this calculation. If that is the case, you should focus on improving the speed of func() rather than trying to speed up the integration routine itself.

Depending on the properties of func(), it is also likely that you could get a more precise estimate of the integral with fewer function evaluations by using a more sophisticated integration formula.

— Brian Borchers

1

Indeed. If your function is smooth, you can typically get away with fewer than your 50 function evaluations if you used, say, a Gauss-4 quadrature rule on only 5 intervals.

— Wolfgang Bangerth

7

Possible? Yes. Useful? No. The optimizations I'm going to list here are unlikely to make more than a tiny fraction of a percent difference in the runtime. A good compiler may already do these for you.

Anyway, looking at your inner loop:

    for (s=0,j=a;j<b;j+=h){
        func2 = func(j+h);
        s = s + 0.5*(func1+func2)*h;
        func1 = func2;
    }

At every loop iteration you perform three math operations that can be brought outside: adding j + h, multiplication by 0.5, and multiplication by h. The first you can fix by starting your iterator variable at a + h, and the others by factoring out the multiplications:

    for (s=0, j=a+h; j<=b; j+=h){
        func2 = func(j);
        s += func1+func2;
        func1 = func2;
    }
    s *= 0.5 * h;

Though I would point out that by doing this, due to floating point roundoff error it is possible to miss the last iteration of the loop. (This was also an issue in your original implementation.) To get around that, use an unsigned int or size_t counter:

    size_t n;
    for (s=0, n=0, j=a+h; n<N-1; n++, j+=h){  // N-1 trapezoids, so N-1 iterations
        func2 = func(j);
        s += func1+func2;
        func1 = func2;
    }
    s *= 0.5 * h;

As Brian's answer says, your time is better spent optimizing the evaluation of the function func. If the accuracy of this method is sufficient, I doubt you'll find anything faster for the same N. (Though you could run some tests to see if e.g. Runge-Kutta lets you lower N enough that the overall integration takes less time without sacrificing accuracy.)

— David Z

4

There are several changes I would recommend to improve the computation:

For performance and accuracy, use std::fma(), which performs a fused multiply-add.
For performance, defer multiplying the area of each trapezoid by 0.5 — you can do it once at the end.
Avoid repeated addition of h, which could accumulate round-off errors.

In addition, I would make several changes for clarity:

Give the function a more descriptive name.
Swap the order of a and b in the function signature.
Rename N → n, h → dx, j → x2, s → accumulator.
Change n to an int.
Declare variables in a tighter scope.

#include <cmath>

double trapezoidal_integration(double func(double), double a, double b, int n) {
    double dx = (b - a) / (n - 1);   // Width of trapezoids

    double func_x1 = func(a);
    double accumulator = 0;

    for (int i = 1; i < n; i++) {    // n points give n - 1 trapezoids
        double x2 = a + i * dx;      // Avoid repeated floating-point addition
        double func_x2 = func(x2);
        accumulator = std::fma(func_x1 + func_x2, dx, accumulator); // Fused multiply-add
        func_x1 = func_x2;
    }

    return 0.5 * accumulator;
}

— 200_success

3

If your function is a polynomial, possibly weighted by some function (e.g. a gaussian), you can do an exact integration in 3d directly with a cubature formula (e.g. http://people.sc.fsu.edu/~jburkardt/c_src/stroud/stroud.html ) or with a sparse grid (e.g. http://tasmanian.ornl.gov/ ). These methods simply specify a set of points and weights to multiply the function value by, so they are very fast. If your function is smooth enough to be approximated by polynomials, then these methods can still give a very good answer. The formulas are specialized to the type of function you're integrating, so it may take some digging to find the right one.

— Ronaldo Carpio

3

When you try to calculate an integral numerically, you try to get the precision that you want with the smallest possible effort, or alternatively, try to get the highest possible precision with a fixed effort. You seem to ask how to make the code for one particular algorithm run as fast as possible.

That may give you some gain, but it will be small. There are much more efficient methods for numerical integration. Search for "Simpson's rule", "Runge-Kutta", and "Fehlberg". They all work quite similarly: they evaluate the function at some points and cleverly add multiples of those values, producing much smaller errors with the same number of function evaluations, or the same error with far fewer evaluations.

— gnasher729

3

There are lots of ways to do integration, of which the trapezoidal rule is about the simplest.

If you know anything at all about the actual function you're integrating, you can do better if you exploit that. The idea is to minimize the number of grid points within acceptable levels of error.

For example, trapezoidal is making a linear fit to consecutive points. You could make a quadratic fit, which if the curve is smooth would fit better, which could allow you to use a coarser grid.

Orbital simulations are sometimes done using conics, because orbits are very much like conic sections.

In my work, we are integrating shapes that approximate bell-shaped curves, so it is effective to model them as that (adaptive Gaussian quadrature is considered the "gold standard" in this work).

— Mike Dunlavey

1

So, as has been pointed out in other answers, this depends heavily on how expensive your function is. Optimizing your trapz code is only worth it if it is really your bottleneck. If it's not completely obvious, you should check this by profiling your code (tools like Intel's VTune, Valgrind or Visual Studio can do this).

I would however suggest a completely different approach: Monte Carlo integration. Here you approximate the integral by sampling your function at random points and averaging the results. See this pdf in addition to the wiki page for details.

This works extremely well for high-dimensional data, typically much better than the quadrature methods used in 1-d integration.

The simple case is very easy to implement (see the pdf); just be careful, because the standard random function in C++98 is quite bad in terms of both performance and quality. In C++11, you can use the Mersenne Twister in <random>.

If your function has a lot of variation in some areas and less in others, consider using stratified sampling. I would recommend using the GNU scientific library, rather than writing your own though.
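For reference, the simple (unstratified) case can be sketched as follows. The function name `monte_carlo_unit_cube`, the restriction to the unit cube $[0,1]^3$, and the explicit `seed` parameter are all assumptions made for this example. Because the cube has volume 1, the sample mean is itself the estimate of the integral.

```cpp
#include <cstddef>
#include <random>

// Plain Monte Carlo estimate of the integral of func over [0, 1]^3.
// Hypothetical helper for illustration; a general box would also need
// the sample mean to be multiplied by the box volume.
double monte_carlo_unit_cube(double func(double, double, double),
                             std::size_t samples, unsigned seed) {
    std::mt19937 generator(seed);                       // Mersenne Twister
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        const double x = uniform(generator);
        const double y = uniform(generator);
        const double z = uniform(generator);
        sum += func(x, y, z);
    }
    return sum / static_cast<double>(samples);
}
```

The statistical error shrinks like $1/\sqrt{\text{samples}}$ regardless of dimension, so this is most attractive when the number of dimensions is high or only modest accuracy is needed.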

— LKlevin

1

I am actually doing a 3D integration, where this code is called recursively.

"recursively" is the key. You are either going through a large data set and considering a lot of data more than once, or you are actually generating your data set yourself from (piecewise?) functions.

Recursively evaluated integrations will be ridiculously expensive, and ridiculously imprecise as the powers increase in recursion.

Create a model for interpolating your data set and do a piecewise symbolic integration. Since a lot of data is then collapsing into coefficients of base functions, the complexity for deeper recursion grows polynomially (and usually rather low powers) rather than exponentially. And you get "exact" results (you still need to figure out good evaluation schemes to get reasonable numeric performance, but it should still be rather feasible to get better than trapezoidal integration).

If you take a look at the error estimates for trapezoidal rules, you'll find that they are related to some derivative of the involved functions, and if the integration/definition is done recursively, the functions will not tend to have well-behaved derivatives.

If your only tool is a hammer, every problem looks like a nail. While you barely touch upon the problem in your description, I have the suspicion that applying the trapezoidal rule recursively is a bad match: you get an explosion of both inaccuracy and computational requirements.

— David

1

The original code evaluates the function at each of the N points, adds the values up, and multiplies the sum by the step size. The only trick is that the values at the beginning and the end are added with weight $1/2$, while all interior points are added with full weight. In the original, the interior points are actually added with weight $1/2$ as well, but twice. Instead of adding them twice, add them only once with full weight, and factor the multiplication by the step size out of the loop. That's all that can be done to speed this up, really.

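    double trap(double func(double), double b, double a, double N) {
        double h = (b - a) / (N - 1.0);    // Width of trapezia
        double s = 0;                      // Running sum of interior values
        double j = a;

        for (int i = 1; i < N - 1; i++) {  // Interior points, full weight
            j += h;
            s += func(j);
        }
        s += (func(a) + func(b)) / 2;      // End points, weight 1/2

        return s * h;                      // Multiply by step size once
    }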

— Aksakal almost surely binary

1

Please give reasoning for your changes and code. A block of code is fairly useless for most people.

— Godric Seer

Agreed; please explain your answer.

— Geoff Oxberry