Although the previous answers are really good, I would like to add my viewpoint on convolution, which the figures should make easier to visualize.
One wonders if there is any method through which an output signal of a system can be determined for a given input signal. Convolution is the answer to that question, provided that the system is linear and time-invariant (LTI).
Assume that we have an arbitrary signal s[n]. Then, s[n] can be decomposed into a scaled sum of shifted unit impulses through the following reasoning. Multiply s[n] with a unit impulse shifted by m samples as δ[n−m]. Since δ[n−m] is equal to 0 everywhere except at n=m, this would multiply all values of s[n] by 0 when n is not equal to m and by 1 when n is equal to m. So the resulting sequence will have an impulse at n=m with its value equal to s[m]. This process is clearly illustrated in Figure below.
This can be mathematically written as
s[n]δ[n−m]=s[m]δ[n−m]
Repeating the same procedure with a different delay m′ gives
s[n]δ[n−m′]=s[m′]δ[n−m′]
The value s[m′] is extracted at this instant. Therefore, if this multiplication is repeated over all possible delays −∞<m<∞, and all produced signals are summed together, the result will be the sequence s[n] itself.
$$s[n] = \cdots + s[-2]\,\delta[n+2] + s[-1]\,\delta[n+1] + s[0]\,\delta[n] + s[1]\,\delta[n-1] + s[2]\,\delta[n-2] + \cdots = \sum_{m=-\infty}^{\infty} s[m]\,\delta[n-m]$$
In summary, the above equation states that s[n] can be written as a summation of scaled unit impulses, where each unit impulse δ[n−m] has an amplitude s[m]. An example of such a summation is shown in Figure below.
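The decomposition above can be checked numerically. The following is a minimal sketch in plain Python, assuming a short finite sequence that starts at n = 0; the helper names `unit_impulse`, `decompose` and `sum_components` are made up for this illustration.

```python
def unit_impulse(n, m):
    """delta[n - m]: 1 when n == m, 0 otherwise."""
    return 1 if n == m else 0

def decompose(s):
    """Return one sequence per delay m, each equal to s[m] * delta[n - m]."""
    length = len(s)
    return [[s[m] * unit_impulse(n, m) for n in range(length)]
            for m in range(length)]

def sum_components(components):
    """Add the component sequences sample by sample."""
    return [sum(column) for column in zip(*components)]

s = [2, -1, 1]  # illustrative values, assumed to start at n = 0
components = decompose(s)
for m, component in enumerate(components):
    print(f"s[{m}] * delta[n-{m}] = {component}")
print("sum =", sum_components(components))  # reproduces s
```

Each component holds a single nonzero sample s[m] at position n = m, and adding all components gives back the original sequence.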
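Consider what happens when s[n] is given as an input to an LTI system with an impulse response h[n]. By definition, the input δ[n] produces the output h[n]. Time-invariance means that the shifted input δ[n−m] produces h[n−m]. Linearity means that the scaled input s[m]δ[n−m] produces s[m]h[n−m], and that a sum of such inputs produces the sum of the corresponding outputs.
This leads to an input-output sequence as
$$\sum_{m=-\infty}^{\infty} s[m]\,\delta[n-m] \;\longrightarrow\; \sum_{m=-\infty}^{\infty} s[m]\,h[n-m]$$
During the above procedure, we have worked out the famous convolution equation that describes the output r[n] for an input s[n] to an LTI system with impulse response h[n]:
$$r[n] = \sum_{m=-\infty}^{\infty} s[m]\,h[n-m]$$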
Convolution is a very logical and simple process but many DSP learners can find it confusing due to the way it is explained. We will describe a conventional method and another more intuitive approach.
Conventional Method
Most textbooks, after defining the convolution equation, suggest implementing it through the following steps. For every individual time shift n,
[Flip] Arranging the equation as $r[n]=\sum_{m=-\infty}^{\infty} s[m]\,h[-m+n]$, consider the impulse response as a function of variable m, flip h[m] about m=0 to obtain h[−m].
[Shift] To obtain h[−m+n] for time shift n, shift h[−m] by n units to the right for positive n and left for negative n.
[Multiply] Point-wise multiply the sequence s[m] by sequence h[−m+n] to obtain a product sequence s[m]⋅h[−m+n].
[Sum] Sum all the values of the above product sequence to obtain the convolution output at time n.
[Repeat] Repeat the above steps for every possible value of n.
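The steps above can be sketched in a few lines of plain Python. This is only an illustration, assuming both sequences are finite, start at index 0, and are zero elsewhere; the function name `convolve_flip_shift` is hypothetical. The flip and shift are not done by physically reversing a list but through the index n − m, which reads h[−m+n] directly.

```python
def convolve_flip_shift(s, h):
    """Convolve two finite sequences that both start at index 0."""
    out_len = len(s) + len(h) - 1
    r = []
    for n in range(out_len):              # [Repeat] for every time shift n
        total = 0
        for m in range(len(s)):           # m is the running index of s[m]
            k = n - m                     # [Flip] + [Shift]: index of h[-m + n]
            if 0 <= k < len(h):           # h is taken as zero outside its support
                total += s[m] * h[k]      # [Multiply], accumulated for [Sum]
        r.append(total)
    return r

s = [2, -1, 1]    # example sequences used in this answer
h = [-1, 1, 2]
print(convolve_flip_shift(s, h))  # [-2, 3, 2, -1, 2]
```

The output has len(s) + len(h) − 1 samples because that is the range of shifts n for which the flipped h overlaps s at all.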
An example of convolution between two signals s[n]=[2, −1, 1] and h[n]=[−1, 1, 2] is shown in Figure below, where the result r[n] is shown for each n.
Note a change in signal representation above. The actual signals s[n] and h[n] are a function of time index n but the convolution equation denotes both of these signals with time index m. On the other hand, n is used to represent the time shift of h[−m] before multiplying it with s[m] point-wise. The output r[n] is a function of time index n, which was that shift applied to h[−m].
Next, we turn to the more intuitive method where flipping a signal is not required.
Intuitive Method
There is another method to understand convolution. In fact, it is built on the derivation of the convolution equation itself, i.e., find the output r[n] as
r[n] = ⋯+s[−2]⋅h[n+2] +s[−1]⋅h[n+1] +s[0]⋅h[n] + s[1]⋅h[n−1] + s[2]⋅h[n−2] +⋯
Let us solve the same example as in the above Figure, where s[n]=[2, −1, 1] and h[n]=[−1, 1, 2]. This is shown in Table below.
Such a method is illustrated in Figure below. From an implementation point of view, there is no difference between the two methods.
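The intuitive method can also be written as a short Python sketch that builds one row per input sample and then adds the rows. It assumes both sequences start at index 0, and the function name `convolve_shifted_copies` is made up for this illustration.

```python
def convolve_shifted_copies(s, h):
    """Build one row per input sample, s[m] * h[n - m], then add the rows."""
    out_len = len(s) + len(h) - 1
    rows = []
    for m, s_m in enumerate(s):
        row = [0] * out_len
        for k, h_k in enumerate(h):
            row[m + k] = s_m * h_k   # h delayed by m samples, scaled by s[m]
        rows.append(row)
    r = [sum(column) for column in zip(*rows)]   # add the rows sample by sample
    return rows, r

s = [2, -1, 1]
h = [-1, 1, 2]
rows, r = convolve_shifted_copies(s, h)
for m, row in enumerate(rows):
    print(f"s[{m}] * h[n-{m}]:", row)
print("r[n]:           ", r)
# s[0] * h[n-0]: [-2, 2, 4, 0, 0]
# s[1] * h[n-1]: [0, 1, -1, -2, 0]
# s[2] * h[n-2]: [0, 0, -1, 1, 2]
# r[n]:            [-2, 3, 2, -1, 2]
```

No flipping appears anywhere: every input sample simply launches its own scaled copy of h[n], and the output is the sum of whatever copies are active at each n.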
To sum up, convolution tells us how an LTI system behaves in response to a particular input. Thanks to the intuitive method above, we can say that convolution is also a multiplication in the time domain (flipping the signal is not necessary), except that this time-domain multiplication involves memory: each output sample depends on past input samples as well as the current one. To understand at a much deeper level where flipping comes from, and what happens in the frequency domain, you can download a sample section from my book here.