**News**- On 27th June, 2010 NVIDIA released CUDA 3.1 and on 21st July, released Parallel Nsight 1.0. The download sites for CUDA Toolkit 3.1 and NVIDIA Parallel Nsight 1.0 are here and here, respectively.

I recently wrote a small piece of code using CUDA to solve a 1D heat transfer problem. The heat transfer happens along the material whose properties are density

*ρ*= 930 kg/m

^{3}, specific heat

*C*= 1340 J/(kg K) and thermal conductivity

_{p}*k*= 0.19 W/(m K). As shown in the figure below, the computation region is illustrated by the thick orange line; the total distance is 1.6 m. There are two boundary conditions attached: at the left end, the temperature is fixed to be 0

^{o}C, while at the right end, a heat flux

*q*= 10 W/m

^{2}is imposed.

If the initial temperature for the entire region is 0

^{o}C, I want to calculate the temperature distribution along the region after 10 seconds' heat propogation.

In the previous figure, the governing heat equation, in partial differential form, has been given. It is then discretised for the internal region, i.e. the orange line, and the right end boundary, which is redly circled, respectively. In the computation, I discretised the entire distance, 1.6 m, into 32768 elements - within CUDA, 64 blocks can be used to handle these elements. On the other hand, for the time marching iteration, the time step can be determined as

in which

*α*is the thermal diffusivity.

The calculation results, with the help of CUDA and based on

**float**operation, are depicted as

The temperature values for x < 1.5926 m are all zero; therefore they are neglected in the picture. In order to verify the results, the same calculation was also implemented onto CPU and even COMSOL software. The implementation on CPU gave

The interesting thing concerns us is the code efficiency. Once again, I used CUDA 3.1 on my GeForce 9800 GTX+ and a single core of the Q6600 CPU, and the time durations elapsed on the GPU and the CPU for the same calculation are 1.43 s and 5.04 s, respectively. The speedup is 3.52. This speedup value is not that attractive, but it is actually supposed to be much higher when there are much more discretised elements.

I also record the temperature development, in a transient process, of the right end boundary, at which there is heat flux injected. The 10 seconds development curve is illustrated as

I didn't find an easy way to paste source code onto the blog. If you are interested, please leave a comment and the related code can be shared in any way.