
Programming for multicore: An introduction to OpenMP using GCC-4.4

A couple of months back, I happened to attend a short seminar on multi-core programming by Intel here at Hyderabad. What I liked immensely about it was that it was not yet another blatant advertising campaign for some hardware or software product by some industry giant.

It was about the paradigm shift that the chip industry is undergoing – the trend towards more cores, rather than higher gigahertz horsepower. However, if you are an average Joe developer like me, you probably program your applications without leveraging the power of two or more cores simultaneously. By default, we don’t ‘think parallel’ for various reasons. For one, threading is not an easy concept. The seminar looked at some of Intel’s software offerings that help developers (especially Visual Studio developers) to create, debug and optimize threaded/multicore applications. (However, this post will not focus on those tools – you may want to visit www.intel.com/go/parallel for more information).

On a related thread (pun intended!), gcc-4.4.0 was released recently, adding support for version 3.0 of the OpenMP specification. OpenMP is something I had heard of before, but never actually tried. It is an API for C, C++ and Fortran that lets you write parallel programs easily. Jargonspeak calls it ‘platform independent shared memory multiprocessing’. In effect, it’s threads without the associated headache of thread management. By the way, gcc has supported OpenMP since version 4.2, so you don’t need the latest bleeding edge version for this. However, should you want it, on Windows you can always download the excellent TDM MinGW builds for gcc-4.4.0 (latest direct link). If you’re a Linux geek, you probably know how to get gcc-4.4 for your distro anyway. Also, Microsoft Visual C++ Express does not support OpenMP, hence my experiments are limited to gcc on both Windows and Linux.

All right then, let’s see how OpenMP aids a classic case of parallelization: matrix multiplication. Agreed – this is a rather simple programming problem, and real world problems are usually harder to parallelize than this. However, this should serve as a good starting point to explore further.

So, here’s the basic matrix multiplication loop that we want to parallelize, assuming arr1 and arr2 are inputs, and arr3 is the output array:


for(i=0; i<n; ++i) {
  for(j=0; j<n; ++j) {
    temp = 0;
    for(k=0; k<n; ++k) {
      temp += arr1[i][k] * arr2[k][j];
    }
    arr3[i][j] = temp;
  }
}

OpenMP is mostly a set of compiler directives (pragmas) and library routines. In this case, it's enough for us to add one single statement before our loop.


#pragma omp parallel for private(i, j, k, temp)
for(i=0; i<n; ++i) {
  for(j=0; j<n; ++j) {
    temp = 0;
    for(k=0; k<n; ++k) {
      temp += arr1[i][k] * arr2[k][j];
    }
    arr3[i][j] = temp;
  }
}

That’s it! This pragma tells the OpenMP subsystem to do its little magic behind the scenes and parallelize the ‘for loop’ following it.
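As a variation on the loop above, here is a minimal sketch that wraps it in a function. The function name matmul_omp and the flat row-major arrays are my assumptions for illustration, not part of the linked program. Because j, k and temp are declared inside the loop body, each thread gets its own copy automatically and the private clause is no longer needed:

```c
#include <stddef.h>

/* Hypothetical helper: c = a * b for n x n matrices stored row-major
 * in flat arrays. Compile with -fopenmp to run the outer loop in parallel;
 * without that flag the pragma is ignored and the loop runs serially. */
void matmul_omp(const double *a, const double *b, double *c, int n)
{
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; ++i) {
        int j, k;                 /* declared inside the loop: private per thread */
        for (j = 0; j < n; ++j) {
            double temp = 0.0;    /* also private per thread */
            for (k = 0; k < n; ++k) {
                temp += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            }
            c[(size_t)i * n + j] = temp;
        }
    }
}
```

Each thread writes to different rows of c, so no locking is needed.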

Here’s the complete program, which contains additional code to initialize the arrays arr1 and arr2 pseudo-randomly, and to calculate the timings taken by the normal and the parallelized versions. You can compile the program with gcc-4.4 by the simple command:

gcc -fopenmp matmul.c

And on Windows, you might need to edit the PATH variable to include the GNU libgomp runtime (libgomp-1.dll). (Libgomp is GNU’s implementation of OpenMP). Here’s how I did it, for example:

set PATH=D:\MinGW\lib\gcc\mingw32\bin;%PATH%
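In case the linked program is not at hand, the initialization and timing parts could look roughly like this. This is a sketch of the same idea rather than the actual code, and the names fill_random and seconds_now are made up for illustration. The one real OpenMP call is omp_get_wtime(), which returns wall-clock seconds. That matters because clock() reports CPU time on many platforms, which adds up across threads and can hide the speedup:

```c
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed for omp_get_wtime() */
#endif

/* Hypothetical helper: fills an n x n row-major matrix with
 * pseudo-random values in the range [0, 1]. */
void fill_random(double *m, int n, unsigned int seed)
{
    int i;

    srand(seed);
    for (i = 0; i < n * n; ++i) {
        m[i] = (double)rand() / RAND_MAX;
    }
}

/* Hypothetical helper: current time in seconds. Uses the OpenMP
 * wall clock when compiled with -fopenmp, and falls back to the
 * standard clock() otherwise. */
double seconds_now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}
```

Call seconds_now() once before the multiplication loop and once after it; the difference is the elapsed time.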

So, the end result? Here are 4 sets of execution outputs. Two from Windows (TDM gcc-4.4.0):

Enter dimension ('N' for 'NxN' matrix) (100-2000): 1000
Populating array with random values...
Completed array init.
Crunching without OMP... took 23.032000 seconds.
Crunching with OMP... took 13.000000 seconds.

Enter dimension ('N' for 'NxN' matrix) (100-2000): 2000
Populating array with random values...
Completed array init.
Crunching without OMP... took 216.140000 seconds.
Crunching with OMP... took 118.641000 seconds.

And two from Linux (Ubuntu 9.04, gcc-4.3.3):

Enter dimension ('N' for 'NxN' matrix) (100-2000): 1000
Populating array with random values...
Completed array init.
Crunching without OMP... took 21.623144 seconds.
Crunching with OMP... took 13.686926 seconds.

Enter dimension ('N' for 'NxN' matrix) (100-2000): 2000
Populating array with random values...
Completed array init.
Crunching without OMP... took 189.184673 seconds.
Crunching with OMP... took 104.220751 seconds.

That’s a speedup of roughly 1.6x to 1.8x, while adding one statement to your program! Actually two statements, if you count the include directive for <omp.h>. I’m sure you’d agree that for this case, OpenMP provides a really easy way of utilizing the idle core of most desktop machines out there. The good part is, even on a single core machine, the code works the way it should (by default the runtime just runs the loop with a single thread, so you get the same result without the speedup). And if you compile without -fopenmp, the pragma is simply ignored.
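To see what the runtime will actually do on a given machine, a small sketch like the following can report whether OpenMP was enabled at compile time and how many threads a parallel region would get by default. The helper names are hypothetical; the _OPENMP macro and omp_get_max_threads() come from the OpenMP specification:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if this file was compiled with
 * OpenMP support (for example gcc -fopenmp), 0 otherwise. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: upper bound on the number of threads a
 * parallel region would use; 1 when OpenMP is not enabled. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

On a dual core box with default settings I would expect max_threads() to report 2 when built with -fopenmp, though the OMP_NUM_THREADS environment variable can override that.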

A look at the CPU utilization proves to be interesting too. (By the way, my home system runs an AMD Athlon 64 X2 4600+ dual core, at a clock speed of 2.4GHz). In the first case, here’s a snap of the system information (using the excellent Process Explorer). Notice how the CPU usage stays pegged at around 50%, and the second CPU is mostly idle.

[Screenshot: Process Explorer, CPU usage without OpenMP]

And here’s the usage when the OpenMP crunching is in action:

[Screenshot: Process Explorer, CPU usage with OpenMP]

That’s more like it. Both horses in action, CPU peaked at 100%. Similar stuff can be seen on Linux, using Ubuntu’s (rather, GNOME’s) inbuilt System Monitor:

[Screenshot: GNOME System Monitor on Ubuntu]

The portion where the red and orange worms collide at the top is the duration of the OpenMP version of the matrix multiplication program.

As already stated, matrix multiplication is an ideal case, and a speedup approaching 2x on a dual core machine is possible only with such ideal problems. However, if you look closely enough, there often are parts of your program that can be parallelized. Further, we have not even scratched the surface of what’s possible using OpenMP 3.0. It goes way beyond parallelizing simple for loops. (Here’s the link to the spec in PDF).
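To give a taste of what lies beyond for loops: OpenMP 3.0 introduced tasks, which let you parallelize recursive or irregular work. Here is a minimal sketch using the classic (and deliberately naive) recursive Fibonacci. The function names are mine, and this is an illustration of the task and taskwait directives rather than a sensible way to compute Fibonacci numbers:

```c
/* Hypothetical example: each recursive call becomes an OpenMP task.
 * a and b must be marked shared, otherwise each task would write to
 * its own private copy and the results would be lost. */
long fib_task(int n)
{
    long a, b;

    if (n < 2) {
        return n;
    }

    #pragma omp task shared(a)
    a = fib_task(n - 1);

    #pragma omp task shared(b)
    b = fib_task(n - 2);

    /* Wait for both child tasks before using their results. */
    #pragma omp taskwait
    return a + b;
}

/* Starts a team of threads, then lets one thread create the root
 * task while the others pick up the tasks it spawns. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Tasks this tiny cost more in overhead than they save, so a real program would fall back to plain recursion below some cutoff. Compiled without -fopenmp, the pragmas are ignored and this is just ordinary recursion.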

And for sure, OpenMP is not the only way to go parallel portably. If you work in C++, you would have heard of the Boost C++ libraries. Give Boost.Thread (boost::thread) a go!

With Intel gearing up for the release of its eight core Nehalem EX processors, and with AMD’s six core Istanbul processor already finding its way into mainstream desktop boards, there remains only one thing to say: if there is a time to think in parallel, this is it!

~Raj

Using Qt 4.4 opensource with Microsoft Visual C++ Express 2008

Qt from Trolltech is widely acknowledged as one of the best cross-platform GUI toolkits available. However, installing the Qt open source edition on Windows is not as effortless as “sudo apt-get install qt” on Ubuntu or other Linux flavors. It’s not that hard either, and this post shows you how to develop with Qt using the freely available Microsoft Visual C++ 2008 Express as your IDE.

1. I’m assuming you have MSVC 2008 Express already installed. If not, download the offline install ISO from here, mount it (using Daemon Tools for example), and launch the installer from the virtual drive.  Next, download the Windows open source version of Qt from here.

2. Now, you can either extract the Qt source package to the folder where you want it to be installed, or you can extract it to a temporary location and install only the final files to your install directory. Doing the latter of course makes more sense, except that it is NOT recommended on Windows. I have faced quite a few problems with it (which I will detail further down the line). Bottom line is – if you have no problems sparing about 1 GB for Qt, then choose the former approach.

Open up the “Visual Studio 2008 Command Prompt” (available in the “Tools” sub-menu in your Visual C++ start menu entry) and change to the directory where you extracted the Qt sources. For the former approach, issue the following command:

configure

If you want a separate install directory (let’s say in D:\Qt-4.4.3), use the ‘prefix’ flag in this manner:

configure -prefix "D:\Qt-4.4.3"

3. Depending on your system, this takes quite a while. Oh, and if you face an error like this, fear not:

copy qmake.exe P:\qt-win-opensource-src-4.4.3\bin\qmake.exe
        1 file(s) copied.
Creating makefiles in src...
Generating Visual Studio project files...
Could not find mkspecs for your QMAKESPEC(win32-msvc2008)
after trying:
        D:\Qt-4.4.3\mkspecs
Error processing project file:
P:/qt-win-opensource-src-4.4.3/projects.pro
Qmake failed, return code 3

This is the first of a few problems that crop up when you use a custom install location (i.e. the latter approach). Just copy the “mkspecs” folder from your source directory tree over to your install directory and re-run the configure program.

4. Once ‘configure’ completes, run ‘nmake’. This takes a really long time. If you chose to have a separate install location, run ‘nmake install’ once this completes.

5. Another problem of a separate install directory is that the Makefile forgets to copy the MANIFEST files. So, if at this stage you try to start “designer.exe” from your install/bin folder, you may get an error saying that the application failed to start because MSVCP90.dll was not found.

To fix this, copy over all the “.manifest” files from your source “bin” and “lib” directories over to the install folder’s “bin” and “lib” directories. At this point, you should be able to run Qt-Designer, Qt-Assistant etc from your bin directory.

6. Let us set up a couple of environment variables that make life easier for us. To edit environment variables, right click “My Computer” and go to “Properties > Advanced > Environment Variables”. Add a new variable QTDIR pointing to your Qt install directory, and edit your PATH to include Qt’s “bin” directory as follows:

Setting QTDIR

Adding to PATH

 

7. Now let’s try to get Qt’s “Hello World” tutorial program running from the command line. Fire up the Visual Studio Command Prompt, and create a file “Hello.cpp” containing the following code in a new directory called “hello”:


#include <QApplication>
#include <QPushButton>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    QPushButton hello("Hello world");
    hello.resize(100, 30);

    hello.show();
    return app.exec();
}

Now, type the following commands in this new folder:

qmake -project
qmake hello.pro
nmake

This should create an executable “hello.exe”, which you should be able to execute to see your first GUI program using Qt-4.4 and MSVC 2008.

8. I would suggest working from the command prompt, but should you wish to use the Visual Studio Express IDE, here’s what you should do.

Fire it up, and go to “Tools > Options > Projects and Solutions > VC++ Directories”. Add “$(QTDIR)\include” to the “Include files”, and “$(QTDIR)\lib” to the “Library files” drop-down lists respectively.

9. Create a new project (“File > New > Project > General > Makefile Project”) named “HelloQt”.

Go to “Project > Properties > Configuration Properties > NMake” and enter the following as the build command line: “qmake -project && qmake && nmake release-all”. Also enter “release\HelloQt.exe” in the “Output” field. (You may enter corresponding debug versions here as well).

Right click “Source Files” in the “Solution Explorer” and create a new file “HelloQt.cpp”. Copy paste the above program into it.

Run your program using “Ctrl+F5”. You should see this:


Sample Qt 4.4 program running inside Microsoft Visual C++ Express 2008

So there you have it. A crash HOWTO on developing Qt-4.4 programs using Visual C++ 2008 Express. Feel free to comment on any problems you may have faced.

~Raj