Last week, I attended the BSC PRACE "Code Porting and Optimization" workshop, given by some of its researchers.
The course was 3 days long, and I can say I quite enjoyed it.
First day: OpenMP. A full day of OpenMP training, from quite easy levels to more complicated stuff, like tasks and other features only present in OpenMP 3.0.
The second and third days were not as productive as the first, partly because the subjects were not as applicable to my daily work, and partly because of the speakers.
Alex Duran gave a great OpenMP training course, and Xavier Teruel (my mus partner 5 years ago) helped us (Jdiaz, JMassanas, and $self) with great explanations while we were doing the exercises.
I don't know if I can share the slides, but I'll ask the BSC people and upload them (if I can).
We didn't do really complex things, but the overall feeling was that it's really cheap to try simple optimizations here and there in a large program (incremental optimization), while converting a whole program into a highly parallel one is quite a challenge.
We looked at the private, shared, and threadprivate clauses, which define the scope of variables.
Below is a helloWorld example using OpenMP, with some calls to the OMP API to get the number of active threads, the maximum number of threads, and accurate timings for benchmarks.
Then we took a look at the 'task' clause (a bit more complex, but good for task parallelism). I won't publish the code in this post, as it needs a thorough explanation I'm not willing to write now. If you feel the absolute need to see an OpenMP code using 'task', mail/comment.
After that, we looked at data parallelism, taking a look at 'for' and the different schedules. Here we had a parallel matrix multiplication code, optimized using not only the typical 'for' but also collapse.
Some insights about load balancing and other common problems wrapped up the session.