30 July 2014

Support for DOLFIN 1.4.0

Lately I have not been very active on the blog, since I have run into some difficulties in introducing MPI parallelism. Meanwhile, I have extended fem-fenics to support the latest version of the DOLFIN library.

Among other changes, one that strongly affects fem-fenics is the shift from the shared pointer implementation provided by the Boost libraries to the one included in the C++ standard library with the new C++11 standard. This change alone calls for edits throughout almost the entire codebase of the package, since basically all DOLFIN data structures are stored via smart pointers in the corresponding fem-fenics classes. However, version 1.3.0 is still the one shipped in the official repositories of the main Linux distributions, so switching abruptly to the latest version would have prevented further releases of the package for a while.

In order to tackle the above-mentioned issue, I resorted to the preprocessor, so as to infer from the DOLFIN version available on the compiling machine which kind of pointer to use. Among other options, the preprocessor flags obtained with pkg-config also define a macro reporting the DOLFIN version, so it is possible to check it and choose the correct pointer implementation right before compilation. Currently, every occurrence of boost::shared_ptr in fem-fenics has been replaced by a SHARED_PTR macro, which in turn is defined in a new header that takes care of setting it to the right value. There is just one catch: preprocessor conditionals cannot compare strings, and the DOLFIN_VERSION macro is defined as a string. For this approach to work, the package Makefile, for the initial compilation, and the get_vars.m function, for the just-in-time ones, perform the actual check and define an auxiliary macro if the latest version is found on the system.
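To give an idea of the approach, such a header might look roughly like the sketch below. The header guard and the auxiliary macro name DOLFIN_HAS_STD_SHARED_PTR are placeholders of mine, not necessarily the names used in the package; the point is that the Makefile or get_vars.m passes such a flag to the compiler, so the preprocessor never has to compare the DOLFIN_VERSION string itself.

    // Hypothetical header sketching the idea; names are illustrative only.
    #ifndef SHARED_PTR_H
    #define SHARED_PTR_H

    // DOLFIN_HAS_STD_SHARED_PTR is assumed to be defined on the compiler
    // command line (e.g. -DDOLFIN_HAS_STD_SHARED_PTR) by the Makefile or
    // by get_vars.m when DOLFIN >= 1.4.0 is detected via pkg-config.
    #ifdef DOLFIN_HAS_STD_SHARED_PTR
      #include <memory>
      #define SHARED_PTR std::shared_ptr
    #else
      #include <boost/shared_ptr.hpp>
      #define SHARED_PTR boost::shared_ptr
    #endif

    #endif // SHARED_PTR_H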

12 July 2014

Factories for assembly

fem-fenics provides a couple of utilities for assembling matrices and vectors, the objects that make up the algebraic system a weak formulation is reduced to when applying the finite element method. Leaving aside all the checks needed to verify inputs, their job boils down to creating proper DOLFIN objects to store these matrices or vectors, possibly applying boundary conditions, and then building an Octave-friendly representation. This last task is quite critical for the implementation of the MPI parallelisation, as the underlying DOLFIN representation of matrices and vectors is transparently distributed among processes, which makes the serial code used to pass them on to Octave useless. I have lately implemented some new classes to manage this aspect, so I will highlight my design considerations here.

The translation from DOLFIN's to Octave's data structures is logically a well-defined task, but its implementation needs to vary depending on whether execution is serial or parallel. Furthermore, it depends strictly on the linear algebra back-end in use, since each one stores a different representation and exposes a different interface to access it. To address these difficulties, I wrote a hierarchy of factories that provides the adequate methods based on the run-time configuration. Moreover, this way the code is easily extendable to allow for more back-ends to be used in fem-fenics (currently only uBLAS is available). There is an abstract class, which declares the interface of its derived classes, and a concrete factory implementing the uBLAS-specific methods.
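A rough sketch of that part of the design could look as follows. The class and method names here are illustrative only and do not necessarily match those in the package, and the method bodies are placeholders standing in for the real conversion code.

    #include <octave/oct.h>
    #include <dolfin.h>

    // Abstract factory: declares the conversions every back-end must provide.
    class matrix_factory
    {
    public:
      virtual octave_value to_octave (const dolfin::Matrix & A) const = 0;
      virtual octave_value to_octave (const dolfin::Vector & b) const = 0;
      virtual ~matrix_factory () {}
    };

    // Concrete factory for the uBLAS back-end, implemented as a singleton.
    class uBLAS_factory : public matrix_factory
    {
    public:
      static uBLAS_factory & instance ()
      {
        static uBLAS_factory factory;
        return factory;
      }

      octave_value to_octave (const dolfin::Matrix & A) const
      {
        // placeholder: the real code would read the uBLAS representation
        // of A and copy its entries into an Octave matrix
        return octave_value (Matrix (A.size (0), A.size (1)));
      }

      octave_value to_octave (const dolfin::Vector & b) const
      {
        // placeholder: likewise, copy the entries of b into a column vector
        return octave_value (ColumnVector (b.size ()));
      }

    private:
      uBLAS_factory () {}
    };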

Since a future fem-fenics will offer several algebraic back-ends, the hierarchy will expand, and the checks on the run-time configuration will eventually become more complex. Another issue comes from the need to use different types depending on information available only at run time. Both to encapsulate those checks, avoiding code duplication, and to solve the problem of choosing the right class, I added to the hierarchy a class implementing the Pimpl idiom. With this design, the "user code" in the C++ implementation of assemble.oct and assemble_system.oct just needs to create a femfenics_factory object and use it to extract the data structures of interest, while every other hassle is dealt with behind the scenes by this class.
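In other words, the oct-file code would reduce to something like the following hypothetical fragment, with femfenics_factory (sketched after the diagram below) hiding every run-time check:

    // Hypothetical fragment of the user code in assemble.oct.
    octave_value
    matrix_to_octave (const dolfin::Matrix & A)
    {
      femfenics_factory factory;     // picks the right back-end internally
      return factory.to_octave (A);  // dispatches to the concrete factory
    }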

UML diagram of the new hierarchy

The diagram above shows the classes already implemented, together with an example class that points out where future ones will fit in. femfenics_factory has private methods to check which concrete class is the right one to use each time, and it implements the public methods of the abstract class by dispatching the call through a reference. uBLAS_factory, as other concrete classes are expected to do, holds the real code for creating Octave matrices and vectors and exposes a static method, instance, which gives access to the singleton object of this type. femfenics_factory, in turn, uses it to obtain the reference needed for dispatching.
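To make the dispatching mechanism concrete, here is a minimal sketch of how femfenics_factory could forward calls to the singleton returned by uBLAS_factory::instance. It builds on the illustrative classes shown earlier; the real implementation in the package may differ in the details.

    // Pimpl-style dispatcher (illustrative, relying on the matrix_factory
    // and uBLAS_factory sketches above).
    class femfenics_factory
    {
    public:
      octave_value to_octave (const dolfin::Matrix & A) const
      { return impl ().to_octave (A); }

      octave_value to_octave (const dolfin::Vector & b) const
      { return impl ().to_octave (b); }

    private:
      // Run-time check: decide which concrete factory applies. With only
      // uBLAS available this always returns its singleton; more branches
      // will appear as further back-ends are supported.
      const matrix_factory & impl () const
      {
        return uBLAS_factory::instance ();
      }
    };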

08 July 2014

MPI and the problems to face

These days I have started investigating the actual implementation of the MPI parallelisation in fem-fenics. I have found some information that I will summarise here, together with my goals for the next weeks.

First of all, apparently MPI can be used without user intervention on the serial code. This is a feature DOLFIN boasts, but I do not expect it to carry over to fem-fenics, at least not without some effort on the implementation side. Furthermore, DOLFIN also offers a wrapper around MPI functionality, which will probably be helpful in managing data transfers among processes in the C++ code.

An issue that will need to be addressed is making ufl.m robust to parallel execution: with its current serial implementation, all processes try to open the same file, which results in an error that stops the computation. Even if they could all open the file and write to it, lines would end up copied in random order or more than once, so it must be fixed.
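Just to fix ideas, the standard cure for this kind of problem is to let only one process write the file while the others wait for it to finish. The following is a minimal sketch of that pattern in C++ with raw MPI calls; the eventual fix in ufl.m will have to be adapted to the m-file code, so this shows only the pattern, not the actual implementation.

    #include <mpi.h>
    #include <fstream>
    #include <string>

    // Only the root process writes the generated file; every other
    // process waits at the barrier until the file is complete.
    void write_on_root (const std::string & filename,
                        const std::string & contents)
    {
      int rank;
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);

      if (rank == 0)
        {
          std::ofstream file (filename.c_str ());
          file << contents;
        }

      MPI_Barrier (MPI_COMM_WORLD);
    }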

Finally, it seems that the partitioning procedure produces matrices that are not simply slices of the one assembled in serial execution. Because of this, I must dig into the algorithm to find out the proper way to merge the pieces and obtain the complete matrix, which will be stored as an octave_value to allow for further computation using Octave's features.