12 August 2014

SubDomain

As I said in my previous post, a missing feature in fem-fenics was the possibility of marking subdomains: the example I proposed needed a file generated by a run of the corresponding Python code, which, honestly, is not the best approach. To address this issue, these days I have implemented a new class, subdomain, which can be used to mark mesh entities. In the following I will describe how to use this new functionality. Here is the code:

pkg load fem-fenics msh

ufl start Subdomains
ufl fe = FiniteElement "(""CG"", triangle, 2)"
ufl u = TrialFunction (fe)
ufl v = TestFunction (fe)
ufl
ufl a0 = Coefficient (fe)
ufl a1 = Coefficient (fe)
ufl g_L = Coefficient (fe)
ufl g_R = Coefficient (fe)
ufl f = Coefficient (fe)
ufl
ufl a = "inner(a0*grad(u), grad(v))*dx(0) + inner(a1*grad(u), grad(v))*dx(1)"
ufl L = g_L*v*ds(1) + g_R*v*ds(3) + f*v*dx(0) + f*v*dx(1)
ufl end

# Create mesh and define function space
x = y = linspace (0, 1, 65);
[msh, facets] = Mesh (msh2m_structured_mesh (x, y, 0, 4:-1:1));

V = FunctionSpace ("Subdomains", msh);

# Define boundary conditions
bc1 = DirichletBC (V, @(x, y) 5.0, facets, 2);
bc2 = DirichletBC (V, @(x, y) 0.0, facets, 4);

# Define problem coefficients
a0 = Constant ("a0", 1.0);
a1 = Constant ("a1", 0.01);
g_L = Expression ("g_L", @(x, y) - 10*exp(- (y - 0.5) ^ 2));
g_R = Constant ("g_R", 1.0);
f = Constant ("f", 1.0);

# Define subdomains - Here are the edits #
domains = MeshFunction ("dx", msh, 2, 0);
obstacle = SubDomain (@(x,y) (y >= 0.5) && (y <= 0.7) && ...
                      (x >= 0.2) && (x <= 1.0), false);
domains = mark (obstacle, domains, 1);

# Define variational form
a = BilinearForm ("Subdomains", V, V, a0, a1, domains);
L = LinearForm ("Subdomains", V, g_L, g_R, f, facets, domains);

# Assemble system
[A, b] = assemble_system (a, L, bc1, bc2);
sol = A \ b;
u = Function ("u", V, sol);

# Save solution in VTK format
save (u, "subdomains");

# Plot solution
[X, Y] = meshgrid (x, y);
U = u (X, Y);
surf (X, Y, U);

As you can see, it is basically the same script as in the previous post, except for the lines that replace the import of the meshfunction; the comment in the code shows where the edits are. The workflow now comprises three steps: first a meshfunction is created, then a subdomain, and finally the cells are marked.

The call to MeshFunction is new: it is now possible to instantiate a meshfunction given a mesh, the required topological dimension and the value to initialise it with. Moreover, the optional label "dx" means that it can be used in calls to BilinearForm or LinearForm to supply markers for subsets of the integration domains. In the example, this instruction returns a meshfunction of dimension 2, which means it holds one value per triangle of the mesh, with every entry initialised to 0.

The subsequent instruction defines a subdomain, taking as arguments a function handle and a logical flag. The former overrides the dolfin::SubDomain::inside method, so it must return true for entities contained in the subset and false otherwise: in this case it checks whether the coordinates lie inside the rectangle [0.2, 1] x [0.5, 0.7] that defines the obstacle. The latter, when set to true, asks for a boundary subdomain.

Finally, mark is called to set the entries corresponding to cells inside the subdomain to 1, so that the returned meshfunction now represents the obstacle: after these lines, the variable domains holds the value 1 on cells inside the obstacle region and 0 outside. Thus, it is now possible to solve a problem whose formulation entails subdomains entirely within fem-fenics.
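
The same machinery works for facets, too. Here is a minimal, purely illustrative sketch (not part of the example above) that marks the left edge of the unit square: the meshfunction now has topological dimension 1, the subdomain is created with the boundary flag set to true, and mark stores the label 1 on the matching facets.

# A dimension-1 meshfunction, labelled "ds" so that it can supply
# markers for boundary integrals, initialised to 0 on every facet
boundary_parts = MeshFunction ("ds", msh, 1, 0);

# With the flag set to true the subdomain is restricted to the boundary;
# here it selects the x = 0 edge of the unit square
left = SubDomain (@(x, y) x < 1e-6, true);

# Facets on the left edge now carry the marker 1
boundary_parts = mark (left, boundary_parts, 1);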

09 August 2014

New features of meshfunction

As you may recall from my last post, for DirichletBC to work in parallel runs I had to implement a new class, meshfunction. However, it was still quite unfinished: there was no way for the user to create one, except by extracting it from a mesh produced by the msh package, no description to display and no way to save it. These days I have been tackling these issues; while at it, I wondered what else one could do with meshfunctions and found out that they come in handy when dealing with obstacles.

At this link you can find a detailed explanation of the problem. It is a Poisson equation with a variable diffusion coefficient on the unit square: on [0.2, 1]x[0.5, 0.7] its value is 0.01, elsewhere it is 1. This subset is the obstacle to diffusion, so we study its effect by applying u = 0 on the y = 0 edge and u = 5 on y = 1. Here is the fem-fenics code:

pkg load fem-fenics msh

ufl start Subdomains
ufl fe = FiniteElement "(""CG"", triangle, 2)"
ufl u = TrialFunction (fe)
ufl v = TestFunction (fe)
ufl
ufl a0 = Coefficient (fe)
ufl a1 = Coefficient (fe)
ufl g_L = Coefficient (fe)
ufl g_R = Coefficient (fe)
ufl f = Coefficient (fe)
ufl
ufl a = "inner(a0*grad (u), grad (v))*dx(0) + inner(a1*grad (u), grad (v))*dx(1)"
ufl L = g_L*v*ds(1) + g_R*v*ds(3) + f*v*dx(0) + f*v*dx(1)
ufl end

# Create mesh and define function space
x = y = linspace (0, 1, 65);
[msh, facets] = Mesh (msh2m_structured_mesh (x, y, 0, 4:-1:1));

V = FunctionSpace ("Subdomains", msh);

# Define boundary conditions
bc1 = DirichletBC (V, @(x, y) 5.0, facets, 2);
bc2 = DirichletBC (V, @(x, y) 0.0, facets, 4);

# Define problem coefficients
a0 = Constant ("a0", 1.0);
a1 = Constant ("a1", 0.01);
g_L = Expression ("g_L", @(x, y) - 10*exp(- (y - 0.5) ^ 2));
g_R = Constant ("g_R", 1.0);
f = Constant ("f", 1.0);

# Define subdomains
domains = MeshFunction ("dx", msh, "cells.xdmf");

# Define variational form
a = BilinearForm ("Subdomains", V, V, a0, a1, domains);
L = LinearForm ("Subdomains", V, g_L, g_R, f, facets, domains);

# Assemble system
[A, b] = assemble_system (a, L, bc1, bc2);
sol = A \ b;
u = Function ("u", V, sol);

# Save solution in VTK format
save (u, "subdomains");

# Plot solution
[X, Y] = meshgrid (x, y);
U = u (X, Y);
surf (X, Y, U);


In the beginning there is the now familiar ufl block. As you might have noticed, subscripted measures appear in the definitions of the bilinear form a and of the linear functional L. This is the UFL notation for integration over specific subsets of the computational domain. For instance, dx(1) is an integral over the subdomain marked with label 1, while ds(3) is an integral over the exterior edges marked with label 3. A third possibility, not used in this example, is dS, for integrals over interior facets, which could be of use for interior penalty methods. Going back to the example, you can see that markers are used to enforce non-homogeneous Neumann conditions on the side edges and to assign the proper coefficient to each of the two subdomains.
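
Just to make the notation concrete, here is a hypothetical ufl block, unrelated to the example above, that uses the three measures side by side; note that quantities appearing in dS integrals must be restricted, hence the avg operators.

ufl start Measures
ufl fe = FiniteElement "(""DG"", triangle, 1)"
ufl u = TrialFunction (fe)
ufl v = TestFunction (fe)
ufl f = Coefficient (fe)
ufl a = u*v*dx(0) + u*v*ds(1) + avg(u)*avg(v)*dS(0)
ufl L = f*v*dx(0) + f*v*dx(1)
ufl end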

After the problem has been defined in the UFL language, there are instructions to define the mesh, the function space, the essential boundary conditions and all the coefficients involved. All these lines either come from fem-fenics as it was before this summer or have been described in previous posts, so I will not cover them in detail. The same applies to the assembly, the solve and the output at the end of the script. The only note is that the very last lines will error out in parallel runs: point-wise evaluations in DOLFIN can be performed only on local cells, but with meshgrid we hand every process the whole domain.
The computed solution
In between lie my latest efforts. First, the brand new MeshFunction: providing a mesh and a file name, you can import a dolfin::MeshFunction. In this case it was saved in the XDMF format; here you can find the files needed to execute the script. DOLFIN uses this format for parallel input/output. It comprises a .h5 file storing the data and a .xdmf file with the metadata needed to read it. The optional first argument is a string identifying the role of the returned meshfunction in the variational problem: in this case, with "dx", it will be searched for markers of integrals over cells. All the previously mentioned measures are available, and "ds" is automatically attached to the meshfunction returned by Mesh. In the example this behaviour is exploited for the measure on exterior edges.

Afterwards, the mesh functions are passed as arguments to BilinearForm and LinearForm, so that the markers are available when assembling the system. In addition to the usual parameters, such as the name of the imported UFL problem, the function spaces and the coefficients, it is now possible to provide properly labelled mesh functions, and they will be used to identify the integration subsets.

Currently fem-fenics allows marking subdomains and exterior edges only by copying markers from the PDE-tool representation returned by the functions of the msh package, which makes it quite tricky to properly identify the obstacle in this example. The approach used in the Python interface to DOLFIN entails subclassing dolfin::SubDomain with a proper implementation of the inside method, then using an object of the derived class to mark a dolfin::MeshFunction. This could be an interesting feature to implement in fem-fenics in the future, too.

04 August 2014

MPI parallelism

After quite a struggle, I have been able to obtain a working implementation of fem-fenics supporting MPI parallelism. Let's go through an example and highlight what has changed lately.

pkg load fem-fenics msh

ufl start Poisson
ufl element = FiniteElement '("Lagrange", triangle, 1)'
ufl u = TrialFunction (element)
ufl v = TestFunction (element)
ufl f = Coefficient (element)
ufl g = Coefficient (element)
ufl a = "inner (grad (u), grad (v))*dx"
ufl L = f*v*dx + g*v*ds
ufl end

# Create mesh and define function space
x = y = linspace (0, 1, 33);
[mesh, facets] = Mesh (msh2m_structured_mesh (x, y, 1, 1:4));

V = FunctionSpace ('Poisson', mesh);

# Define boundary condition
bc = DirichletBC (V, @(x, y) 0.0, facets, [2;4]);

f = Expression ('f', @(x,y) 10*exp(-((x - 0.5)^2 + (y - 0.5)^2) / 0.02));
g = Expression ('g', @(x,y) sin (5.0 * x));

a = BilinearForm ('Poisson', V, V);
L = LinearForm ('Poisson', V, f, g);

# Compute solution
[A, b] = assemble_system (a, L, bc);
sol = A \ b;
u = Function ('u', V, sol);

# Save solution in VTK format
save (u, 'poisson');

The basic structure has remained the same. DOLFIN boasts the capability of running both serially and in parallel without any intervention on the code, so I did my best to obtain the same behaviour from fem-fenics. The Poisson.m m-file above can be run either as you usually would with any other m-file, or from the command line with an invocation such as:

mpiexec -np 4 octave --eval Poisson

Now, how is this possible? In the beginning, with the ufl block, the variational problem is defined in the UFL language, written to a .ufl file and compiled via FFC. Since I/O is performed, ufl.m ensures that only process zero opens and writes to the file. Moreover, an MPI barrier makes sure that no process proceeds before the .ufl file has been imported.

As soon as the just-in-time compilation is over, two instructions build the mesh, in this case on the unit square. For this we rely on the msh package, which returns a PDE-tool-like representation of it. Mesh.oct must then convert it to the DOLFIN internal representation and distribute it among processes. Here comes an issue: fem-fenics relies on the markers present in the PDE-tool format to impose essential boundary conditions, and in serial runs dolfin::Mesh can store them, so that DirichletBC.oct just needs to know the boundary subset label. Unfortunately, this feature is not yet supported in parallel by the DOLFIN library, so Mesh.oct has been edited to return, if requested, also a meshfunction holding this information, facets in the example above. This way markers can be conveyed to DirichletBC.oct and boundary conditions can be applied on the correct edges.

Further intervention was needed for the assembly and solve phases. In assemble_system.oct both the matrix and the vector are assembled locally on each portion of the mesh and afterwards gathered on process zero and joined, so that the system can be solved with Octave's backslash operator. In order to allow output in VTK format, in Function.oct the solution is split up and properly distributed among processes, so that each one holds the portion of degrees of freedom related to its subdomain and to the neighbouring vertices. After save.oct has written the solution to poisson.pvd and its auxiliary files, it can be visualised with ParaView.

30 July 2014

Support for DOLFIN 1.4.0

Lately I have not been very active on the blog since I am encountering some difficulties in the attempt to introduce MPI parallelism. Meanwhile, I have extended support to the latest version of the DOLFIN library.

Among other changes, one that strongly affects fem-fenics is the shift from the shared pointer implementation of the Boost libraries to the one included in the Standard Template Library with the new C++11 standard. This change alone calls for edits in almost the entire codebase of the package, as basically all DOLFIN data structures are stored via smart pointers in the corresponding fem-fenics classes. However, version 1.3.0 is currently still the one shipped in the official repositories of the main Linux distributions, so switching abruptly to the latest version would have prevented further releases of the package for a while.

In order to tackle the above mentioned issue, I resorted to the preprocessor, so as to infer from the DOLFIN version available on the compiling machine the right kind of pointer to use. Among other options, the preprocessor flags obtained through pkg-config also define a macro reporting the DOLFIN version. It is then possible to check it and choose the correct pointer implementation right before compilation. Currently in fem-fenics every occurrence of boost::shared_ptr has been replaced by a SHARED_PTR macro, which in turn is defined in a new header that takes care of setting it to the right value. There is just a catch: preprocessor conditionals cannot compare strings, but the DOLFIN_VERSION macro is indeed defined as a string. In order for this approach to work, the package Makefile, for the initial compilation, and the get_vars.m function, for the just-in-time ones, perform the actual check and define an auxiliary macro if the latest version is found on the system.
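
Just to give the idea, the check boils down to something like the following sketch; this is not the actual code of the Makefile or of get_vars.m, and the macro name is made up.

# pkg-config reports the DOLFIN version as a string, so the comparison
# is performed before compilation and an auxiliary macro is defined
[status, out] = system ("pkg-config --modversion dolfin");
extra_cppflags = "";
if (status == 0 && compare_versions (strtrim (out), "1.4.0", ">="))
  extra_cppflags = "-DFEMFENICS_DOLFIN_1_4";   # hypothetical macro name
endif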

12 July 2014

Factories for assembly

fem-fenics provides a couple of utilities for assembling the matrices and vectors that make up the algebraic systems weak formulations reduce to when the finite element method is applied. Leaving aside all the input checks, their job boils down to creating the proper DOLFIN objects to store these matrices or vectors, possibly applying boundary conditions, and then building an Octave-friendly representation. This last task is quite critical for the implementation of MPI parallelisation, as the underlying DOLFIN representation of matrices and vectors is transparently distributed among processes, which makes the serial code used to pass them on to Octave useless. Lately I implemented some new classes to manage this aspect, so I will highlight my design considerations.

The translation from DOLFIN's to Octave's data structures is logically a well defined task, but its implementation needs to vary according to whether execution is serial or parallel. Furthermore, it depends strictly on the linear algebra back-end in use, since each of them stores a different representation and exposes a different interface to access it. To address these difficulties, I wrote a hierarchy of factories that provides the adequate methods based on the run-time configuration. Moreover, this way the code can easily be extended to support more back-ends in fem-fenics (currently only uBLAS is available). There is an abstract class, declaring the interface of its derived ones, and a concrete factory implementing the uBLAS-specific methods.

Since a future fem-fenics will offer several algebraic back-ends, the hierarchy will expand, which means that the checks of the run-time configuration will eventually become more complex. Another issue comes from the need to use different types depending on information available only at run time. Both to encapsulate those checks, avoiding code duplication, and to solve the problem of choosing the right class, I added to the hierarchy a class implementing the Pimpl idiom. With this design, the "user code" in the C++ implementation of assemble.oct and assemble_system.oct just needs to create a femfenics_factory object and use it to extract the data structures of interest, while every other hassle is dealt with behind the scenes by this class.

UML diagram of the new hierarchy

In the diagram above you can see the already implemented classes and an example class showing where future ones will fit among them. femfenics_factory has private methods to check which concrete class is the right one to use each time, and implements the public methods of the abstract class by dispatching the call through a reference. uBLAS_factory, as other concrete classes are expected to do, holds the real code for creating Octave matrices and vectors and exposes a static method, instance, which gives access to the singleton object of this type. femfenics_factory, in turn, obtains through it the reference needed for dispatching.

08 July 2014

MPI and the problems to face

These days I have started my investigations on the actual implementation of the MPI parallelisation in fem-fenics. I found out some information that I will point out here, together with my goals for the next weeks.



First of all, MPI can apparently be used without user intervention on serial code. This is a feature that DOLFIN boasts, but I would expect it not to carry over to fem-fenics, at least not without some effort on the implementation side. Furthermore, DOLFIN also offers a wrapper for MPI functionality, which will probably be helpful in managing data transfers among processes in the C++ code.

An issue that will need to be addressed is making ufl.m robust to parallel execution: in its serial implementation all workers try to open the same file, which leads to an error that stops the computation. Even if they could all open the file and write to it, lines would end up copied in random order or more than once, so it must be fixed anyway.

Finally, it seems that the partitioning procedure produces matrices that are not slices of the one assembled in serial execution. Because of this, I must dig into the algorithm to find the proper way to merge the pieces and obtain the complete matrix, which will be stored as an octave_value to allow further computation using Octave's features.

25 June 2014

Mid term accomplishments

I will try to give a comprehensive feel of what I achieved in this first part of the Google Summer of Code, since it is time for the mid term evaluation. Let's start with an example: as usual it is the Poisson equation but, as a twist, we consider a fully Neumann problem. For such a problem to be well posed an additional constraint is needed, otherwise the solution would not be unique, so in the Octave code there is the Lagrange multiplier c. Here you can find more details and the C++ and Python code; I will just write down the differential problem for convenience:

- Δu = f in Ω
∇u ⋅ n = g on ∂Ω

Here is the Octave code that solves the above mentioned problem:

pkg load fem-fenics msh

ufl start NeumannPoisson
ufl CG = FiniteElement '("CG", triangle, 1)'
ufl R = FiniteElement '("R", triangle, 0)'
ufl W = CG * R
ufl
ufl "(u, c)" = TrialFunctions (W)
ufl "(v, d)" = TestFunctions (W)
ufl f = Coefficient (CG)
ufl g = Coefficient (CG)
ufl
ufl a = "(inner (grad (u), grad (v)) + c*v + u*d)*dx"
ufl L = f*v*dx + g*v*ds
ufl end

# Create mesh and function space
x = y = linspace (0, 1, 33);
mesh = Mesh(msh2m_structured_mesh (x, y, 1, 1:4));

W = FunctionSpace ("NeumannPoisson", mesh);

# Define variational problem
f = Expression ('f', @(x,y) 10*exp(-((x - 0.5)^2 + (y - 0.5)^2) / 0.02));
g = Expression ('g', @(x,y) - sin (5.0 * x));

a = BilinearForm ("NeumannPoisson", W, W);
L = LinearForm ("NeumannPoisson", W, f, g);

# Compute solution
[A, b] = assemble_system (a, L);
sol = A \ b;
solution = Function ('solution', W, sol);

u = Function ('u', solution, 1);

# Plot solution
[X, Y] = meshgrid (x, y);
U = u (X, Y);
surf (X, Y, U);

At the very beginning you can see a block in which every line starts with ufl. That is what you would have had to put in a separate UFL file before this summer. In a sense it is not plain UFL: there are extra quotes and apostrophes, needed because, with the current version of Octave, those brackets with commas inside would otherwise be interpreted as function calls. After this block closes with the ufl end line, the resulting UFL file is compiled to obtain a FunctionSpace, a BilinearForm and a LinearForm. These are oct-files that fem-fenics will use later on to define the corresponding variables in Octave. A robust implementation of ufl.m, the function that provides this binding to the UFL language, is one of the results of the first term.

At the end of the snippet you can see that the solution u is evaluated on its domain exactly as you would expect to do with a regular function taking two arguments and returning one value. This is due to the new subsref method of the function class, which is used to represent the elements of a function space. Aside from surface plots, this feature can be of interest for generalising methods that rely on analytical solutions to differential problems, or for applying basically any algorithm to such functions. Here is the plot you will obtain with this script:


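Coming back to the remark about applying generic algorithms to such functions, here is a small illustration (not part of the script above): the computed solution can be handed directly to quad, for instance to integrate it along a horizontal line.

# Integrate u along the line y = 0.5, treating it as any Octave function
q = quad (@(s) u (s, 0.5), 0, 1);
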
I wrote in an earlier post about the interpolate function: with it you can get the representation of a Function or Expression on a given FunctionSpace. It is useful, for instance, to compare your numerical solution with an exact one you happen to know. Or, in the example above, you might want to see what the forcing term looks like:

f_cg = interpolate ("f_cg", f, u);
F = f_cg (X, Y);
surf (X, Y, F);


There is one last achievement to highlight for the mid term evaluation: currently both the initial compilation of the package and all the ones performed just-in-time when importing UFL instructions proceed smoothly without user intervention. To this end, the build system now relies on pkg-config to get at once all the flags needed for proper compilation and linking, since some dependencies of dolfin, the FEniCS interface, are not to be found in standard directories. In order to exploit the extracted information also for the subsequent run-time builds, the autoconf substitution is performed also in the get_vars.m auxiliary function, which in turn provides it to generate_makefile.m. An implementation detail that proved quite tricky is how to pass all the preprocessor flags to mkoctfile: only a subset of the options of g++ are hard-coded in it, so I needed to resort to a workaround. Indeed, CPPFLAGS are always passed as an environment variable and not as command line flags, so that mkoctfile will just copy and deliver them to the real compiler.
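
In practice the workaround amounts to something like the following sketch, with made-up flag values and file name; the actual build goes through the generated Makefile, but the trick is the same.

# mkoctfile recognises only a subset of g++ options on its command line,
# but it forwards the CPPFLAGS environment variable to the compiler
setenv ("CPPFLAGS", "-I/usr/include/dolfin -DSOME_MACRO");
mkoctfile ("Poisson_BilinearForm.cc");   # hypothetical file name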


To further enhance the build system, I implemented other internal functions that hash the UFL file that was compiled and, later, check it to find out whether it changed between the previous and the freshly requested build. In the example above, you will find four new files in your working directory after a run: the three already mentioned oct-files and a text file storing the md5 sum of the UFL code that has been imported. Until one of these files gets somehow deleted or the problem in the ufl block changes, you will not need to go through a time consuming compilation again.

23 June 2014

Function evaluation

As said in the previous post, my latest result is the possibility of evaluating a fem-fenics function at a point of its domain. This way it is possible to generalise methods that would otherwise rely on analytical solutions. In [1] we have an example of such a method.

The paper deals with the deposition of nanoparticles in tissues, for the treatment of cancer. The phenomenon is described with a Monte Carlo simulation of the particles' trajectories, assuming that the velocity field of the carrying fluid is known. In this study, some simplifying hypotheses about the geometry of the cells and of the nearby fluid layer allow for an analytical solution of the Stokes equation. Unfortunately, these assumptions do not hold in human tissues in general: in the liver, for instance, cells have a cubic shape, in contrast to the spherical one used in the paper. Now, if this method is implemented in Octave, we can solve the Stokes equation numerically on a realistic domain and obtain right away a more general approach to this significant application.


The evaluation of a fem-fenics function was already possible via the feval method, but it had some glitches. One aspect is that the solution of a differential problem could not be used as if it were a regular Octave function, so a user would have had to adapt his/her algorithms to take advantage of it. A more critical issue is that the previous implementation did not handle the exception raised by the underlying FEniCS method when it receives as argument the coordinates of a point outside the domain, thus leading to a crash of Octave.

In order to address these problems, I added the subsref method to the function class and implemented proper exception handling in feval. To avoid code duplication, the former relies on the latter for the real computation, so it basically just forwards the parameters after checking that the right type of indexing was used. As a result, it is now possible to solve the equations:

- ν Δu + ∇p = 0
∇ ⋅ u = 0

with relevant boundary conditions, on a proper mesh and finite element space, and then evaluate the solution with the Octave expression values = u (points), where points is a matrix holding the coordinates of the evaluation points, one per column. Moreover, a careless evaluation will not crash your Octave session any more.
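
For instance, an illustrative snippet (assuming u has been obtained as described above) could read:

# One evaluation point per column: (0.25, 0.25), (0.5, 0.5), (0.75, 0.75)
points = [0.25, 0.50, 0.75;
          0.25, 0.50, 0.75];
values = u (points);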

Even if this feature of the package underwent some improvement, there is still room for more. Two issues I have not addressed yet are the somewhat awkward interface and the possibility of creating a function handle to perform evaluations with. Regarding the former, note that the above mentioned expression remains exactly the same no matter what the geometrical dimension of the domain is: I should modify the implementation so that, say, a vector field on a 3D space is evaluated with [ux, uy, uz] = velocity (x, y, z). As for the latter, in my understanding the class design should be modified to allow exploiting the Octave internals that manage functions, so this would require a careful reflection on all the possible side effects of such a change.

16 June 2014

Goals for future development

The mid-term review is approaching, so it is time to highlight what is done, what is underway and what the future goals are. In this post I will try to do so as clearly as possible.

Mid-term review

The main effort during the first part of the Google Summer of Code was the implementation of the bindings for the UFL language in Octave. Now UFL code can be written directly in m-files, without the need for a separate file to define the problem. To this end, ufl has been implemented to open a file, write to it and import the variational problem when it is complete.

Further, I implemented interpolate, which allows the interpolation of a Function or an Expression on a given FunctionSpace. This can be of interest to test the validity of a discretisation method, for instance if an analytical solution is available in closed form, so that it is possible to compare it with the numerically obtained one.

Lately, I focused on the build system, both for the package compilation and for the just-in-time builds needed to import variational problems into Octave. The former is now backed by pkg-config, so that all the compile and link options required by the dependencies are obtained at once. As for the latter, this information is used to configure the get_vars function accordingly, which provides it to the function that generates the Makefiles used to compile oct-files just-in-time. Finally, these oct-files are now recompiled only when necessary, for example if one of them has been deleted or if the UFL file has changed.

In the upcoming week I will add another feature: it will be possible to get a function handle for the evaluation of a Function. This way the solution of a variational problem can be used exactly as any other function in Octave, for instance allowing the generalisation of algorithms that rely on exact solutions of differential problems and are thus limited to simple cases. I will provide some details on an application in my post about this feature.


Final review

In the second part of the project I will mainly be committed to the parallelisation of the package's execution via MPI. As noted in an earlier post, parallelisation through the OpenMP paradigm was quickly abandoned, because it does not provide a significant performance gain while opening the way to bugs and errors. Parallelism is, anyway, an interesting feature for the package's use cases, so it will be the main goal of the final hand-in.

15 June 2014

Just-in-time compilation

One of the known issues of the fem-fenics package was related to errors during the just-in-time compilation, due to missing include directories. Among my preliminary contributions there is a changeset addressing the problem for the initial build of the package, when installing it into Octave. Now I have gone on and solved it also for the builds performed while using fem-fenics.

At the moment of the first build, autoconf is in charge of finding the relevant compiler and linker flags through pkg-config. They are then substituted in the Makefile, which compiles the package making use of them. This piece of information is needed also when a UFL file is imported and turned into the oct-files that reduce the weak formulation at hand to an algebraic system, but until now the user had to supply it by means of an environment variable.

I have now added a new utility function that provides those flags. In the configuration process they are substituted also in get_vars.m, which is called by generate_makefile.m when a differential problem is imported. The latter replaces two placeholders and writes the ad hoc Makefile with all the necessary compile and link options. This way users will not need to provide compilation flags any more; the package manages this aspect on its own.
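
A rough sketch of the mechanism is given below; the template name, the placeholders and the flag values are just illustrative, not the actual ones used by the package.

# Flags discovered at configuration time (in fem-fenics they come from
# pkg-config), substituted into a Makefile template
cppflags = "-I/usr/include/dolfin";
ldflags  = "-ldolfin";

template = fileread ("Makefile.in");                  # hypothetical template
makefile = strrep (template, "@CPPFLAGS@", cppflags);
makefile = strrep (makefile, "@LDFLAGS@",  ldflags);

fid = fopen ("Makefile", "w");
fputs (fid, makefile);
fclose (fid);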

As noted in a previous post, however, this just-in-time build is relatively time consuming, taking around half a minute each time. Nonetheless, a common usage pattern could entail solving the same weak formulation on varying meshes or with different boundary conditions, forcing terms or physical parameters. None of these situations requires recompiling the problem's oct-files, since they carry information only about the function space and the formal expressions of the bilinear form and the linear functional. It is useful, then, to take on the build process only when needed.

To add this feature, I created three functions performing the appropriate actions. After every successful just-in-time compilation, save_hash.m takes care of hashing the imported UFL file and writing the result to <ufl-filename>.md5sum. On the other hand, at the beginning of every import_ufl_*.m function, a check like this is performed:

         if (check_hash (var_prob) ||
             ! check_oct_files (var_prob, "Problem"))

You can see in it the remaining two functions implemented lately. The first one, check_hash.m, receives as argument the name of the variational problem, reconstructs the UFL file name, looks for a saved hash sum and compares it with the current file's. It returns true if the proper .md5sum file is not found or if the new and old hashes differ. Clearly, the oct-files should also be rebuilt if one of them is missing: check_oct_files.m looks for the relevant files, with its second option stating which import is underway (thus, which files are expected as output), and returns true if they are all available.
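
The idea behind these helpers can be summarised with a short sketch; this is illustrative only, not the actual code of save_hash.m or check_hash.m, and the naming of the .md5sum file is assumed.

ufl_file = "Poisson.ufl";              # assumed file name
sum_file = [ufl_file ".md5sum"];
current  = md5sum (ufl_file);

rebuild = true;
if (exist (sum_file, "file"))
  # compare the stored hash with the current one
  rebuild = ! strcmp (strtrim (fileread (sum_file)), current);
endif

if (rebuild)
  # ... recompile the oct-files, then store the new hash ...
  fid = fopen (sum_file, "w");
  fputs (fid, current);
  fclose (fid);
endif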

OpenMP

These days I have worked on the implementation of the interface to the OpenMP-powered assembly offered by FEniCS. Despite being potentially a one-line intervention, it proved quite tricky: with the needed addition, the fem-fenics function for system assembly broke with a huge number of run-time errors, probably due to a change in the underlying data structure that is transparent to library users, but does not go unnoticed if you need to access it directly, as fem-fenics does. This led me to set this functionality aside.

My choice is backed by some computational experiments. They show that the approach taken by the FEniCS library is quite effective, with assembly times cut in half when using four threads instead of one. However, assembly times are negligible compared to the linear system solve phase, even when the OpenMP parallelisation is disabled. I used a large number of mesh nodes in order to have meaningful timings: even though the linear systems took some minutes to solve, the assembly phase lasted no more than a couple of hundredths of a second in serial code. If we add to these findings that the fem-fenics package requires a just-in-time compilation lasting around half a minute, we see that there is no point in devoting effort to the implementation of this feature.

03 June 2014

Towards a parallel fem-fenics

These days I have been studying the documentation of the FEniCS project, mainly the FEniCS book, in order to understand the features related to parallel execution that it boasts. This preliminary study is aimed at adding them to the fem-fenics package. First I will summarise my findings, then I will comment on the problems I need to address to implement this functionality.

Parallelism in FEniCS

FEniCS implements parallelism in such a way as to be transparent to the user of the library. Moreover, it scales on different architectures, ranging from multi-core personal computers to distributed clusters. To this end, FEniCS makes use of two paradigms, which can be exploited both separately and together.

The first approach is tailored to shared memory architectures, such as the vast majority of today's PCs, but also, in many cases, each node of a computational cluster. The implementation is based on OpenMP and, by adding a simple instruction, one can enable parallelisation to speed up the matrix assembly phase. It should be noted that this paradigm has little support in the underlying linear algebra libraries, so the resolution phase can take advantage of multi-threading only with the PaStiX solver. Since in a shared memory model parallel programs might suffer from race conditions, the mesh is coloured to identify subsets such that no two neighbouring elements belong to the same set. Obviously, the notion of proximity depends on the particular function space, so this is taken into account by the colouring algorithm. The assembly proceeds iterating over colours and splitting their elements among threads: with this technique race conditions are avoided and the user can enjoy the benefits of parallelisation without incurring unpredictable behaviour.

In contrast to the first approach, the second paradigm is based on MPI and addresses the needs of distributed memory architectures. The latter is less immediate than the former, requiring a DOLFIN program to be launched with the MPI execution utility, but in this case the code need not be modified. In this implementation, the mesh is split so that each process gets its own part of it, with an algorithm striving to minimise inter-process communication. With scalability in mind, no single process holds the full matrix and, moreover, everything happens behind the scenes: this way the user has no need to take care of low-level issues. The distributed memory paradigm is widely supported by the algebraic back-ends, so it allows the use of several solvers, both iterative and direct. As already noted, this and the previous approach can be combined, for instance distributing the computation on a cluster and further speeding up the assembly process by enabling multi-threading within each node, provided they are multi-core machines.

The implementation in fem-fenics

The shared memory paradigm should be quite straightforward to implement in fem-fenics. I expect to operate on a couple of functions: the private generate_makefile and the pair assemble and assemble_system. The former should have the proper compilation flag (-fopenmp) added. The latter should gain a new line reading something like:

dolfin::parameters["num_threads"] = femfenicsthreads;

The number of threads could be passed to those functions as an argument, but this would break interface compatibility with FEniCS, so it is a poor approach. Another way of addressing the issue is to define a global Octave variable in PKG_ADD and store in it the desired number of concurrent threads to use for the assembly.
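
For instance, a PKG_ADD snippet in this spirit could be as simple as the following; the variable name is made up for illustration.

# Package-wide default for the number of assembly threads, to be read
# by assemble and assemble_system
global femfenics_threads
femfenics_threads = 2;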

The implementation of the distributed memory paradigm, instead, seems quite tricky. Basically, Octave does not use MPI, at least not in its core. Nonetheless, there are two Forge packages with this goal, mpi and parallel. I will go through their documentation to understand whether and, if so, how they address the problem of launching the oct-file with mpirun or mpiexec. Even leaving this aspect aside, I still do not know how easily the distributed objects storing matrices and vectors can be accessed to obtain the whole data.

In conclusion, I will initially work on adding shared memory parallelism, at the same time looking deeper into the issues related to the distributed memory paradigm, which I suspect are more numerous than the ones highlighted here.

23 May 2014

ufl binding

This week I started my work on the ufl function: it is now possible to write UFL code on the go, directly in your m-files. You can see below how the Poisson.ufl file of the homonymous example provided with fem-fenics (first listing) can be translated into a snippet of Octave code (second listing):

# Copyright (C) 2005-2009 Anders Logg

element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)

v = TestFunction(element)
f = Coefficient(element)
g = Coefficient(element)

a = inner(grad(u), grad(v))*dx

L = f*v*dx + g*v*ds

# Copyright (C) 2005-2009 Anders Logg
ufl start Poisson
ufl element = FiniteElement("Lagrange", triangle, 1)
ufl
ufl u = TrialFunction(element)
ufl v = TestFunction(element)
ufl f = Coefficient(element)
ufl g = Coefficient(element)
ufl
ufl a = inner(grad(u), grad(v))*dx
ufl L = f*v*dx + g*v*ds
ufl end

How to use ufl

Basically, you just need to prepend what you would have written in your .ufl file with ufl. As you can see, there are also two new instructions. fem-fenics still needs to store your code in a separate file, which is then compiled using ffc, the FEniCS form compiler, but now ufl takes care of the process.

Your code should begin with the start command, optionally followed by the name you want to assign to the file: in this example, we choose to open a new Poisson.ufl file. Be aware that ufl will not overwrite an existing file, so, if you plan to use your script for several runs, my suggestion is to keep your working directory clean and tidy with a delete ('Poisson.ufl') after the snippet above.

When you are happy with your UFL code, the end command tells ufl that it can compile and provide you with your freshly built problem. You can also specify options like BilinearForm (it is not the only one available: you can find a comprehensive list in the help message within Octave), in case you wrote just part of the problem in your last lines.

What now?

A lot of commitment was devoted to this function. This is not due to intrinsic difficulties: a sketch of the function's code has been around for a while, and the current implementation has not slid far away from it. The goal was to obtain a robust piece of code, since it will be the cornerstone of a new paradigm in fem-fenics usage. At the very least, each and every example provided with the package needs to be modified to take advantage of this change, and this will be my next task.

18 May 2014

My first function - Follow up

As said in my previous post, I have been working on extending the implementation of interpolate to accept an Expression as input. It can now also be used as in the Python dolfin interface, see here. Let's see how to use this new function in fem-fenics.

The Poisson equation

This example can be found in the FEniCS book; it is the very first one. The problem at hand is the Poisson equation with Dirichlet boundary conditions:
- Δu = f in Ω
u = u₀ on ∂Ω
We will solve this problem on the unit square, with f constant and equal to -6 and u₀ = 1 + x² + 2y². It can be verified that the exact solution is u_ex = 1 + x² + 2y². With the following ufl file:
element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)
v = TestFunction(element)
f = Coefficient(element)

a = inner(grad(u), grad(v))*dx
L = f*v*dx

and Octave script:

pkg load fem-fenics msh
import_ufl_Problem ('Poisson')

# Create mesh and define function space
x = y = linspace (0, 1, 20);
mesh = Mesh(msh2m_structured_mesh (x, y, 1, 1:4));

V = FunctionSpace('Poisson', mesh);

func = @(x,y) 1.0 + x^2 + 2*y^2;

# Define boundary condition
bc = DirichletBC(V, func, 1:4);

f = Constant ('f', -6.0);

# Define exact solution
u_e = Expression ('u_ex', func);

a = BilinearForm ('Poisson', V, V);
L = LinearForm ('Poisson', V, f);

# Compute solution
[A, b] = assemble_system (a, L, bc);
sol = A \ b;
u = Function ('u', V, sol);

# Save solution
save (u, 'poisson');

# Interpolate and save the exact solution
u_e_int = interpolate (u_e, V);
save (u_e_int, 'exact');

it is possible to compute the numerical solution, interpolate the analytical one on the same function space and then compare them. Using a visualisation tool like ParaView, one can verify that the computed solution and the interpolation of the exact one are practically the same. This is because, for this particular problem, the finite element solution coincides with the exact solution at the mesh nodes, so the two piecewise linear fields are indistinguishable.


Here you can see a good solution to the Poisson problem solved in the example, poorly post-processed in ParaView.

17 May 2014

My first function

Lately I have coded my first function for fem-fenics: it is interpolate, which wraps the homonymous method of the dolfin::Function class. It makes it possible to interpolate a Function G on the FunctionSpace of a Function F, even if they are not defined on the same mesh, with a call like this:
res = interpolate (F, G)

I am working on extending it to accept an Expression as input. With this function it is possible to make a quantitative comparison between the results of different discretisation approaches, or to check the accuracy of a method by comparing the computed solution with an analytically obtained one.

The implementation

I provide here a comprehensive overview of the code for interpolate. First of all, the number of input arguments is obtained and an octave_value is declared, in order to hold the output. Then there is a check ensuring that exactly two arguments are provided and no more than one output value is asked for.

After verifying these preliminary conditions, there are some instructions checking that the function type is loaded and, if necessary, registering it. This way Octave is aware of it and can store it in an octave_value.

Finally, the real computation is performed. After checking that the inputs are of the function type, the actual objects are extracted from the arguments with a static_cast:

const function & u0 = static_cast<const function&> (args(0).get_rep ());
const function & u1 = static_cast<const function&> (args(1).get_rep ());

Here comes the tricky part. The classes in fem-fenics are designed to hold constant dolfin objects, but dolfin::Function::interpolate is not a constant method. In order to be able to call it, a local dolfin::Function is constructed, used to perform the interpolation, then fed to the function constructor and assigned to the return value:

boost::shared_ptr<dolfin::Function> output (new dolfin::Function (u0.get_fun ()));
const dolfin::Function & input = u1.get_fun ();

output->interpolate (input);
std::string name = u1.get_str ();

retval = new function (name, output);