
It often happens that a query yields multiple occurrences of the same object. When such a query drives, for example, focus recomputation or arbitrary bulk action execution, some objects may be processed more than once. See, for example, MID-4495 ("or operator works bad in task filter", resolved) or MID-3293 ("Query with OR behaviour and repeated results", closed).

An example

Let us have a user administrator that has these 5 roles assigned: Superuser, End user, Approver, Reviewer, and Delegator.

Now let us consider the following bulk action. It is intended to process all users that have an assignment of the Superuser, End user, Approver, Reviewer, or Delegator role (or any combination of them).

Simple attempt to select all users having any of given roles

How many times is the code executed for the administrator?

Before answering that, let us have a look at the HQL code produced by the above query. We get it if we enter the query into the query playground:

Query to be analyzed in the playground

HQL looks like this (after substituting the variables):

Resulting HQL query

(Note that and = 7)

So we are constructing 6-tuples of objects (u, a, a2, a3, a4, a5) where u is the user (administrator), each of a, a2, ..., a5 is one of the user's assignments, and at least one of the following holds:

  1. a is Approver assignment, or
  2. a2 is Delegator assignment, or
  3. a3 is End user assignment, or
  4. a4 is Reviewer assignment, or
  5. a5 is Superuser assignment.

How many such tuples exist? The easiest way of counting them is to take all possible tuples of (a, a2, a3, a4, a5), of which there are 5^5 = 3125, and then exclude all non-compliant ones. Non-compliant tuples are those that have a != Approver and a2 != Delegator and a3 != End user and a4 != Reviewer and a5 != Superuser. How many of them exist? Each of a, a2, ..., a5 has only 4 possibilities, so there are 4^5 = 1024 of them. Therefore, there are 3125 - 1024 = 2101 compliant tuples, as can be confirmed by the query interpreter.

Therefore, if such a query is used in a bulk action task, a recomputation task, or a similar one, each user having all of the mentioned 5 role assignments is processed 2101 times (!).
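The count of 2101 can be cross-checked by brute force. The following is only a minimal sketch of the counting argument, not of how midPoint or Hibernate evaluates the query: it assumes that each OR branch joins the assignment table through its own independent alias, and the function name `count_or_join_rows` is made up for this illustration.

```python
from itertools import product

# The five roles assigned to the administrator in the example.
ROLES = ["Approver", "Delegator", "End user", "Reviewer", "Superuser"]


def count_or_join_rows(assigned, wanted):
    """Count result rows when every OR branch has its own assignment alias.

    `assigned` lists the roles assigned to one user; `wanted` lists the role
    tested by each OR branch (alias a, a2, a3, ... in that order).
    """
    rows = 0
    # Each alias ranges independently over all assignments of the user.
    for combo in product(assigned, repeat=len(wanted)):
        # A tuple is returned if at least one alias matches its own branch.
        if any(alias == role for alias, role in zip(combo, wanted)):
            rows += 1
    return rows


total = len(ROLES) ** len(ROLES)                 # 5^5 = 3125 tuples overall
non_compliant = (len(ROLES) - 1) ** len(ROLES)   # 4^5 = 1024 match nothing
compliant = count_or_join_rows(ROLES, ROLES)

print(total, non_compliant, compliant)  # 3125 1024 2101
```

The enumeration and the closed-form subtraction agree, which also shows how quickly the redundancy grows with each additional OR branch.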

Limiting the redundancy, step 1: Exists filter

The above has been known for some time. In order to limit the redundancies, midPoint 3.4 introduced a special filter: exists. When used it looks like this:

It says: give me all users that have an assignment that has a target either Approver, or Delegator, ..., or Superuser.

The results are much better, but not ideal:

The reason is the way HQL is currently being constructed:

We return tuples of (u, a) where u is the user and a is one of its assignments complying with the condition. For administrator there are 5 such assignments, yielding 5 results.
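The effect of returning (u, a) pairs can be modeled in a few lines. This is a simplified sketch of the single-join semantics described above, not of the actual HQL; the second user `jack` and the helper name `exists_style_rows` are hypothetical and serve only to illustrate the point.

```python
WANTED = {"Approver", "Delegator", "End user", "Reviewer", "Superuser"}

# Hypothetical sample data: user name -> names of assigned roles.
USERS = {
    "administrator": ["Superuser", "End user", "Approver", "Reviewer", "Delegator"],
    "jack": ["End user"],
}


def exists_style_rows(users, wanted):
    """Return one (user, assignment) pair per assignment matching the filter."""
    return [
        (user, role)
        for user, roles in users.items()
        for role in roles
        if role in wanted
    ]


rows = exists_style_rows(USERS, WANTED)
admin_rows = [row for row in rows if row[0] == "administrator"]
jack_rows = [row for row in rows if row[0] == "jack"]

print(len(admin_rows), len(jack_rows))  # 5 1
```

A user appears once per matching assignment, so the redundancy is now linear in the number of matching assignments instead of exponential in the number of OR branches.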

This is much better than 2101 results, and it might be reduced to 1 in the future if we slightly change the way the HQL is constructed, provided that performance considerations allow it.

Limiting the redundancy, step 2: 'distinct' option

Until that time, there are two remaining options.

The first one consists of additional filtering of search results in the bulk action interpreter or the iterative search task handler. This would have to be implemented in midPoint, and would incur a small CPU/memory overhead. We would have to maintain a list of already processed OIDs and skip any attempt to process an object whose OID is already present in the list. It might be workable, with a few quirks (e.g. limiting results via paging would not work).
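The idea of remembering processed OIDs can be sketched as follows. This is an illustration only, not midPoint code: the function `process_once`, the dictionary-based objects, and the OID values are all assumptions made for the example.

```python
def process_once(results, handler):
    """Call `handler` at most once per OID; return the number of objects handled."""
    seen_oids = set()  # OIDs already processed (the memory overhead)
    handled = 0
    for obj in results:
        oid = obj["oid"]
        if oid in seen_oids:
            continue  # redundant occurrence of an already processed object
        seen_oids.add(oid)
        handler(obj)
        handled += 1
    return handled


# Hypothetical search result: administrator returned 5 times, another user once.
rows = [{"oid": "oid-administrator"} for _ in range(5)] + [{"oid": "oid-jack"}]

processed = []
handled_all = process_once(rows, processed.append)

# The paging quirk: a "page" of 3 raw rows contains only 1 distinct object.
first_page = rows[:3]
handled_page = process_once(first_page, lambda obj: None)

print(handled_all, handled_page)  # 2 1
```

The second call shows why paging does not combine well with this approach: the page size is applied to raw rows before duplicates are dropped, so a page can deliver fewer distinct objects than requested.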

The second one is available today: the distinct option, which can be set on search queries. How the option is specified depends on the context. Here we show its use within bulk actions and recomputation tasks.

Distinct option in bulk actions

Bulk action with 'distinct' query option

Note the code on lines 46-52, i.e.

Distinct option in recomputation task

MidPoint 3.8 and later

This feature is available only in midPoint 3.8 and later.

When doing recomputation (or basically any iterative task), the mext:searchOptions extension item has to be used:

User recomputation with 'distinct' search option

The distinct option may affect performance, depending on the DBMS used and other circumstances. (It has to be tried in a particular environment to find out.)
