It often happens that a query yields multiple occurrences of the same object. When such a query drives e.g. focus recomputation or arbitrary bulk action execution, some objects may be processed more than once. See e.g. MID-4495 ("or operator works bad in task filter", resolved) or MID-3293 ("Query with OR behaviour and repeated results", closed).
Let us have a user administrator that has these 5 roles assigned: Approver, Delegator, End user, Reviewer, and Superuser.
And now let us consider the following bulk action. It is intended to process all users that have an assignment of either the Approver, Delegator, End user, Reviewer, or Superuser role (or any combination of them).
How many times is the code executed for the administrator user?
Before answering that, let us have a look at HQL code produced by the above query. We get it if we enter the query into query playground:
HQL looks like this (after substituting the variables):
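So we are constructing 6-tuples of objects (u, a, a2, a3, a4, a5) where u is the user (administrator) and either:
- a is the Approver assignment, or
- a2 is the Delegator assignment, or
- a3 is the End user assignment, or
- a4 is the Reviewer assignment, or
- a5 is the Superuser assignment.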
How many such tuples exist? The easiest way of counting them is to take all possible tuples of (a, a2, a3, a4, a5), of which there are 5^5 = 3125, and then exclude all non-compliant ones. Non-compliant tuples are those that have a != Approver and a2 != Delegator and a3 != End user and a4 != Reviewer and a5 != Superuser. How many of them exist? In a non-compliant tuple, each of a, a2, ..., a5 has only 4 possibilities, so there are 4^5 = 1024 of them. Therefore, there are 3125 - 1024 = 2101 compliant tuples, as can be confirmed by the query interpreter:
Therefore, if such a query is used in a bulk action task, a recomputation task, or a similar one, each user having all 5 of the mentioned role assignments is processed 2101 times (!).
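The counting argument above can be checked by brute force. The following is a minimal Python sketch, not midPoint code: it enumerates every combination of the five assignments and counts those satisfying the OR condition. The function name count_compliant_tuples and the ROLES list are illustrative assumptions.

```python
from itertools import product

# The five roles assigned to the example user (order matches a, a2, ..., a5).
ROLES = ["Approver", "Delegator", "End user", "Reviewer", "Superuser"]


def count_compliant_tuples(roles):
    """Return (total, compliant) over all tuples (a, a2, ..., an).

    A tuple is compliant when at least one position i holds roles[i],
    which mirrors the OR condition over the joined assignments.
    """
    n = len(roles)
    total = 0
    compliant = 0
    for combo in product(roles, repeat=n):
        total += 1
        if any(combo[i] == roles[i] for i in range(n)):
            compliant += 1
    return total, compliant


total, compliant = count_compliant_tuples(ROLES)
print(total, total - compliant, compliant)  # 3125 1024 2101
```

The count grows quickly with the number of OR-ed conditions, since every additional condition multiplies the number of joined tuples.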
Limiting the redundancy, step 1: Exists filter
The above has been known for some time. In order to limit the redundancies, midPoint 3.4 introduced a special filter: exists. When used, it looks like this:
It says: give me all users that have an assignment whose target is Approver, or Delegator, ..., or Superuser.
The results are much better, but not ideal:
The reason is the way HQL is currently being constructed:
We return tuples of (u, a) where u is the user and a is one of its assignments complying with the condition. For administrator there are 5 such assignments, yielding 5 results.
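The (u, a) behaviour can be illustrated with a small Python sketch. It is a simplified model of a single join between a user and its assignments, not the HQL that midPoint generates; the Assignment class and the exists_style_rows function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    owner: str   # name of the user holding the assignment
    target: str  # name of the assigned role


TARGETS = {"Approver", "Delegator", "End user", "Reviewer", "Superuser"}

# The example user holds one assignment per role.
assignments = [Assignment("administrator", role) for role in sorted(TARGETS)]


def exists_style_rows(assignments, targets):
    """Return one (u, a) row per assignment whose target matches."""
    return [(a.owner, a.target) for a in assignments if a.target in targets]


rows = exists_style_rows(assignments, TARGETS)
print(len(rows))                          # 5 rows, one per matching assignment
print(len({user for user, _ in rows}))    # 1 distinct user
```

With a single join, the number of rows per user is bounded by the number of matching assignments instead of growing exponentially.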
It is much better than 2101 results, and it might be reduced to 1 in the future, once we slightly change the way HQL is constructed, provided that performance allows it.
Limiting the redundancy, step 2: 'distinct' option
Until that time, there are two remaining options.
The first one consists of additional filtering of search results in the bulk action interpreter or the iterative search task handler. This would have to be implemented in midPoint and would incur a small CPU/memory overhead: we would have to maintain a list of already processed OIDs and skip any attempt to process an object whose OID is already in the list. It might be workable, with a few quirks (e.g. limiting results via paging would not work).
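The OID-based filtering described above could look roughly like the following Python sketch. It is not midPoint code: process_once, the dictionary-shaped objects, and the sample OIDs are all hypothetical, chosen only to show the idea of tracking already processed OIDs.

```python
def process_once(search_results, handler):
    """Call handler for each object at most once, keyed by its OID.

    search_results: iterable of dict-like objects with an "oid" key
    (a hypothetical stand-in for the objects returned by a search).
    Returns the number of objects actually handled.
    """
    seen_oids = set()  # the memory overhead: one entry per processed object
    processed = 0
    for obj in search_results:
        oid = obj["oid"]
        if oid in seen_oids:
            continue  # redundant occurrence of an already processed object
        seen_oids.add(oid)
        handler(obj)
        processed += 1
    return processed


# Five occurrences of the same user followed by one other user.
results = [{"oid": "oid-admin", "name": "administrator"}] * 5
results.append({"oid": "oid-jack", "name": "jack"})

handled = []
count = process_once(results, handled.append)
print(count)                          # 2
print([o["name"] for o in handled])   # ['administrator', 'jack']
```

The paging quirk is visible here as well: if the search were limited to a page of 5 rows, all five would be occurrences of administrator, so the page would yield a single processed object.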
The second one is available today: the distinct option for search queries. How the option is used depends on the context. Here we show its use within bulk actions and recomputation tasks.
Distinct option in bulk actions
Note the code on lines 46-52, i.e.
Distinct option in recomputation task
MidPoint 3.8 and later
This feature is available only in midPoint 3.8 and later.
When doing recomputation (or basically any iterative task), the mext:searchOptions extension item has to be used:
The distinct option may affect performance, possibly depending on the DBMS used and other circumstances. It has to be tried in a particular environment to find out.
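To illustrate what a distinct search changes at the database level, here is a minimal sketch using Python's built-in sqlite3 module. The schema (users, assignments) and the OID value are assumptions made for the example; they are not midPoint's repository schema, and the SQL is not the query midPoint generates.

```python
import sqlite3

# In-memory database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (oid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE assignments (owner_oid TEXT, target TEXT);
    """
)
conn.execute("INSERT INTO users VALUES ('u1', 'administrator')")
roles = ["Approver", "Delegator", "End user", "Reviewer", "Superuser"]
conn.executemany(
    "INSERT INTO assignments VALUES ('u1', ?)", [(role,) for role in roles]
)

# Same join and condition for both queries; only DISTINCT differs.
join_sql = """
    FROM users u
    JOIN assignments a ON a.owner_oid = u.oid
    WHERE a.target IN ('Approver', 'Delegator', 'End user',
                       'Reviewer', 'Superuser')
"""
plain = conn.execute("SELECT u.oid " + join_sql).fetchall()
distinct = conn.execute("SELECT DISTINCT u.oid " + join_sql).fetchall()

print(len(plain))     # 5: one row per matching assignment
print(len(distinct))  # 1: duplicates of the same user are collapsed
conn.close()
```

The extra work the database does to collapse duplicates is one reason why the performance impact needs to be measured in the target environment.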