...

Segmentation definition | Parameters | Description
all definitions | discriminator | Item whose values will be used to segment objects into buckets (if applicable). Usually required.
all definitions | matchingRule | Matching rule to be applied when creating filters (if applicable). Optional.
all definitions | numberOfBuckets | Number of buckets to be created (if applicable). Optional.
numericSegmentation | from | Start of the processing space (inclusive). If omitted, 0 is assumed.
numericSegmentation | to | End of the processing space (exclusive). If not present, both bucketSize and numberOfBuckets must be defined, and the end of the processing space is determined as their product. In the future, this value might be determined dynamically, e.g. by counting the objects to be processed.
numericSegmentation | bucketSize | Size of one bucket. If not present, it is computed as the total processing space divided by the number of buckets (i.e. both to and numberOfBuckets must be present).

stringSegmentation | boundaryCharacters | Characters that make up the prefix or interval. Currently, string segmentation is done by creating all possible boundaries (by combining boundaryCharacters) and then using these boundaries either as interval boundaries (if comparisonMethod is interval) or as prefixes (if comparisonMethod is prefix).

This is a multivalued property: the first value contains the characters that may occupy the first place in the boundary, the second value contains the characters for the second place, and so on.

An example: if boundaryCharacters = ("qx", "0123456789", "0123456789", "0123456789") then the following boundaries are generated: q000, q001, q002, ..., q999, x000, x001, ..., x999. This might be suitable e.g. for accounts that start with either "q" or "x" and then continue with numbers, like q732812.

Another example: if boundaryCharacters = ("abcdefghijklmnopqrstuvwxyz", "0123456789abcdefghijklmnopqrstuvwxyz") then the following boundaries are generated: a0, a1, a2, ..., a9, aa, ab, ..., az, b0, b1, ..., b9, ba, ..., bz, ..., z0, z1, ..., z9, za, ..., zz. This might be suitable e.g. for alphanumeric account names that always start with an alphabetic character.

Beware: the current implementation requires the characters to be specified in the order that complies with the matching rule used. Otherwise, empty or overlapping intervals might be generated. For example, with "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" there will be an interval of "values greater than z but lower than A" (an empty one) and an interval of "values greater than Z" (which covers items already covered by the earlier intervals a-b, b-c, ...).

stringSegmentation | depth | If a value N greater than 1 is specified here, the boundaryCharacters values are repeated N times. That is, if values V1, V2, ..., Vk are specified, the resulting sequence is V1, V2, ..., Vk, V1, V2, ..., Vk, etc., with N repetitions, so N × k values in total.

stringSegmentation | comparisonMethod | Either interval (the default), resulting in interval queries like item >= 'a' and item < 'b', or prefix, resulting in prefix queries like item starts with 'a'. Beware: when using the prefix method, make sure that all discriminator values are covered by the boundaryCharacters you specify. Otherwise, some items will not be processed at all.
oidSegmentation | | The same as stringSegmentation, but providing the defaults of discriminator = # and boundaryCharacters = 0-9a-f (repeated depth times, if needed).
explicitSegmentation | content | Explicit content of work buckets to be used. This is useful e.g. when dealing with filter-based buckets, but any other bucket content (e.g. numeric intervals, string intervals, string prefixes) might be used here as well.
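
The interplay of from, to, bucketSize and numberOfBuckets in numericSegmentation can be easier to follow as a small calculation. The following Python sketch is only an illustration of the rules stated in the table, not the actual implementation. The function name resolve_numeric_segmentation is hypothetical, and two points are assumptions: "total processing space" is read as to minus from, and a bucket size that does not divide evenly is rounded up.

```python
from math import ceil
from typing import List, Optional, Tuple


def resolve_numeric_segmentation(
    from_: Optional[int] = None,
    to: Optional[int] = None,
    bucket_size: Optional[int] = None,
    number_of_buckets: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return bucket intervals as (inclusive start, exclusive end) pairs."""
    # "If omitted, 0 is assumed."
    start = 0 if from_ is None else from_

    if to is None:
        if bucket_size is None or number_of_buckets is None:
            raise ValueError("without 'to', bucketSize and numberOfBuckets are both required")
        # Literal reading of the table: the end is the product of the two values.
        to = bucket_size * number_of_buckets

    if bucket_size is None:
        if number_of_buckets is None:
            raise ValueError("without bucketSize, 'to' and numberOfBuckets are both required")
        # Assumption: total processing space = to - start, rounded up.
        bucket_size = ceil((to - start) / number_of_buckets)

    # Cut the space [start, to) into consecutive intervals of bucket_size.
    return [(low, min(low + bucket_size, to)) for low in range(start, to, bucket_size)]
```

For instance, resolve_numeric_segmentation(to=1000, number_of_buckets=10) yields ten intervals, from (0, 100) to (900, 1000), while resolve_numeric_segmentation(bucket_size=100, number_of_buckets=5) derives an end of 500.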

More examples

The oidSegmentation definition is the easiest one to use when dealing with repository objects. The following creates 16^2 = 256 segments.

Buckets defined on the first two characters of the OID:
<workManagement>
    <buckets>
        <oidSegmentation>
            <depth>2</depth>
        </oidSegmentation>
    </buckets>
</workManagement>
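
To see where the figure of 256 segments comes from, the boundary generation described for stringSegmentation can be modelled in a few lines of Python. This is a minimal sketch of the described behavior (one character per position, combined in the listed order, with depth repeating the whole sequence), not the actual implementation. The function name generate_boundaries is hypothetical.

```python
from itertools import product
from typing import List, Sequence


def generate_boundaries(boundary_characters: Sequence[str], depth: int = 1) -> List[str]:
    """Combine one character per position, in the order the characters are listed."""
    # depth repeats the whole sequence V1..Vk N times, giving N * k positions.
    positions = list(boundary_characters) * depth
    # product() varies the last position fastest: q000, q001, ..., q999, x000, ...
    return ["".join(chars) for chars in product(*positions)]


# oidSegmentation defaults as described: hex digits, repeated depth times.
oid_boundaries = generate_boundaries(["0123456789abcdef"], depth=2)

# The first boundaryCharacters example from the parameter description.
account_boundaries = generate_boundaries(["qx", "0123456789", "0123456789", "0123456789"])
```

Here oid_boundaries holds the 256 two-character values from 00 to ff, and account_boundaries holds the 2000 values from q000 to x999. Whether each value is used as a prefix or as an interval boundary depends on comparisonMethod.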

...