On 01/09/2012 07:05 PM, Eric Blake wrote:
On 12/09/2011 08:07 AM, Stefan Berger wrote:
> In this patch we introduce testing whether the iterator points to a
> unique set of entries that have not been seen before at one of the previous
> iterations. The point is to eliminate duplicates and with that unnecessary
> filtering rules by preventing identical filtering rules from being
> instantiated.
> Example with two lists:
>
> list1 = [1,2,1]
> list2 = [1,3,1]
Is this something that happens frequently in practice due to user input
(that is, does the user really output list elements more than once), or
something that happens more because of combinatorial combinations as you
expand user inputs into something more precise? I guess I'd like to
know why such repetitions are likely to occur, although I can agree with
doing less firewall rules if we can determine that the rules are duplicates.
Fair question. It's an optimization that I would admit is not absolutely
necessary, but reduces the number of rules that are being generated in
case duplicates would occur. Doing this calculation upfront rather than
having duplicate rules evaluated on possibly thousands of packets seems
worth the effort. The algorithm is rather short but maybe not as
efficient as it should be.
> /*
> + * Test whether the iterator entry points to a distinguished set of entries
> + * that have not been seen before at one of the previous iterations.
> + *
> + * The point of this function is to eliminate duplicates.
> + * Example with two lists:
> + *
> + * list1 = [1,2,1]
> + * list2 = [1,3,1]
> + *
> + * The 1st iteration would take the 1st items of each list -> 1,1
> + * The 2nd iteration would take the 2nd items of each list -> 2,3
> + * The 3rd iteration would take the 3rd items of each list -> 1,1 but
> + * skip them since this pair has already been encountered in the 1st iteration
> + */
> +static bool
> +virNWFilterVarCombIterEntryAreUniqueEntries(virNWFilterVarCombIterEntryPtr cie,
> + virNWFilterHashTablePtr hash)
> +{
> + unsigned int i, j;
> + virNWFilterVarValuePtr varValue, tmp;
> + const char *value;
> +
> + varValue = virHashLookup(hash->hashTable, cie->varNames[0]);
> +
> + value = virNWFilterVarValueGetNthValue(varValue, cie->curValue);
> +
> + for (i = 0; i< cie->curValue; i++) {
> + if (STREQ(value, virNWFilterVarValueGetNthValue(varValue, i))) {
> + bool isSame = true;
This function is called in a loop over the list, and itself does a loop
over the list, making it a quadratic check for duplicates. Am I correct
that lists can provided by the user with arbitrary lengths?
Each variable can have an number of elements >= 1, for example
A=[10,20,30,20].
> + for (j = 1; j< cie->nVarNames; j++) {
Furthermore, we add another layer of looping (this time over the number
of parallel lists being combined), for an overall cubic algorithm. Is
the number of parallel lists that can be provided by the user arbitrary?
The number of lists depends on the number of variables that are provided
in one rule. So, $A, $B, and $C in one rule in different fields like
this one below has 3 different lists (where the iterators ('@<n>') only
change the behaviour for how the instantiated rules are generated):
<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@1]' srcportstart='$B[@2]'
dstportstart='$C[@3]'
dscp='4'/>
</rule>
That is, I'm wondering if we need to start worrying about
reducing
complexity, and coming up with an algorithm that isn't quite so brute
force so as not to take quite so long on arbitrary-length inputs.
In case of no duplicates in $A, we have a complexity of O(n) [optimal
case] for checking of a single rule (only 'vertical' checking, no
'horizontal' checking)
In case $A has duplicates the complexity then becomes O(n^2) in the
worst case since we do horizonal checking accross $B and $C as well.
I think the elements in $A for example would have to be sorted for
faster detection of duplicates. A = [10,20,30,20] -> A = [10,20,20,30].
Now if I step on A[2], I would see that A[2-1] = A[2] and wouldn't have
to look at A[0] at all. So that would be O[1]. But sorting the elements
may alter their meaning in terms of the sequential processing of the
rules. Got to think about this...
On the other hand, until someone actually demonstrates slow input,
I'm
okay with making assumptions that most users won't be merging large
amounts of long parallel lists, and that the overhead of reducing
algorithmic complexity may cost more overhead than all expected use
cases just being brute-forced. So I'm okay checking this in as-is,
until (and unless) we have performance numbers proving that we need to
be smarter.
OK.
> @@ -414,10 +460,15 @@ virNWFilterVarCombIterNext(virNWFilterVa
> unsigned int i;
>
> for (i = 0; i< ci->nIter; i++) {
> +next:
> ci->iter[i].curValue++;
> - if (ci->iter[i].curValue<= ci->iter[i].maxValue)
> + if (ci->iter[i].curValue<= ci->iter[i].maxValue) {
> + if (!virNWFilterVarCombIterEntryAreUniqueEntries(
> +&ci->iter[i], ci->hashTable)) {
> + goto next;
Why do we need a backwards goto, when a 'continue' would get to the same
location?
Using the goto we don't increase 'i' but only do ci->iter[i].curValue++,
which is what is needed.
> + }
> break;
> - else
> + } else
> ci->iter[i].curValue = 0;
> }
Coding style - the else needs {} since you added {} for the if.
Fixed.
I think this is okay, but wait until I review the rest of the series
to
see how it was used in context.
I'll wait. Thanks.
Stefan