Re: [libvirt] [GSOC] project libvirt fuzzing

22 Mar 2017

      On Tue, Mar 21, 2017 at 17:09:58 +0100, Michal Privoznik wrote:
...
On 03/21/2017 04:34 PM, Peter Krempa wrote:
...
On Tue, Mar 21, 2017 at 16:15:35 +0100, Michal Privoznik wrote:
...
On 03/21/2017 04:39 AM, D L wrote:
...
On Thu, Mar 16, 2017 at 1:03 PM, Michal Privoznik <mprivozn@redhat.com>
wrote:
[...]
...
...
necessary. This might be one of the cases where programming is necessary for
 this project.
I don't think that we want to fuzz functions called from
qemuBuildCommandLine() separately. That indeed would be too overwhelming. I
think we would be perfectly okay with fuzzing the qemuBuildCommandLine()
itself (well, with help of XML parsing as described in my previous e-mails).
So we might focus on generating XMLs for now (e.g. write a grammar that does
that? dunno - don't have much experience with fuzzers). The whole idea that
Ideally it should take the grammar we have for our XMLs so that we don't
have to update it manually all the time.
While this would certainly be an interesting thing to do, I'm afraid of two
things here:
1) state explosion - our XML schema is so complicated that trying to
generate each state it could be in depending on grammar would lead to
"uncountably" many states. Plus calling 2) + 3) over them would take ages to
Yes these are the problems of fuzzing. By definition [1] you need to
tell the fuzzer what is and what isn't a valid input. Otherwise you'd
already get an exploded state. Are you expecting to test any random
string as an XML? Or at least any valid XML as a libvirt xml? [2]

You also need the schema to produce partially valid input so that other
code paths can be reached, otherwise you'd mostly get stuck at the first
error check in the parser.

Basically the schema is quite the opposite. It very drastically limits
the amount of strings (or valid XML files) that you should feed to the
parser so that it actually tests reasonable stuff.
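
One cheap way to get "partially valid" input is to start from a document
that is already valid and break exactly one thing in it. This is a
mutation-based shortcut, not the schema-driven generation discussed in this
thread, and the seed below is a trimmed, illustrative fragment rather than a
complete domain definition. The function name and the list of junk values are
made up for the sketch.

```python
import random
import xml.etree.ElementTree as ET

# Illustrative seed only; a real seed would be a complete, schema-valid domain XML.
SEED = "<domain type='kvm'><name>demo</name><memory unit='KiB'>1024</memory></domain>"


def break_one_thing(valid_xml, rng):
    """Return a well-formed copy of valid_xml with a single random defect."""
    root = ET.fromstring(valid_xml)
    with_children = [e for e in root.iter() if len(e)]
    with_attrs = [e for e in root.iter() if e.attrib]

    choices = []
    if with_children:
        choices.append("drop")
    if with_attrs:
        choices.append("attr")
    if not choices:
        return valid_xml  # nothing to mutate

    if rng.choice(choices) == "drop":
        # Remove one child element, e.g. to hit "missing element" checks.
        parent = rng.choice(with_children)
        parent.remove(rng.choice(list(parent)))
    else:
        # Overwrite one attribute value with something the parser should reject.
        elem = rng.choice(with_attrs)
        key = rng.choice(sorted(elem.attrib))
        elem.set(key, rng.choice(["", "-1", "bogus", "9" * 40]))

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(break_one_thing(SEED, random.Random()))
```

Because the rest of the document stays valid, the parser gets past its first
checks and the single defect decides which error path is exercised.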
...
finish. But we can aim for a very basic subset for now and probably expand
that later?
I'm afraid that if you stick with a subset or don't make it automated,
it won't get finished ever.
...
2) Reversing the process from RNG to XML generation: how would that even
work? I mean, how do you parse RNG schema and reason about it? I know it's
an XML document just like any other, but what I am interested in is how to
catch the meaning of rules written in the schema. For instance:
<element name="blah">
  <zeroOrMore>
    <element name="subBlah">
      <text/>
    </element>
  </zeroOrMore>
</element>
You picked a very bad subset for demonstration since it basically allows
everything, which is not very far from the infinite monkey theorem. Mostly
such elements would be parsed verbatim, so the only failure you could
ever get is a memory allocation problem.

If you pick an <optional> or something mandating an input format
(<choice>, etc.), you get a set of valid and invalid settings. The
fuzzer should test some of the valid ones along with a few random
invalid to see if it fails.
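
To make the "how do you reason about the RNG" question a bit more concrete,
here is a minimal sketch of a generator that walks a schema tree and expands
each pattern at random. It only understands <element>, <text>, <optional>,
<zeroOrMore> and <choice>; the real libvirt schemas also rely on <define>,
<ref>, <attribute>, <data>, <interleave> and more, which this does not
handle. The function name, the sample text values and the repeat limit are
assumptions made for the example.

```python
import random
import xml.etree.ElementTree as ET

# The schema fragment quoted above (no RELAX NG namespace, as in the mail).
SCHEMA = """
<element name="blah">
  <zeroOrMore>
    <element name="subBlah">
      <text/>
    </element>
  </zeroOrMore>
</element>
"""


def generate(pattern, rng, parent=None, max_repeat=3):
    """Expand one pattern node into XML; returns the element it wrote into."""
    tag = pattern.tag.split("}")[-1]  # tolerate a namespaced schema too
    if tag == "element":
        name = pattern.get("name")
        node = ET.Element(name) if parent is None else ET.SubElement(parent, name)
        for child in pattern:
            generate(child, rng, node, max_repeat)
        return node
    if tag == "text":
        # Arbitrary sample strings; <text/> allows anything.
        parent.text = (parent.text or "") + rng.choice(["", "a", "42", "x" * 16])
    elif tag == "optional":
        if rng.random() < 0.5:
            for child in pattern:
                generate(child, rng, parent, max_repeat)
    elif tag == "zeroOrMore":
        # Bound the repetition so documents stay finite and small.
        for _ in range(rng.randint(0, max_repeat)):
            for child in pattern:
                generate(child, rng, parent, max_repeat)
    elif tag == "choice":
        generate(rng.choice(list(pattern)), rng, parent, max_repeat)
    else:
        raise NotImplementedError("pattern not covered by this sketch: " + tag)
    return parent


if __name__ == "__main__":
    doc = generate(ET.fromstring(SCHEMA), random.Random())
    print(ET.tostring(doc, encoding="unicode"))
```

Every call picks one path through the schema instead of enumerating all of
them, so the number of possible documents does not matter for a single run.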
...
We all know what this simple grammar can generate. But if I were to write a
program that parses the rules and generates XML documents according to them,
I'd probably end up hiding under the desk.
Isn't that the job of the fuzzer?
...
I have in my mind is as follows:
1) let the fuzzer generate an XML document
2) def = virDomainDefParse*(document);
3) qemuBuildCommandLine(def);
BTW if you want to check the command line generator too, you need to
have a valid XML on input so the schema is actually the way to go.
...
4) if SIGSEGV store XML somewhere for future inspection
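
The four steps above can be sketched as a small driver. It assumes that steps
2) and 3) live in a separate helper program that reads the XML on stdin, so
that a crash kills only the helper and not the driver. That helper is
hypothetical; nothing like it is named in this thread. run_case() and the
crash-file naming are likewise made up for the sketch, and detecting SIGSEGV
through a negative return code only works on POSIX systems.

```python
import os
import signal
import subprocess
import tempfile


def run_case(target_argv, xml_text, crash_dir):
    """Feed one XML document to the target; keep the document if it segfaults.

    target_argv is the command line of a (hypothetical) helper that parses the
    XML from stdin and builds the QEMU command line from it.
    Returns the path of the stored XML, or None if there was no SIGSEGV.
    """
    proc = subprocess.run(
        target_argv,
        input=xml_text.encode(),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX a negative return code means "terminated by that signal".
    if proc.returncode == -signal.SIGSEGV:
        os.makedirs(crash_dir, exist_ok=True)
        fd, path = tempfile.mkstemp(prefix="crash-", suffix=".xml", dir=crash_dir)
        with os.fdopen(fd, "w") as f:
            f.write(xml_text)
        return path
    return None
```

Ordinary parse errors show up as a normal non-zero exit status and are
ignored here; only documents that actually crash the helper are stored.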
[...]

[1] https://en.wikipedia.org/wiki/Fuzz_testing
[2] https://en.wikipedia.org/wiki/Infinite_monkey_theorem