[libvirt] [GSOC] project libvirt fuzzing

4 Mar 2017

On Sun, Mar 5, 2017 at 2:47 AM, Michal Privoznik <mprivozn@redhat.com>
wrote:

> On 04.03.2017 07:23, Da L wrote:
> > Dear all,
> >
>
> Hey,
>
> > This is my first post in the list.
>
> Very well. Welcome. It is always nice to see people interested in libvirt.
>
Hi Michal,

Thank you very much for the explanation and encouragement.
I am so glad to join the community.

> >
> > I am currently a graduate student studying computer science, particularly
> > interested in virtualization technologies, and I have been using QEMU for a
> > variety of projects for a while. Two of the courses I am taking this
> > semester, Advanced Operating Systems and Secure Software Development,
> > really attracted me to the libvirt community. I have been learning
> > kernel fuzzing as well as other general fuzzing tools.
> >
> > Then I found the topic "QEMU command line generator XML fuzzing", which is
> > pretty interesting and fully in line with my interests and background.
> > Although I have read through the documentation on the website, just to make
> > sure I am doing it correctly, could anyone confirm that this project is
> > still available? And what do I need to do next in order to participate in
> > the project this summer? Do I need to find a mentor by myself? Potentially,
> > I could ask my OS or Security professor to be my mentor, but I am not sure
> > yet which would be the best way.
>
> Yes, the project is still on. It does not have a mentor assigned yet,
> but don't worry about that now - there are a lot of mentors around. For
> now, I can be your point of contact.
>
> So, just to explain some details of the project: libvirt's format
> for storing domain configuration is XML. However, none of the
> hypervisors out there uses XML to describe domain configuration. For
> instance, in qemu it's all about the command line. You want this disk
> for your domain? You have to put it onto the command line. And so on.
> Therefore, in a very simplistic way, for qemu libvirt translates the XML
> into qemu command line language. Now, this process is very complex and
> sort of tricky. That's why we would like to generate "all" possible
> combinations of XML, let the command line generator crunch them and
> produce qemu command line. Well, that's not entirely true, because the
> command line generator works over some internal representation of the
> domain (not XML) that is produced by our XML parser:
>
Please correct me if I am wrong in my following understanding:
1. Regarding the XML config file, one typical usage with libvirt could be:
    $ virsh define domain_config_file.xml
2. I noticed that the libvirt source code contains several files closely
   related to XML, including src/util/virxml.{c,h}. Might these be the
   target of this project?
3. libvirt is also compiled against libxml2.
4. Then there is virt-xml-validate, a shell script (in the build/bin
   directory after make install), which calls xmllint.

I have not yet managed to figure out how the above pieces relate to each
other. I spent some time trying to instrument and compile the executables
with AFL, but so far with no luck. (The idea is as simple as replacing gcc
with afl-gcc in the Makefile/configure step.)
The attached figure is just a demo of using AFL to fuzz virt-admin, which
is not instrumented (so it is kind of boring and not very useful). But I
think AFL could be one of the candidate fuzzers for this project due to its
prevalence and proven effectiveness.
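Whichever fuzzer is chosen, it needs a small harness that hands raw bytes to the code under test. The sketch below assumes libFuzzer's documented entry point, LLVMFuzzerTestOneInput(), and adds a run_from_stream() helper that an AFL-instrumented main() could call on stdin. The target, hypothetical_parse_domain_xml(), is a placeholder for whichever libvirt parse/format path the project ends up fuzzing; it is not a real libvirt symbol.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder target (hypothetical, NOT a libvirt function). A real harness
 * would call libvirt's XML parser and command line generator here. */
static int hypothetical_parse_domain_xml(const char *xml)
{
    return strncmp(xml, "<domain", 7) == 0 ? 0 : -1;
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Fuzzer input is not a C string, so copy it and NUL-terminate it. */
    char *xml = malloc(size + 1);
    if (xml == NULL)
        return 0;
    if (size > 0)
        memcpy(xml, data, size);
    xml[size] = '\0';
    /* The result is ignored on purpose: the fuzzer is looking for crashes,
     * hangs and sanitizer reports, not ordinary parse errors. */
    (void)hypothetical_parse_domain_xml(xml);
    free(xml);
    return 0;
}

/* AFL-style driver: read the whole input stream and pass it to the same
 * entry point, so one harness can serve both fuzzers. */
int run_from_stream(FILE *in)
{
    assert(in != NULL);
    size_t cap = 4096, len = 0, n;
    uint8_t *buf = malloc(cap);
    if (buf == NULL)
        return -1;
    while ((n = fread(buf + len, 1, cap - len, in)) > 0) {
        len += n;
        if (len == cap) {                 /* buffer full: grow it */
            uint8_t *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return -1;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    int rc = LLVMFuzzerTestOneInput(buf, len);
    free(buf);
    return rc;
}
```

With AFL, a main() that calls run_from_stream(stdin) would be built with afl-gcc and run as "afl-fuzz -i seeds -o findings -- ./harness"; with libFuzzer, the same file is built with "clang -fsanitize=fuzzer", which supplies its own main().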

Regarding fuzzing, I think we can run several fuzzing tools in parallel,
as different fuzzers tend to find different kinds of bugs. For instance,
AFL (American Fuzzy Lop) [1], which is a coverage-guided, mutation-based
fuzzer using a genetic algorithm, can take hand-crafted XML seeds to fuzz
our libvirt target. Alternatively, we could develop a generation-based
grammar module for AFL (which is definitely non-trivial); so far I have not
seen active development in the AFL community on XML grammar generation.
Another option could be clang's libFuzzer [2].
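To contrast with mutating hand-crafted seeds, a generation-based approach builds every input from a grammar, so each test case is already well-formed XML and the effort goes into the values the parser and generator must handle. The sketch below is a toy and not an existing AFL feature: the element names follow libvirt's domain XML, but the grammar is a hypothetical, tiny subset, and generate_domain_xml() is an illustrative name. A small xorshift32 PRNG keeps runs reproducible from a seed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* xorshift32 PRNG: the same non-zero seed always yields the same sequence. */
static uint32_t rng_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

static const char *pick(uint32_t *st, const char *const *choices, size_t n)
{
    return choices[rng_next(st) % n];
}

/* Generate one domain XML document from a tiny, hypothetical grammar:
 *   domain  := <domain type=T><name>N</name><memory>M</memory> devices </domain>
 *   devices := <devices> disk{0,3} </devices>
 * Some values ("", "bogus") are deliberately invalid to reach error paths.
 * Returns the number of bytes written, or -1 if buf is too small. */
int generate_domain_xml(uint32_t seed, char *buf, size_t buflen)
{
    static const char *const types[] = {"kvm", "qemu", "xen", ""};
    static const char *const names[] = {"guest1", "a", "", "long-name-0123456789"};
    static const char *const buses[] = {"virtio", "ide", "scsi", "sata", "bogus"};
    uint32_t st = seed ? seed : 1; /* xorshift32 must not start from 0 */
    size_t len = 0;
    int n;

/* Append formatted text, failing cleanly if the buffer would overflow. */
#define APPEND(...)                                                 \
    do {                                                            \
        n = snprintf(buf + len, buflen - len, __VA_ARGS__);         \
        if (n < 0 || (size_t)n >= buflen - len)                     \
            return -1;                                              \
        len += (size_t)n;                                           \
    } while (0)

    assert(buf != NULL && buflen > 0);
    /* Draw values in a fixed order so the output is reproducible per seed. */
    const char *type = pick(&st, types, 4);
    const char *name = pick(&st, names, 4);
    unsigned mem = (unsigned)(rng_next(&st) % 8192);
    unsigned ndisks = (unsigned)(rng_next(&st) % 4);

    APPEND("<domain type='%s'><name>%s</name><memory>%u</memory><devices>",
           type, name, mem);
    for (unsigned i = 0; i < ndisks; i++) {
        const char *bus = pick(&st, buses, 5);
        APPEND("<disk type='file' device='disk'>"
               "<source file='/tmp/disk%u.img'/>"
               "<target dev='vd%c' bus='%s'/></disk>",
               i, 'a' + (int)i, bus);
    }
    APPEND("</devices></domain>");
#undef APPEND
    return (int)len;
}
```

Because each seed maps to exactly one document, a crash found this way can be replayed from its seed, and the generated files could also serve as a starting corpus for a mutation-based fuzzer such as AFL.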

Several related articles show fuzzing examples: using AFL to generate
SQL [3], llvm-afl [4], and fuzzing hexml with AFL [5]. In combination with
lcov, we could compare the coverage achieved by different fuzzers and use
it to guide our fuzzing tuning.
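The lcov comparison could start as simply as this. lcov tracefiles record per-line execution counts as DA:<line>,<count> records; the sketch below (hypothetical helper names, and it assumes a tracefile covering a single source file) counts lines reached by one fuzzer's corpus but never by the other's. That is one quick way to check whether running two fuzzers side by side actually buys extra coverage.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the source file being compared has fewer lines than this. */
#define MAX_LINES 100000UL

/* Mark every line with a non-zero hit count in an lcov tracefile held in
 * memory. Only "DA:<line>,<count>" records are read; a single SF: section
 * is assumed, so lines from different files are not distinguished. */
static void mark_covered(const char *info, unsigned char *covered)
{
    for (const char *p = info; p != NULL && *p != '\0'; ) {
        unsigned long line, hits;
        if (strncmp(p, "DA:", 3) == 0 &&
            sscanf(p, "DA:%lu,%lu", &line, &hits) == 2 &&
            line < MAX_LINES && hits > 0)
            covered[line] = 1;
        p = strchr(p, '\n');   /* advance to the next record */
        if (p != NULL)
            p++;
    }
}

/* Count lines covered by fuzzer A's run but never by fuzzer B's run.
 * Returns the count, or -1 on allocation failure. */
long lines_only_in_a(const char *info_a, const char *info_b)
{
    assert(info_a != NULL && info_b != NULL);
    unsigned char *cov_a = calloc(MAX_LINES, 1);
    unsigned char *cov_b = calloc(MAX_LINES, 1);
    long only_a = -1;
    if (cov_a != NULL && cov_b != NULL) {
        mark_covered(info_a, cov_a);
        mark_covered(info_b, cov_b);
        only_a = 0;
        for (size_t i = 0; i < MAX_LINES; i++)
            if (cov_a[i] && !cov_b[i])
                only_a++;
    }
    free(cov_a);
    free(cov_b);
    return only_a;
}
```

The two tracefiles would come from replaying each fuzzer's corpus against a gcov-instrumented build and capturing with "lcov --capture --directory . --output-file a.info".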

NOTE: the [5] example is quite interesting; it fuzzes hexml, a Haskell XML
parser.

I will probably not post further updates until next week; I have three
midterms this week.

[1] http://lcamtuf.coredump.cx/afl/
[2] http://llvm.org/docs/LibFuzzer.html
[3] https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2014-December/079390.html
[5] https://github.com/ndmitchell/hexml/issues/6

Again, thanks a lot. Any guidance, comments, or suggestions would be more
than welcome and highly appreciated.

Best,

Dan

>   XML document -> XML parser -> QEMU cmd line generator -> QEMU cmd line
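To make the XML document -> XML parser -> command line generator pipeline concrete, here is a deliberately tiny C sketch of the same stages. Everything in it is hypothetical: struct toy_domain_def, toy_parse_domain_xml() and toy_build_qemu_args() are illustrative names, not libvirt's internal API, and the "parser" only pulls a <name> element and a disk <source file='...'/> attribute out with string searches instead of using libxml2. The point is the shape: the generator only ever sees the parsed struct, never the XML.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical internal representation of a domain.
 * This is NOT libvirt's real domain definition; it keeps just two fields. */
struct toy_domain_def {
    char name[64];
    char disk_path[256];
};

/* Copy the text between the first open_tag and the following close_tag
 * into out. Returns 0 on success, -1 if missing or too long. */
static int toy_extract(const char *xml, const char *open_tag,
                       const char *close_tag, char *out, size_t outlen)
{
    const char *start = strstr(xml, open_tag);
    if (start == NULL)
        return -1;
    start += strlen(open_tag);
    const char *end = strstr(start, close_tag);
    if (end == NULL || (size_t)(end - start) >= outlen)
        return -1;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 0;
}

/* Stage 1, "XML parser": XML text -> internal representation.
 * A toy string search stands in for a real libxml2-based parser. */
int toy_parse_domain_xml(const char *xml, struct toy_domain_def *def)
{
    assert(xml != NULL && def != NULL);
    if (toy_extract(xml, "<name>", "</name>", def->name, sizeof(def->name)) < 0)
        return -1;
    if (toy_extract(xml, "<source file='", "'", def->disk_path,
                    sizeof(def->disk_path)) < 0)
        return -1;
    return 0;
}

/* Stage 2, "command line generator": internal representation -> QEMU
 * arguments. It never looks at the XML itself. */
int toy_build_qemu_args(const struct toy_domain_def *def,
                        char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "-name %s -drive file=%s",
                     def->name, def->disk_path);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Fuzzing the real pipeline means pushing many documents through both stages, since a bug can hide in the parser, in the generator, or in how the generator handles a struct the parser accepted.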
>
> There are plenty of fuzzing libraries available on the market, so I guess
> one of the first steps would be to explore our options and pick one that
> suits our needs. Do you have experience with any of them? Frankly, I
> have very little.
>
> Regarding the GSoC process, each organization makes their own rules for
> accepting students. Here at libvirt the rules are described here:
>
>   http://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ
>
> Please let me know what your thoughts are on all of this, and also don't
> hesitate to ask anything.
>
> Michal
>
>
