On Sun, Mar 5, 2017 at 2:47 AM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 04.03.2017 07:23, Da L wrote:
> Dear all,
>

Hey,

> This is my first post in the list.

Very well. Welcome. It is always nice to see people interested in libvirt.

Hi Michal,

Thank you very much for the explanation and encouragement. 
I am so glad to join the community. 
 
>
> I am currently a graduate student studying computer science, particularly
> interested in virtualization technologies, and I have been using QEMU for a
> variety of projects for a while. Two of the courses that I am taking this
> semester, Advanced Operating Systems and Secure Software Development,
> really attracted me to the libvirt community. I have been learning
> kernel fuzzing as well as other general fuzzing tools.
>
> Then I found the topic of "QEMU command line generator XML fuzzing" is
> pretty interesting and totally in line with my interest and background.
> Though I have read through the documentation on the website, just to make
> sure I am doing it correctly, could anyone confirm this project is still
> available? And what do I need to do next in order to participate in the
> project this summer? Do I need to find a mentor by myself? Potentially, I could
> find my OS or Security professor as my mentor, but I am not sure yet which
> would be the best way.

Yes, the project is still on. It does not have a mentor assigned yet,
but don't worry about that now - there are a lot of mentors around. For
now, I can be your point of contact.

So, just to explain some details of the project to you: libvirt's format
for storing domain configuration is XML. However, none of the
hypervisors out there uses XML to describe domain configuration. For
instance, in qemu it's all about the command line. You want this disk
for your domain? You have to put it onto the command line. And so on.
Therefore, in a very simplistic way, for qemu libvirt translates the XML
into qemu command line language. Now, this process is very complex and
sort of tricky. That's why we would like to generate "all" possible
combinations of XML, let the command line generator crunch them and
produce qemu command line. Well, that's not entirely true, because
command line generator works over some internal representation of domain
(not XML) that is produced by our XML parser:

Please correct me if my understanding below is wrong:
1. A typical way to use an XML config file with libvirt is:
    $ virsh define <domain_config_file.xml>
2. In the libvirt source code there are several files closely related to XML,
   including src/util/virxml.{c,h}. Might these be the target of this project?
3. libvirt is also compiled against libxml2.
4. virt-xml-validate, which is a bash script (in the build/bin directory
   after make install), calls xmllint.

I have not yet managed to figure out how the pieces above relate to each other.
I spent some time trying to instrument and compile the executables with AFL, but so
far with no luck. (The idea is as simple as changing gcc to afl-gcc in Makefile/configure.)
The attached figure is just a demo of using AFL to fuzz virt-admin, which is
not instrumented (so it is kind of boring and not very useful). But I think AFL could be
one of the candidate fuzzers for this project due to its prevalence and
proven effectiveness.

Regarding fuzzing, I think we can try running several fuzzing tools in parallel, as
different fuzzers tend to find different kinds of bugs. AFL (American Fuzzy Lop) [1],
a coverage-guided, mutation-based fuzzer that uses a genetic algorithm, can
take hand-crafted XML seeds to fuzz our libvirt target. Alternatively, we could
develop a generation-based grammar module for AFL (which is definitely non-trivial);
so far I have not seen active development in the AFL community on XML
grammar generation. Another option could be clang's libFuzzer [2].
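
As a rough illustration of the mutation-based approach mentioned above, here is a minimal Python sketch. It is not AFL and it omits AFL's coverage feedback and genetic scheduling entirely; it only shows the basic loop of mutating a seed and feeding it to a target. The seed document, the `mutate`, `target`, and `fuzz` names, and the use of Python's ElementTree parser as a stand-in for the code under test are all assumptions made for illustration.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical hand-crafted seed; a real seed would be a full domain XML.
SEED_XML = b"<domain type='kvm'><name>demo</name><memory>1048576</memory></domain>"


def mutate(data, rng):
    """Return a copy of data with one random byte-level mutation applied."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    choice = rng.randrange(3)
    if choice == 0:
        buf[pos] ^= 1 << rng.randrange(8)    # flip a single bit
    elif choice == 1:
        del buf[pos]                         # delete one byte
    else:
        buf.insert(pos, rng.randrange(256))  # insert one random byte
    return bytes(buf)


def target(data):
    """Stand-in for the code under test: True if the input parses as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def fuzz(seed, iterations, rng_seed=0):
    """Run the mutate-and-test loop and count the outcomes."""
    rng = random.Random(rng_seed)  # fixed seed keeps runs reproducible
    stats = {"accepted": 0, "rejected": 0, "crashed": 0}
    for _ in range(iterations):
        sample = mutate(seed, rng)
        try:
            outcome = "accepted" if target(sample) else "rejected"
        except Exception:
            # Anything other than a clean parse error counts as a finding.
            outcome = "crashed"
        stats[outcome] += 1
    return stats


if __name__ == "__main__":
    print(fuzz(SEED_XML, 1000))
```

Most byte-level mutations of XML are rejected by the parser right away, which is the usual argument for grammar-aware generation when the interesting code sits behind the parser.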

Several related articles show examples of fuzzing: using AFL to generate
SQL [3], llvm-afl [4], and hexml fuzzing with AFL [5]. In combination with lcov, we
could compare different fuzzers and guide our fuzzing tuning.

NOTE: example [5] is quite interesting; it fuzzes a Haskell-written XML parser.

I will probably not post further updates until next week; I have three midterms this week.

[1] http://lcamtuf.coredump.cx/afl/
[2] http://llvm.org/docs/LibFuzzer.html
[3] https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2014-December/079390.html
[5] https://github.com/ndmitchell/hexml/issues/6

Again, thanks a lot. Any guidance, comments, or suggestions would be more than 
welcome and highly appreciated.

Best,


Dan

  XML document -> XML parser -> QEMU cmd line generator -> QEMU cmd line
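
To make the pipeline above a bit more concrete, here is a toy sketch in Python. It is not libvirt's code and does not use libvirt's data structures: the `parse_domain` and `build_qemu_args` functions, the dictionary used as the "internal representation", and the tiny subset of domain XML handled here are all assumptions made for illustration. The point is only that the command line generator consumes the parser's output rather than the XML itself.

```python
import xml.etree.ElementTree as ET


def parse_domain(xml_text):
    """Parse a minimal domain XML into a dict (the toy internal representation)."""
    root = ET.fromstring(xml_text)
    if root.tag != "domain":
        raise ValueError("root element must be <domain>")
    name = root.findtext("name")
    memory = root.findtext("memory")
    if not name or memory is None:
        raise ValueError("<name> and <memory> are required in this toy model")
    # Collect the backing file of every file-based disk that names one.
    disks = [
        src.get("file")
        for src in root.findall("./devices/disk/source")
        if src.get("file")
    ]
    # This toy model treats the <memory> value as KiB.
    return {"name": name, "memory_kib": int(memory), "disks": disks}


def build_qemu_args(dom):
    """Translate the internal representation into a qemu-style argument list."""
    args = [
        "qemu-system-x86_64",
        "-name", dom["name"],
        "-m", str(dom["memory_kib"] // 1024),  # KiB converted to MiB for -m
    ]
    for path in dom["disks"]:
        args += ["-drive", f"file={path}"]
    return args


EXAMPLE_XML = """
<domain type='kvm'>
  <name>demo</name>
  <memory>1048576</memory>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/images/demo.img'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(" ".join(build_qemu_args(parse_domain(EXAMPLE_XML))))
```

In this picture a fuzzer can feed either stage: XML documents go through `parse_domain` first, while generated internal representations can be passed to `build_qemu_args` directly.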

There are plenty of fuzzing libraries available on the market, so I guess
one of the first steps would be to explore our options and pick one that
suits our needs. Do you have experience with any of them? Frankly, I
have very little.

Regarding the GSoC process, each organization makes their own rules for
accepting students. Here at libvirt the rules are described here:

  http://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ

Please let me know what your thoughts are on all of this, and also don't
hesitate to ask anything.

Michal