On Sun, Mar 5, 2017 at 2:47 AM, Michal Privoznik <mprivozn(a)redhat.com>
wrote:
On 04.03.2017 07:23, Da L wrote:
> Dear all,
>
Hey,
> This is my first post in the list.
Very well. Welcome. It is always nice to see people interested in libvirt.
Hi Michal,
Thank you very much for the explanation and encouragement.
I am so glad to join the community.
>
> I am currently a graduate student studying computer science, particularly
> interested in virtualization technologies, and I have been using QEMU for a
> variety of projects for a while. Two of the courses that I am taking this
> semester, Advanced Operating Systems and Secure Software Development,
> really attracted me to the libvirt community. I have been learning
> kernel fuzzing as well as other general fuzzing tools.
>
> Then I found that the topic "QEMU command line generator XML fuzzing" is
> pretty interesting and totally in line with my interest and background.
> Though I have read through the documentation on the website, just to make
> sure I am doing it correctly, could anyone confirm this project is still
> available? And what do I need to do next in order to participate in the
> project this summer? Do I need to find a mentor by myself? Potentially, I
> could ask my OS or Security professor to be my mentor, but I am not sure
> yet which would be the best way.
Yes, the project is still on. It does not have a mentor assigned yet,
but don't worry about that now - there are a lot of mentors around. For
now, I can be your point of contact.
So, just to explain some details of the project to you: libvirt's format
for storing domain configuration is XML. However, none of the
hypervisors out there uses XML to describe domain configuration. For
instance, in qemu it's all about the command line. You want this disk
for your domain? You have to put it onto the command line. And so on.
Therefore, in a very simplistic way, for qemu libvirt translates the XML
into qemu command line language. Now, this process is very complex and
sort of tricky. That's why we would like to generate "all" possible
combinations of XML, let the command line generator crunch them and
produce qemu command line. Well, that's not entirely true, because
command line generator works over some internal representation of domain
(not XML) that is produced by our XML parser:
Please correct me if I am wrong about my following understanding:
1. Regarding the XML config file, one typical usage with libvirt could be:
$ virsh define <domain_config_file.xml>
2. I noticed that the libvirt source code contains several files closely
related to XML, including src/util/virxml.{c,h}, which might be the target
of this project?
3. libvirt is also compiled with libxml2.
4. Then there is virt-xml-validate, a bash script (in the build/bin
directory after make install) that calls xmllint.
I have not yet been able to work out how the above pieces relate to each
other.
I spent some time trying to instrument and compile the executables with
AFL, but so far with no luck. (The idea is as simple as changing gcc in
Makefile/configure to afl-gcc.)
The attached figure is just a demo of using AFL to fuzz virt-admin, which
is not instrumented (so kind of boring and not very useful). But I think
AFL could be one of the candidate fuzzers for this project due to its
prevalence and proven effectiveness.
Regarding fuzzing, I think we can try running several fuzzing tools in
parallel, as different fuzzers tend to find different kinds of bugs. Thus,
AFL (American Fuzzy Lop) [1], which is a coverage-guided, mutation-based
fuzzer using a genetic algorithm, can take hand-crafted XML seeds to fuzz
our libvirt target. Alternatively, we could develop a generation-based
grammar module in AFL (which is definitely non-trivial); so far I have not
seen active development in the AFL community on XML grammar generation.
Another option could be clang's libFuzzer [2].
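To make the mutation-based idea concrete, here is a minimal sketch in Python. It is not AFL: it has no coverage feedback and no genetic scheduling, it just mutates a hand-crafted seed one byte at a time. Python's xml.etree parser stands in for the real libvirt target purely for illustration, and the names SEED, mutate and fuzz are hypothetical.

```python
import random
import xml.etree.ElementTree as ET

# Hand-crafted seed: a tiny domain-like XML document (illustrative only).
SEED = b"<domain type='kvm'><name>demo</name><memory unit='MiB'>512</memory></domain>"


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: bit flip, insert, or delete."""
    buf = bytearray(data)
    op = rng.choice(("flip", "insert", "delete"))
    pos = rng.randrange(len(buf)) if buf else 0
    if op == "flip" and buf:
        buf[pos] ^= 1 << rng.randrange(8)
    elif op == "insert":
        buf.insert(pos, rng.randrange(256))
    elif op == "delete" and buf:
        del buf[pos]
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng_seed: int = 0) -> dict:
    """Feed mutated seeds to the stand-in parser and count the outcomes."""
    rng = random.Random(rng_seed)
    stats = {"accepted": 0, "rejected": 0, "unexpected": 0}
    for _ in range(iterations):
        sample = mutate(seed, rng)
        try:
            ET.fromstring(sample)
            stats["accepted"] += 1
        except ET.ParseError:
            # A clean rejection of malformed input is the expected behaviour.
            stats["rejected"] += 1
        except Exception:
            # Anything else would be worth a closer look in a real campaign.
            stats["unexpected"] += 1
    return stats


if __name__ == "__main__":
    print(fuzz(SEED, 1000))
```

Most blind mutations of this kind are rejected by the parser early, which is one reason a coverage-guided or grammar-aware approach looks attractive for the XML case.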
Several related articles show examples of such fuzzing: using AFL to
generate SQL [3], llvm-afl [4], and hexml fuzzing with AFL [5]. In
combination with lcov, we could compare different fuzzers and guide our
fuzzing tuning.
NOTE: the [5] example is quite interesting; it is fuzzing a Haskell-written
XML parser.
I will probably not update more until next week; I am having three
mid-terms this week.
[1] http://lcamtuf.coredump.cx/afl/
[2] http://llvm.org/docs/LibFuzzer.html
[3] https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2014-December/079390.html
[5] https://github.com/ndmitchell/hexml/issues/6
Again, thanks a lot. Any guidance, comments, or suggestions would be more
than welcome and highly appreciated.
Best,
Dan
XML document -> XML parser -> QEMU cmd line generator -> QEMU cmd line
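As a rough illustration of the pipeline above, here is a toy sketch in Python. It is not libvirt's actual code (which is written in C and far more involved); DomainDef, Disk, parse_domain and build_command_line are hypothetical names, and only a tiny subset of the domain XML is handled. It only shows that the generator consumes the parsed representation rather than the XML itself.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

SAMPLE_XML = """
<domain type='kvm'>
  <name>demo</name>
  <memory unit='MiB'>512</memory>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/demo.qcow2'/>
    </disk>
  </devices>
</domain>
"""


@dataclass
class Disk:
    path: str
    fmt: str


@dataclass
class DomainDef:
    """Toy stand-in for the parser's internal representation of a domain."""
    name: str
    memory_mib: int
    disks: List[Disk] = field(default_factory=list)


def parse_domain(xml_text: str) -> DomainDef:
    """Stage 1: XML document -> internal representation."""
    root = ET.fromstring(xml_text)
    mem = root.find("memory")
    value = int(mem.text)
    # Toy rule: any unit other than MiB is treated as KiB.
    memory_mib = value if mem.get("unit", "KiB") == "MiB" else value // 1024
    disks = [
        Disk(path=d.find("source").get("file"), fmt=d.find("driver").get("type"))
        for d in root.findall("devices/disk")
    ]
    return DomainDef(name=root.findtext("name"), memory_mib=memory_mib, disks=disks)


def build_command_line(dom: DomainDef) -> List[str]:
    """Stage 2: internal representation -> qemu-style argument list."""
    argv = ["qemu-system-x86_64", "-name", dom.name, "-m", str(dom.memory_mib)]
    for disk in dom.disks:
        argv += ["-drive", f"file={disk.path},format={disk.fmt}"]
    return argv


if __name__ == "__main__":
    print(" ".join(build_command_line(parse_domain(SAMPLE_XML))))
```

A fuzzer aimed at the command line generator could therefore either produce XML and go through the parser, or produce the internal representation directly; the sketch does not take a side on which is better.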
There are plenty of fuzzing libraries available on the market, so I guess
one of the first steps would be to explore our options and pick one that
suits our needs. Do you have experience with any of them? Frankly, I
have very little.
Regarding the GSoC process, each organization makes their own rules for
accepting students. Here at libvirt the rules are described here:
http://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ
Please let me know what your thoughts are on all of this, and also don't
hesitate to ask anything.
Michal