
7 Mar 2017, 6:27 a.m.
On Sun, Mar 5, 2017 at 2:47 AM, Michal Privoznik <mprivozn@redhat.com> wrote:

> On 04.03.2017 07:23, Da L wrote:
> > Dear all,
> >
>
> Hey,
>
> > This is my first post in the list.
>
> Very well. Welcome. It is always nice to see people interested in libvirt.
>

Hi Michal,

Thank you very much for the explanation and encouragement. I am so glad to
join the community.

> > I am currently a graduate student studying computer science, particularly
> > interested in visualization technologies and I have been using QEMU for a
> > variety of projects for a while. Two of the courses that I am taking this
> > semester really attracted me to the libvirt community are Advanced
> > Operating Systems and Secure Software Development. I have been learning
> > kernel fuzzing as well as other general fuzzing tools.
> >
> > Then I found the topic of "QEMU command line generator XML fuzzing" is
> > pretty interesting and totally in line with my interest and background.
> > Though I have read through the documentations on the website, just to make
> > sure I am doing it correctly, could anyone confirm this project is still
> > available? And what I need to do next in order to participate the project
> > this summer? Do I need to find a mentor by myself? Potentially, I could
> > find my OS or Security professor as my mentor, but I am not sure yet which
> > would be the best way.
>
> Yes, the project is still on. It does not have a mentor assigned yet,
> but don't worry about that now - there is a lot of mentors around. For
> now, I can be your point of contact.
>
> So, just to explain you some details of the project: libvirt's format
> for storing domain configuration is XML. However, none of the
> hypervisors out there uses XML to describe domain configuration. For
> instance, in qemu it's all about the command line. You want this disk
> for you domain? You have to put it onto the command line. And so on.
> Therefore, in a very simplistic way, for qemu libvirt translates the XML
> into qemu command line language. Now, this process is very complex and
> sort of tricky. That's why we would like to generate "all" possible
> combinations of XML, let the command line generator crunch them and
> produce qemu command line. Well, that's not entirely true, because
> command line generator works over some internal representation of domain
> (not XML) that is produced by our XML parser:
>

Please correct me if I am wrong in the following understanding:

1. Regarding the XML config file, one typical usage with libvirt could be:
   $ virsh define <domain_config_file.xml>
2. I noticed that the libvirt source code contains several files closely
   related to XML, including src/util/virxml.{c,h}, which might be the
   target of this project?
3. libvirt is also compiled against libxml2.
4. virt-xml-validate, which is a bash script (in the build/bin directory
   after make install), calls xmllint.

I have not yet been able to work out how the pieces above relate to each
other. I spent some time trying to instrument and compile the executables
with AFL, but so far with no luck. (The idea is as simple as changing gcc
in the Makefile/configure to afl-gcc.) The attached figure is just a demo
of using AFL to fuzz virt-admin, which is not instrumented (so kind of
boring and not very useful). But I think AFL could be one of the candidate
fuzzers for this project due to its prevalence and proven effectiveness.

Regarding fuzzing, I think we can try running several fuzzing tools in
parallel, as different fuzzers tend to find different kinds of bugs. AFL
(American Fuzzy Lop) [1], a coverage-guided, mutation-based fuzzer that
uses a genetic algorithm, can take hand-crafted XML seeds to fuzz our
libvirt target.
Alternatively, we could develop a generation-based grammar module for AFL
(which is definitely non-trivial); so far I have not seen active
development in the AFL community on XML grammar generation. Another option
could be clang's libFuzzer [2].

Several related articles show examples of fuzzing: using AFL to generate
SQL [3], llvm-afl [4], and hexml fuzzing with AFL [5]. In combination with
lcov, we could compare the coverage of different fuzzers and use it to
guide our fuzzing tuning. Note that example [5] is quite interesting; it
fuzzes a Haskell-written XML parser.

I will probably not post further updates until next week; I have three
mid-terms this week.

[1] http://lcamtuf.coredump.cx/afl/
[2] http://llvm.org/docs/LibFuzzer.html
[3] https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2014-December/079390.html
[5] https://github.com/ndmitchell/hexml/issues/6

Again, thanks a lot. Any guidance, comments, or suggestions would be more
than welcome and highly appreciated.

Best,
Dan

> XML document -> XML parser -> QEMU cmd line generator -> QEMU cmd line
>
> There is plenty of fuzzing libraries available on the market, so I guess
> one of the first steps would be to explore our options and pick one that
> suits our needs. Do you have experience with any of them? Frankly, I
> have very little.
>
> Regarding the GSoC process, each organization makes their own rules for
> accepting students. Here at libvirt the rules are described here:
>
> http://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ
>
> Please let me know what are your thoughts on all of this, and also don't
> hesitate to ask anything.
>
> Michal
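
To make the "hand-crafted XML seed plus mutation" idea above a bit more
concrete, here is a minimal sketch in Python. It is only an illustration
under stated assumptions: the seed document is a made-up, heavily
simplified domain XML, the list of replacement values is arbitrary, and
the function names (mutate, generate) are hypothetical. It is not how AFL
mutates inputs, and it does not call into libvirt at all; it only produces
well-formed XML variants that could later be fed to the XML parser and the
command line generator.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified domain XML used as the seed.
SEED = """<domain type='kvm'>
  <name>demo</name>
  <memory unit='KiB'>1048576</memory>
  <vcpu>2</vcpu>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/demo.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>"""

# Illustrative replacement values; NOT derived from libvirt's schema.
INTERESTING = ["", "0", "-1", "4294967296", "A" * 64, "virtio", "ide"]


def mutate(seed_xml, rng):
    """Return a well-formed variant of seed_xml with one value replaced."""
    root = ET.fromstring(seed_xml)
    # Collect every place we may change: attributes and leaf-element text.
    spots = []
    for elem in root.iter():
        for key in elem.attrib:
            spots.append((elem, key))
        if len(elem) == 0 and elem.text and elem.text.strip():
            spots.append((elem, None))
    elem, key = rng.choice(spots)
    value = rng.choice(INTERESTING)
    if key is None:
        elem.text = value      # replace the text of a leaf element
    else:
        elem.set(key, value)   # replace the value of one attribute
    return ET.tostring(root, encoding="unicode")


def generate(seed_xml, count, seed=0):
    """Produce `count` variants; a fixed RNG seed keeps runs reproducible."""
    rng = random.Random(seed)
    return [mutate(seed_xml, rng) for _ in range(count)]


if __name__ == "__main__":
    for doc in generate(SEED, 3):
        print(doc)
        print("---")
```

Each variant could be written to a file and passed to virsh define, or
used as an additional seed for a coverage-guided fuzzer. Because only one
value changes per variant, a failure is easy to trace back to the element
or attribute that caused it.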