On Tue, Mar 7, 2017 at 4:08 AM, Michal Privoznik <mprivozn(a)redhat.com>
wrote:
On 03/07/2017 06:27 AM, D L wrote:
> On Sun, Mar 5, 2017 at 2:47 AM, Michal Privoznik <mprivozn(a)redhat.com>
> wrote:
>
>> On 04.03.2017 07:23, Da L wrote:
>>> Dear all,
>>>
>>
>> Hey,
>>
>>> This is my first post in the list.
>>
>> Very well. Welcome. It is always nice to see people interested in
>> libvirt.
>>
>> Hi Michal,
>
> Thank you very much for the explanation and encouragement.
> I am so glad to join the community.
>
>
>>>
>>> I am currently a graduate student studying computer science,
>>> particularly interested in virtualization technologies, and I have been
>>> using QEMU for a variety of projects for a while. Two of the courses
>>> that I am taking this semester that really attracted me to the libvirt
>>> community are Advanced Operating Systems and Secure Software
>>> Development. I have been learning kernel fuzzing as well as other
>>> general fuzzing tools.
>>>
>>> Then I found that the topic of "QEMU command line generator XML
>>> fuzzing" is pretty interesting and totally in line with my interests
>>> and background. Though I have read through the documentation on the
>>> website, just to make sure I am doing it correctly, could anyone
>>> confirm that this project is still available? And what do I need to do
>>> next in order to participate in the project this summer? Do I need to
>>> find a mentor by myself? Potentially, I could ask my OS or Security
>>> professor to be my mentor, but I am not sure yet which would be the
>>> best way.
>>
>> Yes, the project is still on. It does not have a mentor assigned yet,
>> but don't worry about that now - there are a lot of mentors around. For
>> now, I can be your point of contact.
>>
>> So, just to explain some details of the project to you: libvirt's format
>> for storing domain configuration is XML. However, none of the
>> hypervisors out there uses XML to describe domain configuration. For
>> instance, in qemu it's all about the command line. You want this disk
>> for your domain? You have to put it onto the command line. And so on.
>> Therefore, in a very simplistic way, for qemu libvirt translates the XML
>> into qemu command line language. Now, this process is very complex and
>> sort of tricky. That's why we would like to generate "all" possible
>> combinations of XML, let the command line generator crunch them and
>> produce qemu command lines. Well, that's not entirely true, because the
>> command line generator works over some internal representation of the
>> domain (not XML) that is produced by our XML parser:
>>
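To make the two stages described above concrete, here is a minimal Python sketch of "XML -> internal representation -> command line". It is only an illustration and not libvirt code: the function names parse_domain() and build_command_line(), the dict layout, and the tiny XML subset are all assumptions made up for this example. The qemu options used (-name, -m, -drive file=...) are real, but libvirt generates far more than this.

```python
import xml.etree.ElementTree as ET


def parse_domain(xml_text):
    """Stage 1: parse domain XML into a plain dict (toy stand-in for virDomainDef)."""
    root = ET.fromstring(xml_text)
    disks = []
    for disk in root.iter("disk"):
        source = disk.find("source")
        if source is not None and source.get("file"):
            disks.append(source.get("file"))
    return {
        "name": root.findtext("name"),
        "memory_kib": int(root.findtext("memory")),  # libvirt's default unit is KiB
        "disks": disks,
    }


def build_command_line(domain):
    """Stage 2: turn the internal representation into an argv list."""
    argv = ["qemu-system-x86_64", "-name", domain["name"],
            "-m", str(domain["memory_kib"] // 1024)]  # qemu's -m defaults to MiB
    for path in domain["disks"]:
        argv += ["-drive", "file=" + path]
    return argv


DOMAIN_XML = """
<domain type='qemu'>
  <name>demo</name>
  <memory>524288</memory>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/images/demo.img'/>
    </disk>
  </devices>
</domain>
"""

print(build_command_line(parse_domain(DOMAIN_XML)))
```

A fuzzer for the command line generator would feed many variations of the XML into stage 1 and check that stage 2 neither crashes nor emits nonsense.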
> Please correct me if I am wrong about my following understanding:
> 1. Regarding XML config file, one typical usage with libvirt could be:
> $ virsh define <domain_config_file.xml <http://your_xml_config_file.ml>>
The file has to be stored locally. Libvirt doesn't have an
'url-grabber'. In fact, our APIs expect XML document passed as string
(not a filename where it is stored). It's just virsh that allows users
to point it to a file which is read and passed to the define API.
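The split between "virsh reads the file" and "the API receives a string" can be sketched as follows. This is a hedged illustration, not virsh's implementation: define_from_file() and fake_define() are hypothetical names. The define_api parameter stands for any callable that accepts the XML document as a string, which is the shape of the C API virDomainDefineXML(conn, xml).

```python
import os
import tempfile


def define_from_file(path, define_api):
    """Read the XML document from `path` and hand its *contents* to the API."""
    with open(path, encoding="utf-8") as handle:
        xml_text = handle.read()
    # The API never sees the filename, only the document itself.
    return define_api(xml_text)


def fake_define(xml_text):
    """Hypothetical stand-in for the define API; reports what it received."""
    return "received %d characters of XML" % len(xml_text)


# Write a small document to a local file, then define it from that file.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<domain type='qemu'><name>demo</name></domain>")

print(define_from_file(tmp.name, fake_define))
os.unlink(tmp.name)
```

Fetching the document over HTTP would therefore be a change on the client side only: whatever obtains the text, the API still gets a string.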
Oh, my bad. What I typed was somehow turned into a URL by my browser.
But it is an interesting idea to have config files requested via HTTP.
> 2. I noticed that in the libvirt source code there are several files
> closely related to XML, including src/util/virxml.{c,h}, which might be
> the target of this project?
Sort of. The virxml.c file contains XML parsing helpers (mostly higher-level
APIs over libxml2). The XML parsing is done in src/conf/domain_conf.c
(or network_conf.c for libvirt networks, etc.). The entry point for
exploring domain XML parsing can be the virDomainDefParseString() function.
BTW: while exploring libvirt sources I strongly advise using so-called
tagged sources ("make tags" or "ctags -R ." or some equivalent), because
libvirt sources consist of lots of short functions calling other
functions. Tagged sources then allow developers to jump to the symbol
under the cursor (in vim it is "CTRL + ]", or "g + ]" if the symbol is
defined at multiple locations).
I took a deeper look at domain_conf.c and network_conf.c. It is just so
amazing to see a single file having 26 K lines of code. I first thought it
must be generated automatically, then I found there are ~1640 commits for
that single file over 8 years.
Yes, ctags is very, very helpful!
Now that we have parsed the domain XML into internal representation
(virDomainDef), we can look into qemu command line generation. I think
the whole process is best visible in qemuDomainCreateXML() (e.g. "vim -t
qemuDomainCreateXML" ;-)). This is qemu driver implementation of public
API virDomainCreateXML(). It allows users to create so called transient
domains. Long story short: "here, I have domain XML, start it up for me,
will you?". Therefore at the beginning the domain XML is parsed (using
the function described above), several not-important-right-now functions
are called and then qemuProcessStart() is called which calls
qemuProcessLaunch() which calls qemuBuildCommandLine(). Finally, this is
the function that takes the virDomainDef (among other arguments) and
produces yet another internal representation of qemu command line
(virCommandPtr). This command line is then executed later in the process.
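The idea of "build a command object now, execute it later" can be shown with a small Python sketch. This is a toy analogue only: the Command class and build_command() are hypothetical, and the method name add_arg_pair() merely echoes the style of libvirt's virCommand helpers. To keep the example runnable anywhere, the Python interpreter itself stands in for the qemu binary.

```python
import subprocess
import sys


class Command:
    """Toy analogue of a command object: collect arguments now, run later."""

    def __init__(self, binary):
        self.argv = [binary]

    def add_arg_pair(self, name, value):
        self.argv += [name, value]
        return self

    def run(self):
        # Execution is a separate, later step from construction.
        return subprocess.run(self.argv, capture_output=True,
                              text=True, check=True)


def build_command(emulator, options):
    """Translate (name, value) option pairs into a Command, without running it."""
    cmd = Command(emulator)
    for name, value in options:
        cmd.add_arg_pair(name, value)
    return cmd


# The interpreter stands in for the emulator binary in this demo.
cmd = build_command(sys.executable, [("-c", "print('launched')")])
print(cmd.argv)                    # nothing has been executed yet
print(cmd.run().stdout.strip())
```

Keeping construction and execution apart is what makes the generator testable: a fuzzer can inspect the built argument list without ever starting a guest.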
Here I traced through the invocations starting from qemuDomainCreateXML.
Indeed, eventually, it returned a _virCommand struct with some process
information, like file descriptors, pid, uid, gid, etc. And for different
purposes, it is passed as an argument in about 200 places, such as:
in ./src/qemu/qemu_command.c, there are
qemuBuildMasterKeyCommandLine() and
qemuBuildNVRAMCommandLine();
in /src/util/vircommand.c, there are
virCommandSetWorkingDirectory() and
virCommandProcessIO();
in /src/rpc/virnetsocket.c, there is
virNetSocketNewConnectCommand();
in /src/storage/storage_util.c, there is
storageBackendCreateQemuImgSetOptions(), etc.
3. And libvirt also is compiled with libxml2.
Yes. This has strong historical background (hint: look who started
libvirt and who wrote libxml2 ;-)). Frankly, I don't think we've ever
considered a different xml parsing library.
Oh yeah, just out of curiosity, I git cloned libxml2 and found the name of
Daniel Veillard, then found out more stories. Really amazing work.
Maybe I should not ask (but since I know nothing yet, it won't hurt): have
we ever considered another format as an alternative to XML? JSON, for
example, since XML is kind of hard to parse.
By the way, when I save a VM's state with "virsh save ID FILE_NAME", it
generated a huge XML file (500 M to 16G), then I found out that
virDomainSnapshotCreateXML() is called when executing that command
(https://libvirt.org/formatsnapshot.html).
When restoring the state, it calls virDomainRevertToSnapshot(),
virDomainSnapshotGetXMLDesc(), etc. For the XML fuzzing project, do we need
to consider those situations?
> 4. Then in virt-xml-validate, which is a bash script,
> (in build/bin directory after make install) calling xmllint.
Yeah. Writing our XMLs by hand can be overwhelming. Moreover, libvirt
has this philosophy of ignoring unknown elements/attributes. So it might
happen that for instance you have a typo in an element name and you're
still wondering why libvirt ignores that particular setting (e.g. path
to disk of domain). Therefore we have grammar rules (RNG) that could
help you here - virt-xml-validate would error out in this example. Well,
even virsh errors out now because it instructs libvirt to do the XML
validation before parsing. But that hasn't been always the case.
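The difference between "ignore what you don't know" and "validate first" can be sketched in a few lines of Python. This is an assumption-laden stand-in: real validation uses the RNG grammar, whereas here a small allowlist (KNOWN_DISK_CHILDREN, an illustrative subset) plays that role, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; the real schema knows many more elements.
KNOWN_DISK_CHILDREN = {"source", "target", "driver"}


def lenient_disk_path(disk_xml):
    """Parser behaviour: pick out what we understand, silently skip the rest."""
    source = ET.fromstring(disk_xml).find("source")
    return source.get("file") if source is not None else None


def unknown_children(disk_xml):
    """Validation step: report child elements outside the known set."""
    return [child.tag for child in ET.fromstring(disk_xml)
            if child.tag not in KNOWN_DISK_CHILDREN]


# Note the typo: <sorce> instead of <source>.
TYPO_XML = "<disk type='file'><sorce file='/tmp/a.img'/><target dev='vda'/></disk>"

print(lenient_disk_path(TYPO_XML))   # None: the disk path is quietly lost
print(unknown_children(TYPO_XML))    # ['sorce']: validation points at the typo
```

For fuzzing this matters in both directions: inputs rejected by validation never reach the command line generator, while inputs with ignored elements reach it but exercise nothing new.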
>
> I have not been able to get around to figuring out the relations between
> the above pieces yet.
> I spent some time trying to instrument and compile the executables with
> AFL, but so far with no luck. (The idea is as simple as changing gcc in
> Makefile/configure to afl-gcc).
> The attached figure is just a demo showing AFL fuzzing virt-admin, which
> is not instrumented (so kind of boring and not quite useful). But I think
> AFL could be one of the candidate fuzzers for this project due to its
> prevalence and proven effectiveness.
We don't have to limit ourselves just to domain XML -> qemu cmd line
fuzzing. We can look into other areas too (there's a lot of inputs for
libvirt), e.g. RPC protocol (we have our own protocol for communication
with distant server/client over network), fuzz XML parsers themselves
(domain is not the only object that libvirt manages, we have networks,
interfaces, storage pools/volumes, etc.). It's just that qemu cmd line
fuzzing seemed complicated enough so that the chances of running a
fuzzer successfully are high.
All right. I think that's definitely a good idea. I will start looking
into this tomorrow and resume the fuzzing experiment that I left.
Thank you very much for the detailed explanation. I now have a
much better understanding of the scope and how I would plan to
confine/manage the timeline of the project.
Dan
>
> Regarding fuzzing, I think we can try running several fuzzing tools in
> parallel, as different fuzzers tend to find different kinds of bugs.
True. I had this on my mind as well.
> Thus, AFL (American Fuzzy Lop) [1], which is a coverage-guided,
> mutation-based fuzzer with a genetic algorithm, can take hand-crafted XML
> seeds to fuzz our libvirt target. Alternatively, we could develop a
> generation-based grammar module in AFL (which is definitely non-trivial);
Yeah, I thought about this when watching a talk on AFL. We might explore
other possibilities - they already might have something we want.
> so far I have not seen active development in AFL community on xml format
> grammar generation. Another option could be clang-libfuzzer [2].
>
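The trade-off between mutation-based and generation-based fuzzing discussed above can be illustrated with a small Python sketch. It is not AFL and uses no coverage feedback. The random byte mutations and the four-tag grammar are simplifying assumptions, and mutate(), generate() and is_well_formed() are hypothetical helper names.

```python
import random
import xml.etree.ElementTree as ET


def mutate(seed, rng, flips=3):
    """Mutation-based: overwrite a few random bytes of a valid seed."""
    data = bytearray(seed)
    for _ in range(flips):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def generate(rng, depth=0):
    """Generation-based: build a document from a tiny hand-written grammar."""
    tag = rng.choice(["domain", "devices", "disk", "source"])
    if depth >= 2 or rng.random() < 0.3:
        return "<%s/>" % tag
    children = "".join(generate(rng, depth + 1)
                       for _ in range(rng.randint(1, 3)))
    return "<%s>%s</%s>" % (tag, children, tag)


def is_well_formed(data):
    """True if the input survives the XML parser at all."""
    try:
        ET.fromstring(data)
        return True
    except (ET.ParseError, ValueError):
        return False


rng = random.Random(0)
seed = b"<domain><name>demo</name></domain>"
mutated = [mutate(seed, rng) for _ in range(200)]
generated = [generate(rng) for _ in range(200)]
print("well-formed mutated:  ", sum(map(is_well_formed, mutated)))
print("well-formed generated:", sum(map(is_well_formed, generated)))
```

Blind mutation tends to break well-formedness, so many inputs die in the XML parser. Grammar-based inputs always parse and can reach the code behind it, which is why a grammar module is attractive for fuzzing the command line generator despite the extra work.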
> Several related articles show examples of fuzzing: using AFL to generate
> SQL [3], llvm-afl [4], and hexml fuzzing with AFL [5]. In combination
> with lcov, we could compare different fuzzers and guide our fuzzing
> tuning.
Yes, good idea.
>
> NOTE: the [5] example is quite interesting; it is fuzzing an XML parser
> written in Haskell.
Indeed.
>
> I will probably not update more until next week; I am having three
> mid-terms this week.
Good luck.
>
> [1] http://lcamtuf.coredump.cx/afl/
> [2] http://llvm.org/docs/LibFuzzer.html
> [3] https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
> [4] http://lists.llvm.org/pipermail/llvm-dev/2014-December/079390.html
> [5] https://github.com/ndmitchell/hexml/issues/6
>
> Again, thanks a lot. Any guidance, comments, or suggestions would be more
> than welcome and highly appreciated.
>
> Best,
>
>
> Dan
Michal