On Thu, Mar 16, 2017 at 1:03 PM, Michal Privoznik <mprivozn(a)redhat.com>
wrote:
On 03/16/2017 09:08 AM, D L wrote:
> On Tue, Mar 7, 2017 at 4:08 AM, Michal Privoznik <mprivozn(a)redhat.com>
> wrote:
>
>
>> The file has to be stored locally. Libvirt doesn't have a
>> 'url-grabber'. In fact, our APIs expect an XML document passed as a string
>> (not a filename where it is stored). It's just virsh that allows users
>> to point it to a file which is read and passed to the define API.
>>
> Oh, my bad. The text I typed was somehow turned into a URL by my browser.
> But it is an interesting idea to have config files requested via http.
>
I'm not so sure. This looks like something for tools working on top of
libvirt, e.g. virt-install. If we were to have everything in libvirt, it
would become unmanageable.
>
> I took a deeper look at domain_conf.c and network_conf.c. It is just
> amazing to see a single file having 26 K lines of code. I first thought it
> must be generated automatically, then I found there are ~1640 commits for
> that single file over 8 years.
> Yes, ctags is very very helpful!
>
Yeah, it is our biggest file. The next one is src/qemu/qemu_driver.c. But
ctags is useful not because of big files, but because our code is
scattered across a lot of files. But that's not important now.
>
>
>> Now that we have parsed the domain XML into internal representation
>> (virDomainDef), we can look into qemu command line generation. I think
>> the whole process is best visible in qemuDomainCreateXML() (e.g. "vim -t
>> qemuDomainCreateXML" ;-)). This is qemu driver implementation of public
>> API virDomainCreateXML(). It allows users to create so called transient
>> domains. Long story short: "here, I have domain XML, start it up for me,
>> will you?". Therefore at the beginning the domain XML is parsed (using
>> the function described above), several not-important-right-now functions
>> are called and then qemuProcessStart() is called which calls
>> qemuProcessLaunch() which calls qemuBuildCommandLine(). Finally, this is
>> the function that takes the virDomainDef (among other arguments) and
>> produces yet another internal representation of qemu command line
>> (virCommandPtr). This command line is then executed later in the process.
>>
> Here I traced through the invocations starting from qemuDomainCreateXML.
>
> Indeed, eventually, it returned a _virCommand struct with some process
> information, like file descriptors, pid, uid, gid etc. And for different
> purposes, it is being passed as an argument in about 200 places, such as
> in ./src/qemu/qemu_command.c, there are
> qemuBuildMasterKeyCommandLine(), and
> qemuBuildNVRAMCommandLine(),
>
The virCommand type is for generic command execution, not just qemu. For
instance, when creating new storage volumes, libvirt spawns the qemu-img tool.
That's why you can find virCommand occurrences all over the place
(e.g. in storage_util.c).

Moreover, some functions take an existing virCommand object and just add
something to it - just like qemuBuildMasterKeyCommandLine() is doing.
This way, the build process of the command line is split into many functions.
The main reason to do so is better maintainability, just like you'd use
functions in regular code to semantically divide code into parts.

What's important here is qemuBuildCommandLine(), which takes a domain
definition (well, a pointer to it) and constructs the corresponding qemu
command line (represented as a pointer to virCommand, which is returned) by
calling several functions - each one constructing some part of the command line.
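
To make the "many small functions adding to one command object" idea concrete, here is a toy sketch in Python. It is not libvirt code: the Command class and the build_*() helpers are made-up stand-ins for virCommand and the qemuBuild*CommandLine() functions, and the two qemu options are only there as sample arguments.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Command:
    """Toy stand-in for virCommand: a binary plus an argument list."""
    binary: str
    args: list = field(default_factory=list)

    def add_arg_pair(self, name, value):
        # Append an option and its value, like "-m 512".
        self.args.extend([name, value])

    def to_string(self):
        # Render the command the way it would be typed in a shell.
        return " ".join(shlex.quote(a) for a in [self.binary, *self.args])


def build_name_args(cmd, domain):
    """Hypothetical helper: adds only the part that names the guest."""
    cmd.add_arg_pair("-name", domain["name"])


def build_memory_args(cmd, domain):
    """Hypothetical helper: adds only the memory size part."""
    cmd.add_arg_pair("-m", str(domain["memory_mib"]))


def build_command_line(domain):
    """Top-level builder: creates the command, then lets each helper extend it."""
    cmd = Command("qemu-system-x86_64")
    for helper in (build_name_args, build_memory_args):
        helper(cmd, domain)
    return cmd


example = build_command_line({"name": "demo", "memory_mib": 512})
print(example.to_string())
```

Each helper touches only its own slice of the arguments, so a new device type would mean one more helper rather than a bigger top-level function.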
Now the GSoC idea could be to test this qemuBuildCommandLine() function. A
fuzzer would create the virDomainDef object, we would run
qemuBuildCommandLine() over it and see whether it crashed and whether sane
output was generated.

Then to take this one level up, virDomainDef is produced by
virDomainDefParse() which takes a string (read: XML document) and parses it.
At this point, the fuzzer does not need to care about virDomainDef at all;
it can just create all possible XML documents and call virDomainDefParse()
over them, and then qemuBuildCommandLine() over the result of the parser.
Therefore I think this is what we should aim at.
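
The shape of such a harness could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: parse_domain() and build_command_line() are tiny stand-ins for virDomainDefParse() and qemuBuildCommandLine(), the seed XML is a made-up fragment, and the single-character mutation is far cruder than what a real fuzzer such as AFL does.

```python
import random
import xml.etree.ElementTree as ET

# Made-up seed document; a real run would start from valid domain XMLs.
SEED_XML = "<domain type='kvm'><name>demo</name><memory unit='MiB'>512</memory></domain>"


def parse_domain(xml_text):
    """Hypothetical stand-in for virDomainDefParse(): XML string -> dict."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name", default=""),
        "memory": root.findtext("memory", default="0"),
    }


def build_command_line(domain):
    """Hypothetical stand-in for qemuBuildCommandLine(): dict -> argv list."""
    return ["qemu", "-name", domain["name"], "-m", str(int(domain["memory"]))]


def mutate(xml_text, rng):
    """Replace one random character with a random printable ASCII character."""
    pos = rng.randrange(len(xml_text))
    return xml_text[:pos] + chr(rng.randrange(32, 127)) + xml_text[pos + 1:]


def fuzz(iterations, seed=0):
    """Feed mutated XML through parser and builder, recording what happens."""
    rng = random.Random(seed)
    stats = {"rejected": 0, "built": 0, "crashes": []}
    for _ in range(iterations):
        candidate = mutate(SEED_XML, rng)
        try:
            domain = parse_domain(candidate)
        except ET.ParseError:
            stats["rejected"] += 1  # malformed XML is an expected rejection
            continue
        try:
            build_command_line(domain)
            stats["built"] += 1
        except Exception as exc:  # the builder failing on parsed input is a finding
            stats["crashes"].append((candidate, repr(exc)))
    return stats


result = fuzz(500)
print(result["rejected"], result["built"], len(result["crashes"]))
```

The point is the split between the two outcomes: the parser rejecting broken XML is normal, while the builder failing on something the parser accepted is the kind of result worth investigating.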
> in /src/util/vircommand.c: there are
> virCommandSetWorkingDirectory(), and
> virCommandProcessIO()
> in /src/rpc/virnetsocket.c, there is
> virNetSocketNewConnectCommand().
> in /src/storage/storage_util.c, there is
> storageBackendCreateQemuImgSetOptions(). etc
>
> 3. And libvirt also is compiled with libxml2.
>>
>> Yes. This has strong historical background (hint: look who started
>> libvirt and who wrote libxml2 ;-)). Frankly, I don't think we've ever
>> considered a different xml parsing library.
>>
> Oh yeah, just out of curiosity, I git cloned libxml2 and found the name of
> Daniel Veillard, then found out more stories. Really amazing work.
>
Yeah. Daniel wrote libxml2 and then started libvirt. So choosing XML was a
no-brainer :-).
> Maybe I should not ask (but since I know nothing yet, it won't hurt): have
> we ever considered another format as an alternative to XML? JSON, for
> example, since XML is kind of hard to parse.
>
It's not any harder than JSON. There's 1:1 mapping between JSON and XML.
Anything that can be expressed in one format can be expressed in the other
too. And we cannot really switch formats because we try to stay backward
compatible. Meaning, if you write a program that co-operates with libvirt,
any subsequent update of libvirt should not break your application.
Therefore, if your application knows how to create/parse XML documents
(because that's what you need currently if you talk to libvirt), we cannot
switch to JSON, because your application would stop working with XML parser
error.
Just a side note - there are plenty of projects on top of libvirt which
create/parse the XML for you so you don't even have to touch XML yourself.
You don't even know that there's XML behind the scenes. One such
project is libvirt-gobject, for instance. So XML is not an issue here.
> By the way, when I save a VM's state with "virsh save ID FILE_NAME", it
> generated a huge XML file (500 M to 16G)
>
Yes, because the file does not only contain the domain XML, but also the
internal state of the domain (=guest memory + qemu memory). Therefore, if
you restore from it, you will get the very same state as when doing the
save.
> then I found out virDomainSnapshotCreateXML()
> is called when executing that command
> (https://libvirt.org/formatsnapshot.html).
> When restoring the state, it is calling virDomainRevertToSnapshot(),
> virDomainSnapshotGetXMLDesc() etc.
>
Not really. 'virsh save' calls virDomainSave() which calls
qemuDomainSave() (because conn->driver points to qemuHypervisorDriver).
qemuDomainSave() -> qemuDomainSaveFlags() -> qemuDomainSaveInternal() ->
qemuDomainSaveMemory() where basically all the interesting work takes place.
> For the XML fuzzing project, do we need
> to consider those situations?
>
I think the most important thing is to have XML -> qemu cmd line fuzzing in
place and only after that focus on extending that to what I described in
previous e-mails (quoted below).
Hi Michal,

I have been digesting your comments. I then switched focus from general
instrumentation and fuzzing to qemuBuildCommandLine(). I have been having
difficulty resolving the dependencies/shared objects needed to fuzz a
particular function. My conclusion so far (I have not started on this yet) is
that to target specific functions, some helper functions need to be in place
to handle the callbacks, and hand-crafted instrumentation also seems
necessary. This might be one of the cases where programming is necessary for
this project.

Given the slow progress, or maybe because I started later than ideal, I am a
bit worried about whether I can finish the requirements before the submission
deadline, not to mention the other libvirt community-specific requirements
mentioned on the website.

So to make sure I am on the right track, what are the concrete goals to
achieve, specific requirements to meet, or procedures for me to follow in
order to submit the application by the deadline?

Thanks,
Dan
>
>> 4. Then in virt-xml-validate, which is a bash script,
>>> (in build/bin directory after make install) calling xmllint.
>>>
>>
>> Yeah. Writing our XMLs by hand can be overwhelming. Moreover, libvirt
>> has this philosophy of ignoring unknown elements/attributes. So it might
>> happen that for instance you have a typo in an element name and you're
>> still wondering why libvirt ignores that particular setting (e.g. path
>> to disk of domain). Therefore we have grammar rules (RNG) that could
>> help you here - virt-xml-validate would error out in this example. Well,
>> even virsh errors out now because it instructs libvirt to do the XML
>> validation before parsing. But that hasn't been always the case.
>>
>>
>>> I have not been able to get round to figuring out the relations of the
>>> above pieces yet.
>>> I spent some time trying to instrument and compile the executables with
>>> AFL, but so far with no luck. (The idea is as simple as changing gcc in
>>> Makefile/configure to afl-gcc).
>>> The attached figure is just a demo showing AFL fuzzing virt-admin, which
>>> is not instrumented (so kind of boring and not quite useful). But I think
>>> AFL could be one of the candidates as a fuzzer for this project due to its
>>> prevalence and proven effectiveness.
>>>
>>
>> We don't have to limit ourselves just to domain XML -> qemu cmd line
>> fuzzing. We can look into other areas too (there's a lot of inputs for
>> libvirt), e.g. RPC protocol (we have our own protocol for communication
>> with distant server/client over network), fuzz XML parsers themselves
>> (domain is not the only object that libvirt manages, we have networks,
>> interfaces, storage pools/volumes, etc.). It's just that qemu cmd line
>> fuzzing seemed complicated enough so that the chances of running a
>> fuzzer successfully are high.
>>
> All right. I think that's definitely a good idea. I will start looking
> into this tomorrow and resume the fuzzing experiment that I left.
> Thank you very much for the detailed explanation. I am having a
> much better understanding about the scope and how I would plan to
> confine/manage the timeline of the project.
>
Cheers.
Michal