Document the fuzzers in two ways.
1. Explain the high-level working of the fuzzers under docs/kbase.
2. Add a README to explain the general setup of the fuzzer and its usage.
Signed-off-by: Rayhan Faizel <rayhan.faizel(a)gmail.com>
---
docs/kbase/index.rst | 3 +
docs/kbase/internals/meson.build | 1 +
docs/kbase/internals/xml-fuzzing.rst | 120 ++++++++++++++++++++++++
tests/fuzz/README.rst | 131 +++++++++++++++++++++++++++
4 files changed, 255 insertions(+)
create mode 100644 docs/kbase/internals/xml-fuzzing.rst
create mode 100644 tests/fuzz/README.rst
diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index e51b35cbfc..9cf6268800 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -116,3 +116,6 @@ Internals
`QEMU monitor event handling <internals/qemu-event-handlers.html>`__
Brief outline how events emitted by qemu on the monitor are handlded.
+
+`XML Fuzzing <internals/xml-fuzzing.html>`__
+ Working of the structure-aware XML fuzzers.
diff --git a/docs/kbase/internals/meson.build b/docs/kbase/internals/meson.build
index f1e9122f8f..86b6639419 100644
--- a/docs/kbase/internals/meson.build
+++ b/docs/kbase/internals/meson.build
@@ -9,6 +9,7 @@ docs_kbase_internals_files = [
'qemu-migration',
'qemu-threads',
'rpc',
+ 'xml-fuzzing',
]
diff --git a/docs/kbase/internals/xml-fuzzing.rst b/docs/kbase/internals/xml-fuzzing.rst
new file mode 100644
index 0000000000..85f565fda5
--- /dev/null
+++ b/docs/kbase/internals/xml-fuzzing.rst
@@ -0,0 +1,120 @@
+===================
+Libvirt XML fuzzing
+===================
+
+XML fuzzing is done using libFuzzer and libprotobuf-mutator. XML fuzzing
+cannot be done with normal fuzzing methods, as XML is a highly structured
+format. Structure-aware fuzzing is implemented using libprotobuf-mutator which
+mutates and fuzzes protobuf inputs. Protobufs are used as an intermediate
+format and serialized to XML.
+
+Protobuf to XML representation
+==============================
+
+A protobuf definition written to fuzz libvirt XML formats may resemble the
+following.
+
+::
+
+    message MainObj {
+        message SomeTagMessage {
+            optional uint32 A_number = 1;
+            optional DummyString A_name = 2;
+
+            enum typeEnum {
+                typeA = 0;
+                typeB = 1;
+                typeC = 2;
+            }
+
+            optional typeEnum A_type = 3;
+
+            message InnerTagMessage {
+                optional uint32 A_number = 1;
+            }
+
+            repeated InnerTagMessage T_innertag = 4;
+
+            message SecondInnerTagMessage {
+                optional uint32 V_value = 1;
+            }
+            optional SecondInnerTagMessage T_secondinner = 5;
+        }
+
+        optional SomeTagMessage T_sometag = 1;
+    }
+
+* Fields starting with ``T_`` represent XML tags. Their types are protobuf messages
+  which may further contain other protobuf-defined XML tags or attributes.
+
+* Fields starting with ``A_`` represent XML attributes. Most of the time,
+  they use one of the primitive datatypes (e.g. ``uint32``, ``bool``, ``enum``) available in protobuf.
+
+  * If the attribute can take multiple data types, it is encapsulated in a ``oneof`` statement.
+    The field name also has a prefix of ``A_OPTXX_`` where ``XX`` is a number between 0 and 99.
+  * If the attribute name contains special characters, the real name is stored in
+    ``libvirt::real_name``, a custom option which extends ``FieldOptions``.
+  * If an enum value contains special characters, the real value is stored in
+    ``libvirt::real_value``, a custom option which extends ``EnumValueOptions``.
+
+* Fields starting with ``V_`` represent raw text in XML.
+
+  * If ``T_`` and ``V_`` fields are defined in the same message, the ``V_`` field
+    takes precedence only if it has presence; otherwise the ``T_`` fields are
+    processed as usual.
+  * ``V_`` fields can take on the same datatypes as ``A_`` fields.
+
+* ``repeated`` is used to allow multiple XML tags of the same name.
+
+``A_`` fields must always precede ``V_`` and ``T_`` fields. Likewise, ``V_``
+fields must precede ``T_`` fields if any.
+
+On fuzzing the above protobuf definition, one possible protobuf to XML
+serialization could be:
+
+::
+
+    <sometag number='1' name='dummy' type='typeB'>
+      <innertag number='2'/>
+      <innertag number='3'/>
+      <secondinner>1241232</secondinner>
+    </sometag>
+
+Custom Protobuf Datatypes
+-------------------------
+
+Sometimes, primitive data types or enums are not enough to encode the
+desired attribute values, especially if they themselves are structured. In this
+case, such fields are represented by a handwritten protobuf message defined in
+``xml_domain_datatypes.proto``. To serialize these messages to XML attribute
+values, custom handlers are defined in ``proto_custom_datatypes.cc``.
+
+This is useful for data types such as IP addresses, MAC addresses, target
+device names, etc.
+
+Protobuf generation
+===================
+
+``proto`` files are automatically generated at compile time using the script
+``relaxng_to_proto.py``. The script parses RELAX NG schemas to generate a protobuf
+file containing fields and messages representing all the defined XML tags and
+attributes.
+
+The script tries to figure out the correct datatype of the XML attribute.
+However, on its own it can only figure out the general datatype or enum values
+of the attribute but not the constraints or regex patterns. Some override tables
+are present to improve upon that.
+
+Fuzzer Harnesses
+================
+
+Driver-specific harnesses in general re-use the existing test driver setup
+as well as other existing test utilities under ``tests/``. Harnesses are
+available for the following drivers:
+
+* QEMU XML Domain
+* QEMU XML Hotplug
+* CH XML Domain
+* VMX XML Domain
+* libXL XML Domain
+* NWFilter XML
diff --git a/tests/fuzz/README.rst b/tests/fuzz/README.rst
new file mode 100644
index 0000000000..d92cdc94d7
--- /dev/null
+++ b/tests/fuzz/README.rst
@@ -0,0 +1,131 @@
+=======
+Fuzzing
+=======
+
+The XML fuzzing project was built as part of Google Summer of Code 2024.
+The fuzzing project aims to find edge-case XML configurations that may crash
+libvirt during parsing. The libvirt domain XML format is a highly structured
+grammar so normal methods of fuzzing will not work. We use a combination
+of libFuzzer and libprotobuf-mutator to perform structure-aware fuzzing of
+various libvirt XML formats. The XML is represented through an intermediate
+protobuf that is mutated by libprotobuf-mutator. This protobuf is automatically
+generated by a Python script ``relaxng_to_proto.py`` which parses relaxNG
+schemas.
+
+Currently, we fuzz the following:
+
+* QEMU XML Domain (qemu_xml_domain_fuzz, qemu_xml_domain_fuzz_disk, qemu_xml_domain_fuzz_interface)
+* QEMU XML Hotplug (qemu_xml_hotplug_fuzz)
+* CH XML Domain (ch_xml_domain_fuzz)
+* VMX XML Domain (vmx_xml_domain_fuzz)
+* LibXL XML Domain (libxl_xml_domain_fuzz)
+* NWFilter XML (xml_nwfilter_fuzz)
+
+libprotobuf-mutator
+===================
+
+libprotobuf-mutator is the crux of our fuzzing methodology: it is what
+allows us to perform grammar-aware fuzzing of the XML format in the first
+place. However, its setup is a bit involved. The general build and install
+instructions are available at
+https://github.com/google/libprotobuf-mutator/blob/master/README.md
+but they have to be tweaked depending on the distro. One of the biggest
+problems is that most distros have very outdated versions of protobuf
+which will cause various build and linkage issues with the mutator.
+
+- If you are on a rolling release distro, the system package can likely be
+  used as-is. However, you may need to pass ``-std=c++17`` in ``CXXFLAGS``
+  and ``-Wl,--copy-dt-needed-entries`` in ``LDFLAGS``.
+- For every other distro with old protobuf installations, you can supply
+  ``-DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON`` during libprotobuf-mutator
+  setup. After this, provide ``-Dexternal_protobuf_dir=<dir>`` to libvirt
+  meson setup pointing to the ``external.protobuf`` directory generated
+  during libprotobuf-mutator compilation.
+- On some distros like Fedora which predominantly use PIC-compiled
+  libraries, you may need to pass ``-fPIC`` in ``CFLAGS/CXXFLAGS`` or you
+  will encounter relocation errors during libvirt compilation.
+
+Setup
+=====
+
+::
+
+    env CC=clang CXX=clang++ \
+    meson setup build -Dsystem=true -Ddriver_qemu=enabled -Db_lundef=false \
+    -Db_sanitize=address,undefined -Dfuzz=enabled -Dexternal_protobuf_dir=<dir>
+
+- This command line will introduce LLVM SanitizerCoverage across all
+  object files.
+- libFuzzer is supported only on clang/clang++.
+- To use an external protobuf dependency, use
+  ``-Dexternal_protobuf_dir=<dir>``. If your system has a new enough protobuf
+  dependency, you can ignore this.
+- ``b_sanitize`` is not compulsory but it does improve the odds of the fuzzer
+  finding interesting test cases. It is recommended to pass
+  ``address,undefined`` to enable both ASAN and UBSan. Note that ASAN will
+  cut your performance by a factor of 2 on average.
+- You can set ``b_sanitize`` to ``thread`` to enable TSAN which is useful for
+  fuzzing race conditions in the ``qemu_xml_hotplug_fuzz`` fuzzer especially.
+
+NOTE: This has only been tested on x86_64 and aarch64 Linux, but should work
+identically on other architectures and possibly even other UNIX-based OSes
+(BSD, macOS, etc.).
+
+Usage
+=====
+
+Run ``./tests/fuzz/run_fuzz <fuzzer>``.
+
+If the fuzzer finds a crashing test case, it will dump a separate file in your
+working directory. Run
+``./tests/fuzz/run_fuzz <fuzzer> --testcase <file_name>`` to reproduce the crash.
+More options to configure the fuzzer can be found with the ``-h`` flag. To save
+or load a corpus, add ``--corpus <corpus_dir>``.
+
+To merge or minimize corpora, run::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --libfuzzer-options="-merge=1 <dest_corpus> <src_corpus>"
+
+Notable options are listed below.
+
+- ``--arch``: Set architecture of the domain XML to fuzz.
+- ``-j, --jobs``: Run parallel fuzzing workers using either ``jobs`` or
+  ``fork`` based on ``--parallel-mode``. E.g.:
+  ``./tests/fuzz/run_fuzz qemu_xml_domain_fuzz -j8 --parallel-mode fork``.
+- ``--dump-xml``: Print all fuzzed XMLs (useful for debugging reproducers).
+- ``--format-xml``: Exercise format function on XML domain fuzzers.
+- ``--corpus``: Save or use corpus on-disk.
+- ``--libfuzzer-options``: Pass additional libFuzzer flags as documented in
+  https://llvm.org/docs/LibFuzzer.html#options.
+
+Coverage Report
+===============
+
+- libvirt supports instrumenting builds with gcov for coverage data collection
+  using ``-Dtest_coverage=true``. With such a build, a report can be
+  generated as follows::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --total_time=<duration> --corpus=<corpus_dir>
+    ./tests/fuzz/run_fuzz <fuzzer> --corpus=<corpus_dir> --libfuzzer-options="-runs=0"
+    find -name '*.gcda' -exec llvm-cov gcov {} \; # Run in build directory
+    gcovr --gcov-executable "llvm-cov gcov" --html-details coverage.html -r <source_directory>
+
+- Alternatively, we can use clang profile coverage instrumentation
+  enabled with ``-Dtest_coverage_clang=true``. With such a build, a report
+  can be generated as follows::
+
+    ./tests/fuzz/run_fuzz <fuzzer> --total_time=<duration> --corpus=<corpus_dir>
+    ./tests/fuzz/run_fuzz <fuzzer> --corpus=<corpus_dir> --llvm-profile-file=coverage.profraw
+    llvm-profdata merge coverage.profraw -output coverage.profdata
+    llvm-cov show --instr-profile coverage.profdata <objects> --sources <sources> --format html > coverage.html
+
+Tips
+====
+
+- libFuzzer will try to pass comparison checks using its internal TORC
+  (Table of Recent Comparisons), but this can get easily overwhelmed in the
+  case of libvirt due to its code being quite complex. You can alleviate
+  this to some extent by passing ``--use-value-profile`` to the fuzzer.
+- If you want the fuzzer to proceed even after encountering a crash,
+  add ``-j<N> --parallel-mode=fork``. Do note that the memory usage will
+  increase with each additional parallel fuzzing worker.
--
2.34.1