
Introduction

The last few years have seen the proliferation of coverage-guided fuzzers: that is, fuzzers that gather information about the internal state of the program under test and use it to generate better test cases. One thing you learn is that coverage information can also work the other way around: looking at a meaningful visualization of code coverage helps you build better drivers for your fuzzers. Moreover, it helps you understand which parts of the program under test need more attention (e.g. which parts to audit next).

I am quite a fan of Sublime Text, and I wanted to look at code coverage in the same environment where I usually look at other code. That is the reason why I wrote a simple plugin to overlay the information generated through source-based Clang coverage in Sublime Text. We will briefly look at how to generate coverage reports, explain why coverage information is useful while fuzzing, and finally take a look at how the plugin works.

Source-Based Clang Coverage

Clang ships with different types of instrumentation that can be used to compute code coverage: gcov, source-based coverage and Sanitizer Coverage. While the latter is the one used as feedback during fuzzing, we will be looking at source-based coverage here, as it is more precise and also the suggested way to inspect code coverage.

As the Clang documentation covers all the information required to use source-based code coverage, I will not spend too much time on that. The main thing we need to know is the process we will follow to generate actionable code coverage information. The following assumes you have a working installation of LLVM, as it makes use of clang, llvm-cov and llvm-profdata.

  1. Compile and link your program with coverage instrumentation:

    clang -o program{,.c} -fprofile-instr-generate -fcoverage-mapping
    

    I really like the fact that the coverage mapping is embedded into the LLVM IR and ends up in the executable itself, without requiring other ancillary files.

  2. Run the instrumented program.

    ./program --your-options
    

    By default, the coverage information is generated in the file default.profraw. You can specify a different path through the environment variable LLVM_PROFILE_FILE.

  3. Index the .profraw file using llvm-profdata:

    llvm-profdata merge -sparse *.profraw -o default.profdata
    
  4. Export the coverage in JSON format using llvm-cov:

    llvm-cov export ./program -instr-profile=default.profdata > coverage_report.json
    

    llvm-cov can also generate HTML reports which can be displayed in the browser. Another random note: you can exclude certain files from the report through the -ignore-filename-regex command line option. This is particularly useful if you have a project with a gazillion files but you are only interested in a few of them.

That’s it! Now we can use the information in the JSON file to look at our coverage.
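
If you are curious about what the export looks like before feeding it to anything else, a few lines of Python are enough to poke at it. The sketch below assumes the layout of the 2.0.0 export format as I understand it (a top-level "data" array whose "files" entries carry a per-file "summary"); the exact key names may differ across LLVM releases.

    # Quick sanity check of an llvm-cov JSON export. Assumes the 2.0.0 layout
    # ("data" -> "files" -> "summary"); key names may vary between LLVM
    # releases, so treat this as a sketch rather than a reference.
    import json
    import sys

    def summarize(path):
        with open(path) as f:
            report = json.load(f)
        for export in report.get("data", []):
            for entry in export.get("files", []):
                lines = entry.get("summary", {}).get("lines", {})
                print("{}: {}/{} lines covered ({:.2f}%)".format(
                    entry.get("filename", "?"),
                    lines.get("covered", 0),
                    lines.get("count", 0),
                    lines.get("percent", 0.0)))

    if __name__ == "__main__":
        summarize(sys.argv[1] if len(sys.argv) > 1 else "coverage_report.json")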


Disclaimer: None of the information above is rocket science or particularly novel: indeed, most of it is documented in the Clang and/or libFuzzer documentation or in the libFuzzer tutorial provided by Google. Head over to those pages if you want to get more information.


Visualizing Fuzzing Coverage

While all the information above applies to any kind of testing, the use case I built this plugin for focuses on fuzzing. That is because I often end up writing drivers to fuzz different libraries, and being able to quickly visualize what code in the library is being exercised is extremely useful to me. Indeed, while most fuzzers provide a huge amount of statistics, I have found that simply looking at what code inside the program is being covered can provide immense benefits.

First of all, it can help you spot bugs in your own fuzzing driver. Yes, drivers can have bugs too, and they can make you waste hundreds of testing hours. For example, consider the situation where your code does not handle all the different return codes some API might produce. In that case, you might not realize that you will never reach the more interesting parts of the testing process, because you are always bailing out on non-existent errors. If this example looks overly specific, it is because it actually happened to me in the recent past.

Moreover, looking at the code coverage helps you learn more about the internals of the code that you are testing. This is especially true for larger libraries. I often write a fuzzing driver by looking at the usual test cases provided with the library that I want to test, or starting from examples that I can find online. However, the same API might have a lot of different nuances which make it possible to reach certain parts of the code only through very specific interactions (e.g. parameters, configurations, setup of stateful objects, etc.). This can be easily spotted by inspecting the exercised code and looking at where the execution diverges, or by identifying which parts of the library are never being executed.

I know, most of this information is quite obvious to people who take fuzzing seriously. However, I thought it was useful to point it out, as I don’t see it mentioned as often as it should be.

So, without further ado, here is how I end up generating code coverage reports by applying the steps described above to my fuzz-testing use case. Here I assume that you have an up-to-date copy of clang (i.e. one that supports -fsanitize=fuzzer) and that you are, indeed, fuzzing with libFuzzer. With different configurations, YMMV.

  1. Compile your fuzz driver and perform your fuzzing campaign:

    clang -o fuzz{,.c} -fsanitize=fuzzer,address
    ./fuzz corpus [options]
    
  2. Now build a copy of your code instrumented to generate coverage information:

    clang -o profiled fuzz.c -fsanitize=fuzzer -fprofile-instr-generate -fcoverage-mapping
    
  3. Now run a single sweep through your current corpus with the coverage-instrumented binary:

    ./profiled corpus -runs=1
    

    This provides more informative results once your corpus has been minimized.

  4. Now generate the coverage report:

    llvm-profdata merge -sparse *.profraw -o default.profdata
    llvm-cov export ./profiled -instr-profile=default.profdata > coverage_report.json
    llvm-cov show ./profiled -instr-profile=default.profdata -format=html > coverage-report.html
    

From here, you can take a look at the generated reports in the way you prefer.
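
Since the last two steps get repeated every time the corpus evolves, I find it handy to wrap them in a few lines of Python. This is just a convenience sketch that follows the binary and file names used in the examples above; adapt it to your own layout.

    # Convenience wrapper around steps 3 and 4 above. Binary and file names
    # ("profiled", "corpus", "coverage_report.json") follow the examples in
    # this post; adjust them to your own setup.
    import glob
    import subprocess

    def refresh_coverage(profiled="./profiled", corpus="corpus"):
        # Single sweep over the current corpus with the coverage-instrumented binary.
        subprocess.run([profiled, corpus, "-runs=1"], check=True)
        # Merge the raw profiles into an indexed .profdata file.
        subprocess.run(["llvm-profdata", "merge", "-sparse",
                        *glob.glob("*.profraw"), "-o", "default.profdata"],
                       check=True)
        # Export the JSON report for the plugin (or any other consumer).
        with open("coverage_report.json", "w") as out:
            subprocess.run(["llvm-cov", "export", profiled,
                            "-instr-profile=default.profdata"],
                           stdout=out, check=True)

    if __name__ == "__main__":
        refresh_coverage()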

The plugin

I had never written a plugin for Sublime Text before, so I took this chance to learn a bit about its API, which allows you to write plugins in Python. As I said on Twitter, I think the API is quite good and fairly easy to work with. Considering that I am nowhere close to being an expert user, I am quite satisfied with the result I got so far.

The main command of the plugin is a subclass of the sublime_plugin.TextCommand class: these are the typical commands that act on the sublime.View containing your code. Once invoked, the command asks the user for the path to a JSON file generated as explained in the previous sections. As of now, the plugin only supports version “2.0.0” of the JSON export format. The structure of the generated JSON is described in the LLVM source.
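
To give an idea of what that looks like, here is a stripped-down skeleton of such a command. It is not the actual plugin code: the class name and the version check are made up for illustration, but the sublime_plugin.TextCommand subclassing and the input panel are the relevant bits.

    # A stripped-down skeleton of a coverage command (illustrative only,
    # not the actual plugin code).
    import json

    import sublime
    import sublime_plugin


    class ShowCoverageCommand(sublime_plugin.TextCommand):
        def run(self, edit):
            # Ask the user for the path to the llvm-cov JSON export.
            self.view.window().show_input_panel(
                "Coverage report (JSON):", "coverage_report.json",
                self.load_report, None, None)

        def load_report(self, path):
            with open(path) as f:
                report = json.load(f)
            if report.get("version") != "2.0.0":
                sublime.error_message("Unsupported coverage export version")
                return
            # ... look up self.view.file_name() among the files in report["data"] ...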

After retrieving the file, the plugin parses the JSON and walks through the files included in the coverage report to find a path matching the currently displayed file. When a match is found, a mapping is built to compute line counts. This is achieved by processing the “segments” included in the coverage report, which are a direct representation of the CoverageSegment structures used within LLVM.
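
In case you are curious how such a mapping could be built, here is a simplified sketch (not the plugin’s exact logic): each segment is roughly an array of the form [line, col, count, has_count, is_region_entry, ...] and stays in effect until the next segment, so you can carry the active count forward line by line. llvm-cov itself handles a few more corner cases (e.g. gap regions).

    # A rough sketch of turning llvm-cov "segments" into per-line counts.
    # Segment layout assumed here: [line, col, count, has_count, is_region_entry, ...].
    # This is a simplification of what llvm-cov (and the plugin) actually do.
    def line_counts(segments, num_lines):
        counts = {}    # line number -> execution count
        active = None  # count currently in effect, None = not instrumented
        idx = 0
        for line in range(1, num_lines + 1):
            wrapped = active  # count carried over from the previous line
            starting = []
            while idx < len(segments) and segments[idx][0] == line:
                _, _, count, has_count = segments[idx][:4]
                if has_count:
                    starting.append(count)
                active = count if has_count else None
                idx += 1
            if starting:
                counts[line] = max(starting)
            elif wrapped is not None:
                counts[line] = wrapped
        return counts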

The information constructed with the mapping is then used to highlight the uncovered regions of the code (which might even be just small parts of certain lines) and to display the count for each line, i.e. how many times that line has been executed during our test run. The colors are picked from scopes that are available by default in every color scheme. The line counts are shown inline through sublime.Phantom objects.
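
For reference, this is more or less how the two pieces can be rendered with the Sublime Text API. Again, a sketch built on the line-count mapping from before; the scope name, region key and phantom markup are arbitrary choices for illustration, not necessarily the ones the plugin uses.

    # Render uncovered lines and per-line counts (illustrative sketch).
    import sublime

    def render(view, counts):
        # Underline lines that were never executed, reusing a color-scheme
        # scope ("markup.deleted" is an arbitrary choice here).
        uncovered = [view.line(view.text_point(line - 1, 0))
                     for line, count in counts.items() if count == 0]
        view.add_regions("coverage_uncovered", uncovered, "markup.deleted", "",
                         sublime.DRAW_NO_FILL | sublime.DRAW_NO_OUTLINE |
                         sublime.DRAW_SQUIGGLY_UNDERLINE)
        # Show the execution count inline at the end of each covered line.
        phantoms = []
        for line, count in counts.items():
            if count > 0:
                end = view.line(view.text_point(line - 1, 0)).end()
                phantoms.append(sublime.Phantom(sublime.Region(end, end),
                                                "{}x".format(count),
                                                sublime.LAYOUT_INLINE))
        phantom_set = sublime.PhantomSet(view, "coverage_counts")
        phantom_set.update(phantoms)
        return phantom_set  # keep a reference, or the phantoms will vanish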

Nothing fancy: both the idea and the implementation are pretty simple. The code is available on GitHub. It is obviously far from perfect, but it does the job. As usual, the code is there if you want to use it, but don’t expect any kind of support 😜

Conclusion

I started writing this post with the idea of quickly talking about the Sublime Text plugin I wrote over the last couple of days. However, along the way I decided to take the chance to talk a bit more about Clang coverage, because coverage information is much more than just a fitness function driving a genetic algorithm. Indeed, even if that sounds fancy enough, factoring in what a human can understand by looking at it is even better, if you ask me.

Feel free to let me know on Twitter if the plugin is useful to you, how you fuzz things and how wrong I am about all of the above 😊