SoB: Week 5

Overview

Last week, I set out to resolve holdups in the existing tests and investigate potential bugs, and I’m happy to report that I reached solid conclusions on most of them. In addition, I addressed Matt’s feedback on the branches I had already pushed upstream and worked on a couple of new targets.

Goals accomplished

Fix the handle_peer_error_or_warning() target: I finally had a chance to look at this target, and contrary to what I said last week, the exiting behavior of the target function isn’t inappropriate—it’s intended to exit from the subdaemon and return control to the master daemon, lightningd. I found a way to work around this behavior by unlinking the object containing the error function and redefining it in the target file to jump back to a return point using setjmp.h. The PR for this target is now ready to be merged and can be accessed at PR #8304.
Investigated the potential bug in initial_channel_tx(): I revisited this issue but unfortunately still couldn’t determine whether it’s an actual bug or a problem with the target itself. I meant to ask Matt for help, but wasn’t able to this week due to other commitments. I’ll make sure to reach out to him next week.
Fixed the bug in route calculation: Last week, we confirmed that this was an actual bug that could be triggered from external sources. This week, the focus was on investigating the severity of the bug—specifically, what would happen if the project were compiled in a production setting without UBSan. We concluded that, while the bug may cause a couple of routes to be excluded from the route calculation process, it doesn’t result in anything catastrophic like a crash. The worst-case scenario is some inconvenience.

I’ve worked on a rudimentary fix and a test case to demonstrate the issue, both of which can be found in PR #8357.
Fixed the bug in tlv_span(): While addressing Matt’s feedback on a new fuzz target for tlv_span() last week, I encountered a breakage that appeared to be a genuine bug—and Matt confirmed it to be one. The next steps were to create a test to demonstrate the issue and implement a fix. The fix was straightforward (Matt pointed it out), and the test case was simple as well, so both were completed within the week. The changes are available in PR #8364.

However, the fuzzing work for this isn’t fully complete yet. I still need to address a couple of points from Matt’s feedback. Since the feedback came in late, I wasn’t able to handle it this week, but I’ll make sure to wrap it up in the next.

Addressed received feedback: I received a lot of feedback on my previous PRs this week, like the fuzz target for calculate_our_funding(), and a significant portion of my time was spent addressing that feedback. A couple of those PRs have already received an ACK, but there’s still some work left to be done on the others. I’ll aim to finish the remaining work by the end of next week and hopefully secure ACKs for those as well.
Added a target for parse_onionmsg: Last week, Matt suggested creating a fuzz target for the handler of onion messages. Since there are a couple of handlers operating at different levels, I chose to fuzz onion_message_parse(), as it’s the function responsible for parsing the untrusted input. The work was straightforward and didn’t take much time. I plan to push this change in the near future.
Added targets for open_channel and open_channel2 handlers: I added new fuzz targets for the handlers of open_channel and open_channel2 messages, as defined in BOLT #2. Once the first was done, the second was fairly straightforward since both handlers work with similar input types and structures. I plan to push these upstream for review soon.
Fixed fuzz-close_tx: This fix was a byproduct of the feedback on the fuzz target for calculate_our_funding(), where the maximum value for a satoshi was overflowing due to being stored in a u32. Since I had copied the max value from this target, I needed to fix the overflow here as well. The work is completed, and I’ll push the change in the next week.
Added wire targets for tlv_invoice_request and tlv_offer: While reviewing bolt12-wiregen.h, I noticed a few message types that had wire operations defined but weren’t being fuzzed in the current setup. I decided to add fuzz targets for a couple of them to see if anything would break, but nothing did. Since these messages all rely on the same underlying operations, I felt it would be redundant to add targets for the rest. That said, I still plan to push the targets I’ve already worked on in the near future.

Next week’s goals

I had to put my strategy of incrementally reviewing existing tests and working toward writing new ones on hold last week, as many of the tests I was working on were blocked by various errors. I believe I’ve now resolved most of those issues, so it should be fine to resume work on adding new fuzz targets. With this being said, here are some of my immediate goals for the next week:

Fix the tlv_unknown target: Toward the end of last week, I began working on a fuzz target for the tlv_unknown message type, drawing inspiration from common/test/run-tlv_unknown. The target ran into some issues, and since I was at the end of the week, I didn’t have time to fix them. I plan to address those problems in the coming week.
Report to Matt about the potential bug in initial_channel_tx(): I believe I’ve exhausted all my ideas for uncovering this bug, so at this point, I’ll need to refer to Matt for help. I’ll put together a write-up summarizing everything I’ve discovered so far and reach out to him—hopefully, we can sort this error out.
Follow up on feedback: I received some additional feedback on my PRs toward the end of the week, which I plan to address in the coming week—hopefully, that will wrap things up for those PRs.
Work a stateful target for gossipd: Among the fuzz target ideas Matt shared with me last week was one for adding a stateful target that mimics handle_recv_gossip(), which is responsible for handling incoming gossip messages from peers. After reading up on the function, I agree that the idea is worth pursuing and have started working on it. So far, I’ve created the test file and completed some initial setup, and I plan to finish the work on this target in the coming week.

Challenges

Toward the end of the week, I realized that I’d been somewhat avoiding the more complex targets over the past few weeks, mainly because of the challenging setup they require. But looking back, I’ve already added and improved a substantial number of fuzz targets during this project—so it would be shortsighted not to take on the tougher ones.

After all, there’s no guarantee that anyone else will pick up fuzzing in CLN once my project concludes. With that in mind, I think it’s worth banging my head and making the extra effort now to tackle the setup hurdles involved in these more complex tests, so I can uncover as many vulnerabilities as possible while I still have the opportunity.

I was a bit disheartened that none of the work I did this week led to the discovery of new bugs, but that’s okay—one thing I’ve come to learn over the past few weeks is that fuzz testing is a game of patience. You have to keep pushing, and eventually, you’ll get results—that’s how I discovered my first bug, after all.

With the experience I’ve gained from working on the simpler targets, I’ve decided to start tackling more complex ones now. I’m hoping that this next phase will help uncover a more substantial vulnerability in CLN.

Till next time,

Chandra.

SoB: Week 4

SoB: Week 6