SoB: Week 4

Overview

Last week, I set out to resolve the errors in the existing tests, and I’m happy to report that I reached solid conclusions on most of them. In addition, I completed the review of all remaining non-wire fuzzers and addressed Matt’s feedback on the branches I had already pushed upstream. All in all, I believe this was one of my more productive weeks in the project so far.

Goals accomplished

Discussed the handle_peer_error_or_warning() target: I had planned to discuss this issue with Matt before deciding whether to continue working on this target, and through our discussion, I came to understand that the target’s behavior when parsing an error message is indeed inappropriate—contrary to what I initially thought. Since the meeting took place toward the end of the week, I didn’t get a chance to review the target again, but I plan to do so in the upcoming week.
Reviewed the ping_pong and peer_init tests: Aside from the peer_error and peer_warning message handlers, the other BOLT #1 handler tests that were failing were for the ping-pong and init messages. I had to scale down the scope of testing in the ping-pong handler test to get it working, but in the end, I was able to get both of these handlers functioning correctly.
Investigated the potential bug in route calculation: To my pleasant surprise, this turned out to be an actual bug in the gossip store. It took some time to thoroughly verify the issue, create a high level test to trigger the bug, and prepare a write-up for Matt to understand the situation, but we’ve now confirmed that this is indeed a real bug that needs fixing—so none of the time invested was wasted.
Investigate the potential bug in initial_channel_tx(): I reviewed this test at the very beginning of the week, so I don’t clearly recall all the specifics, but I do remember that I wasn’t able to reach a concrete conclusion. I’ll likely need to discuss a few aspects with Matt to clarify the path forward.
Addressed received feedback: I received feedback on a couple of my previous PR’s—like the funder-policy fuzz test I worked on in week 2—and spent time addressing them to secure an ACK for each update. The work isn’t finished yet because I need some clarification on some of the feedback and the reviewer(s) haven’t replied yet, so this is a WIP until the next week.

A point of interest is that while I was addressing feedback on one of my PRs—specifically the new test for bolt12-tlvspan—I stumbled upon what appears to be another potential bug. I spent some time investigating the breakage and putting together a detailed write-up of the scenario. I believe it was time well spent, as I’m fairly confident this is a genuine bug.
Reviewed descriptor_checksum and close_tx tests: These test fuzz test the code in common/descriptor_checksum.h and common/close_tx.h respectively. Since they already cover their target files thoroughly, and I believe there is little room for hidden bugs, I marked these tests as reviewed for now.
Reviewed bolt11 test: This was an easy test to review. I updated it by adding a check for a function—bolt11_encode() to be exact—that was untested by the current setup and making a few other minor improvements. I plan to push the changes in the near future.
Reviewed channel_id tests: This was another easy test to review. I improved it by removing hard-coded values and making the existing wire test round-trip. I plan to push the changes in the near future.

Next week’s goals

I plan to put my current strategy—incrementally reviewing existing tests and working toward writing new ones—on hold for the next week. Many of the tests I’ve worked on are currently blocked by various errors, so I believe it’s better to focus on resolving those issues first before moving on to working on new tests. I’ll still probably review a test or two, but I plan on focusing mainly on resolving the outstanding issues. With this being said, here are some of my immediate goals for the next week:

Fix the handle_peer_error_or_warning() target: I plan to revisit this target and investigate the source of the exit() calls. I’ll aim to reach a definite conclusion, but if I’m unable to do so, I’ll bring it up with Matt to decide on the next steps.
Investigate the potential bug in initial_channel_tx(): I took a look at this last week but wasn’t able to reach any concrete conclusions. I’ll give it another attempt in the coming week, and if I’m still stuck, I’ll reach out to Matt for help.
Investigate the potential bug in route calculation: We’ve already confirmed that this is a real bug, and Matt believes it’s potentially triggerable from external sources. The next step, as per Matt’s guidance, is to determine the severity of the bug—that is, to understand what happens when UBSan isn’t present to catch the issue and the erroneous value is allowed to propagate down the call stack. Specifically, we need to see how it affects the route calculation process and whether it can lead to any significant consequences. Once we have a clear picture of the impact, we’ll decide whether to report the bug publicly on the CLN GitHub or through their security mailing list.
Follow up on feedback: I expect the reviewers to respond to my questions on their feedback sometime next week. Once I hear back, I’ll follow up accordingly and aim to wrap up the work on the outstanding targets.
Explore Matt’s ideas for fuzz targets: I’ve completed reviewing the non-wire fuzzers, and since Matt had been suggesting for a while during our meetings that I look into some of his ideas for new fuzz targets, I decided to pause the review of the wire fuzzers for now and focus on his suggestions instead. Given his deeper knowledge of the Lightning protocol, he likely has a better sense of which areas are worth fuzz testing and where hidden bugs are more likely to exist. He emailed me a couple of ideas yesterday, which I plan to explore in the upcoming week.

Challenges

The work I got done this week on the crashing targets was fairly satisfying, though I didn’t manage to wrap up all of them—which is a bit disappointing. Still, I’m confident I can sort out the remaining issues in the next week.

One thing that’s become increasingly clear the deeper I get into this project is how difficult it is to trigger potential bugs from a higher level. I brought this up with Matt in our last meeting, and he confirmed what I was suspecting: the best chances lie in modifying either the existing unit tests or the existing Python tests. Unit tests are definitely easier to work with, though there aren’t many of them to go around. On the other hand, while there are more Python tests, they’re harder to navigate—mostly because I’m still relatively new to Python.

Another major challenge has been isolating functions for testing. My earlier work on stubbing the wire fuzzers helps with this somewhat, but there’s still a lot of boilerplate and plumbing work needed to make even basic function-level testing possible. In some cases, it’s not worth pursuing a test at all, simply because the required setup is too complex or time-consuming relative to the value of the test.

This is why many of my more recent fuzz tests have been reinterpretations of existing unit tests. These tests already have a lot of the groundwork in place and serve as useful documentation of expected behavior. So far, this approach has proven to be quite productive, and I plan to stick with it and expand on it in the coming weeks.

I uncovered my first real bug in CLN the week before last, a not-very-serious one in week 2, and this week brought another confirmed bug along with a promising lead on a potential one—so I’m feeling pretty good about the value of my work on this project so far. I hadn’t explored security this seriously before, but I’m genuinely enjoying it. That said, the work is far from done. I’m planning to raise the bar for myself and aim to uncover several more bugs before the project wraps up.

One thing that has been a bit discouraging is the apparent indifference from the Core Lightning maintainers toward both my work and fuzzing in general, even though the effort has already exposed multiple bugs in their codebase. It’s frustrating, but not something I can control, so I’m focusing on the work itself and making the most of the opportunity.

Till next time,

Chandra.

SoB: Week 3

SoB: Week 5