Bug bounty future: taking a step back to move forward

There has been a lot of discussion recently about the future of bug bounties in light of AI capabilities, particularly for open source projects.

A domino effect: more than meets the eye

Rapidly improving AI capabilities, and the volume of reports they generate, are pushing existing vulnerability disclosure programs well beyond their limits.
Yet, if you have been convinced for years of the value of your bug bounty, the rational decision should be trying to optimize it, not to suspend it.

The domino effect we are observing is telling another story. Let’s try to figure out the assumptions it could be based on:

1. Bug bounty programs can’t scale in the AI Vulnerability Storm era
2. ROI is not worth it and it becomes more visible with higher volumes
3. Momentum to get rid of this painful/boring activity (as triage and remediation are far from fun)

If 2 is true, has the security industry been wrong for years to celebrate bug bounty programs?
If 1 is true, does it mean bug bounty programs only make sense for codebases that are already very secure, where few vulnerabilities are left to discover? But then, do they offer more than marginal risk reduction? What could the ROI be in this case?
If 3 is true, how sustainable can such a practice be? Can it trigger emotional decision making?

Taking a step back

When faced with a dead end, the rational decision is to take a step back before considering giving up.
A quote often attributed to Einstein suggests spending more time on understanding the context: “If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.”

Doing so leads to the following conclusion: “find me what is wrong security-wise” is a very poor problem definition.
Different actors will have distinct interpretations, particularly when they are driven by divergent incentives.

Illustration of the distinct implicit scopes

In an ideal world, we could draw clear boundaries around area A1, which contains only the security issues that are worth fixing for the business.

In the real world, drawing such boundaries is more than challenging. Area A2 ends up being much wider, partly to be on the safe side (not to call it FOMO, because security teams have it in their DNA to reduce risk as much as possible) and partly because it requires less effort. A2 then gets defined iteratively: the scope is fine-tuned to exclude reports that are obviously not in A1, and the threat model is clarified for the borderline ones. (How restrictive a scope is gives a clue about which teams have spent more time on this topic.)

On the other side, researchers will draw their own borders for areas B1, B2 and B3, depending on their experience and motivations.

With AI improvements, curl was first flooded by B2 reports and now by B1 findings.
But B1 noise was already a concern for programs with significant payouts, where large amounts also incentivize B3 reports. Open source projects were somewhat immune because they simply didn’t have the budget. (Paid security researchers could present some findings at major conferences, but their effort was capped by the budget their employer was willing to trade for reputation building.)
With the wave of venture capital flowing into AI security startups, some of those vendors are now pursuing aggressive marketing strategies, burning huge volumes of tokens to discover “critical” open source vulnerabilities in exchange for media coverage.

How could we focus only on area A1?
And what could we do if the discovery rate in area A1 is still greater than the remediation throughput?

Moving forward

Reducing scope to make A2 closer to A1 has already been mentioned.
What about trying, as in this other diagram, to start with a deliberately more restricted scope A0, where findings are impactful by design and can be proven automatically?

And in fact the security world is already familiar with those constraints: the same ones apply to Capture The Flag events.

So why not deploy a version of the software and include flags that a researcher is not supposed to be able to read?
A flag could be another user’s data, an environment variable, the contents of a file, etc. Anything that should not be reachable given your threat model.
By focusing on impact first and vulnerability type later, severity (and therefore payouts) can be decided upfront.
The researcher only has to share the flag, gets confirmation it is valid and only then provides details in a report.
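
The submit-then-confirm flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not how flagADA works: the names `PlantedFlag`, `plant_flag` and `validate_submission`, the `FLAG{...}` format, and the severity and payout values are all hypothetical. The sketch assumes each flag is random, that only its SHA-256 digest is kept on the validation side, and that severity and payout are attached when the flag is planted.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PlantedFlag:
    location: str    # where the flag lives in the deployed instance (hypothetical label)
    severity: str    # decided upfront from the threat model
    payout_usd: int  # decided upfront as well (illustrative value)
    digest: str      # SHA-256 of the flag; the plaintext flag is not stored here


def _digest(flag: str) -> str:
    return hashlib.sha256(flag.encode("utf-8")).hexdigest()


def plant_flag(location: str, severity: str, payout_usd: int) -> Tuple[str, PlantedFlag]:
    """Create a random flag plus the registry entry that describes it."""
    flag = "FLAG{" + secrets.token_hex(16) + "}"
    return flag, PlantedFlag(location, severity, payout_usd, _digest(flag))


def validate_submission(submitted: str, registry: List[PlantedFlag]) -> Optional[PlantedFlag]:
    """Return the matching registry entry, or None if the flag is unknown."""
    candidate = _digest(submitted.strip())
    for entry in registry:
        # Constant-time comparison, so timing does not reveal partial matches.
        if hmac.compare_digest(candidate, entry.digest):
            return entry
    return None


if __name__ == "__main__":
    flag, entry = plant_flag("env:SECRET_TOKEN", "critical", 5000)
    match = validate_submission(flag, [entry])
    print("valid" if match else "invalid", match.severity if match else "")
```

A valid submission immediately yields the severity and payout agreed in advance, so the detailed report only has to explain how the flag was reached.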

For sure this solution is not perfect, nor a one-size-fits-all approach. But could it be more efficient and more effective than the status quo?
Here is a short, non-exhaustive list of pros and cons.

Pros:

No time wasted: on triage for defenders, or on sending out-of-scope/expected-behaviour reports for researchers
Guiding researchers to focus their effort: clear scope information and severity expectations
Incident response can be triggered ASAP (rather than after triage delay)
Deriving security invariants from threat model: can be reused for automated negative testing or detection rules
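
To illustrate the last point, here is a minimal sketch of a security invariant written as an automated negative test. Everything in it is hypothetical: `NoteStore` is a toy in-memory stand-in for a real application, and the invariant ("a user can only read their own notes") is just one example of what a threat model might state.

```python
from typing import Dict


class NoteStore:
    """Toy in-memory store standing in for the real application."""

    def __init__(self) -> None:
        self._notes: Dict[str, Dict[str, str]] = {}

    def write(self, owner: str, note_id: str, text: str) -> None:
        self._notes.setdefault(owner, {})[note_id] = text

    def read(self, requester: str, owner: str, note_id: str) -> str:
        # The invariant under test: only the owner may read a note.
        if requester != owner:
            raise PermissionError("cross-user read denied")
        return self._notes[owner][note_id]


def invariant_holds(store: NoteStore, flag: str) -> bool:
    """Negative test: the attacker must NOT be able to read the victim's flag."""
    store.write("victim", "secret", flag)
    try:
        leaked = store.read("attacker", "victim", "secret")
    except PermissionError:
        return True  # access was refused, as the threat model requires
    return leaked != flag


if __name__ == "__main__":
    print(invariant_holds(NoteStore(), "FLAG{example}"))
```

The same check that validates a researcher's flag can therefore run in CI, or be turned into a detection rule that fires whenever the flag value shows up where it should not.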

Cons:

You need to know your threat model: otherwise you cannot define what really matters
Missing findings that can’t be validated automatically
Not directly usable for findings with only an Integrity or Availability impact (reading a flag demonstrates a Confidentiality impact)
Hosting costs (if you are not already a SaaS company)

It’s time to move from theory to practice.

Introducing the flagADA tool

Following the advice from Scott Behrens to solve by default, the flagADA tool has been implemented (with Claude Code giving a boost).
Its main goal is to make it easy enough for some open source projects to test this flag approach.

Because only data can prove if this other path makes sense in practice, I am looking for volunteers.
Do not hesitate to send a message to flagada@appsecmatters.com, get in touch publicly via the GitHub repo, or share this post.