[SC-L] BSIMM update (informIT)
Steven M. Christey
coley at linus.mitre.org
Tue Feb 2 16:12:22 EST 2010
On Tue, 2 Feb 2010, Wall, Kevin wrote:
> To study something scientifically goes _beyond_ simply gathering
> observable and measurable evidence. Not only does data need to be
> collected, but it also needs to be tested against a hypothesis that offers
> a tentative *explanation* of the observed phenomena;
> i.e., the hypothesis should offer some predictive value. Furthermore,
> the steps of the experiment must be _repeatable_, not just by
> those currently involved in the attempted scientific endeavor, but by
> *anyone* who would care to repeat the experiment. If the
> steps are not repeatable, then any predictive value of the study is lost.
I believe that the cross-industry efforts like BSIMM, ESAPI, top-n lists,
SAMATE, etc. are largely at the beginning of the data collection phase.
It shouldn't be much of a surprise that many companies participate in
two or more of these efforts (though it is also a bit disconcerting, but
that's probably what happens in brand-new areas).
Ultimately, I would love to see the kind of linkage between the collected
data ("evidence") and some larger goal ("higher security," whatever THAT
means in quantitative terms) but if it's out there, I don't see it, or
it's in tiny pieces... and it may be a few years before we get to that
point. CVE data and trends have been used in recent years, or should I
say abused or misused, because of inherent bias problems that I'm too lazy
to talk about at the moment.
In CWE, one aspect of our research is to tie attacks to weaknesses,
weaknesses to mitigations, etc. so that there is better understanding of
all the inter-related pieces. So when you look at the CERT C coding
standard and its ties back to CWE, you see which rules directly
reduce/affect which weaknesses, and which ones don't. (Or, you *could*,
if you wanted to look at it closely enough).
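As a rough sketch of what that rule-to-weakness linkage could look like as
data, here is a small Python example. The rule and weakness identifiers are
made-up placeholders, not real CERT C or CWE entries, and the function names
are hypothetical.

```python
# Hypothetical links from coding rules to the weaknesses they reduce.
# All identifiers are placeholders, not real CERT C rules or CWE entries.
RULE_TO_WEAKNESSES = {
    "RULE-A": {"WEAKNESS-1", "WEAKNESS-2"},
    "RULE-B": {"WEAKNESS-2"},
    "RULE-C": set(),  # a rule with no direct tie to any listed weakness
}


def rules_by_weakness(rule_map):
    """Invert the mapping: for each weakness, which rules affect it?"""
    inverted = {}
    for rule, weaknesses in rule_map.items():
        for weakness in weaknesses:
            inverted.setdefault(weakness, set()).add(rule)
    return inverted


def rules_without_links(rule_map):
    """List the rules that do not directly affect any weakness."""
    return sorted(rule for rule, weaknesses in rule_map.items() if not weaknesses)


if __name__ == "__main__":
    print(rules_by_weakness(RULE_TO_WEAKNESSES))
    print(rules_without_links(RULE_TO_WEAKNESSES))
```

Inverting the mapping answers "which rules touch this weakness," and the
empty sets answer "which rules don't touch any."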
The 2010 OWASP Top 10 RC1 is more data-driven than previous versions; same
with the 2010 Top 25 (whose release has been delayed to Feb 16, btw).
Unlike last year's Top 25 effort, this time I received several sources of
raw prevalence data, but unfortunately it wasn't in sufficiently
consumable form to combine.
In tool analysis efforts such as SAMATE, we are still wrestling with the
notion of what a "false positive" really means, not to mention the
challenge of analyzing mountains of raw data, using tools that were
intended for developers in a third-party consulting context, combined with
the multitude of perspectives in how weaknesses are described (e.g., what
do you do if there's a chain from weakness X to Y, and tool 1 reports X,
and tool 2 reports Y?)
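One possible answer to the chain question, sketched in Python: treat two
findings as the same issue when their weaknesses are identical or connected
by a known chain. This is only an illustration under assumptions of my own.
The chain table, the (location, weakness) finding format, and the
requirement that both tools report the same location are all hypothetical;
real tools often flag X and Y at different lines.

```python
from collections import deque

# Hypothetical chain edges: weakness X can lead to Y, and Y can lead to Z.
CHAINS = {"X": {"Y"}, "Y": {"Z"}}


def reachable(start, chains):
    """Return every weakness reachable from start by following chain edges."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in chains.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def same_issue(finding_a, finding_b, chains=CHAINS):
    """Decide whether two (location, weakness) findings describe one issue.

    Assumption: findings match only if they share a location and their
    weaknesses are equal or linked by a chain in either direction.
    """
    (loc_a, weak_a), (loc_b, weak_b) = finding_a, finding_b
    if loc_a != loc_b:
        return False
    return (
        weak_a == weak_b
        or weak_b in reachable(weak_a, chains)
        or weak_a in reachable(weak_b, chains)
    )


if __name__ == "__main__":
    tool1 = ("parse.c:120", "X")
    tool2 = ("parse.c:120", "Y")
    print(same_issue(tool1, tool2))
```

Whether a chain-linked pair should count as one true positive or two is
exactly the open question; the sketch just makes the choice explicit.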
> In fact, I am willing to bet that the different members of my
> Application Security team who have all worked together for about 8 years
> would answer a significant number of the BSIMM Begin survey questions
> quite differently.
Even surveys using much lower-level detailed questions - such as which
weaknesses on a "nominee list" of 41 are the most important and prevalent
- have had distinct responses from multiple people within the same
organization. (I'll touch on this a little more when the 2010 Top 25 is
released). Arguably many of these differences in opinion come down to
variations in context and experience, but unless and until we can model
"context" in a way that makes our results somewhat shareable, we can't get
beyond the data collection phase.
I for one am pretty satisfied with the rate at which things are
progressing and am delighted to see that we're finally getting some raw
data, as good (or as bad) as it may be. The data collection process,
source data, metrics, and conclusions associated with the 2010 Top 25 will
probably be controversial, but at least there's some data to argue about.
So in that sense, I see Gary's article not so much as a clarion call for
action to a reluctant and primitive industry, but as an early announcement of
a shift that is already underway.
- Steve