---
## 1.1 Foreword
The intent of this document is to provide instructions for the replication of the study described in “A Socio-technical Perspective on Software Vulnerabilities: A Causal Analysis,” hereafter called the “Original Study.”

Given the non-determinism of search algorithms provided in our primary tool for …

We thus had an opportunity to follow our own replication package, but with a newer version of the main tool. What we learned was this: the details will change (e.g., which specific social smells in the current time period affect which specific work-rate variables in the next time period), but the overall conclusions documented at the end of our study still hold. In particular, quoting from the study’s conclusions:

> “results: …social smells are indeed important factors that mediate significant project outcomes in terms of the incidence of and the effort associated with security vulnerabilities.”
## 1.2 Why do Causal Discovery and Causal Inference?
Causal inference has entered the research methodology discourse in fields as diverse as econometrics and epidemiology, and indeed almost all leading research in these two fields is now conducted in this more rigorous setting.

A more technical discussion of how Causal Discovery and Causal Inference work can be found in the references. Here it suffices to note that they are entering widespread use in:

* Epidemiology, where the way one establishes medical facts, guidance, and policy can literally be a matter of life or death.
* Economics, where major decisions are made regarding American and international economic policy.

For two recent examples, we cite a paper by Miguel Hernán and colleagues that was selected as one of the most influential papers in 100 years of the American Journal of Epidemiology [2, 3], and the award of the 2021 Nobel Prize in Economic Sciences to Joshua D. Angrist and Guido W. Imbens [4].

For our purposes, however, the following explanation may suffice:

> Regarding the use of Causal Discovery algorithms such as FGES (Fast Greedy Equivalence Search): when evaluating candidate edges, causal discovery conditions on other variables (“conditions” here is the verb, not the noun) to decide which new edges best improve the score of the intermediate-stage graph during search. It evaluates whether a given pair of variables remains correlated even after conditioning on a subset of the other variables. If the partial (conditional) correlation vanishes even once, no edge is credited to that variable pair. Thus, each edge in the resulting graph (a Directed Acyclic Graph, or DAG [3]) has had to survive a “gauntlet” of conditional independence tests (or their score-based equivalent), with a significant partial correlation found each time; otherwise the edge would have been removed. Note that each such conditioning constitutes an alternative explanation for how causality plays out. Although it takes more computation time, Causal Discovery is superior to ordinary (marginal) correlation between variable pairs in a research study, because it eliminates many alternative causal hypotheses for how the values attained by one variable affect or drive what happens with a second variable.

Continuing our answer to the question “Why causal discovery and causal inference?”:

> Likewise, causal discovery leads to superior covariate handling in linear and logistic regression [5, 6]. Key to estimating the direct causal effect of a treatment on a response by linear regression is determining which covariates to use. Typically, the data analyst experiments with different combinations of variables as covariates, but conditioning on the wrong set can create spurious associations (and hence spurious edges) between variable pairs.

For these reasons, we employed Causal Discovery in our analysis of the OpenSSL dataset.
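The “gauntlet” idea from the FGES passage above can be illustrated with a small sketch. It is a simplification under stated assumptions, not FGES itself: FGES is a score-based search, whereas this sketch uses the conditional-independence-test formulation that the passage names as the counterpart of scoring (a PC-style check restricted to conditioning sets of size 0 and 1, with a Fisher z-test on partial correlations). The helper names (`simulate_common_cause`, `surviving_edges`, and so on) and the variables `cause`, `x`, and `y` are hypothetical; they are not taken from the OpenSSL dataset or from any causal-discovery library.

```python
import math
import random
from itertools import combinations


def simulate_common_cause(n=2000, seed=0):
    """Hypothetical data: `cause` drives both `x` and `y`; x and y share no direct link."""
    rng = random.Random(seed)
    cause = [rng.gauss(0, 1) for _ in range(n)]
    return {
        "cause": cause,
        "x": [c + rng.gauss(0, 1) for c in cause],
        "y": [c + rng.gauss(0, 1) for c in cause],
    }


def pearson(a, b):
    """Ordinary (marginal) Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def partial_corr(data, x, y, z=None):
    """Correlation of x and y, conditioning on at most one variable z."""
    r_xy = pearson(data[x], data[y])
    if z is None:
        return r_xy
    r_xz = pearson(data[x], data[z])
    r_yz = pearson(data[y], data[z])
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def looks_independent(r, n, k, alpha):
    """Fisher z-test: True if correlation r (conditioning-set size k) is not significant."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return p_value > alpha


def surviving_edges(data, alpha=0.01):
    """Keep a pair only if it stays correlated under every test in the gauntlet."""
    names = list(data)
    n = len(data[names[0]])
    edges = []
    for x, y in combinations(names, 2):
        # Gauntlet: no conditioning, then conditioning on each other variable in turn.
        gauntlet = [None] + [v for v in names if v not in (x, y)]
        if all(
            not looks_independent(partial_corr(data, x, y, z), n, 0 if z is None else 1, alpha)
            for z in gauntlet
        ):
            edges.append((x, y))
    return edges


data = simulate_common_cause()
print(round(pearson(data["x"], data["y"]), 2))          # clearly positive (about 0.5)
print(round(partial_corr(data, "x", "y", "cause"), 2))  # close to 0
print(surviving_edges(data))
```

Here `x` and `y` are marginally correlated because both depend on `cause`, so a plain correlation analysis would link them. Once `cause` is conditioned on, their partial correlation is expected to fall to roughly zero, so the `x`–`y` pair fails one test in the gauntlet and receives no edge, while the `cause`–`x` and `cause`–`y` pairs survive every test.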
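The covariate-selection point above can be illustrated with one well-known way in which conditioning on the wrong set creates a spurious association: adjusting for a collider, that is, a common effect of the treatment and the response. The text does not say which mechanism it has in mind, so treat this as one hypothetical illustration. The helper names below are also hypothetical, and ordinary least squares is computed by hand with the standard library only.

```python
import random


def simulate_collider(n=5000, seed=0):
    """Hypothetical data: x and y are independent; c is their common effect."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return x, y, c


def _centered(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def slope_on_x(y, x):
    """OLS slope of y on x alone (the intercept is absorbed by centering)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def slope_on_x_adjusting_for(y, x, c):
    """OLS coefficient of x when y is regressed on both x and covariate c.

    Solves the 2x2 normal equations on centered data by Cramer's rule.
    """
    xc, yc, cc = _centered(x), _centered(y), _centered(c)
    sxx, scc, sxc = _dot(xc, xc), _dot(cc, cc), _dot(xc, cc)
    sxy, scy = _dot(xc, yc), _dot(cc, yc)
    det = sxx * scc - sxc * sxc
    return (sxy * scc - sxc * scy) / det


x, y, c = simulate_collider()
print(round(slope_on_x(y, x), 2))                   # close to 0: x has no effect on y
print(round(slope_on_x_adjusting_for(y, x, c), 2))  # close to -0.5: spurious association
```

In this simulated setup `x` has no effect on `y`, so the unadjusted slope is near zero. Adding `c` as a covariate yields a coefficient on `x` near -0.5 (the population value implied by the simulation's unit variances), an association created purely by the choice of covariate. A causal graph that correctly recovered `x → c ← y` would mark `c` as a common effect of `x` and `y`, and hence as a variable to leave out of the adjustment set.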
## 1.3 References
1. Muthén, L. K., & Muthén, B. O. (1998–2017). *Mplus User's Guide* (8th ed.). Los Angeles, CA: Muthén & Muthén.
2. American Journal of Epidemiology (2021). “100 Years of the American Journal of Epidemiology.” Oxford Academic (oup.com).
3. Hernán, M. A., Hernández-Díaz, S., Werler, M. M., & Mitchell, A. A. (2002). Causal Knowledge as a Prerequisite for Confounding Evaluation: An Application to Birth Defects Epidemiology. *American Journal of Epidemiology*, 155(2), 176–184. https://doi.org/10.1093/aje/155.2.176 (Selected as one of the most influential papers in the history of the journal.)
4. Nobel Prize Organization (2021). “Press Release: The Prize in Economic Sciences 2021.” Awarded to Joshua D. Angrist and Guido W. Imbens “for their methodological contributions to the analysis of causal relationships.” NobelPrize.org.