Skip to content
This repository was archived by the owner on Oct 14, 2020. It is now read-only.

Commit cd4af69

Browse files
committed
Added a ADR folder to document design descions. Documents the hook concept descion (WIP) #5
1 parent 7ef0a2f commit cd4af69

File tree

3 files changed

+266
-0
lines changed

3 files changed

+266
-0
lines changed

docs/adr/adr_0000.adoc

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
[[ADR-0000]]
2+
= ADR-0000: Short present tense imperative phrase, less than 50 characters, like a email subject
3+
4+
[cols="h,d",grid=rows,frame=none,stripes=none,caption="Status",%autowidth]
5+
|====
6+
// Use one of the ADR status parameter based on status
7+
// Please add a cross reference link to the new ADR on 'superseded' ADR.
8+
// e.g.: {adr_suposed_by} <<ADR-0000>>
9+
| Status
10+
| PROPOSED | ACCEPTED | REJECTED | DEPRECATED | SUPOSED_BY <<ADR-0000>>
11+
12+
| Date
13+
| YYYY-MM-DD
14+
15+
| Author(s)
16+
| John Doe <john.doe@snafu.com> +
17+
jane Doe <jane.doe@snafu.com>
18+
// ...
19+
|====
20+
21+
== Context
22+
23+
<What is the issue that we're seeing that is motivating this decision or change.>
24+
25+
== Decision
26+
27+
<What is the change that we're actually proposing or doing.>
28+
29+
== Consequences
30+
31+
<What becomes easier or more difficult to do because of this change.>

docs/adr/adr_0001.adoc

Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,213 @@
1+
[[ADR-0000]]
2+
= ADR-0000: How can we introduce a more general extension concept for data proccessing modules?
3+
4+
[cols="h,d",grid=rows,frame=none,stripes=none,caption="Status",%autowidth]
5+
|====
6+
// Use one of the ADR status parameter based on status
7+
// Please add a cross reference link to the new ADR on 'superseded' ADR.
8+
// e.g.: {adr_suposed_by} <<ADR-0000>>
9+
10+
| Status
11+
| ACCEPTED
12+
13+
| Date
14+
| 2020-05-20
15+
16+
| Author(s)
17+
| Jannik Hollenbach <Jannik.Hollenbach@iteratec.com>,
18+
Jorge Estigarribia <Jorge.Estigarribia@iteratec.com>,
19+
Robert Seedorff <Robert.Seedorff@iteratec.com>
20+
|====
21+
22+
== Context
23+
24+
=== Status Quo
25+
26+
One major challange implementing the _secureCodeBox_ is to provide an flexible and modular architecture which enables the open source community to easily understand the concepts and especially to extend the _secureCodeBox_ with individual features. Therefore we decided to seperate the process stages of an single security scan (instance of scanType CRD) in 3 major phases:
27+
```
28+
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
29+
│ Scanning ├─────────▶│ Parsing ├─────────▶│ Persisting │
30+
│ (Phase 1) │ │ (Phase 2) │ │ (Phase 3) │
31+
└──────────────────┘ └──────────────────┘ └──────────────────┘
32+
```
33+
By now the "Persisting Phase 3" was implemented by so called _persistenceProviders_ e.g the *persistence-elastic* provider which is responsible for persisting all findings in a given elasticsearch database. The _secureCodeBox_ Operator is aware of this 3 phases and is responsible for the state model and execution of each security scan (instance of scanType CRD).
34+
35+
=== Problem and Question
36+
37+
We identified different additional UseCases with a are more _data proccessing oriented_ pattern than the implemented _persisting phase3_ indicates. For example we implemented a so called "MetaDataProvider" feature which is responsible for enhancing each security finding with additional metadata. But the MetaDataProvider must be executed after the _parsing phase 2_ and before the _persisting phase 3_ because it depends on the parsed finding results (which will be enhanced) and the update findings should be persisted also.
38+
39+
40+
To find a proper solution we splitted the topic into the follwong two questions:
41+
42+
* Question 1: Should we unify the concepts MetaDataProvider and PersistenceProvider?
43+
* Question 2: How should the execution model look like for each?
44+
45+
==== Question 1: Should we unify the concepts MetaDataProvider and PersistenceProvider?
46+
==== Solution approach 1: Unify
47+
48+
Both "modules" are "processing" the security findings which are generated in the parsing phase.
49+
But there is one larger difference between them:
50+
51+
* `PersistenceProvider` is proccesing the findings with a *ReadOnly* pattern
52+
* `MetaDataProvider` is proccesing the findings with a *ReadAndWrite* pattern
53+
54+
There is a similar thing in kubernetes called [AdmissionController](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/), with the exception that the are executed before a resource gets created.
55+
56+
The are two variants:
57+
58+
* `ValidatingWebhookConfiguration` (ReadOnly) *Executed Last*
59+
* `MutatingWebhookConfiguration` (ReadAndWrite) *Executed First*
60+
61+
We could do a similar thing and introduce CRD which allows to execute "custom code" (generally speaking, depends on the second question) after a scan has completed (meaning scan and parser phase are done). Some name ideas:
62+
63+
- `ScanHooks`
64+
- `ScanCompletionHooks`
65+
- `FindingProcessors`
66+
67+
These could than have a `type` attribute which declares if they are `ReadOnly` or `ReadAndWrite`.
68+
69+
The operator would process all these resources in the namespace and execute the `ReadAndWrite` ones first (in serial: one at a time to avoid write conflicts) and then the `ReadOnly` ones (in parallel).
70+
71+
```yaml
72+
apiVersion: execution.experimental.securecodebox.io/v1
73+
kind: ScanCompletionHook
74+
metadata:
75+
name: my-metadata
76+
spec:
77+
type: ReadAndWrite
78+
# If implemented like the current persistence provider
79+
image: my-metadata:v2.0.0
80+
```
81+
82+
The Execution Flow would then look something like this:
83+
84+
```
85+
┌ ReadOnly─Hooks─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
86+
┌ ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │
87+
┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │
88+
┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │
89+
│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐
90+
└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │
91+
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘
92+
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
93+
```
94+
95+
**Pros:**
96+
97+
- Only one Implementation
98+
- Pretty Generic to expand and test out new ideas without having to modify the secureCodeBox Operator
99+
100+
**Cons:**
101+
102+
- Possible "over abstraction"?
103+
- Need to refactor the ElasticSearch PersistenceProvider
104+
- The "General Implementation" will be harder than the individual ones -
105+
106+
==== Solution approach 1: Keep Split between Persistence Provider and MetaData Provider
107+
108+
Keep PersistenceProvider as they are and introduce new "MetaDataProvider" CRD which gets executed before the PersistenceProviders by the operator.
109+
110+
```
111+
┌ Persistence Provider─ ─ ─ ─ ─ ─ ─ ─
112+
┌ MetaData Provider ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │
113+
┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │
114+
┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │
115+
│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐
116+
└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │
117+
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘
118+
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
119+
```
120+
121+
**Pros:**
122+
123+
- Quicker to implement
124+
- Might be worth it to have a seperate concept for it
125+
126+
**Cons:**
127+
128+
- Really worth introducing a new CRD for everything, especially when the are conceptually pretty close?
129+
130+
=== Question 2: How should the execution model look like for each?
131+
132+
==== Solution approach 1: Like the persistence provider
133+
134+
Basically a docker container which process takes two command line args:
135+
136+
* A pre-signed URL to download the findings from
137+
* A pre-signed URL to upload the modified findings to
138+
139+
Examples:
140+
141+
* Node.js `node my-metadata.js "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
142+
* java `java my-metadata.jar "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
143+
* golang `./my-metadata "https://storage.googleapi.com/..." "https://storage.googleapi.com/..."`
144+
145+
**Pros:**
146+
147+
* on liner with the current implementations
148+
* code overhead / wrapper code is pretty minimal
149+
* zero scale - no resource costs when nothing is running
150+
151+
**Cons:**
152+
153+
* results in too many k8s jobs?
154+
** resource blocking on finished resources
155+
** ttlAfterFinished enabled
156+
* container runtime overhead (especially time)
157+
158+
### Option 2: A WebHooks like concept
159+
160+
Analog to kubernetes webhooks.
161+
Https server receiving findings and returning results.
162+
163+
**Pros:**
164+
165+
* MilliSeconds instead of seconds for processing
166+
* No ContainerCreation Overhead
167+
* No additional k8s jobs needed
168+
169+
**Cons:**
170+
171+
* Introduces new running Services that need to be maintained and have uptime
172+
* Code Overhead / Boilerplate (Can be mitigated by sdk)
173+
* Debugging of individual MetaDataProvider is harder as everything is handled by a single service
174+
* Introduces "New" Concept
175+
* Certificate Management for webhook services (`cert-manager` required by default?)
176+
* Scaling for systems with lots of load could be a problem
177+
* One service per namespace (multiple tenants) needed => results in many running active services which is ressource consuming
178+
179+
== Decision
180+
181+
Regarding the Question 1 it seems that both solution approaches are resulting in the same execution model. We descided to implement solution approach 1 and unify both concepts into a more general concept with the name _"hook concept"_. Therefore we exchange the existing name `persistenceProvider` for phase 3 in the excecution model with a more general term `DataProcessing`:
182+
183+
```
184+
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
185+
│ Scanning ├─────────▶│ Parsing ├─────────▶│ DataProcessing │
186+
│ (Phase 1) │ │ (Phase 2) │ │ (Phase 3) │
187+
└──────────────────┘ └──────────────────┘ └──────────────────┘
188+
```
189+
190+
Regarding the question 2 we descided to implement the solution approach 1 with a job based approach (no active service componend needed).
191+
The Phase 3 `DataProcessing` will be therefore splitt into to seperate phases named `ReadAndWriteHooks (3.1)` and `ReadOnlyHooks (3.2)`
192+
193+
```
194+
┌ DataProcessing: ReadOnlyHooks ─ ─ ─
195+
┌ DataProcessing: ReadAndWriteHooks ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┌────────────────────────────────┐ │
196+
┌───────────────────────┐ │ ┌──┼▶│ Elastic PersistenceProvider │
197+
┌──────────────────┐ ┌──────────────────┐ │ │ ReadAndWrite Hook #1 │ ┌───────────────────────┐ │ └────────────────────────────────┘ │
198+
│ Scan ├──▶│ Parsing │────▶│ "MyMetaDataProvider" ├─▶│ ReadAndWrite Hook #2 │─┼──┤ │ ┌────────────────────────────────┐
199+
└──────────────────┘ └──────────────────┘ │ └───────────────────────┘ └───────────────────────┘ └───▶│ DefectDojo PersistenceProvider │ │
200+
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ │ └────────────────────────────────┘
201+
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
202+
```
203+
204+
== Consequences
205+
206+
With the new `Hook Concept` we open the `DataProcessing` Phase 3 to a more intuitive and flexible architecture. It is easier to understand because _WebHooks_ are already a well known concept. It is possible to keep the existing implementation of the `persistenceProviders` and to integrate them with a lot of other possible data processing components in a more general fashion. In the end this step will result in a lot of additional feature possibilities which go fare beyond the existing ones. Therefore we only need to implement this concept once in the secureCodeBox Operator and new ideas for extending the DataProcessing will not enforce conceptual or architectural changes.
207+
208+
Ideas for additional data processing hooks:
209+
* Notifier-Hooks (ReadOnlyHook) e.g. for chat systems (slack, teams...) or metric / alerting systems
210+
* MetaData Enrichment Hooks (ReadAndWriteHook)
211+
* FilterData Hooks (e.g. false/positive Handling) (ReadAndWriteHook)
212+
* SystemIntegration Hooks (ReadOnlyHook) e.g. for ticketing systems like JIRA
213+
* CascadingScans Hooks (ReadOnlyHook) e.g. for starting new security scans based on findings

docs/adr/adr_README.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# Architecture Decision Records (ADRs)
2+
3+
This subdirectory contains all ADRs for the architecture documentation. The ids of these ADRs are within the number range from 0001 to 0999. The 0000 is reserved for the example template.
4+
5+
Architectural decisions are, where appropriate, documented with the ADR template from [Michael Nygard][nygard]. A template can be found at ```0000.adoc``` and [here][template]. We extend this template with the date when this decision was made.
6+
7+
Important key points from the blog of [Michael Nygard][nygard]:
8+
9+
> We will keep a collection of records for "architecturally significant" decisions: those that affect the structure, non-functional characteristics, dependencies, interfaces, or construction techniques.
10+
>
11+
> Each record describes a set of forces and a single decision in response to those forces. Note that the decision is the central piece here, so specific forces may appear in multiple ADRs.
12+
>
13+
> We will keep ADRs in the project repository under ```adr/NNNN.adoc```
14+
>
15+
> ADRs will be numbered sequentially and monotonically. **Numbers will not be reused.**
16+
>
17+
> If a decision is reversed, we will **keep the old** one around, but mark it as superseded. (It's still relevant to know that it was the decision, but is no longer the decision.)
18+
19+
Please take a look in the ADR template ```0000.adoc``` for more information about the internal structure.
20+
21+
[nygard]: http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions
22+
[template]: https://github.com/joelparkerhenderson/architecture_decision_record/blob/master/adr_template_by_michael_nygard.md

0 commit comments

Comments
 (0)