Commit 8b77e77

add data privacy blogpost
1 parent 7925e4d commit 8b77e77

2 files changed: 231 additions, 0 deletions
---
authors:
  name: Jimmy Yeh
  title: core member of VulcanSQL
  url: https://github.com/cyyeh
  image_url: https://avatars.githubusercontent.com/u/11023068?v=4
  email: jimmy.yeh@cannerdata.com
---

# Data Privacy Mechanisms provided by VulcanSQL for Easier Data Sharing

![cover](./static/cover-data-privacy.jpg)

## What is data sharing and why is it important?

Data sharing means making data available to other parties,
which may be other departments in the same company or customers outside the company.
We share data because we have come to realize that data is a valuable asset,
especially to a business: it can make business processes smoother and enable better decision making!

When thinking about how to share data with others, some common questions come to mind:
what format the data is stored in, what kind of storage should persist the data, and how to deliver the data to other parties.
However, there are other aspects we need to consider as well, and in this article I would like to discuss one of them: data privacy.

## Why is data privacy important to data sharing?

When sharing data with others, there are certainly some scenarios in which we can treat everyone we share with equally.
<!--truncate-->
For example, datasets in Kaggle competitions are normally the same for all participants.
However, in the business world, we should pay special attention to the principle of least privilege (PoLP). It is an information security concept which maintains that a user
or entity should only have access to the specific data, resources and applications needed to complete a required task[^1].

We understand that data sharing is a great opportunity to share valuable resources with others,
and we also realize that we need to control how different parties can access different parts of the data.
The question then becomes: how do we share data in a more controllable way that fulfills these data privacy requirements?
36+
37+
38+
## How VulcanSQL can help?
39+
40+
VulcanSQL comes with several built-in data privacy mechanisms to enable a more controllable and scalable data sharing use case!
41+
42+
As of now, VulcanSQL has 5 techniques for handling data privacy, here are some brief introduction to each technique and [in the
43+
Showcase section](#showcase), we'll show you the code and explain further accordingly.

### Authentication

Authentication is the first layer of data privacy protection. Any entity that is not authenticated is not allowed to access any resources.
At the moment, VulcanSQL supports three authentication methods: [HTTP Basic](../docs/data-privacy/authn#http-basic),
[password file](../docs/data-privacy/authn#password-file) and [simple token](../docs/data-privacy/authn#simple-token). Since we realize there should be
a more mature and convenient way for users to authenticate, we plan to support OpenID Connect in the future.

:::info
You can learn more about [the authentication mechanism in VulcanSQL here](../docs/data-privacy/authn)!
:::
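
To make the HTTP Basic method more concrete, here is a minimal Python sketch of how a client could build the `Authorization` header defined by the HTTP Basic scheme (RFC 7617). The helper name `basic_auth_header` and the credentials are hypothetical, and the sketch says nothing about how VulcanSQL validates the header on the server side.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (hypothetical helper)."""
    # The scheme encodes "username:password" with Base64.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
headers = basic_auth_header("james", "secret")
print(headers)
```

A header like this would be attached to every request sent to the API. Note that Base64 is an encoding, not encryption, so HTTP Basic should only be used over HTTPS.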

### Authorization

For authorization, VulcanSQL takes an attribute-based access control (ABAC) approach: access is controlled
based on the user attributes provided by the authenticator. In VulcanSQL, we can configure each user's attributes in `vulcan.yaml`;
then, for each data source in `profiles.yaml`, we can define different policies for different users based on their attributes.
With this mechanism, different users see different parts of the data depending on the attributes defined for them in VulcanSQL!

:::info
You can learn more about [the authorization mechanism in VulcanSQL here](../docs/data-privacy/authz)!
:::
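
The core idea of ABAC can be sketched in a few lines of Python. This is only an illustration of attribute matching, not VulcanSQL's policy engine or the `profiles.yaml` syntax; the function `is_allowed` and the shape of the policy dictionary are assumptions made for this example.

```python
def is_allowed(user_attr: dict, required_attr: dict) -> bool:
    """Allow access only if the user has every attribute value the policy requires."""
    return all(user_attr.get(key) == value for key, value in required_attr.items())


# Hypothetical policy: only users from the sales department may use a data source.
sales_only = {"department": "sales"}

print(is_allowed({"role": "employee", "department": "sales"}, sales_only))
print(is_allowed({"role": "employee", "department": "engineering"}, sales_only))
```

The decision depends only on the attributes attached to the user, which is why the same policy keeps working when new users are added.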

### Dynamic Data Masking

Sometimes we want to share masked data with users. The purpose is to protect the actual data while providing a functional substitute
for occasions when the real data is not required!

With dynamic data masking, we can define a specific pattern for masking the real data, such as transforming an ID from F123456789 to F12xxxx89
using the `partial(3, 'xxxx', 2)` function.

:::info
You can learn more about [the dynamic data masking mechanism in VulcanSQL here](../docs/data-privacy/data-masking)!
:::
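
To illustrate what `partial(prefix, padding, suffix)` does to a value, here is a small Python sketch that mimics the behavior described above. It is not VulcanSQL's implementation; the function name `partial_mask` is hypothetical, and the sketch does not handle values shorter than `prefix + suffix`.

```python
def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Keep the first `prefix` and last `suffix` characters; replace the middle with `padding`."""
    head = value[:prefix]
    # value[-0:] would return the whole string, so handle suffix == 0 explicitly.
    tail = value[-suffix:] if suffix > 0 else ""
    return head + padding + tail


print(partial_mask("F123456789", 3, "xxxx", 2))  # F12xxxx89
```

Note that the padding is inserted as-is, so the masked value does not reveal how many characters were hidden.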

### Column-level Security

If we need fine-grained control over specific columns, we can use the Column-level Security (CLS) mechanism.
In VulcanSQL, we can decide who can access a specific column based on their user attributes.

:::info
You can learn more about [the column-level security mechanism in VulcanSQL here](../docs/data-privacy/cls)!
:::

### Row-level Security

Similar to Column-level Security, if we need fine-grained control over specific rows, we can use the Row-level Security (RLS) mechanism.
In VulcanSQL, we can decide who can access a specific row based on their user attributes.

:::info
You can learn more about [the row-level security mechanism in VulcanSQL here](../docs/data-privacy/rls)!
:::

## Showcase

Now we are going to show you the code that demonstrates how you can apply these data privacy mechanisms in SQL templates!
If you would like to read the source code or try the example yourself,
you are welcome to [check it out here](https://github.com/Canner/vulcan-sql-examples/tree/main/data-sharing)!

Below is the sample dataset we'll use in the showcase:

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDK32424|engineering|James|engineer|"$100,000"|
|EKJ34124|sales|Harden|sales|"$120,000"|
|MKO56124|sales|Michael|manager|"$110,000"|
|ONP01124|engineering|Cindy|manager|"$115,000"|
|NZP59124|ceo|Rosa|boss|"$150,000"|

Below is the code you may write in SQL templates in VulcanSQL:

```sql
SELECT
  -- dynamic data masking
  {% masking id partial(2, 'xxxxxxx', 2) %} as id,
  department,
  last_name,
  company_role,
  -- column level security
  {% if context.user.attr.role == 'employer' %}
    annual_salary
  {% else %}
    NULL AS annual_salary
  {% endif %}
FROM read_csv_auto('departments.csv', HEADER=True)
-- row level security
{% if context.user.attr.role != 'employer' %}
  WHERE department = {{ context.user.attr.department }}
{% endif %}
```
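
To make the combined effect of the template easier to follow, the same logic can be sketched in plain Python. This is an illustration only, not how VulcanSQL executes templates; the names `ROWS`, `mask_id` and `query_as` are hypothetical, and the rows are copied from the sample dataset.

```python
# Sample rows from the dataset table (salary kept as a display string).
ROWS = [
    ("JDK32424", "engineering", "James", "engineer", "$100,000"),
    ("EKJ34124", "sales", "Harden", "sales", "$120,000"),
    ("MKO56124", "sales", "Michael", "manager", "$110,000"),
    ("ONP01124", "engineering", "Cindy", "manager", "$115,000"),
    ("NZP59124", "ceo", "Rosa", "boss", "$150,000"),
]


def mask_id(value: str) -> str:
    # Mirrors partial(2, 'xxxxxxx', 2): keep 2 leading and 2 trailing characters.
    return value[:2] + "xxxxxxx" + value[-2:]


def query_as(attr: dict) -> list:
    """Return the rows a user with the given attributes would see."""
    is_employer = attr["role"] == "employer"
    visible = []
    for emp_id, department, last_name, company_role, salary in ROWS:
        # Row-level security: non-employers only see their own department.
        if not is_employer and department != attr["department"]:
            continue
        visible.append({
            "id": mask_id(emp_id),  # dynamic data masking
            "department": department,
            "last_name": last_name,
            "company_role": company_role,
            # Column-level security: salary is None (NULL) unless the user is an employer.
            "annual_salary": salary if is_employer else None,
        })
    return visible


for row in query_as({"role": "employee", "department": "sales"}):
    print(row)
```

Calling `query_as` with a user's `role` and `department` attributes returns only the rows and columns that user is allowed to see, with every `id` masked.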

Here is the `auth` configuration in `vulcan.yaml`:

```yaml
auth:
  enabled: true
  options:
    basic:
      # Read users and passwords from a text file.
      htpasswd-file:
        path: passwd.txt # Path to the password file.
        users: # (Optional) Add attributes for users
          - name: james
            attr:
              role: employee
              department: engineering
          - name: harden
            attr:
              role: employee
              department: sales
          - name: michael
            attr:
              role: employee
              department: sales
          - name: cindy
            attr:
              role: employee
              department: engineering
          - name: rosa
            attr:
              role: employer
              department: ceo
```

Here are the REST API results each user will see:

**James**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDxxxxxxx24|engineering|James|engineer||
|ONxxxxxxx24|engineering|Cindy|manager||

**Harden**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|EKxxxxxxx24|sales|Harden|sales||
|MKxxxxxxx24|sales|Michael|manager||

**Michael**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|EKxxxxxxx24|sales|Harden|sales||
|MKxxxxxxx24|sales|Michael|manager||

**Cindy**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDxxxxxxx24|engineering|James|engineer||
|ONxxxxxxx24|engineering|Cindy|manager||

**Rosa**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDxxxxxxx24|engineering|James|engineer|$100,000|
|EKxxxxxxx24|sales|Harden|sales|$120,000|
|MKxxxxxxx24|sales|Michael|manager|$110,000|
|ONxxxxxxx24|engineering|Cindy|manager|$115,000|
|NZxxxxxxx24|ceo|Rosa|boss|$150,000|

Comparing the result tables above with the SQL template,
we can identify each of the data privacy mechanisms provided by VulcanSQL:

1. Authentication: In the example above, we used HTTP Basic as the authentication method,
   and each user's password was stored in a text file called `passwd.txt`.
2. Authorization: We defined user attributes in `vulcan.yaml`.
   With these attributes defined, we have fine-grained control over what data
   each user can access.
3. Dynamic Data Masking: `{% masking id partial(2, 'xxxxxxx', 2) %} as id` leaves only the first two and
   last two characters of `id` visible and masks the rest.
4. Column-level Security: Only users in the `employer` role can see the `annual_salary` column.
   Rosa is the only user in this role, so she is the only one who sees salaries.
5. Row-level Security: A user who is not in the `employer` role can only see the rows
   that belong to their own department.

## Conclusion

Data privacy is more important than ever. We may regard it as a kind of human right,
so we should protect data from being abused!

We hope this blog post highlights the significance of data privacy when sharing data with other parties.
It also shows that VulcanSQL offers user-friendly solutions that are worth exploring.

[^1]: The definition of the principle of least privilege (PoLP) comes from [this article by Palo Alto Networks](https://www.paloaltonetworks.com/cyberpedia/what-is-the-principle-of-least-privilege#:~:text=The%20principle%20of%20least%20privilege%20(PoLP)%20is%20an%20information%20security,to%20complete%20a%20required%20task.).