Commit c6ba0de

Merge pull request #288 from cyyeh/feature/update-doc
update doc
2 parents 0be3780 + c3affb4 commit c6ba0de

6 files changed

+243
-3
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ Below are some common scenarios that you may be interested:
 - [Error Handling](https://vulcansql.com/docs/develop/error)
 - [API Parameters Validation](https://vulcansql.com/docs/develop/validator)
 - [Data Privacy](https://vulcansql.com/docs/data-privacy/overview)
+- [Extensions](https://vulcansql.com/docs/extensions/overview)
 - [API Configurations](https://vulcansql.com/docs/api-plugin/overview)
 - [Deployment](https://vulcansql.com/docs/deployment)

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
---
date: 2023-08-25
authors:
  name: Jimmy Yeh
  title: core member of VulcanSQL
  url: https://github.com/cyyeh
  image_url: https://avatars.githubusercontent.com/u/11023068?v=4
  email: jimmy.yeh@cannerdata.com
---

# Data Privacy Mechanisms provided by VulcanSQL for Easier Data Sharing

![cover](./static/cover-data-privacy.jpg)

## What is data sharing and why is it important?

Data sharing means making data available to other parties,
which may be other departments in the same company or customers outside it.
We share data because we have come to realize that it is a valuable asset,
especially to a business: it can make business processes smoother and support better decisions.

When thinking about how to share data with others, some common questions come to mind, such as
what format the data is stored in, what kind of storage should persist it, and how other parties will access it.
There are other aspects to consider as well, and in this article I would like to discuss one of them: data privacy.

## Why is data privacy important to data sharing?

When sharing data with others, there are certainly some scenarios in which we can treat everyone we share with equally.
<!--truncate-->
For example, datasets in Kaggle competitions are normally the same for all participants.
In the business world, however, we should follow the principle of least privilege (PoLP), an information security concept which maintains that a user
or entity should only have access to the specific data, resources and applications needed to complete a required task[^1].

We understand that data sharing is a great opportunity to give others access to valuable resources,
and we also realize that we need to control how different parties can access different parts of the data.
The question then becomes: how do we share data in a more controllable way that fulfills the data privacy requirement?


## How can VulcanSQL help?

VulcanSQL comes with several built-in data privacy mechanisms that make data sharing more controllable and scalable.

VulcanSQL currently provides five techniques for handling data privacy. Below is a brief introduction to each one; in the
[Showcase section](#showcase), we'll show the code and explain each technique further.

### Authentication

Authentication is the first layer of data privacy protection: an entity that is not authenticated cannot access any resources.
At the moment, VulcanSQL supports three authentication methods: [HTTP Basic](../docs/data-privacy/authn#http-basic),
[password file](../docs/data-privacy/authn#password-file) and [simple token](../docs/data-privacy/authn#simple-token). Because users deserve
a more mature and convenient way to authenticate, we plan to support OpenID Connect in the future.
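
To make the HTTP Basic method concrete, here is a small Python sketch of how a client could build the `Authorization` header in the format defined by RFC 7617. The function name `basic_auth_header` and the credentials are hypothetical and only serve the illustration; the header format itself is standard.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header (RFC 7617)."""
    # The credentials are joined with a colon and then Base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical credentials, for illustration only.
header = basic_auth_header("james", "not-a-real-password")
```

Keep in mind that Base64 is an encoding, not encryption, so credentials sent this way should travel over HTTPS.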

:::info
You can learn more about [the authentication mechanism in VulcanSQL here](../docs/data-privacy/authn)!
:::

### Authorization

For authorization, VulcanSQL uses attribute-based access control (ABAC): access is decided
based on the user attributes provided by the authenticator. In VulcanSQL, we configure each user's attributes in `vulcan.yaml`,
then define policies in `profiles.yaml` that grant access to each data source based on those attributes.
With this mechanism, different users see different parts of the data depending on their attributes.
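
The idea behind attribute-based access control can be sketched in a few lines of Python. This illustrates the concept only; it is not VulcanSQL's implementation or its configuration format. The `is_allowed` helper, the shape of the policy dictionaries and the sample users are all assumptions made for the example.

```python
def is_allowed(user_attrs: dict, policies: list) -> bool:
    """Return True if the user's attributes satisfy at least one policy.

    A policy is a dict of required attribute values; every key in it
    must match the user's attribute of the same name.
    """
    return any(
        all(user_attrs.get(key) == value for key, value in policy.items())
        for policy in policies
    )


# Hypothetical policies for one data source: engineers or the employer may use it.
policies = [{"department": "engineering"}, {"role": "employer"}]

# Hypothetical users and their attributes.
james = {"role": "employee", "department": "engineering"}
harden = {"role": "employee", "department": "sales"}
rosa = {"role": "employer", "department": "ceo"}

james_ok = is_allowed(james, policies)    # matches the first policy
harden_ok = is_allowed(harden, policies)  # matches neither policy
rosa_ok = is_allowed(rosa, policies)      # matches the second policy
```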

:::info
You can learn more about [the authorization mechanism in VulcanSQL here](../docs/data-privacy/authz)!
:::

### Dynamic Data Masking

Sometimes we want to share masked data with users. The purpose is to protect the actual data while providing a functional substitute
for occasions when the real data is not required.

With dynamic data masking, we can define a pattern for masking the real data, such as transforming an ID from F123456789 to F12xxxx89
using a `partial(3, 'xxxx', 2)` function.
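
As a rough illustration of what such a masking pattern does, here is a Python sketch that mimics the behavior described above. The `partial_mask` function is a stand-in written for this post, not VulcanSQL's implementation, and the way it handles values too short to mask is an assumption.

```python
def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Keep the first `prefix` and last `suffix` characters; replace the rest with `padding`."""
    if prefix + suffix >= len(value):
        # Assumption: a value too short to hide anything is replaced by the padding alone.
        return padding
    head = value[:prefix]
    tail = value[len(value) - suffix:] if suffix > 0 else ""
    return head + padding + tail


masked = partial_mask("F123456789", 3, "xxxx", 2)
```

Note that the padding has a fixed length, so the masked value does not reveal how long the original was.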

:::info
You can learn more about [the dynamic data masking mechanism in VulcanSQL here](../docs/data-privacy/data-masking)!
:::

### Column-level Security

If we need fine-grained control over specific columns, we can use the column-level security (CLS) mechanism.
In VulcanSQL, we can decide who can access a specific column based on their user attributes.

:::info
You can learn more about [the column-level security mechanism in VulcanSQL here](../docs/data-privacy/cls)!
:::

### Row-level Security

Similarly, if we need fine-grained control over specific rows, we can use the row-level security (RLS) mechanism.
In VulcanSQL, we can decide who can access a specific row based on their user attributes.

:::info
You can learn more about [the row-level security mechanism in VulcanSQL here](../docs/data-privacy/rls)!
:::

## Showcase

Now let's look at code that demonstrates how to apply these data privacy mechanisms in VulcanSQL.
For those who may not be familiar with VulcanSQL yet, **VulcanSQL is a Data API framework for data folks to create REST APIs
easily by writing templated SQL! It's mainly used for sharing data from databases, data warehouses and data lakes!**

If you would like to read the source code or try the example yourself,
you are welcome to [check it out here](https://github.com/Canner/vulcan-sql-examples/tree/main/data-sharing)!

Below is the dataset we'll use in the showcase:

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDK32424|engineering|James|engineer|"$100,000"|
|EKJ34124|sales|Harden|sales|"$120,000"|
|MKO56124|sales|Michael|manager|"$110,000"|
|ONP01124|engineering|Cindy|manager|"$115,000"|
|NZP59124|ceo|Rosa|boss|"$150,000"|

Below is the code you might write in a VulcanSQL SQL template:

```sql
SELECT
  -- dynamic data masking
  {% masking id partial(2, 'xxxxxxx', 2) %} as id,
  department,
  last_name,
  company_role,
  -- column level security
  {% if context.user.attr.role == 'employer' %}
    annual_salary
  {% else %}
    NULL AS annual_salary
  {% endif %}
FROM read_csv_auto('departments.csv', HEADER=True)
-- row level security
{% if context.user.attr.role != 'employer' %}
  WHERE department = {{ context.user.attr.department }}
{% endif %}
```

Here is the `auth` configuration in `vulcan.yaml`:

```yaml
auth:
  enabled: true
  options:
    basic:
      # Read users and passwords from a text file.
      htpasswd-file:
        path: passwd.txt # Path to the password file.
        users: # (Optional) Add attributes for users
          - name: james
            attr:
              role: employee
              department: engineering
          - name: harden
            attr:
              role: employee
              department: sales
          - name: michael
            attr:
              role: employee
              department: sales
          - name: cindy
            attr:
              role: employee
              department: engineering
          - name: rosa
            attr:
              role: employer
              department: ceo
```

The REST API returns different results depending on the user:

**James**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDxxxxxxx24|engineering|James|engineer||
|ONxxxxxxx24|engineering|Cindy|manager||

**Harden**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|EKxxxxxxx24|sales|Harden|sales||
|MKxxxxxxx24|sales|Michael|manager||

**Michael**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|EKxxxxxxx24|sales|Harden|sales||
|MKxxxxxxx24|sales|Michael|manager||

**Cindy**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDxxxxxxx24|engineering|James|engineer||
|ONxxxxxxx24|engineering|Cindy|manager||

**Rosa**

|id|department|last_name|company_role|annual_salary|
|---|---|---|---|---|
|JDxxxxxxx24|engineering|James|engineer|$100,000|
|EKxxxxxxx24|sales|Harden|sales|$120,000|
|MKxxxxxxx24|sales|Michael|manager|$110,000|
|ONxxxxxxx24|engineering|Cindy|manager|$115,000|
|NZxxxxxxx24|ceo|Rosa|boss|$150,000|

Looking at the result tables above alongside the SQL template,
we can identify each of the data privacy mechanisms provided by VulcanSQL:

1. Authentication: In the example above, we used HTTP Basic as the authentication method,
   and each user's password was stored in a text file called `passwd.txt`.
2. Authorization: We defined user attributes in `vulcan.yaml`.
   With these attributes defined, we have fine-grained control over what data
   each user can access.
3. Dynamic Data Masking: `{% masking id partial(2, 'xxxxxxx', 2) %} as id` leaves only the first two and
   last two characters of `id` visible and masks the rest.
4. Column-level Security: Only a user in the employer role can see the data in the `annual_salary` column,
   so Rosa is the only person who can see other people's salaries.
5. Row-level Security: A user who is not in the employer role can only see the rows
   for their own department.
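
To see how the three query-level mechanisms combine, the following Python sketch reproduces the result tables above from the sample dataset. It only imitates the logic of the SQL template; the `query_as` and `mask_id` functions and the in-memory `ROWS` and `USERS` structures are hypothetical and are not part of VulcanSQL.

```python
# Sample dataset from the showcase (salaries kept as display strings).
ROWS = [
    {"id": "JDK32424", "department": "engineering", "last_name": "James", "company_role": "engineer", "annual_salary": "$100,000"},
    {"id": "EKJ34124", "department": "sales", "last_name": "Harden", "company_role": "sales", "annual_salary": "$120,000"},
    {"id": "MKO56124", "department": "sales", "last_name": "Michael", "company_role": "manager", "annual_salary": "$110,000"},
    {"id": "ONP01124", "department": "engineering", "last_name": "Cindy", "company_role": "manager", "annual_salary": "$115,000"},
    {"id": "NZP59124", "department": "ceo", "last_name": "Rosa", "company_role": "boss", "annual_salary": "$150,000"},
]

# User attributes, mirroring the `users` section of the auth configuration.
USERS = {
    "james": {"role": "employee", "department": "engineering"},
    "harden": {"role": "employee", "department": "sales"},
    "michael": {"role": "employee", "department": "sales"},
    "cindy": {"role": "employee", "department": "engineering"},
    "rosa": {"role": "employer", "department": "ceo"},
}


def mask_id(value: str) -> str:
    # Same effect as partial(2, 'xxxxxxx', 2) on the sample IDs.
    return value[:2] + "xxxxxxx" + value[-2:]


def query_as(username: str) -> list:
    """Return the rows a given user would see, as a list of dicts."""
    attrs = USERS[username]
    is_employer = attrs["role"] == "employer"
    result = []
    for row in ROWS:
        # Row-level security: non-employers only see their own department.
        if not is_employer and row["department"] != attrs["department"]:
            continue
        result.append({
            "id": mask_id(row["id"]),  # dynamic data masking
            "department": row["department"],
            "last_name": row["last_name"],
            "company_role": row["company_role"],
            # Column-level security: salary is None (NULL) unless the user is an employer.
            "annual_salary": row["annual_salary"] if is_employer else None,
        })
    return result


james_rows = query_as("james")
rosa_rows = query_as("rosa")
```

Calling `query_as` with each username in `USERS` yields the same rows as the corresponding table, with `None` standing in for the empty salary cells.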

## Conclusion

Data privacy is more important than ever. We may even regard it as a kind of human right,
so we should protect data from being abused!

We hope this blog post has highlighted the significance of data privacy when sharing data with other parties
and shown that VulcanSQL offers user-friendly solutions that are worth exploring.

[^1]: The definition of the principle of least privilege (PoLP) is taken from an [article by Palo Alto Networks](https://www.paloaltonetworks.com/cyberpedia/what-is-the-principle-of-least-privilege#:~:text=The%20principle%20of%20least%20privilege%20(PoLP)%20is%20an%20information%20security,to%20complete%20a%20required%20task.).

packages/doc/blog/powering-rapid-data-apps-with-vulcansql.mdx

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 ---
+date: 2023-07-09
 authors:
   name: Andy Yen
-  title: contributor of VulcanSQL
+  title: core member of VulcanSQL
   url: https://github.com/onlyjackfrost
   image_url: https://avatars.githubusercontent.com/u/38731840?v=4
   email: andy.yen@cannerdata.com

packages/doc/blog/querying-your-data-easily-and-smartly-through-huggingface.mdx

Lines changed: 5 additions & 2 deletions
@@ -1,19 +1,22 @@
 ---
+date: 2023-08-25
 authors:
   - name: Eason Kuo
-    title: contributor of VulcanSQL
+    title: core member of VulcanSQL
     url: https://github.com/kokokuo
     image_url: https://avatars.githubusercontent.com/u/5389253?v=4
     email: eason.kuo@cannerdata.com
   - name: Jimmy Yeh
-    title: contributor of VulcanSQL
+    title: core member of VulcanSQL
     url: https://github.com/cyyeh
     image_url: https://avatars.githubusercontent.com/u/11023068?v=4
     email: jimmy.yeh@cannerdata.com
 ---

 # Querying Your Data Easily and Smartly through Hugging Face

+![cover](./static/cover-huggingface.jpg)
+
 *TLDR: VulcanSQL, a free and open-source data API framework built specifically for data applications,
 empowers data professionals to generate and distribute data APIs quickly and effortlessly.
 It takes your SQL templates and transforms them into data APIs, with no backend expertise necessary.*