| 1 | +--- |
| 2 | +date: 2023-08-25 |
| 3 | +authors: |
| 4 | +  name: Jimmy Yeh |
| 5 | +  title: core member of VulcanSQL |
| 6 | +  url: https://github.com/cyyeh |
| 7 | +  image_url: https://avatars.githubusercontent.com/u/11023068?v=4 |
| 8 | +  email: jimmy.yeh@cannerdata.com |
| 9 | +--- |
| 10 | + |
| 11 | +# Data Privacy Mechanisms provided by VulcanSQL for Easier Data Sharing |
| 12 | + |
| 13 | + |
| 14 | + |
| 15 | +## What is data sharing and why is it important? |
| 16 | + |
| 17 | +Data sharing means making data available to other parties, |
| 18 | +such as other departments in the same company or customers outside it. |
| 19 | +We share data because we have come to realize that it is a valuable asset, |
| 20 | +especially for businesses: it can make business processes smoother and lead to better decisions! |
| 21 | + |
| 22 | +When thinking about how to share data with others, a few common questions come to mind, such as |
| 23 | +which format the data is stored in, what kind of storage should persist it, and how to deliver it to other parties. |
| 24 | +However, there are other aspects to consider as well, and in this article I would like to discuss data privacy. |
| 25 | + |
| 26 | +## Why is data privacy important to data sharing? |
| 27 | + |
| 28 | +When sharing data with others, there are certainly some scenarios in which we can treat everyone we share with equally. |
| 29 | +<!--truncate--> |
| 30 | +For example, the datasets in a Kaggle competition are normally the same for all participants. |
| 31 | +However, in the business world, we should follow the principle of least privilege (PoLP). It's an information security concept which maintains that a user |
| 32 | +or entity should only have access to the specific data, resources and applications needed to complete a required task[^1]. |
| 33 | + |
| 34 | +We understand that data sharing is a great opportunity to offer valuable resources to others, |
| 35 | +and we also realize that we need to control how different parties can access different parts of the data. |
| 36 | +So the question becomes: how do we share data in a more controllable way that fulfills the data privacy requirement? |
| 37 | + |
| 38 | + |
| 39 | +## How can VulcanSQL help? |
| 40 | + |
| 41 | +VulcanSQL comes with several built-in data privacy mechanisms to enable a more controllable and scalable data sharing use case! |
| 42 | + |
| 43 | +As of now, VulcanSQL has 5 techniques for handling data privacy. Here is a brief introduction to each technique; in the |
| 44 | +[Showcase section](#showcase), we'll show you the code and explain further. |
| 45 | + |
| 46 | +### Authentication |
| 47 | + |
| 48 | +Authentication is the first layer of data privacy protection. Any entity that is not authenticated is not allowed to access any resources. |
| 49 | +At the moment, VulcanSQL only accepts three authentication methods, namely [HTTP Basic](../docs/data-privacy/authn#http-basic), |
| 50 | +[password file](../docs/data-privacy/authn#password-file) and [simple token](../docs/data-privacy/authn#simple-token). Since we realize there should be |
| 51 | +a more mature and easy way for users to authenticate, we plan to support OpenID Connect in the future. |
| 52 | + |
| 53 | +:::info |
| 54 | +You can learn more about [the authentication mechanism in VulcanSQL here](../docs/data-privacy/authn)! |
| 55 | +::: |
| 56 | + |
| 57 | +### Authorization |
| 58 | + |
| 59 | +With authorization, VulcanSQL applies an attribute-based access control (ABAC) approach to control access |
| 60 | +based on the user attributes provided by the authenticator. In VulcanSQL, we can configure each user's attributes in `vulcan.yaml`; |
| 61 | +then we can define different policies for different users based on their attributes for each data source in `profiles.yaml`. |
| 62 | +With this mechanism, different users see different parts of the data based on their attributes defined in VulcanSQL! |
| 63 | + |
| 64 | +:::info |
| 65 | +You can learn more about [the authorization mechanism in VulcanSQL here](../docs/data-privacy/authz)! |
| 66 | +::: |
| 67 | + |
| 68 | +### Dynamic Data Masking |
| 69 | + |
| 70 | +Sometimes, we want to share masked data with users. The purpose is to protect the actual data while having a functional substitute |
| 71 | +for occasions when the real data is not required! |
| 72 | + |
| 73 | +With dynamic data masking, we can define a specific pattern for masking the real data, such as transforming an ID from F123456789 to F12xxxx89 |
| 74 | +using a `partial(3, 'xxxx', 2)` function. |
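
The masking pattern above can be sketched in plain Python. This is only an illustration of the behavior described here, not VulcanSQL's implementation: the helper name `partial_mask` is hypothetical, and it assumes that `partial(prefix, padding, suffix)` keeps the first `prefix` characters, inserts `padding` literally, and keeps the last `suffix` characters.

```python
def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Keep `prefix` leading and `suffix` trailing characters.

    Everything in between is replaced by the literal `padding` string
    (hypothetical helper; assumed semantics of `partial`).
    """
    head = value[:prefix]
    # value[-0:] would return the whole string, so handle suffix == 0 explicitly.
    tail = value[-suffix:] if suffix > 0 else ""
    return head + padding + tail


print(partial_mask("F123456789", 3, "xxxx", 2))  # F12xxxx89
```

Note that under this assumption the masked value does not keep the original length: the padding is inserted as-is, which matches both the `F12xxxx89` example here and the masked ids in the showcase.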
| 75 | + |
| 76 | +:::info |
| 77 | +You can learn more about [the dynamic data masking mechanism in VulcanSQL here](../docs/data-privacy/data-masking)! |
| 78 | +::: |
| 79 | + |
| 80 | +### Column-level Security |
| 81 | + |
| 82 | +If we need fine-grained control over specific columns, we can use the Column-level Security (CLS) mechanism. |
| 83 | +In VulcanSQL, we can decide who can access a specific column based on their user attributes. |
| 84 | + |
| 85 | +:::info |
| 86 | +You can learn more about [the column-level security mechanism in VulcanSQL here](../docs/data-privacy/cls)! |
| 87 | +::: |
| 88 | + |
| 89 | +### Row-level Security |
| 90 | + |
| 91 | +Similarly, if we need fine-grained control over specific rows, we can use the Row-level Security (RLS) mechanism. |
| 92 | +In VulcanSQL, we can decide who can access a specific row based on their user attributes. |
| 93 | + |
| 94 | +:::info |
| 95 | +You can learn more about [the row-level security mechanism in VulcanSQL here](../docs/data-privacy/rls)! |
| 96 | +::: |
| 97 | + |
| 98 | +## Showcase |
| 99 | + |
| 100 | +Now we are going to show you the code to demonstrate how you can deliver data privacy mechanisms in VulcanSQL! |
| 101 | +For those who may not be familiar with VulcanSQL yet, **VulcanSQL is a Data API framework for data folks to create REST APIs |
| 102 | +easily by writing templated SQL! It's mainly used for sharing data from databases, data warehouses and data lakes!** |
| 103 | + |
| 104 | +If you would like to read the source code or try the example yourself, |
| 105 | +feel free to [check it out here](https://github.com/Canner/vulcan-sql-examples/tree/main/data-sharing)! |
| 106 | + |
| 107 | +Below is the dataset we'll use in the showcase: |
| 108 | + |
| 109 | +|id|department|last_name|company_role|annual_salary| |
| 110 | +|---|---|---|---|---| |
| 111 | +|JDK32424|engineering|James|engineer|"$100,000"| |
| 112 | +|EKJ34124|sales|Harden|sales|"$120,000"| |
| 113 | +|MKO56124|sales|Michael|manager|"$110,000"| |
| 114 | +|ONP01124|engineering|Cindy|manager|"$115,000"| |
| 115 | +|NZP59124|ceo|Rosa|boss|"$150,000"| |
| 116 | + |
| 117 | +Below is the code you may write in SQL templates in VulcanSQL: |
| 118 | + |
| 119 | +```sql |
| 120 | +SELECT |
| 121 | + -- dynamic data masking |
| 122 | + {% masking id partial(2, 'xxxxxxx', 2) %} as id, |
| 123 | + department, |
| 124 | + last_name, |
| 125 | + company_role, |
| 126 | + -- column level security |
| 127 | + {% if context.user.attr.role == 'employer' %} |
| 128 | + annual_salary |
| 129 | + {% else %} |
| 130 | + NULL AS annual_salary |
| 131 | + {% endif %} |
| 132 | +FROM read_csv_auto('departments.csv', HEADER=True) |
| 133 | +-- row level security |
| 134 | +{% if context.user.attr.role != 'employer' %} |
| 135 | + WHERE department = {{ context.user.attr.department }} |
| 136 | +{% endif %} |
| 137 | +``` |
| 138 | + |
| 139 | +Here is the `auth` configuration in `vulcan.yaml`: |
| 140 | + |
| 141 | +```yaml |
| 142 | +auth: |
| 143 | +  enabled: true |
| 144 | +  options: |
| 145 | +    basic: |
| 146 | +      # Read users and passwords from a text file. |
| 147 | +      htpasswd-file: |
| 148 | +        path: passwd.txt # Path to the password file. |
| 149 | +        users: # (Optional) Add attributes for users |
| 150 | +          - name: james |
| 151 | +            attr: |
| 152 | +              role: employee |
| 153 | +              department: engineering |
| 154 | +          - name: harden |
| 155 | +            attr: |
| 156 | +              role: employee |
| 157 | +              department: sales |
| 158 | +          - name: michael |
| 159 | +            attr: |
| 160 | +              role: employee |
| 161 | +              department: sales |
| 162 | +          - name: cindy |
| 163 | +            attr: |
| 164 | +              role: employee |
| 165 | +              department: engineering |
| 166 | +          - name: rosa |
| 167 | +            attr: |
| 168 | +              role: employer |
| 169 | +              department: ceo |
| 170 | +``` |
| 171 | + |
| 172 | +The REST API returns different results depending on the user: |
| 173 | + |
| 174 | +**James** |
| 175 | + |
| 176 | +|id|department|last_name|company_role|annual_salary| |
| 177 | +|---|---|---|---|---| |
| 178 | +|JDxxxxxxx24|engineering|James|engineer|| |
| 179 | +|ONxxxxxxx24|engineering|Cindy|manager|| |
| 180 | + |
| 181 | +**Harden** |
| 182 | + |
| 183 | +|id|department|last_name|company_role|annual_salary| |
| 184 | +|---|---|---|---|---| |
| 185 | +|EKxxxxxxx24|sales|Harden|sales|| |
| 186 | +|MKxxxxxxx24|sales|Michael|manager|| |
| 187 | + |
| 188 | +**Michael** |
| 189 | + |
| 190 | +|id|department|last_name|company_role|annual_salary| |
| 191 | +|---|---|---|---|---| |
| 192 | +|EKxxxxxxx24|sales|Harden|sales|| |
| 193 | +|MKxxxxxxx24|sales|Michael|manager|| |
| 194 | + |
| 195 | +**Cindy** |
| 196 | + |
| 197 | +|id|department|last_name|company_role|annual_salary| |
| 198 | +|---|---|---|---|---| |
| 199 | +|JDxxxxxxx24|engineering|James|engineer|| |
| 200 | +|ONxxxxxxx24|engineering|Cindy|manager|| |
| 201 | + |
| 202 | +**Rosa** |
| 203 | + |
| 204 | +|id|department|last_name|company_role|annual_salary| |
| 205 | +|---|---|---|---|---| |
| 206 | +|JDxxxxxxx24|engineering|James|engineer|$100,000| |
| 207 | +|EKxxxxxxx24|sales|Harden|sales|$120,000| |
| 208 | +|MKxxxxxxx24|sales|Michael|manager|$110,000| |
| 209 | +|ONxxxxxxx24|engineering|Cindy|manager|$115,000| |
| 210 | +|NZxxxxxxx24|ceo|Rosa|boss|$150,000| |
| 211 | + |
| 212 | +Looking at the result tables above together with the SQL template, |
| 213 | +we can clearly see several of the data privacy mechanisms provided by VulcanSQL at work: |
| 214 | + |
| 215 | +1. Authentication: In the above example, we used HTTP Basic as the authentication method |
| 216 | +and the password of each user was stored in a text file called `passwd.txt`. |
| 217 | +2. Authorization: We defined user attributes in `vulcan.yaml`. |
| 218 | +With these user attributes defined, we have more fine-grained control over what data |
| 219 | +each user can access. |
| 220 | +3. Dynamic Data Masking: `{% masking id partial(2, 'xxxxxxx', 2) %} as id` makes only the first two and |
| 221 | +last two characters of `id` visible, and the rest is masked. |
| 222 | +4. Column-level Security: Only a user in the employer role can see the data in the `annual_salary` column, |
| 223 | +so Rosa is the only person who can see other people's salaries. |
| 224 | +5. Row-level Security: A user who is not in the employer role can only see the rows |
| 225 | +from their own department. |
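
To make the combined effect of these mechanisms concrete, here is a small Python emulation of what the SQL template resolves to for a given user. It is a sketch under stated assumptions, not how VulcanSQL executes the query: `ROWS`, `USERS`, `mask_id`, and `query_as` are hypothetical names, the rows are copied from the showcase dataset, and only two of the five users are included for brevity.

```python
# Rows copied from the showcase dataset.
ROWS = [
    {"id": "JDK32424", "department": "engineering", "last_name": "James",
     "company_role": "engineer", "annual_salary": "$100,000"},
    {"id": "EKJ34124", "department": "sales", "last_name": "Harden",
     "company_role": "sales", "annual_salary": "$120,000"},
    {"id": "MKO56124", "department": "sales", "last_name": "Michael",
     "company_role": "manager", "annual_salary": "$110,000"},
    {"id": "ONP01124", "department": "engineering", "last_name": "Cindy",
     "company_role": "manager", "annual_salary": "$115,000"},
    {"id": "NZP59124", "department": "ceo", "last_name": "Rosa",
     "company_role": "boss", "annual_salary": "$150,000"},
]

# User attributes, mirroring two entries of the `auth` configuration.
USERS = {
    "james": {"role": "employee", "department": "engineering"},
    "rosa": {"role": "employer", "department": "ceo"},
}


def mask_id(value: str) -> str:
    """Dynamic data masking: keep the first two and last two characters."""
    return value[:2] + "xxxxxxx" + value[-2:]


def query_as(attr: dict) -> list:
    """Emulate the template for a user with the given attributes."""
    is_employer = attr["role"] == "employer"
    result = []
    for row in ROWS:
        # Row-level security: non-employers only see their own department.
        if not is_employer and row["department"] != attr["department"]:
            continue
        result.append({
            **row,
            "id": mask_id(row["id"]),
            # Column-level security: salary is NULL (None) for non-employers.
            "annual_salary": row["annual_salary"] if is_employer else None,
        })
    return result


for row in query_as(USERS["james"]):
    print(row["id"], row["last_name"], row["annual_salary"])
```

Calling `query_as(USERS["rosa"])` instead returns all five rows with salaries included, which corresponds to Rosa's result table above.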
| 226 | + |
| 227 | +## Conclusion |
| 228 | + |
| 229 | +Data privacy is more important than ever. We may regard it as a special kind of human right, |
| 230 | +so we should protect the data from being abused! |
| 231 | + |
| 232 | +We hope this blog post effectively highlights the significance of data privacy when sharing data with other parties. |
| 233 | +It also showcases that VulcanSQL offers user-friendly solutions that are certainly worth exploring. |
| 234 | + |
| 235 | +[^1]: The definition of the principle of least privilege (PoLP) comes from an [article by Palo Alto Networks](https://www.paloaltonetworks.com/cyberpedia/what-is-the-principle-of-least-privilege#:~:text=The%20principle%20of%20least%20privilege%20(PoLP)%20is%20an%20information%20security,to%20complete%20a%20required%20task.). |