From 717af41bcca24bb6d08b17b9b3aa999fed00c793 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Wed, 17 May 2017 22:54:13 -0700
Subject: [PATCH 1/3] config: Make capabilities and noNewPrivileges Linux-only (again)

Roll back the genericization from 718f9f3f (minor narrative cleanup
regarding config compatibility, 2017-01-30, #673). Lifting the
restriction there seems to have been motivated by "Solaris supports
capabilities", but that was before the split into a capabilities
object which happened in eb114f05 (Add ambient and bounding capability
support, 2017-02-02, #675). It's not clear if Solaris supports ambient
caps, or what Solaris API noNewPrivileges was punting to [1]. And John
Howard has recently confirmed that Windows does not support
capabilities and is unlikely to do so in the future [2]. He also
confirmed that Windows does not support rlimits [3]. John's statement
didn't directly address noNewPrivileges, but we can always restore any
of these properties to the Solaris/Windows platforms if/when we get
docs about which API we're punting to on those platforms.

Also add some backticks, remove the hyphens in "OPTIONAL) - the",
standardize lines I touch to use "the process" [4], and use four-space
indents here to keep Pandoc happy (see 7795661 (runtime.md: Fix
sub-bullet indentation, 2016-06-08, #495)).

[1]: https://github.com/opencontainers/runtime-spec/pull/673#discussion_r99353136
[2]: https://github.com/opencontainers/runtime-spec/pull/810#issuecomment-301594590
[3]: https://github.com/opencontainers/runtime-spec/pull/835#issuecomment-303455386
[4]: https://github.com/opencontainers/runtime-spec/pull/809#discussion_r116297660

Signed-off-by: W. Trevor King
---
 config.md | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/config.md b/config.md
index ec17fab94..e8fbb41d2 100644
--- a/config.md
+++ b/config.md
@@ -156,16 +156,6 @@ For POSIX platforms the `mounts` structure has the following fields:
 * **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2008's `environ`][ieee-1003.1-2008-xbd-c8.1].
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2008 `execvp`'s *argv*][ieee-1003.1-2008-xsh-exec].
 This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
-* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
- Valid values are platform-specific.
- For example, valid values for Linux are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
- Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
- `capabilities` contains the following properties:
- * **`effective`** (array of strings, OPTIONAL) - the `effective` field is an array of effective capabilities that are kept for the process.
- * **`bounding`** (array of strings, OPTIONAL) - the `bounding` field is an array of bounding capabilities that are kept for the process.
- * **`inheritable`** (array of strings, OPTIONAL) - the `inheritable` field is an array of inheritable capabilities that are kept for the process.
- * **`permitted`** (array of strings, OPTIONAL) - the `permitted` field is an array of permitted capabilities that are kept for the process.
- * **`ambient`** (array of strings, OPTIONAL) - the `ambient` field is an array of ambient capabilities that are kept for the process.
 * **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
 Each entry has the following structure:
 
@@ -176,13 +166,22 @@ For POSIX platforms the `mounts` structure has the following fields:
 
 If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
 
-* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
- As an example, the ['no_new_privs'][no-new-privs] article in the kernel documentation has information on how this is achieved using a prctl system call on Linux.
-
 For Linux-based systems the process structure supports the following process-specific fields.
 
 * **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile for the process.
 For more information about AppArmor, see [AppArmor documentation][apparmor].
+* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
+ Valid values are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
+ Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
+ `capabilities` contains the following properties:
+
+ * **`effective`** (array of strings, OPTIONAL) the `effective` field is an array of effective capabilities that are kept for the process.
+ * **`bounding`** (array of strings, OPTIONAL) the `bounding` field is an array of bounding capabilities that are kept for the process.
+ * **`inheritable`** (array of strings, OPTIONAL) the `inheritable` field is an array of inheritable capabilities that are kept for the process.
+ * **`permitted`** (array of strings, OPTIONAL) the `permitted` field is an array of permitted capabilities that are kept for the process.
+ * **`ambient`** (array of strings, OPTIONAL) the `ambient` field is an array of ambient capabilities that are kept for the process.
+* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
+ As an example, the [`no_new_privs`][no-new-privs] article in the kernel documentation has information on how this is achieved using a `prctl` system call on Linux.
 * **`oomScoreAdj`** *(int, OPTIONAL)* adjusts the oom-killer score in `[pid]/oom_score_adj` for the process's `[pid]` in a [proc pseudo-filesystem][procfs].
 If `oomScoreAdj` is set, the runtime MUST set `oom_score_adj` to the given value.
 If `oomScoreAdj` is not set, the runtime MUST NOT change the value of `oom_score_adj`.

From 5292e9c82b79ea6a663b3c809751f8c0d3f910f0 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Tue, 23 May 2017 10:28:50 -0700
Subject: [PATCH 2/3] config: Make rlimits POSIX-specific

This property was initially Linux-specific. 718f9f3f (minor narrative
cleanup regarding config compatibility, 2017-01-30, #673) removed the
Linux restriction, but the rlimit concept is from POSIX and Windows
doesn't support it [1].

This commit adds new subsections for the POSIX-specific and
Linux-specific process entries (to match the approach we currently use
for process.user), and punts to POSIX for the Solaris values and
compliance testing approach. If/when we get a Solaris-specific doc for
valid values, we can replace the POSIX punt there, but we probably want
to continue punting to POSIX for getrlimit(3)-based compliance testing.

I've renamed the overly-specific LinuxRlimit to POSIXRlimit. We could
use the generic Rlimit, but then we'd be stuck if/when Windows adds
support for some rlimit-like thing that doesn't match up cleanly enough
for us to use the POSIX structure.

[1]: https://github.com/opencontainers/runtime-spec/pull/835#issuecomment-303455386

Signed-off-by: W. Trevor King
---
 config.md          | 31 ++++++++++++++++++++++++-------
 specs-go/config.go |  6 +++---
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/config.md b/config.md
index e8fbb41d2..5463ad621 100644
--- a/config.md
+++ b/config.md
@@ -156,17 +156,33 @@ For POSIX platforms the `mounts` structure has the following fields:
 * **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2008's `environ`][ieee-1003.1-2008-xbd-c8.1].
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2008 `execvp`'s *argv*][ieee-1003.1-2008-xsh-exec].
 This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
+
+### Linux and Solaris Process
+
+For POSIX-based systems (Linux and Solaris), the `process` object supports the following process-specific properties:
+
 * **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
 Each entry has the following structure:
 
- * **`type`** (string, REQUIRED) - the platform resource being limited, for example on Linux as defined in the [setrlimit(2)][setrlimit.2] man page.
- * **`soft`** (uint64, REQUIRED) - the value of the limit enforced for the corresponding resource.
- * **`hard`** (uint64, REQUIRED) - the ceiling for the soft limit that could be set by an unprivileged process.
- Only a privileged process (e.g. under Linux: one with the CAP_SYS_RESOURCE capability) can raise a hard limit.
+ * **`type`** (string, REQUIRED) the platform resource being limited.
+ * Linux: valid values are defined in the [`getrlimit(2)`][getrlimit.2] man page, such as `RLIMIT_MSGQUEUE`.
+ * Solaris: valid values are defined in the [`getrlimit(3)`][getrlimit.3] man page, such as `RLIMIT_CORE`.
+
+ The runtime MUST [generate an error](runtime.md#errors) for any values which cannot be mapped to a relevant kernel interface
+ For each entry in `rlimits`, a [`getrlimit(3)`][getrlimit.3] on `type` MUST succeed.
+ For the following properties, `rlim` refers to the status returned by the `getrlimit(3)` call.
+
+ * **`soft`** (uint64, REQUIRED) the value of the limit enforced for the corresponding resource.
+ `rlim.rlim_cur` MUST match the configured value.
+ * **`hard`** (uint64, REQUIRED) the ceiling for the soft limit that could be set by an unprivileged process.
+ `rlim.rlim_max` MUST match the configured value.
+ Only a privileged process (e.g. one with the `CAP_SYS_RESOURCE` capability) can raise a hard limit.
+
+ If `rlimits` contains duplicated entries with same `type`, the runtime MUST [generate an error](runtime.md#errors).
 
- If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
+### Linux Process
 
-For Linux-based systems the process structure supports the following process-specific fields.
+For Linux-based systems, the `process` object supports the following process-specific properties.
 
 * **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile for the process.
 For more information about AppArmor, see [AppArmor documentation][apparmor].
@@ -837,7 +853,8 @@ Here is a full example `config.json` for reference.
 [mount.8]: http://man7.org/linux/man-pages/man8/mount.8.html
 [mount.8-filesystem-independent]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-INDEPENDENT_MOUNT%20OPTIONS
 [mount.8-filesystem-specific]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-SPECIFIC_MOUNT%20OPTIONS
-[setrlimit.2]: http://man7.org/linux/man-pages/man2/setrlimit.2.html
+[getrlimit.2]: http://man7.org/linux/man-pages/man2/getrlimit.2.html
+[getrlimit.3]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getrlimit.html
 [stdin.3]: http://man7.org/linux/man-pages/man3/stdin.3.html
 [uts-namespace.7]: http://man7.org/linux/man-pages/man7/namespaces.7.html
 [zonecfg.1m]: http://docs.oracle.com/cd/E86824_01/html/E54764/zonecfg-1m.html
diff --git a/specs-go/config.go b/specs-go/config.go
index 413d46d07..c00f96ebc 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -45,7 +45,7 @@ type Process struct {
 // Capabilities are Linux capabilities that are kept for the process.
 Capabilities *LinuxCapabilities `json:"capabilities,omitempty" platform:"linux"`
 // Rlimits specifies rlimit options to apply to the process.
- Rlimits []LinuxRlimit `json:"rlimits,omitempty" platform:"linux"`
+ Rlimits []POSIXRlimit `json:"rlimits,omitempty" platform:"linux,solaris"`
 // NoNewPrivileges controls whether additional privileges could be gained by processes in the container.
 NoNewPrivileges bool `json:"noNewPrivileges,omitempty" platform:"linux"`
 // ApparmorProfile specifies the apparmor profile for the container.
@@ -202,8 +202,8 @@ type LinuxIDMapping struct {
 Size uint32 `json:"size"`
 }
 
-// LinuxRlimit type and restrictions
-type LinuxRlimit struct {
+// POSIXRlimit type and restrictions
+type POSIXRlimit struct {
 // Type of the rlimit to set
 Type string `json:"type"`
 // Hard is the hard limit for the specified type

From f7335bdcd7154da6787e2b959072b2cce4b27078 Mon Sep 17 00:00:00 2001
From: Daniel Dao
Date: Wed, 5 Jul 2017 11:30:34 +0100
Subject: [PATCH 3/3] rephrase POSIX support for rlimits

This change makes rlimits less about Linux and Solaris, and expands
the explanation a bit to cover all systems that support POSIX rlimits,
with Linux and Solaris as examples.

Signed-off-by: Daniel Dao
---
 config.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.md b/config.md
index 5463ad621..fb33ff566 100644
--- a/config.md
+++ b/config.md
@@ -157,9 +157,9 @@ For POSIX platforms the `mounts` structure has the following fields:
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2008 `execvp`'s *argv*][ieee-1003.1-2008-xsh-exec].
 This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
 
-### Linux and Solaris Process
+### POSIX process
 
-For POSIX-based systems (Linux and Solaris), the `process` object supports the following process-specific properties:
+For systems that support POSIX rlimits (for example Linux and Solaris), the `process` object supports the following process-specific properties:
 
 * **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
 Each entry has the following structure:
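
Reviewer note: the `rlimits` text in this series requires the runtime to generate an error when two entries share the same `type`. A minimal Go sketch of that check might look like the following. It is an illustration under assumptions, not code from this series: `checkRlimits` and `parseRlimits` are hypothetical helpers, and the `Hard` and `Soft` field definitions and JSON tags are assumed from the `hard` and `soft` properties in `config.md`, since the diff only shows `Type` and the `Hard` comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// POSIXRlimit mirrors the struct renamed in PATCH 2/3.
// Assumption: the Hard and Soft fields and their JSON tags are inferred
// from the `hard` and `soft` properties described in config.md.
type POSIXRlimit struct {
	Type string `json:"type"`
	Hard uint64 `json:"hard"`
	Soft uint64 `json:"soft"`
}

// checkRlimits is a hypothetical helper (not part of specs-go).
// It returns an error if two entries share the same type.
func checkRlimits(rlimits []POSIXRlimit) error {
	seen := make(map[string]bool, len(rlimits))
	for _, r := range rlimits {
		if seen[r.Type] {
			return fmt.Errorf("rlimits: duplicate entry for type %q", r.Type)
		}
		seen[r.Type] = true
	}
	return nil
}

// parseRlimits is a hypothetical helper that decodes a JSON `rlimits`
// array and applies the duplicate check before returning it.
func parseRlimits(data []byte) ([]POSIXRlimit, error) {
	var rlimits []POSIXRlimit
	if err := json.Unmarshal(data, &rlimits); err != nil {
		return nil, err
	}
	if err := checkRlimits(rlimits); err != nil {
		return nil, err
	}
	return rlimits, nil
}

func main() {
	valid := []byte(`[{"type": "RLIMIT_CORE", "hard": 1024, "soft": 1024}]`)
	dup := []byte(`[
		{"type": "RLIMIT_CORE", "hard": 1024, "soft": 1024},
		{"type": "RLIMIT_CORE", "hard": 2048, "soft": 2048}
	]`)

	if _, err := parseRlimits(valid); err != nil {
		fmt.Println("unexpected error:", err)
	}
	if _, err := parseRlimits(dup); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The sketch deliberately avoids any `getrlimit` call so it stays platform-independent; the `rlim_cur`/`rlim_max` compliance checks described in the patch would need a POSIX system and are not covered here.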