
getMediaTypeParams() doesn't handle quoted-string values #245

@bkdotcom


https://www.rfc-editor.org/rfc/rfc7231#section-3.1.1.1

media-type = type "/" subtype *( OWS ";" OWS parameter )
type       = token
subtype    = token

The type/subtype MAY be followed by parameters in the form of name=value pairs.

parameter      = token "=" ( token / quoted-string )

A parameter value that matches the token production can be transmitted either as a token or within a quoted-string. The quoted and unquoted values are equivalent. For example, the following examples are all equivalent, but the first is preferred for consistency:

text/html;charset=utf-8
text/html;charset=UTF-8
Text/HTML;Charset="utf-8"
text/html; charset="utf-8"
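As a rough illustration of that equivalence, the sketch below reduces each spelling to one comparison key. `normalizeMediaType()` is a hypothetical helper, not part of the library, and it assumes that only the `charset` value is case-insensitive. The result is meant for comparison only; it is not re-quoted, so it is not necessarily a valid header value.

```php
<?php
// Hypothetical helper (not part of the library): reduce a media type to a
// canonical string so that equivalent spellings compare equal.
function normalizeMediaType(string $header): string
{
    // Split "type/subtype" from the parameter section at the first ";".
    $parts = \explode(';', $header, 2);
    $type = \strtolower(\trim($parts[0]));

    // name = ( quoted-string | token ); whitespace around "=" is tolerated.
    $regex = '/([^\s";,=]+)\s*=\s*("(?:\\\\.|[^"\\\\])*"|[^\s";,]+)/';
    \preg_match_all($regex, $parts[1] ?? '', $matches, PREG_SET_ORDER);

    $params = array();
    foreach ($matches as $m) {
        $name = \strtolower($m[1]);
        $value = $m[2];
        if ($value[0] === '"') {
            // quoted-string: drop the outer quotes, then unescape quoted-pairs
            $value = \preg_replace('/\\\\(.)/', '$1', \substr($value, 1, -1));
        }
        // Assumption: only the charset value is folded to lower case.
        $params[$name] = $name === 'charset' ? \strtolower($value) : $value;
    }

    // Sort by name so parameter order does not affect the comparison key.
    \ksort($params);
    $out = $type;
    foreach ($params as $name => $value) {
        $out .= ';' . $name . '=' . $value;
    }
    return $out;
}

$forms = array(
    'text/html;charset=utf-8',
    'text/html;charset=UTF-8',
    'Text/HTML;Charset="utf-8"',
    'text/html; charset="utf-8"',
);
foreach ($forms as $form) {
    echo normalizeMediaType($form), "\n";
}
```

All four inputs should produce the same line, which is what "equivalent" means in the RFC text quoted above.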

Contrived example:

application/json;charSet="UTF-8"; FOO = "b; a\"r"
(the standard doesn't allow whitespace around the "=", but we can handle it)

expected:

array(
  'charset' => 'UTF-8',  // charset values are case-insensitive, so this one should probably be strtolower'd
  'foo' => 'b; a"r',
)

actual:
Undefined array key 1

array(
  'charset' => '"UTF-8"',
  'foo ' => ' "b',
  'a\"r"' => null,
)

Something like this could work:

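public function getMediaTypeParams(): array
{
    $contentType = $this->getContentType();

    if ($contentType === null) {
        return array();
    }

    // strip the leading "type/subtype" so only the parameters remain
    $paramString = \preg_replace('/^.*?[;,]\s*/', '', $contentType);
    $regexToken = '[^\\s";,]+';
    // quoted-pair (backslash + any char) or anything other than " and backslash
    $regexQuotedString = '"(?:\\\\.|[^"\\\\])*"';
    $regex = '/
        (?P<key>' . $regexToken . ')
        \s*=\s*    # standard does not allow whitespace around =
        (?P<value>' . $regexQuotedString . '|' . $regexToken . ')
        /x';

    \preg_match_all($regex, $paramString, $matches, PREG_SET_ORDER);

    $params = array();
    foreach ($matches as $kvp) {
        $key = \strtolower($kvp['key']);
        $value = $kvp['value'];
        if ($value[0] === '"') {
            // quoted-string: remove only the outer quotes, then unescape quoted-pairs
            // (trim() would also eat an escaped quote at the end of the value)
            $value = \preg_replace('/\\\\(.)/', '$1', \substr($value, 1, -1));
        }
        $params[$key] = $value;
    }
    return $params;
}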

The fix could go further and strtolower the value when the key is charset.
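A minimal sketch of that extra step, written as a separate pass over the parsed parameters. `lowercaseCharset()` is a hypothetical name, and `$params` is assumed to be an array shaped like the one `getMediaTypeParams()` returns, with lower-cased keys.

```php
<?php
// Hypothetical post-processing step: fold the charset value to lower case
// and leave every other parameter untouched.
function lowercaseCharset(array $params): array
{
    if (isset($params['charset'])) {
        $params['charset'] = \strtolower($params['charset']);
    }
    return $params;
}

\var_dump(lowercaseCharset(array('charset' => 'UTF-8', 'foo' => 'b; a"r')));
```

Keeping this separate from the parsing loop leaves other parameter values, which may be case-sensitive (a multipart boundary, for instance), exactly as they were sent.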


