Unable to update Unicode version to 16.0.0 #138

@adithyaov

I'm trying to upgrade Unicode to 16.0.0.
On running ./ucd generate (after ./ucd/download), I'm greeted with this:

Too many script extensions: 271

Interesting code blocks to look at:

UCD2Haskell/Modules/ScriptExtensions.hs:150

        encodedExtensions :: Map.Map (NE.NonEmpty BS.ShortByteString) Word8
        encodedExtensions = let len = length extensionsList in if len > 0xff
            then error ("Too many script extensions: " <> show len)
            else Map.fromList (zip extensionsList [0..])
        -- Encode single script as their script value
        extensionsList = singleScriptExtensions
                      <> Set.toList multiScriptExtensions

It looks like
length (singleScriptExtensions <> Set.toList multiScriptExtensions)
is 271, which is > 0xff (255).
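
To make the failure mode concrete, here is a minimal sketch of the same check in isolation. The name encodeWord8 is hypothetical and is not part of UCD2Haskell; it only mirrors the shape of encodedExtensions above, returning Left instead of calling error.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word8)

-- | Hypothetical stand-in for the encoding step: give each value an index
-- that must fit in a Word8, and refuse lists longer than 0xff.
encodeWord8 :: Ord a => [a] -> Either String (Map.Map a Word8)
encodeWord8 xs
    | len > 0xff = Left ("Too many script extensions: " <> show len)
    | otherwise  = Right (Map.fromList (zip xs [0..]))
  where
    len = length xs
```

With 271 elements this yields the same message as the generator. Without the check, the last index (270) could not be represented in a Word8 at all: fromIntegral (270 :: Int) :: Word8 wraps around to 14.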

UCD2Haskell/Modules/ScriptExtensions.hs:141

        singleScriptExtensions = pure . getScriptAbbr <$> scripts
        singleScriptExtensionsSet = Set.fromList singleScriptExtensions
        multiScriptExtensions :: Set.Set (NE.NonEmpty BS.ShortByteString)
        multiScriptExtensions = Set.fromList (Map.elems extensions)
                                Set.\\ singleScriptExtensionsSet

singleScriptExtensionsSet cannot have more than 204 elements, as:

$ cat data/16.0.0/ucd/ScriptExtensions.txt | grep -v '^\s*#' | grep -v '^\s*$' | wc -l
204

So multiScriptExtensions bumps the value up?
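
One way to check this would be to look at the two contributions separately. The sketch below is an assumption-laden stand-in, not the generator's code: extensionCounts and its two inputs are hypothetical, with scripts playing the role of the script abbreviations and extValues the role of Map.elems extensions. Judging from the snippet above, the first component has one entry per element of scripts.

```haskell
import qualified Data.List.NonEmpty as NE
import qualified Data.Set as Set

-- | Sizes of the two parts of the extensions list: (single, multi).
-- Both inputs are hypothetical stand-ins for the generator's data.
extensionCounts :: Ord s => [s] -> [NE.NonEmpty s] -> (Int, Int)
extensionCounts scripts extValues = (length singles, Set.size multis)
  where
    -- One singleton extension per script.
    singles = pure <$> scripts
    -- Distinct extension sets that are not already a single script.
    multis  = Set.fromList extValues Set.\\ Set.fromList singles
```

The sum of the two components is the length that is compared against 0xff, so printing them for the 16.0.0 data would show which part grew.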

I've not investigated this further. We use genEnumBitmapShamochu to generate
the bitmap, and Shamochu restricts the number of elements to at most 0xff. So
by design, we don't allow extensionsList to have more than 0xff elements.

This is either a bug, if there SHOULD NOT be more than 0xff extensions, or a
LIMITATION of the generator, if we simply don't support more than 0xff extensions.

I'm not sure how to fix this as I don't fully understand the problem.

@wismill could you please suggest how I should proceed?
