Description
I'm trying to upgrade Unicode to 16.0.0.
On running ./ucd generate (after ./ucd/download), I'm greeted with this:
Too many script extensions: 271
Interesting code blocks to look at:
UCD2Haskell/Modules/ScriptExtensions.hs:150
    encodedExtensions :: Map.Map (NE.NonEmpty BS.ShortByteString) Word8
    encodedExtensions = let len = length extensionsList in if len > 0xff
        then error ("Too many script extensions: " <> show len)
        else Map.fromList (zip extensionsList [0..])
    -- Encode single script as their script value
    extensionsList = singleScriptExtensions
                  <> Set.toList multiScriptExtensions

It looks like
length (singleScriptExtensions <> Set.toList multiScriptExtensions)
is 271, which is > 0xff (255).
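
To make the counting explicit, here is a minimal, self-contained sketch of the same check. It is not the generator's code: `encodeExtensions` and `ScriptSet` are hypothetical names, plain lists stand in for `NE.NonEmpty BS.ShortByteString` and `Set`, and the check returns `Left` instead of calling `error`. The point it illustrates is that the list length is the number of scripts plus the number of distinct multi-script sets.

```haskell
import Data.List (nub)
import Data.Word (Word8)

-- Hypothetical stand-in: a script set is a list of script abbreviations.
type ScriptSet = [String]

-- Assign one Word8 index per entry, single scripts first, mirroring the
-- shape of encodedExtensions quoted above (an illustration only).
encodeExtensions :: [String] -> [ScriptSet] -> Either String [(ScriptSet, Word8)]
encodeExtensions scripts extensionSets
  | len > 0xff = Left ("Too many script extensions: " <> show len)
  | otherwise = Right (zip extensionsList [0 ..])
  where
    -- One singleton entry per script.
    singles = map (: []) scripts
    -- Distinct extension sets that are not already a singleton script.
    multis = nub (filter (`notElem` singles) extensionSets)
    extensionsList = singles <> multis
    len = length extensionsList
```

With two scripts and one distinct multi-script set, this sketch assigns the indices 0 to 2. The limit is hit only once the scripts plus the distinct multi-script sets add up to more than 0xff entries.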
UCD2Haskell/Modules/ScriptExtensions.hs:141
    singleScriptExtensions = pure . getScriptAbbr <$> scripts
    singleScriptExtensionsSet = Set.fromList singleScriptExtensions
    multiScriptExtensions :: Set.Set (NE.NonEmpty BS.ShortByteString)
    multiScriptExtensions = Set.fromList (Map.elems extensions)
                     Set.\\ singleScriptExtensionsSet

singleScriptExtensionsSet cannot be more than 204 as:
    $ cat data/16.0.0/ucd/ScriptExtensions.txt | grep -v '^\s*#' | grep -v '^\s*$' | wc -l
    204

So multiScriptExtensions bumps the value up?
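
One way to check this would be to count distinct script sets in the file rather than data lines, since several lines can share the same set. The sketch below is only an assumption-laden helper, not part of the generator: `parseScripts` and `distinctScriptSets` are hypothetical names, it assumes the usual `code points ; scripts # comment` line layout, and it sorts each set so that the order of abbreviations does not matter.

```haskell
import Data.Char (isSpace)
import Data.List (nub, sort)
import Data.Maybe (mapMaybe)

-- Extract the sorted script abbreviations from one data line.
-- Comment-only and blank lines yield Nothing.
parseScripts :: String -> Maybe [String]
parseScripts line =
  case break (== ';') (takeWhile (/= '#') line) of
    (_, ';' : rest) | not (all isSpace rest) -> Just (sort (words rest))
    _ -> Nothing

-- Distinct script sets across all data lines, in order of first appearance.
distinctScriptSets :: [String] -> [[String]]
distinctScriptSets = nub . mapMaybe parseScripts
```

Applying `distinctScriptSets . lines` to the contents of data/16.0.0/ucd/ScriptExtensions.txt and taking the `length` of the result gives the number of distinct sets; filtering out the one-element sets leaves the part that would end up in multiScriptExtensions.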
I've not investigated this further. We use genEnumBitmapShamochu to generate
the bitmap, and Shamochu restricts the number of elements to be < 0xff. So by
design, we don't allow extensionsList to have more than 0xff elements.
This is a bug if there SHOULD NOT be more than 0xff extensions, or this is a
LIMITATION of the generator if we don't support more than 0xff extensions.
I'm not sure how to fix this as I don't fully understand the problem.
@wismill could you please suggest how I should proceed?