
Issues regarding changes incoming from the foundation-model-stack/gptq_bigcode PR branch #41

@cyang49

Description


In the function get_max_gpu_blocks_available:

The debugger shows that when peak memory usage exceeds the gpu_memory_utilization budget (default 0.8), the block count comes out negative and the function eventually returns zero. I think an error should be raised at that point, since the code can't possibly work with 0 blocks in the paged cache. Raising an exception prevents the otherwise mysterious queue-empty error that shows up later when trying to acquire blocks.

Here's the failing example, captured in pdb:

(Pdb) p total_gpu_memory
85049802752
(Pdb) p gpu_memory_utilization
0.8
(Pdb) p peak_memory
81784484864
(Pdb) p cache_block_size
20447232
(Pdb) (total_gpu_memory * gpu_memory_utilization - peak_memory) // cache_block_size
-673.0
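
A minimal sketch of the proposed guard, assuming the variable names from the debugger session above (total_gpu_memory, gpu_memory_utilization, peak_memory, cache_block_size) and a hypothetical standalone signature; not the actual implementation in the PR branch:

    # Sketch of the proposed check; signature and names are assumptions
    # based on the pdb session above, not the code in the PR branch.
    def get_max_gpu_blocks_available(total_gpu_memory: int,
                                     gpu_memory_utilization: float,
                                     peak_memory: int,
                                     cache_block_size: int) -> int:
        # Memory left for the paged KV cache after accounting for peak usage.
        available = total_gpu_memory * gpu_memory_utilization - peak_memory
        num_gpu_blocks = int(available // cache_block_size)

        # If peak memory already exceeds the utilization budget, the block
        # count is negative (e.g. -673 in the session above). Fail fast
        # instead of clamping to 0 and hitting a mysterious queue-empty
        # error later when blocks are acquired.
        if num_gpu_blocks <= 0:
            raise ValueError(
                f"No GPU blocks available for the paged cache: peak memory "
                f"({peak_memory} bytes) exceeds the budget of "
                f"{gpu_memory_utilization} * {total_gpu_memory} bytes. "
                f"Increase gpu_memory_utilization or reduce memory usage."
            )
        return num_gpu_blocks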
