
Some metrics fail on GPU #104

@Johanmkr

Description


TL;DR: Can everyone test their metric on a GPU? Most of them are failing on mine.

Issue

Hi all, I discovered this in my own code, but it could affect more people:

In my metric class, Precision, I declare new tensors like this:

true_oh = torch.zeros(y_true.size(0), self.num_classes).scatter_(
1, y_true.unsqueeze(1), 1
)
pred_oh = torch.zeros(y_pred.size(0), self.num_classes).scatter_(
1, y_pred.unsqueeze(1), 1
)

However, this causes the code to fail on the GPU, since these tensors are allocated in CPU memory by default while the inputs live on the GPU.
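A minimal fix (assuming y_true and y_pred already live on the right device) is to allocate the new tensors on the same device as the inputs:

true_oh = torch.zeros(
    y_true.size(0), self.num_classes, device=y_true.device
).scatter_(1, y_true.unsqueeze(1), 1)
pred_oh = torch.zeros(
    y_pred.size(0), self.num_classes, device=y_pred.device
).scatter_(1, y_pred.unsqueeze(1), 1)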

I get the same issue with Recall, @salomaestro.

For Entropy I get a different GPU error, @Seilmast:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
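I haven't looked at the Entropy code, but that error usually means .numpy() is being called on a CUDA tensor; moving the tensor to host memory first, roughly

values = some_tensor.detach().cpu().numpy()  # copy to host memory before converting

(where some_tensor stands for whatever tensor gets converted), should make it go away.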

And for F1, @sot176, I get the following error:
/CollaborativeCoding/metrics/F1.py", line 161, in returnmetric
self.y_pred = torch.cat(self.y_pred)
^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated
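If I had to guess, self.y_pred contains zero-dimensional (scalar) tensors, which torch.cat cannot join; torch.stack handles that case:

self.y_pred = torch.stack(self.y_pred)  # stack accepts zero-dimensional tensors, cat does not

but @sot176 knows that code better than I do.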

Accuracy seems to work well on both CPU and GPU; well done @hzavadil98.

Suggestion

Either:

  1. We pass a device argument to the metric classes (rough sketch after this list), or
  2. we avoid allocating new tensors inside the metric classes altogether.
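For option 1, a rough sketch of what I mean (the constructor signature and update method are just illustrative, not our actual API):

import torch

class Precision:
    def __init__(self, num_classes, device="cpu"):
        self.num_classes = num_classes
        self.device = device  # device on which new tensors are allocated

    def update(self, y_true, y_pred):
        # one-hot encode on the configured device instead of defaulting to CPU
        true_oh = torch.zeros(
            y_true.size(0), self.num_classes, device=self.device
        ).scatter_(1, y_true.unsqueeze(1), 1)
        pred_oh = torch.zeros(
            y_pred.size(0), self.num_classes, device=self.device
        ).scatter_(1, y_pred.unsqueeze(1), 1)
        ...

The training loop would then construct each metric with the same device it uses for the model and data, e.g. Precision(num_classes, device=device).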

Any thoughts?
