Conversation

@c-salomonsen
Contributor

@sot176 I had to change a few things in your dataset to make it fit the format, but I'm not sure my changes are what we'll actually want in the end, which raises two questions: how do we point `data_path` to our dataset, and do we always assume the datasets are stored as files rather than directories?

@c-salomonsen c-salomonsen added the enhancement New feature or request label Feb 6, 2025
@c-salomonsen c-salomonsen self-assigned this Feb 6, 2025
@c-salomonsen
Contributor Author

I'm also not sure how we ought to structure our targets. Can we always expect them to be an int, or should the dataloader one-hot encode them right away, and thus expect an array or tensor?
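For reference, the two candidate target formats differ only by a trivial conversion; a pure-Python sketch of one-hot encoding an integer class index (names hypothetical):

```python
def one_hot(label, num_classes):
    """Convert an integer class index into a one-hot list."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

assert one_hot(2, 4) == [0, 0, 1, 0]
```

So whichever format the dataloader standardises on, converting at the boundary is cheap.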

@c-salomonsen c-salomonsen linked an issue Feb 6, 2025 that may be closed by this pull request
Contributor

@sot176 sot176 left a comment


Since our dataset is a single HDF5 file (at least in the Kaggle version), I think the best solution is to ensure that data_path is a file path rather than a directory.

I usually pass data_path dynamically to the dataset loader via argparse and specify the path in my shell script.

Regarding the targets, I think we can assume they are always integers. However, it might be beneficial to ensure that all datasets follow the same structure for labels and decide on a standard format—whether to keep them as integers or convert them to one-hot encoding.
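A minimal sketch of the argparse pattern described above, assuming a hypothetical `--data-path` flag and file name (the real names are up to the project):

```python
import argparse
import pathlib

# Hypothetical CLI: a shell script would invoke e.g.
#   python train.py --data-path data/train.h5
parser = argparse.ArgumentParser()
parser.add_argument("--data-path", type=pathlib.Path, required=True,
                    help="path to the HDF5 dataset file")

# Simulated invocation for illustration:
args = parser.parse_args(["--data-path", "data/train.h5"])
print(args.data_path.name)
```

Using `type=pathlib.Path` means the rest of the code receives a `Path` object rather than a raw string.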

@c-salomonsen
Contributor Author

> Since our dataset is a single HDF5 file (at least in the Kaggle version), I think the best solution is to ensure that data_path is a file path rather than a directory.

Yes, I agree. I just didn't know if that was the case for the others as well; do you know?

> Regarding the targets, I think we can assume they are always integers. However, it might be beneficial to ensure that all datasets follow the same structure for labels and decide on a standard format—whether to keep them as integers or convert them to one-hot encoding.

Yep, I'd kind of assumed that at least for the loss function (cross-entropy) we had to one-hot encode the targets, but I don't know.

@sot176
Contributor

sot176 commented Feb 7, 2025

PyTorch's CrossEntropyLoss expects the target labels to be integers representing class indices (not one-hot encoded) :)
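A tiny dependency-free sketch of why integer class indices suffice for cross-entropy (this mirrors what PyTorch does internally, but is not PyTorch code): the target index simply selects one log-probability per sample.

```python
import math

def cross_entropy(logits, target_idx):
    """Cross-entropy of one sample: -log(softmax(logits)[target_idx]).
    The integer target only indexes into the logits, so no one-hot
    encoding is needed."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_idx]

loss = cross_entropy([2.0, 0.5, -1.0], target_idx=0)
```

Picking a less likely class as the target gives a larger loss, as expected.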

@sot176
Contributor

sot176 commented Feb 7, 2025

Regarding the file path: I unfortunately don't know. Maybe we open an issue to discuss that and decide on one way?

@hzavadil98
Contributor

Hi, in #53 I threw a hammer into the datasets a little bit - the parameters passed through load_data are a bit different, but it shouldn't affect the test for it, I believe. Regarding the data_path argument: I store the MNIST files in the ./Data/MNIST folder and data_path points to it, but MNIST is split into 4 separate files.

Contributor

@hzavadil98 hzavadil98 left a comment


The test looks good!

@c-salomonsen
Contributor Author

The tests fail on gh-actions since it's not capable of downloading the datasets. I'll make some changes to instead assume the data is present somehow, or just skip checking the lengths.
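One way to do that, sketched here with the stdlib's `unittest` (if the repo uses pytest, `pytest.mark.skipif` plays the same role); the data directory name is hypothetical:

```python
import pathlib
import unittest

# Hypothetical data location; absent on CI, so the test is skipped there.
DATA_DIR = pathlib.Path("Data/MNIST")

class TestLoadData(unittest.TestCase):
    @unittest.skipUnless(DATA_DIR.exists(), "dataset not downloaded (e.g. on CI)")
    def test_dataset_is_nonempty(self):
        # Only runs when the data files actually exist locally.
        self.assertTrue(any(DATA_DIR.iterdir()))
```

Skipping keeps the test green on CI while still exercising the length checks on machines that have the data.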

@c-salomonsen
Contributor Author

> Hi, in #53 I threw a hammer into the datasets a little bit - the parameters passed through load_data are a bit different, but it shouldn't affect the test for it, I believe. Regarding the data_path argument: I store the MNIST files in the ./Data/MNIST folder and data_path points to it, but MNIST is split into 4 separate files.

So perhaps `load_data` should just take a directory path, and then a fixed filename is stored in each class, i.e.:

```python
class MyDataset(Dataset):
    filename = "mydataset.data"

    def __init__(self, data_dir: pathlib.Path | str, ...):
        # Coerce to Path so a plain str for data_dir also works with "/".
        self.data_path = pathlib.Path(data_dir) / self.filename
```

@c-salomonsen
Contributor Author

Btw, while this PR implements a test for load_data, I made a test for the MetricWrapper/load_metrics in #54, here.

So what remains is a test for load_model. Sorry for the confusing structure with the tests split across two PRs...

@c-salomonsen c-salomonsen marked this pull request as ready for review February 13, 2025 12:04
@Seilmast Seilmast merged commit 018b669 into main Feb 13, 2025
3 of 4 checks passed
@hzavadil98 hzavadil98 deleted the christian/test-model-metric-data branch February 13, 2025 12:14


Linked issue (may be closed by this PR): Discussion: Train/val/test splitting and dataset generation
