-
Notifications
You must be signed in to change notification settings - Fork 0
Create tests for load_model/metric/data, closes #46 #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I also am not sure how we ought to structure our targets. Can we expect them to always be an |
sot176
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since our dataset is a single HDF5 file (at least in the Kaggle version), I think the best solution is to ensure that data_path is a file path rather than a directory.
I usually pass data_path dynamically to the dataset loader via argparse and specify the path in my shell script.
Regarding the targets, I think we can assume they are always integers. However, it might be beneficial to ensure that all datasets follow the same structure for labels and decide on a standard format—whether to keep them as integers or convert them to one-hot encoding.
Yes I agree, I just didnt know if that was the case for others as well, do you know?
Yep, I kinda assumed that at least for the loss function–which is cross-entropy–we had to one-hot encode the targets, but idk. |
|
The CrossEntropyLoss function from Pytorch expects the target labels to be integers representing class indices (not one-hot encoded) :) |
|
Regarding the file path: I unfortunately don't know. Maybe we open an issue to discuss that and decide on one way? |
|
Hi, in #53 I threw a hammer into the datasets a little bit - the parameters passed through load_data are a bit differen, but it shouldnt affect the test for it I believe. Regarding the data_path argument - I store the mnist files in |
hzavadil98
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test looks good!
|
The tests fails on gh-actions since its not capable of downloading the datasets. I'll make some changes to instead assume data is present somehow, or just skip checking the lengths. |
So perhaps the load_data should just take a directory path, then a fixed filename is stored in each class, i.e.: class MyDataset(Dataset):
filename = "mydataset.data"
def __init__(self, data_dir: pathlib.Path | str, ...):
self.data_path = data_dir / self.filename # ? |
|
Btw, while this PR implements a test for load_data, I made a test for the So what remains is a test for |
@sot176 I had to change a few things in your dataset to make it fit in with the format, however I'm not sure my changes are what we'll actually want in the end, which raises the question: how do we point
data_pathto our dataset? and do we always assume the datasets are stored as files, or directories?