From c9ca0854200c7cabdfcf78a5408dcbe6f731f916 Mon Sep 17 00:00:00 2001
From: seilmast
Date: Sat, 1 Mar 2025 16:37:18 +0100
Subject: [PATCH 1/2] Added orcid file and added to magnus_page.md and readme.md

---
 CITATION.cff       |  2 +-
 README.md          |  8 +++++++
 doc/Magnus_page.md | 52 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/CITATION.cff b/CITATION.cff
index 193533f..07e6704 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -9,7 +9,7 @@ authors:
   orcid: "https://orcid.org/0009-0007-4958-4544"
 - family-names: "Størdal"
   given-names: "Magnus"
-  orcid: "https://orcid.org/0000-0000-0000-0000"
+  orcid: "https://orcid.org/0009-0008-5226-8128"
 - family-names: "Zavadil"
   given-names: "Jan"
   orcid: "https://orcid.org/0000-0001-8502-0059"
diff --git a/README.md b/README.md
index f0b7b3f..3e9a209 100644
--- a/README.md
+++ b/README.md
@@ -46,3 +46,11 @@ to pull the latest build, or check the [packages](https://github.com/SFI-Visual-
 > [!NOTE]
 > The container is build for a `linux/amd64` architecture to properly build Cuda 12. For other architectures please build the docker image locally.
+
+
+# Results
+## JanModel & MNIST_0-3
+This section reports the results from using the model "JanModel" and the dataset MNIST_0-3 which contains MNIST digits from 0 to 3 (Four classes total).
+For this experiment we use all five available metrics, and train for a total of a 100 epochs, but observe convergence around XXX epochs. We'll report the results from this epoch.
+
+We achieve a great fit on the data. Below are the results for the described run:
diff --git a/doc/Magnus_page.md b/doc/Magnus_page.md
index dd8a224..d0c4247 100644
--- a/doc/Magnus_page.md
+++ b/doc/Magnus_page.md
@@ -21,9 +21,28 @@ Each input is flattened over the channel, height and width channels. Then they a
 ## SVHN Dataset In-Depth
+The dataloader I was tasked with making loads the well-known SVHN dataset. This is an RGB dataset of real-life digits taken from house numbers. The class inherits from the torch Dataset class and has four methods:
+* __init__ : Initializes the instance of the class
+* _create_h5py: Creates the h5 object containing data from the downloaded .mat files for ease of use
+* __len__ : Method required by the DataLoader class. Returns the length of the dataset
+* __getitem__ : Method required by the DataLoader class. Loads an image-label pair, applies any defined image transformations, and returns both image and label.
+The __init__ method takes a few arguments:
+* data_path (Path): Path where the data is stored, or where it is to be downloaded to.
+* train (bool): Which set to use. If true, we use the SVHN training set; if false, the SVHN test set.
+* transform: The transform functions to be applied to the returned image.
+* nr_channels: How many channels to use. Can be either 1 or 3, corresponding to greyscale or RGB images respectively.
+
+In the init we check for the existence of the SVHN dataset. If it does not exist, we run the _create_h5py method, which is explained below. Then the labels are loaded into memory, as they are needed for the __len__ method among other things.
+
+The _create_h5py method downloads a given SVHN set (train or test). We also change the label 10 to 0, as the SVHN labels start at 1, with 10 representing images of the digit zero. After the download, we create two .h5 files: one with the labels and one with the images.
+
+Lastly, __getitem__ takes an index (a number between 0 and the length of the label array). We load the image h5 file and retrieve the row corresponding to the index.
+We then convert the image to a Pillow Image object, then apply the defined transforms before returning the image and label.
+
+
 ## Entropy Metric In-Depth
 The EntropyPrediction class' main job is to take some inputs from the MetricWrapper class and store the batchwise Shannon Entropy metric of those inputs. The class has four methods with the following jobs:
@@ -41,4 +60,35 @@ With permission I've used the scipy implementation to calculate entropy here. We
 Next we have the __returnmetric__ method which is used to retrive the stored metric. This returns the mean over all stored values. Effectively, this will return the average Shannon Entropy of the dataset.
-Lastly we have the __reset__ method which simply emptied the variable which stores the entropy values to prepare it for the next epoch.
\ No newline at end of file
+Lastly we have the __reset__ method which simply emptied the variable which stores the entropy values to prepare it for the next epoch.
+
+## More on implementation choices
+It should be noted that a lot of our decisions came from a top-down perspective. Many of our classes have design choices made to accommodate the wrappers that handle the initialization and dataflow of the different metrics, dataloaders, and models.
+All in all, we've made sure you don't really need to interact with the code beyond setting up the correct arguments for the run, which is great for consistency.
+
+
+# Challenges
+## Running someone else's code
+This section answers the question of what I found easy or difficult about running another person's code.
+
+I found it quite easy to run others' code. We had quite good tests, and once every test passed, I only had one error, with the F1 score not handling an unexpected edge case. To fix this I raised an issue, and it was fixed shortly after.
+
+One thing I did find a bit difficult was when people would change integral parts of the common code such as wrappers or loader functions (usually for the better), but did not raise an issue or notify the rest of us about the change. This caused some confusion, but in the end we sorted it out through weekly meetings where we agreed on design choices and how to handle loading of the different modules.
+
+The issues mentioned above also led to a week or so where there was always a test failing, and the person whose code was failing did not have time to work on it for a few days.
+
+## Someone running my code
+This section answers the question of what I found easy or difficult about having someone run my code.
+
+As far as I know, nobody had issues running my code. After I fixed all issues and tests related to it, it seems to have run fine, and I am not aware of any issues raised about it.
+
+
+# Tools
+This section answers the question of which tools from the course I used during the home exam.
+
+For this exam I used quite a few tools from the course.
+I had never used pytest or test functions while writing code before. It was quite fun to learn, and having GitHub Actions run the same tests was a great addition.
+
+We used GitHub Actions for quite a few things: checking code formatting, generating documentation, and running the code tests.
+
+Sphinx was also a great tool for documentation. It turns out you can write docstrings in such a way that the documentation is generated automatically. This has reduced the documentation workload a lot, and makes writing proper docstrings worthwhile.
\ No newline at end of file
From 6f41723f0d5afb0cf6b3b96ef74b1d9db5ee0c38 Mon Sep 17 00:00:00 2001
From: seilmast
Date: Sat, 1 Mar 2025 16:40:28 +0100
Subject: [PATCH 2/2] Added orcid file and added to magnus_page.md and readme.md

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 3e9a209..10142cb 100644
--- a/README.md
+++ b/README.md
@@ -54,3 +54,8 @@ This section reports the results from using the model "JanModel" and the dataset
 For this experiment we use all five available metrics, and train for a total of a 100 epochs, but observe convergence around XXX epochs. We'll report the results from this epoch.
 
 We achieve a great fit on the data. Below are the results for the described run:
+| Dataset Split | Loss  | Entropy | Accuracy | Precision | Recall | F1    |
+|---------------|-------|---------|----------|-----------|--------|-------|
+| Train         | 0.000 | 0.000   | 1.000    | 1.000     | 1.000  | 1.000 |
+| Validation    | 0.035 | 0.006   | 0.991    | 0.991     | 0.991  | 0.991 |
+| Test          | 0.024 | 0.004   | 0.994    | 0.994     | 0.994  | 0.994 |
\ No newline at end of file