diff --git a/README.md b/README.md
index e986abb..d16966d 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,22 @@
 # Collaborative-Coding-Exam
 Repository for final evaluation in the FYS-8805 Reproducible Research and Collaborative coding course
+## **Table of Contents**
+1. [Project Description](#project-description)
+2. [Installation](#installation)
+3. [Usage](#usage)
+4. [Results](#results)
+5. [Citing](#citing)
+
+## Project Description
+This project involves collaborative work on a digit classification task, where each participant works on distinct but interconnected components within a shared codebase.
+The main goal is to develop and train digit classification models collaboratively, with a focus on leveraging shared resources and learning efficient experimentation practices.
+### Key Aspects of the Project:
+- **Individual and Joint Tasks:** Each participant has separate tasks, such as implementing a digit classification dataset, a neural network model, and an evaluation metric. However, all models and datasets must be compatible, as we can only train and evaluate using our partners' models and datasets.
+- **Shared Environment:** Alongside our individual tasks, we collaborate on joint tasks such as the main file and the training and evaluation loops. Additionally, we use a shared Weights and Biases environment for experiment management.
+- **Documentation and Package Management:** To ensure proper documentation and ease of use, we set up Sphinx documentation and made the repository pip-installable.
+- **High-Performance Computing:** A key learning objective of this project is to gain experience with running experiments on high-performance computing (HPC) resources. To this end, we trained all models on a cluster.
+
 
 ## Installation
 
 Install from:
@@ -25,9 +41,37 @@ python -c "import CollaborativeCoding"
 
 ## Usage
 
-TODO: Fill in
+To train a classification model using this code, follow these steps:
+
+### 1) Create a directory for the results
+Before running the training script, ensure the results directory exists:
+
+  `mkdir -p "<result_folder>"`
+
+### 2) Run the following command for training, evaluation and testing
+
+  `python3 main.py --modelname "<model_name>" --dataset "<dataset_name>" --metric "<metric_1>" "<metric_2>" ... "<metric_n>" --resultfolder "<result_folder>" --run_name "<run_name>" --device "<device>"`
+
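For example, a concrete invocation might look as follows (the run name `janmodel_mnist03` and results folder `results` are arbitrary illustrative choices; the model, dataset, and metric names are taken from the supported lists):

```shell
# Create the results folder, then train and evaluate JanModel on MNIST digits 0-3
# with all five metrics, logging the run to WANDB as "janmodel_mnist03".
mkdir -p results
python3 main.py --modelname "JanModel" --dataset "mnist_0-3" \
    --metric "entropy" "f1" "recall" "precision" "accuracy" \
    --resultfolder "results" --run_name "janmodel_mnist03" --device "cuda"
```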
+Replace the placeholders with your desired values:
+
+- `<model_name>`: Choose from the available models: `"MagnusModel"`, `"ChristianModel"`, `"SolveigModel"`, `"JanModel"`, `"JohanModel"`.
+
+- `<dataset_name>`: The following datasets are supported: `"svhn"`, `"usps_0-6"`, `"usps_7-9"`, `"mnist_0-3"`, `"mnist_4-9"`.
+
+- `<metric_1> ... <metric_n>`: Specify one or more evaluation metrics: `"entropy"`, `"f1"`, `"recall"`, `"precision"`, `"accuracy"`.
+
+- `<result_folder>`: Folder where all model outputs, logs, and checkpoints are saved.
+
+- `<run_name>`: Name for the run logged to the WANDB project.
+
+- `<device>`: One of `"cuda"`, `"cpu"`, `"mps"`.
+
-### Running on a k8s cluster
+## Running on a k8s cluster
 
 In your job manifest, include:
@@ -48,8 +92,8 @@ to pull the latest build, or check the [packages](https://github.com/SFI-Visual-
 
 > The container is built for the `linux/amd64` architecture to properly build with CUDA 12. For other architectures, please build the Docker image locally.
 
-# Results
-## JanModel & MNIST_0-3
+## Results
+### JanModel & MNIST_0-3
 
 This section reports the results from using the model "JanModel" and the dataset MNIST_0-3, which contains MNIST digits from 0 to 3 (four classes total).
 For this experiment we use all five available metrics and train for a total of 20 epochs.
@@ -62,7 +106,7 @@ We achieve a great fit on the data.
 Below are the results for the described run:
 
 | Test | 0.024 | 0.004 | 0.994 | 0.994 | 0.994 | 0.994 |
 
-## MagnusModel & SVHN
+### MagnusModel & SVHN
 
 The MagnusModel was trained on the SVHN dataset, utilizing all five metrics. Employing micro-averaging for the calculation of the F1 score, accuracy, recall, and precision, the model was fine-tuned over 20 epochs. A learning rate of 0.001 and a batch size of 64 were selected to optimize the training process.
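For reference, micro-averaging pools the true-positive, false-positive, and false-negative counts across all classes before computing the score. A minimal sketch of micro-averaged F1 (illustrative only, not the repository's metric implementation):

```python
def micro_f1(y_true, y_pred, n_classes):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    tp = fp = fn = 0
    for c in range(n_classes):
        tp += sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if p == c and t != c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if p != c and t == c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# For single-label multiclass predictions, micro-F1 coincides with accuracy.
print(micro_f1([0, 1, 2, 2], [0, 1, 1, 2], n_classes=3))  # 0.75
```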