
Feedback from Nav on output analysis section #149

@amyheather

Description


Peer review from @NavonilNM .

This continues from #44, #62 and #130.

Initialisation bias

  • Really good illustrations! However, I am not sure what it is you have in front of the A&E simulation model...
[screenshot]

Perhaps an empty array (https://favtutor.com/articles/empty-array-javascript/) could be used instead; since it is an empty model, we are working with data structures. --> It was supposed to look like tumbleweed 😂 I have changed it to a clearer "Empty" and "Busy".

  • The example of the clinic opening empty in the morning is included; I would add one line to the paragraph to relate it to the A&E illustration underneath. In that sentence, perhaps also expand A&E (Accident & Emergency / Emergency Department), as readers from outside the UK/Europe may not readily relate to the acronym. In research papers too, if the reviewer is from the US/Canada, you sometimes have reviewers saying replace A&E with ER (American reviewers probably 😊).

  • As we progressively follow your book (editing previous examples), it might help to remind readers that all changes (which may span multiple files) need to be saved before running the model. If they are using VS Code, they can go to File → Save All.

[screenshot]

Performance measures

  • The execution of the Runner class may require a couple of lines of explanation, especially if the learners are creating one class per file (this is the approach I am following; not ideal though, too many files). Anyway, in my case, for the code to execute, I needed to import pandas and Model.
[screenshot]

Next, I am calling the Runner from Jupyter, so I had to import Parameters and Runner (see below). Then the code executes.

[screenshot]
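For reference, with my one-class-per-file layout the imports looked roughly like this (the eg_* file names are my own choice, not necessarily the book's):

```python
# eg_runner.py (my own file name) - imports needed for the Runner class:
import pandas as pd
from eg_model import Model

# ...and in the Jupyter notebook that calls it:
from eg_parameters import Parameters
from eg_runner import Runner
```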
  • For the Patient class (eg_patient.py), I had to import numpy (for np.nan).

  • In relation to mean time for consultation, note that some patients will have zero wait time because resources are free. So, perhaps add a sentence to your paragraph (in blue below) for wait_time = 0: “As it is an attribute, we have to give wait_time an initial value. A value like 0 would wrongly look like ‘no wait’, so np.nan is used instead to clearly mean ‘we don’t know this patient’s wait time yet’. Patients who still have wait_time = np.nan at the end of the simulation are those who were never seen, and this is used below when calculating metrics for unseen (backlogged) patients. Patients who have wait_time = 0 at the end of the simulation did not wait for the resource (doctor), as the resource was free.”

[screenshot]
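A minimal sketch of that logic (class, attribute and metric names are illustrative, not necessarily the book's exact code):

```python
import numpy as np
import pandas as pd

class Patient:
    def __init__(self, patient_id):
        self.patient_id = patient_id
        # np.nan = "wait time unknown" (patient not yet seen by a doctor);
        # it is overwritten with a number (possibly 0) once consultation starts.
        self.wait_time = np.nan

# At the end of a run, patients still at np.nan were never seen:
patients = [Patient(i) for i in range(3)]
patients[0].wait_time = 0.0   # seen immediately - the doctor was free
patients[1].wait_time = 4.2   # waited 4.2 time units
df = pd.DataFrame({"wait_time": [p.wait_time for p in patients]})
unseen_count = int(df["wait_time"].isna().sum())   # 1 here - never seen
mean_wait = df["wait_time"].mean()                 # nan values are excluded
```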
  • Mean doctor utilisation – at the end of Method 1: “While this method introduces a new class and may initially seem harder to grasp, it has the advantage of requiring fewer changes to the rest of the simulation code. It is also...” – I don’t think a new class was introduced? Perhaps this paragraph is for Method 2 and may need to move under it?
[screenshot]
  • Excellent example of inheritance with the MonitoredResource class! Those with a background in C/C++ may misread *args and **kwargs, although Python uses * and ** here for a different purpose, to do with packing and unpacking arguments.

  • Comment in relation to `super().__init__(*args, **kwargs)`: perhaps we can add a reference to the constructor being called?

    • Initialise a SimPy Resource by calling the constructor of the parent class.
    • Also, in relation to the sentence “In the `__init__` method, the class is initialised as it would be for a standard resource class”, here too we can refer specifically to the parent class constructor being called and the arguments passed to it.
  • I created the MonitoredResource class in a separate .py file (again, may not be the best option, as there are too many Python files). Thus, I had to import simpy. For the Model class, I imported MonitoredResource. (See the sketch below.)
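For reference, the inheritance pattern in question looks roughly like this (a sketch only – the book's actual attributes and bookkeeping will differ):

```python
# monitored_resource.py (my own file name)
import simpy

class MonitoredResource(simpy.Resource):
    """A simpy.Resource that records time-weighted queue statistics."""

    def __init__(self, *args, **kwargs):
        # Call the parent class (simpy.Resource) constructor, forwarding all
        # positional (*args) and keyword (**kwargs) arguments unchanged. In
        # Python, * and ** pack/unpack argument lists - nothing to do with
        # C/C++ pointers.
        super().__init__(*args, **kwargs)
        self.time_last_event = self._env.now
        self.area_n_in_queue = 0.0  # integral of queue length over time

    def request(self, *args, **kwargs):
        self._update_time_weighted_stats()
        return super().request(*args, **kwargs)

    def release(self, *args, **kwargs):
        self._update_time_weighted_stats()
        return super().release(*args, **kwargs)

    def _update_time_weighted_stats(self):
        # Accumulate queue length weighted by time since the last event.
        elapsed = self._env.now - self.time_last_event
        self.area_n_in_queue += len(self.queue) * elapsed
        self.time_last_event = self._env.now
```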

  • In the email exchange with Tom last week, he mentioned an important point: the focus is on RAP rather than a SimPy tutorial. With that in mind, we may want to “condense” the content presented in this section. For example:

    • Each of the performance measures has associated code snippets with the change highlighted. However, as I was working through them, I did feel a bit of cognitive overload; for example, total arrivals, mean wait time, consultation etc. were all different sections. Perhaps some of these could be merged – using different colours to identify code changes specific to each KPI (it may not look aesthetically pleasing, but it may help speed up going through the tutorial). Adopting this approach will also mean that you see all the performance measures together (see screenshot below).
    • Similarly, the code for “mean doctor utilisation” and “mean queue length” could be grouped together, as they both use the MonitoredResource class.
    • In relation to “mean doctor utilisation”, I would just have the time-weighted implementation with the MonitoredResource class (rather than presenting two options), since “mean queue length” also uses this class. The first approach could possibly be added at the end as an optional approach.

--> Perhaps I could have both... I find it very beneficial to separate it out, but as this feedback was also shared by Fatemeh (#144), it seems I should definitely implement the suggestion of providing a combined version on that page too (rather than just on the next page). From Fatemeh's feedback:

Since we have been incrementally adding performance measures throughout this section, I suggest including performance measures from previously discussed sections (e.g. mean wait time) in the code snippet. This way we can build the complete KPI table, mirroring the procedure used in a real sim project, and we can compare them at the end of the section.

My response to that at the time: this is currently available on the following page – therefore, I've added two paragraphs at the start of the output measures page to explain that it will add them individually here, but to see all at once, check out the following page.

[screenshot]

Replications

  • Note that this section uses the MonitoredResource class.
    • Thus (following the examples in the book), it appears learners will have to create two parallel copies of the code: a first set which uses the MonitoredResource class (shown in green – see below) and another set which does not use it. The screenshot above is from code that does not use the MonitoredResource class. However, as you can see from the output, the performance measures that rely on the MonitoredResource class could not be output.
    • Is this how we want to do it? Or is it better to have one integrated set of code (one which uses the MonitoredResource class)? What would be best for the learner who is following the book as I am?
[screenshot]
  • The function summary_stats – do we need to mention that it is defined in the Runner class? Or is the function to be defined in the Jupyter notebook?
    • If in the Runner class --> change the signature to def summary_stats(self, data)
    • If in Jupyter --> some imports may be needed (see the comments below for “length of warm-up” time, where run_warmup_analysis is run from Jupyter).

Also, I had to import scipy.stats as st. In my Jupyter notebook, I imported HTML and to_html_datatable.

[screenshot]
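If defined standalone and called from Jupyter, a rough sketch of what I ended up with follows (the confidence interval calculation is illustrative, not necessarily the book's; keyword names assume scipy >= 1.9):

```python
import numpy as np
import scipy.stats as st

def summary_stats(data):
    """Return mean, sample standard deviation and 95% CI for one metric
    across replications. Inside the Runner class, this would instead be
    def summary_stats(self, data)."""
    data = np.asarray(data)
    mean = data.mean()
    std_dev = data.std(ddof=1)
    # t-distribution confidence interval around the mean
    ci_lower, ci_upper = st.t.interval(
        confidence=0.95, df=len(data) - 1, loc=mean, scale=st.sem(data))
    return mean, std_dev, ci_lower, ci_upper
```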

Although I could recreate the results, not all of the performance measures could be displayed, since the code is now divided into two separate sets – see the first point above. I am doing the replications using the classes that use MonitoredResource.

P.S.: I tried to bring both versions of the code together (i.e., have one set which presents all the performance measures), but then started getting errors. I am sure it is possible, but it needs more time. This is the only part of the book (until now) that I have not been able to recreate – thus, in the replication output below you can see that I do not have “mean_time_in_system”, “mean_patients_in_systems”, “unseen_count” and “unseen_wait_time” (however, these worked for the examples under performance measures [no replications]). This is true for the following:

  • HTML(to_html_datatable(result["overall"]))
  • HTML(to_html_datatable(result["patient"]))
  • HTML(to_html_datatable(result["run"]))

Possible solution: have one integrated set of code for the performance measures which uses the MonitoredResource class – this code should produce all the performance indicators mentioned. This code would then be used in the replications section, and following this, in the section on length of warm-up.

[screenshot]

Not in Nav's feedback but from related emails:

  • Wasn't sure where to import confidence_interval_method from. Updated sim_tools to 1.0.1 but was still getting an error; the module listing showed only the following:

```
['Any', 'ArrayLike', 'Bernoulli', 'Beta', 'CombinationDistribution', 'Dict', 'DiscreteEmpirical', 'Distribution', 'DistributionRegistry', 'Erlang', 'ErlangK', 'Exponential', 'FixedDistribution', 'Gamma', 'GroupedContinuousEmpirical', 'Hyperexponential', 'List', 'Lognormal', 'NDArray', 'Normal', 'Optional', 'PearsonV', 'PearsonVI', 'Poisson', 'Protocol', 'RawContinuousEmpirical', 'RawDiscreteEmpirical', 'SeedSequence', 'T', 'Triangular', 'TruncatedDistribution', 'Tuple', 'TypeVar', 'Uniform', 'Union', 'Weibull', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'go', 'inspect', 'is_integer', 'is_non_negative', 'is_numeric', 'is_ordered_pair', 'is_ordered_triplet', 'is_positive', 'is_positive_array', 'is_probability', 'is_probability_vector', 'json', 'math', 'np', 'px', 'runtime_checkable', 'spawn_seeds', 'validate']
```

Then resolved, as it was found under sim_tools.output_analysis. --> Could I make anything clearer here?
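For instance, per the note above, the resolving import (with sim_tools >= 1.0.1) is:

```python
# confidence_interval_method lives in the output_analysis submodule,
# not the package root:
from sim_tools.output_analysis import confidence_interval_method
```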

  • Ran the manual method, skipped the automated method as I need to read up on Python adapter classes so that I can understand the code better --> This page is rather complex. I wonder what I could do to make it easier to understand.

Warm-up analysis

  • Warmup auditor class: Imported numpy (for np.nan)

  • Executed run_warmup_analysis from Jupyter. Imported WarmupAuditor, Model and also pandas (do we need to add these to the code snippet?). The code worked.

  • Executed plot_time_series from Jupyter. Imported plotly.express as px (do we need to add this to the code snippet?). The code worked.

  • The learners may not readily realise that one needs to click each tab separately for the code to generate different graphs.

[screenshot]
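Putting the notes above together, my Jupyter cell for this section began roughly like this (the eg_* module names are my own choice from the one-file-per-class layout):

```python
# Imports I needed before calling run_warmup_analysis / plot_time_series:
import pandas as pd
import plotly.express as px

from eg_model import Model
from eg_warmup_auditor import WarmupAuditor
```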

Number of replications

  • General question – several of the sub-sections start with required packages – see screenshot below.
[screenshot]
  • The environment.yaml file includes several of these, e.g. numpy, pandas, sim_tools etc. I have an environment called des-example (which is great for reproducibility!).
  • However, in the code we will still have to use import statements (as far as I know)?
  • Several of the code snippets do not have the import statements – is this by design? For example, in the parallel computing code, I had to import cpu_count from os and Parallel (and delayed) from joblib. In the sections above, I have provided some other examples. Am I doing something wrong?

--> You are correct that you have to import them – indeed, that is what I am also doing at the start of every page, as it is best practice to have all your imports in one place at the start. I will edit the wording of these sections on each page to hopefully make it clearer that you do indeed need to include the import/from statements at the start as provided, and to highlight any changes in the imports needed between pages (as I currently highlight code changes, but not changes in imports).
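For example, a consolidated block at the top of a page might look like this (the exact list varies by page; these are just the imports mentioned throughout this review):

```python
from os import cpu_count

import numpy as np
import pandas as pd
import plotly.express as px
import scipy.stats as st
import simpy
from joblib import Parallel, delayed
from sim_tools.output_analysis import confidence_interval_method
```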

  • I was running the code using Jupyter. I had to update sim_tools to 1.0.1 and include the following import: from sim_tools.output_analysis import confidence_interval_method.

  • Question: when calculating the performance measures for each replication, are we calculating the metrics and the CI on the steady-state data only (i.e., after discarding the warm-up)? I think we are – like the examples before; just double checking.

  • It may be beneficial to have a bit more explanation (perhaps a highlight box?) on the stopping condition (desired_precision) and its relation to the confidence interval.

  • I would add a note about the Central Limit Theorem and the rule of thumb of 30 replications. In simulation, I think the CLT is applied to the replicated means. I think the validity of the confidence interval has to do with the CLT and the need for a higher number of replications.

In this particular case, it is working well as we have desired_precision (the stopping condition). Indeed, the number of replications for the different performance measures hovers between 8 and 24 (min replications = 5). When I changed the min_replication to 30, the number of replications was always 31. So it reaches the desired precision before 30 replications. So, we can say that the CLT rule of thumb was not applied because we had desired_precision set to 0.1 (10%). Although we recognise, as the graph shows (from your example – mean patient time), that more replications reduce the width of the CI, which I think reflects increasing precision, but at more computational cost.

Anyway, please ask Tom whether a highlight box can be introduced and the exact framing. We can also keep it the way it is.

[screenshot]
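For reference, the stopping rule discussed above can be sketched as follows (illustrative only – not sim-tools' actual implementation):

```python
import numpy as np
import scipy.stats as st

def precision_reached(results, desired_precision=0.1, confidence=0.95):
    """True once the CI half-width is within desired_precision (e.g. 10%)
    of the running mean - the condition that stops further replications."""
    n = len(results)
    mean = np.mean(results)
    # half-width of the t-distribution CI around the mean
    half_width = st.t.ppf((1 + confidence) / 2, df=n - 1) * st.sem(results)
    return (half_width / mean) <= desired_precision
```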
  • I was running the code using Jupyter. For the graph, I imported plotly_confidence_interval_method from sim_tools.

  • Model adapter – I think it could be explained better; for example, what is an interface (or perhaps that is too technical)?

[screenshots]

P.S.: I am not running the Model adapter code for now, as I need to understand it better.
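For what it's worth, my current understanding of the adapter idea is sketched below; every name here (ModelAdapter, single_run, Runner.run_single) is an illustrative assumption, not the book's actual API:

```python
from eg_runner import Runner

class ModelAdapter:
    """A thin wrapper that gives our model the interface (i.e. the set of
    method names) the automated replications algorithm expects, without
    changing the model itself."""

    def __init__(self, parameters, metric="mean_wait_time"):
        self.parameters = parameters
        self.metric = metric

    def single_run(self, replication):
        # Run one replication and return only the metric being monitored.
        runner = Runner(self.parameters)
        results = runner.run_single(replication)
        return results[self.metric]
```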

  • However, I can see that replication_budget is set to 100 (great!).
  • Note: we have desired_precision=0.1 in the manual inspection, and we have half_width_precision=0.1 in the automated code. Conceptually, are these variables similar? If so, perhaps edit the variable names to match.

Parallel processing

  • Nice figure. To me, Core 1 --> R1, R2, R3 (the orientation of the arrow) is more intuitive (but that may be a personal preference).

  • For the code to work, I imported Parallel and delayed from joblib in the Runner class. Also imported cpu_count from os.

  • Executed in Jupyter.

    • Worked for me (please see image below) – did not get the “Could not pickle the task..” error.
    • Imported time and plotly.express.
[screenshot]
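For reference, the joblib pattern in question looks roughly like this (runner.run_single is my placeholder for the book's single-replication entry point):

```python
from os import cpu_count
from joblib import Parallel, delayed

def run_replications_parallel(runner, n_reps):
    """Run each replication as a separate job, one per available CPU core."""
    return Parallel(n_jobs=cpu_count())(
        delayed(runner.run_single)(rep) for rep in range(n_reps)
    )
```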
