new interface for samples and fba, fva solutions and workaround for #96 by hariszaf · Pull Request #98 · GeomScale/dingo

hariszaf · 2024-04-25T14:37:16Z

The aims of this PR are:

1. to provide an easier way to get flux values from fba and fva, and flux distributions from samples, based on reaction ids;
2. to fix issues "Redundant methods in MetabolicNetwork" (#95) and "Error using fva after sampling" (#96).

With respect to aim (1), instead of searching for the index of a reaction, the user gets a dataframe with the reaction ids as indices.

>>> samples.iloc[:, :5]
                 0          1             2             3          4 
PFK       7.477384   7.477381  7.477382e+00  7.477382e+00   7.477385
PFL       0.000001   0.000001  1.437957e-07  3.160731e-08   0.000002
PGI       4.860857   4.860857  4.860859e+00  4.860859e+00   4.860858
PGK     -16.023526 -16.023526 -1.602353e+01 -1.602353e+01 -16.023526

and

>>> samples.loc["O2t"]
0       21.799498
1       21.799493
2       21.799497
3       21.799496
4       21.799497

A similar interface is available for fba and fva solutions too.
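As a minimal sketch of what the label-based lookup replaces, assume the sampler yields a NumPy array with one row per reaction and that the reaction ids are available as a list. Both stand-ins below are made up for illustration and are not dingo output:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 4 reactions, 5 samples each (not real dingo output).
reactions = ["PFK", "PFL", "PGI", "O2t"]
steady_states = np.arange(20, dtype=float).reshape(4, 5)

# Before: look up the row position of the reaction, then slice the array.
row = reactions.index("O2t")
by_position = steady_states[row]

# After: index the rows by reaction id and select by label.
samples = pd.DataFrame(steady_states, index=reactions)
by_label = samples.loc["O2t"]

# Both routes return the same flux distribution for O2t.
assert np.array_equal(by_position, by_label.to_numpy())
```

The label-based route removes the bookkeeping of matching positions to reaction ids, at the cost of wrapping the array in a DataFrame.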

There is also a reactions_map to link a reaction's id to its complete name (quite useful in the case of ModelSEED models).

>>> model.reactions_map
                                         reaction_name
PFK                                Phosphofructokinase
PFL                             Pyruvate formate lyase
PGI                      Glucose-6-phosphate isomerase
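A mapping of this shape can be sketched with plain pandas. The ids and names below are copied from the printed example; how dingo actually builds `reactions_map` from the loaded model is not shown here, so treat the construction as an assumption:

```python
import pandas as pd

# Id -> name pairs taken from the printed example above.
ids = ["PFK", "PFL", "PGI"]
names = [
    "Phosphofructokinase",
    "Pyruvate formate lyase",
    "Glucose-6-phosphate isomerase",
]

# One column of names, indexed by reaction id (assumed construction).
reactions_map = pd.DataFrame({"reaction_name": names}, index=ids)

# Resolve a single id to its complete name.
full_name = reactions_map.loc["PGI", "reaction_name"]
print(full_name)
```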

Regarding aim (2), an example of how to use the medium was added in the README file and the setter was fixed (#95), while using deepcopy when getting the bounds addressed #96.

vfisikop

Thanks! This is really cool stuff.

I have some comments, mostly on the efficiency of the proposed changes.

vfisikop · 2024-04-26T12:01:37Z

README.md

 The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution.

+```python
+fva_output_df = model.fva_to_df()


Cool. Do we still need the old method model.fva(), which changes the internal state of the model? I suppose one function is enough. The new one could then be called fva(), which I think is clearer, and the old one should be made private. Actually, it is even simpler if there is only one function, fva(), with the new interface.

Similar comments hold for fba.

It's a bit tricky, as the original fva returns a set of 4 things that we use in other parts of the PolytopeSampler class. Still, I changed those parts to use _fva and _fba, and we now have fva() and fba() to return the dataframes. 😉

OK, but as the name suggests _fva and _fba should be private, right?

Also, I still cannot see the need for two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and let the user, if needed, compute a dataframe on demand (it is one line of code)?
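The "one line of code" could look roughly like the sketch below. It assumes, as the README excerpt states, that FVA yields NumPy vectors `min_fluxes` and `max_fluxes`; the values, the reaction list, and the column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Stand-ins for two of the FVA outputs described in the README (made-up values).
reactions = ["PFK", "PFL", "PGI"]
min_fluxes = np.array([7.0, 0.0, 4.5])
max_fluxes = np.array([7.5, 0.1, 5.0])

# Build the frame on demand, indexed by reaction id (column names are hypothetical).
fva_df = pd.DataFrame({"minimum": min_fluxes, "maximum": max_fluxes}, index=reactions)

print(fva_df.loc["PGI"])
```

Keeping the conversion in user code leaves the tuple-returning function as the single entry point, which is the trade-off being discussed in this thread.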

vfisikop · 2024-04-26T12:03:55Z

dingo/MetabolicNetwork.py

                self._biomass_function,
                self._parameters["opt_percentage"],
            )
+        self._min_fluxes = min_fluxes


Do we really need them to be stored in the model?

My initial thought was about the sample_from_fva_output function, which used to take min_fluxes and max_fluxes as arguments.

At first, I thought of keeping those in the model to provide them in the function but then I thought it would be simpler if the fva is performed with the sample_from_fva_output function.

Yet, maybe it's practical to keep them. No strong arguments on that.

Sorry, I got a bit lost. Are those class variables still used after the changes in this PR? As far as I understand, the sample_from_fva_output function is not using them.

vfisikop · 2024-04-26T12:18:14Z

README.md

+For example: 
+
+```python 
+initial_medium = model.medium


I think this can be simplified to avoid copying the entire dictionary (see #95 (comment)).

Check my reply on the issue.
If you have any ideas for alternatives, let me know!

OK, I see, so there is still an issue in setting the model correctly. Should we address it in a different PR?

vfisikop · 2024-04-26T12:39:41Z

dingo/PolytopeSampler.py


-        return steady_states
-
+        steady_states_df = pd.DataFrame(steady_states, index = self._metabolic_network.reactions)


Do we really want to make that transformation by default? As far as I can understand, that could be memory intensive (especially for large matrices). I think it is better to add a method that takes steady_states and creates a df, or to simply write that one line of code in the docs or README.
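A sketch of the opt-in alternative follows. The helper name `steady_states_to_df` and the data are hypothetical; the only assumption carried over from the diff is that the array has one row per reaction:

```python
import numpy as np
import pandas as pd

def steady_states_to_df(steady_states, reactions):
    """Wrap a (reactions x samples) array in a DataFrame indexed by reaction id."""
    steady_states = np.asarray(steady_states)
    if steady_states.shape[0] != len(reactions):
        raise ValueError("expected one row per reaction")
    return pd.DataFrame(steady_states, index=reactions)

# Made-up data: 3 reactions, 1000 samples.
rng = np.random.default_rng(0)
steady_states = rng.random((3, 1000))
df = steady_states_to_df(steady_states, ["PFK", "PFL", "PGI"])

# Report both footprints so the cost of converting can be judged on real data.
print(steady_states.nbytes, int(df.memory_usage(index=True).sum()))
```

With this shape, the sampler keeps returning the raw array and only callers who want labels pay for the wrapper.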

vfisikop · 2024-04-26T12:43:09Z

dingo/PolytopeSampler.py

        return steady_states

+    @staticmethod
+    def samples_as_df(model, samples):


samples or steady states?

vfisikop · 2024-04-26T12:49:57Z

dingo/loading_models.py

+    metabolites_map.columns = ["metabolite_name"]

+    return lb, ub, S, metabolites, reactions, \
+        biomass_index, biomass_function, medium, inter_medium, exchanges, \


Could this be in two lines instead of three?
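One way to get there, sketched with a stub rather than the real loader: wrapping the returned tuple in parentheses allows implicit line continuation, so the backslashes and the third line can go. The function name and the placeholder values are hypothetical:

```python
def load_model_stub():
    # Placeholder values only; the real loader derives these from a model file.
    lb = ub = S = metabolites = reactions = None
    biomass_index = biomass_function = medium = inter_medium = exchanges = None
    reactions_map = metabolites_map = None

    # Parentheses permit implicit continuation, so no backslashes are needed.
    return (lb, ub, S, metabolites, reactions, biomass_index, biomass_function,
            medium, inter_medium, exchanges, reactions_map, metabolites_map)

result = load_model_stub()
```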

vfisikop · 2024-04-26T12:55:36Z

dingo/loading_models.py


+    return lb, ub, S, metabolites, reactions, \
+        biomass_index, biomass_function, medium, inter_medium, exchanges, \
+            reactions_map, metabolites_map


I think you should also update the documentation of the function with medium, ..., metabolites_map

vfisikop · 2024-04-26T12:56:17Z

tests/fba.py

-        model.set_slow_mode()
+
+        # Check if script is running in GitHub action
+        if os.getenv('CI', 'false').lower() == 'true':
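For context, GitHub Actions sets the `CI` environment variable to `true` on its runners, which is what the check in this hunk relies on. A minimal sketch of the same test as a reusable helper follows; the helper name is hypothetical, and what the tests do in each branch is not shown in this hunk:

```python
import os

def running_in_ci():
    """Return True when the CI environment variable is set to 'true'."""
    return os.getenv("CI", "false").lower() == "true"

# A test could then pick a different configuration on CI runners than locally.
mode = "ci" if running_in_ci() else "local"
print(mode)
```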


vfisikop · 2024-04-26T13:01:23Z

tutorials/dingo_tutorial.ipynb

    {
      "cell_type": "code",
-      "execution_count": null,
+      "execution_count": 3,


Unfortunately, notebooks cannot be reviewed in github... See https://github.com/orgs/community/discussions/12959

vfisikop · 2024-04-26T13:09:39Z

dingo/PolytopeSampler.py

-        self._N_shift = []
-        self._T = []
-        self._T_shift = []
+        self._metabolic_network = copy.deepcopy(metabol_net)


I think making a deep copy is just shifting the problem (see #96 (comment)).

Shouldn't the sampler only transform A and b and leave the model as is?

I think that this has to do with the fast_remove_redundant_facets.
I suggest we keep that as is for this PR so we have something working fine for now, and we open a new PR for the fast_remove_redundant_facets.

OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!
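To illustrate why the copy works as a workaround, here is a toy sketch. `ToyModel` and `ToySampler` are hypothetical and do not reproduce dingo's preprocessing; they only show how in-place changes made by a sampler leak into a shared model, and how `copy.deepcopy` isolates them:

```python
import copy

class ToyModel:
    """Hypothetical stand-in for a metabolic network holding flux bounds."""
    def __init__(self, lb, ub):
        self.lb = list(lb)
        self.ub = list(ub)

class ToySampler:
    """Hypothetical sampler that rescales bounds while preprocessing."""
    def __init__(self, model, isolate=True):
        # With isolate=True the sampler works on its own copy of the model.
        self.model = copy.deepcopy(model) if isolate else model

    def preprocess(self):
        self.model.lb = [2 * v for v in self.model.lb]
        self.model.ub = [2 * v for v in self.model.ub]

shared = ToyModel([-1.0, 0.0], [1.0, 5.0])
ToySampler(shared, isolate=False).preprocess()
print(shared.ub)  # the caller's bounds were changed by the sampler

safe = ToyModel([-1.0, 0.0], [1.0, 5.0])
ToySampler(safe, isolate=True).preprocess()
print(safe.ub)  # the caller's bounds are untouched
```

The copy hides the mutation rather than removing it, and it duplicates the whole model, which is consistent with treating it as a stopgap and keeping the issue open.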

vfisikop

Hi again, sorry for the late reply.

Let me comment on each change in this PR separately:

- Regarding dataframes, it is simpler to just leave it to the user to create a dataframe if needed; it is just a call to pd.DataFrame(data, index) at the end of the day. This way we avoid complicating the interface.
- Regarding #95 and the medium, I propose handling it in a separate PR that solves it fully (it seems that it is now a work in progress).
- Regarding #96, I agree to use the deepcopy workaround (but please keep the issue open).

hariszaf added 8 commits April 23, 2024 16:28

workaround for GeomScale#96

727c0c2

return samples and fba, fva solutions as df with reactions as indices…

6526db7

…; also fixes GeomScale#96

update with new interface with reactions as indices

8b1410a

Merge branch 'GeomScale:develop' into fva_issue

e319445

test if running in CI or locally

e6eeb9c

set medium after provided by user GeomScale#95

d138a58

add altering medium example GeomScale#95

dd468ef

Merge branch 'fva_issue' of https://github.com/hariszaf/dingo into fv…

e4acafd

…a_issue

vfisikop reviewed Apr 26, 2024

hariszaf mentioned this pull request Apr 27, 2024

Error using fva after sampling #96

Open

addressing review comments

a263584

vfisikop reviewed May 17, 2024

vfisikop requested a review from TolisChal May 17, 2024 13:09

